VTARC - Business Analysis Cell Web Crawler Module Architecture

The challenge is finished.

Challenge Overview

Project Overview

VTARC currently has a requisition team that receives requests for supplies from various internal departments.  They must then research available products and vendors that meet the criteria of the request.  Currently this research is done primarily via Google.  VTARC recently developed their own back end search engine to make this research more automated and efficient.

A web crawler sits on top of this new search engine.  It takes a list of URLs and search terms, crawls those URLs, and uses the back end search engine to generate results that are passed to the UI.
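
As a rough illustration of that flow, the sketch below crawls the URLs one after another and hands each page to the search engine.  It is not taken from the attached source code: the function name crawl_sequentially and the two injected callables, fetch_page and search_backend, are hypothetical stand-ins for the real page download and the real back end call.

```python
from typing import Callable, Dict, List


def crawl_sequentially(
    urls: List[str],
    search_terms: List[str],
    fetch_page: Callable[[str], str],
    search_backend: Callable[[str, List[str]], List[str]],
) -> Dict[str, List[str]]:
    """Crawl each URL in turn and collect the search engine's matches per URL.

    fetch_page and search_backend are hypothetical callables supplied by the
    caller; the real crawler and back end interfaces are not shown here.
    """
    results: Dict[str, List[str]] = {}
    for url in urls:
        page_text = fetch_page(url)  # download one page
        results[url] = search_backend(page_text, search_terms)  # query back end
    return results
```

Because every URL waits for the previous one to finish, the total run time grows with the sum of all download times, which is the main cost a new architecture would need to address.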

In this competition you will re-architect this web crawler to make it faster and more robust.

Competition Task Overview

Attached to this competition you will find the current Python source code for the web crawler. 

Examine this code and then devise a new architecture that meets the detailed requirements listed below.

Detailed Requirements

The new architecture must:

  • Make the application more robust and stable
  • Increase the speed of crawling.  The first step here will be to make the application multi-threaded, with each thread crawling one URL.  You should also come up with other methods for speeding up the crawl, if possible.
  • Change the output from a simple Red/Yellow/Green status for the overall task to a detailed percentage complete
  • Define and document the interface with the new front end.  The input will include a list of URLs as well as a list of search terms.  Each search term will also have an indicator for 'required', 'optional', or 'do not include'.
  • Define and document the interface with the back end search engine (called ALNLP in the code).
  • Define the Assembly Specification(s) to build this new system.
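
The requirements above can be sketched in Python as follows.  This is only one possible shape, under stated assumptions, and not the required design: the names TermMode, SearchTerm, Progress, and crawl_all are hypothetical, and crawl_one stands in for whatever function fetches a single URL and queries ALNLP, whose real interface is not described here.  A thread pool assigns one URL to each worker task, a failure on one URL is recorded without stopping the others, and progress is reported as a percentage of URLs finished.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class TermMode(Enum):
    """Indicator attached to each search term by the front end."""
    REQUIRED = "required"
    OPTIONAL = "optional"
    EXCLUDED = "do not include"


@dataclass(frozen=True)
class SearchTerm:
    text: str
    mode: TermMode


class Progress:
    """Counter of finished URLs, locked so another thread can poll it safely."""

    def __init__(self, total: int) -> None:
        self._total = total
        self._done = 0
        self._lock = threading.Lock()

    def mark_done(self) -> float:
        with self._lock:
            self._done += 1
            return 100.0 * self._done / self._total

    def percent(self) -> float:
        with self._lock:
            return 100.0 * self._done / self._total if self._total else 100.0


def crawl_all(
    urls: List[str],
    terms: List[SearchTerm],
    crawl_one: Callable[[str, List[SearchTerm]], List[str]],
    max_workers: int = 8,
    on_progress: Optional[Callable[[float], None]] = None,
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Crawl URLs concurrently; return (results per URL, error message per URL)."""
    results: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    if not urls:
        return results, errors
    progress = Progress(len(urls))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One submitted task per URL; each task runs on a single worker thread.
        futures = {pool.submit(crawl_one, url, terms): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going when one URL fails
                errors[url] = str(exc)
            percent = progress.mark_done()
            if on_progress is not None:
                on_progress(percent)
    return results, errors
```

Counting finished URLs is the simplest measure of percentage complete; a finer-grained measure, such as pages crawled within each URL, would need the crawler to report its own totals.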

Open Source Library

None have been identified, but please ask in the forum if you find one that you would like to use.

TC Components

No, as this is Python

Technology Overview

  • Python 3

References

None

Documentation Provided

  • REFGUI.zip


Final Submission Guidelines

Submission Deliverables

  • Application Design Specification
  • TCUML containing interface/class definitions, assembly diagram, sequence diagrams, etc.
  • Assembly Specifications (NO COMPONENTS)
  • Must provide sufficient details because this project is assembly direct

Submission Guidelines

For each member, the final submission should be uploaded to the Online Review Tool.

ELIGIBLE EVENTS:

2013 TopCoder(R) Open

Review Style

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30033113