The challenge is finished.

Challenge Overview

Challenge Details
  • Write code that extracts the details related to the keywords provided in the forum.
  • Persist the extracted data in a database table.
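
The keyword-matching step could look roughly like the sketch below. It is only an illustration: the function name `find_keyword_paragraphs`, the sample text, and the two keywords are assumptions made for the example (the real keywords are the ones posted in the forum), and paragraphs are assumed to be separated by blank lines.

```python
import re


def find_keyword_paragraphs(text, keywords):
    """Return (paragraph, matched_keywords) pairs for paragraphs that contain a keyword."""
    # One case-insensitive pattern per keyword; the lookarounds stop a keyword
    # from matching inside a longer word.
    patterns = {
        kw: re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", re.IGNORECASE)
        for kw in keywords
    }
    results = []
    # Assumption: one or more blank lines mark a paragraph break.
    for raw in re.split(r"\n\s*\n", text):
        paragraph = " ".join(raw.split())  # collapse line breaks and extra spaces
        if not paragraph:
            continue
        matched = [kw for kw, pattern in patterns.items() if pattern.search(paragraph)]
        if matched:
            results.append((paragraph, matched))
    return results


# Hypothetical sample text and keywords, for illustration only.
sample = """Students must receive the seasonal flu vaccine
before the start of the fall term.

Parking permits are sold at the campus office.

The flu shot is recommended for all staff."""

hits = find_keyword_paragraphs(sample, ["flu vaccine", "flu shot"])
for paragraph, matched in hits:
    print(matched, "->", paragraph)
```

Returning the matched keywords next to each paragraph makes it possible to store which keyword triggered the match, which may help when separating "mandatory" from "recommended" wording later.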

Project Background

The customer is a global healthcare company that researches, develops, and manufactures consumer healthcare products. The purpose of this project is to identify the universities and colleges that mandate seasonal flu vaccination and those that only recommend it.

Technology Stack
  • Python is preferred; however, Node.js or Java can be used for the data extraction script.
  • MS SQL Server 2017.

Individual Requirements

Challenge Input
We previously ran a few challenges to extract the list of colleges and the specific URLs on each college website where the keywords of interest should be searched. The aggregated data is provided in the forum as SQL scripts that can be loaded into an MS SQL database. The keywords are also provided in the forum.

Scope
  • You have to write scrapers that extract the paragraphs containing the keywords.
  • Two types of URLs are provided: HTML and PDF.
  • For the HTML links, web scrapers such as Scrapy can be used.
  • For the PDF URLs, PDF parsing modules such as PyPDF2 can be used.
  • The parsed data needs to be persisted in a new table together with the associated college id.
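
As a rough end-to-end illustration of the scope items above, the sketch below pulls `<p>` text out of an HTML string, keeps the paragraphs that mention a keyword, and stores them with a college id. Several things here are assumptions made so the example runs on its own: it uses the standard library `html.parser` rather than Scrapy, it uses an in-memory `sqlite3` database as a stand-in for MS SQL Server 2017, and the table name `college_keyword_paragraphs`, its columns, the URL, and the college id `42` are all hypothetical. Text extracted from PDFs by a PDF library could be passed through the same keyword filter.

```python
import sqlite3
from html.parser import HTMLParser


class ParagraphParser(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept as well.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_matches(html, keywords):
    """Return the <p> paragraphs that contain any keyword (case-insensitive)."""
    parser = ParagraphParser()
    parser.feed(html)
    parser.close()
    lowered = [kw.lower() for kw in keywords]
    return [p for p in parser.paragraphs if any(kw in p.lower() for kw in lowered)]


def persist(conn, college_id, url, paragraphs):
    """Store each paragraph with its college id (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS college_keyword_paragraphs ("
        "id INTEGER PRIMARY KEY, college_id INTEGER NOT NULL, "
        "source_url TEXT NOT NULL, paragraph TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO college_keyword_paragraphs (college_id, source_url, paragraph) "
        "VALUES (?, ?, ?)",
        [(college_id, url, p) for p in paragraphs],
    )
    conn.commit()


# Hypothetical page content, keyword, URL, and college id.
html = (
    "<html><body>"
    "<p>All students are <b>required</b> to get a flu shot.</p>"
    "<p>Library hours are 9 to 5.</p>"
    "</body></html>"
)
conn = sqlite3.connect(":memory:")  # stand-in for the MS SQL Server database
matches = extract_matches(html, ["flu shot"])
persist(conn, 42, "https://example.edu/health", matches)
rows = conn.execute(
    "SELECT college_id, paragraph FROM college_keyword_paragraphs"
).fetchall()
print(rows)
```

Keeping extraction and persistence in separate functions means the database layer can be swapped for a SQL Server driver without touching the parsing code; the parameterized `INSERT` avoids building SQL from scraped text.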

Deployment Guide and Validation Document

Make sure to provide two separate documents for validation.

A README.md that covers:
  • Deployment - how to build and test your submission.
  • Configuration - document every configuration setting used by the submission.
  • Dependency Installation - an up-to-date, step-by-step guide for installing the dependencies.
A Validation.md that covers:
  • Validation of each requirement, so that reviewers can easily map the requirements to your submission.

Final Submission Guidelines

Submit your source code as a zip file.

ELIGIBLE EVENTS:

Topcoder Open 2019

Review style

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30072565