Topcoder Challenge | Topcoder Community

Challenge Overview

A previous challenge implemented a set of REST APIs for storing and managing video assets (create, retrieve, update, delete). We also built a sample RSS scraper that parses data out of configured feeds and puts video assets into the data store using the video REST API. This challenge will implement a new parser for foodnetwork.com videos.

Existing Code

The existing application is in GitLab, and access will be provided through links in the forum.

Scraper

The scraper will be implemented as a configurable delayed job. The job will run at a configurable interval and will read the Food Network video page, looking for assets added since the last run. Each new asset will be parsed and placed into the data store using the REST API.
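One way to detect assets added since the last run is to remember which videos have already been handled. The following is a minimal sketch, not the project's actual approach: the `id` field and the `selectNewVideos` helper are assumptions made for illustration, and an in-memory set is lost whenever the process restarts, so a real job would likely need a persistent record or a lookup against the REST API.

```javascript
// Sketch only: `id` is an assumed unique key on each parsed video.
// Returns the videos not seen in earlier runs and records them as seen.
function selectNewVideos(videos, seenIds) {
  const fresh = [];
  for (const video of videos) {
    if (!seenIds.has(video.id)) {
      seenIds.add(video.id);
      fresh.push(video);
    }
  }
  return fresh;
}
```

Calling it with the same `Set` on every run means only videos that were not present in earlier runs are passed on to the REST API.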

The scraper will be configured with:

* A URL to the Food Network video page
* A category to use when adding videos
* A provider value to use when adding videos
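The three settings above could be read from environment variables, which is convenient on Heroku. This is a sketch under assumptions: the variable names (`FOODNETWORK_URL` and so on), the `loadScraperConfig` helper, and the 60-minute default interval are all hypothetical, and the existing code base may already use a different configuration mechanism.

```javascript
// Hypothetical environment variable names; the real project may use a config file instead.
function loadScraperConfig(env) {
  const required = ["FOODNETWORK_URL", "FOODNETWORK_CATEGORY", "FOODNETWORK_PROVIDER"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error("Missing scraper settings: " + missing.join(", "));
  }
  return {
    url: env.FOODNETWORK_URL,
    category: env.FOODNETWORK_CATEGORY,
    provider: env.FOODNETWORK_PROVIDER,
    // Interval between runs, in minutes; 60 is an arbitrary default for this sketch.
    intervalMinutes: Number(env.FOODNETWORK_INTERVAL_MINUTES || 60),
  };
}
```

In the job itself this would typically be called as `loadScraperConfig(process.env)`, so that a missing setting fails at startup rather than in the middle of a run.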

Sample data

For this challenge, please target the data in the Food Network page here:

view-source:http://www.foodnetwork.com/videos/channels/picnic-favorites-from-food-network-chefs.vc.html

The category value should be "Lifestyle" for the scraper, and the provider should be "Food Network".

Note: I have not found an easily parseable feed format for Food Network (RSS, Atom, etc.). If you can find one, using it would be preferred, as long as the data matches what is on the videos page.

The parser will be expected to scrape the page, finding references to specific videos. There does appear to be an easily parseable "videos" array as part of an embedded script on the page. We can use that to grab:

* Title
* Description
* Duration
* Thumbnail URL (16x9)
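Pulling the embedded array out of the page source could look roughly like the sketch below. It assumes the array appears as `"videos": [ ... ]` and is valid JSON, which has not been verified against the live page; the `extractVideos` name and the field names in the usage example are hypothetical. The scan tracks string literals so that brackets inside titles or descriptions do not end the array early.

```javascript
// Finds the first `"videos": [ ... ]` array in the page source and parses it.
// Assumes the array is valid JSON; returns [] when no array is found.
function extractVideos(html) {
  const marker = /"videos"\s*:\s*\[/.exec(html);
  if (!marker) return [];
  const start = marker.index + marker[0].length - 1; // index of the opening "["
  let depth = 0;
  let inString = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "[") {
      depth++;
    } else if (ch === "]") {
      depth--;
      if (depth === 0) return JSON.parse(html.slice(start, i + 1));
    }
  }
  return []; // unbalanced brackets: treat as not found
}
```

If the script on the page turns out to use a JavaScript object literal rather than strict JSON (unquoted keys, single quotes), `JSON.parse` will throw and a more tolerant parser would be needed.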

To get the playback URL, take the releaseUrl parameter for a video, download the SMIL file it points to, and parse it, looking for the "src" attribute of the "video" tag. It should be something like this:

http://sniidevices.scrippsnetworks.com/0222/0222121_3.mp4
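The SMIL lookup described above can be sketched as follows. Only the "video" tag and its "src" attribute come from the description; the function names are hypothetical, the rest of the SMIL structure is not assumed, and a regular expression is used here for brevity where a real XML parser would be more robust. The download helper relies on the global `fetch` available in Node 18 and later.

```javascript
// Returns the src attribute of the first <video> element, or null if none is present.
// Only the &amp; entity is decoded; other XML entities are left untouched.
function extractPlaybackUrl(smilText) {
  const match = /<video\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/i.exec(smilText);
  return match ? match[2].replace(/&amp;/g, "&") : null;
}

// Downloads the SMIL document at releaseUrl and returns the playback URL.
async function resolvePlaybackUrl(releaseUrl) {
  const response = await fetch(releaseUrl);
  if (!response.ok) {
    throw new Error("SMIL request failed with status " + response.status);
  }
  return extractPlaybackUrl(await response.text());
}
```

Keeping the parsing step separate from the download makes it easy to test against a saved SMIL file without touching the network.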

Heroku deploy

Your deployment documentation should extend the existing documentation for the Node services and should cover how to deploy the newly created job to Heroku to run at a regular interval on a separate dyno from the service.
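On Heroku, each process type declared in the Procfile runs on its own dynos, so a separate entry (for example a hypothetical line such as `worker: node scraper-worker.js`) keeps the job apart from the web service. The sketch below shows what the scheduling part of such a worker script might contain; `startScheduler` and the file name are assumptions, and the existing delayed-job mechanism in the code base may already provide equivalent behavior.

```javascript
// Runs `job` once immediately, then again every `intervalMinutes` minutes.
// Errors are logged so that one failed run does not stop later runs.
// Returns a function that cancels the schedule.
function startScheduler(job, intervalMinutes, log = console.error) {
  const runOnce = async () => {
    try {
      await job();
    } catch (err) {
      log("Scraper run failed:", err);
    }
  };
  runOnce();
  const timer = setInterval(runOnce, intervalMinutes * 60 * 1000);
  return () => clearInterval(timer);
}
```

Note that this simple version does not guard against overlap: if a run takes longer than the interval, the next run starts while the previous one is still in progress.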

Existing bugs

There may be a few minor bugs in the existing code. These are not your responsibility to fix unless they block implementation of the requirements above, but it would be appreciated if you logged any you find as part of your submission.

Submission format

Your submission should be a Git patch file against commit hash f9090ce94db2c9f8fd7f987ccb940a5529989045. Make sure to test your patch file before submitting!

Deployment document

Your patch file should update the README with information about configuring and using the Food Network parser.

Final Submission Guidelines

Please see above

Hercules TV Web Apps News and Lifestyle Pages - Food Network Content Scraper


Review style

Final Review

Approval


ID: 30054459