Topcoder Challenge | Topcoder Community

Challenge Overview

A previous challenge implemented a set of REST APIs for storing and managing video assets (create, retrieve, update, delete). This challenge adds new scrapers for Reuters and BBC videos.

Existing API

The existing Node application and its deployment details are in GitLab; the repository URL is posted in the forum.

BBC

The BBC scraper will be configured against a URL like this:

http://www.bbc.com/news/video_and_audio/international

We want to scrape the individual videos listed on that page under each section (World, UK, Business, Politics, etc.).

The goal of this challenge is to scrape each video's metadata from the page and fill in the video data structure used by the existing app. For the video URL, we want either an MP4 URL or an HLS URL (.m3u8 extension). MP4 or HLS sources are usually visible if you switch your browser's user agent to a mobile device such as an iPhone.
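
A minimal sketch of pulling the video URL out of the BBC page HTML follows. It assumes the page, fetched with a mobile user agent, contains absolute .m3u8 or .mp4 URLs somewhere in its markup or inline scripts. The MOBILE_HEADERS user-agent string, the extractVideoUrl name, and the preference for HLS over MP4 are illustrative choices, not part of the existing app, and the real BBC markup may call for a more targeted parser.

```javascript
// Hypothetical request headers: an example iPhone user-agent string,
// not one taken from the existing app.
const MOBILE_HEADERS = {
  'User-Agent':
    'Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) AppleWebKit/601.1.46 ' +
    '(KHTML, like Gecko) Version/9.0 Mobile/13E233 Safari/601.1',
};

// Matches absolute http(s) URLs whose path ends in .m3u8 or .mp4,
// optionally followed by a query string.
const VIDEO_URL_RE = /https?:\/\/[^\s"'<>]+?\.(?:m3u8|mp4)(?=[?\s"'<>]|$)(?:\?[^\s"'<>]*)?/gi;

// Returns the first HLS URL found, otherwise the first MP4 URL, otherwise null.
// Preferring HLS is an arbitrary choice made for this sketch.
function extractVideoUrl(html) {
  const matches = html.match(VIDEO_URL_RE) || [];
  const hls = matches.find((u) => /\.m3u8(?:\?|$)/i.test(u));
  const mp4 = matches.find((u) => /\.mp4(?:\?|$)/i.test(u));
  return hls || mp4 || null;
}

module.exports = { MOBILE_HEADERS, extractVideoUrl };
```

In practice the scraper would request the page with these headers through whichever HTTP client the app already depends on, then pass the response body to extractVideoUrl.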

Reuters

The Reuters scraper will be configured against an RSS feed like one of these:

http://feeds.reuters.com/reuters/USVideoBreakingviews
http://feeds.reuters.com/reuters/USVideoBusiness
http://feeds.reuters.com/reuters/USVideoBusinessTravel
http://feeds.reuters.com/reuters/USVideoPolitics
http://feeds.reuters.com/reuters/USVideoLifestyle

We can take the thumbnails from the feed itself, along with the video information, description, publish date, etc.
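
The feed fields can be read with a dependency-free sketch like the one below. It assumes each entry is an RSS item element with title, link, description and pubDate children, and that thumbnails appear as media:thumbnail elements carrying url, width and height attributes. Those element names are assumptions to check against the live feeds, and a production scraper would more likely use an XML or feed-parsing package than regular expressions. The optional limit argument stops parsing once enough items have been read.

```javascript
// Returns the trimmed text of the first <tag>...</tag> in `xml`, unwrapping
// a CDATA section if present, or null when the tag is missing.
function tagText(xml, tag) {
  const m = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, 'i'));
  if (!m) return null;
  const cdata = m[1].match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  return (cdata ? cdata[1] : m[1]).trim();
}

// Reads a double-quoted attribute value from a single tag string.
function attr(tagString, name) {
  const m = tagString.match(new RegExp(`\\b${name}="([^"]*)"`));
  return m ? m[1] : null;
}

// Hypothetical parser: reads at most `limit` <item> entries from an RSS document.
function parseFeedItems(xml, limit = Infinity) {
  const items = [];
  const itemRe = /<item\b[^>]*>([\s\S]*?)<\/item>/gi;
  let m;
  while (items.length < limit && (m = itemRe.exec(xml)) !== null) {
    const body = m[1];
    // Assumes thumbnails are media:thumbnail elements (Media RSS style).
    const thumbnails = (body.match(/<media:thumbnail\b[^>]*>/gi) || []).map((t) => ({
      url: attr(t, 'url'),
      width: Number(attr(t, 'width')) || null,
      height: Number(attr(t, 'height')) || null,
    }));
    items.push({
      title: tagText(body, 'title'),
      link: tagText(body, 'link'),
      description: tagText(body, 'description'),
      publishDate: tagText(body, 'pubDate'),
      thumbnails,
    });
  }
  return items;
}

module.exports = { parseFeedItems };
```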

For the video itself, we need to scrape the HTML page linked from each feed item: fetch that page and parse out the video URL (either an MP4 or an HLS link).
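
One possible way to resolve the video URL from an item's page is sketched below. It assumes the page exposes the video through an Open Graph og:video (or og:video:url / og:video:secure_url) meta tag pointing at an MP4 or HLS file. That assumption needs to be verified against real Reuters pages; if the URL lives in an inline script instead, the lookup would have to change. findOgVideo, resolveVideoUrl and the injected fetchHtml function are hypothetical names.

```javascript
// Hypothetical lookup: scans <meta> tags for an Open Graph video property
// whose content is an .mp4 or .m3u8 URL. Attribute order does not matter.
function findOgVideo(html) {
  for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
    const prop = (tag.match(/\bproperty="([^"]*)"/i) || [])[1];
    const content = (tag.match(/\bcontent="([^"]*)"/i) || [])[1];
    if (!prop || !content) continue;
    if (/^og:video(:secure_url|:url)?$/i.test(prop) && /\.(mp4|m3u8)(\?|$)/i.test(content)) {
      return content;
    }
  }
  return null;
}

// Fetches the item's HTML page and resolves its video URL (or null).
// `fetchHtml` is injected, e.g. a wrapper around the app's HTTP client,
// so this can be unit tested without network access.
async function resolveVideoUrl(item, fetchHtml) {
  if (!item.link) return null;
  const html = await fetchHtml(item.link);
  return findOgVideo(html);
}

module.exports = { findOgVideo, resolveVideoUrl };
```

Injecting fetchHtml also lets the unit tests required below feed in saved page fixtures instead of live Reuters pages.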

One Reuters-specific requirement: remove the HTML from the descriptions. We only want the plain text that comes *before* the encoded HTML in each description.
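
A small helper for this cleanup might look like the following, assuming the HTML starts at the first raw "<" or entity-encoded "&lt;" in the description. plainDescription is a hypothetical name; it collapses whitespace but does not decode other entities such as "&amp;".

```javascript
// Returns only the plain text that precedes the first HTML tag, whether the
// markup is raw ("<div>") or entity-encoded ("&lt;div&gt;").
function plainDescription(description) {
  if (!description) return '';
  const match = description.match(/<|&lt;/i);
  const text = match ? description.slice(0, match.index) : description;
  // Collapse runs of whitespace (including newlines) and trim the ends.
  return text.replace(/\s+/g, ' ').trim();
}

module.exports = { plainDescription };
```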

Integration

These new scrapers must integrate with the app the same way the existing scrapers do. They should be configured through the admin pages for adding and editing scrapers, and they should run through the same src/feedscript.js --scraperName=... flow that the other scrapers use. In short, all the admin should have to do is add the scraper in the admin panel and run it. The admin shouldn't need to know how a scraper works or configure each one with custom, scraper-specific information.
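
The sketch below shows one way a named scraper could be dispatched from the command line. It assumes each saved scraper configuration carries a name and a type field and that feedscript.js looks the configuration up by the --scraperName value. The SCRAPERS registry and its stub functions, scraperNameFromArgs, runScraper, the field names, and the example name "reuters-business" are all hypothetical; wherever the existing app's lookup differs, it should be followed instead.

```javascript
// Hypothetical registry mapping a stored scraper type to the function that
// runs it. These stubs stand in for the real BBC and Reuters scrapers.
const SCRAPERS = {
  bbc: async (config) => ({ source: 'bbc', url: config.url, videos: [] }),
  reuters: async (config) => ({ source: 'reuters', url: config.url, videos: [] }),
};

// Reads the value of --scraperName=<value> from the argument list, or null.
function scraperNameFromArgs(argv) {
  const prefix = '--scraperName=';
  const arg = argv.find((a) => a.startsWith(prefix));
  return arg ? arg.slice(prefix.length) : null;
}

// Finds the saved configuration by name and dispatches on its type.
// Throws synchronously for an unknown name or type; otherwise returns the
// scraper's promise.
function runScraper(name, savedConfigs) {
  const config = savedConfigs.find((c) => c.name === name);
  if (!config) throw new Error(`Unknown scraper: ${name}`);
  const run = SCRAPERS[config.type];
  if (!run) throw new Error(`Unsupported scraper type: ${config.type}`);
  return run(config);
}

module.exports = { SCRAPERS, scraperNameFromArgs, runScraper };
```

Dispatching on a stored type keeps the admin workflow generic: adding a BBC or Reuters scraper means choosing the type and URL, not supplying scraper-specific settings.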

In addition, make sure your scrapers support the following existing functionality:

* Configurable category and sub-category
* Scraper limits
* Thumbnail limits for height and width
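
The following sketch shows how these settings could be applied. It assumes the thumbnail limits are maximum dimensions and that each scraper configuration stores its category, sub-category, limit and thumbnail bounds. exampleConfig (including its field names and values), pickThumbnail and withCategory are illustrative, not the app's actual schema. Choosing the largest thumbnail that still fits is one reasonable policy among several.

```javascript
// Hypothetical configuration for one scraper entry.
const exampleConfig = {
  name: 'reuters-politics',
  type: 'reuters',
  url: 'http://feeds.reuters.com/reuters/USVideoPolitics',
  category: 'News',
  subCategory: 'Politics',
  limit: 10,
  thumbnailMaxWidth: 640,
  thumbnailMaxHeight: 360,
};

// Keeps thumbnails within the configured maximums (a missing limit or an
// unknown thumbnail dimension is treated as "no constraint"), then returns
// the largest remaining one by area, or null if none fit.
function pickThumbnail(thumbnails, { thumbnailMaxWidth, thumbnailMaxHeight } = {}) {
  const fits = thumbnails.filter(
    (t) =>
      (!thumbnailMaxWidth || !t.width || t.width <= thumbnailMaxWidth) &&
      (!thumbnailMaxHeight || !t.height || t.height <= thumbnailMaxHeight),
  );
  if (fits.length === 0) return null;
  return fits.reduce((best, t) =>
    (t.width || 0) * (t.height || 0) > (best.width || 0) * (best.height || 0) ? t : best);
}

// Copies the configured category fields onto a scraped video record.
function withCategory(video, config) {
  return { ...video, category: config.category, subCategory: config.subCategory };
}

module.exports = { exampleConfig, pickThumbnail, withCategory };
```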

Don't pull all videos and *then* trim the list to the limit. Request and parse only as many videos as the scraper limit allows.
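
Here is a sketch of limit-aware processing. Entries are handled one at a time, and the loop stops as soon as the limit is reached, so pages past the limit are never requested. collectLimited and its processItem callback (for example, fetching an item page and extracting its video URL) are hypothetical.

```javascript
// Processes entries in order and stops once `limit` videos have been
// collected. `processItem` resolves to a video object, or null when the
// entry has no usable video.
async function collectLimited(items, limit, processItem) {
  const videos = [];
  for (const item of items) {
    if (videos.length >= limit) break;
    const video = await processItem(item);
    if (video) videos.push(video);
  }
  return videos;
}

module.exports = { collectLimited };
```

Entries that yield no video do not count toward the limit, so a few extra pages may be fetched when some entries lack a playable URL. Slicing the entries to the limit before processing would cap requests exactly, but it could return fewer videos than the limit.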

README

Update the README with verification steps for the new scrapers and with the configuration details needed to add them easily.

Unit tests

As with the other scrapers, unit tests are required for these new scrapers.

Heroku deploy

Make sure the Heroku deployment information and package.json are up to date. Deployment should require nothing more than "npm install" / "npm start" locally and "git push heroku master" for Heroku.

Submission format

Submit your work as a Git patch file against the commit hash given in the forum. MAKE SURE TO TEST YOUR PATCH FILE!

Final Submission Guidelines

Please see above

TCO16 Bonus - Hercules TV Web Apps - Reuters and BBC scrapers

ID: 30054783