Topcoder Challenge | Topcoder Community

Challenge Overview

The TV Web Apps application is a Node app that scrapes content from various websites and aggregates it into a single app. We have scrapers for lots of different sites and this content will add two more for Wired and the New York Times (NYT). In addition, it will make some changes to all scrapers with regards to how thumbnail images are processed and retrieved.

Existing API

The existing Node application and deployment details are in Gitlab, and the URL to the repository can be found in the forum.

Image sizes

The eventual application will be displayed on low powered embedded devices. In our testing, the devices seem to have trouble loading some of the large thumbnail images for things like YouTube videos or other types of videos that can load in large thumbnails.

For this content, we will introduce new parameters for max heigh and width to the environmental variables (MAX_THUMBNAIL_WIDTH and MAX_THUMBNAIL_HEIGHT). All scrapers should be updated to only pull thumbnails that have a size that falls under the max width and max height.

If a source only has a single thumbnail size, then that scraper doesn't need to be updated, but for sources that do have different sizes, like YouTube, we need to make sure the change is applied.

Wired Scraper

This challenge will introduce a new scraper to scrape videos from Wired: http://feeds.cnevids.com/brand/wired.mrss This should be a straight-forward implementation because the RSS feed contains all necessary information. You won't need to load up pages or do anything other than parse out the feed information. Use the mp4 links in the media:content tag or the enclosure tag as the video URL and make sure to also parse the duration value (runtimeLength in the data store). Make sure the thumbnail is set properly.

New York Times

The New York Times video site is deviced into "channels" like this:

* http://www.nytimes.com/video/us-politics
* http://www.nytimes.com/video/technology

The scraper will be configured for a sub-channel like those above, so if we want to scrape multiple NYT channels, we would have multiple NYT scraper configurations, similar to what we can do with the "wsj" scraper.

Each channel has 12 recent videos, and those are the videos we want to scrape out.

The video URL should be the m3u8 / HLS link to the raw video, like this:

https://vp.nyt.com/video/2016/07/01/41142_1_02trump-campaign_wg_hls/master.m3u8

README

Make sure the README is updated with verification information about the new scrapers.

Unit tests

As with the other functionality, unit tests are required for these new scrapers.

Heroku deploy

Make sure the Heroku deployment information is up-to-date and that you keep the package.json up to date as well. Don't expect the deployment to be anything other than "npm install" / "npm start" locally and "git push heroku master" for Heroku deployment.

Submission format

Your submission should be provided as a Git patch file against the commit hash mentioned in the forum. MAKE SURE TO TEST YOUR PATCH FILE!

Final Submission Guidelines

Please see above

Hercules TV Web Apps - Image size tweaks and new scapers for Wired and NYT

Challenge Overview

Final Submission Guidelines

Learn

Review style

Final Review

Approval

Challenge links

Toolbox

ID: 30054715