Challenge Overview

A previous challenge implemented a set of REST APIs for storing and managing video assets (create, retrieve, update, delete).  This challenge adds new scrapers to support HGTV and Travel Channel videos.

Existing API

The existing Node application and deployment details are in GitLab, and the URL to the repository can be found in the forum.

HGTV:

The HGTV scraper will be configured against a URL like this:

http://www.hgtv.com/shows/full-episodes

We want to scrape out the individual video items embedded in the page's JavaScript lists, which look like this:
{
  "id" : "http://data.media.theplatform.com/media/data/Media/465071171695",
  "title" : "Couple Seeks a Unique Fixer",
  "showTitle" : "Fixer Upper",
  "description" : "A blended family with a baby on the way hopes to find a one-of-a-kind home.",
  "releaseUrl" : "http://link.theplatform.com/s/ip77QC/ZcDEzGXu9i6d?format=SMIL&MBR=true",
  "thumbnailUrl" : "/content/dam/images/hgtv/video/0/02/023/0230/0230157.jpg",
  "length" : "2582",
  "duration" : "43:02",
  "publisherId" : "HGTV",
  "nlvid" : "0230157",
  "scrid" : "2429199",
  "cmsid" : "6142f2e61e1aba9b3d593912b99449d2",
  "sniGUID": "12406ba9-d53f-46e7-b4e8-c5e9f191ceb5",
  "sponsor" : "hg_sh_fixer_upper"
}
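Here is a minimal sketch of pulling those objects out of the page with request and cheerio. Exactly how the page embeds them (which script tag, which variable) isn't specified here, so the regex-based extraction below is an assumption to verify against the live markup:

const request = require('request');
const cheerio = require('cheerio');

function scrapeEpisodePage(pageUrl, cb) {
  request(pageUrl, (err, res, html) => {
    if (err) return cb(err);
    const $ = cheerio.load(html);
    const videos = [];
    $('script').each((i, el) => {
      const text = $(el).html() || '';
      // Hypothetical extraction: take each {...} literal that mentions "releaseUrl".
      const matches = text.match(/\{[^{}]*"releaseUrl"[^{}]*\}/g) || [];
      matches.forEach((m) => {
        try { videos.push(JSON.parse(m)); } catch (e) { /* not strict JSON; tighten the regex */ }
      });
    });
    cb(null, videos);
  });
}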
The SMIL file at the releaseUrl will contain a number of MP4 URLs at different bitrates.  We will use the highest-bitrate MP4 as the video URL for insertion into the DB.
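A sketch of that selection, assuming the SMIL lists <video> elements carrying src and system-bitrate attributes (typical of theplatform SMIL output, but verify against a real releaseUrl response):

const cheerio = require('cheerio');

function highestBitrateMp4(smilXml) {
  const $ = cheerio.load(smilXml, { xmlMode: true });
  let best = null;
  $('video').each((i, el) => {
    const src = $(el).attr('src');
    const bitrate = parseInt($(el).attr('system-bitrate'), 10) || 0;
    // Keep the MP4 rendition with the largest advertised bitrate.
    if (src && /\.mp4/i.test(src) && (!best || bitrate > best.bitrate)) {
      best = { src: src, bitrate: bitrate };
    }
  });
  return best && best.src; // falsy if no MP4 renditions were found
}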

When inserting the thumbnail URL, please make sure it's an absolute URL, not a relative one.
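Node's core url module handles this; the base below assumes HGTV thumbnails are relative to www.hgtv.com:

const url = require('url');

function absoluteThumbnail(thumbnailUrl) {
  // Already-absolute URLs pass through url.resolve unchanged.
  return url.resolve('http://www.hgtv.com/', thumbnailUrl);
}
// absoluteThumbnail('/content/dam/images/hgtv/video/0/02/023/0230/0230157.jpg')
//   -> 'http://www.hgtv.com/content/dam/images/hgtv/video/0/02/023/0230/0230157.jpg'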

In addition to parsing that page, we want to go one branch deeper and grab playlists like this (please reuse the same scraping code recursively rather than writing separate code, unless that's absolutely necessary):
<h4 class="m-MediaBlock__a-Headline">
<a href="http://www.hgtv.com/shows/fixer-upper/fixer-upper-full-episodes-season-2-videos">
<span class="m-MediaBlock__a-HeadlineText">Fixer Upper Full Episodes - Season 2</span>
<span class="m-MediaBlock__a-AssetInfo">13 Videos</span>
</a>
</h4>
Each playlist's href URL leads to a page with additional videos that can be parsed out and inserted as well.
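A sketch of that one-branch-deep crawl, reusing scrapeEpisodePage from above; the depth argument keeps the recursion to a single extra level:

function scrapeWithPlaylists(pageUrl, depth, cb) {
  scrapeEpisodePage(pageUrl, (err, videos) => {
    if (err) return cb(err);
    if (depth <= 0) return cb(null, videos);
    request(pageUrl, (err2, res, html) => {
      if (err2) return cb(err2);
      const $ = cheerio.load(html);
      const links = [];
      $('h4.m-MediaBlock__a-Headline a').each((i, el) => {
        links.push($(el).attr('href'));
      });
      let pending = links.length;
      if (!pending) return cb(null, videos);
      // Collect each playlist page's videos into the same result set.
      links.forEach((href) => {
        scrapeWithPlaylists(href, depth - 1, (err3, more) => {
          if (!err3 && more) videos.push.apply(videos, more);
          if (--pending === 0) cb(null, videos);
        });
      });
    });
  });
}

A real implementation would fetch each page once and share the parsed HTML between the two passes; the double request here just keeps the sketch short.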

Travel Channel:

The Travel Channel parser will be configured with this URL:

http://watch.travelchannel.com

We will parse the main page and keep requesting the "Load More Videos" content, which is loaded asynchronously, until no more videos come back or the scraper limit is reached.

Optionally, you can parse JSON feeds like the following to help pull additional videos:

http://feed.theplatform.com/f/ZwAKHC/jZ4qWhkK6PXd?form=json&fields=content,:sNIGUID,restrictionId,:seriesId,:episodeNumber,:nonLinearId,:showAbbr,:c3AirDate,description,title,:episodeType,:showName,thumbnails,author&fileFields=duration,url,width,height&byCustomValue={videoType}{episode|special},{episodeType}{C3|D4}&count=true&sort=:c3AirDate|desc&range=0-20

But I'm not sure that's going to be overly helpful.
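If you do try the feed, paging amounts to walking the range query parameter. The sketch below assumes the usual theplatform feed shape (an entries array, plus a totalResults count because of count=true in the URL); treat those field names as assumptions until you've checked a real response:

const request = require('request');

function fetchFeedPage(baseFeedUrl, start, pageSize, cb) {
  const pagedUrl = baseFeedUrl.replace(/range=\d+-\d+/, 'range=' + start + '-' + (start + pageSize - 1));
  request({ url: pagedUrl, json: true }, (err, res, body) => {
    if (err) return cb(err);
    cb(null, (body && body.entries) || [], (body && body.totalResults) || 0);
  });
}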

For the Travel Channel, we only want to grab free videos - nothing with the "lock" symbol on it.
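A hypothetical filter for that check; the ".lock" class name is purely an assumption, so inspect the live markup for the real marker on gated thumbnails:

function freeVideosOnly($, items) {
  // items is an array of cheerio elements for the video tiles.
  // Keep only tiles that carry no lock marker (class name is a guess).
  return items.filter((el) => $(el).find('.lock').length === 0);
}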

The individual links will go to a page like this:

http://watch.travelchannel.com/player.TBTC.html#0248073

On that page we can grab the SMIL file and pull out the highest bitrate MP4 file for playback in our app.
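Assuming you have located the SMIL URL for a given video (how the player page exposes it isn't specified here, so that lookup is left to inspection), the selection logic can be shared with the HGTV scraper:

function travelChannelVideoUrl(smilUrl, cb) {
  request(smilUrl, (err, res, smilXml) => {
    if (err) return cb(err);
    // highestBitrateMp4 is the helper sketched in the HGTV section.
    cb(null, highestBitrateMp4(smilXml));
  });
}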

Integration

These additional scrapers must integrate back into the app the same way the other scrapers do.  They should be configured using the admin pages for adding and editing scrapers, and they should work with the src/feedscript.js --scraperName=... flow that the other scrapers use.  Basically, all the admin should have to do is add the scraper in the admin panel and run it; the admin shouldn't have to know exactly what the scraper is doing or configure each one with all sorts of custom information.
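For example, once a scraper has been added in the admin panel under a name like "hgtv" (the name here is just illustrative), running it should be as simple as:

node src/feedscript.js --scraperName=hgtv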

In addition, make sure your scrapers work with the new functionality:

* Configurable category and sub-category
* Scraper limits

Don't pull all videos and *then* limit the number of videos added.  Only request and parse the number of videos that match the scraper limit.
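A sketch of honoring the limit at request time, reusing the fetchFeedPage pager from the Travel Channel section: keep requesting the next page only while videos are still needed, instead of collecting everything and slicing afterwards.

function fetchUpToLimit(baseFeedUrl, limit, cb) {
  const pageSize = 20; // matches the range=0-20 window in the sample feed URL
  const collected = [];
  (function next(start) {
    if (collected.length >= limit) return cb(null, collected.slice(0, limit));
    fetchFeedPage(baseFeedUrl, start, Math.min(pageSize, limit - collected.length), (err, entries) => {
      if (err) return cb(err);
      if (!entries.length) return cb(null, collected); // feed ran out before the limit
      collected.push.apply(collected, entries);
      next(start + entries.length);
    });
  })(0);
}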

README

Make sure the README is updated with verification steps for the new scrapers and with the configuration information needed to add them easily.

Unit tests

As with the other scrapers, unit tests are required for these new scrapers.

Heroku deploy

Make sure the Heroku deployment information is up to date, and keep package.json current as well.  Don't expect deployment to be anything other than "npm install" / "npm start" locally and "git push heroku master" for Heroku.

Submission format

Your submission should be provided as a Git patch file against the commit hash mentioned in the forum.  MAKE SURE TO TEST YOUR PATCH FILE!

Final Submission Guidelines

Please see above

REVIEW STYLE:

Final Review: Community Review Board
Approval: User Sign-Off

ID: 30054625