Challenge Overview
A previous challenge implemented a set of REST APIs for storing and managing video assets (create, retrieve, update, delete). This challenge adds new live TV scrapers and updates the YouTube scraper so that it parses cleaner descriptions.
Existing API
The existing Node application and deployment details are in GitLab, and the URL to the repository can be found in the forum.
YouTube scraper
The current YouTube scraper pulls in jumbled descriptions because of the many URLs that users put in the text. Here is an example of a parsed description from SkyNews:
Am I Zlatan? People have been taking photos with this man outside Manchester United's football ground, but he's got a surprise for them.SUBSCRIBE to our YouTube channel for more videos: http://www.youtube.com/skynewsFollow us on Twitter: https://twitter.com/skynews and https://twitter.com/skynewsbreakLike us on Facebook: https://www.facebook.com/skynewsFor more content go to http://news.sky.com and download our apps:iPad https://itunes.apple.com/gb/app/Sky-News-for-iPad/id422583124iPhone https://itunes.apple.com/gb/app/sky-news/id316391924?mt=8Android https://play.google.com/store/apps/details?id=com.bskyb.skynews.android&hl=en_GB
The actual description is "Am I Zlatan? People have been taking photos with this man outside Manchester United's football ground, but he's got a surprise for them", and everything after that is just junk that clutters up the final UI.
We need to update the YouTube scraper to be smarter about how it pulls in descriptions, removing the "SUBSCRIBE" details and all URL links from the text. The solution should be reasonably portable: it should work not only for SkyNews but for other channels such as Eater as well.
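One possible way to approach the cleanup described above is sketched below. It is only an illustration, not the existing scraper's code: the function name cleanDescription and the list of marker phrases are assumptions made for this example, and the markers would likely need tuning for each channel. The sketch cuts the text at the earliest boilerplate phrase, then strips any URLs that remain.

```javascript
'use strict';

// Hypothetical marker phrases that often start channel boilerplate.
// This list is an assumption for illustration; extend it per channel.
const BOILERPLATE_MARKERS = [
  /subscribe to\b/i,
  /follow us on\b/i,
  /like us on\b/i,
  /for more content\b/i,
];

// Matches an http(s) URL up to the next whitespace character.
const URL_PATTERN = /https?:\/\/\S+/gi;

// Returns the description with trailing boilerplate and URLs removed.
function cleanDescription(raw) {
  if (typeof raw !== 'string') {
    return '';
  }

  // Find the earliest position at which any marker phrase begins.
  let cut = raw.length;
  for (const marker of BOILERPLATE_MARKERS) {
    const index = raw.search(marker);
    if (index !== -1 && index < cut) {
      cut = index;
    }
  }

  // Keep the text before the marker, drop leftover URLs,
  // and collapse the whitespace the removals leave behind.
  return raw
    .slice(0, cut)
    .replace(URL_PATTERN, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

module.exports = { cleanDescription };
```

Applied to the SkyNews example above, this sketch keeps only the text up to and including "he's got a surprise for them." A marker list is easy to extend, but it can also truncate a genuine description that happens to contain one of the phrases, so the unit tests should cover several channels.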
Live TV scrapers
This challenge will add new Live TV scrapers for:
ABC News:
* Web URL: http://abcnews.com/
* Video URL: http://abclive.abcnews.com/i/abc_live4@136330/master.m3u8?b=500,300,700,900,1200
CBS News:
Note: this is an update to the existing live TV scraper.
* Web URL: http://cbsnews.com
* Video URL: http://cbsnews-linear.mdialog.com/video_assets/cbsnews.m3u8?api_key=563b80c1ae4ce359830f572d2496a947&iu=/8264/vaw-can/mobile_web/cbsnews_mobile
NASA TV:
* Web URL: http://www.nasa.gov/multimedia/nasatv/
* Video URL: http://nasatv-lh.akamaihd.net/i/NASA_101@319270/master.m3u8
Reuters:
* Web URL: http://www.reuters.tv/live
* Video URL: http://37.58.85.156/rlo001/ngrp:rlo001.stream_all/playlist.m3u8
Note that the video URLs are just samples and could be different in the actual implementation. The goal is to extract the M3U8 URLs for playback.
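As a rough illustration of pulling M3U8 URLs out of a page, the sketch below scans a string of already-fetched page source for anything that looks like an M3U8 link. The function name extractM3u8Urls is hypothetical, and the approach assumes the stream URL appears literally in the HTML or in embedded JSON. Some sites only expose it through a separate API call or script, in which case a different strategy would be needed.

```javascript
'use strict';

// Matches an http(s) URL ending in .m3u8, plus an optional query string.
// Quotes, angle brackets, and whitespace terminate the match.
const M3U8_PATTERN = /https?:\/\/[^\s"'<>]+\.m3u8(?:\?[^\s"'<>]*)?/gi;

// Returns the unique M3U8 URLs found in the page source, in order of appearance.
function extractM3u8Urls(pageSource) {
  if (typeof pageSource !== 'string') {
    return [];
  }

  // Embedded JSON often escapes slashes as "\/"; undo that before matching.
  const normalized = pageSource.replace(/\\\//g, '/');

  const matches = normalized.match(M3U8_PATTERN) || [];

  // A Set removes duplicates while preserving insertion order.
  return Array.from(new Set(matches));
}

module.exports = { extractM3u8Urls };
```

The page source itself would come from whatever HTTP client the existing scrapers already use. Keeping extraction separate from fetching makes the function easy to unit test against saved page fixtures.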
README
Make sure the README is updated with verification steps for the new features and with the configuration details needed to add them easily.
Unit tests
As with the other scrapers, unit tests are required for these new changes.
Heroku deploy
Make sure the Heroku deployment information and the package.json are both kept up to date. Deployment must not require anything beyond "npm install" / "npm start" locally and "git push heroku master" for Heroku.
Submission format
Your submission should be provided as a Git patch file against the commit hash mentioned in the forum. MAKE SURE TO TEST YOUR PATCH FILE!