
Challenge Overview

 

Welcome to the Infant nutrition product info scraper challenge. The goal is to create a CLI scraper tool (NodeJS) that scrapes product search results from retail sites and saves them to a database (Mongo).

Project Overview

In this project we will be:

  • Scraping retail sites for product info, ratings, reviews, nutrients and ingredients data

  • Identifying competing products across brands based on ingredients and nutrients data

  • Analyzing user reviews to identify topics, positives, and negatives for each product group and brand

  • Looking for identified items in social media posts to estimate how popular/important each one of them is

  • Providing reports that allow for drill-down per topic, brand, product group or individual product level

Technology Stack

  • NodeJS

  • Mongo

  • Amazon, Walmart, FirstCry

 

Assets

We’re starting a new codebase. It’s up to you to create the base code for the tool.

 

Individual requirements

Create a CLI tool that scrapes product info from keyword search results. The list of search keywords should be configurable; use “infant nutrition” and “baby food” for verification.

Sites to scrape are Amazon, Walmart, and FirstCry. For now, we only want to scrape English versions of these sites, so create the scrapers as templates (i.e. the Amazon scraper should be able to scrape data from amazon.com, amazon.co.uk, etc.). The configuration should specify which sites to scrape and which keywords to use for each one.
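
One possible shape for such a configuration is sketched below. The field names (`scraper`, `baseUrl`, `keywords`), the `SEARCH_PATHS` table, the `buildSearchJobs` helper, and the search URL patterns are all illustrative assumptions, not part of the challenge; the URL patterns in particular should be verified against each live site. A FirstCry entry would be added the same way once its search URL is confirmed.

```javascript
// Hypothetical configuration: one entry per site instance to scrape.
// `scraper` names the template; `baseUrl` selects the regional site.
const config = {
  sites: [
    { scraper: 'amazon', baseUrl: 'https://www.amazon.com', keywords: ['infant nutrition', 'baby food'] },
    { scraper: 'amazon', baseUrl: 'https://www.amazon.co.uk', keywords: ['baby food'] },
    { scraper: 'walmart', baseUrl: 'https://www.walmart.com', keywords: ['infant nutrition', 'baby food'] },
  ],
};

// Assumed search URL patterns per template; verify against the live sites.
const SEARCH_PATHS = {
  amazon: (keyword) => `/s?k=${encodeURIComponent(keyword)}`,
  walmart: (keyword) => `/search?q=${encodeURIComponent(keyword)}`,
};

// Expand the configuration into one search job per (site, keyword) pair.
function buildSearchJobs(cfg) {
  const jobs = [];
  for (const site of cfg.sites) {
    const toPath = SEARCH_PATHS[site.scraper];
    if (!toPath) {
      throw new Error(`No scraper template registered for "${site.scraper}"`);
    }
    for (const keyword of site.keywords) {
      jobs.push({ scraper: site.scraper, keyword, url: site.baseUrl + toPath(keyword) });
    }
  }
  return jobs;
}

const jobs = buildSearchJobs(config);
console.log(jobs.map((job) => job.url));
```

Because the template is chosen by `scraper` and the regional site by `baseUrl`, adding another English-language Amazon site under this sketch only needs a new configuration entry, not new code.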

The tool should detect duplicate products (in case a product shows up in results for multiple keywords) and save only one copy of its data to the database.
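
A minimal sketch of the duplicate detection, assuming each scraped product carries a `site` name and a site-specific `id` that together identify it. The helper names (`productKey`, `dedupeProducts`) and the extra `keywords` field are hypothetical.

```javascript
// Key a product by site and site-specific ID (assumed unique per site).
function productKey(product) {
  return `${product.site}:${product.id}`;
}

// Collapse repeated products into one record each, remembering every
// keyword whose search results contained the product.
function dedupeProducts(results) {
  const byKey = new Map();
  for (const { keyword, product } of results) {
    const key = productKey(product);
    const existing = byKey.get(key);
    if (existing) {
      if (!existing.keywords.includes(keyword)) {
        existing.keywords.push(keyword);
      }
    } else {
      byKey.set(key, { ...product, keywords: [keyword] });
    }
  }
  return [...byKey.values()];
}

// Example: the same product appears under both verification keywords.
const unique = dedupeProducts([
  { keyword: 'infant nutrition', product: { site: 'amazon.com', id: 'A1', name: 'Formula' } },
  { keyword: 'baby food', product: { site: 'amazon.com', id: 'A1', name: 'Formula' } },
  { keyword: 'baby food', product: { site: 'amazon.com', id: 'B2', name: 'Puree' } },
]);
console.log(unique.length); // 2
```

This only removes duplicates within a single run. To avoid duplicates across runs, one option is a unique index on the site and product ID fields in MongoDB combined with upsert writes.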

The following details should be scraped for each product:

  • ID

  • Name

  • Description

  • Price

  • Rating info

  • User reviews

  • Product images with their URLs

Note that product images should be saved to the database, not just the image URL. It is up to you to create a database collection structure for saving the product details.
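
As one illustration of a collection structure, the sketch below builds a single document per product with the image bytes embedded next to their source URLs. Every field name here is an assumption, as are the shapes of the `scraped` and `downloadedImages` inputs; the challenge leaves the structure to the implementer.

```javascript
// Build the document to store for one product. All field names are illustrative.
function buildProductDocument(scraped, downloadedImages) {
  return {
    site: scraped.site,
    productId: scraped.id,
    name: scraped.name,
    description: scraped.description,
    price: { amount: scraped.price, currency: scraped.currency },
    rating: { average: scraped.ratingAverage, count: scraped.ratingCount },
    reviews: scraped.reviews.map((r) => ({ author: r.author, rating: r.rating, text: r.text })),
    images: downloadedImages.map((img) => ({
      url: img.url,                 // where the image was downloaded from
      contentType: img.contentType, // e.g. "image/jpeg"
      data: img.data,               // Buffer holding the raw image bytes
    })),
    scrapedAt: new Date(),
  };
}

// Example with made-up values.
const doc = buildProductDocument(
  {
    site: 'amazon.com', id: 'A1', name: 'Formula', description: 'Stage 1 formula',
    price: 19.99, currency: 'USD', ratingAverage: 4.5, ratingCount: 120,
    reviews: [{ author: 'sam', rating: 5, text: 'Great' }],
  },
  [{ url: 'https://example.com/a1.jpg', contentType: 'image/jpeg', data: Buffer.from([0xff, 0xd8]) }],
);
console.log(doc.images[0].data.length); // 2
```

Embedding images keeps each product self-contained, but MongoDB limits a single document to 16 MB, so products with many or large images may be better served by a separate images collection or GridFS.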

Log any errors to standard output.
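
A small sketch of such logging follows. The `formatError` and `logError` helpers and the log line format are hypothetical. Note that Node's `console.error` writes to standard error, so the sketch writes to `process.stdout` explicitly.

```javascript
// Format an error as a single log line with a timestamp and a context label.
function formatError(context, err, now = new Date()) {
  const message = err instanceof Error ? err.message : String(err);
  return `${now.toISOString()} ERROR [${context}] ${message}`;
}

// Write the formatted error to standard output, as the requirement asks.
function logError(context, err) {
  process.stdout.write(formatError(context, err) + '\n');
}

// Example: report a failed request without stopping the remaining jobs.
try {
  throw new Error('request timed out');
} catch (err) {
  logError('amazon.com', err);
}
```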

Create a Dockerfile for the app and a docker-compose script that runs the app and starts a MongoDB instance.

What to submit

  • Submit the full source code for the tool and a README with configuration, deployment and verification steps



 

Final Submission Guidelines

See above

ELIGIBLE EVENTS:

2020 Topcoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30116579