Challenge Overview
Welcome to the Infant Nutrition Product Info Scraper challenge. The goal is to create a Node.js CLI scraper that collects product search results from retail sites and saves them to a MongoDB database.
Project Overview
In this project we will be:
- Scraping retail sites for product info, ratings, reviews, nutrients and ingredients data
- Identifying competing products across brands based on ingredients and nutrients data
- Analyzing user reviews to identify topics, positives, and negatives for each product group and brand
- Looking for identified items in social media posts to estimate how popular/important each one of them is
- Providing reports that allow for drill-down per topic, brand, product group or individual product level
Technology Stack
- Node.js
- MongoDB
- Target sites: Amazon, Walmart, FirstCry
Assets
We’re starting a new codebase. It’s up to you to create the base code for the tool.
Individual requirements
Create a CLI tool that scrapes product info from keyword search results. The list of search keywords should be configurable; use “infant nutrition” and “baby food” for verification.
The sites to scrape are Amazon, Walmart, and FirstCry. For now, we only want to scrape the English versions of these sites, so create the scrapers as templates (i.e., the Amazon scraper should be able to scrape data from amazon.com, amazon.co.uk, etc.). The configuration should specify which sites to scrape and which keywords to use for each one.
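One possible shape for such a configuration is sketched below. The field names (`scraper`, `baseUrl`, `keywords`) and the helper `buildSearchJobs` are illustrative assumptions, not part of the requirements; any structure that maps each site to its keywords would do. Each entry picks a scraper template and points it at one site domain, so the same Amazon template can serve several domains.

```javascript
// Hypothetical configuration: one entry per site instance, each with its own
// keyword list. All field names here are assumptions made for illustration.
const config = {
  sites: [
    { scraper: "amazon", baseUrl: "https://www.amazon.com", keywords: ["infant nutrition", "baby food"] },
    { scraper: "amazon", baseUrl: "https://www.amazon.co.uk", keywords: ["baby food"] },
    { scraper: "walmart", baseUrl: "https://www.walmart.com", keywords: ["infant nutrition", "baby food"] },
    { scraper: "firstcry", baseUrl: "https://www.firstcry.com", keywords: ["baby food"] },
  ],
};

// Expand the configuration into a flat list of (site, keyword) search jobs
// that the CLI could then hand to the matching scraper template.
function buildSearchJobs(cfg) {
  return cfg.sites.flatMap((site) =>
    site.keywords.map((keyword) => ({
      scraper: site.scraper,
      baseUrl: site.baseUrl,
      keyword,
    }))
  );
}

const jobs = buildSearchJobs(config);
console.log(`Prepared ${jobs.length} search jobs`);
```

Keeping the domain in the configuration rather than in the scraper code is what lets a new regional site be added without touching the template.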
The tool should detect duplicate products (for example, when a product shows up in the results for multiple keywords) and save only one copy to the database.
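A minimal in-memory sketch of this deduplication follows. It assumes that a product ID is unique within one site but not across sites, so the key combines both; the function names and the `keywords` field are hypothetical.

```javascript
// Build a deduplication key. Assumption: IDs are unique per site only,
// so the site name is part of the key.
function productKey(product) {
  return `${product.site}:${product.id}`;
}

// Keep the first copy of each product and merge the keywords that found it.
function dedupeProducts(products) {
  const byKey = new Map();
  for (const product of products) {
    const key = productKey(product);
    const existing = byKey.get(key);
    if (existing) {
      existing.keywords = [...new Set([...existing.keywords, ...product.keywords])];
    } else {
      byKey.set(key, { ...product, keywords: [...product.keywords] });
    }
  }
  return [...byKey.values()];
}

const unique = dedupeProducts([
  { site: "amazon.com", id: "A1", name: "Sample cereal", keywords: ["baby food"] },
  { site: "amazon.com", id: "A1", name: "Sample cereal", keywords: ["infant nutrition"] },
  { site: "walmart.com", id: "A1", name: "Other product", keywords: ["baby food"] },
]);
console.log(unique.map((p) => productKey(p)));
```

The same key could also back a unique index in MongoDB combined with upserts, which would enforce the rule across separate runs rather than only within one.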
The following details should be scraped for each product:
- ID
- Name
- Description
- Price
- Rating info
- User reviews
- Product images with their URLs
Note that product images should be saved to the database, not just the image URL. It is up to you to create a database collection structure for saving the product details.
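As a starting point, one document per product could look like the sketch below. Every field name is an assumption, since the collection structure is left open. Image bytes are held as a Node.js `Buffer`, which the MongoDB Node.js driver stores as BSON binary data; a single MongoDB document is limited to 16 MB, so large image sets may be better served by a separate collection or GridFS.

```javascript
// Turn a scraped product plus its downloaded images into one document.
// The shape is hypothetical; only the listed details come from the brief.
function toProductDocument(scraped, images) {
  return {
    _id: `${scraped.site}:${scraped.id}`, // doubles as the deduplication key
    site: scraped.site,
    productId: scraped.id,
    name: scraped.name,
    description: scraped.description,
    price: { amount: scraped.price.amount, currency: scraped.price.currency },
    rating: { average: scraped.rating.average, count: scraped.rating.count },
    reviews: scraped.reviews.map((r) => ({ author: r.author, rating: r.rating, text: r.text })),
    images: images.map((img) => ({
      url: img.url,
      contentType: img.contentType,
      data: Buffer.from(img.bytes), // the image itself, not just its URL
    })),
    scrapedAt: new Date(),
  };
}

// Placeholder values only; a real run would fill these from the scraper.
const doc = toProductDocument(
  {
    site: "amazon.com",
    id: "A1",
    name: "Sample cereal",
    description: "Placeholder description",
    price: { amount: 9.99, currency: "USD" },
    rating: { average: 4.5, count: 120 },
    reviews: [{ author: "sample-user", rating: 5, text: "Placeholder review" }],
  },
  [{ url: "https://example.com/a1.jpg", contentType: "image/jpeg", bytes: [0xff, 0xd8, 0xff] }]
);
console.log(doc._id, doc.images[0].data.length);
```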
Log any errors to standard output.
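A small sketch of such logging is shown below; the helper names and the JSON line format are assumptions. Note that in Node.js `console.log` writes to standard output, while `console.error` writes to standard error, so the former is used here to match the requirement.

```javascript
// Format an error as a single JSON line (format chosen for illustration).
function formatError(context, err) {
  return JSON.stringify({
    level: "error",
    time: new Date().toISOString(),
    context,
    message: err instanceof Error ? err.message : String(err),
  });
}

// console.log targets stdout; console.error would go to stderr instead.
function logError(context, err) {
  console.log(formatError(context, err));
}

// Wrap one scrape job so that a failure is logged and the run continues.
async function runSafely(context, job) {
  try {
    return await job();
  } catch (err) {
    logError(context, err);
    return null;
  }
}

// Usage: a failing job resolves to null after its error has been logged.
runSafely("amazon.com / baby food", async () => {
  throw new Error("simulated failure");
}).then((result) => console.log("job result:", result));
```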
Create a Dockerfile for the app and a docker-compose file that runs the app and starts a MongoDB instance.
What to submit
- Submit the full source code for the tool and a README with configuration, deployment, and verification steps