Challenge Overview
Welcome to the Infant Nutrition backend API challenge. In this challenge, we aim to create a backend API for the infant nutrition dashboard tool.
Project Overview
In this project we will be:
- Scraping retail sites for product info, ratings, reviews, nutrients and ingredients data
- Identifying competing products across brands based on ingredients and nutrients data
- Analyzing user reviews to identify topics, positives, and negatives for each product group and brand
- Looking for identified items in social media posts to estimate how popular/important each one of them is
- Providing reports that allow for drill-down per topic, brand, product group or individual product level
Technology Stack
- NodeJS
- Mongo
Assets
We’re starting a new codebase. It’s up to you to create the base code for the tool.
A backup of the products database is available in the forums.
Individual requirements
So far we have built a scraper tool that writes product data to a Mongo collection, and a data extraction tool that adds further product attributes (review sentiments, review topics, etc.). In short, we have a collection of products saved in the Mongo database. Now we would like to modify the data structure in the database so that we can capture historical data (e.g. the search rank each time the scraper is run, product price changes, etc.) and build a simple read API that exposes the data along with a few filtering and grouping options.
The current product document structure is:
- id
- sku
- upc
- gtin
- source
- name
- description
- descriptionDetail
- reviews: rating, title, date, textContent, sentiment (positive, negative, neutral, compound)
- Price
- Ranking: keyword:rank
- rating: overall, total, fiveStars, fourStars, threeStars, twoStars, oneStars
- lastUpdated
- productUrl
- ingredients: name, amount, unit, referenceValue
- nutrients: name, amount, unit, referenceValue
- sentiment: positive, negative, neutral, compound
- topics: positive, negative
You need to update the document structure to enable persisting historical values for these attributes:
- Price
- Ranking
- Rating
Note: reviews already have a date field that can be used for filtering by time interval.
It is up to you whether to store the historical values as new attributes in the product document or in separate collections. Each option trades ease of implementation against query performance.
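As an illustration only (not a required design), the two storage options could be sketched as follows. The helper names, the `productHistory` collection and the `history` attribute are hypothetical, and the lower-case `price` and `ranking` field names are an assumption about how the scraper stores them.

```javascript
// Hypothetical sketch of the two storage options for historical values.
// Assumes the product document exposes `id`, `price`, `ranking` and `rating`.

// Option A: one snapshot document per scraper run, stored in a separate
// (hypothetical) "productHistory" collection.
function toHistorySnapshot(product, capturedAt) {
  return {
    productId: product.id,
    capturedAt,
    price: product.price,
    ranking: { ...product.ranking }, // keyword -> rank, copied so later edits do not leak in
    rating: { ...product.rating },
  };
}

// Option B: embed the snapshots in the product document under a new
// (hypothetical) "history" array. Returns a new object; the input is not mutated.
function withEmbeddedHistory(product, capturedAt) {
  const { productId, ...entry } = toHistorySnapshot(product, capturedAt);
  return { ...product, history: [...(product.history ?? []), entry] };
}
```

Option A keeps product documents small and makes time-range queries across products straightforward to index. Option B needs no extra collection or join, but each product document grows with every scraper run.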
The following API endpoints need to be implemented:
- /docs - serves the Swagger UI for the API
- Product search (/search)
  Search by product name. Returns only product id, brand, name, detail. Supports pagination and filtering by brand.
- Product details (/products/:id)
  Returns the latest product details (name, description, descriptionDetail, url, ingredients, nutrients, sentiment, topics, images) and historical data for price, ranking and ratings. Images should contain only an id, not the complete image data.
- Product images (/products/:id/images/:imageId)
  Returns the product image.
- Product reviews (/products/:id/reviews)
  Returns the product reviews.
- Brand nutrients (/brands/nutrients)
  Aggregates the nutrients data per brand and returns an array of nutrients for each brand, with nutrient value = percentage of products that contain that nutrient (product.nutrients.amount>0).
- Brand ingredients (/brands/ingredients)
  Aggregates the ingredients data per brand and returns an array of ingredients for each brand, with ingredient value = percentage of products that contain that ingredient (product.ingredients.amount>0).
- Brand rating statistics (/brands/ratings?startDate&endDate)
  Aggregates historical brand ratings data and returns the number of new ratings for each brand in the provided time interval.
- Brand review statistics (/brands/reviews?startDate&endDate)
  Aggregates historical brand review sentiment data and returns, for each brand in the provided time interval, the number of new reviews, the number of reviews with positive, negative and neutral sentiment, and the average review sentiment.
- Products with largest price changes (/stats/priceChanges?startDate&endDate)
  Returns the products with the largest price change (in percent) in the provided time interval. Returns the top 10 results.
- Products with largest rating changes (/stats/ratingChanges?startDate&endDate)
  Returns the products with the largest overall rating change in the provided time interval. Returns the top 10 results.
- Products with largest sentiment changes (/stats/sentimentChanges?startDate&endDate)
  Returns the products with the largest sentiment change in the provided time interval. Returns the top 10 results.
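To pin down the expected semantics of two of these aggregations, here is a minimal sketch in plain JavaScript over in-memory documents. A real implementation would more likely push this work into MongoDB aggregation pipelines. Several things here are assumptions rather than part of the spec: the function names, a `brand` field on each product (implied by the search endpoint but not listed in the document structure), snapshot records shaped like `{ productId, capturedAt, price }`, and ranking the "largest" change by absolute value between the first and last snapshot in the interval.

```javascript
// Percentage of each brand's products that contain a nutrient (amount > 0).
// Assumes every product has a `brand` field and a `nutrients` array.
function brandNutrientShare(products) {
  const byBrand = new Map();
  for (const product of products) {
    const entry = byBrand.get(product.brand) ?? { total: 0, counts: new Map() };
    entry.total += 1;
    // Count each nutrient at most once per product.
    const present = new Set(
      (product.nutrients ?? []).filter((n) => n.amount > 0).map((n) => n.name)
    );
    for (const name of present) {
      entry.counts.set(name, (entry.counts.get(name) ?? 0) + 1);
    }
    byBrand.set(product.brand, entry);
  }
  return [...byBrand].map(([brand, { total, counts }]) => ({
    brand,
    nutrients: [...counts].map(([name, count]) => ({
      name,
      value: (count / total) * 100,
    })),
  }));
}

// Top `limit` products by absolute percentage price change between the first
// and last snapshot inside [startDate, endDate]. `capturedAt` is a Date.
function topPriceChanges(snapshots, startDate, endDate, limit = 10) {
  const byProduct = new Map();
  for (const s of snapshots) {
    if (s.capturedAt < startDate || s.capturedAt > endDate) continue;
    const list = byProduct.get(s.productId) ?? [];
    list.push(s);
    byProduct.set(s.productId, list);
  }
  const changes = [];
  for (const [productId, list] of byProduct) {
    list.sort((a, b) => a.capturedAt - b.capturedAt);
    const first = list[0].price;
    const last = list[list.length - 1].price;
    // Skip products without two data points or with a zero/missing base price.
    if (list.length < 2 || !first) continue;
    changes.push({ productId, changePercent: ((last - first) / first) * 100 });
  }
  return changes
    .sort((a, b) => Math.abs(b.changePercent) - Math.abs(a.changePercent))
    .slice(0, limit);
}
```

The brand ingredients endpoint follows the same pattern with `ingredients` in place of `nutrients`, and the rating and sentiment change endpoints follow the price change pattern with a different measured value.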
Create a Dockerfile for the app and a docker-compose file that runs the app and starts a MongoDB instance. Also create a script that generates some demo historical data for verification.
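The demo data script could start from something like the sketch below, which derives one snapshot per day from an existing product. Everything about it is illustrative: the function name, the snapshot shape, the lower-case `price` and `ranking` field names, and the drift rules are assumptions, and writing the result to Mongo is left out. The `random` parameter is injectable so that a verification run can be made reproducible.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Generates `days` daily snapshots ending at `endDate` for one product.
// Assumes the product has `id`, `price`, `ranking` (keyword -> rank) and `rating`.
function generateDemoHistory(product, days, endDate = new Date(), random = Math.random) {
  const snapshots = [];
  let price = product.price;
  let total = product.rating.total;
  for (let i = days - 1; i >= 0; i--) {
    // Price drifts by up to 5% up or down per day; the rating count only grows.
    price = Math.max(0.01, price * (1 + (random() - 0.5) * 0.1));
    total += Math.floor(random() * 5);
    // Each keyword rank moves by -1, 0 or +1 around its current value, never below 1.
    const ranking = Object.fromEntries(
      Object.entries(product.ranking ?? {}).map(([keyword, rank]) => [
        keyword,
        Math.max(1, rank + Math.floor(random() * 3) - 1),
      ])
    );
    snapshots.push({
      productId: product.id,
      capturedAt: new Date(endDate.getTime() - i * DAY_MS),
      price: Math.round(price * 100) / 100,
      ranking,
      rating: { ...product.rating, total },
    });
  }
  return snapshots;
}
```

Snapshots come back oldest first, so they can be inserted in order or appended to an embedded history array, depending on the storage option chosen.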
What to submit
- Submit the full source code for the API and a README with configuration, deployment and verification steps
- Submit a Postman collection for verifying the API