Challenge Overview
Over the past year Topcoder has developed a Product Inventory Audit Web Application. First we developed a web crawler to pull the raw HTML from the site, and data extraction processes to parse the data and load it into the Vertica database platform. We've also developed a REST API which allows clients to access the data in JSON format over HTTP, and the first client for this service: a Product Inventory Audit Web application.
We also have a Best Buy REST API interface which we use to pull reviews, ratings and pricing from Best Buy (https://github.com/cloudspokes/hp_api_product_extractor). Since the initial creation of the Best Buy application, there have been a few significant changes:
1. Best Buy has deprecated its Review API. This is no longer publicly available without a license agreement.
2. HP has provided UPC data for a significant segment of their product line.
3. The Product Inventory Application is collecting product and product specification data in a more flexible way to support collection of product data from multiple sites.
Please make the following updates to the API Product Extraction application:
1. Remove the code making the Review API calls. This code is currently commented out and is unlikely to ever be used again. It is technical debt that should be removed from the app.
2. Rather than using the Best Buy API app as a data enhancement mechanism for products, let’s promote this app to create Product records within the Product table using the site id (2) for Best Buy. You can simply add the “manufacturer=hp” parameter to the REST API request URL. The file product field mapping.xlsx, which shows the mapping between the Product table in Vertica and the Best Buy Products API response, will be provided in the Code Document forum attached to this challenge.
3. Let’s pull pricing and rating data available using the Products API. There should still be aggregated rating data available even if individual reviews are not available.
4. You should populate the Product Specification table with the product's details collection (https://bestbuyapis.github.io/api-documentation/#detail) from the Best Buy Products API. This is visible if the “show=all” parameter is included in the Products API call.
5. You'll need to map the Categories provided by Best Buy to the Primary Product Type.
6. Product.id is a synthetic key added when a product is new. Likewise, Product.dateAdded is only updated the first time a product is inserted.
7. Product.primaryProduct will only be true if the product isn't found on one of the HP sites.
8. Product.sourceFile is the name of the JSON file containing all HP products, which is pulled every day. It should be named bestbuyproducts-mm-dd-yyyy.json.
9. Product.fullText should be the values of all fields (delimited by |) of the corresponding product JSON from the Best Buy API response.
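Items 6, 8 and 9 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the existing application's code: the function names are hypothetical, null values are assumed to become empty strings, nested objects and lists are assumed to be serialized as JSON text, and the date stored in Product.dateAdded is assumed to be the pull date.

```python
import json
from datetime import date


def source_file_name(pull_date: date) -> str:
    """Item 8: name of the daily dump, bestbuyproducts-mm-dd-yyyy.json."""
    return f"bestbuyproducts-{pull_date.strftime('%m-%d-%Y')}.json"


def full_text(product: dict) -> str:
    """Item 9: join the values of all fields of one product with '|'."""
    parts = []
    for value in product.values():
        if value is None:
            parts.append("")  # assumption: nulls become empty strings
        elif isinstance(value, (dict, list)):
            # assumption: nested collections are kept as JSON text
            parts.append(json.dumps(value, sort_keys=True))
        else:
            parts.append(str(value))
    return "|".join(parts)


def merge_product(existing, incoming: dict, new_id: int, today: date) -> dict:
    """Item 6: id and dateAdded are set only when the product is new."""
    record = dict(incoming)
    if existing is None:
        record["id"] = new_id          # synthetic key for a new product
        record["dateAdded"] = today    # set on first insert only
    else:
        record["id"] = existing["id"]
        record["dateAdded"] = existing["dateAdded"]
    return record
```

For example, `source_file_name(date(2016, 3, 7))` yields `bestbuyproducts-03-07-2016.json`. How nested fields should appear in Product.fullText is not specified above, so that choice should be confirmed in the forum.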
The mapping file (from the BestBuy API to the Product table) will be provided in the forum.
Sample API call using some of the parameters discussed above:
https://api.bestbuy.com/v1/products(manufacturer=hp)?format=json&apiKey={{your api_key}}&active=true&show=all&pageSize=100
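A minimal Python sketch of how the sample call above might be assembled. The function name and the `page` argument are assumptions added for illustration (the sample call shows only `pageSize`); the API key is a placeholder supplied by the caller.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.bestbuy.com/v1/products"


def build_products_url(api_key: str, page: int = 1, page_size: int = 100) -> str:
    """Build the Products API URL for HP products, one page at a time."""
    query = urlencode({
        "format": "json",
        "apiKey": api_key,
        "active": "true",
        "show": "all",          # includes the details collection
        "pageSize": page_size,
        "page": page,           # assumption: paging is not shown in the sample call
    })
    # The manufacturer filter goes in parentheses after the resource name.
    return f"{BASE_URL}(manufacturer=hp)?{query}"
```

Calling `build_products_url("YOUR_KEY")` produces the sample URL with `&page=1` appended.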
Final Submission Guidelines
- Upload all your source code in a zip file.
- Provide documentation for your application. It should contain complete build, deployment, and execution instructions.
- Screen sharing video is not required for this application.
- You should use the existing code found in the GitHub repositories as the starting point for this application. The details for the GitHub repositories can be found in the Code Document forums attached to this challenge.
- This application uses the Vertica database as a persistence layer. We have a docker script which configures this database for you. The details can be found in the Code Document forums attached to this challenge.