Challenge Overview
The HP Data Audit group is looking to develop a toolset to validate and analyze HP products available on their web store as well as other websites. In previous challenges we developed a web crawler and data extraction processes to populate product-related data into a Vertica database. The source of the data is the Hewlett-Packard website for the United States:
http://www.hp.com/country/us/en/hho/welcome.html
The purpose of the crawl and the data extraction processes is to gather the data elements from product pages such as this one:
http://store.hp.com/us/en/pdp/Laptops/hp-spectre-x360---13-4002dx-%28energy-star%29
In addition to the extraction processes, we're going to be developing a simple auditing web application which uses this data. In preparation for the development of the web application, we're going to be developing a REST API interface using Java's Jersey to retrieve data. Our data extraction processes use Hibernate to extract data from Vertica, and you are encouraged to use the same approach. Here are the APIs that we need to implement:
/api/v1/products
/api/v1/products/:product_number
/api/v1/products/:product_number/images
/api/v1/products/:product_number/relatedaccessories
/api/v1/desktops
/api/v1/desktops/:product_number
/api/v1/laptops
/api/v1/laptops/:product_number
/api/v1/printers
/api/v1/printers/:product_number
/api/v1/tablets
/api/v1/tablets/:product_number
The following Google Doc provides some more detail about each of the API endpoints detailed above: https://docs.google.com/spreadsheets/d/10SNWr54IANT1df9po0wfL_L68fxgduLW6IdcuiHfKys/edit?usp=sharing
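For orientation, here is a minimal sketch of how the product endpoints above could be laid out with Jersey (JAX-RS). The ProductResource class name, the ProductService data-access layer, and the Product and Image types are placeholders for illustration only; they are not part of the provided materials.

import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/api/v1/products")
@Produces(MediaType.APPLICATION_JSON)
public class ProductResource {

    // Hypothetical data-access layer backed by Hibernate against Vertica.
    private final ProductService productService = new ProductService();

    // GET /api/v1/products
    @GET
    public List<Product> listProducts() {
        return productService.findAll();
    }

    // GET /api/v1/products/:product_number
    @GET
    @Path("/{productNumber}")
    public Product getProduct(@PathParam("productNumber") String productNumber) {
        return productService.findByProductNumber(productNumber);
    }

    // GET /api/v1/products/:product_number/images
    @GET
    @Path("/{productNumber}/images")
    public List<Image> getImages(@PathParam("productNumber") String productNumber) {
        return productService.findImages(productNumber);
    }
}

The /relatedaccessories endpoint and the laptops, desktops, printers, and tablets resources would follow the same pattern, with the type-specific resources restricting on productType in the underlying query.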
Notes:
1. All our APIs implement read-only functionality with GET calls. No update or POST functionality is required for this implementation.
2. The resources returned will be JSON objects based on the schemas of the Vertica tables which the calls access. For example, here is the product table:
CREATE TABLE Product (
productNumber VARCHAR(5000) NOT NULL,
version INTEGER NOT NULL,
auditTimeStamp TIMESTAMP,
id INTEGER,
productUrl VARCHAR(5000),
sourceFile VARCHAR(5000),
productType VARCHAR(5000),
currentPrice NUMERIC(10, 2),
currency VARCHAR(3),
strikedPrice NUMERIC(10, 2),
dateAdded DATE,
previousPrice NUMERIC(10, 2),
dateOfPriceChange DATE,
rating INTEGER,
previousRating INTEGER,
numberOfReviews INTEGER,
dateOfRatingChange DATE,
parsingError VARCHAR(5000),
dateOfParsingError DATE,
comingSoonDate DATE,
availableForSaleDate DATE,
PRIMARY KEY (productNumber)
);
The corresponding product resource will look like the following:
{
"Product":
{
"productNumber": "J8U63UT#ABA",
"version": "1",
"auditTimeStamp": "09-01-2015 00:00:01",
"id": "1",
"productName": "HP EliteBook Folio 9480m Notebook PC (ENERGY STAR)",
"productUrl": "http://store.hp.com/us/en/pdp/business-solutions/hp-elitebook-folio-9480m-notebook-pc-%28energy-star%29-p-j8u63ut-aba--1",
"sourceFile": "productPage3.html",
"productType": "Laptop",
"currentPrice": "1249.00",
"currency": "USD",
"strikedPrice": "",
"dateAdded": "08-01-2015",
"previousPrice": "",
"dateOfPriceChange": "",
"rating": "5",
"previousRating": "4",
"numberOfReviews": "15",
"dateOfRatingChange": "08-31-2015",
"parsingError": "",
"dateOfParsingError": "",
"comingSoonDate": "",
"availableForSaleDate": "08-01-2015"
}
}
The schema file for the database is attached along with a sample data file. It should be noted, however, that the sample data file isn't complete. You can expect null values in a number of columns, as our data model has expanded recently and we don't have sample data to provide from our data extraction process.
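For reference, a Product entity mapped with Hibernate/JPA annotations could look roughly like the abbreviated sketch below. This is only one mapping option; the sample code provided in the forums may use XML mapping files instead, and the remaining columns from the schema above would be mapped the same way.

import java.math.BigDecimal;
import java.sql.Date;
import java.sql.Timestamp;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "Product")
public class Product {

    @Id
    @Column(name = "productNumber")
    private String productNumber;

    @Column(name = "version")
    private Integer version;

    @Column(name = "auditTimeStamp")
    private Timestamp auditTimeStamp;

    @Column(name = "productType")
    private String productType;

    @Column(name = "currentPrice")
    private BigDecimal currentPrice;

    @Column(name = "currency")
    private String currency;

    @Column(name = "rating")
    private Integer rating;

    @Column(name = "dateAdded")
    private Date dateAdded;

    // Remaining columns (productUrl, sourceFile, strikedPrice, previousPrice, etc.)
    // follow the same pattern; getters and setters are omitted for brevity.
}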
3. For our list API calls (/api/v1/products, /api/v1/printers, /api/v1/laptops, /api/v1/desktops, /api/v1/tablets), we should allow clients to pass a "fields" parameter allowing them to designate which fields they would like to include in the returned resource (e.g. GET /api/v1/products?fields=productNumber,productName,currentPrice).
4. For our list API calls we should implement paging (e.g. GET /api/v1/products?offset=10&limit=5). Default values should be limit=20 and offset=0.
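As a sketch, notes 3 and 4 could be handled together in a Jersey list method along the lines below, using @QueryParam and @DefaultValue for the parameters. The ProductService call and the way the projection and paging are ultimately applied (for example through a Hibernate Criteria query with setFirstResult/setMaxResults) are assumptions for illustration.

import java.util.Arrays;
import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Produces(MediaType.APPLICATION_JSON)
public class ProductListSketch {

    // Stands in for the /api/v1/products resource class sketched earlier.
    private final ProductService productService = new ProductService();

    // GET /api/v1/products?fields=productNumber,productName,currentPrice&offset=10&limit=5
    @GET
    public Response listProducts(@QueryParam("fields") String fields,
                                 @DefaultValue("0") @QueryParam("offset") int offset,
                                 @DefaultValue("20") @QueryParam("limit") int limit) {
        // Split the optional comma-separated field list, tolerating whitespace around commas.
        List<String> requestedFields = (fields == null || fields.trim().isEmpty())
                ? null
                : Arrays.asList(fields.trim().split("\\s*,\\s*"));

        // Hypothetical service method that applies the projection and paging.
        List<Product> products = productService.findProducts(requestedFields, offset, limit);
        return Response.ok(products).build();
    }
}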
5. Errors should be reported with HTTP status codes:
200 – OK – Everything is working
400 – Bad Request – The request was invalid or cannot be served. The exact error should be explained in the error payload, e.g. "The JSON is not valid".
404 – Not Found – There is no resource behind the URI.
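One possible way to surface these status codes from Jersey is sketched below. The shape of the error payload is an assumption; the spreadsheet may prescribe a different structure.

import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

public final class ApiErrors {

    private ApiErrors() { }

    // Thrown from a resource method when no row matches the requested product number;
    // Jersey converts it into a 404 response with the JSON error payload.
    public static WebApplicationException notFound(String productNumber) {
        return buildError(Response.Status.NOT_FOUND,
                "No product found for product number " + productNumber);
    }

    // Thrown when the request itself is invalid, e.g. a malformed parameter value.
    public static WebApplicationException badRequest(String reason) {
        return buildError(Response.Status.BAD_REQUEST, reason);
    }

    private static WebApplicationException buildError(Response.Status status, String message) {
        // A hand-built JSON body keeps the sketch short; a real implementation would
        // serialize an error object instead of concatenating strings.
        String body = "{\"error\": \"" + message + "\"}";
        Response response = Response.status(status)
                .entity(body)
                .type(MediaType.APPLICATION_JSON)
                .build();
        return new WebApplicationException(response);
    }
}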
6. For our list calls, we should offer the ability to include both image data and related accessory data as subobjects in the returned JSON, based on parameters (e.g. GET /api/v1/products?images=true&ra=true).
7. For our list calls, we should implement filtering based on the Search parameter fields provided in the Google Doc spreadsheet referenced above. Note that minCurrentPrice and maxCurrentPrice refer to a range on the currentPrice field. For some of the search parameters, such as Ratings or Product Types, multiple values may be provided. Our API should support this.
8. For our list calls, the application should support sorting. If possible, we should implement sorting on all possible fields. However, if dynamic sorting is too difficult to implement, the minimum fields which should support sorting are: productName, currentPrice, and rating. (e.g. GET /api/v1/products?sort=-productName,+currentPrice)
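The sketch below, using the laptops list as an example, shows one way the parameters from notes 6, 7, and 8 could be read in Jersey. The LaptopService call, the exact filter parameter names, and the way the sort orders are applied (for example via Hibernate Criteria Order objects) are assumptions; the spreadsheet remains the authoritative list of search parameters.

import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/api/v1/laptops")
@Produces(MediaType.APPLICATION_JSON)
public class LaptopListResource {

    private final LaptopService laptopService = new LaptopService();

    // e.g. GET /api/v1/laptops?rating=4&rating=5&minCurrentPrice=500&maxCurrentPrice=1500
    //          &images=true&ra=true&sort=-productName,+currentPrice
    @GET
    public Response listLaptops(@QueryParam("rating") List<Integer> ratings,
                                @QueryParam("minCurrentPrice") Double minCurrentPrice,
                                @QueryParam("maxCurrentPrice") Double maxCurrentPrice,
                                @DefaultValue("false") @QueryParam("images") boolean includeImages,
                                @DefaultValue("false") @QueryParam("ra") boolean includeRelatedAccessories,
                                @QueryParam("sort") String sort) {

        // Translate "-productName,+currentPrice" into (field, direction) pairs.
        // An unencoded "+" arrives as a space, so tokens are trimmed first.
        List<String[]> sortOrders = new ArrayList<String[]>();
        if (sort != null && !sort.trim().isEmpty()) {
            for (String token : sort.split(",")) {
                String trimmed = token.trim();
                boolean descending = trimmed.startsWith("-");
                String field = (trimmed.startsWith("-") || trimmed.startsWith("+"))
                        ? trimmed.substring(1) : trimmed;
                sortOrders.add(new String[] { field, descending ? "desc" : "asc" });
            }
        }

        // Hypothetical service call that applies the price range, the multi-valued
        // rating filter, the sort orders, and optionally attaches image and related
        // accessory subobjects to each returned laptop.
        List<Product> laptops = laptopService.find(ratings, minCurrentPrice, maxCurrentPrice,
                sortOrders, includeImages, includeRelatedAccessories);
        return Response.ok(laptops).build();
    }
}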
The required database for this REST service application is Vertica. You can download a community edition of Vertica directly from HP at http://www.vertica.com/; you simply sign up for a free developer account. However, a direct Vertica installation requires a Unix/Linux server. The more straightforward way to stand up Vertica is to use VMware, which also offers free trials. A server image can be found at my.vertica.com, but Topcoder is providing a recent disk image file for Vertica at the following link. This is a large download (~2 GB).
https://drive.google.com/file/d/0ByjxTGykXQjAWkkwTWUzcXJucjQ/view?usp=sharing
JDBC Jar files for Vertica can be found here:
http://www.vertica.com/resources/vertica-client-drivers/
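Once the VM and the client drivers are in place, a quick connectivity check along the lines of the sketch below can save time before wiring up Hibernate. The host, port, database name, and credentials are placeholders for your own environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VerticaSmokeTest {

    public static void main(String[] args) throws Exception {
        // Driver class shipped with the Vertica JDBC client drivers linked above.
        Class.forName("com.vertica.jdbc.Driver");

        // 5433 is Vertica's default port; replace the host, database name, and
        // credentials with the values for your own VM.
        String url = "jdbc:vertica://192.168.1.10:5433/hpproducts";
        try (Connection connection = DriverManager.getConnection(url, "dbadmin", "password");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT COUNT(*) FROM Product")) {
            while (rs.next()) {
                System.out.println("Product rows: " + rs.getLong(1));
            }
        }
    }
}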
Here is a video which describes the deployment process for the data extraction application: http://youtu.be/BbwsGE6D7aM. This video includes several helpful steps on configuring your code to interact with Vertica deployed on a virtual machine.
Final Submission Guidelines
- Please include your source code, build scripts, and documentation in your submission.
- You should use Java to implement this solution and Jersey to implement the REST API.
- Maven is the required build tool for this application.
- Vertica is the required database for this service. Vertica requires a Linux/Unix server for implementation.
- You're strongly encouraged to use Hibernate to interact with the Vertica database. Sample code to interact with the database is being provided with the data extraction tool. You can find this code in the forums for this challenge.
- Sample data is attached to the challenge along with the schema file for the Vertica database. The schema should be the basis for the returned JSON objects.
- Please include written instructions which document how to build and run your solution.
- You should include a screen-sharing video documenting your completed solution.