Challenge Overview
Challenge Objectives
The purpose of this challenge is to implement a web scraper that extracts the purchase history list from an e-commerce (EC) site. The tool will be used by our client, who owns a representative EC site in Japan.
Technology Stack
- JDK 8
- Gradle 3.5
- Spring Boot 1.5.7
- Libraries for scraping, dependency injection (DI), and unit testing may be used freely.
Data Extraction
- Data fields: order number, order date, product name, unit price, product quantity, product distributor, total amount of money, delivery status
- Data format: JSON
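A record for the fields above and its JSON form might look like the following. This is a minimal, dependency-free sketch in Java 8, not a prescribed model: the class name `PurchaseRecord`, the JSON key names, the flat one-product-per-record shape, and the field types (for example, prices as whole numbers) are all assumptions, since the challenge only names the fields. In a real Spring Boot project a JSON library such as Jackson would normally handle the serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical flat record: one purchased product line per entry.
final class PurchaseRecord {
    final String orderNumber;
    final String orderDate;      // kept as text; the date format is an assumption
    final String productName;
    final long unitPrice;        // assumed to be a whole amount
    final int quantity;
    final String distributor;
    final long totalAmount;      // assumed to be a whole amount
    final String deliveryStatus;

    PurchaseRecord(String orderNumber, String orderDate, String productName, long unitPrice,
                   int quantity, String distributor, long totalAmount, String deliveryStatus) {
        this.orderNumber = orderNumber;
        this.orderDate = orderDate;
        this.productName = productName;
        this.unitPrice = unitPrice;
        this.quantity = quantity;
        this.distributor = distributor;
        this.totalAmount = totalAmount;
        this.deliveryStatus = deliveryStatus;
    }

    // Builds one JSON object by hand so the sketch needs no library.
    String toJson() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("orderNumber", orderNumber);
        fields.put("orderDate", orderDate);
        fields.put("productName", productName);
        fields.put("unitPrice", unitPrice);
        fields.put("quantity", quantity);
        fields.put("distributor", distributor);
        fields.put("totalAmount", totalAmount);
        fields.put("deliveryStatus", deliveryStatus);

        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) sb.append(v);
            else sb.append('"').append(escape(String.valueOf(v))).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes quotes, backslashes, and control characters inside a JSON string.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Keeping the record immutable and the key order fixed makes the output stable between runs, which helps when the result of one run is later compared with the next.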
Challenge Requirements
The following features need to be implemented in this challenge:
- EC site authentication module
  - Input:
    - URL of the EC site authentication page
    - EC site account information (email and password); this information should be easily configurable
  - Output: HTML data of the EC site's initial page after signing in
  - Process:
    - Get the sign-in page (email address input)
    - Based on the result above, submit the email address to get the sign-in page (password input)
    - Based on the result above, submit the password to get the signed-in initial page of the EC site
  - Remarks: MFA does not need to be taken into account
- Purchase history list analysis module
  - Input:
    - HTML data of the purchase history list page of each EC site
    - The purchase history data extracted in the previous run
  - Output:
    - Purchase history list information
    - Information extracted from the HTML data: order number, order date, product name, unit price, product quantity, product distributor, total amount of money, delivery status
  - Process:
    - Get the HTML data of the purchase history list page from the EC site.
    - Read the previously extracted purchase history list data.
    - Compare the two, extract the new data, and output it as JSON.
  - Remarks:
    - Only the difference from the previously acquired data should be output
    - Pagination should also be taken into account
- Note: the application must be built using Spring Boot
- The following requirements are out of scope for this challenge, but your code design should accommodate them in the near future:
  - Get detailed purchase history
  - Get product details
  - Support multiple EC sites
  - Support multiple accounts
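One way the two modules and the future requirements (multiple EC sites, multiple accounts) could fit together is sketched below. This is a minimal sketch in plain Java 8 under stated assumptions, not a prescribed design: `Account`, `Order`, `EcSiteClient`, and `HistoryDiff` are hypothetical names, the HTTP requests and HTML parsing are hidden behind the interface, an order is assumed to be uniquely identified by its order number, and a page past the last one is assumed to come back empty.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical account holder; real values would come from configuration.
final class Account {
    final String email;
    final String password;
    Account(String email, String password) {
        this.email = email;
        this.password = password;
    }
}

// Hypothetical order summary; only the order number matters for the diff.
final class Order {
    final String orderNumber;
    Order(String orderNumber) {
        this.orderNumber = orderNumber;
    }
}

// One implementation per EC site (assumed contract):
// signIn performs the email step and then the password step;
// fetchHistoryPage returns the orders on a 1-based page, or an empty list past the last page.
interface EcSiteClient {
    void signIn(Account account);
    List<Order> fetchHistoryPage(int page);
}

final class HistoryDiff {
    // Signs in, walks every page, and keeps only orders not seen in the previous run.
    static List<Order> newOrders(EcSiteClient client, Account account,
                                 Set<String> previousOrderNumbers) {
        client.signIn(account);
        Set<String> seen = new HashSet<>(previousOrderNumbers);
        List<Order> fresh = new ArrayList<>();
        for (int page = 1; ; page++) {
            List<Order> orders = client.fetchHistoryPage(page);
            if (orders.isEmpty()) {
                break; // assumed end-of-pagination signal
            }
            for (Order order : orders) {
                // Set.add returns false for order numbers that are already known.
                if (seen.add(order.orderNumber)) {
                    fresh.add(order);
                }
            }
        }
        return fresh;
    }
}
```

Because the site-specific work sits behind `EcSiteClient` and the account is passed in as a parameter, adding another site or another account would mean adding an implementation or a configuration entry rather than changing the diff logic. The same boundary also allows the diff to be unit tested with a fake client and no network access.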
Final Submission Guidelines
- Build script to execute the program
- Detailed README in Markdown format covering the following:
  - Build instructions
  - How to configure the program
  - How to execute the program and verify results