Challenge Overview
In this challenge series we are building a simple tool that helps users clean a database by eliminating duplicate data according to user-defined criteria. Data is stored in MapR-DB tables, and we will build various MapReduce/Hadoop jobs to manage that data.
In a previous challenge we built a MapReduce job that filters the data based on simple criteria. The input to the job is the data from a MapR-DB table. The job goes through all the data rows and inserts each row into one of two new tables:
- Clean data table (table name Final_#tablename) or
- Manual audit table (table name Stage_#tablename)
The filtering criteria are as follows:
- We have a CSV file (the data definition file) describing the columns in the input data, including column names and valid ranges.
- For some of those columns, the input data has duplicate columns named "original_name_dup_#number". Only one of those values is expected to be in the allowed range (specified by PLOT_MIN and PLOT_MAX in the data definition file), and that value should be used in the clean data table.
- If more than one value is in the allowed range, we can't decide on the correct one, so the entire row should be added to the manual audit table, which will later be reviewed by users.
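The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MapReduce job from the previous challenge: the `COLUMN_NAME` header, the function names, and the sample column `depth` are hypothetical; range bounds are treated as inclusive; the original column is counted as a candidate alongside its `_dup_` columns; and a row with no in-range value is sent to the audit table, a case the text does not cover.

```python
import csv
import io
import re

# Hypothetical header for the column-name field.
# PLOT_MIN and PLOT_MAX are the field names given in the challenge text.
NAME_FIELD = "COLUMN_NAME"


def load_ranges(definition_csv):
    """Parse data definition CSV text into {column: (plot_min, plot_max)}."""
    reader = csv.DictReader(io.StringIO(definition_csv))
    return {
        row[NAME_FIELD]: (float(row["PLOT_MIN"]), float(row["PLOT_MAX"]))
        for row in reader
    }


def target_table(table_name, is_clean):
    """Final_<table> holds clean rows, Stage_<table> holds rows for manual audit."""
    return ("Final_" if is_clean else "Stage_") + table_name


def classify_row(row, ranges):
    """Return (is_clean, row_to_store) for one input row.

    Candidates for a column are the column itself plus every
    <column>_dup_<number> column (assumption: the original counts too).
    Columns without a range definition are dropped from the cleaned row.
    """
    cleaned = {}
    for column, (low, high) in ranges.items():
        dup_pattern = re.compile(re.escape(column) + r"_dup_\d+")
        candidates = [
            value
            for name, value in row.items()
            if name == column or dup_pattern.fullmatch(name)
        ]
        # Assumption: PLOT_MIN and PLOT_MAX are inclusive bounds.
        in_range = [value for value in candidates if low <= value <= high]
        if len(in_range) != 1:
            # More than one in-range value is ambiguous, so the whole row
            # goes to manual audit. Zero in-range values is not covered by
            # the text; auditing the row is an assumption of this sketch.
            return False, row
        cleaned[column] = in_range[0]
    return True, cleaned


definition = "COLUMN_NAME,PLOT_MIN,PLOT_MAX\ndepth,0,100\n"
ranges = load_ranges(definition)
is_clean, stored = classify_row({"depth": -5.0, "depth_dup_1": 42.0}, ranges)
print(target_table("wells", is_clean), stored)
```

In this example only `depth_dup_1` falls inside the allowed range, so the row is treated as clean and the in-range value is stored under the original column name.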
This challenge covers the following:
- Swagger API definition
- Database model (MapR-DB tables)
- UI prototype screens to API mapping document
The API should use consistent naming conventions (camelCase) and must not introduce multiple names for the same entity. Response types and error handling should be covered in the Swagger document. The API will later be implemented with Spring Boot.
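As a rough aid for the camelCase requirement, a small script can flag property names that break the convention in a schema. The sketch below assumes the schema has already been loaded into a Python dictionary with the usual `properties` and `items` keys; the function name and the sample schema are hypothetical, and the check does not detect multiple names for the same entity.

```python
import re

# camelCase here means: starts with a lowercase letter, then letters or digits only.
CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def non_camel_case_names(schema):
    """Recursively collect property names that are not camelCase."""
    bad = []
    for name, sub_schema in schema.get("properties", {}).items():
        if not CAMEL_CASE.fullmatch(name):
            bad.append(name)
        if isinstance(sub_schema, dict):
            bad.extend(non_camel_case_names(sub_schema))
    # Array schemas describe their elements under "items".
    items = schema.get("items")
    if isinstance(items, dict):
        bad.extend(non_camel_case_names(items))
    return bad


# Hypothetical schema fragment, for illustration only.
sample_schema = {
    "type": "object",
    "properties": {
        "tableName": {"type": "string"},
        "audit_rows": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"RowId": {"type": "string"}},
            },
        },
    },
}
print(non_camel_case_names(sample_schema))  # ['audit_rows', 'RowId']
```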
Final Submission Guidelines
- Submit the Swagger API design
- Submit the screens to API mapping document