Challenge Overview

In this challenge series we are building a simple tool that helps users clean a database by eliminating duplicate data according to criteria they define. Data is stored in MapR-DB tables, and we will build various MapReduce/Hadoop jobs to manage that data.
In a previous challenge, we built a MapReduce job that filters the data based on simple criteria. The input to the job is the data from the MapR-DB table. The job goes through all the data rows and inserts each one into one of two new tables:
  • Clean data table (table name Final_#tablename) or
  • Manual audit table (table name Stage_#tablename)
The logic for separating the rows into the two tables is as follows:
  • We have a CSV file (the data definition file) describing the columns in the input data, including column names and valid ranges.
  • For some of those columns, the input data has duplicate columns named "original_name_dup_#number". Only one of those values is expected to be in the allowed range (specified by PLOT_MIN and PLOT_MAX in the data definition file), and that is the value that should be used in the clean data table.
  • If more than one value is in the allowed range, we can't decide on the correct one, so the entire row should be added to the manual audit table that will later be reviewed by users.
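The row-splitting rule above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the existing job's code: it assumes the data definition file has a header line followed by lines whose first three fields are the column name, PLOT_MIN and PLOT_MAX; that the original column (if present) counts as a candidate alongside its _dup_ copies; that a duplicated column with no in-range value also goes to manual audit (the rules above do not cover that case); and that "#tablename" simply means the source table name appended to Final_ or Stage_. The names RowClassifier, Range, Result, parseDefinition and classify are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the Final_/Stage_ routing rule; not the project's actual job code.
class RowClassifier {

    // Matches "original_name_dup_<number>" and captures "original_name".
    private static final Pattern DUP = Pattern.compile("(.+)_dup_\\d+");

    /** Allowed range for one column: PLOT_MIN..PLOT_MAX, inclusive. */
    record Range(double min, double max) {
        boolean contains(double v) { return v >= min && v <= max; }
    }

    /** Target table name plus the row to write into it. */
    record Result(String targetTable, Map<String, String> row) {}

    /** Assumption: a header line, then "columnName,PLOT_MIN,PLOT_MAX[,...]" per line. */
    static Map<String, Range> parseDefinition(List<String> lines) {
        Map<String, Range> ranges = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split(",");
            ranges.put(f[0].trim(),
                new Range(Double.parseDouble(f[1].trim()), Double.parseDouble(f[2].trim())));
        }
        return ranges;
    }

    static Result classify(String tableName, Map<String, String> row, Map<String, Range> ranges) {
        // Group values by base column name: "temp" and "temp_dup_2" both land under "temp".
        Map<String, List<String>> groups = new LinkedHashMap<>();
        Set<String> duplicated = new HashSet<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            Matcher m = DUP.matcher(e.getKey());
            String base = m.matches() ? m.group(1) : e.getKey();
            if (m.matches()) duplicated.add(base);
            groups.computeIfAbsent(base, k -> new ArrayList<>()).add(e.getValue());
        }

        Map<String, String> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            String base = g.getKey();
            if (!duplicated.contains(base)) {
                clean.put(base, g.getValue().get(0)); // ordinary column: copy unchanged
                continue;
            }
            Range range = ranges.get(base);
            List<String> candidates = new ArrayList<>();
            for (String v : g.getValue()) {
                if (range != null && isInRange(v, range)) candidates.add(v);
            }
            if (candidates.size() != 1) {
                // Ambiguous, or (assumption) no valid value: the whole original row goes to audit.
                return new Result("Stage_" + tableName, row);
            }
            clean.put(base, candidates.get(0));
        }
        return new Result("Final_" + tableName, clean);
    }

    private static boolean isInRange(String v, Range r) {
        if (v == null) return false;
        try {
            return r.contains(Double.parseDouble(v.trim()));
        } catch (NumberFormatException ex) {
            return false; // non-numeric values never count as in range
        }
    }
}
```

In a MapReduce mapper, classify could run once per input row, with the returned targetTable deciding which output table receives the row.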
We have also built the UI prototype for the app that will help users review the data and clean up the rows that require manual audit. In this challenge, we want to design the backend REST API and the database models. No coding is required. The challenge output is:
  • Swagger API definition
  • Database model (MapR-DB tables)
  • UI prototype screens to API mapping document

The UI prototype is deployed here, and the data cleaning tool code is available in the project repository. See the wireframe design challenge details for a description of the application features (these should be obvious from the wireframes as well, but if anything is unclear, please ask questions in the forums).
The API should use a consistent naming convention (camelCase) and should not introduce multiple names for the same entity. Response types and error handling should be covered in the Swagger document. The API will later be implemented with Spring Boot.
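While drafting the Swagger document, the camelCase rule could be checked mechanically over every parameter and model property name. The Java sketch below is only an illustration: NamingCheck, isLowerCamelCase and violations are hypothetical names, and it assumes lowerCamelCase (lowercase first letter, then letters and digits only) is the intended flavor of camelCase.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical review helper: flags API names that are not lowerCamelCase.
class NamingCheck {

    // Assumed convention: lowercase first letter, then only letters and digits
    // (no underscores, hyphens or spaces). A single lowercase word also passes.
    private static final Pattern LOWER_CAMEL = Pattern.compile("[a-z][a-zA-Z0-9]*");

    static boolean isLowerCamelCase(String name) {
        return LOWER_CAMEL.matcher(name).matches();
    }

    /** Returns the names that break the convention, in their original order. */
    static List<String> violations(Collection<String> names) {
        List<String> bad = new ArrayList<>();
        for (String n : names) {
            if (!isLowerCamelCase(n)) bad.add(n);
        }
        return bad;
    }
}
```

A check like this covers only the casing half of the rule; spotting two different names for the same entity still needs a manual pass over the model definitions.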

 

Final Submission Guidelines

Submit the Swagger API design
Submit the screens-to-API mapping document

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30064597