Challenge Overview
Challenge Objectives
In this challenge you will provide the training data, data analysis done in the previous challenge and test harness scripts. Based on the given inputs you have to
Project Background
Printing consumables are a significant cost and revenue driver in the printing business. Accurate predictions of when they will occur provide many opportunities for improvement such as component design, pricing, and service schedules. The project aims to predict near term component replacements.Specifically, 80% or better accuracy in predicting all replacements that will occur in the next 2 weeks. The success of the project depends on the achieving a False positive rate less than 20% and a False negative rate less than 20%.
Technology Stack
Python 3 / Or any language of your choice.
Individual requirements
Training Dataset: The dataset provided contains a unique id followed by number of columns with both numerical and non-numerical values. The last column is t/f (True/False) which is the target value that needs to be predicted. The training.csv which displays ID numbers (to identify each row in the training dataset) and the predicted t/f value.
Testing DataSet: The dataset is similar to training dataset, however without the target_value (t/f).
1. Build and Train the Model - This is entirely up to you to figure out the best predictive analytics approach that needs to be used based on the inputs provided. All the charateristics should be backed by data analysis done.
2. Prediction requirements - Generate a csv file with the row identifier and the predicted t/f values.
What should be the prediction format?
As part of this challenge you should provide a testing_data.csv file as output that matches the format of the training_data.csv. Please include the following header in your file:
ID, TargetValue.
Deployment Guide
Make sure you provide a README.md that covers how to run the script in any environment.
Important Notes:
Review will be highly subjective and done by the client. No appeals will be allowed. The solutions will be ranked in order of accuracy of the predictions on the testing set. You can use the test harness script provided. Your solution must be flexible enough to accommodate new training and testing files.
In this challenge you will provide the training data, data analysis done in the previous challenge and test harness scripts. Based on the given inputs you have to
- Build and train a model
- Make Predictions (True/False)
Project Background
Printing consumables are a significant cost and revenue driver in the printing business. Accurate predictions of when they will occur provide many opportunities for improvement such as component design, pricing, and service schedules. The project aims to predict near term component replacements.Specifically, 80% or better accuracy in predicting all replacements that will occur in the next 2 weeks. The success of the project depends on the achieving a False positive rate less than 20% and a False negative rate less than 20%.
Technology Stack
Python 3 / Or any language of your choice.
Individual requirements
Training Dataset: The dataset provided contains a unique id followed by number of columns with both numerical and non-numerical values. The last column is t/f (True/False) which is the target value that needs to be predicted. The training.csv which displays ID numbers (to identify each row in the training dataset) and the predicted t/f value.
Testing DataSet: The dataset is similar to training dataset, however without the target_value (t/f).
1. Build and Train the Model - This is entirely up to you to figure out the best predictive analytics approach that needs to be used based on the inputs provided. All the charateristics should be backed by data analysis done.
2. Prediction requirements - Generate a csv file with the row identifier and the predicted t/f values.
- 80% or better accuracy in predicting the target value.
- False positive rate less than 20%, false negative rate less than 20%.
What should be the prediction format?
As part of this challenge you should provide a testing_data.csv file as output that matches the format of the training_data.csv. Please include the following header in your file:
ID, TargetValue.
Deployment Guide
Make sure you provide a README.md that covers how to run the script in any environment.
Important Notes:
Review will be highly subjective and done by the client. No appeals will be allowed. The solutions will be ranked in order of accuracy of the predictions on the testing set. You can use the test harness script provided. Your solution must be flexible enough to accommodate new training and testing files.
Final Submission Guidelines
- Revised data analysis scripts with details on how to run it
- Source Code as zip file
- Documentation, including list of features (columns) and its correleation with label (target_value)