Challenge Overview
Topcoder is reaching out to our creative Topcoder community to find ideas which will help with Topcoder challenge management and challenge success rates.
In this challenge, we need to create a model that predicts whether a challenge will receive a valid submission before it finishes.
Challenge Background
Thanks to the amazing work from the community, Topcoder has been running many challenges over the past few years.
Topcoder is reaching out to our creative Topcoder community to:
- Research the historical challenge data
- Build an ML workflow using AWS Sagemaker
- Use ML technology to help with:
  - Easier Topcoder challenge management
  - Increasing challenge success rates. Challenge success here means that a challenge ends with a successful submission (a winner).
Technology Stack
- Python 3
- AWS Sagemaker (Optional)
Task Detail
The data, which covers 5k challenges, is shared in the forum. The file headers should explain each column.
The final code will be deployed via AWS Sagemaker, so in this challenge, we allow two kinds of solutions:
1. Using your own algorithm
You need to provide one Python script that does the following:
- Accepts two arguments, training_data and test_data, each pointing to a file
- Processes the data
- Trains your model
- Validates your model against the test data
- Writes the result to a file or prints it to stdout
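The script steps above could be organized along the following lines. This is only a minimal skeleton, not a competitive solution: the label column name `success`, the optional `--label-column` and `--output` flags, and the majority-class placeholder standing in for a real algorithm are all assumptions made for illustration, since the actual column names come from the file headers shared in the forum.

```python
import argparse
import csv
import sys
from collections import Counter


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def train(rows, label_column):
    """Placeholder model: remember the most frequent training label.

    Replace this with the single algorithm you choose.
    """
    counts = Counter(row[label_column] for row in rows)
    return counts.most_common(1)[0][0]


def predict(model, rows):
    """Return one prediction per test row (the placeholder ignores features)."""
    return [model for _ in rows]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Train and validate a challenge-success model.")
    parser.add_argument("training_data", help="path to the training CSV file")
    parser.add_argument("test_data", help="path to the test CSV file")
    # Assumption: the label column name is hypothetical; check the real headers.
    parser.add_argument("--label-column", default="success")
    parser.add_argument("--output", default=None,
                        help="result file (default: stdout)")
    args = parser.parse_args(argv)

    train_rows = load_rows(args.training_data)
    test_rows = load_rows(args.test_data)

    model = train(train_rows, args.label_column)
    predictions = predict(model, test_rows)

    # Report accuracy on stderr when the test file carries labels.
    if test_rows and args.label_column in test_rows[0]:
        correct = sum(p == row[args.label_column]
                      for p, row in zip(predictions, test_rows))
        print(f"accuracy: {correct / len(test_rows):.3f}", file=sys.stderr)

    # One prediction per line, written to a file or to stdout.
    text = "".join(f"{p}\n" for p in predictions)
    if args.output:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        sys.stdout.write(text)
    return predictions
```

Appending `if __name__ == "__main__": main()` to the file lets it run as `python solution.py training.csv test.csv`, where the file name `solution.py` is hypothetical.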
2. Using an AWS Sagemaker built-in algorithm
You need to create a Jupyter notebook instance in AWS Sagemaker and create one notebook. In this notebook:
- Preprocess the data
- Train a model with a built-in algorithm, such as the Linear Learner algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) or the XGBoost algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). Feel free to choose any other AWS built-in algorithm.
- Validate your model, and either write the result to S3 or print it to stdout
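For the preprocessing step, one detail is worth a sketch: according to the AWS documentation linked above, the built-in XGBoost and Linear Learner algorithms expect CSV training input with no header row and the label in the first column. The helper below shows one way to produce such a file with the standard library. The function name, the column names in the usage note, and the assumption that the chosen feature columns are already numeric are illustrative only; real features would first need encoding based on the headers of the forum data.

```python
import csv


def to_label_first_csv(src_path, dst_path, label_column, feature_columns):
    """Rewrite a CSV with a header as a headerless CSV with the label first.

    Returns the number of data rows written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst, lineterminator="\n")
        for row in reader:
            # Label goes in column 0, followed by the selected features.
            writer.writerow([row[label_column]] +
                            [row[name] for name in feature_columns])
            count += 1
    return count
```

A call such as `to_label_first_csv("challenges.csv", "train.csv", "success", ["prize", "duration_days"])` (all names hypothetical) would write a file that can then be uploaded to S3 for training.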
For both approaches, please note:
- The result format is one result per line: 1 for a successful challenge, 0 for a failed challenge.
- We expect a simple algorithm in this challenge, so please use only ONE algorithm for your model. Do not mix several algorithms, even if the result is better. You may include at most 2 solutions in your submission, and we will take the solution with the higher accuracy as your score.
- Please define variables such as the instance type, S3 bucket name, and S3 bucket path at the top of your solution file and mention them in the README file, so that the reviewer can test your solution easily.
- During the review phase, we will use the same data shared in the forum to train your model. To validate it, we will use data from another 200 challenges. The result is evaluated on the accuracy of your predictions (the percentage of challenges you get correct).
- Add comments in your code.
- This challenge uses the DS subject scorecard.
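The result format and the accuracy metric described above can be sketched as follows. The function names are hypothetical, and this is not the reviewers' actual scoring script; it only illustrates the stated rules of one 0/1 value per line and accuracy as the fraction of correct predictions.

```python
def write_predictions(predictions, path):
    """Write one prediction per line: 1 = successful challenge, 0 = failed."""
    with open(path, "w", encoding="utf-8") as handle:
        for value in predictions:
            if value not in (0, 1):
                raise ValueError(f"prediction must be 0 or 1, got {value!r}")
            handle.write(f"{int(value)}\n")


def read_predictions(path):
    """Read a result file back into a list of ints, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [int(line) for line in handle if line.strip()]


def accuracy(predicted, actual):
    """Fraction of challenges whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` evaluates to 0.75, since three of the four predictions match.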
Final Submission Guidelines
Your submission must include the following items:
- A report about your model, including data analysis, which fields you use, model details, and variable importance
- A Jupyter notebook file or Python script file
- A README file including environment preparation and deployment instructions:
  - How to install the required libraries or prepare the AWS environment
  - How to configure your solution
  - How to run your solution