Topcoder broadly has two kinds of contests in Data Science:
Data Science Match
Data Science Challenge (also known as Data Science Sprint)
In a normal Data Science Challenge/Sprint, there is no automatic scoring of submissions; each submission is usually reviewed only once, during the review phase, which starts once the submission phase has ended.
A Data Science Match, on the other hand, has live auto-scoring: the contestant can see their submission's score on the leaderboard within a few minutes. Note - the same holds true for Marathon Matches (which previously included Data Science Matches, but are now primarily used for competitive-programming-style problems with live scoring).
The key question - how do we auto-score submissions when every contestant is working on their own solution and might have a different coding style, libraries, dependencies, etc.?
The answer - while contestants are free to use whatever stack they prefer to build their own unique algorithms and models, there is certainly a need to ensure at least some consistency among all submissions. This consistency is required in the following areas:
a. For supporting multiple languages - The industry-standard way to achieve this is to use containers, via tools such as Docker. A Docker container acts as a mini standalone OS, bundled with all the custom code from the developer, and can be virtualised on any host computer that has Docker installed. Given that an entire OS is made available via the container, any language can be used (a minimal Dockerfile sketch after this list illustrates points a and b).
b. For ensuring fulfillment of project-specific dependencies - In almost all projects, certain dependencies need to be fulfilled before the software can deploy successfully. Manually, these dependencies are typically fulfilled by running the required scripts, downloading the relevant files, etc. But when a large variety of languages, and hence dependencies, is anticipated, the industry-standard way to reliably fulfill them is to use container-based tools such as Docker.
c. For ensuring a consistent input/output interface with each submission - Each contestant typically develops a custom/unique way to invoke the key parts of their submission, such as the training script, the testing script, etc. If no consistency is enforced in how these key actions are invoked, it would not be possible to auto-score the submissions. Hence, in addition to requiring Docker, Topcoder also requires that specific shell scripts (Linux script files) be placed in particular folders of the Docker container, so that the automated harness can expect to find them in the designated folders and invoke them as soon as they are found (see the shell-script sketch after this list).
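To make points (a) and (b) concrete, here is a minimal Dockerfile sketch. This is an illustration only, not the actual Topcoder template; the base image, file names, and dependency handling are all hypothetical assumptions:

```dockerfile
# Hypothetical example - not the actual Topcoder template.
# The base image ships a full OS plus a language runtime; any
# language's base image could be used here instead (point a).
FROM python:3.10-slim

WORKDIR /work

# Fulfill project-specific dependencies at build time (point b),
# so the container behaves identically on any host running Docker.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the contestant's custom code and entry-point scripts,
# and make sure the scripts are executable.
COPY . .
RUN chmod +x train.sh test.sh
```

For point (c), the entry-point scripts might look like the following. Again, the script names, paths, and arguments are hypothetical; the actual names and locations are specified in the template repository:

```sh
#!/bin/bash
# train.sh - hypothetical entry point placed at an agreed-upon path
# so the auto-scorer knows where to find it. It simply delegates to
# the contestant's own code, whatever language that code is in.
python3 train.py --data-dir "$1" --model-dir "$2"
```

```sh
#!/bin/bash
# test.sh - hypothetical entry point for inference; the auto-scorer
# invokes it the same way for every submission, regardless of stack.
python3 test.py --data-dir "$1" --model-dir "$2" --output "$3"
```

Because every container exposes the same script names at the same paths, the scoring harness can run many heterogeneous submissions with one uniform procedure.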
To address all the concerns listed above, Topcoder came up with a Docker-based submission template, the details of which can be found here: GitHub - topcoder-platform-templates/marathon-data-and-code.
In this template, the contestant needs to ensure that all requirements of the template are fulfilled; after that, they are free to add any custom code, as long as no requirement of the template is violated. The repository readme contains details of the template and how to craft a valid submission using it.
In addition to the details in the template repo, and considering the popularity of Python in Data Science Matches, I have shared a sample submission using the same template here: DataScienceMatchTemplate - Google Drive. It contains a sample submission in Python and a text file with the common commands used for developing and deploying a similar submission package.
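The commands in that text file typically follow a pattern like the sketch below. The image name, mount paths, and packaging step here are illustrative assumptions; consult the shared text file for the actual commands:

```sh
# Hypothetical commands - see the shared text file for the actual ones.

# Build the submission image from the Dockerfile in the current folder.
docker build -t my-submission .

# Run the container locally, mounting a host data folder so the
# entry-point scripts can read inputs and write outputs.
docker run --rm -v "$(pwd)/data:/data" my-submission ./train.sh /data /data/model

# Package the submission folder for upload (zip is a common format).
zip -r submission.zip . -x "data/*"
```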
To better illustrate these resources, please refer to the following video:
The template details, the example Python submission using the template, the command text file (both in the Google Drive folder above), and the video shared here should enable any motivated contestant to understand the Docker-based template. While the template ensures smooth automatic deployment and execution, it also gives contestants the freedom to try the language/tool/stack of their choice, without worrying about any custom code for deployment and interfacing beyond the requirements of the template.