Challenge Overview
This Marathon Match is part of the March Madness Marathon Match Series and will have the following prizes and bonuses:
Prize Distribution
1st $12,000
2nd $8,000
3rd $6,000
4th $4,000
5th $3,000
Total Prizes $33,000
In addition to the main prizes there are special prizes of $250 each for the highest scoring
-
first-time marathon participant,
-
participant who hasn't submitted since 2016, and
-
first-time sponsored marathon participant.
See here for details.
Introduction
Facial re-identification (re-id) is a cornerstone capability in many forward-looking applications. The ability not only to detect faces but also to determine whether a face has been seen before has a broad range of uses, from robot-human interaction to biometric user identification. This competition seeks an algorithm capable of performing facial re-id on a dataset containing multiple images of multiple people. The dataset covers heterogeneous image conditions and environments, including varying lighting, posture, occlusion and blurriness. The goal is a facial re-id algorithm that can identify people across images, reporting when it sees a person that it has been trained to recognize. The model should also be capable of localizing the position of the face in the context of the full scene.
Your task is to detect bounding boxes containing faces in images captured from video frames. The bounding boxes and the person identifiers you detect will be compared to ground truth, and the quality of your solution will be measured using the mean average precision metric.
Input files
This contest uses a data set that was prepared for a recent similar contest; see this page for details: [Opensetface]. You will need to agree to the Data Usage Terms on [Opensetface] before downloading the files.
Most of the information described at [Opensetface] is valid for the current contest as well, but be mindful of some important differences:
-
[Opensetface] describes two tasks: face detection and face recognition. This current contest concerns only the face recognition task. However, there is a separate contest about face detection as well, running in parallel to this contest, using the same input data.
-
The scoring metric is different, see below at Scoring for details.
-
The format of the required output is different, see below at Output file and Submission format for details.
Important notes:
-
Use only the images found in the Training folder of the [Opensetface] data access page. This contest uses different images for provisional and final testing than those listed in the Testing and Validation folders of [Opensetface]. Using Opensetface's testing and validation data for training is strictly prohibited. Such usage will be checked in final testing.
-
[Opensetface] gives two versions of annotations for the training data. In this contest version 2 is used (see protocol_v2.zip on the data access page).
Provisional test data
Provisional test data can be downloaded from this AWS S3 bucket:
Bucket name: neptune-face-detect
Access key ID: AKIAJNVC7PEDQRPLM6AQ
Secret access key: 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
(If you are unfamiliar with AWS S3 technology, this document can help you get started. Note that the document was prepared for a previous image processing contest, but most of its content, such as how to set up an AWS account or which tools you can use, is relevant in this challenge as well.)
Output file
The detections your algorithm creates must be listed in a single CSV file. This file should contain all the detections your algorithm creates using all image files of the test set, that is all 2600 images found in the AWS bucket referenced above. The file must be named solution.csv and have the following format:
ImageId,SubjectId,FACE_X,FACE_Y,W,H,Confidence
Your solution file may or may not include the above header line. The rest of the lines should specify the bounding boxes your algorithm extracted, one per line.
Sample lines:
image001.jpg,699,1270.5,1495.5,274,353,0.9
image001.jpg,1234,1749.5,15.5,235,265,0.4
image123.jpg,1234,3837.5,153.5,234,264,0.75
The required fields are:
-
ImageId is a string that uniquely identifies the image. It corresponds to the file name of the image, including the .jpg extension.
-
SubjectId is the unique integer identifier of a known person present in the training data. Note that the training annotations contain -1 for unknown people. Your output must not contain such lines; report only the faces of people you recognize.
-
FACE_X, FACE_Y, W, H describe a detected bounding box by giving the x and y coordinate of its top left corner, its width and height, respectively. All values are in pixels (real values allowed), x is measured from left to right, y is measured from top to bottom.
-
Confidence is a real number in the [0, 1] range; higher numbers mean you are more confident that this face is indeed present and pictures the given person. See the details of scoring for how this value is used.
Your output file must not contain more than 60 bounding boxes for the same ImageId.
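The output rules above can be checked before writing the file. The following is a minimal sketch, not part of the official tooling: the function names and the tuple layout of a detection are assumptions made for illustration, and only the column order, the [0, 1] confidence range, the ban on SubjectId -1 and the 60-box limit come from the text above.

```python
import csv
from collections import Counter

HEADER = ["ImageId", "SubjectId", "FACE_X", "FACE_Y", "W", "H", "Confidence"]
MAX_BOXES_PER_IMAGE = 60


def validate_detections(rows):
    """Raise ValueError if a row breaks one of the rules stated above.

    Each row is assumed to be a tuple:
    (image_id, subject_id, x, y, w, h, confidence).
    """
    counts = Counter()
    for image_id, subject_id, _x, _y, _w, _h, conf in rows:
        if subject_id == -1:
            raise ValueError(f"{image_id}: unknown subjects (-1) must not be reported")
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"{image_id}: confidence {conf} is outside [0, 1]")
        counts[image_id] += 1
        if counts[image_id] > MAX_BOXES_PER_IMAGE:
            raise ValueError(f"{image_id}: more than {MAX_BOXES_PER_IMAGE} boxes")


def write_solution(detections, path="solution.csv"):
    """Validate the detections, then write them with the optional header line."""
    rows = list(detections)  # allow generators; we iterate twice
    validate_detections(rows)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

For example, `write_solution([("image001.jpg", 699, 1270.5, 1495.5, 274, 353, 0.9)])` would produce a header line followed by the first sample line shown above.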
Submission format
This match uses a combination of the "submit data" and "submit code" submission styles. Your submission must be a single ZIP file not larger than 500 MB, with the following content:
/solution
solution.csv
/code
Dockerfile
<your code>
where
-
/solution/solution.csv is the output your algorithm generates on the provisional test set. The format of this file is described above in the Output file section.
-
/code contains a dockerized version of your system that will be used to reproduce your results in a well defined, standardized way. This folder must contain a Dockerfile that will be used to build a docker container that will host your system during final testing. How you organize the rest of the contents of the /code folder is up to you, as long as it satisfies the requirements listed below in the Final testing section.
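The package layout described above can be assembled with a short script. This is a sketch under stated assumptions: the function name and default paths are hypothetical, and "500 MB" is read conservatively as 500 * 10^6 bytes because the statement does not say which definition of MB applies.

```python
import os
import zipfile

# Assumption: the limit is interpreted as 500 * 10**6 bytes (the stricter reading).
MAX_ZIP_BYTES = 500 * 10**6


def build_submission(solution_csv, code_dir, zip_path="submission.zip"):
    """Create a ZIP with solution/solution.csv and code/<contents of code_dir>.

    Returns the size of the resulting archive in bytes.
    Keep zip_path outside code_dir so the archive does not include itself.
    """
    if not os.path.isfile(os.path.join(code_dir, "Dockerfile")):
        raise FileNotFoundError("the code folder must contain a Dockerfile")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(solution_csv, "solution/solution.csv")
        for root, _dirs, files in os.walk(code_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, code_dir)
                # ZIP entry names always use forward slashes.
                zf.write(full, "code/" + rel.replace(os.sep, "/"))
    size = os.path.getsize(zip_path)
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"archive is {size} bytes, above the 500 MB limit")
    return size
```

Checking for the Dockerfile up front mirrors the format verification mentioned in the notes, but the tester tool remains the authority on what is accepted.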
Notes:
-
During provisional testing only your solution.csv file will be used for scoring; however, the tester tool will verify that your submission file conforms to the required format. This means that at least the /code/Dockerfile must be present from day 1, even if it doesn't describe any meaningful system to be built. However, we recommend that you keep working on the dockerized version of your code as the challenge progresses, especially if you are at or close to a prize winning rank on the provisional leader board.
-
You must not submit more often than once every 4 hours. The submission platform does not enforce this limit; it is your responsibility to comply with it. Not observing this rule may lead to disqualification.
-
Make sure that your submission package is smaller than 500 MB. This means that if you use large files (external libraries, data files, pretrained model files, etc.) that won't fit into this limit, then your docker build process must download these from the net during building. There are several ways to achieve this, e.g. external libraries may be installed from a git repository, data files may be downloaded using wget or curl from Dropbox or Google Drive or any other public file hosting service. In any case always make sure that your build process is carefully tested end to end before you submit your package for final testing.
-
During final testing your last submission file will be used to build your docker container.
-
Make sure that the contents of the /solution and /code folders are in sync, i.e. your solution.csv file contains the exact output of the current version of your code.
-
To speed up the final testing process the contest admins may decide not to build and run the dockerized version of each contestant's submission. It is guaranteed however that if there are N main prizes then at least the top 2*N ranked submissions (based on the provisional leader board at the end of the submission phase) will be final tested.
Scoring
During scoring your solution.csv file (as contained in your submission file during provisional testing, or generated by your docker container during final testing) will be matched against expected ground truth data using the following algorithm.
If your solution is invalid (e.g. if the tester tool can't successfully parse its content, or if it contains an unknown image ID), you will receive a score of 0.
Otherwise, the MAP (mean average precision) value is calculated by averaging the AP (average precision) score over several IOU threshold values. The process is explained here and here; sample code is also available at the first of these links. AP for a given threshold is calculated as follows:
-
Sort the detected bounding boxes (BBs) in decreasing order of confidence. (In case of identical confidence values the order in which they appear in the solution file will be preserved.)
-
For each solution BB find the best matching one from the set of the ground truth BBs. Loop over the truth BBs, and:
-
Skip this BB if it was already matched with another solution BB.
-
Skip this BB if it belongs to a different ImageId than the solution BB.
-
Skip this BB if it belongs to a different SubjectId than the solution BB.
-
Otherwise calculate the IOU (Intersection over Union, Jaccard index) of the two BBs.
-
Note the truth BB which has the highest IOU score if this score is higher than the current threshold. Call this the ‘matching’ BB.
-
Now that we know, for each solution BB, whether it has a matching BB, we can calculate the AP score as described on the above referenced pages. We use the version of the calculation in which AP is computed as the average of the maximum precision at these 11 recall levels: [0.0, 0.1, 0.2, …, 0.9, 1.0]. An offline tester tool is provided that implements scoring. For the exact steps of scoring see its source code.
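The matching and AP steps above can be sketched as follows. This is an illustrative reading of the description, not the offline tester's code, which remains authoritative. The tuple layouts, the function names and the handling of an empty ground truth set are assumptions, as is the premise that the truth list contains only the boxes that are scored.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_precision(solution, truth, threshold):
    """11-point interpolated AP at one IOU threshold.

    solution rows: (image_id, subject_id, x, y, w, h, confidence)
    truth rows:    (image_id, subject_id, x, y, w, h)
    """
    if not truth:
        return 0.0  # assumption: no ground truth means nothing to score
    # sorted() is stable, so ties keep their order from the solution file.
    ordered = sorted(solution, key=lambda row: -row[6])
    matched = [False] * len(truth)
    hits = []
    for img, subj, x, y, w, h, _conf in ordered:
        best, best_iou = None, threshold
        for i, (t_img, t_subj, tx, ty, tw, th) in enumerate(truth):
            if matched[i] or t_img != img or t_subj != subj:
                continue
            score = iou((x, y, w, h), (tx, ty, tw, th))
            if score > best_iou:  # must be strictly above the threshold
                best, best_iou = i, score
        if best is not None:
            matched[best] = True
        hits.append(best is not None)
    # (recall, precision) after each detection, in confidence order.
    points, tp = [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        points.append((tp / len(truth), tp / k))
    # Average the maximum precision at recall >= 0.0, 0.1, ..., 1.0.
    total = 0.0
    for level in range(11):
        total += max((p for r, p in points if r >= level / 10), default=0.0)
    return total / 11
```

With two ground truth boxes and a solution that matches only the first one with its most confident detection, precision is 1.0 up to recall 0.5 and 0 beyond, so six of the eleven recall levels contribute and AP is 6/11.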
MAP is calculated as the average of AP scores corresponding to these 10 IOU threshold values: [0.50, 0.55, 0.60, … , 0.95].
Finally your score is calculated as 100 * MAP.
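The final averaging step is small enough to show in full. In this sketch `ap_at_threshold` is a hypothetical callable standing in for whatever AP routine you use; only the ten thresholds and the factor of 100 come from the text above.

```python
# The 10 IOU thresholds 0.50, 0.55, ..., 0.95, built from integers
# to avoid accumulating floating point error.
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def contest_score(ap_at_threshold):
    """Return 100 * MAP, where MAP is the mean AP over IOU_THRESHOLDS.

    ap_at_threshold: callable taking an IOU threshold and returning
    the AP (a value in [0, 1]) computed at that threshold.
    """
    aps = [ap_at_threshold(t) for t in IOU_THRESHOLDS]
    return 100.0 * sum(aps) / len(aps)
```

Because the higher thresholds demand tighter boxes, a solution whose boxes only just overlap the ground truth may score well at 0.50 and contribute nothing at 0.95.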
Final testing
This section describes the final testing workflow and the requirements for the /code folder of your submission. You may ignore this section until you decide to start preparing your system for final testing.
To be able to successfully submit your system for final testing, some familiarity with Docker is required. If you have not used this technology before then you may first check this page and other learning material linked from there. To install docker follow these instructions. In this contest it is very likely that you will work with GPU-accelerated systems, see how to install Nvidia-docker here.
This section will be filled in after contest launch, when the test data becomes available and scoring is opened.
General Notes
-
This match is rated.
-
Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.
-
Teaming is not allowed. You must develop your solution on your own. Any communication between members beyond what is allowed by the forum rules is strictly forbidden.
-
In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see “Requirements to Win a Prize” section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
-
You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution includes licensed elements (software, data, programming language, etc) make sure that all such elements are covered by licenses that explicitly allow commercial use.
-
If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled “Licenses”. Within the same folder, include a text file labeled “README” that explains the purpose of each licensed software package as it is used in your solution.
-
External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:
-
The external data and pre-trained models are unencumbered by legal restrictions that conflict with their use in the competition.
-
The data source or data used to train the pre-trained models is defined in the submission description.
Final prizes
In order to receive a final prize, you must do all the following:
-
Achieve a score in the top five according to final system test results. See the "Final testing" section above.
-
Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm and explaining the logic behind it and the steps of its approach. You will receive a template that helps you create your final report.
-
If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.