Challenge Overview
Objective
A spacesuit has unique movement patterns that can be observed during spacewalks or Extravehicular Activities (EVA), and mobility assessments are needed to discern and mitigate suit injury risk. Spacesuit motion is very difficult to measure in uncontrolled environments such as training facilities, so a novel method is needed to quantify spacesuit motions from conventional and readily available video and photographs without requiring motion capture cameras. Once validated for the accuracy and reliability of its posture extractions, the selected system will be deployed to estimate EVA postures in current and future missions and analog training events. The framework will be applicable to Neutral Buoyancy Lab (NBL) and Active Response Gravity Offload System (ARGOS) testing, which will help to optimize training procedures. Additionally, the winning solution will be tested and validated on video recordings collected during next-generation spacesuit testing.
NASA is seeking novel solutions to label and identify spacesuit motions from conventional and readily available video and photographs to overcome current system limitations in terms of cost and training feasibility.
The winning computer vision algorithms are expected to have the ability to:
- Detect spacesuits in a variety of environments and lighting conditions.
- Correctly discriminate between an “unsuited” person and a spacesuit.
- Robustly extract suit postures from partially occluded images.
- Function with a single spacesuit or with multiple spacesuits.
In this challenge your tasks will be:
- to extract polygonal areas that represent spacesuits from photographs,
- to determine the coordinates (in 2D pixel space) of predefined joints (like the Right Shoulder Joint or the Left Knee Joint) from photographs,
- to determine the coordinates (in 3D metric space) of joints from videos.
The polygons and joint coordinates your algorithm returns will be compared to ground truth data; the quality of your solution will be judged by how closely it matches the expected results. See Scoring for details.
Input Files
In this task you will work with two types of media: images and video. Image data is available in standard .jpg and .png formats; each training and test image is given in only one of these two formats. Video data is available in two formats: as .mov files and as a set of .jpg images representing the individual frames of the video. Each training and test video is given in both formats; you may choose to work with either or both of them. Although training data will be provided, challenge participants are strongly encouraged to augment the data set with their own labeled data or existing datasets.
Image annotations
Image annotations (spacesuit contours and joint coordinates) are described in a .txt file, one image per line, as follows.
<image-id>,<joint-coordinate-list>,[<spacesuit-shape>]...
where
<image-id> is the case sensitive file name of the image, including the file extension. The angle brackets (here and elsewhere in this document) are just for readability; they are not present in the annotation text.
<joint-coordinate-list> is a comma separated sequence of x,y,v triplets, where x and y are pixel coordinates (x runs from left to right, y from top to bottom) and v is visibility:
- 0: not labelled
- 1: labelled, not visible
- 2: labelled, visible
The number of x,y,v triplets present is always a multiple of 15, depending on how many spacesuits are shown in the image. Note that there are images that show no spacesuits at all. The joints belonging to one spacesuit are described in this order:
- Right Head
- Left Head
- Base Head
- Right Shoulder
- Right Elbow
- Right Hand
- Left Shoulder
- Left Elbow
- Left Hand
- Right Hip
- Right Knee
- Right Foot
- Left Hip
- Left Knee
- Left Foot
<spacesuit-shape> describes the area occupied by a spacesuit as the union of polygonal areas:
(<polygon-1>)(<polygon-2>)(...etc)
where <polygon-i> is a comma separated sequence of x,y pixel coordinate pairs. Most spacesuit shapes can be described using a single polygonal area, but some shapes need more polygons, either because occlusion splits the shape into disjoint areas or because the shape contains holes. Positive areas are given by listing the polygon's points in clockwise order; negative areas (holes) are given by listing the points in counter-clockwise order.
Examples:
This describes a single rectangular shape:
image1.jpg,100,200,2,120,210,2,...13 joint coordinates omitted for brevity...,[(100,100,200,100,200,200,100,200,100,100)]
A rectangular shape with a hole:
image2.jpg,100,200,2,120,210,2,...13 joint coordinates omitted for brevity...,[(100,100,200,100,200,200,100,200,100,100)(110,110,110,120,120,120,120,110,110,110)]
Two spacesuits, a rectangular one and a triangular one:
image3.jpg,100,200,2,120,210,2,...28 joint coordinates omitted for brevity...,[(100,100,200,100,200,200,100,200,100,100)][(300,100,400,100,300,200,300,100)]
No spacesuits:
image-empty.png
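For illustration, a line in this format can be parsed with straightforward string handling. The Python sketch below is not part of the official tooling; the function name parse_annotation_line and the grouping into 15-joint sets are illustrative only.

# Minimal sketch of a parser for one line of the image annotation file.
# Follows the format described above; names are illustrative, not official.
import re

JOINTS_PER_SUIT = 15  # each spacesuit is described by 15 named joints

def parse_annotation_line(line):
    line = line.strip()
    bracket = line.find('[')  # the spacesuit shapes start at the first '['
    head = line if bracket == -1 else line[:bracket].rstrip(',')
    shape_part = '' if bracket == -1 else line[bracket:]

    fields = head.split(',')
    image_id = fields[0]
    numbers = [float(v) for v in fields[1:] if v != '']

    # Group the x,y,v values into triplets, then into per-suit sets of 15 joints.
    triplets = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    suits_joints = [triplets[i:i + JOINTS_PER_SUIT]
                    for i in range(0, len(triplets), JOINTS_PER_SUIT)]

    # Each [...] block is one spacesuit shape; each (...) inside is a polygon.
    shapes = []
    for suit_block in re.findall(r'\[(.*?)\]', shape_part):
        polygons = []
        for poly in re.findall(r'\((.*?)\)', suit_block):
            coords = [float(v) for v in poly.split(',')]
            polygons.append(list(zip(coords[0::2], coords[1::2])))
        shapes.append(polygons)

    return image_id, suits_joints, shapes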
Noisy images
Besides the original images, the provisional and final test sets contain images with artificially lowered quality. Noise was added to originals using the following algorithm:
- Applied Gaussian blur, using a radius chosen uniformly at random between 0 and 8 pixels.
- Added Gaussian noise to each pixel, independently for the 3 colour channels; the standard deviation of the change was chosen uniformly at random between 0 and 8. (A sketch of reproducing this degradation for training follows below.)
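If you want to make your model robust to this degradation, the same process can be approximated at training time. The sketch below uses Pillow and NumPy and simply follows the two steps described above; the function name degrade is illustrative only, and this is not the official script.

# Sketch of reproducing the described degradation for data augmentation.
import numpy as np
from PIL import Image, ImageFilter

def degrade(image, rng):
    # Gaussian blur with a radius drawn uniformly from [0, 8] pixels.
    blurred = image.filter(ImageFilter.GaussianBlur(rng.uniform(0, 8)))
    # Additive Gaussian noise, independent per pixel and per colour channel,
    # with a standard deviation drawn uniformly from [0, 8].
    sigma = rng.uniform(0, 8)
    pixels = np.asarray(blurred, dtype=np.float32)
    noisy = pixels + rng.normal(0.0, sigma, size=pixels.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

# Usage: degrade(Image.open("img1.jpg").convert("RGB"), np.random.default_rng(0))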
Video annotations
The video annotations are stored in .csv files, one file per video. Excluding two header lines, each line in the file describes one video frame:
<frame-id>,x,y,z,x,y,z,... // 11 more x,y,z triplets omitted for brevity
where <frame-id> is the frame ordinal (1-based integer), followed by 13 x,y,z coordinate triplets that represent the following joint positions in this order:
- CLAV Clavicle
- RSJC Right Shoulder Joint Center
- LSJC Left Shoulder Joint Center
- REJC Right Elbow Joint Center
- LEJC Left Elbow Joint Center
- RWJC Right Wrist Joint Center
- LWJC Left Wrist Joint Center
- RHJC Right Hip Joint Center
- LHJC Left Hip Joint Center
- RKJC Right Knee Joint Center
- LKJC Left Knee Joint Center
- RAJC Right Ankle Joint Center
- LAJC Left Ankle Joint Center
All coordinates are measured in meters. The z axis always points vertically upwards, but the orientation of the x and y axes is not fixed; it can be (and is) different across videos, but it remains constant for the duration of a video. In the annotations the origin of the coordinate system is tied to the RAJC point of frame #1; this is an arbitrary choice with no significance in scoring. See the Scoring section for details on how the undefined nature of the coordinate system is handled.
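A minimal way to load such a file into an array is sketched below. It assumes the two header lines and the 1 + 13*3 columns described above, and that missing joint coordinates appear as empty fields (this representation is an assumption; check the training files).

# Sketch of loading one video annotation .csv into a (frames, 13, 3) array.
# Column layout follows the description above; missing values become NaN.
import csv
import numpy as np

JOINT_NAMES = ["CLAV", "RSJC", "LSJC", "REJC", "LEJC", "RWJC", "LWJC",
               "RHJC", "LHJC", "RKJC", "LKJC", "RAJC", "LAJC"]

def load_video_annotations(path):
    rows = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the first header line
        next(reader)  # skip the second header line
        for row in reader:
            # row[0] is the 1-based frame ordinal; the rest are x,y,z values.
            rows.append([float(v) if v.strip() else np.nan for v in row[1:]])
    data = np.array(rows, dtype=np.float64)
    return data.reshape(len(rows), len(JOINT_NAMES), 3)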
Important notes:
- There is missing data in the annotations: for some frames the coordinates of certain joints are not known due to data collection limitations. Collection of the 3D position data is independent of the video capture process, so missing data has no correlation with whether joints are visible in the video.
- For most of the videos the 3D motion capture device recorded one fewer set of joint positions than the number of frames in the video, so data for the last video frame was missing. This is handled by duplicating the last row of annotations in the .csv files.
- Video frames and annotations are not fully synchronized for some of the videos. The divergence is small in most cases and should have minimal effect on scoring. The most visible difference is in the training video MKIII-06.
Downloads
The following files are available for download.
- train.zip (501 MB). The full training data set.
- test.zip (339 MB). The provisional testing data set. Your submissions must contain spacesuit shape and joint coordinate extractions from this data set.
- sample-submission.zip. A sample submission package to illustrate the required submission format.
Output File
Your submission should be a single ZIP file with the following content:
/solution
    /images
        /annotations
            solution.txt
    /videos
        /annotations
            <video-id-1>.csv
            <video-id-2>.csv
            ... etc., other .csv files
/code
    <your code, see details in the Final testing section>
- The folder structure within the zipped package must be exactly as specified above.
- solution.txt must contain the extracted suit masks and joint positions from all images of the test set. The format of the file is the same as for the Image annotation files described earlier. See the truth2d.txt file for the training data or the sample submission package for examples.
- <video-id-i>.csv must contain the extracted joint coordinates for a test video named <video-id-i>.mov. The /solution/videos/annotations folder must contain a separate csv file for each video in the test set. The format of these csv files is the same as for the Video annotation files described earlier. See the csv files of the training data or the sample submission package for examples.
Your output must only contain algorithmically generated suit and joint descriptions. It is strictly forbidden to include hand labeled data, or data that - although initially machine generated - is modified in any way by a human.
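The package itself can be assembled with standard tooling once your inference code has produced solution.txt and the per-video csv files. A minimal Python sketch follows; the local input paths are hypothetical, while the folder names inside the archive match the structure above.

# Sketch of packaging a submission with the required folder layout.
# Assumes solution.txt, the video csv folder and your code folder exist locally.
import zipfile
from pathlib import Path

def build_submission(solution_txt, video_csv_dir, code_dir, out_zip="submission.zip"):
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(solution_txt, "solution/images/annotations/solution.txt")
        for csv_path in sorted(Path(video_csv_dir).glob("*.csv")):
            zf.write(csv_path, f"solution/videos/annotations/{csv_path.name}")
        for path in Path(code_dir).rglob("*"):
            if path.is_file():
                zf.write(path, f"code/{path.relative_to(code_dir)}")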
Submission format and code requirements
This match uses a combination of the "submit data" and "submit code" submission styles. In the online submission phase your output file (generated offline) is compared to ground truth; no code is executed on the evaluation server. In the final testing phase your training and testing process is verified by executing your system.
The required format of the submission package is specified in a submission template document. This document gives only requirements that are additional to, or that override, the requirements listed in the template.
- You must not submit more often than 3 times a day. The submission platform does not enforce this limitation; it is your responsibility to comply with it. Not observing this rule may lead to disqualification.
- An exception to the above rule: if your submission scores 0 or -1, you may make a new submission after a delay of 1 hour.
- The /solution folder of the submission package must contain the output files, formatted and organized as specified above in the Output File section, and must list the extracted suit and joint positions from all images and videos in the test set.
- Your solution must be able to run inference (testing) without access to online resources.
- Images and videos must be processed one by one, independently of other images and videos. It is allowed, though, to process the frames of a video in any order, or to use an approach that extracts information from more than one frame at a time. Your solution must work only with the raw pixel data; it is not allowed to use any metadata (like file names, file extensions, explicit media metadata fields, time stamp of file creation, etc.), neither for training nor for inference.
- See the General Notes section about the allowed open source licenses and 3rd party libraries.
Scoring
During scoring, your output files (as contained in your submission file during provisional testing, or generated by your docker container during final testing) will be matched against the expected ground truth data using the following method.
If your solution is invalid (e.g. if the tester tool can't successfully parse its content, or it violates the size limits), you will receive a score of -1.
If your submission is valid, your score will be calculated as follows:
The score has 3 components, S2, J2 and J3, where S2 describes how well your algorithm detects suits on the 2D images, J2 describes the joint detection accuracy in 2D, and J3 is the joint detection accuracy in 3D (videos). All these components are in the [0...1] range, 1 being the best. The calculation of these 3 components is summarized below; for the exact scoring algorithm see the visualizer source code, namely the scoreImage() and scoreVideo() methods of the ScoringAlgorithm class and the register() method of the Video class.
S2 is calculated from the F-score of true and predicted spacesuit masks. For each image in the test set the F-score is calculated based on the overlapping and non-overlapping areas of true and predicted masks, then the average of such scores is taken across all images.
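The idea behind the F-score can be illustrated on binary masks rasterized from the true and predicted polygons (rasterization itself is omitted here). The sketch below is a simplification with our own handling of empty masks; the authoritative logic is the scoreImage() method in the visualizer source.

# Illustrative area-based F-score between a true and a predicted binary mask.
import numpy as np

def mask_f_score(true_mask, pred_mask):
    true_mask = true_mask.astype(bool)
    pred_mask = pred_mask.astype(bool)
    tp = np.logical_and(true_mask, pred_mask).sum()    # overlapping area
    fp = np.logical_and(~true_mask, pred_mask).sum()   # predicted-only area
    fn = np.logical_and(true_mask, ~pred_mask).sum()   # true-only area
    if tp == 0:
        return 1.0 if (fp == 0 and fn == 0) else 0.0   # assumption for empty masks
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# S2 is then the average of such per-image scores across the test set.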
J2 is based on the RMSE of true and predicted coordinates, measured in pixel space. For each image the RMSE is calculated by taking the pixel distances of each set of true joints to the closest set of predicted joints. (Here "set" means a sequence of 15 joints listed in the order specified in the Image annotations section.) Distances are measured between joints of the same type (i.e. a true "Right Shoulder" is measured against a predicted "Right Shoulder"). Only the visible true joints are considered; the visibility of the predicted joints is not considered in scoring. If any such joint-to-joint distance is larger than MAX_ERR (100 pixels), then MAX_ERR is used in the calculation. These distances are squared and summed. If there are more predicted joint sets than true sets on the image, the sum is multiplied by the ratio of the number of joints in these two sets. The RMSE value is then calculated by dividing the sum of squares by the number of visible true joints and taking the square root of the result. J2 for an image is the maximum of 0 and 1 - RMSE/MAX_ERR. Finally the average of such J2 values is taken across all images in the test set.
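The following sketch mirrors that description for a single image. It is illustrative only (the handling of images with no true suits and of true suits with no matching prediction is our assumption); the authoritative logic is scoreImage() in the visualizer source.

# Illustrative per-image J2 computation following the description above.
import numpy as np

MAX_ERR = 100.0  # pixels

def j2_for_image(true_sets, pred_sets):
    # true_sets / pred_sets: lists of (15, 3) arrays holding x, y, v per joint.
    if not true_sets:
        # Assumption: an empty image scores 1 only if nothing is predicted.
        return 1.0 if not pred_sets else 0.0
    sum_sq, n_visible = 0.0, 0
    for t in true_sets:
        visible = t[:, 2] == 2
        n_visible += int(visible.sum())
        best = None
        for p in pred_sets:
            d = np.linalg.norm(t[:, :2] - p[:, :2], axis=1)
            d = np.minimum(d, MAX_ERR)               # cap joint errors at MAX_ERR
            s = float((d[visible] ** 2).sum())
            best = s if best is None else min(best, s)
        # Assumption: a true suit with no prediction at all gets the maximum error.
        sum_sq += best if best is not None else float(visible.sum()) * MAX_ERR ** 2
    if len(pred_sets) > len(true_sets):
        sum_sq *= len(pred_sets) / len(true_sets)    # penalty for extra predictions
    if n_visible == 0:
        return 1.0                                   # assumption for degenerate images
    rmse = np.sqrt(sum_sq / n_visible)
    return max(0.0, 1.0 - rmse / MAX_ERR)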
Similarly, J3 is based on RMSE, but this time we use the 3D coordinates measured in meters. The calculation is similar to what is described above for J2, but this time a MAX_ERR value of 1 meter is used. Before calculating RMSE the predicted joint positions are 'registered' to the true positions as follows:
- Find the averaged position (centre of body) of both the true and predicted skeletons. For this calculation only the first 15 frames and only the CLAV, RSJC, LSJC, RHJC and LHJC joints are used. Move both skeletons so that their averaged position is at {0,0,0}.
- Find the best possible rotation of the predicted skeleton in the XY plane that minimizes the RMSE over the length of the video. (A sketch of both steps follows below.)
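The sketch below illustrates this registration with a simple brute-force search over rotation angles. The authoritative version is the register() method of the Video class in the visualizer and may differ in detail.

# Illustrative registration of predicted 3D skeletons to the ground truth.
import numpy as np

TORSO = [0, 1, 2, 7, 8]  # indices of CLAV, RSJC, LSJC, RHJC, LHJC in the joint list

def register(pred, true):
    # pred, true: arrays of shape (frames, 13, 3); NaNs mark missing data.
    # Step 1: centre both skeletons using the first 15 frames, torso joints only.
    true_c = true - np.nanmean(true[:15, TORSO, :], axis=(0, 1))
    pred_c = pred - np.nanmean(pred[:15, TORSO, :], axis=(0, 1))

    # Step 2: brute-force the XY-plane rotation that minimises the error over
    # the whole video (the real tool may solve this step differently).
    best_angle, best_err = 0.0, np.inf
    for angle in np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False):
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        err = np.nanmean((pred_c @ rot.T - true_c) ** 2)
        if err < best_err:
            best_angle, best_err = angle, err
    c, s = np.cos(best_angle), np.sin(best_angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return pred_c @ rot.T, true_c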
The overall score is a weighted average of these 3 components:
score = ((w_S2 * S2) + (w_J2 * J2) + (w_J3 * J3)),
where w_S2 = 0.4, w_J2 = 0.4, and w_J3 = 0.2
Finally for display purposes the score is scaled up to the [0...100] range.
Final testing
This section details the final testing workflow; the requirements for the /code folder of your submission are specified in the submission template document. This document gives only requirements or pieces of information that are additional to, or that override, those given in the template. You may ignore this section until you start to prepare your system for final testing.
- The signature of the train script is as given in the template:
train.sh <data_folder>
The supplied <data_folder> parameter points to a folder containing the training data in the same structure as is available to you during the coding phase, with the zip files already extracted. The supplied <data_folder> is the parent folder of 3 subfolders containing images, videos and video frames.
- The allowed time limit for the train.sh script is 8 GPU-days (2 days on a p3.8xlarge with 4 GPUs). Scripts exceeding this time limit will be truncated.
- A sample call to your training script (single line) follows. Note that the folder names are examples only; you should not assume that the exact same folders will be used in testing. You can assume, however, that the images/, videos/ and videos-frames/ folder names won't change.
./train.sh /data/spacesuit/train/
In this sample case the training data looks like this:
/data
    /spacesuit
        /train
            /images
                /annotations
                    truth2d.txt
                img1.png
                img2.jpg
                ... etc., other .png and .jpg files
            /videos
                /annotations
                    video1.csv
                    video2.csv
                    ... etc., other .csv files
                video1.mov
                video2.mov
                ... etc., other .mov files
            /videos-frames
                video1-0001.jpg
                video1-0002.jpg
                ... etc., other .jpg frames
- The signature of the test script:
test.sh <data_folder> <output_folder>
The testing data folder contains image and video data similar to what is available to you during the coding phase.
- The allowed time limit for the test.sh script is 12 GPU-hours (3 hours on a p3.8xlarge with 4 GPUs) when executed on the full provisional test set (the same one you used for submissions during the contest). Scripts exceeding this time limit will be truncated.
- A sample call to your testing script (single line) follows. Again, folder and file names are examples only; you should not assume that the exact same names will be used in testing.
./test.sh /data/spacesuit/test/ /wdata/my_output/
In this sample case the testing data looks like this:
data/
    spacesuit/
        test/
            images/
                img1.png
                ... etc., other .png and .jpg files
            videos/
                video1.mov
                ... etc., other .mov files
            videos-frames/
                video1-0001.jpg
                ... etc., other .jpg frames
- To speed up the final testing process the contest admins may decide not to build and run the dockerized version of each contestant's submission. It is guaranteed, however, that at least the top 10 ranked submissions (based on the provisional leaderboard at the end of the submission phase) will be final tested.
- Hardware specification. Your docker image will be built, and your test.sh and train.sh scripts will be run, on a p3.8xlarge Linux AWS instance. Please see here for the details of this instance type.
Additional Resources
- A visualizer is available here that you can use to test your solution locally. It displays your extracted spacesuit contours and joint positions, the expected ground truth, and the difference between the two. It also calculates scores as defined in the Scoring section, so it serves as an offline tester.
General Notes
- This match is NOT rated.
- Teaming is allowed. Topcoder members are permitted to form teams for this competition. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member, the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Topcoder members participating in a team will not receive a rating for this Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team on this challenge but is not the Captain of the team are not permitted, are ineligible for award, may be deleted, and may be grounds for dismissal of the entire team from the challenge. The deadline for forming teams is 11:59pm ET on the 14th day following the date that Registration & Submission opens as shown on the Challenge Details page. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 21st day following the date that Registration & Submission opens as shown on the Challenge Details page. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature.
The registered teams will be listed in the contest forum thread titled “Registered Teams”.
- Organizations such as companies may compete as one competitor if they are registered as a team and follow all Topcoder rules.
- Relinquish - Topcoder is allowing registered competitors or teams to "relinquish". Relinquishing means the member will compete, and we will score their solution, but they will not be eligible for a prize. Once a person or team relinquishes, we post their name to a forum thread labeled "Relinquished Competitors". Relinquishers must submit their implementation code and methods to maintain leaderboard status.
- In this match you may use open source languages and libraries, and publicly available data sets, with the restrictions listed in the next sections below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see “Requirements to Win a Prize” section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
- You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution includes licensed elements (software, data, programming language, etc.), make sure that all such elements are covered by licenses that explicitly allow commercial use.
- External data sets and pre-trained networks (pre-built segmentation models, additional imagery, etc.) are allowed for use in the competition provided the following are satisfied:
  - The external data and pre-trained network dataset are unencumbered with legal restrictions that conflict with their use in the competition.
  - The data source or data used to train the pre-trained network is defined in the submission description.
  - The external data source must be declared in the competition forum not later than 14 days before the end of the online submission phase to be eligible in a final solution. References and instructions on how to obtain the data are valid declarations (for instance in the case of license restrictions). If you want to use a certain external data source, post a question in the forum thread titled “Requested Data Sources”. Contest stakeholders will verify the request, and if the use of the data source is approved then it will be listed in the forum thread titled “Approved Data Sources”.
- Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.
Requirements to Win a Prize
In order to receive a final prize, you must do all of the following:
- Achieve a score in the top five, according to the final system test results. See the Scoring and Final testing sections above.
- Comply with all applicable Topcoder terms and conditions.
Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind their approach and the steps to reproduce it. You will receive a template that helps when creating your final report. If you place in a prize winning rank but fail to do any of the above, you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.