Challenge Overview
Problem Statement
Prize Distribution

    Prize                      USD
    1st                        $25,000
    2nd                        $16,000
    3rd                        $12,000
    4th                        $8,000
    5th                        $5,000
    Progress prizes*           3 * $3,000
    Special prizes*
      Best POI Category        $5,000
      Best undergraduate       $5,000
      Open Source Incentives   3 * $5,000
    Total Prizes               $100,000

*see the 'Award Details and Requirements to Win a Prize' section for details

Background and motivation

Intelligence analysts, policy makers, and first responders around the world rely on geospatial land use data to inform crucial decisions about global defense and humanitarian activities. Historically, analysts have manually identified and classified geospatial information by comparing and analyzing satellite images, but that process is time consuming and insufficient to support disaster response.

The functional Map of the World (fMoW) Challenge seeks to foster breakthroughs in the automated analysis of overhead imagery by harnessing the collective power of the global data science and machine learning communities. The Challenge publishes one of the largest publicly available satellite-image datasets to date, with more than one million points of interest from around the world. The dataset contains satellite-specific metadata that researchers can exploit to build competitive algorithms that classify facility, building, and land use. See more background information about the challenge here.

Objective

Your task will be to classify the objects present in satellite images. The classification labels your algorithm returns will be compared to ground truth data, and the quality of your solution will be judged by a combination of precision and recall; see Scoring for details.

Input Files

Satellite images

Satellite images are available in a variety of formats:
You may choose any (or all) of the above formats to work with; the scene content is the same, but the number of spectral bands, the image resolution, and the level of image compression differ.

Image metadata and ground truth bounding boxes

Metadata for each image file is available in a JSON file with the same file name, except that the .tif or .jpg extension is replaced by .json. The most important pieces of metadata are the following:
Differences between training and testing data

The file structure and naming conventions of training and testing data are different. Also, the testing data has been altered in several ways to remove ground truth information and increase the difficulty of the challenge.
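As a quick illustration of working with the image metadata files described above, the following minimal Java sketch parses one metadata .json file and iterates over its bounding boxes. The field names used ("bounding_boxes", "ID", "box", "category") and the use of the org.json library are assumptions made for illustration only; check them against the actual metadata files in the dataset.

    import org.json.JSONArray;
    import org.json.JSONObject;

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class MetadataExample {
        public static void main(String[] args) throws Exception {
            // args[0]: path to one metadata file (hypothetical example of usage)
            String json = new String(Files.readAllBytes(Paths.get(args[0])));
            JSONObject meta = new JSONObject(json);

            // Field names below are assumptions; verify against the real files.
            JSONArray boxes = meta.getJSONArray("bounding_boxes");
            for (int i = 0; i < boxes.length(); i++) {
                JSONObject b = boxes.getJSONObject(i);
                int id = b.getInt("ID");
                JSONArray box = b.getJSONArray("box");            // pixel coordinates of the box
                String category = b.optString("category", "n/a"); // removed from the test data
                System.out.println(id + " " + box + " " + category);
            }
        }
    }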
A note on training and validation data

The training dataset contains data in two folders: train and val. The contents of these two folders are similar; they were created by randomly splitting the whole training dataset into two subsets. You may use both subsets as training data.

Downloads

Input files are available for download from the fmow-full and fmow-rgb AWS buckets, as well as from the corresponding fmow-full and fmow-rgb BitTorrent files. The fmow-full dataset contains the TIFF images; the fmow-rgb dataset contains the compressed JPEG images. Both datasets include the accompanying image metadata files. A separate guide is available that details the process of obtaining the data. Note that the dataset in the fmow-full bucket is huge (~3.5 TB). The following torrent files are available:
Metadata archives that contain sample false_detection bounding boxes for the val subset of the training data:
Output Files

Your output must be a text file that describes the object classifications your algorithm makes for all of the images in a test set. You must make a prediction for each bounding box specified in the metadata files of the test set. The file should contain lines formatted like:

    <bounding_box_id>,<category>

where <bounding_box_id> is the ID of a bounding box as given in the image metadata, and <category> is the category label your algorithm predicts for that bounding box (this may be false_detection).

Some sample lines:

    3456,crop_field
    1234,false_detection
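As an illustration, here is a minimal Java sketch that writes a submission file in this format. The predictions map, its contents, and the output file name (solution.txt) are placeholders; in a real solution the map would be filled by your classifier for every bounding box in the test set.

    import java.io.PrintWriter;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SubmissionWriter {
        public static void main(String[] args) throws Exception {
            // Hypothetical predictions: bounding_box_id -> predicted category.
            Map<Integer, String> predictions = new LinkedHashMap<>();
            predictions.put(3456, "crop_field");
            predictions.put(1234, "false_detection");

            // One "<bounding_box_id>,<category>" line per bounding box in the test set.
            try (PrintWriter out = new PrintWriter("solution.txt")) {
                for (Map.Entry<Integer, String> e : predictions.entrySet()) {
                    out.println(e.getKey() + "," + e.getValue());
                }
            }
        }
    }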
Your output must be a single file with a .txt extension. Optionally, the file may be zipped, in which case it must have a .zip extension. Your output must contain only algorithmically generated classifications. It is strictly forbidden to include manually created predictions, or classifications that, although initially machine generated, have been modified in any way by a human.

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a TXT or ZIP file that contains your answer. In order for your solution to be evaluated by Topcoder's marathon system, you must implement a class named FunctionalMap, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file. You may upload your file to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file.

To create a direct sharing link in Dropbox, right click on the uploaded file and select Share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function. If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=" + id. Note that Google has a file size limit of 25 MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.) You can use any other way to share your result file, but make sure the link you provide opens the file stream directly and is available to anyone with the link (not only the file owner), so that the automated tester can download and evaluate it.

An example of the code you have to submit, using Java:

    public class FunctionalMap {
        public String getAnswerURL() {
            // Replace the returned String with your submission file's URL
            return "https://drive.google.com/uc?export=download&id=XYZ";
        }
    }

Keep in mind that your complete code that generates these results will be verified at the end of the contest if you achieve a score in the top 5, as described in the "Award Details and Requirements to Win a Prize" section below, i.e. participants will be required to provide fully automated executable software to allow for independent verification of the performance of your algorithm and the quality of the output data.

Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file. Any malformed or inaccessible file, one that contains an invalid category label, or one that does not contain a prediction for every bounding box that belongs to the test set will receive a zero score.

First, an F-score is calculated for each object category present in the test set. We define the true positive (TP), false positive (FP) and false negative (FN) counts as follows: for each bounding box in the test set, let E be its expected (i.e. ground truth) category and G be your guessed (i.e. predicted) category. If E equals G, then the TP counter for E is incremented by one; otherwise the FN counter for E is incremented by one, and the FP counter for G is also incremented by one. Then, for each category, let

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F-score   = 2 * precision * recall / (precision + recall)
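To make this counting scheme concrete, here is a rough, illustrative Java sketch of the per-category F-score calculation described above. It is not the official scorer (see the visualizer source code referenced below for the exact algorithm); in particular, since the per-category weights are given separately below, the sketch falls back to an unweighted average.

    import java.util.HashMap;
    import java.util.Map;

    public class ScoringSketch {

        // truth and guess both map bounding_box_id -> category; guess is assumed to
        // contain a prediction for every bounding box in truth (missing predictions
        // would zero the real score anyway).
        static double approximateScore(Map<Integer, String> truth, Map<Integer, String> guess) {
            Map<String, int[]> counts = new HashMap<>(); // category -> {TP, FP, FN}
            for (Map.Entry<Integer, String> e : truth.entrySet()) {
                String expected = e.getValue();
                String predicted = guess.get(e.getKey());
                counts.putIfAbsent(expected, new int[3]);
                counts.putIfAbsent(predicted, new int[3]);
                if (expected.equals(predicted)) {
                    counts.get(expected)[0]++;   // TP of the true category
                } else {
                    counts.get(expected)[2]++;   // FN of the true category
                    counts.get(predicted)[1]++;  // FP of the guessed category
                }
            }
            // Per-category F-score. The official score is a *weighted* average of these
            // values multiplied by 1,000,000; the weights are not repeated here, so an
            // unweighted average is used as a stand-in. The 'false_detection' category
            // itself is skipped, since its own F-score is ignored (see the note below).
            double sum = 0;
            int n = 0;
            for (Map.Entry<String, int[]> e : counts.entrySet()) {
                if ("false_detection".equals(e.getKey())) continue;
                int tp = e.getValue()[0], fp = e.getValue()[1], fn = e.getValue()[2];
                if (tp + fn == 0) continue;      // category not present in the test set
                double precision = (tp + fp == 0) ? 0 : (double) tp / (tp + fp);
                double recall    = (double) tp / (tp + fn);
                double f = (precision + recall == 0) ? 0 : 2 * precision * recall / (precision + recall);
                sum += f;
                n++;
            }
            return (n == 0) ? 0 : 1_000_000.0 * sum / n;
        }
    }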
Finally, your score will be the weighted average of the category F-scores calculated as above, multiplied by 1,000,000. The weights for each category are as follows:
Note that although the F-score you achieve in the 'false_detection' category is ignored in the final score calculation, false_detection -> C mismatches still increase the FP counter of category C. That is, if the true label of a bounding box is 'false_detection' but you classify it as 'park', then this error will lower the F-score you achieve in the 'park' category, and thus your overall score as well. Similarly, a C -> false_detection mismatch increases the FN counter of category C. For the exact scoring algorithm, see the visualizer source code.

Example submissions can be used to verify that your chosen approach to uploading submissions works, and also that your implementation of the scoring logic is correct. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed score values will be available in the test results. The example evaluation is based on the following small subset of the training data:

    bounding_box_id   image_id
    ---------------   --------
    144               airport_0
    30912             airport_100
    30175             park_320
    1                 prison_0
    23                single-unit_residential_0

Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking. Note that during the first week of the match online scoring will not be enabled; you may make submissions, but a score of 0 will be reported. Meanwhile, you can work locally with the provided images, tools and resources. Starting 21st September, submissions will be scored normally.

Final Scoring

The top 10 competitors according to the provisional scores will be invited to the final testing round. The details of the final testing are described in a separate document. Your solution will be subjected to three tests:

First, your solution will be validated (i.e. we will check whether it produces the same output file as your last submission, using the same input files used in this contest). Note that this means your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. For example, if you do online training, then differences in the training environments may result in a different number of iterations, meaning different models. You may also have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation of why the same result cannot be reproduced.)

Second, your solution will be tested against a new set of images.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone. Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources
General Notes
Award Details and Requirements to Win a Prize

Progress prizes

To encourage early participation, bonus prizes will be awarded to contestants who reach a certain threshold at 3 checkpoints during the competition. The threshold for the first such prize is 600,000. Thresholds for the 2nd and 3rd such prizes will be announced later in the contest forums. At each checkpoint the prize fund ($3,000 per month) will be dispersed evenly among all competitors whose provisional score is above the threshold. To determine these prizes, a snapshot of the leaderboard will be taken on the following days: October 10, November 5, December 2.

Best POI Category

Performance in a single category or a subset of no more than 10 categories (to be identified at a later date). The highest scoring eligible submission, calculated using the unweighted average F1 if there is more than one category, will be used to award the $5,000 prize. The category or categories will be identified so as to encourage solutions capable of labeling difficult categories on which many other contestants underperform.

Best undergraduate

The highest scoring (after provisional scoring) undergraduate university student who did not win one of the main prizes is awarded $5,000.

Open Source Incentives

The top 3 contestants (after final scoring) will be given the option of winning an additional $5,000 by open sourcing their solution and publishing it on GitHub.

Final prizes

In order to receive a final prize, you must do all of the following:

Achieve a score in the top 5 according to the final test results. See the "Final Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind it and the steps of their approach. You will receive a template that helps in creating your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Additional Eligibility

Johns Hopkins University, Booz Allen, and Digital Globe affiliates will be allowed to participate in this challenge, but will need to forego the monetary prizes. Winners will still be publicly recognized by IARPA in the final winner announcements based on their performance. Throughout the challenge, Topcoder's online leaderboard will display your rankings and accomplishments, giving you various opportunities to have your work viewed and appreciated by stakeholders from industry, government and academic communities.
Definition

Examples
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.