Topcoder Challenge | Topcoder Community

Challenge Overview

Prize Distribution

The winners will be the eligible, registered participants that submit the best overall solutions in accordance with the evaluation criteria by the Challenge deadline. The total prize purse for the CircleFinder Challenge is $50,000.00. There will be up to five placed awards based on the accuracy of the model and the model’s ability to:

1st - $20,000

2nd - $10,000

3rd - $8,000

4th - $7,000

5th - $5,000

Note: Prizes may be paid directly by the client, which differs slightly from the usual Topcoder payment process.

Introduction

The National Geospatial-Intelligence Agency (NGA) is the nation's primary source of geospatial intelligence (GEOINT). The NGA provides GEOINT in support of U.S. national security and defense, as well as disaster relief. GEOINT is the exploitation and analysis of imagery and geospatial information that describes, assesses and visually depicts physical features and geographically referenced activities on the Earth.

Computer vision algorithms currently require large volumes of data to define a single discrete object. While gaining success, it is still difficult for machines to search geographic areas and accurately segment specific shapes within those areas. The NGA seeks to know where all the circles in the world are and how big they are.

NGA is seeking novel approaches to segmentation of satellite imagery to detect, delineate, and describe circular shaped features. These features come in a variety of sizes (from 3m to 300m) and compositions (from vegetation to steel). Examples include agriculture, circular irrigation areas, fuel storage tanks, buildings, traffic circles and fountains. What makes this task a bit more challenging is that circular features might not be perfectly circular: portions of an otherwise circular feature might have jagged edges or be disrupted by cross-cutting objects at greater height.

Objective

Your task is to identify all circles in the provided images as accurately as possible using an AI model.

Input Files

The only input data that your AI model will receive during the test phase will be images in TIFF format. These image files typically have a square-like shape. All these images have their own unique ImageIDs. Each image may have 1 to 1000 individual circles. It just depends on proximity and size of the feature.

Each annotation will have the projection information and ONLY the coordinates that make up the circle. Note that these annotations are all based on polygons. There will not be any other attributes. The projection, also known as CRS, is the same for the associated images.

In summary, there are a few files associated with the same image:

[ImageID]_PAN.tif: This is the panchromatic-only band of the image for that chip. This has the highest resolution (0.3m). It is a geotiff, so it is projected.
[ImageID]_Mul.tif: This is the multispectral bands of the image: Red, Green, Blue, Yellow, NIR, NIR2, RedEdge, Coastal. These bands are usually captured at lower resolution, but capture more spectral information. It is a geotiff, so it is projected.
[ImageID]_metadata.json: This captures the image properties for both Pan and Mul, such as pixel height/width, bits per band, etc. It helps track the slight differences between images, since not all images will have exactly the same properties. This is a JSON file with the following structure:
- Pan : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- Coastal : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- Red : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- Blue : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- Green : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- Yellow : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- NIR : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- NIR2 : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
- RedEdge : {PixelHeight, PixelWidth, OffNadirAngle, BitsPerPixel}
[ImageID]_anno.geojson: This is the actual delineation of circular shaped features for that chip. This has the projection information and shape properties. The GEOJSON file looks like the following:
- Type: FeatureCollection
- CRS: { "type": "name", "properties": { "name": } }
- Features : [ { "type": "Feature", "properties": { }, "geometry": { "type": "Polygon", "coordinates": [ [ [] ] ] } } ]

Training Data

The training data set has 3903 images which can be downloaded from the forum. It contains all the files associated with the chips. You can use this data set to train and test your algorithm locally.

Testing Data (Provisional & Final)

The testing data set, containing all the files associated with the chips, except for the [ImageID]_anno.geojson files, has 3903 images and can be downloaded from the forum. This set has been randomly partitioned into a provisional set with 1951 images and a system set with 1952 images. The partitioning will not be made known to the contestants during the contest. The provisional set is used only for the leaderboard during the contest. During the competition, you will submit your algorithm's results when using the entire testing data set as input. Some of the images in this data set, the provisional images, determine your provisional score, which determines your ranking on the leaderboard during the contest. This score is not used for final scoring and prize distribution. Your final submission's score on the system data set will be used for final scoring. See the "Final Scoring" section for details.

Output File

This contest uses the result submission style. For the duration of the contest, you will run your solution locally using the provided testing data set images as input and produce geojson files which contain your results.

Your output files must follow the same geojson format as the provided [ImageID]_anno.geojson files and use the same naming convention, one file per image.
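As an illustration of the output format, here is a minimal Python sketch that writes predicted circles as polygon features. It is not part of the official tooling. The helper names `circle_ring` and `feature_collection` are hypothetical, the lowercase keys follow the usual GeoJSON convention, and the CRS name is deliberately left as a caller-supplied value: copy it from the provided annotation files rather than from this example.

```python
import json
import math


def circle_ring(cx, cy, radius, n_vertices=64):
    """Approximate a circle by a closed polygon ring of [x, y] pairs.

    Coordinates are assumed to be in the same projected CRS as the image.
    """
    ring = []
    for k in range(n_vertices):
        angle = 2.0 * math.pi * k / n_vertices
        ring.append([cx + radius * math.cos(angle), cy + radius * math.sin(angle)])
    ring.append(list(ring[0]))  # a polygon ring repeats its first vertex at the end
    return ring


def feature_collection(rings, crs_name):
    """Wrap polygon rings in a FeatureCollection with empty properties."""
    return {
        "type": "FeatureCollection",
        "crs": {"type": "name", "properties": {"name": crs_name}},
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
            for ring in rings
        ],
    }


def write_annotation(path, rings, crs_name):
    """Serialize the collection to `path`, e.g. <ImageID>_anno.geojson."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(feature_collection(rings, crs_name), handle)
```

An image with no detected circles would simply get a collection with an empty feature list.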

Submission Format

This match uses a combination of the "submit data" and "submit code" submission styles. Your submission must be a single ZIP file with the following content:

/solution
- [ImageID]_anno.geojson files
- report.pdf
/code
- dockerfile

where

/solution/[ImageID]_anno.geojson files are the outputs your algorithm generates on the entire test set (including both provisional and final test sets). The format of this file is described above in the Output file section.
/solution/report.pdf is a white paper that describes the approach used to create your solution code. Specific elements and requirements of the white paper are:
- Processing requirements
- Language of code
- Map projection necessary to run code
- Source(s) of unclassified remote sensing imagery
/code contains a dockerized version of your system that will be used to reproduce your results in a well defined, standardized way. This folder must contain a dockerfile that will be used to build a docker container that will host your system during final testing. How you organize the rest of the contents of the /code folder is up to you, as long as it satisfies the requirements listed below in the Final testing section.

Notes:

During provisional testing only your /solution/[ImageID]_anno.geojson files will be used for scoring; however, the tester tool will verify that your submission file conforms to the required format. This means that at least the /code/dockerfile must be present from day 1, even if it doesn't describe any meaningful system to be built. However, we recommend that you keep working on the dockerized version of your code as the challenge progresses, especially if you are at or close to a prize winning rank on the provisional leaderboard.
You must not submit more often than once every 4 hours. The submission platform does not enforce this limitation; it is your responsibility to comply with it. Not observing this rule may lead to disqualification.
During final testing your last submission file will be used to build your docker container.
Make sure that the contents of the /solution and /code folders are in sync, i.e. your /solution/[ImageID]_anno.geojson files contain the exact output of the current version of your code.
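The packaging step can be sketched in Python as follows. This is an illustrative helper, not an official tool: the function name `build_submission` and the local folder arguments are assumptions, and it only checks for the dockerfile, which the notes above require from the first submission.

```python
import zipfile
from pathlib import Path


def build_submission(solution_dir, code_dir, zip_path):
    """Pack local folders into a ZIP with top-level solution/ and code/ entries."""
    solution_dir, code_dir = Path(solution_dir), Path(code_dir)
    if not (code_dir / "dockerfile").is_file():
        raise FileNotFoundError("code/dockerfile must be present in every submission")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for top_level, root in (("solution", solution_dir), ("code", code_dir)):
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Store paths relative to the ZIP root, using forward slashes.
                    archive.write(path, f"{top_level}/{path.relative_to(root).as_posix()}")
    return Path(zip_path)
```

Regenerating the geojson files and rebuilding the archive in one step is a simple way to keep the two folders in sync.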

Evaluation Criteria and Scoring

Final submissions will be first evaluated on a pass/fail basis on the whitepaper submission to ensure the submission is well documented in discussing the approaches used in creating their solution code and sufficiently covering all elements listed in the Submission Format section.

After that, your /solution/[ImageID]_anno.geojson files (as contained in your submission file during provisional testing, or generated by your docker container during final verification) will be matched against the expected ground truth data using the following algorithm.

If your solution is invalid (e.g. if the tester tool can't successfully parse its content, or if it contains an unknown filename), you will receive a score of 0.

Provisional submissions should include predictions for all images in the testing data set. Provisional submissions will be scored against the provisional image set during the contest. Your final provisional system will be scored against the system image set at the end of the contest.

The found circles (i.e., polygons of a circular shape) will be scored against the ground truth in the following way.

First of all, we will check if your polygons are really circular. We use the compactness measurement:

UnitCircle = pi * (PolygonPerimeter / (2 * pi))^2

= PolygonPerimeter^2 / (4 * pi)

compactness = PolygonArea / UnitCircle

The compactness of a polygon must be at least 0.85 to be considered as a circle.
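A minimal Python sketch of this check is shown below, assuming each polygon is given as a ring of (x, y) pairs in projected units. The function names are illustrative, and the official tester tool may compute area and perimeter differently.

```python
import math


def ring_area_and_perimeter(ring):
    """Return (area, perimeter) of a simple polygon given as (x, y) pairs.

    A closing vertex equal to the first one, as in geojson rings, is dropped.
    """
    points = [(float(p[0]), float(p[1])) for p in ring]
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    twice_area = 0.0
    perimeter = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1          # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(ring):
    """PolygonArea divided by the area of a circle with the same perimeter."""
    area, perimeter = ring_area_and_perimeter(ring)
    if perimeter == 0.0:
        return 0.0
    return 4.0 * math.pi * area / perimeter ** 2


def is_circle(ring, threshold=0.85):
    """Apply the 0.85 compactness threshold from the statement."""
    return compactness(ring) >= threshold
```

For intuition: a square has compactness pi / 4 (about 0.785) and fails the check, while a regular hexagon (about 0.907) already passes.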

The overlap factor between 2 polygons, A and B, is the area of the intersection of A and B divided by the area of the union of A and B. We use O(A, B) to denote this measure. Specifically, we have

O(A, B) = area(A ∩ B) / area(A ∪ B)

It is obvious that this factor is always between 0 and 1.
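The overlap factor can be sketched in pure Python under one simplifying assumption: both polygons are convex, which holds for ideal circle approximations but not necessarily for jagged annotations. The sketch uses Sutherland-Hodgman clipping for the intersection. The official tester tool may rely on a general-purpose geometry library instead, and all function names here are illustrative.

```python
def _open_ring(ring):
    """Convert to (x, y) tuples and drop a repeated closing vertex."""
    points = [(float(p[0]), float(p[1])) for p in ring]
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    return points


def _signed_area(points):
    """Shoelace area: positive for counter-clockwise vertex order."""
    n = len(points)
    return sum(points[i][0] * points[(i + 1) % n][1]
               - points[(i + 1) % n][0] * points[i][1] for i in range(n)) / 2.0


def _counter_clockwise(points):
    return points if _signed_area(points) >= 0 else points[::-1]


def _clip(subject, clip):
    """Intersect `subject` with convex, counter-clockwise `clip`."""
    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p, q, a, b):
        # Intersection of segment p-q with the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        previous = current[-1]
        for point in current:
            if inside(point, a, b):
                if not inside(previous, a, b):
                    output.append(crossing(previous, point, a, b))
                output.append(point)
            elif inside(previous, a, b):
                output.append(crossing(previous, point, a, b))
            previous = point
    return output


def overlap(ring_a, ring_b):
    """O(A, B) = area(A ∩ B) / area(A ∪ B) for two convex polygons."""
    a = _counter_clockwise(_open_ring(ring_a))
    b = _counter_clockwise(_open_ring(ring_b))
    intersection = abs(_signed_area(_clip(a, b)))
    union = abs(_signed_area(a)) + abs(_signed_area(b)) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two 2 x 2 squares shifted by one unit along x share an area of 2 out of a union of 6, giving an overlap factor of 1/3.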

For each image, we will iterate through your predicted circles for this image in the order they appear in the geojson file. For each circle, we will try to match it to a groundtruth circle that has the biggest O(A, B) with it. If there is a groundtruth circle with an overlapping ratio over 0.5, we will match it and remove this circle from the groundtruth set for later matching. Otherwise, this predicted circle will not be matched to any groundtruth circle. Since all the circles, no matter if they're big or small, are equally important, we use the F1 score for the evaluation. Suppose there are X circles matched during this process, and there are Y and Z circles in your prediction and the groundtruth, respectively. We then calculate the F1 score as follows:

Precision = X / Y

Recall = X / Z

F1 = Precision * Recall * 2 / (Precision + Recall)

There are a few special cases: (1) when there is no circle in the groundtruth for this image, your F1 score is 1 when your prediction is empty; otherwise, it’s 0; (2) when your prediction is empty and there is some circle in the groundtruth, the score is 0; and (3) if you have too many predictions, i.e., over 2000 circles for one image, your score is 0.

It is obvious that this F1 is between 0 and 1. The total score across all images, avgImageScore, is the average of each individual image’s F1 score.

Final normalized score:

score = 100 * avgImageScore
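The matching and scoring procedure described above can be sketched in Python as follows. This is an illustrative reading of the statement, not the official tester: the overlap function is passed in as a parameter, the function names are hypothetical, and the compactness filter is left out because the statement does not say how non-compact predictions are counted.

```python
def f1_for_image(predicted, truth, overlap, max_predictions=2000):
    """Greedy in-order matching followed by the per-image F1 score.

    `predicted` and `truth` are lists of polygons; `overlap(a, b)` returns
    the overlap factor O(a, b) in [0, 1].
    """
    if len(predicted) > max_predictions:      # special case (3)
        return 0.0
    if not truth:                             # special case (1)
        return 1.0 if not predicted else 0.0
    if not predicted:                         # special case (2)
        return 0.0

    remaining = list(truth)
    matched = 0
    for circle in predicted:                  # order of appearance in the file
        if not remaining:
            break
        scores = [overlap(circle, candidate) for candidate in remaining]
        best = max(range(len(remaining)), key=scores.__getitem__)
        if scores[best] > 0.5:                # "over 0.5" read as strictly greater
            matched += 1
            remaining.pop(best)               # each ground truth circle matches once

    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(truth)
    return 2.0 * precision * recall / (precision + recall)


def final_score(image_scores):
    """score = 100 * average of the per-image F1 scores."""
    if not image_scores:
        return 0.0
    return 100.0 * sum(image_scores) / len(image_scores)
```

Because matching is greedy and follows file order, the order in which predictions are written can affect which ground truth circle each one claims.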

We will provide the tester tool in the forum. If you identify any issues, please reply in that thread.

Final testing

This section describes the final testing workflow and the requirements for the /code folder of your submission. You may ignore this section until you decide to start preparing your system for final testing.

To be able to successfully submit your system for final testing, some familiarity with Docker is required. If you have not used this technology before then you may first check this page and other learning material linked from there. To install docker follow these instructions.

Contents of the /code folder

The /code folder of your submission must contain:

All your code (training and inference) that is needed to reproduce your results.
A Dockerfile (named dockerfile, without extension) that will be used to build your system.
All data files that are needed during training and inference, with the exception of
- the contest’s own training and testing data. You may assume that the contents of the /training and /testing folders (as described in the Input files section) will be available on the machine where your docker container runs, zip files already unpacked,
- large data files that can be downloaded automatically either during building or running your docker script.
Your trained model file(s). Alternatively your build process may download your model files from the network. Either way, you must make it possible to run inference without having to execute training first.

The tester tool will unpack your submission, and the

docker build -t <id> .

command will be used to build your docker image (the final ‘.’ is significant), where <id> is your TopCoder handle.

The build process must run out of the box, i.e. it should download and install all necessary 3rd party dependencies, and either download from the Internet or copy from the unpacked submission all necessary external data files, your model files, etc.

Your container will be started by the

docker run -v <local_data_path>:/data:ro -v <local_writable_area_path>:/wdata -it <id>

command (single line), where the -v parameter mounts the contest’s data to the container’s /data folder. This means that all the raw contest data will be available for your container within the /data folder. Note that your container will have read only access to the /data folder. You can store large temporary files in the /wdata folder.

Training and test scripts

Your container must contain a train and test (a.k.a. inference) script having the following specification:

train.sh <data-folder> should create any data files that your algorithm needs for running test.sh later. The supplied <data-folder> parameter points to a folder containing training image and annotation data in the same structure as is available to you during the coding phase. The allowed time limit for the train.sh script is 2 days. You may assume that the data folder path will be under /data.
As its first step train.sh must delete the homemade models shipped with your submission.
Some algorithms may not need any training at all. It is a valid option to leave train.sh empty, but the file must exist nevertheless.
It should be possible to train using only the provided data and publicly available external data. This means that this script should do all the preprocessing and training steps that are necessary to reproduce your complete training workflow.
A sample call to your training script (single line):
./train.sh /data/training/
In this case you can assume that the training data looks like this:
data/
  train/
    ImageID1/
      ImageID1_PAN.tif
      ImageID1_MUL.tif
      ImageID1_metadata.geojson
      ImageID1_anno.geojson
    ImageID2/
      ...

test.sh <data-folder> <output_path> should run your inference code using new, unlabeled data and should generate the output geojson files, as specified by the problem statement. The allowed time limit for the test.sh script is 12 hours. The testing data folder contains similar data in the same structure as is available to you during the coding phase. The final testing data will be similar in size and in content to the provisional testing data. You may assume that the data folder path will be under /data.
Inference should be possible to do without running training first, i.e. using only your prebuilt model files.
It should be possible to execute your inference script multiple times on the same input data or on different input data. You must make sure that these executions don't interfere with each other and that each execution leaves your system in a state in which further executions are possible.
A sample call to your testing script (single line):
./test.sh /data/test/ /output_folder
In this case you can assume that the testing data looks like this:
data/
  test/
    ImageID1/
      ImageID1_PAN.tif
      ImageID1_MUL.tif
      ImageID1_metadata.geojson
    ImageID2/
      ...
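An inference script has to discover the chips inside the data folder before processing them. The Python sketch below walks the one-folder-per-ImageID layout shown above. The function name `list_chips` is hypothetical, and file names are matched case-insensitively and by suffix as a precaution, since the exact casing and metadata extension should be confirmed against the downloaded data.

```python
from pathlib import Path


def list_chips(data_folder):
    """Map each ImageID folder to the files found inside it.

    Returns {image_id: {"pan": Path, "mul": Path, "metadata": Path, "anno": Path}},
    with keys omitted for files that are absent (e.g. no "anno" at test time).
    """
    chips = {}
    for chip_dir in sorted(Path(data_folder).iterdir()):
        if not chip_dir.is_dir():
            continue
        files = {}
        for path in sorted(chip_dir.iterdir()):
            name = path.name.lower()
            if name.endswith("_pan.tif"):
                files["pan"] = path
            elif name.endswith("_mul.tif"):
                files["mul"] = path
            elif "_metadata." in name:
                files["metadata"] = path
            elif name.endswith("_anno.geojson"):
                files["anno"] = path
        chips[chip_dir.name] = files
    return chips
```

Using the same discovery code for training and testing data also helps satisfy the requirement that both are processed the same way.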

Code requirements

Your training and inference scripts must output progress information. This may be as detailed as you wish but at the minimum it should contain the number of epochs processed so far.
Your testing code must process the test and validation data the same way, that is it must not contain any conditional logic based on whether it works on images that you have already downloaded or on unseen images.

Verification workflow

1. First, test.sh is run on the provisional test set to verify that the results of your latest online submission can be reproduced. This test run uses your home built models.
2. Then test.sh is run on the final validation dataset, again using your home built models. Your final score is the one that your system achieves in this step.
3. Next, train.sh is run on the full training dataset to verify that your training process is reproducible. After the training process finishes, further execution of the test script must use the models generated in this step.
4. Finally, test.sh is run on the final validation dataset (or on a subset of that), using the models generated in the previous step, to verify that the results achieved in step #2 above can be reproduced.

A note on reproducibility: we are aware that it is not always possible to reproduce the exact same results. E.g., if you do online training then the difference in the training environments may result in different numbers of iterations, meaning different models. Also you may have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation why the same result can not be reproduced.

Hardware specification

Your docker image will be built and run on a Linux AWS instance, having this configuration:

m4.2xlarge

Please see here for the details of this instance type.

General Notes

This match is rated.
Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.
Teaming is not allowed. You must develop your solution on your own. Any communication between members beyond what is allowed by the forum rules is strictly forbidden.
In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM. Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client.
If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission.

All software must be available for commercial use. Include your licenses in a folder labeled “Licenses”. Within the same folder, include a text file labeled “README” that explains the purpose of each licensed software package as it is used in your solution.

External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:

The external data and pre-trained models are unencumbered with legal restrictions that conflict with its use in the competition.
The data source or data used to train the pre-trained models is defined in the submission description.
Same as the software licenses, data must be unrestricted for commercial use.

Final prizes

In order to receive a final prize, you must do all the following:

Achieve a score in the top five according to final system test results. See the "Final testing" section above.
Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind it and the steps of its approach. You will receive a template that helps you create your final report.
If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Eligibility

This Challenge is authorized under Title 10 of United States Code § 2374a, which authorizes the Secretary of Defense to award prizes in recognition of outstanding achievements in basic, advanced, and applied research, technology development, and prototype development that have the potential for application to the performance of military missions of the Department of Defense.

Entries may only be submitted by a registered participant. To be eligible under this Challenge, an individual or entity:

Shall have complied with all the requirements under this section.
May not be a Federal entity.
May not be a Federal employee acting within the scope of their employment. Federal employees may not pursue an application while in the Federal workplace or while on duty.
Employees of NGA and NGA support contractors directly supporting the development or execution of this challenge, including their spouses and dependents for tax year 2020, are not eligible to participate.
Federal grantees may not use Federal funds to develop challenge applications unless consistent with the purpose of their grant award.
Federal contractors may not use Federal funds from a contract to develop challenge applications or to fund efforts in support of a challenge submission.
May not be a judge of the Challenge, or any other party involved in the design, production, execution, or distribution of the Challenge or the immediate family of such a party (i.e. spouse, parent, step-parent, child, or step-child).

By participating in this Challenge:

Participants agree to indemnify the Federal Government against third party claims for damages arising from or related to Challenge activities.
Participants agree to assume any and all risks and waive claims against the Federal Government and its related entities, except in the case of willful misconduct, for any injury, death, damage, or loss of property, revenue, or profits, whether direct, indirect, or consequential, arising from participation in this prize contest, whether the injury, death, damage, or loss arises through negligence or otherwise.

Additional participation rules

Participation in this Challenge is open to individuals and entities. Entries may only be submitted by a registered participant.

The rules apply to all participants in this NGA Challenge and may be changed without prior notice. Participants should monitor the Challenge website for the latest information.
Registration information collected by CCC/TopCoder will be used solely for the purpose of administering the event. Registration information will not be distributed to any parties outside of TopCoder, CCC, and NGA nor released for any other purpose except as noted in this document.
Individual participants’ display name may be listed on the Challenge website to enable the event to be tracked by interested members of the public. The name and photographs of the winner may be posted on the NGA website and released to the media.
NGA may contact registered participants to discuss the means and methods used in solving the Challenge.
NGA may compute and release to the public aggregate data and statistics from the submitted solutions. Names and select information about competition winners may be publicly displayed by NGA for announcement, promotional, and informational purposes.
Nothing in these rules, to include information on the Challenge website and communications by NGA officials, may be interpreted as authorizing the incurrence of any costs or modifying the statement of work or authorizing work outside the terms and conditions of any existing agreements or contracts with NGA.
A submission may be disqualified if, in NGA’s sole judgment, any of the following applies:
- Fails to function as described,
- The detailed description is significantly inaccurate or incomplete,
- Malware or other security threats are present.
NGA reserves the right to disqualify a participant whose actions are deemed to violate the spirit of the competition for any reason, including but not limited to: abusive, threatening, or violent behavior; attempts to reverse engineer or otherwise misappropriate the submission of another participant; or violation of laws or regulations in the course of participating in the challenge. NGA does not authorize or consent to a participant infringing on any US patent or copyright while participating in the Challenge.
NGA reserves the right, in its sole discretion, to (a) cancel, suspend or modify the Challenge without notice, and/or (b) modify the number and dollar amount of prizes, based on the number and quality of submissions, including not awarding any prize if no entries are deemed worthy.
The agency’s award decision is final.
Each individual (whether competing singly or in a group) or entity agrees to follow applicable local, State, and Federal laws and regulations.

Approved for Public Release #20-799

Circle Finder Marathon Challenge