Challenge Overview
Problem Statement
Prize Distribution

    Prize             USD
    1st               $11,000
    2nd               $9,000
    3rd               $7,000
    4th               $6,000
    5th               $5,000
    Progress prizes*  3 x $500 (biweekly targets at weeks 2, 4 and 6)
    Total prizes      $39,500

*See the "Award Details and Requirements to Win a Prize" section for details.

Background and motivation

High-resolution satellite imagery is changing our understanding of the world around us, as well as the way we as humans interact with our planet. However, the raw images do little more than pique our interest unless we can superimpose a layer that actually identifies real objects. Reliable extraction of building footprints from satellite imagery is one of the first and most challenging steps in producing accurate 3D models and maps. While automated algorithms continue to improve, significant manual effort is still necessary to ensure geospatial accuracy and acceptable quality. Improved automation is required to enable more rapid response to major world events such as humanitarian crises and natural disasters. 3D height data can help improve automated building footprint detection performance, and capabilities for providing this data on a global scale are now emerging.

In this challenge, we ask solvers to use satellite imagery and newly available 3D height data products to improve upon the state of the art for automated building detection. USSOCOM is seeking an algorithm that provides reliable, automatic extraction of building footprints based solely on orthorectified color satellite imagery and 3D height data. See more background information about the challenge here.

Objective

Your task is to extract regions that represent building footprints from satellite images. The regions, represented as pixel masks, will be compared to ground truth data, and the quality of your solution will be judged by a combination of precision and recall; see Scoring for details.

Input Files

Satellite images

Satellite imagery is available in equal-sized tiles. Each tile has a size of 2048x2048 pixels and a spatial resolution (ground sample distance) of ~0.5 meter. For each tile the following 6 data files are available.
_GTI, _GTL and _GTC files are provided only for training images. The <tile_id> is formatted as <site_id>_Tile_<N>, where <site_id> identifies the imaged site (JAX and TAM in the training data) and <N> is the tile number, e.g. JAX_Tile_004.
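As an illustration of working with the ground truth, the sketch below loads a _GTI instance file and counts the distinct building labels it contains. It assumes the tiles are TIFF rasters readable by Java's built-in ImageIO (the file format is not specified in this statement), and the file name and class name are hypothetical; none of this is part of the challenge specification.

    // Hypothetical sketch: count building instances in a _GTI tile.
    // Assumes the tile is a TIFF raster readable via ImageIO (Java 9+
    // ships a TIFF reader); the file name below is illustrative only.
    import javax.imageio.ImageIO;
    import java.awt.image.Raster;
    import java.io.File;
    import java.util.HashSet;
    import java.util.Set;

    public class GtiInspector {
        public static void main(String[] args) throws Exception {
            // ImageIO.read returns null if the format is unsupported;
            // a real pipeline should check for that.
            Raster raster = ImageIO.read(new File("JAX_Tile_004_GTI.tif")).getRaster();
            Set<Integer> labels = new HashSet<>();
            for (int y = 0; y < raster.getHeight(); y++) {
                for (int x = 0; x < raster.getWidth(); x++) {
                    labels.add(raster.getSample(x, y, 0)); // band 0: instance label
                }
            }
            labels.remove(0); // 0 marks background
            System.out.println("building instances: " + labels.size());
        }
    }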
Notes
Downloads
Output Files

Your output must be a text file that contains the pixel masks describing the building instances for each tile of the test set. Note that conceptually this corresponds to the contents of the _GTI files described above, but here we use a more compact representation. Each tile in the test set is represented by 3 consecutive lines in the file, where
The required format of the 3rd line is as follows: the line must contain {label,run},... tuples, separated by commas, where
As an example, consider this mask image (for better readability zeros are replaced by dots) that contains two building instances labeled with 1 and 2:

    ..........
    ......111.
    ..2...111.
    .222..111.
    22222.....
    .222......
    ..2.......

This image should be encoded like this:

    tile_id
    10,7
    0,16,1,3,0,3,2,1,0,3,1,3,0,2,2,3,0,2,1,3,0,1,2,5,0,6,2,3,0,8,2,1,0,7

Here the first pair of numbers in the 3rd line (0,16) means that the image starts with 16 background pixels (10 in the first line and 6 in the second line). (A code sketch of this encoding is given after the Functions section below.)

The images may be listed in any order in your file, but the file must include data for all images in the test set. The building instance labels may be arbitrary positive integers, but all pixels of a given building instance must contain the same label. Background (no building) must be represented by 0. The pixel mask of each building instance must form a 4-connected region.

Your output must be a single file with a .txt extension. Optionally the file may be zipped, in which case it must have a .zip extension. The maximum allowed file size is 100 MB.

Your output must contain only algorithmically generated footprint masks. It is strictly forbidden to include manually created predictions, or masks that, although initially machine generated, were modified in any way by a human.

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a TXT or ZIP file that contains your answer. In order for your solution to be evaluated by Topcoder's Marathon system, you must implement a class named UrbanMapper3D, which implements a single function: getAnswerURL(). Your function must return a String corresponding to the URL of your submission file.

You may upload your file to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file. To create a direct sharing link in Dropbox, right-click on the uploaded file and select Share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function. If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=" + id. Note that Google has a file size limit of 25 MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file was not done.) You can use any other way to share your result file, but make sure the link you provide opens the file stream directly and is available to anyone with the link (not only the file owner), to allow the automated tester to download and evaluate it.

An example of the code you have to submit, using Java:

    public class UrbanMapper3D {
        public String getAnswerURL() {
            // Replace the returned String with your submission file's URL.
            return "https://drive.google.com/uc?export=download&id=XYZ";
        }
    }

Keep in mind that your complete code that generates these results will be verified at the end of the contest if you achieve a score in the top 10, as described in the "Award Details and Requirements to Win a Prize" section below: participants will be required to provide fully automated executable software to allow for independent verification of the performance of your algorithm and the quality of the output data.
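The run-length encoding described under Output Files is mechanical to generate. Below is a minimal sketch, verified against the 10x7 example above; the class and method names (MaskEncoder, encodeTile) are illustrative and are not part of the required submission interface.

    // Minimal sketch of the run-length encoding described under
    // Output Files. The instance mask is a height x width int array
    // in which 0 is background and positive integers label buildings.
    import java.util.StringJoiner;

    public class MaskEncoder {

        // Produces the three lines for one tile: the tile id, the
        // "width,height" line, and the comma-separated {label,run}
        // tuples in row-major order.
        static String encodeTile(String tileId, int[][] mask) {
            int height = mask.length, width = mask[0].length;
            StringJoiner runs = new StringJoiner(",");
            int current = mask[0][0], run = 0;
            for (int r = 0; r < height; r++) {
                for (int c = 0; c < width; c++) {
                    if (mask[r][c] == current) {
                        run++;
                    } else {
                        runs.add(current + "," + run);
                        current = mask[r][c];
                        run = 1;
                    }
                }
            }
            runs.add(current + "," + run); // flush the final run
            return tileId + "\n" + width + "," + height + "\n" + runs;
        }

        public static void main(String[] args) {
            // The 10x7 example mask from the problem statement.
            String[] rows = {
                "..........", "......111.", "..2...111.", ".222..111.",
                "22222.....", ".222......", "..2......."
            };
            int[][] mask = new int[rows.length][];
            for (int r = 0; r < rows.length; r++) {
                mask[r] = rows[r].chars()
                                 .map(ch -> ch == '.' ? 0 : ch - '0')
                                 .toArray();
            }
            System.out.println(encodeTile("tile_id", mask));
        }
    }

Running this prints the three example lines shown earlier (tile_id, then 10,7, then the run-length tuples).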
Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file. Any malformed or inaccessible file, or one that is larger than 100 MB, or one that does not contain predicted building footprints for each tile in the test set, will receive a zero score.

First, an F-score is calculated for each tile that belongs to the test set. The following procedure is repeated for each tile:
The precision and recall of your algorithm are defined as

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)

The tile-level F-score of your algorithm is defined as 0 if either Precision or Recall is 0; otherwise:

    F_score = 2 * Precision * Recall / (Precision + Recall)

Finally, your score is the average of the tile-level F-scores calculated as above, multiplied by 1,000,000. For the exact scoring algorithm see the visualizer source code; a small sketch of these formulas is also given at the end of this section.

Example submissions can be used to verify that your chosen approach to uploading submissions works, and also that your implementation of the scoring logic is correct. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. that the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed score values will be available in the test results. The example evaluation is based on the following small subset of the training data:

    JAX_Tile_004
    JAX_Tile_005
    TAM_Tile_000
    TAM_Tile_001

Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking.

Final Scoring

The top 10 competitors according to the provisional scores will be invited to a two-phase final testing round. Participation is optional but necessary for receiving prizes.

Phase 1: Code review. Within 2 days of the end of the submission phase you must package the source code you used to generate your latest submission and send it to walrus71@copilots.topcoder.com and tim@copilots.topcoder.com so that we can verify that your submission was generated algorithmically. We won't try to run your code at this point, so you don't have to package data files, model files or external libraries; this is just a superficial check to see whether your system looks convincingly automated. If you pass this screening you will be invited to Phase 2.

Phase 2: Online testing. Within 10 days after your code passes the code review you are required to submit a dockerized version of your code that we can use to test your system. The technical details of this process are described in a separate document. Your solution will be subjected to three tests:

First, your solution will be validated, i.e. we will check whether it produces the same output file as your last submission, using the same input files used in this contest. Note that this means your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. For example, if you do online training, then differences in the training environments may result in a different number of iterations, meaning different models. Also, you may have no control over random number generation in certain 3rd-party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation of why the same result cannot be reproduced.)

Second, your solution will be tested against a new set of images. It is important to note that the final testing data will feature a city entirely different from the two used in training and provisional testing. The name and location of this 3rd city is not disclosed; however, the scene content will be similar to the published data set.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone.
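For self-evaluation on the training tiles, the scoring formulas above can be replicated in a few lines, as in the sketch below. It shows only the aggregation; how TP, FP and FN are counted for a tile (the building matching rules) is defined by the visualizer source code and is not reproduced here. The class name and the sample counts are illustrative.

    // Minimal sketch of the scoring formulas above. Per-tile TP/FP/FN
    // counting (building matching) is defined by the visualizer and
    // is assumed to have been done already.
    public class Scorer {

        // Tile-level F-score: 0 if either precision or recall is 0.
        static double tileFScore(int tp, int fp, int fn) {
            if (tp == 0) return 0.0; // precision and recall are both 0 (or undefined)
            double precision = tp / (double) (tp + fp);
            double recall = tp / (double) (tp + fn);
            return 2 * precision * recall / (precision + recall);
        }

        // Final score: average of tile-level F-scores, times 1,000,000.
        static double finalScore(int[][] tileCounts) { // rows of {tp, fp, fn}
            double sum = 0;
            for (int[] t : tileCounts) {
                sum += tileFScore(t[0], t[1], t[2]);
            }
            return sum / tileCounts.length * 1_000_000;
        }

        public static void main(String[] args) {
            // Hypothetical counts for two tiles.
            int[][] counts = { {8, 2, 1}, {5, 0, 5} };
            System.out.printf("score = %.0f%n", finalScore(counts));
        }
    }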
Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources
General Notes
Award Details and Requirements to Win a Prize

Progress prizes

To encourage early participation, bonus prizes will be awarded to contestants who reach a certain score threshold at 2, 4 and 6 weeks after the launch of the competition. The threshold for the first such prize is 600,000. Thresholds for the 2nd and 3rd prizes will be announced later in the contest forums. Each biweekly prize fund ($500 per 2-week period) will be dispersed evenly among all competitors whose provisional score is above the corresponding threshold. To determine these prizes, a snapshot of the leaderboard will be taken exactly 14, 28 and 42 days after the launch of the contest.

Final prizes

In order to receive a final prize, you must do all of the following:

Achieve a score in the top 5 according to the final test results (see the "Final Scoring" section above).

Once the final scores are posted and winners are announced, prize-winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind their approach and the steps it takes. You will receive a template that helps in creating your final report.

If you place in a prize-winning rank but fail to do any of the above, you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Eligibility

The following restrictions apply to the Challenge: (1) Federal employees acting within the scope of their employment are not eligible to participate; (2) Federal employees acting outside the scope of their employment should consult their ethics advisor before participating in the Challenge; (3) Contractors receiving Government funding for directly related work may participate in this challenge but must forgo monetary prizes; they will still be publicly recognized based on their performance and must relinquish code.

Throughout the challenge, Topcoder's online leaderboard will display your rankings and accomplishments, giving you various opportunities to have your work viewed and appreciated by stakeholders from industry, government, and academic communities.
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.