Challenge Overview
Problem Statement
Prize Distribution
Background
An aquifer is an underground layer of water-bearing, permeable rock from which groundwater can be extracted using a water well. The composition of underground rock layers affects an aquifer's quality. Changes in underground rock layer composition can be measured using instruments that record natural radiation (gamma ray, or "GR") and electrical properties (resistivity, or "RESD"). Fluctuations in "GR" and "RESD" values are plotted against depth and are referred to as "well logs", where a well log comprises multiple measured logs (e.g. gamma ray logs and resistivity logs). Well logs begin at the ground surface (depth = 0), with depth values increasing as the well penetrates deeper underground. Gamma ray and resistivity measurements may not be recorded over the entire length of the well.
Changes in gamma ray and deep resistivity are used to identify the boundaries of different underground layers, which allows aquifer properties to be quantified. Experts use changes in measured logs, along with their knowledge of the subsurface, to identify important aquifer boundaries. The process of identifying aquifer boundaries across hundreds or even thousands of wells is time-consuming and prone to human error. We are seeking a method to rapidly identify aquifer boundaries in unidentified wells, using several expert-identified aquifer boundaries as a reference.
Objective
Your task is to identify the depth of aquifer boundaries for eight different underground rock layers, or strata, in a well at a given location, based on gamma ray logs and resistivity logs measured at a range of depths within the well. For each well, the eight depths that your algorithm returns will be compared with the ground truth, and the quality of your solution will be judged according to how well it matches the ground truth. See the "Scoring" section below for details.
Input Data Files
The complete data set contains 1,218 wells.
This data has been partitioned into three data sets: training, provisional, and system. The training data set, which can be used to train and improve your algorithm, contains 609 wells, and the ground truth strata depths are included for these wells. The provisional data set, which is used for scoring your submissions during the contest, contains 183 wells. The system data set, which is used for scoring your final submission at the end of the contest, contains 426 wells. The complete data set was partitioned into these subsets randomly. The testing data set referred to below is the union of the provisional and system data sets (609 wells). Contestants will not be told how the testing data set is partitioned into provisional and system data sets.
Your algorithm will receive a well summary CSV file containing the location of all wells, along with the eight strata depths for the wells contained in the ground truth. This CSV file contains a header and one row for each well in both the training and testing data sets. This CSV file will contain the following 12 columns:
For wells in the testing data set, all of the strata depths will be left empty. For wells in the training data set, some wells will have one or more missing strata depths. If a stratum depth is missing from the training ground truth, the stratum does not physically exist at that location. The summary CSV file can be downloaded here.
Your algorithm will also receive one depth data CSV file for each well in both the training and testing data sets. Each CSV file contains a header and one row for each measurement of gamma radiation and resistivity. Each row will contain the following columns:
The zipped CSV files with the instrument data for each well can be downloaded here.
Output File
This contest uses the result submission style. For the duration of the contest, you will run your solution locally and produce a CSV file which contains your results. Your output file must be a CSV file, without a header, which contains one row for each well in the testing data set. The CSV file should have only the following nine columns, in this order:
You must include predictions for all wells in the testing data set (609 wells). Do not include any wells from the training data set in this file.
Functions
During the contest, only your output CSV file, containing the provisional results, will be submitted. In order for your solution to be evaluated by Topcoder's marathon match system, you must implement a class named AutoTops, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file.
You may upload your output file to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file. To create a direct sharing link in Dropbox, right click on the uploaded file and select share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function.
If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=XYZ" where XYZ is your file id. Note that Google Drive has a file size limit of 25MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.)
You can share your result file in any other way, but make sure the link you provide opens the file stream directly and is available to anyone with the link (not only the file owner), so that the automated tester can download and evaluate it.
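The link formats described above can be sketched as a small helper. This is only an illustration: the class name DirectLinks and both method names are hypothetical, and the Dropbox method assumes the sharing link ends in exactly "?dl=0", as stated above.

```java
/** Hypothetical helper that builds direct-download links in the formats described above. */
final class DirectLinks {
    private DirectLinks() {}

    /** Rewrites a Dropbox sharing link ending in "?dl=0" so that it ends in "?dl=1". */
    public static String dropboxDirectLink(String sharingUrl) {
        String sharingTag = "?dl=0";
        if (sharingUrl == null || !sharingUrl.endsWith(sharingTag)) {
            // Assumption: any other link shape is rejected rather than guessed at.
            throw new IllegalArgumentException("Expected a link ending in " + sharingTag);
        }
        String base = sharingUrl.substring(0, sharingUrl.length() - sharingTag.length());
        return base + "?dl=1";
    }

    /** Builds a Google Drive direct link from a file id, using the format given above. */
    public static String googleDriveDirectLink(String fileId) {
        if (fileId == null || fileId.isEmpty()) {
            throw new IllegalArgumentException("File id must not be empty");
        }
        return "https://drive.google.com/uc?export=download&id=" + fileId;
    }
}
```

Either method's return value could then be used as the String returned by getAnswerURL(). Before submitting, it is worth opening the resulting link in a private browser window to confirm that it downloads the file without a login.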
An example of the code you have to submit, using Java:
    public class AutoTops {
        public String getAnswerURL() {
            // Replace the returned String with your submission file's URL
            return "https://drive.google.com/uc?export=download&id=XYZ";
        }
    }
Keep in mind that the complete code that generates these results will be verified at the end of the contest if you achieve a score in the top 10, as described later in the "Requirements to Win a Prize" section. That is, participants will be required to provide fully automated executable software to allow for independent verification of the performance of the algorithm and the quality of the output data.
Scoring
Example submissions will be scored against the entire training data set. Provisional submissions should include predictions for all wells in the testing data set. Provisional submissions will be scored against the provisional data set during the contest. Your final provisional submission will be scored against the system data set at the end of the contest.
When a stratum exists in the ground truth and a prediction is made, the penalty for each stratum depth prediction is:
    P_stratum = abs(prediction - ground_truth)
When a stratum exists in the ground truth but no prediction is made, this constitutes a false negative, and the penalty is 250:
    P_stratum = 250
When a stratum does not exist in the ground truth but a prediction is made, this constitutes a false positive, and the penalty is 500:
    P_stratum = 500
The penalty for each well is the sum of penalties for each stratum which contains a ground truth entry:
    P_well = sum(P_stratum)
The total penalty for all wells in the relevant data set is:
    P_total = sum(P_well)
The maximum possible total penalty is computed as follows:
    P_stratum_max = max(max_depth - ground_truth, ground_truth - min_depth)
where min_depth and max_depth are the minimum and maximum depth of the depth data CSV file for the well. Note that in rare cases, the ground truth might lie outside this range.
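The per-stratum rules above can be sketched in Java as follows. This is not the official scorer: the class and method names are hypothetical, a missing depth is represented as null, and returning a penalty of 0 when a stratum is absent from both the ground truth and the prediction is an assumption, since the statement does not cover that case.

```java
/** Hypothetical sketch of the per-stratum penalty rules; not the official scorer. */
final class StratumPenalty {
    static final double FALSE_NEGATIVE_PENALTY = 250.0;
    static final double FALSE_POSITIVE_PENALTY = 500.0;

    private StratumPenalty() {}

    /** Penalty for one stratum; a null argument means "no depth given". */
    public static double penalty(Double prediction, Double groundTruth) {
        if (groundTruth != null && prediction != null) {
            // Stratum exists and was predicted: absolute depth error.
            return Math.abs(prediction - groundTruth);
        }
        if (groundTruth != null) {
            // Stratum exists but no prediction was made: false negative.
            return FALSE_NEGATIVE_PENALTY;
        }
        if (prediction != null) {
            // Stratum does not exist but a prediction was made: false positive.
            return FALSE_POSITIVE_PENALTY;
        }
        // Assumption: absent in both, so nothing is penalized.
        return 0.0;
    }

    /** Maximum penalty for one stratum, given the depth range of the well's depth data file. */
    public static double maxPenalty(double groundTruth, double minDepth, double maxDepth) {
        return Math.max(maxDepth - groundTruth, groundTruth - minDepth);
    }
}
```

For instance, a prediction of 110 against a ground truth of 100 gives a penalty of 10, which is far smaller than either fixed penalty. Under these rules, omitting a prediction for a stratum that exists costs more than any guess that lands within 250 depth units of the true depth.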
The maximum penalty for each well is the sum of maximum penalties for each stratum which contains a ground truth entry:
    P_well_max = sum(P_stratum_max)
The maximum total penalty for all wells in the relevant data set is:
    P_total_max = sum(P_well_max)
Your final score is calculated as follows:
    Score = 1,000,000 * (1 - P_total / P_total_max)^10
Final Scoring
The top 10 competitors with non-zero provisional scores are asked to participate in a two-phase final verification process. Participation is optional but necessary for receiving prizes.
Phase 1. Code Review
Within 2 days from the end of the submission phase, you must package the source code you used to generate your latest submission and send it to jsculley@copilots.topcoder.com and tim@copilots.topcoder.com so that we can verify that your submission was generated algorithmically. We won't try to run your code at this point, so you don't have to package data files, model files or external libraries. This is just a superficial check to see whether your system looks convincingly automated. If you pass this screening, you'll be invited to Phase 2.
Phase 2. Online Testing
You will be given access to an AWS VM instance. You will need to load your code to your assigned VM, along with three scripts for running it:
Your solution will be validated. We will check if it produces the same output file as your last provisional submission, using the same testing input files used in this contest. We are aware that it is not always possible to reproduce the exact same results. For example, if you do online training, then the difference in the training environments may result in a different number of iterations, meaning different models. Also, you may have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation of why the same result cannot be reproduced.
Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.
General Notes
Requirements to Win a Prize
In order to receive a prize, you must do all of the following:
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.