Challenge Overview
Problem Statement
Prize Distribution

Prize               USD
1st                 $20,000
2nd                 $15,000
3rd                 $10,000
4th                 $2,000
5th                 $1,000
Progress prizes*
  week 2            $1,000
  week 4            $500
  week 6            $500
Total Prizes        $50,000

*see the 'Requirements to Win a Prize' section for details

Background and motivation

The MORGOTH'S CROWN Challenge (Modeling of Reflectance Given Only Transmission or High-concentration Spectra for Chemical Recognition Over Widely-varying eNvironments) is offered by IARPA (Intelligence Advanced Research Projects Activity), within the Office of the Director of National Intelligence (ODNI).

IR spectrometers measure the signature of an unknown compound on a given surface, and a detection algorithm identifies the compound by comparing to a detection library. Even with a perfect spectral measurement, the identification is only as good as the correspondence between the detection library and the "real world" signatures. Currently the quality of these detection libraries, and the computational models that support them, are a bigger limitation to accurate chemical identification than the capabilities of the spectrometer hardware.

This Challenge invites experts from across government, academia, industry and developer communities to create fast and accurate IR spectral models using new approaches that will advance technology and potentially foster enormous humanitarian impact. IARPA will provide solvers with spectra of training coupons and bulk target chemical data, and solvers are asked to generate an algorithm to predict the sample coupon spectra. See more background information about the challenge here.

Objective

Your task will be to predict the infrared spectra of different chemicals, taking into account the effects of target chemical loading (mass, fill factor, film thickness), target chemical microstructure, and target chemical - substrate interaction.
The spectra your algorithm returns will be compared to ground truth data. The quality of your solution will be judged by how well it correlates with the expected results; see Scoring for details.

Input Files

Coupon spectra

In this task you will work with infrared spectra of sample coupons. A coupon consists of two components: a substrate and a target chemical. In this challenge the substrate can be one of the following materials: glass, polished aluminum, roughened aluminum, anodized aluminum and acrylic. The target chemical is one of the following: acetaminophen, caffeine, potassium nitrate (KNO3) and warfarin.

Data corresponding to a spectrum is made up of two components:
Both (a) and (b) are provided as training data for 2 of the 4 target chemicals (warfarin and potassium nitrate); only (b) is provided for the other 2 chemicals. Data corresponding to acetaminophen is used for provisional testing, and data corresponding to caffeine is used for final testing. The format of (a) is the following:
The format of (b) is the following:
In addition to (a) and (b), there are other files related to coupon spectra that may contain useful information:
Target chemical and substrate spectra

In addition to coupon spectra (i.e. combinations of target chemicals and substrates), you have access to spectra of the pure target chemicals and substrates.
Downloads

The following data files are available for download.
Output Files

Your output must be a text file that describes the spectra your algorithm predicts for all of the 18 coupons in a test set. The file should contain lines formatted like:

    <coupon_id>,r1,r2,...,r7152

where
A sample line:

    Ace_AA_205.6_Ab,0.28,0.27,0.25,...<truncated for brevity>...,0.98,0.99

Your output must be a single file with .txt extension. Optionally the file may be zipped, in which case it must have .zip extension.

Your output must only contain algorithmically generated spectra predictions. It is strictly forbidden to include manually created predictions, or spectra that, although initially machine generated, are modified in any way by a human.

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a comma-separated text (.txt) file or a ZIP file that contains your answer. In order for your solution to be evaluated by Topcoder's marathon system, you must implement a class named SpectrumPredictor, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file.

You may upload your files to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file.

To create a direct sharing link in Dropbox, right click on the uploaded file and select share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function.

If you use Google Drive to share the link, then please use the following format:

    "https://drive.google.com/uc?export=download&id=" + id

Note that Google has a file size limit of 25MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.)

You can use any other way to share your result file, but make sure the link you provide opens the filestream directly, and is available for anyone with the link (not only the file owner), to allow the automated tester to download and evaluate it.
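The output line format and the direct-link rules described above can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical, the Dropbox URL in main is made up, and printing values with Java's default double formatting is an assumption, since the statement does not specify a required precision.

```java
class SubmissionHelpers {
    // Number of spectral values per coupon, as given in the statement (r1..r7152).
    static final int EXPECTED_VALUES = 7152;

    /** Builds one output line: <coupon_id>,r1,r2,...,r7152 */
    static String formatLine(String couponId, double[] values) {
        if (values.length != EXPECTED_VALUES) {
            throw new IllegalArgumentException(
                "Expected " + EXPECTED_VALUES + " values, got " + values.length);
        }
        StringBuilder sb = new StringBuilder(couponId);
        for (double v : values) {
            // Assumption: default double formatting is acceptable to the tester.
            sb.append(',').append(v);
        }
        return sb.toString();
    }

    /** Turns a Dropbox share link ending in "?dl=0" into one ending in "?dl=1". */
    static String dropboxDirectLink(String shareUrl) {
        String suffix = "?dl=0";
        if (!shareUrl.endsWith(suffix)) {
            throw new IllegalArgumentException("Link does not end with ?dl=0: " + shareUrl);
        }
        return shareUrl.substring(0, shareUrl.length() - suffix.length()) + "?dl=1";
    }

    /** Builds a Google Drive download link in the format given in the statement. */
    static String googleDriveDirectLink(String fileId) {
        return "https://drive.google.com/uc?export=download&id=" + fileId;
    }

    public static void main(String[] args) {
        // Hypothetical share link, used only to show the suffix replacement.
        System.out.println(dropboxDirectLink("https://www.dropbox.com/s/abc123/out.txt?dl=0"));
        System.out.println(googleDriveDirectLink("XYZ"));
    }
}
```

Checking the number of values before writing a line is a cheap way to catch a malformed file locally rather than spending one of the limited full submissions on it.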
An example of the code you have to submit, using Java:

    public class SpectrumPredictor {
        public String getAnswerURL() {
            // Replace the returned String with your submission file's URL
            return "https://drive.google.com/uc?export=download&id=XYZ";
        }
    }

Keep in mind that your complete code that generates these results will be verified at the end of the contest if you achieve a score in the top 5, as described later in the "Requirements to Win a Prize" section. That is, participants will be required to provide fully automated executable software to allow for independent verification of the performance of your algorithm and the quality of the output data.

Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file.

Any malformed or inaccessible file, or one that doesn't contain the expected number of lines, will receive a zero score.

If your submission is valid, your solution will be scored using the following algorithm.

dist(s1, s2) is a function that measures the distance of two spectra. Its definition is based on the SID(TAN) metric described in this paper. Let s1 and s2 be two spectra, i.e. two arrays of real values, having equal length (n = 7152 in this contest). Then dist(s1, s2) is calculated as follows:

    EPS = 1e-9;
    sum1 = 0; sum2 = 0;
    len1 = 0; len2 = 0;
    for (i = 0; i < n; i++) {
        s1[i] = max(EPS, s1[i]);
        sum1 += s1[i];
        len1 += s1[i] * s1[i];
        s2[i] = max(EPS, s2[i]);
        sum2 += s2[i];
        len2 += s2[i] * s2[i];
    }
    len1 = sqrt(len1);
    len2 = sqrt(len2);
    sid = 0;
    for (i = 0; i < n; i++) {
        a = s1[i] / sum1;
        b = s2[i] / sum2;
        sid += (a - b) * (log(a) - log(b));
    }
    sum = 0;
    for (i = 0; i < n; i++) {
        sum += s1[i] * s2[i];
    }
    cosAngle = sum / (len1 * len2);
    sam = acos(cosAngle);
    dist = sid * tan(sam);

Here the tan() and acos() functions are the usual trigonometric functions, log() is natural logarithm, sqrt() is square root. Note that a distance of 0 means that the two spectra are identical.
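For local testing, the distance calculation above could be written in Java roughly as follows. This is a sketch rather than the tester's actual code: the class and method names are hypothetical, the inputs are copied instead of modified in place, and the clamp of cosAngle to at most 1 is an added assumption that guards against floating-point rounding pushing the value slightly above 1 (which would make acos return NaN).

```java
class SidTanDistance {
    static final double EPS = 1e-9;

    /** Distance between two spectra of equal length, following the scoring pseudocode. */
    static double dist(double[] s1, double[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("Spectra must have equal length");
        }
        int n = s1.length;
        double[] p = new double[n]; // clamped copy of s1
        double[] q = new double[n]; // clamped copy of s2
        double sum1 = 0, sum2 = 0, len1 = 0, len2 = 0, dot = 0;
        for (int i = 0; i < n; i++) {
            p[i] = Math.max(EPS, s1[i]);
            q[i] = Math.max(EPS, s2[i]);
            sum1 += p[i];
            sum2 += q[i];
            len1 += p[i] * p[i];
            len2 += q[i] * q[i];
            dot += p[i] * q[i];
        }
        // Spectral information divergence on the sum-normalized spectra.
        double sid = 0;
        for (int i = 0; i < n; i++) {
            double a = p[i] / sum1;
            double b = q[i] / sum2;
            sid += (a - b) * (Math.log(a) - Math.log(b));
        }
        // Spectral angle between the two clamped spectra.
        double cosAngle = dot / (Math.sqrt(len1) * Math.sqrt(len2));
        // Assumption (not in the pseudocode): clamp to avoid acos(x) = NaN for x > 1.
        cosAngle = Math.min(1.0, cosAngle);
        double sam = Math.acos(cosAngle);
        return sid * Math.tan(sam);
    }

    public static void main(String[] args) {
        double[] x = {0.2, 0.5, 0.9};
        double[] y = {0.3, 0.4, 0.8};
        System.out.println(dist(x, x)); // identical spectra
        System.out.println(dist(x, y)); // different spectra
    }
}
```

Because all values are clamped to at least EPS, the logarithms are always defined and cosAngle is always positive, so only the upper clamp is needed.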
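For each coupon in the test set a score will be calculated as

    score(sPredicted) = max(0, 1 - (dist(sPredicted, sTruth) / (2 * dist(sSubstrate, sTruth))))

where sPredicted is the spectrum your algorithm predicts for the coupon, sTruth is the ground truth spectrum of the coupon, and sSubstrate is the spectrum of the coupon's pure substrate.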
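The per-coupon formula can be sketched in Java as below, together with the overall score, which this section defines as the average of the coupon scores multiplied by 1,000,000. The class and method names are hypothetical, and the two distances are passed in as already computed values so that the sketch stays independent of any particular dist() implementation.

```java
class CouponScoring {
    /**
     * Score of one coupon.
     * distPredicted = dist(sPredicted, sTruth), distSubstrate = dist(sSubstrate, sTruth).
     * Assumption: distSubstrate is greater than 0, i.e. the pure substrate
     * spectrum is not identical to the ground truth.
     */
    static double couponScore(double distPredicted, double distSubstrate) {
        return Math.max(0.0, 1.0 - distPredicted / (2.0 * distSubstrate));
    }

    /** Overall score: average of the coupon scores, multiplied by 1,000,000. */
    static double overallScore(double[] couponScores) {
        double sum = 0;
        for (double s : couponScores) {
            sum += s;
        }
        return 1_000_000.0 * sum / couponScores.length;
    }

    public static void main(String[] args) {
        // Returning the substrate spectrum itself makes both distances equal.
        System.out.println(couponScore(0.3, 0.3));
        // A perfect prediction has distance 0 to the ground truth.
        System.out.println(couponScore(0.0, 0.3));
    }
}
```

When the two distances are equal, the ratio in the formula is 1/2, which is why a prediction that simply returns the substrate spectrum scores 0.5 for that coupon; any prediction at least twice as far from the truth as the substrate scores 0.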
This means that coupon scores are normalized to the pure substrate spectra, so that a baseline solution that simply returns the substrate spectrum for a given substrate+chemical combination will get a score of 0.5.

Finally, your score will be the average of coupon scores calculated as above, multiplied by 1,000,000.

Note that you may make full submissions once every 8 hours.

Example submissions can be used to verify that your chosen approach to upload submissions works and also that your implementation of the scoring logic is correct. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. that the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed score values will be available in the test results. The example evaluation is based on a small subset of the training data; these 3 sample coupons are used:
Example submissions must contain 3 lines of text. Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking.

Final Scoring

The top 10 competitors according to the provisional scores will be invited to the final testing round. The details of the final testing are described in a separate document.

Your solution will be subjected to three tests:

First, your solution will be validated, i.e. we will check if it produces the same output file as your last submission, using the same input files used in this contest. Note that this means that your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. E.g. if you do online training then the difference in the training environments may result in a different number of iterations, meaning different models. Also you may have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation of why the same result cannot be reproduced.)

Second, your solution will be tested against a new set of coupons.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone.

Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources

Plenty of relevant papers, reference material and pointers to more information can be downloaded from the Morgoth's Crown microsite resources section.

General Notes
Requirements to Win a Prize

Progress prizes

To encourage early participation, bonus prizes will be awarded to contestants who reach a certain threshold after weeks 2, 4 and 6 of the competition. The threshold for the first such prize is 600,000. Thresholds for the 2nd and 3rd such prizes will be announced later in the contest forums. Any competitor whose provisional score is above the threshold will receive a share of the prize fund ($1,000 at week 2, $500 each at weeks 4 and 6), divided evenly among all competitors who reached the threshold. To determine these prizes a snapshot of the leaderboard will be taken exactly 2 weeks, 4 weeks and 6 weeks after the launch of the contest.

Final prizes

In order to receive a final prize, you must do all of the following:

Achieve a score in the top 5 according to final test results. See the "Final Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind it and the steps of its approach. You will receive a template that helps you create your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Additional Eligibility

SILMARILS Program Performers and Affiliates will be allowed to participate in this challenge. IARPA's SILMARILS performers, government partners, and their affiliates are welcome to participate in the challenge, but will need to forego the monetary prizes. Winners will still be publicly recognized by IARPA in the final winner announcements based on their performance.

Throughout the challenge, Topcoder's online leaderboard will display your rankings and accomplishments, giving you various opportunities to have your work viewed and appreciated by stakeholders from industry, government and academic communities.
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.