Challenge Overview
Topcoder is working with a group of researchers, organized by the University of Chicago, who are competing to understand a series of simulated environments. In the Disaster World program, we are looking for models that better predict the severity of a simulated hurricane as it relates to a data set of virtual actors and regions. This challenge asks you to analyze and predict how individual actors and regions will be affected, both in the short term and the long term.
Background
It's hurricane season in an area along the coastline. The area is divided into distinct regions, and its population is diverse.
Task Detail
The initial datasets are organized as follows:
Instance 18
RunDataTable and InstanceVariableTable:
1. Census info, casualty stats, pre- and post-hurricane surveys, and hurricane tracking data for the first six hurricanes of a season.
2. Tracking data for the seventh hurricane of the season.
3. TargetActor: A single actor, identified by its ActorPost Participant ID in InstanceVariableTable.
Instance 19
RunDataTable and InstanceVariableTable:
1. Census info, casualty stats, pre- and post-hurricane surveys, and hurricane tracking data for an entire hurricane season.
2. TargetActor: A single actor, identified by its ActorPost Participant ID in InstanceVariableTable.
Additional, secondary data has been provided in response to specific research requests (RR). This secondary data may help certain models perform more accurately. It is in the "Secondary Research Data" folder of the data pack. A requests document in the forum can be used, along with the research request ID (RR-XXXX), to find the individual data that was collected and described as part of each request.
Final Predictive Goals: Given the data for the two instances (18 and 19), you will need to build models that predict outcomes for a new, unseen hurricane. You may want to build separate models for different questions, but please note that these predictions are closely related to each other. That’s why they are included in the same challenge.
Short-term Predictions (Instance 18)
All questions pertain to the period of days spanned by the provided HurricaneInput table, with additional data provided. Please provide clear methods in your submission that we can use to generate these predictions, given new data.
- Global Prediction: Given a hurricane’s path (values taken from InstanceVariableTable), how many people will evacuate at least once during that hurricane?
- Local Prediction: Given a hurricane’s path (values taken from InstanceVariableTable), which region will have the highest percentage of evacuations?
- Individual Prediction: Given the following data for an ActorPost TargetActor, will the TargetActor evacuate during a hurricane? What confidence do you assign to your answer?
  - Hurricane tracking data
  - Gender
  - Age
  - Ethnicity
  - Religion
  - Children
  - Full-time job status
  - Pets
  - Wealth
  - Risk
  - Possibilities
  - Timestamp
  - Region of residence
- Global Prediction: How many people will evacuate at least once during the new hurricane? Provide a predicted number, a 50% confidence interval, and a 90% confidence interval.
- Local Prediction: Which region will have the highest percentage of evacuations?
- Individual Prediction: Will TargetActor (Actor 49, timestamp 18) evacuate during a new hurricane? What confidence do you assign to your answer?
- Counterfactual Prediction: How would your answers to questions 1 and 3 change if all of the area’s shelters became unusable at the end of the last hurricane in the IDP and remained unusable throughout the HurricaneInput period?
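One possible way to turn per-actor evacuation probabilities into the global count with 50% and 90% intervals is a small Monte Carlo simulation. The sketch below is only an illustration: the function names and the example probabilities are hypothetical, and it assumes actors decide independently, which probably understates the uncertainty because every actor faces the same hurricane.

```python
import math
import random


def central_interval(sorted_values, coverage):
    """Central interval holding roughly `coverage` of an ascending sample."""
    n = len(sorted_values)
    tail = (1.0 - coverage) / 2.0
    low_index = math.floor(tail * (n - 1))
    high_index = math.ceil((1.0 - tail) * (n - 1))
    return sorted_values[low_index], sorted_values[high_index]


def simulate_evacuation_counts(probabilities, n_draws=2000, seed=0):
    """Draw evacuation counts, treating each actor as an independent coin flip."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        counts.append(sum(1 for p in probabilities if rng.random() < p))
    return sorted(counts)


def summarize(probabilities, n_draws=2000, seed=0):
    """Point estimate (sample median) plus 50% and 90% central intervals."""
    counts = simulate_evacuation_counts(probabilities, n_draws, seed)
    return {
        "point": counts[len(counts) // 2],
        "ci50": central_interval(counts, 0.50),
        "ci90": central_interval(counts, 0.90),
    }


# Hypothetical per-actor probabilities, e.g. the output of a fitted classifier.
example_probabilities = [0.1, 0.8, 0.5, 0.3, 0.9, 0.2, 0.6, 0.4]
summary = summarize(example_probabilities)
print(summary)
```

The same per-actor probabilities can double as the confidence for the individual prediction, which is one way the global and individual questions relate to each other.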
Long-term Predictions (Instance 19)
The timestep of interest for the following questions is the end of the following hurricane season. Please provide clear methods in your submission that we can use to generate these predictions, given new data.
Given a new set of hurricane data:
- Global Prediction: How many people will die?
- Local Prediction: Which region will suffer the highest percentage of deaths?
- Individual Prediction: Will a given ActorPost ID survive the following hurricane season? What confidence do you assign to your answer?
- Global Prediction: How many people will die? Provide a predicted number, a 50% confidence interval, and a 90% confidence interval.
- Local Prediction: Which region will suffer the highest percentage of deaths?
- Individual Prediction: Will TargetActor (Actor 93) survive the following hurricane season? What confidence do you assign to your answer?
- Counterfactual Prediction: How would your answers to questions 1 and 3 change if, after the IDP season ends, but before the following season begins, the government taxes everyone (decreasing everyone’s wealth by 10%), thus enabling a 50% increase in its aid impact during the following season?
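A counterfactual such as the tax-and-aid scenario can be explored by editing the model inputs and re-scoring. The sketch below shows that pattern only. The field names `wealth` and `aid_impact`, the assumption that aid impact is available as a per-actor input, and the toy scoring function with its coefficients are all hypothetical placeholders, not part of the provided data or a fitted model.

```python
import math


def apply_tax_counterfactual(actor, tax_rate=0.10, aid_boost=0.50):
    """Return a modified copy of `actor`; the original dict is left untouched."""
    changed = dict(actor)
    changed["wealth"] = actor["wealth"] * (1.0 - tax_rate)          # 10% tax
    changed["aid_impact"] = actor["aid_impact"] * (1.0 + aid_boost)  # 50% more aid
    return changed


def toy_death_probability(actor):
    """Placeholder logistic score with made-up coefficients (NOT fitted)."""
    score = -2.0 - 0.5 * actor["wealth"] - 1.0 * actor["aid_impact"]
    return 1.0 / (1.0 + math.exp(-score))


def compare_expected_deaths(actors, model):
    """Expected deaths under the baseline and under the counterfactual."""
    baseline = sum(model(a) for a in actors)
    counterfactual = sum(model(apply_tax_counterfactual(a)) for a in actors)
    return baseline, counterfactual


# Hypothetical actors, for illustration only.
actors = [
    {"wealth": 2.0, "aid_impact": 1.0},
    {"wealth": 0.5, "aid_impact": 0.2},
]
baseline, counterfactual = compare_expected_deaths(actors, toy_death_probability)
print(baseline, counterfactual)
```

Swapping `toy_death_probability` for a trained model keeps the comparison identical, so the baseline and counterfactual answers come from the same code path.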
Goal of This Challenge:
You are asked to build models that answer the questions above. Your solution will be judged on its novelty as well as its performance on the given data.
A few of the given questions will be objectively scored according to answers that are known to be correct, and others will be scored subjectively.
For all predictions, please document clearly how each model is trained and how it is used to create the predictions. The reviewers should be able to easily extend your code to use a new or expanded data set. Do not leave anything to be assumed here, no matter how trivial. This will be part of the review at the end of the challenge, so the more information you provide, and the better your documentation is, the better your chances of winning will be.
The dataset can be downloaded here; the dataset link is also available in the forum.
Important Note:
Each University of Chicago team can request additional information from the virtual world simulation teams, beyond what is initially provided, through a “Research Request” process. Data files or folders that are denoted with an “RR” are the output of this process. In the Code Document forum you’ll find a link to a Research Request document containing the original requests submitted by the University of Chicago researchers, which can provide some context. The requests have to include a plausible collection methodology (e.g. surveys or instruments that can collect data). Additional data may be provided over the course of this challenge submission period. You are encouraged to include this input in your analysis.
Final Submission Guidelines
Submission
The final submission must include the following items.
- A Jupyter notebook detailing:
  - How the data is prepared and cleaned, from the TSV files
  - How each model is created, trained, and validated
  - How individual predictions are created
  - How we can plug new data into the model for validation purposes. This will be part of the review, so please ensure the reviewer can easily put in the held-back data for scoring purposes.
- Answers / Exploration of the counterfactual prediction questions detailed above. This should be well documented and clearly described.
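As a rough illustration of the notebook structure described above, the sketch below reads a TSV source, cleans it, and scores every row through a single entry point, so a reviewer could pass in held-back data the same way. The column names, the in-memory sample, and the placeholder model are hypothetical; the real tables and a trained model would replace them.

```python
import csv
import io


def read_tsv(handle):
    """Parse an open TSV file handle into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def clean_rows(rows, numeric_columns):
    """Convert the named columns to float; skip rows where that fails."""
    cleaned = []
    for row in rows:
        try:
            converted = {**row, **{c: float(row[c]) for c in numeric_columns}}
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or non-numeric value
        cleaned.append(converted)
    return cleaned


def score_new_data(handle, model, numeric_columns):
    """Single entry point: load, clean, and score every usable row."""
    rows = clean_rows(read_tsv(handle), numeric_columns)
    return [(row, model(row)) for row in rows]


# Hypothetical stand-in for a data file; a real run would use open(path, newline="").
sample = io.StringIO("actor\tage\twealth\nA1\t30\t2.5\nA2\t\t1.0\nA3\t41\t0.5\n")


def placeholder_model(row):
    """Made-up rule for illustration only; substitute a trained model here."""
    return 1.0 if row["wealth"] < 1.0 else 0.0


scored = score_new_data(sample, placeholder_model, ["age", "wealth"])
for row, prediction in scored:
    print(row["actor"], prediction)
```

Keeping loading, cleaning, and scoring in separate functions is one way to let a reviewer swap in a new file without touching the model code.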
Judging Criteria
Winners will be determined based on the following aspects:
- Model Effectiveness (45%)
  - Are your predictions on the review data accurate?
- Model Feasibility (30%)
  - How easy is it to deploy your model?
  - Is your model’s training time-consuming?
  - How well can your model/approach be applied to other problems?
- Model Novelty (10%)
  - Are you using a novel model?
- Clarity of the Report (15%)
  - Do you explain your proposed method clearly?