Phase 2 Urban World Prescribe Challenge

The challenge is finished.

Challenge Overview

Topcoder is working with a group of researchers, organized by the University of Chicago, who are competing to understand a series of simulated environments.

In our recent Predict challenge, we asked the Topcoder Community to develop models and make a series of predictions based on a set of scenarios and accompanying data. In this Prescribe challenge, we ask you to continue this research and use what we have learned about Urban World and its residents to make a counterfactual prediction. For this challenge, you’ll complete the following task:

Select a set of 200 agents to remain in the world in order to maximize friendship.                                

Specifically, you will select 200 individuals to remain in Urban World after all other individuals (4000+) are removed from the simulation on Simulation Date 2022-07-15 00:00 (P-Day). From that time, the simulation will continue running with only the 200 selected agents for one additional simulation month.
 

Submission

Submissions will consist of a CSV file with a single column containing the UIDs of the 200 selected agents. Since this task is about maximization, only one submission will be allowed.
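As a minimal sketch of producing such a file, the Python below checks that exactly 200 distinct UIDs are supplied and writes them one per row. The function name `write_submission` and the placeholder UIDs are hypothetical, and the sketch assumes no header row is expected, which the challenge text does not state.

```python
import csv
import io


def write_submission(uids, out, expected_count=200):
    """Write one UID per row after checking count and uniqueness.

    Assumption: the file has no header row (not specified by the challenge).
    """
    uids = list(uids)
    if len(uids) != expected_count:
        raise ValueError(f"expected {expected_count} UIDs, got {len(uids)}")
    if len(set(uids)) != len(uids):
        raise ValueError("duplicate UIDs in selection")
    writer = csv.writer(out, lineterminator="\n")
    for uid in uids:
        writer.writerow([uid])


# Demo with made-up placeholder UIDs (not real agents).
demo_uids = [f"UID-{i:010x}" for i in range(200)]
buf = io.StringIO()
write_submission(demo_uids, buf)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `out`.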

Evaluation                    

On each day, we will measure the average number of friends of our agents. The average over the last seven days (08/08-08/14) will determine your final score. Thus your goal is to maximize the average number of friends among the 200 selected agents during the last seven days of the simulation.
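The scoring rule can be sketched as follows, assuming per-day friend counts are already available as a mapping from date to `{UID: friend count}`. That input layout and the function names are assumptions for illustration; the challenge does not specify how friendship counts are delivered, and the year 2022 is taken from the P-Day date.

```python
from datetime import date, timedelta
from statistics import mean


def daily_average_friends(friend_counts):
    """Average number of friends across agents on a single day."""
    return mean(friend_counts.values())


def final_score(daily_counts, last_day=date(2022, 8, 14), window=7):
    """Mean of the daily averages over the last `window` days (08/08-08/14)."""
    days = [last_day - timedelta(days=k) for k in range(window)]
    return mean(daily_average_friends(daily_counts[d]) for d in days)


# Toy data: two hypothetical agents whose friend counts grow by one each day.
demo = {
    date(2022, 8, d): {"UID-a": d - 8, "UID-b": d - 6}
    for d in range(8, 15)
}
score = final_score(demo)
```

Because the score is an average of daily averages, each of the seven days carries equal weight regardless of how friend counts are distributed across agents.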

Initial Data Package Description                    

The Initial Data Package (IDP) for this Predict Test consists of six sub-packages. For each package, data is reported for all agents until P-Day (2022-07-15 00:00). In addition, data is reported AFTER P-Day using a baseline prescription of 200 individuals selected uniformly at random. Thus, after P-Day, only these 200 individuals (their UIDs are provided in Data Package 6) remain.                        

Data Package 1:

Detailed data from randomly selected recreational sites for 60 days before and 30 days after the P-Day (5/16/2022-8/14/2022)

- This data contains detailed information, including the UIDs of all individuals present at these recreational sites, reported at five-minute intervals.

- Information about all meetings happening at these recreational sites. All members of each meeting are reported at five-minute intervals.

- Sampling: 25% of recreational sites.                    

Data Package 2:

Low-detail information for selected recreational sites for 60 days before and 30 days after the P-Day (5/16/2022-8/14/2022)                    

- This data contains low-detail information obtained from a virtual camera located outside the recreational site.                        

- It includes information whenever an individual enters the recreational site, and whenever an individual leaves the recreational site.            

- Agent IDs are not provided.

- Sampling: All recreational sites.                        
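Since Data Package 2 reports only anonymous enter and leave events, one way to use it is to reconstruct a running headcount per site. The sketch below assumes each event is a `(timestep, site_idx, kind)` tuple with `kind` equal to `"enter"` or `"leave"`; these field names and values are hypothetical, not the actual column layout. It also assumes each site starts empty at the first observed event, which may not hold in the real data.

```python
from collections import defaultdict


def occupancy_over_time(events):
    """Return {site_idx: [(timestep, headcount), ...]} in time order.

    Assumption: headcount starts at zero at the first observed event.
    """
    counts = defaultdict(int)
    series = defaultdict(list)
    # Sorting puts events in time order; ties sort "enter" before "leave".
    for timestep, site_idx, kind in sorted(events):
        if kind == "enter":
            counts[site_idx] += 1
        elif kind == "leave":
            counts[site_idx] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        series[site_idx].append((timestep, counts[site_idx]))
    return dict(series)


# Toy events for two hypothetical sites.
demo_events = [
    (1, "S1", "enter"),
    (2, "S1", "enter"),
    (2, "S2", "enter"),
    (3, "S1", "leave"),
]
demo_occupancy = occupancy_over_time(demo_events)
```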

Data Package 3:

High-detail journal data reported by 100 individuals for 60 days before and 30 days after the P-Day (5/16/2022-8/14/2022)

- This information includes a detailed travel journal, updated at five-minute intervals, of the places the individual has visited.                    

- It includes information obtained from inside the place, including information about other individuals present at the location and about meetings happening at the location.

- Individuals are chosen uniformly at random, but stratified over the sets of prescribed and non-prescribed agents. Thus, 50 of the 100 agents are among the 200 agents chosen to remain in this world after the P-Day.

The 100 randomly selected individuals for Data Package 3:

UID-75f38254eb,UID-443164d6ff,UID-eeb98c9d97,UID-c2e84f6423,UID-42eb0b6ba9,UID-d99f9e38e8,UID-ed82dbeb6a,UID-9f66e1d468,UID-662fbd744b,UID-50f7727568,UID-e4031383d8,UID-ae2bea0ce6,UID-25c1977a0e,UID-a27a3df76b,UID-4e5a2e9228,UID-e1e969a9c0,UID-f8511d4e0f,UID-b443bb714d,UID-463306d190,UID-bc4db336c6,UID-94072cdda3,UID-1be0537bb3,UID-f6ff0a8c77,UID-138d7a1858,UID-6e8da2bdaa,UID-fe68c21681,UID-3e65601888,UID-f276742d3b,UID-467b38ec8f,UID-a64aa2b956,UID-1d880cac66,UID-7c5affb128,UID-295f56f8a5,UID-40e9b432a8,UID-4436487bc1,UID-4b11700da0,UID-b5f96e5131,UID-83311503e3,UID-93dd329180,UID-3ab20b7ea0,UID-42a0637212,UID-a084bb2900,UID-cbfd03c0c1,UID-b286a42728,UID-bffefb480c,UID-c33226dab4,UID-a155750a06,UID-7b0227d8e5,UID-3cab524130,UID-264d1b8f9b,UID-86cbe79f54,UID-aa83ee364c,UID-4e778901a3,UID-998386af5d,UID-4dd808026c,UID-a287a27ae6,UID-a09b2c8656,UID-a033f0b10e,UID-d870e2e6e8,UID-ce29121c62,UID-94833edf95,UID-f2b4cb8d22,UID-bdf04a76f8,UID-de129219ed,UID-6d3d3bd1c1,UID-c41222bb1e,UID-9898cdb33e,UID-9b140b1957,UID-abe9b91940,UID-40d5618e2f,UID-dc3bf4969c,UID-700ad2bcb8,UID-0b6db69cbe,UID-5eee3771f3,UID-c4c654162e,UID-5367759bb3,UID-f7c9a35de8,UID-198b03922c,UID-0ad9765538,UID-16296cacd2,UID-0d046a9235,UID-1f62ea85ba,UID-3426db3358,UID-0676478f80,UID-6dd34e66eb,UID-46c94f2579,UID-dbd4294506,UID-8ba81c4125,UID-6886e0b1b8,UID-02ee765f44,UID-1654912e0a,UID-de32397471,UID-81d54f9b0a,UID-40e9525d29,UID-254271a286,UID-bb9096b50e,UID-cf427b72fd,UID-16473965d9,UID-104a92cba6,UID-d1f61d96ec

Data Package 4:

Basic statistics of the social network, as well as detailed social network information for the 100 individuals described in Data Package 3, for 30 days before and 30 days after the P-Day (6/16/2022-8/14/2022)

- This information includes the average number of friends per day. The average of this value over the last seven days yields the Prescribe Test Result of the baseline prescription.

- Detailed social network information of the 100 agents selected in Data Package 3.        

Data Package 5:

Home locations of all people just before P-Day (7/15/2022 00:00). This includes the home location of all individuals, not only the 200 selected individuals.                    

Data Package 6:

Information on the individuals selected for the baseline prescription: a list of the 200 selected people and their home locations.

Supplementary Data Packages Description                    

Additional data will be provided about agents, sites, and the geography of Urban World. These data can be linked through key variables such as Timestep (i.e. timestamp within simulation), UID (i.e. each agent has a unique UID), SiteIdx (i.e. each site has a unique SiteIdx), and so on. 
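A rough sketch of linking records through these keys, using only the Python standard library, is shown below. The key names `UID` and `SiteIdx` come from the text above; the other columns (`age_group`, `site_type`) and the function names are made up for illustration and may not exist in the actual data.

```python
def index_by(rows, key):
    """Build a lookup from a key column to the full row."""
    return {row[key]: row for row in rows}


def link_visits(visits, agents, sites):
    """Attach agent and site attributes to each visit record.

    Visits with no matching agent or site keep only their own fields.
    """
    agent_by_uid = index_by(agents, "UID")
    site_by_idx = index_by(sites, "SiteIdx")
    linked = []
    for visit in visits:
        agent = agent_by_uid.get(visit["UID"], {})
        site = site_by_idx.get(visit["SiteIdx"], {})
        # Visit fields are merged last so they win on any name clash.
        linked.append({**agent, **site, **visit})
    return linked


# Toy rows with hypothetical attribute columns.
demo_agents = [{"UID": "UID-a", "age_group": "adult"}]
demo_sites = [{"SiteIdx": 7, "site_type": "park"}]
demo_visits = [
    {"Timestep": 100, "UID": "UID-a", "SiteIdx": 7},
    {"Timestep": 101, "UID": "UID-z", "SiteIdx": 7},
]
demo_linked = link_visits(demo_visits, demo_agents, demo_sites)
```

Rows read with `csv.DictReader(f, delimiter="\t")` have the same dictionary shape, so the same functions would apply to tab-separated files, with the caveat that all values then arrive as strings.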

We're also providing Research Request data. The Urban World research team can query the simulation team for additional data. This data may or may not be substantively useful. Values for a given feature may or may not vary (e.g. all values for a certain variable may be missing, or there may be a single non-missing value). Be sure to decide which data to use, why it is useful, and how to use it before making modeling decisions, as there are many potentially disjoint strategies for approaching this challenge. Here is a document which explains the research requests.

Note: Time is a significant dimension of this data, with many longitudinal/panel data sets and some cross-sectional data sets. Be sure you account for temporality appropriately, as this challenge is fundamentally concerned with predicted future behavior! The Explain Phase, Predict Phase, and Prescribe Phase data sets each represent samples from the same virtual world at different time slices.
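One simple way to respect temporality is to split records at a cutoff time, so that anything used for validation comes strictly after the data used for fitting. The sketch below assumes each record carries a `Timestep` field already parsed into a `datetime`; the actual encoding of `Timestep` in the data files is not specified here, and the function name is hypothetical.

```python
from datetime import datetime

# P-Day as given in the challenge description.
P_DAY = datetime(2022, 7, 15, 0, 0)


def temporal_split(records, cutoff=P_DAY, time_key="Timestep"):
    """Split records into those before the cutoff and those at or after it."""
    before = [r for r in records if r[time_key] < cutoff]
    after = [r for r in records if r[time_key] >= cutoff]
    return before, after


# Toy records around P-Day.
demo_records = [
    {"Timestep": datetime(2022, 7, 14, 23, 55), "UID": "UID-a"},
    {"Timestep": datetime(2022, 7, 15, 0, 0), "UID": "UID-a"},
    {"Timestep": datetime(2022, 8, 1, 12, 0), "UID": "UID-b"},
]
pre_pday, post_pday = temporal_split(demo_records)
```

The same function can be called with an earlier cutoff to hold out the last weeks before P-Day for validation.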



Final Submission Guidelines

The final submission must include the following items.

  • CSV file containing the UIDs of the remaining agents, one per line.

  • A Jupyter notebook detailing:

    • How the data is prepared and cleaned, starting from the TSV files

    • How your model is created, trained, and validated

    • How individual predictions are created

    • Provide error rates and/or confidence intervals based on training data.

    • Answers / Exploration of the counterfactual prediction question detailed above.  This should be well documented and clearly described.

Winners will be determined based on the following aspects:

  • Model Effectiveness (70%)

    • The research teams from the University of Chicago will be comparing your work with theirs in answering the question above.  Your submission will receive a subjective evaluation from this team based on the clarity and rigor of your analysis.

  • Model Feasibility (20%)

    • How easy is it to deploy your model?

    • Is your model’s training time-consuming?

    • How well can your model/approach be applied to other problems?

  • Clarity of the Report (10%)

    • Do you explain your proposed method clearly?

Review style

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30109373