Challenge Overview

Project Overview

Governments across the world are increasingly applying open government practices such as crowdsourcing to develop stronger policies and to engage citizens, giving them civic influence beyond the election cycle. When hundreds of ideas from citizens flow in, crowdsourcers face a problem: a lack of efficient analysis and synthesis tools.

Civic CrowdAnalytics, a group at Stanford University, is developing solutions to address this problem, and is taking steps towards more participatory, inclusive and transparent democratic societies, making sure that all citizens have an equal opportunity to get their voices heard.

This challenge is part of the HPE Living Progress Challenge Blitz Program. (Secure top placements on the leaderboard to earn additional cash prizes.)

Competition Task Overview

We’ve just launched our first challenge to categorize data provided by citizens related to transportation issues in Palo Alto, CA. In order to score and rank the submissions for that challenge, we need a command line application that does the following:

1. The command line application should open and read a spreadsheet, submissions.csv. The submissions.csv file will have 9 columns: submission id (6-digit Topcoder submission id), endpoint, score, overall accuracy, main category accuracy, subcategory 1 accuracy, subcategory 2 accuracy, subcategory 3 accuracy, subcategory 4 accuracy. The filename should be configurable or passed as a command line parameter.
2. For each line in the submissions.csv file (one line per submission id), the application should do the following:

2.1 Make a REST POST to an endpoint of the format: http://your.ip.address/api/v1/categorize. Please note you will need to provide a mock API for this (using our sample input / output) in order to test your code.
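As a rough illustration of step 2.1, the sketch below pairs a tiny mock endpoint with a helper that POSTs to it. It is written against the Python 3 standard library; under the Python 2.7 listed in the technology overview the equivalent modules would be BaseHTTPServer and urllib2. The names MockCategorizeHandler, start_mock_server and post_documents are hypothetical, and the mock simply returns fixed placeholder categories rather than anything resembling a real classifier.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

API_PATH = "/api/v1/categorize"  # path taken from the challenge text


class MockCategorizeHandler(BaseHTTPRequestHandler):
    """Hypothetical mock: echoes each document back with placeholder categories."""

    def do_POST(self):
        if self.path != API_PATH:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        docs = []
        for doc in payload.get("document", []):
            tagged = dict(doc)
            # Fixed answers, chosen only so the harness has something to score.
            tagged["primary_main_category"] = "Big picture infrastructure"
            tagged["secondary_main_category"] = "Private Transit"
            for n in range(1, 5):
                tagged["primary_subcategory%d" % n] = "other subcategory"
                tagged["secondary_subcategory%d" % n] = "other subcategory"
            docs.append(tagged)
        body = json.dumps({"document": docs}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request console output


def start_mock_server(host="127.0.0.1", port=0):
    """Start the mock on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), MockCategorizeHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server


def post_documents(endpoint, documents, timeout=30):
    """POST a list of {"id", "content"} dicts and return the parsed JSON reply."""
    data = json.dumps({"document": documents}).encode("utf-8")
    request = Request(endpoint, data=data, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it locally, call start_mock_server(), build the endpoint URL from server.server_address, and pass it to post_documents together with a list of documents.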

2.2 Read a spreadsheet which contains our test data. The data will look exactly like this: https://docs.google.com/spreadsheets/d/1tyZu4gNumrQT0xWg0iytf6CIPBw48LbdpwwrxuVUh48/edit?usp=sharing

2.3 Send ten rows from the spreadsheet in the format shown here where each id is one row of the source document:

{
   "document" :
   [
       {
           "id" : "1",
           "content" : "A large block of text, which you should categorize."
       }, {
           "id" : "2",
           "content" : "A large block of text, which you should also categorize."
       }
   ]
}
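One possible way to assemble such request bodies is sketched below. It assumes the rows have already been read from the test spreadsheet as (id, text) pairs, and it reads "ten rows" as batches of ten with a shorter final batch; both are assumptions, and build_payloads is a hypothetical helper name.

```python
def build_payloads(rows, batch_size=10):
    """Yield request bodies, each holding up to batch_size rows.

    rows is an iterable of (row_id, text) pairs; how they are read from the
    test spreadsheet is left out of this sketch.
    """
    batch = []
    for row_id, text in rows:
        # The sample request uses string ids, so convert numeric ids to text.
        batch.append({"id": str(row_id), "content": text})
        if len(batch) == batch_size:
            yield {"document": batch}
            batch = []
    if batch:  # final partial batch when the row count is not a multiple of ten
        yield {"document": batch}
```

Each yielded dictionary can be serialized with json.dumps and sent as the POST body.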

2.4 Process responses in the following format:

{
   "document" :
   [
       {
           "id" : "1",
           "content" : "A large block of text, which you should categorize.",
           "primary_main_category" : "Big picture infrastructure",
           "primary_subcategory1" : "other subcategory",
           "primary_subcategory2" : "other subcategory",
           "primary_subcategory3" : "other subcategory",
           "primary_subcategory4" : "other subcategory",
           "secondary_main_category" : "Public ",
           "secondary_subcategory1" : "other subcategory",
           "secondary_subcategory2" : "other subcategory",
           "secondary_subcategory3" : "other subcategory",
           "secondary_subcategory4" : "other subcategory"
       }, {
           "id" : "2",
           "content" : "A large block of text, which you should also categorize.",
           "primary_main_category" : "Private Transit",
           "primary_subcategory1" : "other subcategory",
           "primary_subcategory2" : "other subcategory",
           "primary_subcategory3" : "other subcategory",
           "primary_subcategory4" : "other subcategory",
           "secondary_main_category" : "Non-motor powered transit ",
           "secondary_subcategory1" : "other subcategory",
           "secondary_subcategory2" : "other subcategory",
           "secondary_subcategory3" : "other subcategory",
           "secondary_subcategory4" : "other subcategory"
       }
   ]
}

2.5 Processing responses will mean determining whether the categories provided for each response are correct and assigning a score based on the rubric below. Secondary categories should only be considered if the primary categories are incorrect. The scoring function will work as follows:

1 point will be given for each correct main category or subcategory 1 association

.5 points will be given for each correct subcategory 2 association

.25 points will be given for each correct subcategory 3 or 4 association

We’re also going to allow the apps to make a second guess for each category tag. The scoring for the second guesses will be as follows:

.5 points will be given for each correct secondary main category or subcategory 1 association

.25 points will be given for each correct secondary subcategory 2 association

.125 points will be given for each correct secondary subcategory 3 or 4 association
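The rubric above could be expressed roughly as follows. This is a sketch under stated assumptions: the expected labels are assumed to arrive as a dictionary keyed by field name (main_category, subcategory1 and so on), labels are compared by exact string match, and fields with no expected label are skipped. None of these details are fixed by the challenge text, and score_document is a hypothetical name.

```python
# Weights from the rubric: (primary points, secondary points) per field.
WEIGHTS = {
    "main_category": (1.0, 0.5),
    "subcategory1": (1.0, 0.5),
    "subcategory2": (0.5, 0.25),
    "subcategory3": (0.25, 0.125),
    "subcategory4": (0.25, 0.125),
}


def score_document(predicted, expected):
    """Score one response document against the expected labels.

    predicted uses the response keys (primary_<field>, secondary_<field>);
    expected maps each field name to the correct label (assumed layout).
    """
    total = 0.0
    for field, (primary_pts, secondary_pts) in WEIGHTS.items():
        truth = expected.get(field)
        if truth is None:
            continue  # assumption: nothing to score when no label is expected
        if predicted.get("primary_" + field) == truth:
            total += primary_pts
        elif predicted.get("secondary_" + field) == truth:
            # The second guess only counts when the primary guess was wrong.
            total += secondary_pts
    return total
```

Under this reading, a document with every primary guess correct scores 3.0, and a submission's score would be the sum over all documents.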

2.6 Record the scores in the submissions.csv file.

2.7 Record the accuracy statistics in the submissions.csv file. The accuracy statistics should only apply to the primary categories and subcategories. Accuracy is measured as the number of correct answers divided by the total number of answers. All the categories and subcategories should be weighted the same for the accuracy stats; we won’t be weighting them for depth as we did with the scoring metric above.
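A minimal sketch of the accuracy calculation, assuming the same hypothetical data layout as a dictionary of expected labels per document and exact string comparison. It counts every primary field once per document with no depth weighting, and treats a missing prediction as incorrect; accuracy_stats is a hypothetical helper name.

```python
FIELDS = ["main_category", "subcategory1", "subcategory2", "subcategory3", "subcategory4"]


def accuracy_stats(pairs):
    """Return unweighted primary accuracy overall and per field.

    pairs is an iterable of (predicted, expected) dicts, where predicted uses
    the primary_<field> response keys and expected maps field -> correct label.
    """
    correct = dict.fromkeys(FIELDS, 0)
    total = dict.fromkeys(FIELDS, 0)
    for predicted, expected in pairs:
        for field in FIELDS:
            total[field] += 1
            truth = expected.get(field)
            if truth is not None and predicted.get("primary_" + field) == truth:
                correct[field] += 1
    stats = {}
    for field in FIELDS:
        stats[field] = float(correct[field]) / total[field] if total[field] else 0.0
    answers = sum(total.values())
    stats["overall"] = float(sum(correct.values())) / answers if answers else 0.0
    return stats
```

The six values returned (overall plus one per field) line up with the six accuracy columns described for submissions.csv.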

3. The app should log output so requests and responses can be debugged if necessary.
4. The app should provide summary and progress information to the console screen to verify that execution is proceeding.
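Taken together, the requirements above suggest a driver loop along the following lines. This is only a sketch: it assumes submissions.csv has no header row and lists its nine columns in the order given in item 1, it leaves steps 2.1 to 2.5 behind a hypothetical evaluate placeholder, and it uses Python 3 file handling (the newline argument to open does not exist in Python 2.7). The names run, main and the --log-file option are illustrative, not part of the challenge.

```python
import argparse
import csv
import logging

log = logging.getLogger("harness")


def evaluate(endpoint):
    """Placeholder for steps 2.1 to 2.5; should return seven values:
    score, overall accuracy, then the five per-field accuracies."""
    raise NotImplementedError


def run(csv_path, evaluate_fn=evaluate):
    """Evaluate every submission listed in csv_path and write results back."""
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    for index, row in enumerate(rows, start=1):
        submission_id, endpoint = row[0], row[1]
        # Progress information for the console (item 4).
        print("[%d/%d] submission %s" % (index, len(rows), submission_id))
        log.debug("endpoint for %s: %s", submission_id, endpoint)
        try:
            results = evaluate_fn(endpoint)
        except Exception:
            # Assumption: a failing endpoint is logged and its row left as-is.
            log.exception("submission %s failed", submission_id)
            continue
        # Columns 3 to 9 hold the score and the six accuracy figures.
        row[2:9] = ["%.4f" % value for value in results]
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="Score categorization submissions.")
    parser.add_argument("csv_path", nargs="?", default="submissions.csv")
    parser.add_argument("--log-file", default="harness.log")
    args = parser.parse_args(argv)
    # Debug-level file logging so requests and responses can be inspected (item 3).
    logging.basicConfig(filename=args.log_file, level=logging.DEBUG)
    run(args.csv_path)
```

Calling main() with no arguments would process submissions.csv in the current directory; passing a path as the first argument overrides the filename.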

Technology Overview

Linux
Python 2.7
REST
JSON

Final Submission Guidelines

Submission Deliverables

1. Python code that covers all the mentioned requirements (including the mock API)
2. Detailed deployment guide documenting how to deploy & test your code

Living Progress - CrowdAnalytics - Python Test Harness

ID: 30054393