PseudoVet - Scrape US Veteran Health Records Data Challenge

Key Information

The challenge is finished.

Challenge Overview

Welcome to the PseudoVet - Scrape US Veteran Health Records Data Challenge.

Overview

PseudoVet is an automated patient data fabrication engine that provides a set of active synthetic patients and clinical data for use in healthcare software development. Developing against real patient data unnecessarily exposes protected health information (PHI) and personally identifiable information (PII), and such data cannot be used by developers outside the VA network. Fully functional, realistic synthetic data sets, however, can be used safely in development, testing, training, and other non-production environments in compliance with the Health Information Technology for Economic and Clinical Health Act (HITECH Act) and other regulations. Existing fabricated data sets are of limited use: they are outdated, so development teams must spend time building their own data sets instead of writing code, or the data sets require licenses and cannot be shared.

Challenge Requirements

In this challenge, we want to focus on finding and scraping data related to US veterans.

The key requirements are listed below. You will need to search public sources on the Internet, such as .gov sites, to find this data.

Some good starting points are mentioned below (but feel free to pick other sites).

The end objective of this challenge is to build a report on Patient Demographics by Selected Birth Year.

We seek the following details:

  • Date of Birth

  • Date of Death (if applicable)

  • Gender (M/F/U)

  • Height

  • Weight

  • Language Codes

  • Language Preference Indicator

  • Military Branch (if applicable)

  • Military start and end dates (if applicable)

  • Military era (if it exists or can be determined; otherwise null)

  • Behaviors (smoking, drinking, etc.)

  • Other fields as may be determined

If information for a specific field is not available for a record, enter it as NA (Not Available).

There is no coding or implementation involved in this challenge.

Review Criteria

  • The co-pilot and PM will review submissions for data authenticity and for the volume of data provided. The submissions with the most credible data according to the specification above will be chosen as winners. All submissions will be judged on a scale of 1 to 10. There will be no appeals or appeals response phases for this challenge.

Final Submission Guidelines

  • A zipped dataset in CSV format (text fields enclosed in double quotes), accompanied by a text file clearly listing the referenced data sources

  • Clearly list all references in a text file. Submissions that do not list valid references will be disqualified.
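Although no coding is required for this challenge, the expected file layout (every text field in double quotes, NA for any missing value) can be illustrated with a short sketch. It assumes Python's standard csv module and uses hypothetical column names that mirror the requested fields; the challenge does not prescribe exact headers, and the sketch writes NA rather than null for a missing military era.

```python
import csv
import io

# Hypothetical column names mirroring the requested fields; the challenge
# does not prescribe exact headers.
FIELDS = [
    "date_of_birth", "date_of_death", "gender", "height", "weight",
    "language_codes", "language_preference_indicator", "military_branch",
    "military_start_date", "military_end_date", "military_era", "behaviors",
]


def write_records(records, out):
    """Write records as CSV with every field double-quoted; missing values become NA."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for rec in records:
        # Any absent or empty field is filled with NA (Not Available).
        row = {f: (rec.get(f) or "NA") for f in FIELDS}
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_records(
        [{"date_of_birth": "1946-03-12", "gender": "M", "military_branch": "Army"}],
        buf,
    )
    print(buf.getvalue())
```

csv.QUOTE_ALL quotes the header row as well as the data rows, which keeps the whole file consistent with the "double quote enclosed text" requirement.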

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30059196