Challenge Overview
Project Gwalldata Overview
Thanks for your interest in Project Gwalldata!
We're now midway through an interesting project built around Python, text processing, and PDF and Word documents. In the coming weeks and months, many further contests are planned as our complexity and needs ramp up, and we're glad to have you on board! Basically, our client has extremely large documents (both Word and PDF files) that need to be checked for consistency, in various ways and with various methods, across a potentially large number of variables. Data preparation will involve creating realistic data to provide to the community so that we can simulate our real documents.
Challenge Task Overview
This challenge is to develop utilities for the following requirements and add them to the GUI.
- P Values
  - Input: Folder or single document containing the p-values
  - Output: Invalid p-values (File Name, Row Number)
  - P-values (assessments of statistical significance) must be reported to either 2 or 3 decimal places throughout all tables; a p-value with one decimal place, or with 4 or more decimal places, is invalid.
  - Attached is a document with example p-values.
  - P-values may be referred to in tables and text as 'p', 'p-value', 'p=', 'p<=', 'p<', 'p>', etc. The solution should handle all of these forms.
- Lab Criteria Unit
  - Input: Folder or single document; Proportion (a single value, used for Abnormal observations); Given Lab (a string, used for Significant deviations)
  - Output:
    - Inconsistent lab criteria and units (File Name, Row Number)
    - Abnormal observations
    - Significant deviations
  - For any given lab criterion (e.g., cholesterol, potassium, vitamin B12), the units for each lab value must be consistent across all tables. (The lab criterion is 'TEST_NAME' in the spreadsheet, and the unit for each lab value is 'LABUNIT'.)
  - Abnormal observations (by the input proportion) for patients (VISIT) having the same lab (TEST_NAME) at the same lab site (LABSUB) on a given visit date (VISIT_DATE)
  - Significant deviations on a given lab (TEST_NAME) for a given patient across visits
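For the P Values requirement above, the decimal-place rule can be checked with a regular expression. The sketch below is a minimal illustration, assuming each table row's text has already been extracted from the Word or PDF file (extraction is not shown). The pattern `P_VALUE_RE`, the function `find_invalid_p_values`, and the `(row_number, text)` input format are hypothetical. The pattern only catches p-values written next to a "p" token; bare numbers in a column headed "p", or phrasings such as "p-value of 0.05", would need additional, column-aware handling.

```python
import re

# Hypothetical pattern for inline p-values: a "p" token (optionally written as
# "p-value" or "p-values"), an optional comparison operator, then a decimal
# number. Group 1 is the whole number, group 2 the digits after the point.
P_VALUE_RE = re.compile(
    r"\bp(?:-values?)?\s*(?:<=|>=|≤|≥|=|<|>)?\s*(\d*\.(\d+))",
    re.IGNORECASE,
)

VALID_DECIMAL_PLACES = (2, 3)  # from the requirement: 2 or 3 places only


def find_invalid_p_values(file_name, rows):
    """Yield (file_name, row_number, matched_text) for every p-value whose
    decimal part is not 2 or 3 digits long.

    `rows` is assumed to be an iterable of (row_number, text) pairs produced
    by an earlier (not shown) extraction step.
    """
    for row_number, text in rows:
        for match in P_VALUE_RE.finditer(text):
            if len(match.group(2)) not in VALID_DECIMAL_PLACES:
                yield file_name, row_number, match.group(0)


# Made-up example rows for illustration only.
rows = [
    (1, "Treatment vs placebo, p=0.03"),
    (2, "Secondary endpoint (P < 0.0001)"),
    (3, "Subgroup: p>.5; overall p<=0.049"),
]
for hit in find_invalid_p_values("example.docx", rows):
    print(hit)
# ('example.docx', 2, 'P < 0.0001')
# ('example.docx', 3, 'p>.5')
```

Because matching is case-insensitive and leading digits are optional, forms like "P<0.05" and "p<.001" are both covered.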
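For the unit-consistency part of the Lab Criteria Unit requirement, one straightforward approach is to group rows by TEST_NAME and flag any test that appears with more than one distinct LABUNIT. This is a sketch under stated assumptions: the records are dicts carrying the TEST_NAME and LABUNIT columns from the requirement, plus hypothetical FILE_NAME and ROW keys that the extraction step would add so the output can report (File Name, Row Number). Treating test names that differ only in case or whitespace as the same test is also an assumption.

```python
from collections import defaultdict


def _test_key(record):
    # Assumption: "Potassium" and " POTASSIUM" refer to the same lab criterion.
    return record["TEST_NAME"].strip().casefold()


def find_inconsistent_units(records):
    """Return (file_name, row, TEST_NAME, LABUNIT) for every row whose lab
    test is reported with more than one distinct unit across all tables.

    Each record is assumed to be a dict with 'TEST_NAME' and 'LABUNIT' plus
    hypothetical 'FILE_NAME' and 'ROW' keys.
    """
    records = list(records)
    # Collect the set of distinct units seen for each lab test.
    units_by_test = defaultdict(set)
    for record in records:
        units_by_test[_test_key(record)].add(record["LABUNIT"].strip())

    inconsistent_tests = {t for t, units in units_by_test.items() if len(units) > 1}
    # Report every row of an inconsistent test, so no unit is presumed "correct".
    return [
        (r["FILE_NAME"], r["ROW"], r["TEST_NAME"], r["LABUNIT"])
        for r in records
        if _test_key(r) in inconsistent_tests
    ]


# Made-up example rows for illustration only.
records = [
    {"FILE_NAME": "table1.docx", "ROW": 4, "TEST_NAME": "Cholesterol", "LABUNIT": "mg/dL"},
    {"FILE_NAME": "table2.pdf", "ROW": 9, "TEST_NAME": "cholesterol", "LABUNIT": "mmol/L"},
    {"FILE_NAME": "table2.pdf", "ROW": 12, "TEST_NAME": "Potassium", "LABUNIT": "mmol/L"},
    {"FILE_NAME": "table3.docx", "ROW": 2, "TEST_NAME": "Potassium", "LABUNIT": "mmol/L"},
]
print(find_inconsistent_units(records))
# [('table1.docx', 4, 'Cholesterol', 'mg/dL'), ('table2.pdf', 9, 'cholesterol', 'mmol/L')]
```

If the spreadsheet is loaded into a pandas DataFrame instead, `df.groupby("TEST_NAME")["LABUNIT"].nunique()` gives the number of distinct units per test, and any test with a count above 1 is inconsistent.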
Final Submission Guidelines
Environment:
- Windows
- Linux
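Both utilities take a folder or a single document as input, and the tool must run on Windows and Linux. A minimal sketch of that input step uses `pathlib`, which handles path separators on both platforms. The function name `collect_documents`, the set of supported extensions, and the choice to search subfolders recursively are assumptions, not part of the requirements.

```python
from pathlib import Path

# Assumed set of supported extensions; legacy .doc files are left out here
# because they would need a different parser than .docx.
SUPPORTED_SUFFIXES = {".docx", ".pdf"}


def collect_documents(input_path):
    """Return a sorted list of document paths for a folder or a single file."""
    path = Path(input_path)
    if path.is_file():
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported document type: {path}")
        return [path]
    if path.is_dir():
        # rglob("*") walks subfolders too; comparing lowercased suffixes means
        # "REPORT.PDF" is picked up on case-sensitive Linux file systems.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or folder: {path}")
```

The same call works with a Windows-style path such as `collect_documents(r"C:\data\tables")` or a Linux path to a single `.pdf` file, which keeps the GUI's folder and file pickers on one code path.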
Submission Deliverables:
- Source Code
- Deployment Guide (include verification steps)