Hestia - Differential Privacy - Data Anonymization Challenge

The challenge is finished.

Challenge Overview

Challenge Detail

In this challenge you have to implement differential privacy masking on the given dataset, using any available open source libraries, so that the masked data cannot be correlated with real-world objects, people, or entities.

Project Background

The client is exploring the possibilities of using data science challenges for various use cases of their business. As part of the data preparation for data science work, we need to protect privileged information and prevent linkage attacks before opening the data to the community. Multiple levels of masking might be required for this. We need to come up with a data masking solution that is highly scalable and easy to apply to the dataset.

Technology Stack
  • Java 8
  • Python
The preferred language is Java / Python 3.

A few recommended open source libraries that can be used to meet the masking requirements are listed below. You are free to research and use other open source libraries after getting approval.
  • https://github.com/uber/sql-differential-privacy
  • https://github.com/arx-deidentifier/arx

Individual Requirements

Challenge Input
The sample dataset csv file containing the required columns that need to be masked will be shared in the challenge forums.

Scope
  • You have to implement a masking program in Java/Python that can reasonably prevent linkage attacks, making use of the recommended libraries or any good anonymization software.
  • Your script should be highly optimized and should run on millions of rows without crashing.
  • The model used for masking should be documented properly for review purposes. Include a Google Doc or PDF that describes your approach and specifically explains your choice of epsilon.
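
To make the role of epsilon and the row-count requirement concrete, the following is a minimal sketch, not the expected solution. It assumes a single numeric column, applies the Laplace mechanism to each record independently (a local differential privacy setting), and streams the CSV row by row so that memory use stays constant. The function names, the column name, and the clamping bounds are hypothetical; the real columns come from the dataset shared in the forums.

```python
import csv
import io
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mask_numeric_column(rows, column, lower, upper, epsilon, rng):
    """Yield copies of rows with `column` clamped to [lower, upper] and perturbed.

    Clamping bounds the sensitivity at (upper - lower). The Laplace scale is
    sensitivity / epsilon, so a smaller epsilon means more noise.
    """
    scale = (upper - lower) / epsilon
    for row in rows:
        value = min(max(float(row[column]), lower), upper)
        masked = dict(row)
        masked[column] = round(value + laplace_noise(scale, rng), 2)
        yield masked


def mask_csv(src, dst, column, lower, upper, epsilon, seed=None):
    """Stream a CSV from `src` to `dst` one row at a time; return the row count."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    rng = random.Random(seed)  # seed only for reproducible tests, not production
    count = 0
    for row in mask_numeric_column(reader, column, lower, upper, epsilon, rng):
        writer.writerow(row)
        count += 1
    return count


# Hypothetical usage with an in-memory sample instead of the real dataset.
sample = io.StringIO("id,age\n1,34\n2,51\n3,250\n")
result = io.StringIO()
rows_written = mask_csv(sample, result, "age", 0.0, 100.0, epsilon=1.0, seed=0)
```

With these assumed bounds, epsilon = 1.0 gives a noise scale of 100, which hides individual values almost completely. The approach document should justify whatever trade-off between privacy and utility is chosen. This sketch perturbs one column only and is not hardened against floating-point attacks on the Laplace mechanism; identifiers and other quasi-identifying columns would still need separate treatment, for example with the libraries recommended above.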
Important Notes

The winner is expected to provide support for any issues encountered when the code is run on the larger dataset. Additional prize money will be provided for this support, based on the amount of work.

Deployment Guide and Validation Document

Make sure to provide two separate documents for validation.

A README.md that covers:
  • Deployment - how to build and test your submission.
  • Configuration - document every configuration option used by the submission.
  • Dependency Installation - a clear, up-to-date, step-by-step guide for installing dependencies.
A Validation.md that covers:
  • Validation of each requirement, so that reviewers can easily map the requirements to your submission.

Final Submission Guidelines

  • Documentation
  • Project Source Code

ELIGIBLE EVENTS:

Topcoder Open 2019

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30082341