Challenge Overview
Quartz: NPT Correlated Dummy Data Builder
Description:
We have several tables that contain sensitive data that we don't want to share in our challenges. However, we need representative data in the same form in order to move on to our next set of challenges. To do this, we'd like to build a tool that generates large data sets on the fly that conform to the schema we need.
Sounds simple enough, right? Here's the rub: we'd also like to build a little intelligence into the tool so that the data can be used to model "clusters". We'd like to create records that correlate with one another when reviewed in aggregate by pattern-matching tools. This is to simulate real, unknown correlations that may exist in the original data. These correlations will be temporal in nature and clustered on both text fields containing keywords and some related data.
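One possible way to read the clustering idea, as a minimal Python sketch rather than a prescribed design: background events are spread uniformly in time, while each cluster's events are drawn close to a shared time center, reuse a small keyword group in their text, and share one related field. Every name here is an assumption made for illustration (`rig_id`, `description`, `duration_hours`, `cluster_id`, and the keyword lists); the real columns are defined by the schema in the repository README.

```python
import random
from datetime import datetime, timedelta

# Hypothetical keyword groups; a real vocabulary would come from domain data.
CLUSTER_KEYWORDS = [
    ["stuck pipe", "overpull", "tight hole"],
    ["mud pump", "pressure loss", "washout"],
]
NOISE_WORDS = ["routine", "inspection", "crew change", "weather hold", "survey"]


def make_cluster_events(rng, cluster_id, center, n, spread_hours=12.0):
    """Generate n events that share a time window, a keyword group, and a rig."""
    keywords = CLUSTER_KEYWORDS[cluster_id % len(CLUSTER_KEYWORDS)]
    rig_id = 100 + cluster_id  # hypothetical shared "related data" field
    events = []
    for _ in range(n):
        # Temporal correlation: timestamps are normally distributed around center.
        offset = timedelta(hours=rng.gauss(0.0, spread_hours))
        text = f"{rng.choice(keywords)} during {rng.choice(NOISE_WORDS)}"
        events.append({
            "start_time": center + offset,
            "rig_id": rig_id,
            "description": text,
            "duration_hours": round(rng.uniform(0.5, 8.0), 1),
            "cluster_id": cluster_id,  # ground-truth label, for validation only
        })
    return events


def make_noise_events(rng, start, end, n):
    """Generate n uncorrelated background events spread uniformly in time."""
    span = (end - start).total_seconds()
    return [{
        "start_time": start + timedelta(seconds=rng.uniform(0, span)),
        "rig_id": rng.randint(1, 50),
        "description": " ".join(rng.sample(NOISE_WORDS, 2)),
        "duration_hours": round(rng.uniform(0.5, 8.0), 1),
        "cluster_id": None,
    } for _ in range(n)]


def generate(seed=0, clusters=2, per_cluster=20, noise=100):
    """Mix background noise with a few planted clusters, sorted by time."""
    rng = random.Random(seed)  # seeded so a data set can be reproduced
    start, end = datetime(2020, 1, 1), datetime(2020, 12, 31)
    span = (end - start).total_seconds()
    events = make_noise_events(rng, start, end, noise)
    for cid in range(clusters):
        center = start + timedelta(seconds=rng.uniform(0, span))
        events.extend(make_cluster_events(rng, cid, center, per_cluster))
    events.sort(key=lambda e: e["start_time"])
    return events
```

Keeping a ground-truth `cluster_id` alongside each record (outside the delivered tables, if the schema has no place for it) would make it possible to check later whether a pattern-matching tool recovers the planted clusters.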
The Backstory
Oil & gas exploration and development is a massive engineering undertaking. It's an extremely expensive endeavor, so it's important to keep all systems running as close to 100% of the time as possible. Any downtime, known as Non-Productive Time (NPT), is expensive and reduces ROI. NPTs are caused by a variety of factors, from inclement weather to someone dropping a wrench down the drilled hole (this is hilariously called a "fish in hole")! Every day, the engineers fill out a plan for the next 24 hours. We'd like to see if we can find correlations between things in their plan for the next day and NPT events that have happened in the past.
For this challenge, we're just trying to generate dummy NPT events, not to find the correlations.
Requirements
The DB schema can be seen in the README in the repository.
- Develop a dummy data generator that can produce data with some correlations between records and store them in a database
- Node.js or Python is acceptable
- MySQL or PostgreSQL is an acceptable data store choice
- A CLI tool or a web application is an acceptable solution
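As a rough illustration of the CLI-plus-database shape these requirements describe, the sketch below parses a few options with `argparse` and bulk-inserts generated rows through Python's DB-API. It makes several assumptions: the standard-library `sqlite3` module stands in for MySQL or PostgreSQL only so the example runs without a server, the `npt_event` table and its columns are hypothetical (the real schema is in the repository README), and the rows are uncorrelated placeholders rather than clustered data.

```python
import argparse
import random
import sqlite3
from datetime import datetime, timedelta


def build_rows(count, seed):
    """Produce placeholder (start_time, description, duration_hours) tuples."""
    rng = random.Random(seed)
    base = datetime(2020, 1, 1)
    return [
        (
            (base + timedelta(hours=rng.uniform(0, 24 * 365)))
            .isoformat(sep=" ", timespec="seconds"),
            rng.choice(["stuck pipe", "mud pump failure", "weather hold"]),
            round(rng.uniform(0.5, 8.0), 1),
        )
        for _ in range(count)
    ]


def store(conn, rows):
    """Create the hypothetical table if needed and bulk-insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS npt_event ("
        "id INTEGER PRIMARY KEY, start_time TEXT, "
        "description TEXT, duration_hours REAL)"
    )
    # sqlite3 uses "?" placeholders; MySQL and PostgreSQL drivers such as
    # PyMySQL and psycopg2 use "%s" instead.
    conn.executemany(
        "INSERT INTO npt_event (start_time, description, duration_hours) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def main(argv=None):
    """Parse CLI options, generate rows, store them, and report the total."""
    parser = argparse.ArgumentParser(description="Generate dummy NPT events")
    parser.add_argument("--count", type=int, default=1000)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--db", default=":memory:")
    args = parser.parse_args(argv)
    conn = sqlite3.connect(args.db)
    store(conn, build_rows(args.count, args.seed))
    total = conn.execute("SELECT COUNT(*) FROM npt_event").fetchone()[0]
    print(f"stored {total} events")
    return conn
```

Calling `main(["--count", "25", "--seed", "3"])` exercises the whole path in memory; a real tool would invoke `main()` from an `if __name__ == "__main__":` guard and swap in a MySQL or PostgreSQL connection with matching column types.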
Final Submission Guidelines
- Winning submitters are required to make a Merge Request against the repository specified on the forums
- Add wendell and lazybaer as members of your forked repository.