Challenge Overview
Quartz Energy is monitoring stresses on a structure. The tensions are expressed in kips (Wikipedia) and are monitored by a series of cells which are recording and reporting the data. There are generally 3 cells monitoring each structure and tension readings are reported in 12 hour intervals. For the M data series there are 6 cells per structure and there is one structure – H2 where the cell groupings do not apply.
Currently, there is a manual process in place to both download the data and ensure that the instruments monitoring the stresses are reporting valid values. Our client has asked for Topcoder's help in automating this process.
There will be four steps to the complete process.
A little explanation of the data files:
The key to the data files in the forums can be found here. The link to the data files themselves can be found in the forums attached to this challenge.
Here are the conditions that we'd like to identify in the data:
In this challenge, please prepare a report that identifies the requested anomalies in the data. Produce the following information for each load cell in an Excel file. Data for each Load Cell should be placed on a separate worksheet:
Summary Section
Detail Section: This can all be organized as a single sheet with the dates as rows and the conditions as columns.
Technical Guidelines:
You should provide the following in your submission.zip file
Currently, there is a manual process in place to both download the data and ensure that the instruments monitoring the stresses are reporting valid values. Our client has asked for Topcoder's help in automating this process.
There will be four steps to the complete process.
- Data integration
- Data validation
- Report generation
- Alert notification
A little explanation of the data files:
- Each data filename contains the structure that it pertains too. For example, B_A.csv is the data for Structure B Tower A.
- Each data file may have data related to multiple cells related to the same tower. The cell names are in column A. (e.g. in B_A.csv Column A has 3 distinct values -- Tower A LC 1 Load, Tower A LC 2 Load, Tower A LC 3 Load)
- The H2 data files only contain information about 1 cell.
The key to the data files in the forums can be found here. The link to the data files themselves can be found in the forums attached to this challenge.
Here are the conditions that we'd like to identify in the data:
- The data series from each cell should have some variance because of environmental factors. The plotted readings for a particular cells shouldn't flat line. A series of duplicate values is a red flag.
- The cell groupings are designated in the Channel Name column in the files. For example, Tower A LC 1, Tower A LC2, and Tower A LC3 all refer to cells supporting Tower A. The tension values for all the cells should rise and fall at the same general rate. If a structure has 3 cells and 1 cell isn't reflecting increasing or decreasing tension values in a similar direction to the other two cells, there may be a malfunction with that cell. As stated above there is one structure where the cell groups don't apply: H2.
- The average of the recorded Max, Mean, and Min values should be within 1.5% of the previous week (14 records in the data). Values much different than the previous time period may indicate a cell malfunction or some kind of maintenance activity. There are intermittent invalid values especially in the MIN data series. A single bad value that is widely out of sync with other values should be flagged but isn't necessarily cause for concern.
- Flag duplicate rows. The failures reflected in the data may be intermittent or cover multiple periods in the data.
In this challenge, please prepare a report that identifies the requested anomalies in the data. Produce the following information for each load cell in an Excel file. Data for each Load Cell should be placed on a separate worksheet:
Summary Section
- The number of rows in the data set
- The time duration governed by the data set.
- The number of rows where there are duplicated values to the previous record. (Condition #1 above)
- The number of rows where tensions are not in sync with cells for the same structure. (Condition #2 above)
- The number of times where recorded values are not within 1.5% of the previous 14 periods (Simple Moving Average) for Max, Mean, and Min values (Condition #3). This will be 3 separate values -- one for each data series.
- The number of duplicate rows found. (Condition #4)
Detail Section: This can all be organized as a single sheet with the dates as rows and the conditions as columns.
- List of Dates where readings have flat lined.
- List of Dates where tensions are out of sync.
- List of Dates where recorded values are not with 1.5% of previous week
- List of Dates which contain duplicate data
Technical Guidelines:
- Your code should embrace a modular design that encourages reuse. In the next phase of this project, we’ll be using this code to evaluate the data in an automated way rather than generating a report about it. Portions of the data will be included or excluded from various calculations based on our audit.
- Please make the application configurable where possible. For example, the 14 days moving average – X days.
Final Submission Guidelines
Technology:- Python 3
You should provide the following in your submission.zip file
- Your source code
- Dependency Management and build scripts (pip install, etc)
- Documentation (README.md)