Challenge Overview

Challenge Objectives

In this challenge, you will develop a model that detects anomalies in Supervisory Control and Data Acquisition (SCADA) data using unsupervised machine learning.

Project Background

The project focuses on improving the overall productivity of Wind Turbine Generators (WTGs) by evaluating the condition of their internal components, their anomalous behavior, and their risk of failure. This challenge is part of WTG Predictive Asset Management (PAM), a large data science project that Topcoder is proud to be part of. By taking part in this competition, you not only get a chance to work on an important real-world problem and win good prizes, but you also contribute to the future well-being of our planet.

The customer plans to improve overall productivity by identifying anomalous behavior and then taking corrective action, thereby avoiding failures that lead to downtime. The idea is to identify components that tend to show anomalous behavior due to factors such as deteriorating health, load, and wear and tear.

An anomaly is defined as any deviation from the reference. It is important to analyze both the anomalous behavior and its root cause. Once the root cause is known, corrective actions can be carried out to avoid failures.
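As a minimal illustration of "deviation from the reference", the sketch below assumes the reference is a window of turbine power readings from a period believed to be normal, and measures deviation as a z-score against that window. The helper name `flag_deviations`, the sample values, and the 3-standard-deviation threshold are assumptions for illustration only. Raw power also depends on wind speed, which this univariate sketch deliberately ignores.

```python
import math
import statistics
from typing import List, Tuple


def flag_deviations(reference: List[float], values: List[float],
                    threshold: float = 3.0) -> List[Tuple[int, float, bool]]:
    """Score each value against a reference window (hypothetical helper).

    Returns (index, z-score, is_anomaly) per value, where is_anomaly means
    |z| > threshold.
    """
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)  # needs at least two reference points
    results = []
    for i, value in enumerate(values):
        if std > 0:
            z = (value - mean) / std
        else:
            # Flat reference: any change at all is treated as a deviation.
            z = 0.0 if value == mean else math.copysign(math.inf, value - mean)
        results.append((i, z, abs(z) > threshold))
    return results


# Turbine power (kW) from an assumed-normal period, then new readings.
reference_power = [1500.0, 1520.0, 1490.0, 1510.0, 1505.0, 1495.0]
new_power = [1502.0, 1350.0, 1515.0]
for index, z, is_anomaly in flag_deviations(reference_power, new_power):
    print(index, round(z, 2), is_anomaly)
```

In practice the reference window would be chosen per turbine from a period with no known faults.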

The project aims to capture anomalies at the component, subcomponent, system, and turbine (WTG) level for each turbine (see Prediction Format).

Technology Stack

Python 3.6.x

Individual Requirements

Scope

The focus of this challenge is to explore the ability of unsupervised machine learning to detect anomalous behavior of WTGs at the turbine, system, subsystem, component, and subcomponent levels. There are two possible variable-selection problems to solve here:

- Univariate input, such as turbine power, to detect the anomaly.
- Multivariate inputs, such as different SCADA tags, to detect the anomaly.
  - For example, if at a particular wind speed and a certain rotor rpm the rotor's rotation produces a temperature beyond normal values, this can be considered an anomaly.
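The multivariate example can be sketched by conditioning on the operating regime: group reference records into wind-speed and rotor-rpm bins, learn the normal temperature spread per bin, and flag readings far outside it. The field names (`wind_speed`, `rotor_rpm`, `temperature`), the 1-unit bin widths, and the 3-sigma threshold are hypothetical choices, not actual tag names from the data dictionary.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, float]
BinKey = Tuple[int, int]


def bin_key(record: Record, wind_step: float = 1.0, rpm_step: float = 1.0) -> BinKey:
    """Map a record to its operating regime: (wind speed bin, rotor rpm bin)."""
    return (int(record["wind_speed"] // wind_step),
            int(record["rotor_rpm"] // rpm_step))


def fit_reference(records: List[Record]) -> Dict[BinKey, Tuple[float, float]]:
    """Learn the mean and standard deviation of temperature per regime."""
    grouped = defaultdict(list)
    for record in records:
        grouped[bin_key(record)].append(record["temperature"])
    stats = {}
    for key, temps in grouped.items():
        if len(temps) >= 2:  # statistics.stdev needs at least two samples
            stats[key] = (statistics.mean(temps), statistics.stdev(temps))
    return stats


def is_anomalous(record: Record, stats: Dict[BinKey, Tuple[float, float]],
                 threshold: float = 3.0) -> Optional[bool]:
    """True if temperature is outside the normal range for this wind speed and rpm.

    Returns None when the regime never appeared in the reference data.
    """
    key = bin_key(record)
    if key not in stats:
        return None
    mean, std = stats[key]
    if std == 0:
        return record["temperature"] != mean
    return abs(record["temperature"] - mean) / std > threshold


# Hypothetical reference records, all in the same (wind 8, rpm 12) regime.
reference = [
    {"wind_speed": 8.2, "rotor_rpm": 12.1, "temperature": 60.0},
    {"wind_speed": 8.5, "rotor_rpm": 12.4, "temperature": 61.0},
    {"wind_speed": 8.7, "rotor_rpm": 12.3, "temperature": 59.5},
    {"wind_speed": 8.1, "rotor_rpm": 12.8, "temperature": 60.5},
]
stats = fit_reference(reference)
print(is_anomalous({"wind_speed": 8.4, "rotor_rpm": 12.5, "temperature": 75.0}, stats))
```

Returning None for unseen regimes keeps "no reference available" distinct from "normal"; a real model would need enough reference data to cover the operating envelope.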

When detecting an anomaly, the model should also provide a confidence score.
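The challenge does not prescribe how confidence is computed. One possible approach, shown here as an assumption rather than a requirement, is to report the empirical percentile of an anomaly score among scores observed during reference (assumed-normal) operation. `EmpiricalConfidence` is a hypothetical helper name, and the scores could come from any detector (for example, absolute z-scores).

```python
import bisect
from typing import List


class EmpiricalConfidence:
    """Turn raw anomaly scores into a confidence in [0, 1] (illustrative only).

    Confidence is the fraction of reference (assumed-normal) scores that are
    strictly lower than the observed score, i.e. its empirical percentile.
    """

    def __init__(self, reference_scores: List[float]) -> None:
        if not reference_scores:
            raise ValueError("reference_scores must not be empty")
        self._sorted = sorted(reference_scores)

    def confidence(self, score: float) -> float:
        # bisect_left counts how many reference scores are strictly below score.
        below = bisect.bisect_left(self._sorted, score)
        return below / len(self._sorted)


scorer = EmpiricalConfidence([0.1, 0.5, 0.9, 1.2, 2.0])
print(scorer.confidence(3.5), scorer.confidence(1.0))
```

A percentile-based confidence is easy to explain to the client, but it only becomes meaningful with a large reference sample.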

Data Analysis

You need to perform data analysis on the given dataset and save the notebook, which should be shared along with the submission. The notebook should properly cover the following items:

Feature importance and selection procedures, preferably illustrated with graphs
If you tried other methods before settling on an approach, you can also keep that work in the notebook for reference
Code should be documented appropriately within the code, with explanations of how the different parts of the model work.
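As one example of a documented feature selection step, the sketch below drops SCADA tags that are constant or that nearly duplicate an already-kept tag (absolute Pearson correlation of 0.95 or more). This is a common redundancy filter for unsupervised models, not a procedure required by the challenge; the tag names and the threshold are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def drop_redundant_tags(series: Dict[str, List[float]],
                        max_abs_corr: float = 0.95) -> List[str]:
    """Greedily keep tags that are informative and not redundant.

    Tags are visited in alphabetical order so the result is deterministic.
    """
    kept = []
    for tag in sorted(series):
        values = series[tag]
        if max(values) == min(values):
            continue  # constant tag: carries no information for the model
        if all(abs(pearson(values, series[other])) < max_abs_corr for other in kept):
            kept.append(tag)
    return kept


# Hypothetical tags: oil temperature tracks bearing temperature exactly.
series = {
    "gearbox_bearing_temp": [60.0, 62.0, 65.0, 70.0, 72.0],
    "gearbox_oil_temp": [50.0, 52.0, 55.0, 60.0, 62.0],
    "nacelle_temp": [20.0, 25.0, 21.0, 24.0, 22.0],
    "yaw_status": [1.0, 1.0, 1.0, 1.0, 1.0],
}
print(drop_redundant_tags(series))
```

In the notebook, the correlation matrix behind this filter can also be plotted as a heatmap to satisfy the "preferably illustrated with graphs" item.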

Dataset

The following inputs are provided in the forum:

SCADA dataset for 5 years
Data Dictionary
Asset hierarchy and a mapping of tags to components and subcomponents, which can be used to determine whether an anomaly is detected at the turbine, system, subsystem, component, or subcomponent level.
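One way to use the asset hierarchy, assuming each tag maps to a system/subsystem/component/subcomponent path, is to report an anomaly at the deepest level shared by all tags involved: a single gearbox bearing tag points to a subcomponent, while gearbox and generator tags together only point to their shared system. The tag names and hierarchy entries below are invented for illustration; the real mapping comes from the provided asset hierarchy file.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ("Turbine", "System", "Subsystem", "Component", "Subcomponent")

# Hypothetical excerpt of a tag -> (system, subsystem, component, subcomponent)
# mapping; replace with the asset hierarchy provided in the forum.
TAG_HIERARCHY = {
    "GBX_BRG_TEMP": ("Drivetrain", "Gearbox", "Bearing", "HSS Bearing"),
    "GBX_OIL_TEMP": ("Drivetrain", "Gearbox", "Lubrication", "Oil Sump"),
    "GEN_WDG_TEMP": ("Drivetrain", "Generator", "Stator", "Winding"),
}


def locate_anomaly(turbine: str,
                   tags: List[str]) -> Tuple[str, Dict[str, Optional[str]]]:
    """Return the deepest hierarchy level shared by all anomalous tags."""
    paths = [TAG_HIERARCHY[tag] for tag in tags]
    common = []
    # Walk down the hierarchy while every tag agrees on the same node.
    for parts in zip(*paths):
        if all(part == parts[0] for part in parts):
            common.append(parts[0])
        else:
            break
    padded = common + [None] * (len(LEVELS) - 1 - len(common))
    location = {"Turbine": turbine}
    location.update(zip(LEVELS[1:], padded))
    return LEVELS[len(common)], location


print(locate_anomaly("WTG01", ["GBX_BRG_TEMP", "GBX_OIL_TEMP"]))
```

Levels below the shared node are left empty, which maps directly onto blank System/Subsystem/Component/Subcomponent cells in the output.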

Prediction Format

You must submit a CSV file containing the following columns for the given dataset:

Timestamp, Turbine, System, Subsystem, Component, Subcomponent, Confidence, Scada Tags

A template, provided in the forum, can be used as a reference for the output format.
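A minimal sketch of writing rows in the listed column order with Python's standard `csv` module. The forum template is the authority on the exact format; the semicolon-separated `Scada Tags` cell, the timestamp string, and the three-decimal confidence are assumptions, and `write_predictions` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

COLUMNS = ["Timestamp", "Turbine", "System", "Subsystem", "Component",
           "Subcomponent", "Confidence", "Scada Tags"]


def write_predictions(rows: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Write anomaly rows in the column order listed under Prediction Format."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        # Assumption: multiple tags share one cell, separated by semicolons.
        record["Scada Tags"] = ";".join(row["Scada Tags"])
        record["Confidence"] = "{:.3f}".format(row["Confidence"])
        writer.writerow(record)  # None values are written as empty cells


buffer = io.StringIO()
write_predictions([{
    "Timestamp": "2019-01-01 00:10:00", "Turbine": "WTG01",
    "System": "Drivetrain", "Subsystem": "Gearbox", "Component": None,
    "Subcomponent": None, "Confidence": 0.87,
    "Scada Tags": ["GBX_BRG_TEMP", "GBX_OIL_TEMP"],
}], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.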

Evaluation

Since this is unsupervised learning, there won’t be any objective evaluation. The client will select the best model for this use case based on their internal evaluation.

Deployment Guide

Make sure you provide a README.md that covers how to run the script in any environment.

Final Submission Guidelines

Submit the following:

Data Analysis Code Notebook
Model as Python script
Documentation

Your submission should include a text, .doc, PPT or PDF document that includes the following sections and descriptions:

Overview: describe your approach in “layman's terms”
Methods: describe what you did to come up with this approach, e.g., literature search, experimental testing, etc. If you augmented any of the ideas provided as input, describe your innovations.
Materials: Did your approach use a specific technology beyond Jupyter? Any libraries? List all tools and libraries you used.
Discussion: Include your analysis in this section. Explain what you attempted, considered, or reviewed that worked, and especially what didn’t work or what you rejected. For anything that didn’t work or was rejected, briefly explain why (e.g., such-and-such needs more data than we have). If you are pointing to somebody else’s work (e.g., you’re citing a well-known implementation or literature), describe in detail how that work relates to this work and what would have to be modified.
Data: What other data should one consider? Is it derived? Is it necessary in order to achieve the aims? Also, what about the data described/provided: is it enough?
Assumptions and Risks: What are the main risks of this approach, and what assumptions are you and the model making? What are the pitfalls of the dataset and approach?
Results: Did you implement your approach? How did it perform? Provide some suggested approaches to evaluate your results.
Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include

PAM Anomaly Detection

Review style

Final Review

Approval

ID: 30095641