The challenge is finished.

Challenge Overview

Introduction

This challenge asks you to use a machine learning package of your choice to perform a few tasks on a supplied dataset. Examples are Scikit-learn (a Python library) or Weka (a Java executable). The dataset contains data collected from a variety of sensors distributed throughout a house. Each room (call it Room1, Room2, ...) has a few sensors (sensor 11, sensor 12, …, sensor 1k), and the data is organized in text files (zipped in the challenge assets) as follows:

[Room1, Room2, ...].csv

Files containing the sensor data captured in the given room.

These comma-separated files contain roughly three weeks' worth of captured data, with the sensors sampled once every two seconds over that period.

data.config

A comma-separated config file specifying the labels of the features in the Room data files.

The units are given in parentheses below (this metadata is not in the data.config file).

The features are:

    date  - The local sensor date of sample.

    timestamp - The local sensor time of sample.

    light (ohms) - Ranges from “infinite” (99999999), meaning pitch black, down to 0, meaning very bright.

    IR sensor - Infrared (motion) readings range from 0 (no activity) to 3.34 (sensor tripped).

    analog temperature (F) - analog reading of local temperature

    digital temperature (F) - digital reading of local temperature on humidity sensor

    fast humidity (%) - the sensor’s quick calculation of humidity

    slow humidity (%)  - the sensor’s slower calculation of humidity
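The file layout described above can be read along the following lines. This is only a sketch under stated assumptions: that data.config holds the column labels as comma-separated text, that the room files have no header row, and that the light column is labeled "light" (a hypothetical name; use whatever data.config actually contains). The helper names read_labels and load_room are illustrative and not part of the challenge assets.

```python
import io

import pandas as pd

LIGHT_INFINITE = 99999999  # documented "infinite" (pitch black) light reading


def read_labels(config_source):
    """Read comma-separated column labels from a data.config-style source.

    Assumption: labels are separated by commas and/or newlines.
    """
    text = config_source.read()
    return [part.strip() for part in text.replace("\n", ",").split(",") if part.strip()]


def load_room(csv_source, labels, light_col="light"):
    """Load one room file, assuming it has no header row."""
    frame = pd.read_csv(csv_source, header=None, names=labels)
    # Flag the sentinel so it is not mistaken for an ordinary measurement later.
    if light_col in frame.columns:
        frame["light_is_infinite"] = frame[light_col] == LIGHT_INFINITE
    return frame


# Usage with in-memory stand-ins for data.config and Room1.csv (made-up rows).
demo_labels = read_labels(io.StringIO("date,timestamp,light\n"))
demo_frame = load_room(
    io.StringIO("2016-01-01,00:00:02,99999999\n2016-01-01,00:00:04,512\n"),
    demo_labels,
)
```

With real files, the same functions would be called on open("data.config") and open("Room1.csv"). Flagging the 99999999 reading separately matters because a plain statistical rule would otherwise treat every dark period as an extreme value.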


Requirements

There will be 3 weeks of data for you to work with. Your submission needs to address the following requirements:

  1. Outlier detection: use your favorite statistics/machine learning package (see above for our recommendations) to detect and remove the outliers for each of the sensors in our dataset. The output should be a cleaned data file, along with some statistics on the outlier detection/removal for each sensor value. Specify how you achieved these results: in addition to comments in the code, supply a readme describing your procedure and your results.

  2. Prediction: using the data, develop a classification system to predict the IR sensor’s value (bonus points for generalizing it to predict any of the sensors). If you’d like to, you can add additional parameters to force a particular type of classifier. Please refer to the README.txt file in the provided zip file in the challenge assets.

  3. Variable dependence: again using your favorite statistics/machine learning package, find which sensors/features are dependent on each other (e.g., negative or positive correlation). Produce an output and a readme that describe the variables, the dependencies, and how they are related.
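For requirement 1, one possible starting point is Tukey's interquartile-range rule applied per sensor column. The challenge does not prescribe a method, so this is a swapped-in example rather than the expected solution; the function name remove_outliers_iqr, the multiplier k, and the column names in the demo are all assumptions.

```python
import pandas as pd


def remove_outliers_iqr(frame, columns, k=1.5):
    """Drop rows where any listed column falls outside its IQR fences.

    Returns the cleaned frame and a per-sensor statistics table.
    Missing values are counted as outliers because they fail the range check.
    """
    keep = pd.Series(True, index=frame.index)
    stats = []
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        inside = frame[col].between(lower, upper)
        stats.append(
            {
                "sensor": col,
                "lower": lower,
                "upper": upper,
                "n_outliers": int((~inside).sum()),
            }
        )
        keep &= inside
    return frame[keep], pd.DataFrame(stats).set_index("sensor")


# Made-up temperature readings with one obvious glitch.
demo = pd.DataFrame({"analog_temp": [70.0, 71.0, 72.0, 71.0, 70.0, 500.0]})
cleaned, outlier_stats = remove_outliers_iqr(demo, ["analog_temp"])
```

The statistics table gives the per-sensor counts and fences that the requirement asks to report, and cleaned.to_csv(...) would produce the cleaned data file. A fixed IQR rule may be too aggressive for heavily skewed channels such as light, so the choice of k (or of a different detector) is worth justifying in the readme.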

 

As mentioned above, you may utilize any readily available packages to perform your analyses. This is real data from actual sensors in a house; you may want to look at the raw data and see if it matches your analyses.
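As one illustration of checking analyses against the raw data, pairwise correlation between sensor columns can be computed with pandas and the strongest pairs listed for manual inspection. This is a sketch, not a prescribed approach: the function name correlated_pairs, the 0.5 threshold, and the demo column names are assumptions, and correlation only captures monotonic or linear dependence depending on the method chosen.

```python
import pandas as pd


def correlated_pairs(frame, threshold=0.5, method="pearson"):
    """List column pairs whose absolute correlation meets the threshold.

    Returns (column_a, column_b, r) tuples, strongest first.
    """
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr(method=method)
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            # NaN correlations (e.g. constant columns) fail this comparison.
            if abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Made-up readings: the two humidity columns move together, the third does not.
demo = pd.DataFrame(
    {
        "fast_humidity": [1.0, 2.0, 3.0, 4.0],
        "slow_humidity": [2.0, 4.0, 6.0, 8.0],
        "unrelated": [1.0, -1.0, -1.0, 1.0],
    }
)
pairs = correlated_pairs(demo)
```

Passing method="spearman" instead would pick up monotonic but non-linear relationships, which may suit the light channel better given its very wide range.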

Feel free to carry out the task as you see fit, provided you explicitly state your rationale for approaching the problem in that way.  For example, if you choose to set this up as a multi-class classification problem, establish what classes your system is going to identify, and how your system achieves this downstream.
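To make the class definition explicit, one option is to collapse the IR reading into two classes (tripped or not) and train a standard classifier on the remaining features. Everything specific here is an assumption made for illustration: the 1.67 threshold (the midpoint of the documented 0 to 3.34 range), the column names "ir" and "light", the choice of a scikit-learn decision tree, and the unshuffled split used to keep later samples out of the training set.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def ir_to_class(ir_values, threshold=1.67):
    """Map raw IR readings to 0 (no activity) or 1 (sensor tripped)."""
    return (np.asarray(ir_values) >= threshold).astype(int)


def train_ir_classifier(frame, feature_cols, ir_col="ir", threshold=1.67):
    """Fit a decision tree on the earliest 75% of rows and score the rest."""
    X = frame[feature_cols].to_numpy()
    y = ir_to_class(frame[ir_col], threshold)
    # shuffle=False keeps the time order, so the test rows follow the training rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, shuffle=False
    )
    model = DecisionTreeClassifier(max_depth=5, random_state=0)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Synthetic data: motion is recorded exactly when the light resistance is low.
light = [100.0, 900.0] * 20
demo = pd.DataFrame({"light": light, "ir": [3.34 if v < 500 else 0.0 for v in light]})
model, accuracy = train_ir_classifier(demo, ["light"])
```

On real data the classes are likely to be heavily imbalanced (most samples show no activity), so plain accuracy can be misleading; precision and recall for the tripped class would be a more informative thing to report.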

Provided Artefacts

Please download the data from here - https://www.dropbox.com/s/edcyruja2but4ry/oos_challenge.zip?dl=0


Have fun!



Final Submission Guidelines

The submissions on this challenge will be evaluated on the following criteria:

- Number/extent of requirements met as mentioned above

- Quality of documentation/supporting analysis provided

Please submit the following:

- Complete source code for your implementation along with any test data. Your scripts must be able to run these tasks with different parameters.

- Detailed Deployment and Verification Guides

- Detailed justification/analysis for the chosen approach and traceability to requirements
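Since the scripts must run the tasks with different parameters, a small command-line interface is one way to expose them. The sketch below uses Python's argparse with one subcommand per requirement; the subcommand and flag names are hypothetical and chosen only for illustration.

```python
import argparse


def build_parser():
    """Build a CLI with one subcommand per challenge requirement."""
    parser = argparse.ArgumentParser(
        description="Sensor data challenge tasks (illustrative CLI)."
    )
    sub = parser.add_subparsers(dest="task", required=True)

    outliers = sub.add_parser("outliers", help="detect and remove outliers")
    outliers.add_argument("--input", required=True, help="room CSV file")
    outliers.add_argument("--iqr-k", type=float, default=1.5, help="IQR multiplier")

    predict = sub.add_parser("predict", help="train a classifier for one sensor")
    predict.add_argument("--input", required=True, help="room CSV file")
    predict.add_argument("--target", default="ir", help="column to predict")

    dependence = sub.add_parser("dependence", help="report correlated features")
    dependence.add_argument("--input", required=True, help="room CSV file")
    dependence.add_argument("--threshold", type=float, default=0.5)

    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["outliers", "--input", "Room1.csv", "--iqr-k", "3.0"]
)
```

In a real script, build_parser().parse_args() would be called without arguments so that the values come from the command line, and each subcommand would dispatch to the corresponding analysis function.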

 

 

ELIGIBLE EVENTS:

2016 TopCoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30052088