Challenge Overview

In this challenge we will develop a utility to dump data from the IoTHub and EventHubs to help identify issues in the IoT applications. 
The Java utility will dump data from one IoTHub and two EventHubs into well-defined CSV files, based on user input parameters. There are typically 4 to 32 partitions set up in the IoTHub/EventHubs. Usually, the data sent from a device (identified by a UUID) stick to the same partition in the IoTHub/EventHub, but we cannot rely on this happening all the time, because data with the same device UUID are sometimes sent from different connections.

Attached to the forums is a Java project you can use as the base code. For connecting to the Azure IoTHub/EventHub, we have created Azure services with enough bandwidth that you can use for testing. Connection details are in the forums.

The utility shall support a command-line interface. A common command-line parsing library and a logging library (e.g. log4j) shall be used. It shall run on both Windows and Linux (e.g. CentOS) with the latest JVM. The utility will be used to diagnose the data pushed from the devices, so we should expect invalid data and exception cases, for example the wrong type of data being pushed to the wrong IoTHub/EventHub. It is important to export the EnqueuedTime, the properties of the data, and the payload of the data, as well as the partition id. These are the data flows from the IoT devices:

Message Schemas:
Each message has a JSON string payload and a set of system and application properties. Below is the list of custom properties. A single ASCII char is defined as the key. See https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-messages-construct
  • ‘a’ (custom, required): A string to categorize the data (e.g. "DeviceTree", etc.).
  • ‘p’ (custom, optional): A string representing the UUID of the device or gateway that sent the data.
These system properties are important:
  • EnqueuedTime: Timestamp indicating when the message was received (enqueued) by the IoTHub/EventHub.
  • ConnectionDeviceId: The source gateway or device’s UUID. This typically matches the application property ‘p’. If they are different, it usually indicates a problem.
The message size can be calculated this way.
  • MessageSize = The body size in bytes + the size in bytes of all the values of the message system properties + the size in bytes of all user property names and values in ASCII.
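The formula above can be sketched in Java as follows. This is only an illustration: the class name MessageSizeSketch and the use of plain String maps are assumptions, and a real implementation would first read the properties from whatever message object the Azure client library returns. It also assumes the system property values have already been rendered as ASCII text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class MessageSizeSketch {

    /**
     * MessageSize = body bytes
     *             + bytes of all system property values
     *             + bytes of all application property names and values.
     */
    static long messageSize(byte[] body,
                            Map<String, String> systemProperties,
                            Map<String, String> applicationProperties) {
        long size = (body == null) ? 0 : body.length;

        // System properties: only the values are counted, not the names.
        for (String value : systemProperties.values()) {
            size += asciiLength(value);
        }
        // Application (user) properties: both names and values are counted.
        for (Map.Entry<String, String> entry : applicationProperties.entrySet()) {
            size += asciiLength(entry.getKey());
            size += asciiLength(entry.getValue());
        }
        return size;
    }

    // Assumption: property text is ASCII, so each character is one byte.
    private static int asciiLength(String text) {
        return (text == null) ? 0 : text.getBytes(StandardCharsets.US_ASCII).length;
    }
}
```

For example, a 2-byte body, one system property value of 3 characters, and the application properties a=Trends and p=xy would give 2 + 3 + (1 + 6 + 1 + 2) = 15.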
There are three types of payload data. The examples are below.
  • a="DeviceTree"
{
  "d":{
    "d":"c98b007d-5d5e-4f25-b185-8df0fdc8515d",         // required
    "profile":"eb62853a-d42d-4b91-8dc0-865b7a400f45",   // required
    "name":"MyDeviceName",                              // required
    "serial":"MySerial",                                // optional
    "asset":"MyAsset",                                  // optional
    "mac":"00:0a:95:9d:61:19",                          // optional
    "ds":[                                             
      {
        "d":"2d778391-c9b0-414b-ae76-a4d5dfd50f0d",
        "profile":"87896aeb-ac7a-4031-a572-d9c14f62754f",
        "name":"MySubDeviceName"
      }, …
    ]
  }
}
  • a="Realtimes"
{
  "realtimes":[
    {
      "d":"a945a223-8ebc-4c8e-aba3-cf837bbbdf62",   // optional, skip if this device UUID is the same as the p property
      "c":"1234",    // required, the channel integer tag, either the MCL channel tag or the custom channel tag.
      "t":1456198800, // required, timestamp
      "t_ms":0,             // optional, ms offset
      "v":"120.34"         // required, the value
    }, …    
  ]
}
  • a="Trends"
{
  "trends":[
    {
      "d":"a945a223-8ebc-4c8e-aba3-cf837bbbdf62",   // optional, skip if this device UUID is the same as the p property
      "c":"1234",       // required, the channel integer tag, either the MCL channel tag or the custom channel tag.
      "t":1456198800,   // required, timestamp
      "v":"120.34",     // optional
      "avg":"120.39",   // optional
      "min":"119.2",    // optional
      "max":"121.9"     // optional
    }, …   
  ]
}


Users can configure and input these parameters to control the dumped outputs. Note: All timestamps shall be in UTC ISO 8601 format (e.g. 2017-11-27T13:48:28Z), unless we explicitly call out the local time format (this should be configurable).
Configurations:
  • IoTHub’s connection string
  • EventHubs’ connection strings (two)
  • Log4J configuration file
Inputs:
  • Gateway UUID: Dump all data whose ‘p’ property = UUID.
  • Time Range in UTC format, which can be open ended. If the start time is missing, dump data since the beginning of the queue in the IoTHub/EventHub. If the end time is missing, dump data to the end of the queue.
  • Options to dump data (1) just from the IoTHub, (2) just from the Device Tree EventHub, (3) just from the Trends EventHub, (4) from all of them, etc.
  • Parameters to control the output data file size/lines or to partition the output files by date
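As an illustration of the open-ended time range input, the sketch below parses optional start and end values in UTC ISO 8601 format and tests whether an EnqueuedTime falls inside the range. The class name TimeRangeSketch is hypothetical, and treating both bounds as inclusive is an assumption that the requirements do not state.

```java
import java.time.Instant;

final class TimeRangeSketch {
    private final Instant start; // null means "from the beginning of the queue"
    private final Instant end;   // null means "to the end of the queue"

    private TimeRangeSketch(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    /** Parses values such as 2017-11-27T13:48:28Z; a null or blank value leaves that side open. */
    static TimeRangeSketch parse(String startText, String endText) {
        Instant start = isBlank(startText) ? null : Instant.parse(startText.trim());
        Instant end = isBlank(endText) ? null : Instant.parse(endText.trim());
        if (start != null && end != null && start.isAfter(end)) {
            throw new IllegalArgumentException("start time is after end time");
        }
        return new TimeRangeSketch(start, end);
    }

    /** Assumption: both bounds are inclusive. */
    boolean contains(Instant enqueuedTime) {
        if (start != null && enqueuedTime.isBefore(start)) {
            return false;
        }
        if (end != null && enqueuedTime.isAfter(end)) {
            return false;
        }
        return true;
    }

    private static boolean isBlank(String text) {
        return text == null || text.trim().isEmpty();
    }
}
```

Instant.parse throws a DateTimeParseException on malformed input, which the command-line layer could catch and report as a usage error.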
The dumped data files shall be in CSV format, and can be opened in Excel. The tool doesn't need to do anything specific if the data format in a record is invalid; just write the record to the output file.
Design the layout to allow easy manipulation, filtering and searching in Excel. Partition the output files to make sure that they are not too big to be opened in Excel.
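Because the payload column holds JSON, which is full of commas and double quotes, each field needs CSV quoting before it is written. The sketch below applies the usual RFC 4180 style convention (wrap the field in double quotes and double any embedded quote), which Excel reads as a single cell. The class name CsvLineSketch is hypothetical; an established CSV library would serve equally well.

```java
final class CsvLineSketch {

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuoting = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-ordered column values into one CSV record (without the line terminator). */
    static String toLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields[i]));
        }
        return line.toString();
    }
}
```

When choosing the lines-per-file parameter, note that an Excel worksheet holds at most 1,048,576 rows, so each output file should stay below that limit.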

 
These columns will be included in the dump CSV files. You might want to add more columns if it makes sense.
  • IoTHub/EventHub name: This information is in the connection string.
  • IoTHub/EventHub partition id: The partition (e.g. 1 to 32) from which the data are dumped.
  • EnqueuedTime: From the IoTHub/EventHub system properties.
  • EnqueuedTimeLocal
  • ConnectionDeviceId: From the IoTHub/EventHub system properties.
  • P Property: From the application properties.
  • A Property: From the application properties.
  • MessageSize: Calculated.
  • Payload: JSON payload, with all spaces and newlines removed. Note: Make sure the JSON string loads properly into a single cell in Excel.
  • PayloadTimeStampStart: The first timestamp inside the payload. You do not need to sort the data. Ignore the ms portion.
  • PayloadTimeStampEnd: The last timestamp inside the payload. You do not need to sort the data. Ignore the ms portion.
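One possible way to fill PayloadTimeStampStart and PayloadTimeStampEnd is sketched below. Since invalid payloads are expected, it scans the raw text for "t" fields with a regular expression instead of requiring the JSON to parse; this is a heuristic and a design choice of this sketch only. The class name PayloadTimestampSketch is hypothetical, and it assumes "t" is a Unix timestamp in seconds, which is consistent with the example value 1456198800 but not stated explicitly.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PayloadTimestampSketch {

    // Matches "t":<digits>. It does not match "t_ms", because the closing
    // quote must follow the t directly, so the ms portion is ignored.
    private static final Pattern T_FIELD = Pattern.compile("\"t\"\\s*:\\s*(\\d+)");

    /**
     * Returns {first, last} in payload order, without sorting.
     * Both entries are null when no usable "t" field is found.
     */
    static Instant[] firstAndLast(String payload) {
        Instant first = null;
        Instant last = null;
        if (payload != null) {
            Matcher matcher = T_FIELD.matcher(payload);
            while (matcher.find()) {
                Instant parsed;
                try {
                    // Assumption: "t" holds seconds since the Unix epoch.
                    parsed = Instant.ofEpochSecond(Long.parseLong(matcher.group(1)));
                } catch (NumberFormatException | DateTimeException e) {
                    continue; // skip values too large to be a timestamp
                }
                if (first == null) {
                    first = parsed;
                }
                last = parsed;
            }
        }
        return new Instant[] {first, last};
    }
}
```

Instant.toString() already produces the UTC ISO 8601 form required for the output, so the two values can be written to the CSV directly, or left empty when they are null (for example for DeviceTree payloads, which carry no "t" field).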

The tool should be able to process hundreds of thousands of messages per day in production, and you should create a small script to test it with 100k messages.
 

Final Submission Guidelines

Submit the complete source code for the tool.
Submit a deployment guide.
Submit a verification guide.
Submit a short demonstration video (unlisted YouTube link).

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30060807