Challenge Overview

Overview

Create an application that will process the output from the Web Traffic Scan Challenge and identify REST-based API endpoints.  The goal of this application is to separate standard web traffic from REST-based API web traffic, discard the standard traffic, then create a summarized report of the REST-based traffic.  This application consumes input in the form of CSV and creates output in plain text.  

The algorithm used to determine REST-based API traffic is to be determined by the challenge submitter.  
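Because the detection algorithm is left to the submitter, the following is only one possible starting point, not a prescribed method. It is a minimal Python sketch that assumes each CSV row has been loaded as a dictionary keyed by the field names listed under Interface. The MIME types, path hints, and method set are illustrative assumptions and should be tuned against real traffic.

```python
# Hypothetical heuristic: none of these hint lists come from the challenge text.
API_MIME_TYPES = {"application/json", "application/xml", "text/xml"}
API_PATH_HINTS = ("/api/", "/rest/", "/v1/", "/v2/")
WRITE_METHODS = {"PUT", "PATCH", "DELETE"}


def base_mime(value):
    """Strip parameters such as '; charset=utf-8' and lowercase the MIME type."""
    return (value or "").split(";")[0].strip().lower()


def looks_like_rest(row):
    """Return True if a row (dict keyed by CSV header) resembles REST API traffic."""
    response_type = base_mime(row.get("Response Content-Type"))
    request_type = base_mime(row.get("Request Content-Type"))
    path = (row.get("HTTP Path") or "").lower()
    method = (row.get("HTTP Method") or "").upper()

    # Assumption: HTML responses are standard web traffic and are discarded.
    if "html" in response_type:
        return False
    # Structured request or response payloads suggest an API call.
    for mime in (response_type, request_type):
        if mime in API_MIME_TYPES or mime.endswith("+json"):
            return True
    # Conventional API path segments.
    if any(hint in path for hint in API_PATH_HINTS):
        return True
    # Methods that browsers rarely issue for ordinary page loads.
    return method in WRITE_METHODS
```

A heuristic like this will misclassify some traffic (for example, a JSON file served as a static asset), so the sample CSV should include such edge cases.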

Interface

The application should be command-line based.  It should take a single CSV file as input (generated using the provided solution from the previous challenge) and create a single TXT file as output.  The output file should be UTF-8 encoded and use Windows (CRLF) line endings.  
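The command-line interface and output encoding described above could be sketched in Python as follows. The argument names input_csv and output_txt and the helper names are assumptions made for illustration; the challenge does not specify them.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: the input CSV and the output TXT."""
    parser = argparse.ArgumentParser(
        description="Summarize REST-based API traffic from a web traffic CSV."
    )
    parser.add_argument("input_csv", help="CSV file produced by the previous challenge")
    parser.add_argument("output_txt", help="path of the TXT report to create")
    return parser.parse_args(argv)


def write_report(path, lines):
    """Write report lines as UTF-8 text with Windows line endings."""
    # newline="\r\n" makes Python translate every "\n" written into "\r\n",
    # regardless of the platform the application runs on.
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Setting the newline explicitly keeps the output identical on Windows and Mac OS X, rather than relying on the platform default.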

The challenge submitter is responsible for creating their own CSV file for development and testing.  This file should have a mix of normal web traffic and REST-based API traffic.  

The CSV file will have a header row and contain the following fields:
Client IP
Server IP
Authorization
Request Content-Type
Request Content-Length
Request Host
Request Date
User-Agent
Content-Encoding
Response Content-Type
Response Content-Length
Response Date
Server
Status
Server Port
HTTP Method
HTTP Path
Request Body (truncated - up to 24 characters)
Response Body (truncated - up to 24 characters)
Encrypted (true if the TCP connection was encrypted, false otherwise)
Certificate common name (for encrypted connections)
Certificate organization (for encrypted connections)
Certificate issuer (for encrypted connections)
Certificate Expiration Date (for encrypted connections)
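A file with this layout can be read with Python's standard csv module. The sketch below assumes the header row uses the field names exactly as listed above (for example "Request Host" and "Encrypted"); the actual header text produced by the previous challenge's solution may differ and should be checked. The helper names and sample values are hypothetical.

```python
import csv
import io


def read_rows(handle):
    """Yield one dict per CSV record, keyed by the stripped header names."""
    reader = csv.DictReader(handle)
    for record in reader:
        yield {
            key.strip(): (value or "").strip()
            for key, value in record.items()
            if key is not None  # skip surplus columns that have no header
        }


def is_encrypted(row):
    """Interpret the Encrypted column, which holds the text 'true' or 'false'."""
    return row.get("Encrypted", "").lower() == "true"


# Shortened sample with made-up values; a real file carries all listed fields.
sample = io.StringIO(
    "Client IP,Server IP,Request Host,HTTP Method,HTTP Path,Encrypted\n"
    "10.0.0.5,192.0.2.10,api.example.com,GET,/v1/items,true\n"
)
rows = list(read_rows(sample))
```

When reading from disk, the csv documentation recommends opening the file with newline="" so that line breaks embedded in quoted fields are handled correctly.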

The TXT output file should have the following format:

Host (or Server IP address for any API endpoints for which a Host header was not specified)

Host / Server IP 1 (use Server IP if no hostname can be determined)
   Certificate common name (for encrypted connections)
   Certificate organization (for encrypted connections)
   Certificate issuer (for encrypted connections)
   Certificate Expiration Date (for encrypted connections)

      HTTP Path and Port #1
         Encrypted (true or false)
         Response Content-Type (list of MIME types used at this API endpoint)
         Request Content-Types: (if applicable like in POST and PUTs)
         HTTP Methods used: (list of HTTP methods used at this API endpoint)
         Authorization Method: (Basic, Bearer, No Authentication, or Other)

      HTTP Path and Port #2
         Encrypted (true or false)
         Response Content-Type (list of MIME types used at this API endpoint)
         Request Content-Types: (if applicable like in POST and PUTs)
         HTTP Methods used: (list of HTTP methods used at this API endpoint)
         Authorization Method: (Basic, Bearer, No Authentication, or Other)

Host / Server IP 2
   ...

      HTTP Path and Port #1
         ...

      HTTP Path and Port #2
         ...

Host / Server IP N
   ...

      HTTP Path and Port #1
         ...

      HTTP Path and Port #2
         ...
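The Authorization Method line in the format above has four possible values. One way to derive it is sketched below in Python, assuming the Authorization CSV field holds the raw header value (for example "Basic dXNlcjpwYXNz"). That assumption should be verified against the actual CSV, and the function name is hypothetical.

```python
def authorization_method(header_value):
    """Map an Authorization header value to one of the four report labels."""
    value = (header_value or "").strip()
    if not value:
        return "No Authentication"
    # The scheme is the first whitespace-delimited token and is case-insensitive.
    scheme = value.split(None, 1)[0].lower()
    if scheme == "basic":
        return "Basic"
    if scheme == "bearer":
        return "Bearer"
    return "Other"
```

Schemes such as Digest or Negotiate fall into "Other" under this mapping.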

Grouping and Redundancy

The information in the output file should be grouped by Host / Server IP and certificate information.  For example, if there are 100 API endpoints found at google.com, using certificates with the same name, organization, issuer, and expiration date, the application should list only one google.com host.  However, if other hostnames are used (like m.google.com) or if additional certificates are found in use by the same host, you should start a new Host / Server IP grouping.

The application should also remove redundant information from the output.  For example, if a given API endpoint is called 100 times, for the same host, with the same certificate information, then it should only be listed in the output one time.  
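The grouping and de-duplication rules above can be sketched in Python with a nested dictionary keyed first by host and certificate details, then by path and port. This assumes rows are dictionaries keyed by the CSV field names and have already been filtered down to REST-based traffic. The function name and the shape of the result are illustrative choices, not part of the requirements.

```python
from collections import defaultdict

CERT_FIELDS = (
    "Certificate common name",
    "Certificate organization",
    "Certificate issuer",
    "Certificate Expiration Date",
)


def group_endpoints(rows):
    """Map (host, certificate tuple) -> (path, port) -> merged endpoint details."""
    groups = defaultdict(dict)
    for row in rows:
        # Fall back to the Server IP when no Host header was recorded.
        host = row.get("Request Host") or row.get("Server IP", "")
        cert = tuple(row.get(field, "") for field in CERT_FIELDS)
        endpoint_key = (row.get("HTTP Path", ""), row.get("Server Port", ""))
        endpoint = groups[(host, cert)].setdefault(
            endpoint_key, {"methods": set(), "response_types": set()}
        )
        # Sets collapse repeated calls, so each value is reported only once.
        if row.get("HTTP Method"):
            endpoint["methods"].add(row["HTTP Method"])
        if row.get("Response Content-Type"):
            endpoint["response_types"].add(row["Response Content-Type"])
    return groups
```

A different hostname or any change in the certificate tuple produces a new outer key, which corresponds to starting a new Host / Server IP grouping in the report.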

System Requirements

The application must run on Windows.  Mac OS X support is optional.

Documentation

Include a README file with installation and configuration instructions.

Final Submission Guidelines

Deliverables
All source code to implement the requirements.
A sample CSV input file.  Please ensure no sensitive information is in this file.  
A sample TXT output file.  Please ensure no sensitive information is in this file.  
README file containing installation and configuration documentation.
Verification document containing steps to verify your solution.

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30054446