Challenge Overview
Overview
Create an application that will process the output from the Web Traffic Scan Challenge and identify REST-based API endpoints. The goal of this application is to separate standard web traffic from REST-based API web traffic, discard the standard traffic, then create a summarized report of the REST-based traffic. This application consumes input in the form of CSV and creates output in plain text.
The algorithm used to determine REST-based API traffic is to be determined by the challenge submitter.
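Because the detection algorithm is left open, the following is only one possible heuristic, not a prescribed method. It scores each CSV row on a few signals (API-style MIME types, API-style path segments, and HTTP methods that browsers rarely issue for page loads) and treats rows at or above a threshold as REST-based. The hint lists, the path pattern, the weights, and the threshold of 2 are all assumptions chosen for illustration, and the column names are assumed to match the field list in this specification.

```python
import re

# Hypothetical signals; tune these against your own sample traffic.
API_MIME_HINTS = ("application/json", "application/xml", "text/xml")
API_PATH_PATTERN = re.compile(r"(^|/)(api|rest|v\d+)(/|$)", re.IGNORECASE)
NON_BROWSER_METHODS = {"PUT", "PATCH", "DELETE"}


def looks_like_rest(row):
    """Return True if a CSV row (dict keyed by column name) resembles REST API traffic."""
    response_type = (row.get("Response Content-Type") or "").lower()
    request_type = (row.get("Request Content-Type") or "").lower()
    path = row.get("HTTP Path") or ""
    method = (row.get("HTTP Method") or "").upper()

    score = 0
    if any(hint in response_type for hint in API_MIME_HINTS):
        score += 2  # structured response bodies are the strongest signal here
    if any(hint in request_type for hint in API_MIME_HINTS):
        score += 1
    if API_PATH_PATTERN.search(path):
        score += 1
    if method in NON_BROWSER_METHODS:
        score += 1
    return score >= 2  # assumed threshold
```

A scoring approach was chosen over a single rule so that one missing header does not cause an API call to be discarded; a submitter may reasonably prefer a different design.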
Interface
The application should be command-line based. It should take a single CSV file as input (generated using the provided solution from the previous challenge) and create a single TXT file as output. The output file should be UTF-8 encoded and use Windows (CRLF) line endings.
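A minimal sketch of such an interface in Python follows. The argument names `input_csv` and `output_txt` and the helper names are hypothetical; the specification only requires one CSV input and one TXT output. Passing `newline="\r\n"` to `open` makes Python translate every `"\n"` written into a Windows line ending.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: input CSV path and output TXT path."""
    parser = argparse.ArgumentParser(
        description="Summarize REST-based API endpoints found in a traffic CSV."
    )
    parser.add_argument("input_csv", help="CSV file produced by the previous challenge's solution")
    parser.add_argument("output_txt", help="path of the TXT report to create")
    return parser.parse_args(argv)


def write_report(path, lines):
    """Write report lines as UTF-8 text with CRLF line endings."""
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")  # "\n" is translated to "\r\n" on write
```

It could then be invoked as, for example, `python report.py traffic.csv report.txt`, where `report.py` is a placeholder script name.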
The challenge submitter is responsible for creating their own CSV file for development and testing. This file should have a mix of normal web traffic and REST-based API traffic.
The CSV file will have a header row and contain the following fields:
- Client IP
- Server IP
- Authorization
- Request Content-Type
- Request Content-Length
- Request Host
- Request Date
- User-Agent
- Content-Encoding
- Response Content-Type
- Response Content-Length
- Response Date
- Server
- Status
- Server Port
- HTTP Method
- HTTP Path
- Request Body (truncated - up to 24 characters)
- Response Body (truncated - up to 24 characters)
- Encrypted (true if the TCP connection was encrypted, false if otherwise)
- Certificate common name (for encrypted connections)
- Certificate organization (for encrypted connections)
- Certificate issuer (for encrypted connections)
- Certificate Expiration Date (for encrypted connections)
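Reading this file can be sketched with Python's standard `csv` module. The sketch assumes the header row uses the field names exactly as listed above (for example `HTTP Path`), which should be verified against a real input file; the function name `read_rows` is hypothetical.

```python
import csv


def read_rows(csv_path):
    """Yield each data row as a dict keyed by the header row's column names."""
    # utf-8-sig tolerates an optional byte-order mark; newline="" lets the
    # csv module handle line endings inside quoted fields itself.
    with open(csv_path, encoding="utf-8-sig", newline="") as handle:
        for row in csv.DictReader(handle):
            # Trim stray whitespace; short rows yield None values, which become "".
            yield {
                key.strip(): (value or "").strip()
                for key, value in row.items()
                if key is not None  # ignore surplus unnamed columns
            }
```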
The TXT output file should have the following format:
Host (or Server IP address for any API endpoints for which a Host header was not specified)
Host / Server IP 1 (use Server IP if no hostname can be determined)
    Certificate common name (for encrypted connections)
    Certificate organization (for encrypted connections)
    Certificate issuer (for encrypted connections)
    Certificate Expiration Date (for encrypted connections)
    HTTP Path and Port #1
        Encrypted (true or false)
        Response Content-Type (list of MIME types used at this API endpoint)
        Request Content-Types: (if applicable like in POST and PUTs)
        HTTP Methods used: (list of HTTP methods used at this API endpoint)
        Authorization Method: (Basic, Bearer, No Authentication, or Other)
    HTTP Path and Port #2
        Encrypted (true or false)
        Response Content-Type (list of MIME types used at this API endpoint)
        Request Content-Types: (if applicable like in POST and PUTs)
        HTTP Methods used: (list of HTTP methods used at this API endpoint)
        Authorization Method: (Basic, Bearer, No Authentication, or Other)
Host / Server IP 2
    ...
    HTTP Path and Port #1
        ...
    HTTP Path and Port #2
        ...
...
    HTTP Path and Port #1
        ...
    HTTP Path and Port #2
        ...
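The "Authorization Method" line in the format above needs each request mapped to one of four labels. A minimal sketch is shown here, assuming the Authorization column holds the raw header value (scheme followed by credentials); if the input instead stores only the scheme or a redacted value, the parsing would need adjusting. The function name is hypothetical.

```python
def classify_authorization(header_value):
    """Map an Authorization header value to Basic, Bearer, No Authentication, or Other."""
    value = (header_value or "").strip()
    if not value:
        return "No Authentication"
    # The scheme is the first whitespace-delimited token and is case-insensitive.
    scheme = value.split(None, 1)[0].lower()
    if scheme == "basic":
        return "Basic"
    if scheme == "bearer":
        return "Bearer"
    return "Other"
```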
The information in the output file should be grouped by Host / Server IP and certificate information. For example, if there are 100 API endpoints found at google.com, using certificates with the same name, organization, issuer, and expiration date, the application should list only one google.com host. However, if other hostnames are used (like m.google.com) or if additional certificates are found in use by the same host, the application should start a new Host / Server IP grouping.
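One way to express this grouping rule is a composite key made of the host (falling back to Server IP) plus the four certificate fields, so that a different hostname or a different certificate produces a different group. The sketch below assumes rows are dicts keyed by the CSV column names as listed in this specification; lower-casing the host is an added assumption, and the function names are hypothetical.

```python
from collections import defaultdict

CERT_FIELDS = (
    "Certificate common name",
    "Certificate organization",
    "Certificate issuer",
    "Certificate Expiration Date",
)


def group_key(row):
    """Build the (host, certificate fields...) tuple that identifies one grouping."""
    host = (row.get("Request Host") or "").strip() or (row.get("Server IP") or "").strip()
    # Hostnames are case-insensitive, so normalize before comparing (assumption).
    return (host.lower(),) + tuple((row.get(field) or "").strip() for field in CERT_FIELDS)


def group_rows(rows):
    """Collect rows under their Host / Server IP and certificate grouping."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    return dict(groups)
```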
The application should also remove redundant information from the output. For example, if a given API endpoint is called 100 times, for the same host, with the same certificate information, then it should only be listed in the output one time.
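Redundant calls can be collapsed by keying each endpoint on its path and port and accumulating the observed values in sets, so repeated calls add nothing new. This is a sketch under the assumption that the rows passed in already belong to one host and certificate grouping and are dicts keyed by the CSV column names; the function name and the choice of tracked columns are illustrative.

```python
TRACKED_COLUMNS = ("Encrypted", "Response Content-Type", "Request Content-Type", "HTTP Method")


def summarize_endpoints(rows):
    """Return {(path, port): {column: set of distinct values}} for one grouping."""
    endpoints = {}
    for row in rows:
        key = ((row.get("HTTP Path") or "").strip(), (row.get("Server Port") or "").strip())
        entry = endpoints.setdefault(key, {column: set() for column in TRACKED_COLUMNS})
        for column in TRACKED_COLUMNS:
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells, e.g. no request body on a GET
                entry[column].add(value)
    return endpoints
```

With this structure, an endpoint called 100 times yields a single dictionary entry whose sets list each distinct method and MIME type once.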
System Requirements
The application must run on Windows. Mac OS X support is optional.
Documentation
Include a README file with installation and configuration instructions.
Final Submission Guidelines
Deliverables
- All source code to implement the requirements.
- A sample CSV input file. Please ensure no sensitive information is in this file.
- A sample TXT output file. Please ensure no sensitive information is in this file.
- README file containing installation and configuration documentation.
- Verification document containing steps to verify your solution.