Challenge Overview

Problem Statement

Prize Distribution

    1st place - $2,500
    2nd place - $2,000
    3rd place - $1,500
    4th place - $1,000

Background

The client is running this contest to better understand how trading volume affects the market prices of traded securities. Contestants will use the supplied trade data to create an algorithm that attempts to predict swap prices.

The 2010 Dodd-Frank Wall Street Reform and Consumer Protection Act (the Dodd-Frank Act) created new entities called swap data repositories (SDRs) "in order to provide a central facility for swap data reporting and collecting. Under the Dodd-Frank Act, all swaps, whether cleared or uncleared, are required to be reported to registered SDRs." As of January 2013, all registered swap dealers active in credit and interest rate trading send trade data to the public swap repository. Depending on their size and type (e.g., block trades), swap transactions must be reported within 5 to 15 minutes of execution. These developments have increased the availability of swap trade data. An extract of this data for a specified time period is supplied for this challenge.

Objective

Supply and demand in the swap market affect swap prices. Swap prices are also influenced by tenor. Tenor is the maturity of the swap measured in full years, such as 2, 3, 5, 7, 10, and 30. We are interested in using the volume of vanilla US$ / Libor spot start swap transactions of the full-year maturities to predict the prices of those same instruments over relatively short time intervals.

The scoring will focus on the tenors of full years in PriceData. In SwapData, you may receive some irregular tenors such as *Y*M.

Data Description

Price Data:

Timestamp - The time stamp when the mid price is recorded.
Tenor - The trade instrument.
ABC mid - Mid price for trades from source ABC.
DEF mid - Mid price for trades from source DEF.

Public Swap Repository Data:

Time Stamp: The time when the trade happened.
Price: The traded price at which level the transaction happens.
Size: The size of this trade.
Tenor: The trade instrument.
Trade Direction: Whether someone buys or sells.
ABC/DEF: Trades on ABC or DEF.

Example data covers Jan and Feb 2016 (i.e., 2 months). Another 2 months of data will be used for the provisional tests and system tests.

Implementation

The evaluation runs in streaming mode. That is, you make predictions as you receive new data, and each prediction is compared to the mid prices in a short period (e.g., 5 to 10 minutes) after the latest data you have. The data will be sent in strictly chronological order.

Your task is to implement two methods: update and predict, whose signatures are detailed in the Definition section below. Both methods will be called several times.

In update, you will receive new timestamped data. More specifically, you will receive two lists of comma-separated strings (quote-enclosed). The columns appear in the same order as in the Data Description section.
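As a minimal sketch of reading such rows, the following uses Python's standard csv module, which strips enclosing double quotes while splitting on commas. The helper name parse_rows and the sample row (including its timestamp format) are hypothetical; the real data may look different.

```python
import csv

def parse_rows(rows):
    """Split each comma-separated, quote-enclosed row into a list of fields."""
    # csv.reader accepts any iterable of strings; empty rows are dropped here.
    return [fields for fields in csv.reader(rows) if fields]

# Hypothetical sample row in Price Data column order:
# Timestamp, Tenor, ABC mid, DEF mid. The timestamp format is an assumption.
price_rows = ['"2016-01-04 08:00:00","3Y","1.250","1.255"']
for timestamp, tenor, abc_mid, def_mid in parse_rows(price_rows):
    print(tenor, float(abc_mid), float(def_mid))
```

The same helper works for the swap rows, since only the number and meaning of the columns differ.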

In predict, you should return a list of predictions in the same order as the received test data. The test data has a format similar to Price Data, but without the "ABC mid" and "DEF mid" columns. Every prediction is a string containing two values separated by a comma: the predicted ABC mid and DEF mid of the specified tenor at the specified time. For example, "1.002,2.000" (without quotes) could be a prediction.

Scoring

Submissions will be scored by running the solution against different data from different time periods. Before the first call of predict, at least 2 hours of data will be given to make sure you have a reasonable volume of data to build up your model.

Each test case is generated as follows:

1. Randomly select 2~3 consecutive days from the given time period. For instance, in the example test, we select 2~3 days from Jan and Feb 2016. Two days are consecutive if there are only holidays and weekends between them.
2. Use the first 2 hours' data for the first update.
3. Call predict for the next random 5~10 minutes.
4. Call update using the data in the next random 10~30 minutes. We keep adding a further 10~30 minutes until there is a certain amount of data.
5. Go to 3 if there is data left. Otherwise, this test case ends.

In every test case, the raw error is calculated as

    rawErr = 0
    for i = 1 to N do
        rawErr += (ABCTruth[i] - ABCPred[i])^2 + (DEFTruth[i] - DEFPred[i])^2

where N is the total number of predictions.
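For local testing, the raw error can be computed along the following lines. This is only a sketch: the function name raw_error and the use of four parallel lists of floats are assumptions, not part of the official tester.

```python
def raw_error(abc_truth, def_truth, abc_pred, def_pred):
    """Sum of squared errors over all N predictions, for both sources."""
    return sum(
        (at - ap) ** 2 + (dt - dp) ** 2
        for at, dt, ap, dp in zip(abc_truth, def_truth, abc_pred, def_pred)
    )

# Two predictions: ABC is off by 0.5 on the first, DEF by 1.0 on the second.
print(raw_error([1.0, 2.0], [1.0, 2.0], [1.5, 2.0], [1.0, 1.0]))  # 1.25
```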

As a naive solution, we use the average price of all data seen so far for the same tenor as the baseline. For example, to predict the ABC mid for 3Y, all 3Y ABC mids seen so far are averaged to give ABCPred. If no such data has been seen, the baseline predicts 0. The raw error computed with this method serves as baseErr.
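One way to reproduce this baseline locally is a running mean per tenor and source. The class below is a sketch under that reading; the name BaselineAverager and its methods are hypothetical, and the official baseline may differ in details such as how missing values are treated.

```python
class BaselineAverager:
    """Running average of seen mid prices, keyed by (tenor, source)."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def observe(self, tenor, source, mid):
        # Record one seen mid price for this tenor and source ("ABC" or "DEF").
        key = (tenor, source)
        self._sum[key] = self._sum.get(key, 0.0) + mid
        self._count[key] = self._count.get(key, 0) + 1

    def predict(self, tenor, source):
        # Predict 0 when no data for this tenor and source has been seen.
        key = (tenor, source)
        count = self._count.get(key, 0)
        return self._sum[key] / count if count else 0.0

baseline = BaselineAverager()
baseline.observe("3Y", "ABC", 1.0)
baseline.observe("3Y", "ABC", 2.0)
print(baseline.predict("3Y", "ABC"))  # 1.5
print(baseline.predict("5Y", "ABC"))  # 0.0
```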

The raw score will be

    raw score = max(0, 1 - rawErr / baseErr)

The final score of each test case is the raw score multiplied by 1000000.0. The score shown on the standings is the average score over the test cases.
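The scoring arithmetic can be sketched as follows. The function names are hypothetical, and the sketch assumes baseErr is strictly positive, since the statement does not say what happens when it is zero.

```python
def case_score(raw_err, base_err):
    """Score of one test case; assumes base_err > 0 (not specified otherwise)."""
    return max(0.0, 1.0 - raw_err / base_err) * 1000000.0

def standing_score(case_scores):
    """Average of the per-test-case scores."""
    return sum(case_scores) / len(case_scores)

scores = [case_score(1.0, 4.0), case_score(5.0, 4.0)]
print(scores)                  # [750000.0, 0.0]
print(standing_score(scores))  # 375000.0
```

Note that a solution with a larger raw error than the baseline scores 0 on that test case rather than a negative value.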

Requirements to Win A Prize

In order to receive a prize, you must do all the following:

Achieve a score in the top 4, according to system test results. See the "Scoring" section above.
Create a legitimate algorithm that runs successfully on a different data set with the same fields. Hard-coded solutions are unacceptable.
Within 7 days of the end of the challenge, submit a complete report at least 2 pages long outlining your final algorithm and explaining the logic behind it and the steps of its approach. The required content and format appear in the "Report" section below.
Within 7 days of the end of the challenge, submit all code used in your final algorithm in 1 appropriately named file (or tar or zip archive). We will contact the winners via email and ask for the file. The naming convention should be memberHandle-ContestName. For example, handle "johndoe" would name his submission "johndoe-ContestName."

Report

Your report must be at least 2 pages long, contain at least the following sections, and use the section and bullet names below.

Your Information

This section must contain at least the following:

First name
Last name
Topcoder handle
Email address

Approach Used

Please describe your algorithm so that we know what you did even before seeing your code. Use line references to refer to specific portions of your code.

This section must contain at least the following:

Approaches considered
Approach ultimately chosen
Steps to approach ultimately chosen, including references to specific lines of your code
Open source resources and tools used, including URLs and references to specific lines of your code
Advantages and disadvantages of the approach chosen
Comments on libraries
Comments on open source resources used
Special guidance given to algorithm based on training
Potential improvements to your algorithm

If you place in the top 4 but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Additional Information

Only the data used in the Example Test will be released; you can download it here.
In order to receive the prize money, you will need to fully document your code and explain your algorithm. If any parameters were obtained from the training data set, you will also need to provide the program used to generate these parameters. There is no restriction on the programming language used to generate these training parameters. Note that all this documentation should not be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly in order to collect this data.
You may not use any external (outside of this competition) source of data to train your solution.

Definition

Class:	PricePredictor
Method:	update
Parameters:	String[], String[]
Returns:	int
Method signature:	int update(String[] priceData, String[] swapData)

Method:	predict
Parameters:	String[]
Returns:	String[]
Method signature:	String[] predict(String[] testData)
(be sure your methods are public)
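A minimal Python skeleton matching these signatures might look like the sketch below. It is a placeholder, not a recommended strategy: it simply repeats the last mid prices seen for each tenor and predicts 0 for unseen tenors. The return value 0 from update, the six-decimal output format, and the skipping of malformed rows are all assumptions, since the statement does not specify them.

```python
import csv

class PricePredictor:
    def __init__(self):
        # Last seen (ABC mid, DEF mid) per tenor.
        self.last_mid = {}

    def update(self, priceData, swapData):
        # priceData columns: Timestamp, Tenor, ABC mid, DEF mid.
        for fields in csv.reader(priceData):
            if len(fields) < 4:
                continue  # assumption: skip malformed rows
            tenor = fields[1]
            try:
                self.last_mid[tenor] = (float(fields[2]), float(fields[3]))
            except ValueError:
                continue  # assumption: skip rows with non-numeric mids
        # swapData is ignored by this placeholder; a real model would use it.
        return 0  # assumption: the meaning of the return value is unspecified

    def predict(self, testData):
        # testData columns: Timestamp, Tenor. One output string per input row.
        predictions = []
        for fields in csv.reader(testData):
            tenor = fields[1] if len(fields) > 1 else ""
            abc_mid, def_mid = self.last_mid.get(tenor, (0.0, 0.0))
            predictions.append("%.6f,%.6f" % (abc_mid, def_mid))
        return predictions

# Hypothetical rows; the real timestamp format may differ.
predictor = PricePredictor()
predictor.update(['"2016-01-04 08:00:00","3Y","1.25","1.5"'], [])
print(predictor.predict(['"2016-01-04 08:05:00","3Y"']))
```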

Notes

- This match (is) rated.

- The allowed programming languages are Java, Python, C++, C# and VB.

- You can include open source code in your submission, provided it is free for you to use and would be for the client as well. Code from open source libraries under the BSD or MIT licenses will be accepted. Other open source licenses may be accepted too, just be sure to ask us.

- The use of external data and pre-trained models is allowed, as long as they meet the license requirements.

- The test servers have only the default installs of all languages, so no additional libraries will be available.

- Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself, possible solution techniques or related to data analysis.

- You can train your solution offline based on the given files and you can hardcode data into your solution -- just remember that you can't use data from other sources than this contest.

- There are 2 test cases in example test; 10 test cases in provisional test; 30 test cases in system test.

- Time limit is 10 minutes per test and memory limit is 1024MB.

- There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit code of a size close to that or larger).

- The compilation time limit is 600 seconds. You can find information about compilers that we use and compilation options here.

Examples

"0"

Returns: "Seed: 0"

"1"

Returns: "Seed: 1"

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.

Price Predictor Mini MM - Price Predictor