Talaria CFO Forecasting: Voice (Consumer and Business) and Business Networks - Refinement challenge (inc COVID-19 Impact)


Challenge Overview

Prize

1st: $3500    

2nd: $2000

3rd: $1500

4th: $750

5th: $500

6th: $250

 


Over the last few months, a series of challenges known as ‘CFO Forecasting’ (in various formats) has been run to generate high-quality financial forecasts for a telecommunications company, covering both its consumer and business brands. As a result, highly accurate algorithmic forecasts have been produced for a number of financial target variables.

 

As we all know, COVID-19 has changed the world significantly, and different variables have been affected in different ways. Some variables don’t appear to have been impacted, some are slightly disrupted, and others are strongly affected.

 

We have categorized these variables into three types based on the level of Covid-19 disruption:

  1. No Disruption – No obvious impact due to Covid-19.

  2. Minor Disruption – Some disruption due to Covid-19; however, the trajectory/trend has recovered to pre-Covid-19 levels.

  3. Major Disruption – Disrupted due to Covid-19; the trajectory/trend has not recovered and a lasting impact is expected.

 

Please refer to this blog on ‘Forecasting Time Series in COVID-19 Days’ for more details: 

https://blogs.sap.com/2020/05/26/forecasting-time-series-in-covid-19-days/ 

 

In this challenge, we will focus on variables with minor disruptions.

 

The community is asked to look at three categories of product and extend the provided data for each variable from September 2020 to March 2022 (19 months).

 

The three categories are:

  1. Consumer Voice (Segment 1)

    1. Sandesh Brand 1

    2. Sandesh Brand 3

  2. Business Voice (Segment 2) 

    1. Sandesh Brand 4

    2. Sandesh Brand 5

    3. Sandesh Brand 4 and 5

  3. Business Networks

    1. Sandesh Brand 2

 

Task Details

We will present a set of variables and include:

  1. Their privatized values until Aug 2020 and 

  2. Related models.

 

The task for each variable will be to review the historic data provided and create a 19-month forecast starting in September 2020 and ending in March 2022.

 

As we all know, Covid-19 has caused major disruptions to businesses of all types. With this in mind, the community is asked to consider any Covid-19 impact. In the UK, the disruption started in March 2020 and, after a bumpy few months, appears to be recovering from July 2020. The community is requested to consider the impact during, and the recovery after, the Covid-19 period and create a credible forecast for the requested period (19 months starting from September 2020).

 

Variables to be modelled:

 

We will also share the baseline model for each variable. Some necessary documentation will be provided for you to run these models. You are required to use these baseline models to make predictions. Minor modifications are acceptable, but you cannot change the model fundamentally. 

 

The dataset and baseline models will be shared in the forum. Note that, in this challenge, there is no threshold or target. 

Business Context

A description for the disguised products is as follows:

 

Financial Year modeling

Sandesh reports its financial year from April to March. This may contribute to seasonality based on the financial year and its quarters (ending in June, September, December, and March) rather than the calendar year.
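When engineering seasonal features for this April–March financial year, a small helper can map calendar dates to fiscal quarters. This is an illustrative sketch only; the function name and the convention of labelling the fiscal year by its starting calendar year are our own, not part of the challenge:

```python
from datetime import date

def fiscal_quarter(d: date) -> tuple:
    """Map a calendar date to Sandesh's April-March financial year.

    Returns (fiscal_year, quarter), where quarter 1 covers Apr-Jun,
    quarter 2 Jul-Sep, quarter 3 Oct-Dec, and quarter 4 Jan-Mar.
    The fiscal year is labelled by its starting calendar year.
    """
    fy = d.year if d.month >= 4 else d.year - 1
    q = ((d.month - 4) % 12) // 3 + 1
    return fy, q

# Quarter ends fall in Jun, Sep, Dec and Mar:
fiscal_quarter(date(2020, 9, 30))  # (2020, 2)
fiscal_quarter(date(2021, 3, 31))  # (2020, 4)
```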

 

Anonymized and Privatized data set

 

‘Z-score’ is used to privatize the real data.

 

For all the variables, the following formula is used to privatize the data:

            zi = (xi – μ) / σ

 

Where zi = z-score of the ith value for the given variable

            xi = actual value

            μ = mean of the given variable

            σ = standard deviation for the given variable
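The transformation above can be reproduced in a few lines. Note that the challenge does not state whether the population or sample standard deviation was used, so the choice of `statistics.pstdev` below is an assumption:

```python
import statistics

def privatize(values):
    """Replace each value x_i with its z-score (x_i - mu) / sigma."""
    mu = statistics.mean(values)
    # Assumption: population standard deviation; swap in
    # statistics.stdev if the sample version was used instead.
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]
```

By construction the privatized series has mean 0 and (population) standard deviation 1, so the shape, trend, and seasonality of the original series are preserved for forecasting purposes.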

Quantitative Scoring

Quantitative Scoring will consist of two parts:

 

MAPE on Prediction Window

 

The MAPE (Mean Absolute Percentage Error) of the predictions on the privatised data set should be provided over the 5-month period from April 2020 to August 2020.

 

Given two values, a ground-truth value (gt) and a predicted value (pred), we define the relative error as:

    err(gt, pred) = |gt - pred| / |gt|

The MAPE is the mean of this relative error over all months in the scoring window.
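As a sketch, the window score is the mean of the per-month relative errors (the function name `mape` is illustrative):

```python
def mape(gt, pred):
    """Mean absolute percentage error over a forecast window.

    Each term is |gt - pred| / |gt|; the score is the mean over all
    points in the window (here Apr 2020 - Aug 2020, five months).
    """
    errors = [abs(g - p) / abs(g) for g, p in zip(gt, pred)]
    return sum(errors) / len(errors)

mape([100, 200], [110, 180])  # (0.10 + 0.10) / 2 = 0.10
```

One caveat: z-scored series cross zero by construction, so months where the privatized ground truth is near zero can inflate MAPE dramatically; this is worth noting when interpreting scores on the privatized data.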

 

Stress testing/Robustness testing methodology:

 

Once model building is done, the robustness of the model is to be evaluated. For this, we perform evaluation on a rolling forecast origin. See the image below for an illustration. The forecast window length is 18 months.

 

 
  1. Every horizontal line represents one iteration.

  2. Blue windows are the training periods and orange windows are the forecasting periods.

  3. A separate directory named “robust” is to be created to store the forecasts for all iterations.

  4. Each forecast is to be saved in a .csv file whose name is “submission” suffixed with the iteration number.

  5. A separate Python function/module is to be created, which calls the model to generate the forecast for the different periods as explained in the image above. This is required for code modularity.

  6. The function/module for the robustness testing should have the following input parameters:

    a. Start date of the training window of iteration 1 (start date of the series).

    b. End date of the training window of iteration 1 (August 2020).

    c. Forecast period, i.e. the number of months to forecast (18 months).

    d. Number of iterations (12 iterations to be done).

    For subsequent iterations, the train/forecast start and end months should be calculated automatically from the inputs given in (a) and (b), as shown in the image above.

  7. While running this module, the model parameters must be fixed. For example, if it is an ARIMA model, then the p, d, q values should be the same throughout all iterations; if it is an LSTM, then hyper-parameters such as epochs, number of LSTM units, look-back/look-forward, etc. should be the same as in your final model built for the date range mentioned in point 8. All iterations should run on the same parameter configuration.


Note: For rolling window accuracy refer https://otexts.com/fpp2/accuracy.html
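The iteration schedule above can be sketched as follows. Because the referenced image is not reproduced here, the direction of the rolling origin (stepping the training end back one month per iteration so each 18-month forecast can be scored against held-out history) is an assumption, as are the `model_fn` signature and the `submission_<i>.csv` naming; adjust both to match the image and your model interface:

```python
import csv
import os
from datetime import date

def month_offset(d: date, k: int) -> date:
    """Shift a month-start date by k months."""
    m = d.year * 12 + (d.month - 1) + k
    return date(m // 12, m % 12 + 1, 1)

def robustness_windows(train_start, train_end, horizon=18, iterations=12):
    """Yield (train_start, train_end, fcst_start, fcst_end) per iteration.

    Assumption: the forecast origin steps back one month per iteration;
    the training start stays fixed at the start of the series.
    """
    for i in range(iterations):
        end = month_offset(train_end, -i)
        yield (train_start, end, month_offset(end, 1), month_offset(end, horizon))

def run_robustness(model_fn, train_start, train_end, out_dir="robust"):
    """Call model_fn for each window and save submission_<i>.csv files.

    model_fn is a hypothetical callable returning (month, value) rows;
    the fixed model parameters live inside it, per point 7 above.
    """
    os.makedirs(out_dir, exist_ok=True)
    for i, (ts, te, fs, fe) in enumerate(robustness_windows(train_start, train_end), 1):
        forecast = model_fn(ts, te, fs, fe)
        with open(os.path.join(out_dir, f"submission_{i}.csv"), "w", newline="") as f:
            csv.writer(f).writerows([("month", "value"), *forecast])
```

With iteration 1 training to August 2020, this produces a forecast window of September 2020 through February 2022 (18 months), then rolls the origin back one month per iteration for 12 iterations.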

Final Submission Guidelines

Submission

Your submission should include 

  • A codebase. It should be able to take an input spreadsheet as well as the start time period of COVID (or of other events that potentially have an impact), and then output the forecast for 19 months (starting from September 2020).

  • Plots of your 19-month predictions from Sept 2020. We will judge these plots visually to see whether they make sense. When plotting the curves, please also include the data points from Aug 2017 to Aug 2020.

  • Stress testing results: the predicted files for all stress-testing periods. The robustness score should also be reported in your whitepaper.

  • A whitepaper. It is a text, .doc, PPT or PDF document that includes the following sections and descriptions:

    • Overview: describe your approach in “layman's terms”

    • Methods: describe what you did to come up with this approach, e.g. literature search, experimental testing, etc.

    • Materials: did your approach use a specific technology?  Any libraries?  List all tools and libraries you used

    • Discussion: Explain what you attempted, considered or reviewed that worked, and especially those that didn’t work or that you rejected.  For any that didn’t work, or were rejected, briefly include your explanation for the reasons (e.g. such-and-such needs more data than we have).  If you are pointing to somebody else’s work (e.g. you’re citing a well-known implementation or literature), describe in detail how that work relates to this work, and what would have to be modified.

    • Data:  What other data should one consider?  Is it in the public domain?  Is it derived?  Is it necessary in order to achieve the aims?  Also, what about the data described/provided - is it enough?

    • Assumptions and Risks: what are the main risks of this approach, and what are the assumptions you/the model is/are making?  What are the pitfalls of the data set and approach?

    • Results: Did you implement your approach?  How’d it perform?  If you’re not providing an implementation, use this section to explain the EXPECTED results.

    • Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include.

Judging Criteria

You will be judged on the quality of your ideas, the quality of your description of those ideas, and their effective demonstration on sample data along with success measures. The winner will be chosen based on the most logical and convincing reasoning as to how and why the idea presented will meet the objective. Note that this contest will be judged subjectively by the client and Topcoder; however, the criteria below will largely form the basis for the judgement.

 
  1. Effectiveness (40%)

    1. Is your algorithm effective?

      1. A stress test (i.e., robustness evaluation) is required. Please report the average MAPE results of the stress test based on the given data. This will serve as a reference during our evaluation. 

      2. We will also evaluate your plots visually to see whether they make sense.

    2. Are there any interesting insights from the data analysis?

  2. Feasibility (40%)

    1. Is your algorithm efficient and scalable to large volumes of data?

    2. Is your implementation general enough? There should be no hard-coded hyper-parameters in your model; they should all be dataset-adaptive.

  3. Clarity (20%)

    1. Please make sure your report is easy to read.

    2. Figures, charts, and tables are welcome.

Submission Guideline

Only your last submission will be evaluated. We strongly recommend that you include your best solution, with as much detail as possible, in a single submission.

 

ELIGIBLE EVENTS:

2021 Topcoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30143658