
Challenge Overview

Challenge Objective

The objective of this challenge is to improve on the accuracy of a strong SARIMAX-based forecast model generated in a previous challenge by introducing awareness of a secondary variable, described later.  The SARIMAX model is provided as a foundation for the challenge.

  • The accuracy of the winning model must at least maintain, and ideally improve on, the MAPE error on the privatised data set compared to the foundation model.

  • The winning submission must model the impact of a price increase and, as a result, be able to predict ARPU following the introduction of a subsequent price increase.

  • The winning model must also improve on the Absolute Percentage Error (APE) of the foundation model when calculated from Sep ‘18 to Sep ‘19.

Challenge Details

The three products are broadband products: 

• Tortoise (legacy product, declining product, available everywhere)

• Rabbit (biggest product, reaching maturity, available in most of the country)

• Cheetah (best and most expensive product, new and growing rapidly but only available in limited geographies)

 

Average Revenue per User (ARPU) is the average revenue generated by a single customer in the full month in question.  Total Revenue for this product is therefore a function of the ARPU and the volume of customers.

 

The relationship between key financial variables

  • Volume Closing Base for a Product = Volume Opening Base for that Product + Gross Adds – Leavers + Net Migrations to that Product

  • Volume Net Adds = Volume Closing Base – Volume Opening Base

  • Revenue = Average Volume of Customers over the period * Average Revenue per existing Customer over the same period.
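
For illustration, the sketch below expresses these identities in Python; the function names and numbers are hypothetical, not values from the data set.

```python
# Minimal sketch of the identities above. All names and numbers are illustrative.

def closing_base(opening_base, gross_adds, leavers, net_migrations):
    """Volume Closing Base = Opening Base + Gross Adds - Leavers + Net Migrations."""
    return opening_base + gross_adds - leavers + net_migrations

def net_adds(closing, opening):
    """Volume Net Adds = Closing Base - Opening Base."""
    return closing - opening

def revenue(average_volume, arpu):
    """Revenue = average customer volume over the period * ARPU over the same period."""
    return average_volume * arpu

close = closing_base(opening_base=100_000, gross_adds=4_000, leavers=3_500, net_migrations=250)
print(close)                                  # 100750
print(net_adds(close, 100_000))               # 750
print(revenue((100_000 + close) / 2, 25.0))   # 2509375.0
```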

 

ARPU performance demonstrates time series features ...

This chart of ARPU for the three products reflects their relative performance; the x and y axes intersect at £0 ARPU.

 

ARPU Trend is downwards …

Both Tortoise and Rabbit ARPU demonstrate a gradual but steady underlying decline in Average Revenue per User over the two years. This is driven by the relative value of the customers acquired compared to those leaving each month: since, on average, the ARPU of leaving customers is higher than that of both the average customer and the newly acquired customer, the average ARPU gradually erodes over time.

 

Price Increase impacting as a ‘Level Shift Outlier’ in the forecasting of ARPU

In Jan ‘18 and Sep ‘18, a price increase was imposed on the existing customer base. This price increase, documented in the privatised data set as a single value in the month of impact, introduced a ‘level shift’ into all three ARPU trends.  The financial impact of this level shift was then gradually eroded by the underlying fall in ARPU described previously.

This price increase appears to take effect over a couple of months, assumed to be due to the billing processes employed by the brand: it starts a month before the date of the price increase and builds over a couple of months beyond the date of impact.

Isolating this ‘level shift outlier’ from the underlying trend by removing the price increase gradually from the ARPU gives a smoother underlying trend.

 

The price increase curve is a suggested ‘line of best fit’ that gradually introduces the impact of the price increase.  When this tempered price increase is removed, the underlying ARPU decline is revealed.

 

One key objective of this challenge is to build this price increase into the forecast model as a secondary variable, as described in the challenge objectives. The next price increase is planned for March 2020, and its magnitude will be determined by CPI in January 2020.
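
As an illustration of how this tempered price increase could be encoded for the model, the sketch below builds a ramped level-shift regressor in Python (pandas) that starts one month before each increase and builds over the following months. The ramp weights, placeholder magnitudes, and series name are assumptions; in practice the magnitudes would come from the ‘Price Increase - Product name’ variable.

```python
import pandas as pd

# Sketch: encode each price increase as a ramped level shift that can later be
# passed to SARIMAX as an exogenous regressor.  The ramp shape (building from
# one month before impact over four months) and the magnitudes of 1.0 are
# assumptions; actual values should come from 'Price Increase - <Product name>'.

index = pd.period_range("2016-01", "2020-12", freq="M")
ramp = [0.25, 0.5, 0.75, 1.0]   # assumed build-up: month before impact .. impact + 2

def level_shift(impact_month, size):
    """Series that ramps up to `size` around `impact_month` and stays there."""
    s = pd.Series(0.0, index=index)
    start = pd.Period(impact_month, freq="M") - 1      # ramp starts a month early
    for i, weight in enumerate(ramp):
        s.loc[start + i] = weight * size
    s.loc[start + len(ramp):] = size                   # permanent level shift afterwards
    return s

exog = (level_shift("2018-01", 1.0)      # 7th January 18
        + level_shift("2018-09", 1.0)    # 16th September 18
        + level_shift("2020-03", 1.0)    # 31st March 2020, CPI-linked
        ).rename("price_increase_ramp")

print(exog.loc["2017-11":"2018-04"])
```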

 

As context, the following price increases have been applied to the customer base since 2016, and are reflected in the variable ‘Price Increase - Product name’.  

 
 

Price Change               | 3rd June 16 | 2nd April 17 | 7th January 18 | 16th September 18 | 31st March 2020 (CPI = 2%)
Rabbit and Cheetah         | +£ 2.5x     | +£ 2.5x      | +£ 2.5x        | +£ 2.5x           | +£ x
Tortoise                   | +£ x        | +£ 2x        | +£ 2x          | +£ 2.5x           | +£ 0.8x
Time between Price Changes |             | 10 months    | 9 months       | 8 months          | 18 months

 

Financial year modeling

 

The financial year for Sandesh runs from April to March (instead of January to December); hence Q1 covers April, May and June.

 

Challenge structure

 

Anonymised and Privatised data set:

 

The z-score is used to privatise the real data.

 

For all variables, the following formula is used to privatise the data:

            zi = (xi – μ) / σ

 

where zi = z-score of the ith value for the given variable

            xi  = actual value

            μ = mean of the given variable

            σ = standard deviation for the given variable
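
For reference, a minimal sketch of this transformation in Python (pandas); the column and file names are hypothetical, and note that pandas' std() defaults to the sample standard deviation.

```python
import pandas as pd

# Sketch of the z-score privatisation above: z_i = (x_i - mu) / sigma.
# The column name 'ARPU - Rabbit' and the file name are hypothetical.

def privatise(series: pd.Series) -> pd.Series:
    """Replace actual values with their z-scores."""
    return (series - series.mean()) / series.std()

def deprivatise(z: pd.Series, mu: float, sigma: float) -> pd.Series:
    """Invert the transformation when mu and sigma are known: x_i = z_i * sigma + mu."""
    return z * sigma + mu

# Example usage:
# df = pd.read_csv("privatised_data.csv")
# df["ARPU - Rabbit"] = privatise(df["ARPU - Rabbit"])
```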

 

Modeling Insight derived from previous challenges.

 

A SARIMAX (univariate) model has proven to be the best algorithm for predicting the target variables in this data set.  This model is included as the foundation for refinement.

 

The MAPE error rates of this foundation model on the provided privatised data set have been calculated and used to set the threshold targets for this challenge.  The winning model will need to maintain or improve on this accuracy while modeling the price increase impact. The predictions also need to reflect the smooth, gradual drift in ARPU.
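
A minimal sketch of how the foundation model could be refitted with the price increase as an exogenous regressor is shown below, using statsmodels' SARIMAX. The (p, d, q)(P, D, Q, s) orders are placeholders only; the provided foundation model defines the actual configuration.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Sketch: refit SARIMAX with the price-increase ramp as an exogenous variable.
# The orders below are placeholders, not those of the provided foundation model.

def fit_with_price_increase(y: pd.Series, exog: pd.Series,
                            horizon: int, future_exog: pd.Series) -> pd.Series:
    model = SARIMAX(y, exog=exog,
                    order=(1, 1, 1), seasonal_order=(0, 1, 1, 12),   # placeholder orders
                    enforce_stationarity=False, enforce_invertibility=False)
    result = model.fit(disp=False)
    # Forecasting beyond the sample requires the exogenous ramp for future months too.
    return result.forecast(steps=horizon, exog=future_exog)
```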

 


Final Submission Guidelines

Submission Format

Your submission must include the following items:

  • The filled test data. We will evaluate the results quantitatively (see below).

  • A report about your model, including data analysis, model details, local cross validation results, and variable importance. 

  • Deployment instructions describing how to install the required libraries and how to run the code.

Expected in Submission

  1. Working Python code that runs on different sets of data in the same format

  2. A report with a clear explanation of all the steps taken to solve the challenge (refer to the “Challenge Details” section) and of how to run the code

  3. No hardcoding (e.g., column names, possible values of each column, ...) in the code is allowed. We will run the code on different datasets

  4. All models in one code with clear inline comments 

  5. Flexibility to extend the code to forecast for additional months

Quantitative Scoring

Given two values, one ground truth value (gt) and one predicted value (pred), we define the absolute percentage error (APE) as:

 

    APE(gt, pred) = |gt - pred| / gt

 

We will average APEs over months in the test set, and then obtain MAPE. This MAPE will be the major optimization objective. The smaller, the better.

 

In addition to the overall performance measured by MAPE, we will also dive deep into each month and check the effectiveness on a monthly basis. We will take the maximum APE (MaxAPE) over months in the test set. This serves as a secondary optimization objective. Again, the smaller, the better. It is to some extent similar to variance, considering that we have 6 months in the test set.

 

Roughly, MAPE accounts for 70% of the quantitative score and MaxAPE for the remaining 30%.
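
For clarity, a small sketch of this scoring in Python; the six test values below are made up, and since the 70/30 split is described as rough, the combined score is indicative only.

```python
import numpy as np

# Sketch of the quantitative scoring: APE per month, MAPE as the mean over the
# test months, MaxAPE as the worst month, roughly weighted 70/30. Smaller is better.

def ape(gt, pred):
    """Absolute percentage error per month: |gt - pred| / gt."""
    gt, pred = np.asarray(gt, dtype=float), np.asarray(pred, dtype=float)
    return np.abs(gt - pred) / gt

def score(gt, pred, w_mape=0.7, w_max=0.3):
    errors = ape(gt, pred)
    return w_mape * errors.mean() + w_max * errors.max(), errors.mean(), errors.max()

# Made-up 6-month example:
combined, mape, max_ape = score([1.02, 1.01, 0.99, 0.98, 0.97, 0.96],
                                [1.00, 1.00, 1.00, 1.00, 0.98, 0.95])
print(combined, mape, max_ape)
```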

Judging Criteria

Your solution will be evaluated in a hybrid of quantitative and qualitative way. 

  • Effectiveness (80%)

    • We will evaluate your forecasts by comparing them to the ground truth data. Please check the “Quantitative Scoring” section for details.

    • The smaller the MAPE and MaxAPE, the better.

    • The model must achieve better performance than the provided baseline models, including improving the coefficient of variation.

  • Clarity (10%)

    • The model is clearly described, with reasonable justifications about the choice.

  • Reproducibility (10%)

    • The results must be reproducible. We understand that there might be some randomness for ML models, but please try your best to keep the results the same or at least similar across different runs.



 

Review style: Final Review (Community Review Board)

Approval: User Sign-Off

ID: 30110931