
Challenge Overview

Challenge Objective

The objective of this challenge is to improve on the accuracy of a strong SARIMAX-based forecast model generated in a previous challenge. The SARIMAX model is provided as a foundation for the challenge.

  • The accuracy of the winning model must at least match, and ideally improve on, the MAPE of the foundation model on the privatised data set.

  • The winning submission must model the impact of price increases and, as a result, be able to predict Revenue given the introduction of a subsequent price increase.

  • The winning model must also improve on the Absolute Percentage Error (APE) of the foundation model when calculated from Sep ‘18 to Sep ‘19.

Challenge Details

The three products are broadband products: 

  • Tortoise (legacy product, declining, available everywhere)
  • Rabbit (biggest product, reaching maturity, available in most of the country)
  • Cheetah (best and most expensive product, new and growing rapidly but only available in limited geographies) 

Revenue is the revenue generated by all customers in the customer base in the full month in question. Total Revenue for a product is therefore a function of its ARPU and its volume of customers.

The relationship between key financial variables

  • Volume Closing Base for a Product = Volume Opening Base for that Product + Gross Adds – Leavers + Net Migrations to that Product

  • Volume Net Adds = Volume Closing Base – Volume Opening Base

  • Revenue = Average Volume of Customers over the period * Average Revenue per existing Customer over the same period.
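
For clarity, these relationships can be expressed as a short Python sketch. This is illustrative only; in particular, approximating the average customer volume by the mean of the opening and closing base is an assumption, not a definition taken from the data set.

    def closing_base(opening_base, gross_adds, leavers, net_migrations):
        """Volume Closing Base = Opening Base + Gross Adds - Leavers + Net Migrations."""
        return opening_base + gross_adds - leavers + net_migrations

    def net_adds(closing, opening):
        """Volume Net Adds = Closing Base - Opening Base."""
        return closing - opening

    def revenue(opening, closing, arpu):
        """Revenue = average customer volume over the period * ARPU over the same period."""
        avg_volume = (opening + closing) / 2  # assumption: simple average of opening and closing base
        return avg_volume * arpu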

 

Revenue performance demonstrates time series features ...

This chart of Revenue for the three products reflects their relative performance; the x and y axes intersect at £0 revenue.

Revenue Trend reflects the product’s life cycle …

Both Tortoise and Rabbit Revenues demonstrate a gradual but steady underlying trend in Revenue over the last 2 years. This is driven by the gradual decline in Average Revenue per User (ARPU) over this period, balanced by the gradual change in Customer Base (Closing Base) size (decline for Tortoise, and growth for Rabbit / Cheetah) - please see the ARPU and Closing Base graphs below.

Price increases act as a ‘Level Shift Outlier’ in the forecasting of Revenue.

In Jan ‘18 and Sep ‘18, a price increase was imposed on the existing customer base. This price increase, documented in the privatised data set as a single value in the month of impact, introduced a ‘level shift’ into the Revenue trends of all three products. The financial impact of this level shift has subsequently diminished over time, driven by the gradual underlying price erosion in the market.

The price increase appears to take effect over a couple of months, assumed to be due to the billing processes employed by the brand. Isolating this ‘level shift outlier’ from the underlying trend will result in a smoother Revenue trend and an improved forecast overall.
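
As a rough illustration of how this level shift might be encoded as an exogenous regressor, the sketch below turns the single price-increase value recorded in the month of impact into a step series with a short billing-lag ramp. The column handling and the two-month ramp are assumptions, not part of the provided foundation model.

    import pandas as pd

    def level_shift_regressor(price_increase: pd.Series, ramp_months: int = 2) -> pd.Series:
        """Turn one-off price-increase values into a step (level-shift) series.

        ramp_months spreads each step over a couple of months to mimic the
        billing lag described above (an assumption).
        """
        step = price_increase.fillna(0).cumsum()  # step up at each increase and stay up afterwards
        if ramp_months > 1:
            step = step.rolling(ramp_months, min_periods=1).mean()  # smooth the step over the billing lag
        return step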

 

Average Revenue per User (ARPU) shows an underlying declining trend for all three products. Note: the x and y axes intersect at £0 per month ARPU.

 

Closing Base for all three products shows the relative growth (Cheetah and Rabbit) and decline (Tortoise) of the products, and their relative popularity. Cheetah is growing rapidly from a very small base. Note: the x and y axes intersect at 0 customers.

One key objective of this challenge is to build this price increase into the forecast model as a secondary variable, as described in the challenge objectives. The next price increase is planned for March 2020, and its magnitude will be determined by CPI in January 2020.

 

As context, the following price increases have been applied to the customer base since 2016, and are reflected in the variable ‘Price Increase - Product name’.  



Financial year modeling 

The financial year for Sandesh runs from April to March (instead of January to December); hence Q1 comprises April, May and June.
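
For illustration only, a hypothetical helper (not part of the provided code) that maps a calendar month to Sandesh’s fiscal quarter under this April-to-March convention:

    def fiscal_quarter(month: int) -> int:
        """Map a calendar month (1-12) to Sandesh's fiscal quarter (April = Q1)."""
        return ((month - 4) % 12) // 3 + 1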

Challenge structure 

Anonymised and Privatised data set:

A z-score transformation is used to privatise the real data.

For all variables, the following formula is used to privatise the data:

            z_i = (x_i – μ) / σ

where:

            z_i = z-score of the i-th value for the given variable
            x_i = actual value
            μ = mean of the given variable
            σ = standard deviation for the given variable
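
A minimal sketch of this transformation, together with its inverse (useful for converting privatised forecasts back into the original units), might look as follows; the function names are illustrative only.

    import numpy as np

    def privatise(x: np.ndarray):
        """Z-score a variable: z_i = (x_i - mu) / sigma."""
        mu, sigma = x.mean(), x.std()
        return (x - mu) / sigma, mu, sigma

    def de_privatise(z: np.ndarray, mu: float, sigma: float) -> np.ndarray:
        """Invert the z-score transform to recover values in the original units."""
        return z * sigma + mu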

Modeling insights derived from previous challenges

A SARIMAX (univariate) model has proven to be the best algorithm for predicting the target variables in this data set. This model is included as the foundation for refinement.

The MAPE error rates of this foundation model on the privatised data set provided have been calculated and used to set the threshold targets for this challenge. The winning model will need to maintain or improve on this accuracy while modeling the price increase impact. The predictions need to reflect the smooth, gradual drift in Revenue.
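
For orientation, a minimal SARIMAX sketch in the spirit of the foundation model is shown below. The orders and option flags are placeholder assumptions; the actual foundation model’s configuration is supplied separately with the challenge materials.

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_revenue_model(revenue: pd.Series, price_shift: pd.Series = None):
        """Fit a SARIMAX model to monthly Revenue, optionally with a price-increase regressor."""
        model = SARIMAX(
            revenue,
            exog=price_shift,              # e.g. a level-shift regressor for the price increase
            order=(1, 1, 1),               # assumed non-seasonal (p, d, q)
            seasonal_order=(1, 0, 1, 12),  # assumed seasonal order for monthly data
            enforce_stationarity=False,
            enforce_invertibility=False,
        )
        return model.fit(disp=False)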

 


Final Submission Guidelines

Submission Format

Your submission must include the following items:

  • The filled test data. We will evaluate the results quantitatively (see below).

  • A report about your model, including data analysis, model details, local cross validation results, and variable importance. 

  • Deployment instructions describing how to install the required libraries and how to run the code.

Expected in Submission

  1. Working Python code that runs on different sets of data in the same format

  2. A report with a clear explanation of all the steps taken to solve the challenge (refer to the “Challenge Details” section) and of how to run the code

  3. No hardcoding (e.g., column names, possible values of each column, ...) in the code is allowed. We will run the code on different data sets

  4. All models in one code base, with clear inline comments

  5. Flexibility to extend the code to forecast for additional months

Quantitative Scoring

Given two values, one ground truth value (gt) and one predicted value (pred), we define the absolute percentage error (APE) as:

 

    APE(gt, pred) = |gt - pred| / gt

 

We will average APEs over months in the test set, and then obtain MAPE. This MAPE will be the major optimization objective. The smaller, the better.

 

In addition to the overall performance measured by MAPE, we will also dive deep into each month and check the effectiveness on a monthly basis. We will take the maximum APE (MaxAPE) over the months in the test set. This serves as a secondary optimization objective. Again, the smaller, the better. It is, to some extent, similar to a variance measure, given that the test set contains 6 months.

 

Roughly, MAPE carries a weight of 70% and MaxAPE the remaining 30%.
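
A small sketch of this scoring, assuming the rough 70/30 weighting is applied as a simple linear blend (the exact combination is not specified):

    import numpy as np

    def ape(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
        """APE(gt, pred) = |gt - pred| / gt, computed per month."""
        return np.abs(gt - pred) / gt

    def score(gt: np.ndarray, pred: np.ndarray) -> dict:
        errors = ape(gt, pred)
        mape = errors.mean()    # primary objective: the smaller, the better
        max_ape = errors.max()  # secondary objective over the test months
        return {"MAPE": mape, "MaxAPE": max_ape,
                "blended": 0.7 * mape + 0.3 * max_ape}  # assumed linear 70/30 blend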

Judging Criteria

Your solution will be evaluated using a hybrid of quantitative and qualitative criteria.

  • Effectiveness (80%)

    • We will evaluate your forecasts by comparing them to the ground truth data. Please check the “Quantitative Scoring” section for details.

    • The smaller the MAPE and MaxAPE, the better.

    • The model must achieve better performance than the provided baseline models, including improving the coefficient of variation.

  • Clarity (10%)

    • The model is clearly described, with reasonable justifications about the choice.

  • Reproducibility (10%)

    • The results must be reproducible. We understand that there might be some randomness for ML models, but please try your best to keep the results the same or at least similar across different runs.

Review style

  • Final Review: Community Review Board

  • Approval: User Sign-Off

ID: 30111357