Challenge Overview
Background
Telecom providers sell products such as broadband and mobile phone contracts. These contracts consist of products of different types and capabilities. These products are sold in different markets. The focus of this challenge is Broadband products in the Consumer market.
For each of the products the customer would like to forecast the following:
- Gross adds – the number of new subscribers / connections during a period
- Leavers – the number of subscribers / connections which terminated service with the provider during that period
- Net migrations – the number of subscribers / connections which remained with the provider but moved to another product
- Average revenue per customer
Challenge Objectives
In this challenge, we want to predict the future Gross adds, Leavers, Net migrations and Average revenue per customer as accurately as possible. The ideal goal is forecasts that are within 2% of the eventual actual results for the following year. The 2% figure is a target, not a requirement, and the best-performing solutions will receive higher scores.
Note: Solutions that simply replicate the last available values, or use similar methods without properly modeling the given data, will not be eligible for the prize.
We will use the historical data for training and testing. All data can be downloaded from the forum.
Training Data
The training data set covers all data before 18/19_Q4_Mar. Each row describes an item on a certain date, using the following fields:
- Generic Group
- Generic Brand
- Generic Product Category
- Generic Product
- Generic Variable
- Generic Sub-Variable
- Generic LookupKey
- Units
- Time Period (a month)
The items include metrics such as revenue, volume base, gross adds, leavers, net migrations and Average revenue per customer (see the Background section) for Broadband in the Consumer market, including a breakdown at the Product level.
The ground truth file has the same number of rows but contains only one column, i.e., the revenue. You can use this data set to train and test your algorithm locally.
Testing Data
The testing data set covers a few months, from 18/19_Q4_Mar to the present. It has the same format as the training set, but no ground truth is provided.
You are asked to make predictions for the testing data. To do so, append a final column named “Value” to the testing data and fill it with your model’s predictions.
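As a rough illustration of filling the test data, the sketch below appends a “Value” column to a small in-memory table. It assumes the test data is delivered as CSV, which the challenge text does not state; the function name `append_predictions`, the sample rows and the prediction values are all hypothetical.

```python
import csv
import io


def append_predictions(test_csv_text, predictions):
    """Return CSV text with a final "Value" column holding one prediction per row.

    Assumption: the test data is a CSV with a header row. The real file
    format and column set may differ.
    """
    rows = list(csv.reader(io.StringIO(test_csv_text)))
    header, body = rows[0], rows[1:]
    if len(body) != len(predictions):
        raise ValueError("need exactly one prediction per test row")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + ["Value"])
    for row, pred in zip(body, predictions):
        writer.writerow(row + [pred])
    return out.getvalue()


# Hypothetical two-row sample using a subset of the documented fields.
sample = (
    "Generic Product,Generic Variable,Time Period\n"
    "Product A,Leavers,18/19_Q4_Mar\n"
    "Product A,Gross adds,18/19_Q4_Mar\n"
)
filled = append_predictions(sample, [120.5, 340.0])
print(filled)
```

Keeping the row order unchanged matters here, since the predictions are presumably matched to the ground truth row by row.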
Final Submission Guidelines
Submission format
Your submission must include the following items:
- The filled test data. We will evaluate the results quantitatively (see below).
- A report about your model, including the data analysis, model details, local cross-validation results, and feature importance.
- Deployment instructions explaining how to install the required libraries and how to run the solution.
Quantitative Scoring
Given two values, one groundtruth value (gt) and one predicted value (pred), we define the relative error as:
Relative Error(gt, pred) = |gt - pred| / gt
We then compute the raw_score(gt, pred) as
raw_score(gt, pred) = max{ 0, 1 - Relative Error(gt, pred) }
That is, if the relative error exceeds 100%, the raw score for that value is zero.
The final score is computed based on the average of raw_score, and then multiplied by 100.
Final score = 100 * average( raw_score(gt, pred) )
We will use this as a part of evaluation.
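The scoring formulas above can be sketched in a few lines of Python, which may be handy for local validation. This is a minimal sketch rather than the official scorer: the function names are hypothetical, and raising an error when gt is zero is an assumption, since the challenge text does not define that case.

```python
def relative_error(gt, pred):
    """Relative Error(gt, pred) = |gt - pred| / gt, as written in the challenge."""
    if gt == 0:
        # Assumption: the source does not say how gt == 0 is scored.
        raise ValueError("relative error is undefined for gt == 0")
    return abs(gt - pred) / gt


def raw_score(gt, pred):
    """raw_score(gt, pred) = max{0, 1 - Relative Error(gt, pred)}."""
    return max(0.0, 1.0 - relative_error(gt, pred))


def final_score(gts, preds):
    """Final score = 100 * average(raw_score(gt, pred)) over all rows."""
    if len(gts) != len(preds) or not gts:
        raise ValueError("gts and preds must be non-empty and of equal length")
    scores = [raw_score(g, p) for g, p in zip(gts, preds)]
    return 100.0 * sum(scores) / len(scores)


# Toy values: a 2% miss and a 50% miss give raw scores of about 0.98 and 0.5.
score = final_score([100.0, 200.0], [98.0, 300.0])
print(round(score, 2))
```

The sketch divides by gt exactly as the formula is written. How zero or negative ground truth values are treated is not specified in the challenge text, so check that with the organizers before relying on local scores for such rows.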
Judging Criteria
Your solution will be evaluated using a combination of quantitative and qualitative criteria.
- Effectiveness (50%)
  - We will evaluate your forecasts by comparing them to the ground truth data. Please check the “Quantitative Scoring” section for details.
- Clarity (20%)
  - The model is clearly described, with reasonable justification for the choices made.
- Reproducibility (20%)
  - The results must be reproducible. We understand that there might be some randomness in ML models, but please try your best to keep the results the same, or at least similar, across different runs.
- Completeness (10%)
  - The submission must be complete.
  - The winners should also provide a feature importance analysis for their models. Specifically, we are looking for the 5 to 10 most important features to support a “what if” functionality. In future applications, we will provide end users with a slider for each of these features so that they can experiment with the feature values.