
Challenge Overview

Challenge Objectives

This challenge follows an ideation challenge and a POC in which initial data analysis was done and multiple approaches were documented. Based on the POC submissions, this challenge asks you to:
  • Perform detailed analysis work using Jupyter Notebook
  • Continue improving existing models or build a new model focused mainly on multivariate inputs (a minimal modelling sketch follows this list)
  • Train the model and predict forecast values for the next five years based on the given training dataset.
  • Include extra, publicly-available data that helps strengthen your forecasts.
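As a rough illustration only, not a prescribed design, the sketch below shows one way a multivariate model could be rolled forward to produce a 60-month (five-year) forecast from lagged drivers. The column names and synthetic data are placeholders to be replaced with the series from the provided dataset.

```python
# A minimal sketch, NOT a prescribed design: lagged drivers feed one regressor,
# which is rolled forward month by month to give a 60-month (five-year) forecast.
# All column names and the synthetic data below are placeholders for the real dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
idx = pd.date_range("2014-01-31", periods=60, freq="M")
df = pd.DataFrame({
    "gross_adds": 1000 + 5 * np.arange(60) + rng.normal(0, 20, 60),
    "churn": 800 + 3 * np.arange(60) + rng.normal(0, 15, 60),
    "price_index": 100 + rng.normal(0, 2, 60),
}, index=idx)

LAGS = [1, 2, 3, 12]          # short-term plus seasonal lags
TARGET = "gross_adds"

def make_features(frame):
    """Lagged copies of every column serve as the multivariate inputs."""
    feats = {f"{col}_lag{lag}": frame[col].shift(lag)
             for col in frame.columns for lag in LAGS}
    return pd.DataFrame(feats, index=frame.index)

X = make_features(df).dropna()
y = df.loc[X.index, TARGET]
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Roll forward 60 months, feeding each prediction back in as history.
history = df.copy()
for _ in range(60):
    next_month = history.index[-1] + pd.offsets.MonthEnd(1)
    placeholder = history.iloc[[-1]].copy()      # non-target drivers held flat here
    placeholder.index = [next_month]
    history = pd.concat([history, placeholder])
    row = make_features(history).iloc[[-1]]      # lag features at next_month
    history.loc[next_month, TARGET] = model.predict(row)[0]

print(history[TARGET].tail(60).head())           # start of the five-year forecast
```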

Project Background

Telecom providers sell products such as broadband and mobile phone contracts. These contracts consist of products of different types and capabilities, which are outlined below:
  • Mobile
  • Broadband
These products are sold in different markets, such as Consumer and Small-Medium Enterprises (SMEs).

For each of the products the customer would like to forecast the following:
  • Volume base (opening base) – the total number of subscribers / connections at a point in time
  • Gross adds – the number of new subscribers / connections during a period
  • Churn – the number of subscribers / connections which terminated service with the provider during that period
  • Net migrations – the number of subscribers / connections which remained with the provider but moved to another product
  • Average revenue per customer / connection / time period
Our model would eventually need to account for:
  • Different subscription lengths
  • Shocks introduced in the market, e.g. competitor price disruption, seasonal releases of new product lines (e.g. Samsung in Spring, Apple in the Fall), changes in market or technology-related regulation, etc.
An ideal, complete vision for this application would allow:
  • Forecasts that are within 20% of eventual actual results for the following year. 20% is not a requirement for this challenge, but it is a target, and solutions that perform best will receive higher ratings
  • “What-if” scenario building by users (future scope). See the “Re-forecasting” visual design provided in the forums where users will be able to adjust important features and see how this affects the forecast. This is to give you a vision for the application and where we are heading.
  • Ability to load new datasets in order to test them for relevance against the validated forecast (e.g. improves accuracy – future scope)

Technology Stack

  • Python 3.7.x

Individual Requirements

Data Analysis
We would like you to focus your analysis on the following products/brands:

Consumer
  • Sandesh 1 - Broadband
  • Sandesh 2 - Mobile and Broadband

Enterprise
  • SME Sandesh 2 - Mobile
  • SME Sandesh 1 - Broadband
You need to perform data analysis work on the given dataset and save the notebook, which should be shared along with the submission. It should cover the following items properly:
  • Feature importance and selection procedures, preferably using histograms (see the sketch after this list)
  • We’re specifically interested in measuring how strongly the data provided in this challenge, and any data you add on your own, impact the “forecastability” of gross adds, net migrations, leavers (churn), volume base and average revenue per customer. Which features account most for changes in values?
  • If you tried other methods before settling on an approach, you may keep that work in the notebook for reference purposes
  • Code should be documented appropriately (within the code): explanations are needed of how the different areas of the model work.
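One possible starting point for the feature-importance item above, sketched with synthetic data and illustrative column names: impurity-based importances from a tree ensemble, plotted as a bar chart. Permutation importance or SHAP values are common alternatives.

```python
# A sketch only, with synthetic data and illustrative column names; swap in the
# series from the forum dataset. Impurity-based importances from a tree ensemble,
# plotted as a bar chart, are one simple first cut at feature selection.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 48
drivers = pd.DataFrame({
    "price_index": rng.normal(100, 3, n),
    "marketing_spend": rng.normal(50, 10, n),
    "competitor_price": rng.normal(95, 3, n),
})
gross_adds = 5 * drivers["marketing_spend"] - 3 * drivers["price_index"] + rng.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(drivers, gross_adds)
importances = pd.Series(model.feature_importances_, index=drivers.columns).sort_values()

importances.plot(kind="barh", title="Feature importance for gross adds (illustrative)")
plt.tight_layout()
plt.show()
```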
When plotting your forecasts, please use a scatter plot with a line of best fit and a variance band, as in the sketch below.
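A minimal matplotlib sketch of that plot style, with synthetic numbers standing in for a real forecast series:

```python
# Synthetic numbers stand in for a real forecast series; the band is +/- one
# standard deviation of the residuals around the fitted line.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = np.arange(36)                                # months
y = 1000 + 8 * x + rng.normal(0, 30, x.size)     # forecast values

slope, intercept = np.polyfit(x, y, 1)           # line of best fit
fit = slope * x + intercept
resid_std = np.std(y - fit)

plt.scatter(x, y, s=15, label="forecast points")
plt.plot(x, fit, color="red", label="line of best fit")
plt.fill_between(x, fit - resid_std, fit + resid_std, alpha=0.2, label="variance band")
plt.xlabel("month")
plt.ylabel("gross adds")
plt.legend()
plt.show()
```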

To get you started, the customer has provided a set of features they would like you to analyse (please see the “Hypothesis” tab of the spreadsheet). This is just a starting point, and submissions will receive a higher rating if other important features are discovered and analysed. Ultimately the customer is looking for a multivariate analysis that identifies the most important features and how strongly they affect the forecast. A “1” in the table marks a hypothesis that the two variables are correlated; a quick way to test this is shown below.
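One simple way to sanity-check the hypothesised correlations is a pairwise correlation matrix; the sketch below uses synthetic stand-in data and illustrative column names only.

```python
# Illustrative only: pairwise correlations computed from synthetic stand-in data.
# Replace the columns with the actual series behind each "1" in the Hypothesis tab.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 48
df = pd.DataFrame({
    "gross_adds": rng.normal(1000, 60, n),
    "marketing_spend": rng.normal(50, 10, n),
    "competitor_price": rng.normal(95, 3, n),
})
print(df.corr().round(2))   # strong coefficients support (or refute) a hypothesised link
```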

Important: be very careful when using imputation on this data.

Dataset
The dataset is provided in the forum and can be used for training and testing your model; you may split it into training and test sets.

Please note that we have withheld the current year’s values for final validation of your (and future) model(s).

Please also note that the model is expected to produce monthly, quarterly and annual forecasts. By default, data provided is monthly. When the input data is provided only quarterly or annually, it has been logged in the last month of the quarter / year and the other months have been left blank.
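A small sketch of handling that convention, assuming a quarterly figure logged in the quarter’s last month: back-filling restores a level series, while dividing by three spreads a flow evenly across the quarter. Values and names are illustrative.

```python
# Assumes the convention above: a quarterly figure is logged in the quarter's last
# month and the other months are blank. Values and index are illustrative.
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-31", periods=12, freq="M")
quarterly = pd.Series([np.nan, np.nan, 300, np.nan, np.nan, 330,
                       np.nan, np.nan, 360, np.nan, np.nan, 390], index=idx)

monthly_level = quarterly.bfill()        # treat as a level: each month takes its quarter's value
monthly_split = quarterly.bfill() / 3    # treat as a flow: split the quarter evenly across months
print(pd.DataFrame({"raw": quarterly, "level": monthly_level, "split": monthly_split}))
```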

You are permitted to source your own data sets and models to improve the accuracy of your forecasts. The data sets can be of the same frequency intervals as the data set in the forum or at other frequency intervals (e.g. daily).

You must request approval in the forum before using externally sourced data sets. Data sets sourced externally must come from a credible source. Your test results must quantify the relative improvement in the forecast provided by your external data set(s).

Prediction Format
You must submit a CSV file for each of the required forecasts, with the same set of product categories as in the training dataset and monthly columns. A template is provided in the forum.
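The template in the forum is authoritative. Purely as an illustration of producing such a file, and assuming one row per product category with one column per forecast month (which may not match the template exactly), a sketch might look like this:

```python
# Hypothetical layout only; always follow the template from the forum. Assumed
# here: one row per product category, one column per forecast month.
import pandas as pd

months = pd.date_range("2020-01-31", periods=60, freq="M").strftime("%Y-%m")
rows = {
    "Sandesh 1 Broadband": [0.0] * 60,   # replace the zeros with model predictions
    "Sandesh 2 Mobile": [0.0] * 60,
}
pd.DataFrame(rows, index=months).T.to_csv("gross_adds_forecast.csv", index_label="product")
```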

Forecast Evaluation
The evaluation of models will be done using k-fold evaluation (k=2, k=3). A scoring script, along with a naive model, is provided in the forum for reference and for optimizing your model. Make absolutely sure your prediction format is aligned with the scoring script so that scoring does not fail.

The error metrics considered for evaluation are RMSE, MAE, MAPE and MASE; multiple models will be evaluated and ranked based on these metrics.
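For reference, textbook definitions of these four metrics are sketched below; the scoring script in the forum remains the authoritative implementation. Scaling MASE by a seasonal naive forecast with m = 12 is an assumption here.

```python
# Reference (textbook) definitions; the scoring script in the forum is authoritative.
# MASE is scaled by a seasonal naive forecast with m=12 months, an assumption here.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

def mase(y_true, y_pred, y_train, m=12):
    naive_error = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / naive_error

y_train = np.arange(100.0, 340.0, 10.0)          # 24 months of history (illustrative)
y_true = np.array([340.0, 350.0, 360.0])
y_pred = np.array([335.0, 355.0, 370.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      mape(y_true, y_pred), mase(y_true, y_pred, y_train))
```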

Deployment Guide

Make sure you provide a README.md that covers how to run the script in any environment.

Final Submission Guidelines

  • Data analysis code notebook
  • Source code
  • Documentation:
Your submission should include a text, .doc, PPT or PDF document that includes the following sections and descriptions:
  • Overview: describe your approach in “layman's terms”
  • Methods: describe what you did to come up with this approach, e.g. literature search, experimental testing, etc. If you augmented any of the ideas provided as input, describe your innovations.
  • Materials: did your approach use a specific technology beyond Jupyter?  Any libraries?  List all tools and libraries you used
  • Discussion: Include your analysis in this section.  Explain what you attempted, considered or reviewed that worked, and especially those that didn’t work or that you rejected.  For any that didn’t work, or were rejected, briefly include your explanation for the reasons (e.g. such-and-such needs more data than we have).  If you are pointing to somebody else’s work (e.g. you’re citing a well-known implementation or literature), describe in detail how that work relates to this work, and what would have to be modified
  • Data:  What other data should one consider?  Is it in the public domain?  Is it derived?  Is it necessary in order to achieve the aims?  Also, what about the data described/provided - is it enough?
  • Assumptions and Risks: what are the main risks of this approach, and what are the assumptions you/the model is/are making?  What are the pitfalls of the data set and approach?
  • Results: Did you implement your approach?  How’d it perform?  If you’re not providing an implementation, use this section to explain the EXPECTED results.
  • Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include

ELIGIBLE EVENTS:

Topcoder Open 2019

Review style

  • Final Review: Community Review Board
  • Approval: User Sign-Off

ID: 30093618