
Challenge Overview

Challenge Objectives

A preceding ideation challenge produced initial data analysis and documented multiple candidate approaches. Building on those ideation submissions, this challenge asks you to:
  • Perform detailed analysis work in a Jupyter Notebook
  • Build the model
  • Train the model and forecast values for the next five years from the given training dataset
  • Include extra, publicly available data that helps strengthen your forecasts

Project Background

Telecom providers sell products such as broadband and mobile phone contracts. These contracts consist of products of different types and capabilities, which are outlined below:
  • Mobile
  • Broadband
These products are sold in different markets, such as Consumer and Small-Medium Enterprises (SMEs).

For each of the products the customer would like to forecast the following:
  • Volume base (opening base and closing base) – the total number of subscribers / connections at a point in time
  • Gross adds – the number of new subscribers / connections during a period
  • Leavers – the number of subscribers / connections which terminated service with the provider during that period
  • Net migrations – the number of subscribers / connections which remained with the provider but moved to another product
  • Average revenue per customer/connection
  • Revenue 
  • Re-grades/upsell
For the avoidance of doubt, the closing volume base at the end of a period = opening base at the beginning of the period + gross adds − leavers + net migrations.
Similarly, average revenue per customer = revenue / ((opening base + closing base) / 2).
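The two identities above can be checked with a small worked example (all figures below are made-up illustration numbers, not taken from the challenge dataset):

```python
# Hypothetical illustration of the base-reconciliation and ARPU formulas.
opening_base   = 100_000    # subscribers at the start of the period
gross_adds     = 8_000      # new subscribers during the period
leavers        = 5_000      # terminations during the period
net_migrations = 1_200      # net moves into this product from other products
revenue        = 2_600_000  # total revenue for the period

# Closing base = opening base + gross adds - leavers + net migrations
closing_base = opening_base + gross_adds - leavers + net_migrations

# Average revenue per customer = revenue / mean of opening and closing base
arpu = revenue / ((opening_base + closing_base) / 2)

print(closing_base)       # 104200
print(round(arpu, 2))     # 25.47
```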

Our model would eventually need to account for:
  • Different subscription lengths
  • Shocks introduced in the market, e.g. competitor price disruption and seasonal releases of new product lines (e.g. Samsung in Spring, Apple in the Fall), changes in market or technology-related regulation etc.
An ideal, complete vision for this application would allow:
  • Forecasts that are within 20% of eventual actual results for the following year. 20% is not a requirement for this challenge, but it is a target, and solutions that perform best will receive higher ratings
  • What-if scenario building by users (future scope)
  • Ability to load new datasets in order to test them for relevance against the validated forecast (e.g. improves accuracy – future scope)

Technology Stack

  • Python 3.7.x

Individual Requirements

Data Analysis
You need to perform data analysis work on the given dataset and save the notebook which should be shared along with the submission. It should have the following items covered properly:
  • Feature importance and selection procedures, preferably using histograms
  • We’re specifically interested in measuring how strongly the data provided in this challenge, and any data you add on your own, impact the “forecastability” of gross-adds, net migrations, leavers, volume base and average revenue per customer.  Which features account most for changes in values?
  • If you tried other methods before settling on a final approach, you may keep that work in the notebook for reference purposes
  • Code should be documented appropriately (within the code): Explanations are needed on how the different areas of the model work.
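One simple way to screen features for "forecastability", sketched here with synthetic data (the column names and numbers are hypothetical, not from the challenge dataset): rank candidate features by the strength of their correlation with the target series. Tree-based importances or permutation importance are reasonable alternatives.

```python
# Minimal feature-screening sketch: rank hypothetical drivers of gross adds
# by absolute correlation with the target. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 120  # e.g. 10 years of monthly observations
features = {
    "price_index":      rng.normal(100, 5, n),
    "marketing_spend":  rng.normal(50, 10, n),
    "competitor_promo": rng.integers(0, 2, n).astype(float),
}
# Synthetic target: gross adds driven mostly by marketing spend, plus noise
gross_adds = 1000 + 20 * features["marketing_spend"] + rng.normal(0, 50, n)

importance = {
    name: abs(np.corrcoef(col, gross_adds)[0, 1])
    for name, col in features.items()
}
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked)  # "marketing_spend" should rank first
```

In the notebook, the same scores could be plotted as a bar chart or histogram, as the requirement above suggests.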
Dataset
The dataset, which can be used for training and testing your model, is provided in the forum. You can split the data into training and testing sets.

Please note that we have removed the current year's values for final validation of your (and future) model(s).

Please also note that the model is expected to produce monthly, quarterly and annual forecasts. By default, data provided is monthly. When the input data is provided only quarterly or annually, it has been logged in the last month of the quarter / year and the other months have been left blank.
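One way to handle the quarterly-logging convention described above is sketched below with pandas and made-up numbers: each quarterly total appears only in the quarter's last month, so we backfill it across the quarter and spread it evenly over the three months. This even split is an assumption; interpolation is another option.

```python
# Sketch: convert quarterly-logged values (last month of each quarter,
# other months blank) into an evenly spread monthly series.
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
s = pd.Series([np.nan, np.nan, 300.0, np.nan, np.nan, 360.0], index=months)

# Backfill each quarter-end value over its three months, then divide by 3
monthly = s.bfill() / 3
print(monthly.tolist())  # [100.0, 100.0, 100.0, 120.0, 120.0, 120.0]
```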

You are permitted to source your own data sets and models to improve the accuracy of your forecasts. These data sets may have the same frequency as the data set in the forum or a different frequency (e.g. daily).

You must request approval in the forum before using externally sourced data sets. Data sets sourced externally must come from a credible source. Your test results must quantify the relative improvement in the forecast provided by your external data set(s).

Prediction Format
You must submit a CSV file for each of the forecasts below, with the same set of product categories as the training dataset and monthly columns. A template is provided in the forum; create a new tab in it for each evaluation of your model.

Forecast Evaluation
To test the accuracy of your forecast, provide the mean absolute error (MAE) and root mean squared error (RMSE) values for the following test scenarios:
  1. Remove every other month
  2. Remove random months for 15% of the data set
  3. Repeat step (2) three times
Plot these values in separate graphs. In the Results section of your documentation (see the Final Submission Guidelines below), summarize your Forecast Evaluation and graphs, give your opinion on the strength of the forecast, and note the conditions under which it is strong or weak.
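The two requested metrics can be computed directly with NumPy; the actual/forecast values below are made-up illustration numbers:

```python
# MAE and RMSE on a toy actual-vs-forecast comparison.
import numpy as np

actual   = np.array([100.0, 110.0, 120.0, 130.0])
forecast = np.array([ 98.0, 112.0, 115.0, 131.0])

mae  = np.mean(np.abs(actual - forecast))          # mean absolute error
rmse = np.sqrt(np.mean((actual - forecast) ** 2))  # root mean squared error

print(mae)   # 2.5
print(rmse)  # sqrt(8.5), about 2.92
```

`sklearn.metrics.mean_absolute_error` and `mean_squared_error` give the same results if scikit-learn is already in your stack.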

Regarding Final Review: We realize the dataset is limited for forecasting. We are seeking to learn how well the factors we already have, plus any additional data you bring, predict future performance. Therefore the "explainability" of your model and the strength of your written analysis and recommendations will be extremely important.

Deployment Guide

Make sure you provide a README.md that covers how to run the script in any environment.

Final Submission Guidelines

  • Data analysis code notebook
  • Source code
  • Documentation:
Your submission should include a text, .doc, PPT or PDF document that includes the following sections and descriptions:
  • Overview: describe your approach in “layman's terms”
  • Methods: describe what you did to come up with this approach, e.g. literature search, experimental testing, etc.  If you augmented any of the ideas provided as input, describe your innovations.
  • Materials: did your approach use a specific technology beyond Jupyter?  Any libraries?  List all tools and libraries you used
  • Discussion: Include your analysis in this section.  Explain what you attempted, considered, or reviewed that worked, and especially what didn't work or was rejected.  For anything that didn't work or was rejected, briefly explain why (e.g. such-and-such needs more data than we have).  If you are pointing to somebody else's work (e.g. citing a well-known implementation or the literature), describe in detail how that work relates to this work and what would have to be modified.
  • Data:  What other data should one consider?  Is it in the public domain?  Is it derived?  Is it necessary in order to achieve the aims?  Also, what about the data described/provided - is it enough?
  • Assumptions and Risks: what are the main risks of this approach, and what are the assumptions you/the model is/are making?  What are the pitfalls of the data set and approach?
  • Results: Did you implement your approach?  How’d it perform?  If you’re not providing an implementation, use this section to explain the EXPECTED results.
  • Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include

ELIGIBLE EVENTS: Topcoder Open 2019

Review style: Final Review
Review board: Community Review Board
Approval: User Sign-Off

ID: 30091690