Challenge Overview

Challenge Objectives

We previously ran an ideation challenge and a proof of concept (POC) that produced initial data analysis and documented multiple approaches. Building on the POC submissions, this challenge asks you to:
  • Perform detailed analysis work using Jupyter Notebook
  • Continue improving existing models or build a new model focused mainly on multivariate inputs
  • Train the model and predict forecast values for the next five years based on the given training dataset (a minimal illustrative sketch follows this list)
  • Include extra, publicly available data that helps strengthen your forecasts
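To make the multivariate-modelling objective concrete, here is a minimal sketch (not a prescribed approach) that fits a vector autoregression on a monthly multivariate series and forecasts 60 months (five years) ahead. The file name, column layout and gap handling are assumptions; substitute the actual training data from the forum.

```python
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical input: one row per month, one column per metric/driver.
# Replace "history.csv" with the actual training data from the forum.
df = pd.read_csv("history.csv", parse_dates=["month"], index_col="month")
df = df.asfreq("MS").interpolate().dropna()   # regular monthly index; simple gap fill for this sketch only

model = VAR(df)                               # multivariate model over all columns
results = model.fit(maxlags=12, ic="aic")     # lag order chosen by AIC, up to 12 months

horizon = 60                                  # five years of monthly forecasts
forecast = results.forecast(df.values[-results.k_ar:], steps=horizon)
forecast_df = pd.DataFrame(
    forecast,
    columns=df.columns,
    index=pd.date_range(df.index[-1], periods=horizon + 1, freq="MS")[1:],
)
print(forecast_df.head())
```

A VAR is only one option; boosted regressors, state-space models or other approaches are equally valid as long as the multivariate drivers are taken into account.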

Project Background

Telecom providers sell products such as broadband and mobile phone contracts. These contracts consist of products of different types and capabilities, which are outlined below:
  • Mobile
  • Broadband
These products are sold in different markets, such as Consumer and Small-Medium Enterprises (SMEs).

For each of the products the customer would like to forecast the following:
  • Volume base (opening base) – the total number of subscribers / connections at a point in time
  • Gross adds – the number of new subscribers / connections during a period
  • Churn – the number of subscribers / connections which terminated service with the provider during that period
  • Net migrations – the number of subscribers / connections which remained with the provider but moved to another product
  • Average revenue per customer / connection / time period
Our model would eventually need to account for:
  • Different subscription lengths
  • Shocks introduced in the market, e.g. competitor price disruption, seasonal releases of new product lines (e.g. Samsung in the spring, Apple in the fall), changes in market or technology-related regulation, etc.
An ideal, complete vision for this application would allow:
  • Forecasts that are within 20% of eventual actual results for the following year. 20% is not a requirement for this challenge, but it is a target, and solutions that perform best will receive higher ratings
  • “What-if” scenario building by users (future scope). See the “Re-forecasting” visual design provided in the forums where users will be able to adjust important features and see how this affects the forecast. This is to give you a vision for the application and where we are heading.
  • Ability to load new datasets in order to test them for relevance against the validated forecast (e.g. improves accuracy – future scope)

Technology Stack

  • Python 3.7.x

Individual Requirements

Data Analysis
We would like you to focus your analysis on the following products/brands:

Consumer
  • Sandesh 1 - Broadband
  • Sandesh 2 - Mobile and Broadband

Enterprise
  • SME Sandesh 2 - Mobile
  • SME Sandesh 1 - Broadband
Perform your data analysis on the given dataset and save the notebook, which must be shared along with your submission. The notebook should cover the following items:
  • Feature importance and selection procedures, preferably using histograms (see the sketch after this list)
  • We’re specifically interested in measuring how strongly the data provided in this challenge, and any data you add on your own, impact the “forecastability” of gross adds, net migrations, leavers, volume base and average revenue per customer. Which features account most for changes in values?
  • If you tried other methods before settling on your final approach, you may keep that work in the notebook for reference
  • Code should be documented appropriately (within the code): explanations are needed on how the different areas of the model work.
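As a starting point for the feature-importance item above, the sketch below fits a tree ensemble on the candidate drivers and plots the resulting importances as a bar chart. The file name, target column and feature names are placeholders for the variables in the “Hypothesis” tab.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# Placeholder file and column names; substitute the real dataset and target metric.
data = pd.read_csv("features.csv")
target = "gross_adds"                               # or net migrations, leavers, etc.

X = data.drop(columns=[target]).select_dtypes("number")
y = data[target]
mask = X.notna().all(axis=1) & y.notna()            # keep complete rows rather than imputing casually
X, y = X[mask], y[mask]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

importances = pd.Series(model.feature_importances_, index=X.columns).sort_values()
importances.plot(kind="barh", title=f"Feature importance for {target}")
plt.tight_layout()
plt.show()
```

Other importance measures (permutation importance, correlation analysis) are equally acceptable; the point is to quantify which features drive each forecast.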
When plotting your forecasts, please use a scatter plot with a line of best fit along with the variance, as in the sketch below.
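A minimal matplotlib sketch of that plot style: the forecast points as a scatter, a least-squares line of best fit, and a shaded band for the variance (here one standard deviation of the residuals). The data is made up purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: x could be the month index, y the forecast values.
x = np.arange(24)
y = 1000 + 15 * x + np.random.normal(scale=40, size=x.size)

# Line of best fit via simple least squares.
slope, intercept = np.polyfit(x, y, 1)
fit = slope * x + intercept
residual_std = np.std(y - fit)                 # spread used for the variance band

plt.scatter(x, y, label="forecast points")
plt.plot(x, fit, color="red", label="line of best fit")
plt.fill_between(x, fit - residual_std, fit + residual_std,
                 color="red", alpha=0.2, label="variance band (±1 std)")
plt.xlabel("month")
plt.ylabel("forecast value")
plt.legend()
plt.show()
```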

To get you started, the customer has provided a set of features they would like you to analyse (please see the “Hypothesis” tab of the spreadsheet). This is just a starting point, and submissions will receive a higher rating if other important features are discovered and analysed. Ultimately the customer is looking for a multivariate analysis to understand the most important features and how strongly they affect the forecast. A “1” in the table marks a hypothesis that the corresponding variables are correlated.

Important: Be very careful when applying imputation to the data.

Dataset
The dataset, which can be used for training and testing your model, is provided in the forum. You may split it into training and testing sets.

Please note that we have removed the current year’s values for final validation of your (and future) model(s).

Please also note that the model is expected to produce monthly, quarterly and annual forecasts. By default, data provided is monthly. When the input data is provided only quarterly or annually, it has been logged in the last month of the quarter / year and the other months have been left blank.
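One way to handle that layout in pandas is sketched below: quarterly or annual series carry their value in the last month of the period and are blank elsewhere, so they can be recovered by resampling and taking the last non-null observation, while genuinely monthly metrics can be aggregated directly. File and column names are placeholders.

```python
import pandas as pd

# Placeholder names; the real file and columns come from the forum dataset.
df = pd.read_csv("training.csv", parse_dates=["month"], index_col="month")

monthly_metric = df["gross_adds"]          # logged every month
sparse_metric = df["quarterly_driver"]     # logged only in the last month of each quarter

# Recover the quarterly series: one value per quarter, taken from its last logged month.
quarterly_values = sparse_metric.resample("Q").last()

# Aggregate a monthly metric to quarterly and annual views for reporting.
gross_adds_quarterly = monthly_metric.resample("Q").sum()
gross_adds_annual = monthly_metric.resample("A").sum()

print(quarterly_values.head())
print(gross_adds_quarterly.head(), gross_adds_annual.head())
```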

You are permitted to source your own data sets and models to improve the accuracy of your forecasts. These data sets may be at the same frequency as the data set in the forum or at other frequencies (e.g. daily).

You must request approval in the forum before using externally sourced data sets. Data sets sourced externally must come from a credible source. Your test results must quantify the relative improvement in the forecast provided by your external data set(s).

Prediction Format
You must submit a CSV file for each of the forecast metrics listed under Project Background, covering the same set of product categories as the training dataset, with monthly columns. A template is provided in the forum.
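The template in the forum is authoritative for the exact layout; the sketch below only illustrates the general shape described here, with one row per product category and one column per forecast month (all names and dates are assumptions).

```python
import pandas as pd

# Assumed shape: rows = product categories from the training data, columns = forecast months.
categories = ["Sandesh 1 Broadband", "Sandesh 2 Mobile", "Sandesh 2 Broadband"]
months = pd.date_range("2019-01-01", periods=12, freq="MS").strftime("%Y-%m")  # placeholder horizon

forecast = pd.DataFrame(0.0, index=categories, columns=months)
forecast.index.name = "product_category"

# Fill in predicted values, then write one CSV per forecast metric (gross adds, churn, ...).
forecast.to_csv("gross_adds_forecast.csv")
```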

Forecast Evaluation
Models will be evaluated using k-fold evaluation (k=2 and k=3). A scoring script and a naive baseline model are provided in the forum; use them for reference and to optimize your model. Make absolutely sure that your prediction format is aligned with the scoring script, or scoring will fail.
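The scoring script defines the official folds; purely as a reference, the sketch below shows one common reading of k-fold evaluation for time series, an expanding-window split with 2 and then 3 folds using scikit-learn's TimeSeriesSplit, scored here against a naive last-value baseline.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

# Illustrative series; replace with the actual monthly target values.
y = np.arange(48, dtype=float)

for k in (2, 3):
    splitter = TimeSeriesSplit(n_splits=k)
    for fold, (train_idx, test_idx) in enumerate(splitter.split(y)):
        train, test = y[train_idx], y[test_idx]
        naive_forecast = np.repeat(train[-1], len(test))   # naive "repeat last value" baseline
        score = mean_absolute_error(test, naive_forecast)
        print(f"k={k} fold={fold}: train={len(train)} test={len(test)} naive MAE={score:.2f}")
```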

The error metrics considered for evaluation are RMSE, MAE, MAPE and MASE; multiple models will be evaluated and ranked based on these metrics.
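For reference while tuning, the functions below are a minimal sketch of the four metrics named here; MASE is scaled by the in-sample MAE of a one-step naive forecast, the usual definition, but the scoring script in the forum remains the authoritative implementation.

```python
import numpy as np

def rmse(actual, predicted):
    return np.sqrt(np.mean((np.asarray(actual, float) - np.asarray(predicted, float)) ** 2))

def mae(actual, predicted):
    return np.mean(np.abs(np.asarray(actual, float) - np.asarray(predicted, float)))

def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

def mase(actual, predicted, train):
    # Scale the out-of-sample MAE by the in-sample MAE of a one-step naive forecast.
    train = np.asarray(train, float)
    scale = np.mean(np.abs(np.diff(train)))
    return mae(actual, predicted) / scale

# Example with made-up numbers.
train = [100, 110, 105, 120, 130]
actual, predicted = [135, 140], [132, 145]
print(rmse(actual, predicted), mae(actual, predicted),
      mape(actual, predicted), mase(actual, predicted, train))
```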

Deployment Guide

Make sure you provide a README.md that covers how to run the script in any environment.

Final Submission Guidelines

  • Data analysis code notebook
  • Source code
  • Documentation:
Your submission should include a text, .doc, PPT or PDF document that includes the following sections and descriptions:
  • Overview: describe your approach in “layman's terms”
  • Methods: describe what you did to come up with this approach, e.g. literature search, experimental testing, etc. If you augmented any of the ideas provided as input, describe your innovations.
  • Materials: did your approach use a specific technology beyond Jupyter?  Any libraries?  List all tools and libraries you used
  • Discussion: Include your analysis in this section.  Explain what you attempted, considered or reviewed that worked, and especially those that didn’t work or that you rejected.  For any that didn’t work, or were rejected, briefly include your explanation for the reasons (e.g. such-and-such needs more data than we have).  If you are pointing to somebody else’s work (e.g. you’re citing a well-known implementation or literature), describe in detail how that work relates to this work, and what would have to be modified
  • Data:  What other data should one consider?  Is it in the public domain?  Is it derived?  Is it necessary in order to achieve the aims?  Also, what about the data described/provided - is it enough?
  • Assumptions and Risks: what are the main risks of this approach, and what are the assumptions you/the model is/are making?  What are the pitfalls of the data set and approach?
  • Results: Did you implement your approach?  How’d it perform?  If you’re not providing an implementation, use this section to explain the EXPECTED results.
  • Other: Discuss any other issues or attributes that don’t fit neatly above that you’d also like to include

ELIGIBLE EVENTS:

Topcoder Open 2019

REVIEW STYLE:

Final Review: Community Review Board

Approval: User Sign-Off

ID: 30093618