Challenge Overview
Background
Over the last few months, a series of challenges has been run to generate high-quality financial forecasts for a consumer broadband brand. These challenges have been known as ‘CFO Forecasting’ in various formats. As a result, a series of high-quality, high-accuracy algorithmic forecasts has been produced for 6 financial target variables.
Challenge Objective
The objective of this challenge is to integrate the models into our services layer. We’ll need to do two things here.
1. Create endpoints to “train” the models on a designated data set and time period.
2. Create a forecast output based on a set of inputs. The forecast should use the latest available trained model. Generally, we’re working with 12 to 15 month forecast windows, but this time period should be dynamic.
Business context
The four products are broadband products:
- Tortoise (legacy product, declining product, available everywhere) - slow download speeds.
- Rabbit (main product, reaching maturity, available in most of the country) - faster download speeds.
- Falcon
- Cheetah (premium product, available in most of the country) - fastest download speeds.
Tortoise and Rabbit are interdependent: Rabbit is an upgrade of the earlier Tortoise product, and Rabbit’s growth depends to a large extent on upgrading customers from Tortoise. There is therefore a gradual move from Tortoise to Rabbit.
The six variables are financial metrics:
- Gross adds – the number of new subscribers by product joining the brand during a month
- Leavers – the number of subscribers by product who terminated service with the brand during that month
- Net migrations – the number of subscribers who remained with the brand but moved to another product. This usually is an upgrade to faster broadband speed. (Model not provided for Net Migration yet).
These three ‘discontinuous’ variables are seasonal, vary significantly from month to month, and are almost entirely dependent on market and competitor pressures at that point in time.
- Closing Base - the number of subscribers by product at the end of the month
- Average Revenue per user - the average monthly revenue paid by a subscriber per month for the service
- Revenue - the total revenue generated by the subscriber base per product per month.
These three ‘continuous’ variables have a significant monthly recurring component with only small incremental changes (either positive or negative) from month to month. They are therefore smooth and continuous with only gradual shifts in value.
Requirement Details
1. The current Python services application supports a data import feature from a CSV file to create a new “forecast” through a POST to the /forecasts endpoint. In practice, this “forecast” import processes both historical and forecast data that is manually prepared outside this system and loaded into the database. Please rename this endpoint to be called “Upload Existing Forecast”.
2. You should use the “forecast” endpoint to process uploaded historical data (there shouldn’t be future predictions in the new files provided -- only actuals) and execute the models to write forecast output to MongoDB.
3. The model code to be included in the Services application can be found here: https://docs.google.com/spreadsheets/d/1r2PEDTauyvPkoyfyckPuPnWX6t963koRmnDtBRr3lK4/edit?usp=sharing. As you can see there are 6 different model repositories. Some of the models work broadly across Variables and Products but others have been configured for specific Variable and Product combinations. Please parameterize and combine the code/models for rows 6 and 7 and also rows 8 and 9. You’ll see that the code for rows 6 and 7 is almost identical, and the same is true for rows 8 and 9. It’s fine to place the parameters in a configuration file. These won’t be changing that often. You should configure the relationships between models and the brand, product, variable combinations dynamically. Some of the current models will be applied to other brand/variable/product combinations and it is likely that new models will be developed for the current combinations as research continues and performance improves.
4. You’ll need to convert the current LSTM models from Jupyter Notebook to Python 3 form.
5. Please include a services endpoint to train a specific model (if required) and a general “train all models” endpoint. The newly trained model will override the previous one. There should only be one model assignment per brand/product type/variable combination. For now we won’t be versioning or archiving the previously trained models.
6. A couple of the models perform better with a normalization step. For the Leaver and Gross Add models we normalize the input using the concept of “trading days”, which standardizes the number of days in a month. In this reporting convention months are either 28 days or 35 days. This application will need to calculate trading days for each month in question. The algorithm is simple:
If a month has 5 Fridays, trading days = 35
Else trading days = 28
7. Both the Leaver and Gross Add models expect monthly input values to be divided by the number of trading days of the month, and they produce output in normalized form. The LSTM models produce one predicted value per forecast month (x values for x months), starting from the end of the historical period. Multiply each predicted value by the trading days of its corresponding month to get the actual predictions.
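The trading-day rule in requirement 6 and the normalization round trip in requirement 7 can be sketched as follows. This is a minimal illustration using only the standard library, not the challenge's reference implementation. The function names (`trading_days`, `forecast_months`, `normalize`, `denormalize`) and the use of `(year, month)` tuples as keys are assumptions made for this example.

```python
import calendar
from datetime import date


def trading_days(year: int, month: int) -> int:
    """Return 35 if the month contains five Fridays, otherwise 28."""
    days_in_month = calendar.monthrange(year, month)[1]
    fridays = sum(
        1
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == calendar.FRIDAY
    )
    return 35 if fridays == 5 else 28


def forecast_months(last_year: int, last_month: int, horizon: int):
    """List the (year, month) pairs that follow the last historical month."""
    months = []
    year, month = last_year, last_month
    for _ in range(horizon):
        month += 1
        if month > 12:
            year, month = year + 1, 1
        months.append((year, month))
    return months


def normalize(history):
    """Divide each monthly actual by that month's trading days.

    `history` is assumed to be a dict keyed by (year, month).
    """
    return {ym: value / trading_days(*ym) for ym, value in history.items()}


def denormalize(predictions, last_year: int, last_month: int):
    """Multiply each normalized prediction by its own month's trading days.

    `predictions` is assumed to be an ordered list with one value per
    forecast month, starting the month after the last historical month.
    """
    months = forecast_months(last_year, last_month, len(predictions))
    return {ym: p * trading_days(*ym) for ym, p in zip(months, predictions)}
```

For example, January 2021 has five Fridays (the 1st, 8th, 15th, 22nd and 29th), so `trading_days(2021, 1)` gives 35, while February 2021 has four and gives 28. Because the horizon is taken from the number of predictions, the same code works for any forecast window length.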
Practice data for the challenge can be found in the Code Document forums.
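Requirements 3 and 5 ask for model assignments that are driven by configuration, with a single trained model per brand/product/variable combination that is overridden on retraining. One possible shape for this is sketched below. Everything in it is hypothetical: the brand name `BrandA`, the configuration layout, the `mean_baseline` toy trainer (a stand-in for the real models), and the `ModelStore` class are assumptions for illustration, and the store is kept in memory rather than in MongoDB.

```python
import json

# Hypothetical configuration; in practice this would live in a config file.
CONFIG_JSON = """
{
  "assignments": [
    {"brand": "BrandA", "product": "Rabbit", "variable": "Gross adds",
     "model": "mean_baseline", "params": {"window": 3}},
    {"brand": "BrandA", "product": "Rabbit", "variable": "Leavers",
     "model": "mean_baseline", "params": {"window": 2}}
  ]
}
"""


def train_mean_baseline(history, window=3):
    """Toy stand-in for a real model: stores the mean of the last values."""
    recent = history[-window:]
    return {"level": sum(recent) / len(recent)}


# Maps a model name used in the configuration to its training function.
TRAINERS = {"mean_baseline": train_mean_baseline}


def load_assignments(config_text):
    """Build a lookup keyed by (brand, product, variable), rejecting duplicates."""
    assignments = {}
    for entry in json.loads(config_text)["assignments"]:
        key = (entry["brand"], entry["product"], entry["variable"])
        if key in assignments:
            raise ValueError(f"More than one model assigned to {key}")
        assignments[key] = (entry["model"], entry.get("params", {}))
    return assignments


class ModelStore:
    """Holds one trained model per combination; retraining overrides it."""

    def __init__(self, assignments):
        self.assignments = assignments
        self.trained = {}

    def train(self, key, history):
        """Train the model assigned to `key` and replace any previous one."""
        model_name, params = self.assignments[key]
        self.trained[key] = TRAINERS[model_name](history, **params)
        return self.trained[key]

    def train_all(self, histories):
        """Train every configured combination that has historical data."""
        return {
            key: self.train(key, histories[key])
            for key in self.assignments
            if key in histories
        }
```

Keying both the configuration and the store on the same `(brand, product, variable)` tuple means that assigning a different model to a combination is a configuration change only, and a "train specific model" endpoint and a "train all models" endpoint could both delegate to the same `train` method.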
Final Submission Guidelines
1. Working Python 3 code that runs on different data sets in the same format
2. Report with a clear explanation of all the steps taken to solve the challenge (refer to the section “Challenge Details”) and of how to run the code
3. No hardcoding (e.g., column names, possible values of each column, ...) in the code is allowed. We will run the code on some different datasets.
4. All models in one codebase with clear inline comments
5. Flexibility to extend the code to forecast for additional months