OLS Clickstream Retirement Predictor Code Challenge

The challenge is finished.

Challenge Overview

Our customer wants to increase retirement account sales to users who are already visiting their banking site. Your job is to analyze website visitor and account data and show how to identify likely buyers. We ran an ideation challenge before and received some interesting ideas. We would now like to build on the previous winning solutions and make them better organized.
The data consists of clickstream data from different types of visitors: some buy retirement programs on their first visit, while others click around and learn more first. Others have been customers for many years, and their funds on deposit, demographics, and browsing behavior now indicate that they might be interested in retirement accounts.

Background

Our customer, a global wealth management bank, is looking to analyze the actions its clients perform in its web application. The main goal is to predict whether a user is giving out signals that they are planning their retirement, i.e., opening a retirement account.
In this challenge you will get access to the previous submission, which includes data exploration and some analysis of how clickstream data can be used to qualify potential buyers of retirement accounts.

Task Detail 

Please first review the previous challenge for the data description and general information: https://www.topcoder.com/challenges/30076899. You should be able to download the previous challenge’s winning solutions after you register for this challenge.
In this challenge, we are particularly looking for the following:
  1. The previous winning solution partly uses UNIX commands. Please first convert the entire implementation to Python.
  2. We would like to see a more in-depth model built on more variables. The previous winning solution only looks at the frequency of the pages visited and does not leverage other variables (e.g., the age of the user or other types of accounts the client may have). Please split the data into train/dev/test sets, build your machine learning model on the train set, tune your parameters on the dev set, and finally evaluate the tuned model on the test set. Analysis such as feature importance scores is also welcome.
  3. Please make your final output easy to read. There must be no manual actions (e.g., copy-paste from Excel tables) required.
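
The first two items can be sketched in plain Python. This is only an illustration under stated assumptions, not the previous winner's code: it assumes the shell steps resemble a `cut | sort | uniq -c` page count, and that the clickstream is a CSV with the hypothetical columns `visitor_id` and `page`. The split hashes the visitor ID so that all clicks from one visitor land in the same set; the 70/15/15 ratio is also an assumption.

```python
import csv
import hashlib
import io
from collections import Counter

# Hypothetical sample; the real column names come from the challenge data.
SAMPLE_CSV = """visitor_id,page
u1,/home
u1,/retirement
u2,/home
u3,/loans
u3,/retirement
u3,/retirement
"""


def page_frequencies(csv_text):
    """Count visits per page, similar to `cut -d, -f2 | sort | uniq -c`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["page"] for row in reader)


def assign_split(visitor_id, train=0.7, dev=0.15):
    """Map a visitor to train/dev/test deterministically via a hash bucket."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # value in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + dev:
        return "dev"
    return "test"


counts = page_frequencies(SAMPLE_CSV)
print(counts.most_common())

splits = {visitor: assign_split(visitor) for visitor in ("u1", "u2", "u3")}
print(splits)
```

Hashing rather than random shuffling keeps the assignment reproducible across runs without storing a seed, which also helps with the requirement that no manual steps be needed.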

As a final delivery, you are required to submit both the report and the code.



Final Submission Guidelines

Contents

A document with details of the proposed algorithm and/or a proof-of-concept solution, pseudo-code, or any documentation or previous research papers that help illustrate the proposal.
The final submission should be a report, similar to a technical paper. It should include, but is not limited to, the following contents. The client will judge the feasibility and the quality of your proposed likelihood function.
  1. Title : Title of your idea

  2. Abstract / Description : High level overview / statement of your idea

    • Outline of your proposed approach
    • Outline of the approaches that you have considered and their pros and cons
    • Justify your final choice
  3. Details :

    • Detailed description. You must provide details of each step and details of how it should be implemented
      • Description of the entire mechanism
      • The advantage of your idea - why it could be better than others
      • If your idea includes some theory or known papers:
        • Reason why you chose it
        • Details on how it will be used
        • Reference to the papers of the theory
      • Reasoning behind the feasibility of your idea
  4. Appendix (optional) :

    • Bibliography, references to papers, etc.

Format

  • The document should be a minimum of 2 pages, in PDF or Word format, and describe your ideas.
  • It should be written in English.
  • Using charts, diagrams, and tables to explain your ideas comprehensively is encouraged.

Judging Criteria

You will be judged on the quality of your ideas, the quality of your description of the ideas, and how much benefit they can provide to the client. The winner will be chosen based on the most logical and convincing reasoning as to how and why the idea presented will meet the objective. Note that this contest will be judged subjectively by the client and Topcoder. However, the judging criteria below will largely be the basis for the judgement.

Accuracy (50%)

  • Please justify your final chosen model conceptually and discuss the pros and cons of all compared models.
  • Please establish evaluation metrics and benchmark the models that you have tried.
  • Please explore and explain the data characteristics and outline the main findings. Graphs and other visuals are highly encouraged.
  • Provide recommendations for any additional data sets that might be useful to increase the accuracy.
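
As one possible way to establish evaluation metrics and benchmark models, the sketch below computes precision, recall, and F1 for several sets of predictions against the same labels. The labels, the model names (`always_negative`, `model_a`, `model_b`), and their predictions are made up for illustration; which metrics suit this problem is left to the submitter.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 = buyer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions from three candidate models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "always_negative": [0, 0, 0, 0, 0, 0, 0, 0],  # trivial baseline
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

benchmark = {name: precision_recall_f1(y_true, y_pred)
             for name, y_pred in predictions.items()}

print(f"{'model':<16} {'prec':>6} {'recall':>6} {'f1':>6}")
for name, (p, r, f1) in benchmark.items():
    print(f"{name:<16} {p:>6.2f} {r:>6.2f} {f1:>6.2f}")
```

Including a trivial baseline in the table makes it easier to see whether a model adds value, since buyers are likely to be a minority class and plain accuracy can look high for a model that never predicts a purchase.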

Efficiency (30%)

  • Please discuss the effect of the training data size on accuracy.
  • Please discuss the data preparation pipeline. What can be precomputed? And what must be calculated in real-time?
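
The effect of training data size can be examined with a simple learning curve: train on growing subsets and score each model on the same held-out test set. The sketch below is illustrative only. It uses synthetic data, a single hypothetical feature (number of retirement-page visits), and a one-threshold classifier standing in for whatever model a submission actually uses.

```python
import random


def make_data(n, rng):
    """Synthetic (visits, label) pairs; buyers tend to have more visits."""
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        visits = rng.gauss(5.0 if label else 2.0, 1.5)
        data.append((visits, label))
    return data


def accuracy(data, threshold):
    """Share of rows where `visits >= threshold` matches the label."""
    correct = sum(1 for x, y in data if (x >= threshold) == (y == 1))
    return correct / len(data)


def fit_threshold(train):
    """Pick the threshold with the best training accuracy (a decision stump)."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({x for x, _ in train}):
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


rng = random.Random(0)          # fixed seed so the run is repeatable
test_set = make_data(500, rng)  # held out, never used for fitting
pool = make_data(2000, rng)

curve = []
for size in (20, 100, 500, 2000):
    threshold = fit_threshold(pool[:size])
    curve.append((size, accuracy(test_set, threshold)))

for size, acc in curve:
    print(f"{size:>5} training rows -> test accuracy {acc:.3f}")
```

If test accuracy stops improving as the subset grows, more data of the same kind is unlikely to help, and additional variables may matter more.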

Implementation (20%)

  • Data analysis scripts with environment setup and instructions on how to run the analysis.
  • Predictor training and testing scripts with deployment/verification instructions.
  • Should be implemented using Python only.

Submission Guideline

You can submit at most TWO solutions, but we encourage you to include your best solution and as much detail as possible in a single submission.

Supplementary materials

You will be able to download the supplementary materials from the Google Drive link posted in the forum.

ELIGIBLE EVENTS:

Topcoder Open 2019

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30079791