Challenge Overview
Challenge Objectives
-
Implement an algorithm (model, training, testing) that predicts the likely buyer of a company on the market from a list of first round buyers
-
Ideation challenge submissions are available as a starting point
Project Background
Our client, a global investment bank, is looking to build a predictive analytics algorithm
that draws insight from historic deal data taken from their CRM,
matching potential closing bidders to assets based on their behavior in previous biddings.
-
The prediction algorithm will be used in the client environment to predict the company most likely to close the deal
Technology Stack
-
The algorithm should be implemented using the Python, R, or .NET stack. If you want to use a different technology, ask for confirmation in the forums.
Data description
Historic data is provided in two Excel sheets:
-
Bidder profiles - contains data about actual bids: buyers' lists from previous transactions, including details such as firm size, revenue, industry, etc. for each bidding firm and each firm sold. The data also includes each bidding firm's progress through each round until the final closing bid. This data is considered the ground truth. There are 42 variables (columns) in total.
-
Seller profiles - contains data about firms sold in the bidding process - includes firm size, revenue, industry, etc. There is a total of 37 columns.
Explanations for all columns are provided in the forums.
Prediction requirements
The main goal is to predict the most likely buyer, given the first round bids. The bidding process for one company is identified by the "ENGAGEMENT_NUMBER__C" column in the data set. First round bids are the ones with ROUND__C="First". Closed deals are marked with ROUND__C="Closing".
For example, a sale of company X (engagement number 80314) had a total of 9 bids: 5 first round, 3 final round, and 1 closing bid. The input for the prediction would be the 5 first round bids, and the expected output is the likelihood of each of the 5 companies closing the deal.
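As a rough illustration of how the rows of one engagement map to model input and ground truth, the sketch below uses plain Python dictionaries. ENGAGEMENT_NUMBER__C and ROUND__C come from the data set. The BIDDER column name, the bidder names, and the "Final" round value are hypothetical placeholders; check the column explanations in the forums for the real identifiers.

```python
from collections import defaultdict

# Toy rows mirroring the example: 5 first round, 3 final round, 1 closing bid.
# BIDDER and the "Final" round label are assumptions made for illustration.
rows = (
    [{"ENGAGEMENT_NUMBER__C": 80314, "ROUND__C": "First", "BIDDER": b}
     for b in ["A", "B", "C", "D", "E"]]
    + [{"ENGAGEMENT_NUMBER__C": 80314, "ROUND__C": "Final", "BIDDER": b}
       for b in ["A", "B", "C"]]
    + [{"ENGAGEMENT_NUMBER__C": 80314, "ROUND__C": "Closing", "BIDDER": "B"}]
)


def split_engagements(rows):
    """Map each engagement to (first round bidders, closing bidder or None)."""
    first_round = defaultdict(list)
    closer = {}
    for row in rows:
        engagement = row["ENGAGEMENT_NUMBER__C"]
        if row["ROUND__C"] == "First":
            first_round[engagement].append(row["BIDDER"])
        elif row["ROUND__C"] == "Closing":
            closer[engagement] = row["BIDDER"]
    return {
        engagement: (bidders, closer.get(engagement))
        for engagement, bidders in first_round.items()
    }


engagements = split_engagements(rows)
```

Here the first element of each pair is the model input (the first round bidders) and the second is the label to predict. Engagements without a closing row get None as the label.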
Your prediction model will use historic data to predict the most likely buyers for a new company to be sold and should have the following properties:
-
Handles null values gracefully - there are a lot of null values in some of the columns of ground truth data
-
Ranks the buyers on the list - predict which buyer is most likely to close the deal
-
Algorithm output should show why the predicted likelihood for a buyer is high/low (in terms of the different factors considered)
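One possible shape for such output is sketched below. It is only an illustration: the weighted sum is a stand-in for whatever model you actually train, the factor names and weights are hypothetical, and treating a null factor as contributing nothing is just one of several reasonable choices.

```python
# Hypothetical factor weights; a real model would learn these from the data.
weights = {"past_close_rate": 2.0, "same_industry": 1.0}

# Hypothetical first round bidders; None marks a null value in the source data.
bidders = {
    "A": {"past_close_rate": 0.5, "same_industry": 1},
    "B": {"past_close_rate": None, "same_industry": 1},
    "C": {"past_close_rate": 0.25, "same_industry": 0},
}


def rank_buyers(bidders, weights):
    """Rank bidders by weighted factor sum, keeping per-factor contributions."""
    ranked = []
    for name, factors in bidders.items():
        # Null factors are skipped, so they add nothing to the score.
        contributions = {
            factor: weights[factor] * value
            for factor, value in factors.items()
            if value is not None and factor in weights
        }
        ranked.append((name, sum(contributions.values()), contributions))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


ranking = rank_buyers(bidders, weights)
```

Each entry in the ranking holds the buyer, the total score, and the contribution of every factor, which is what a reviewer would read to see why a buyer ranks high or low.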
Scoring
Predicting the closing bidder correctly is very important, but an algorithm that predicts the actual closer as the second most likely closer is better than one that predicts the same closer as fifth, so the scoring function takes that into account:
Score = w * (1 - (predicted rank of actual closer - 1) / (number of first round bids - 1))
where w = 1 + (number of first round bids) / n, with n = 10. The maximum score (correct prediction, 10 first round bids) is 2; the minimum score is 0 (the actual closing bidder is predicted to be last).
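The scoring formula can be written as a small Python function, shown here as a sketch rather than the official scorer. The function and parameter names are made up for illustration, ranks are assumed to be 1-based, and rejecting engagements with fewer than two first round bids is an assumption, since the formula would otherwise divide by zero.

```python
def score(predicted_rank, num_first_round_bids, n=10):
    """Score one engagement from the 1-based predicted rank of the actual closer."""
    if num_first_round_bids < 2:
        # Assumption: the formula is undefined here (division by zero).
        raise ValueError("need at least two first round bids")
    w = 1 + num_first_round_bids / n
    return w * (1 - (predicted_rank - 1) / (num_first_round_bids - 1))


def average_score(results):
    """Average the score over (predicted_rank, num_first_round_bids) pairs."""
    return sum(score(rank, bids) for rank, bids in results) / len(results)
```

For the engagement with 5 first round bids, ranking the actual closer second gives w = 1.5 and a score of 1.5 * (1 - 1/4) = 1.125, while ranking it fifth gives 0.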
We have provided ideation challenge submissions in the challenge forums. You should use them as a starting point for algorithm implementation. You are free to combine ideas from the submissions or add your own modifications. Here are a few points that will probably have a big effect on accuracy:
-
Financial and strategic bidders carry different information, especially in the financial data fields. When treating NULL values, simply dropping fields that are less than 50% filled would lose all financial data fields, because each field is typically filled ONLY for strategic bidders OR ONLY for financial bidders. These fields are important when filled, so they cannot just be dropped.
-
Creating a separate model for each industry or industry group (as suggested by one of the ideation challenge submissions)
-
The features introduced to track a buyer's bid history should be very useful in prediction.
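The point about NULL values can be checked by measuring fill rates per bidder type instead of over the whole table. The sketch below assumes a hypothetical BIDDER_TYPE column and hypothetical REVENUE and FUND_SIZE fields; the real column names are in the forum explanations.

```python
from collections import defaultdict

# Toy rows with hypothetical columns; None stands for a NULL value.
rows = [
    {"BIDDER_TYPE": "Strategic", "REVENUE": 120.0, "FUND_SIZE": None},
    {"BIDDER_TYPE": "Strategic", "REVENUE": 75.0, "FUND_SIZE": None},
    {"BIDDER_TYPE": "Strategic", "REVENUE": 310.0, "FUND_SIZE": None},
    {"BIDDER_TYPE": "Financial", "REVENUE": None, "FUND_SIZE": 900.0},
    {"BIDDER_TYPE": "Financial", "REVENUE": None, "FUND_SIZE": 450.0},
]


def fill_rates_by_group(rows, group_key, fields):
    """Return {group: {field: fraction of rows with a non-null value}}."""
    counts = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[group_key]
        counts[group] += 1
        for field in fields:
            if row.get(field) is not None:
                filled[group][field] += 1
    return {
        group: {field: filled[group][field] / counts[group] for field in fields}
        for group in counts
    }


def fields_to_keep(rates, threshold=0.5):
    """Keep a field if at least one group fills it at or above the threshold."""
    return sorted(
        {field
         for group_rates in rates.values()
         for field, rate in group_rates.items()
         if rate >= threshold}
    )


rates = fill_rates_by_group(rows, "BIDDER_TYPE", ["REVENUE", "FUND_SIZE"])
kept = fields_to_keep(rates)
```

In this toy data FUND_SIZE is only 40% filled overall and would be dropped by a global 50% rule, yet it is fully filled for financial bidders, so the per-group check keeps it.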
Submissions will be evaluated based on the following criteria:
-
Accuracy (average score) - 60%
-
Code review (quality, documentation, correct usage of libraries, etc) - 20%
-
Documentation - 20%
Your submission should contain:
-
Code
-
Algorithm description document - overview of used features from the ideation challenge or your additions to the final model and discussion section explaining achieved accuracy and possible improvements.
-
README file with details on how to deploy and test the submission with the provided data set