Challenge Overview
Prize Distribution
1st place - $5000
2nd place - $3000
3rd place - $2000
4th place - $1000
5th place - $500
Introduction
We are looking for a scalable solution for assembling mortgage pools, a complex non-linear combinatorial optimization problem that is a variant of the knapsack problem. The best solutions should offer configurable constraints, fast results, and deployability as a cloud microservice.
Background
A given bank might lend out a couple hundred mortgages to various home buyers. The bank typically wants to liquidate these loans by packaging and selling them to government entities such as Fannie Mae or Freddie Mac at a small markup. In this way, the bank not only makes money on fees and the markup, but also frees up these assets to provide more loans and repeat the cycle.
Fannie Mae and Freddie Mac have pre-defined pools for creating their mortgage-backed securities, each of which has a price the agency is willing to pay the bank for a loan placed in it. For example, FNMA 3.0 December shows a price of 101.38, while FHLCMC 3.0 December shows a price of 101.44, meaning that Fannie Mae’s markup for their December 3% coupon is 1.38% and Freddie Mac’s is 1.44%. It is in the bank’s best interest to sell each loan for the highest price possible (i.e. sell into the pool which offers the highest price). Therefore, among these two pools, the bank should allocate the loan to FHLCMC 3.0 December because 1.44% > 1.38%.
The price that a given loan can receive is further affected by the servicing rights owner. Additional restrictions and constraints exist in each pool and servicing combination, such that maximizing the total price obtained across all pool allocations is a complex optimization. The following sections will outline the optimization goal and constraints.
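The two-pool comparison in the example above can be sketched in a few lines of Python. This is only an illustration of the price comparison, ignoring servicing options and every constraint; the function name `best_pool` is made up for this sketch.

```python
# Prices quoted in the example above, keyed by pool name.
pool_prices = {
    "FNMA 3.0 December": 101.38,
    "FHLCMC 3.0 December": 101.44,
}


def best_pool(prices):
    """Return the pool name with the highest quoted price (hypothetical helper)."""
    return max(prices, key=prices.get)


best = best_pool(pool_prices)
# Markup over par, in percent, rounded to hide floating-point noise.
markup_pct = round(pool_prices[best] - 100, 2)
print(best, markup_pct)
```

Once the pool and servicing constraints are taken into account, picking the highest price loan by loan is no longer guaranteed to be feasible, which is what makes the full problem hard.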
Objective
The goal is, for each loan i, to choose a pool option j and servicer option k among i’s list of eligible options, such that the total economic value the bank can receive across all the loans in question is maximized.
Total Economic Value = \sum_{i} (P_{ijk} / 100) \cdot L_i
where P_{ijk} is the price of the pool+servicer combination j, k that mortgage i is assigned to, and L_i is the LoanAmount of mortgage i.
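As a minimal sketch of how the objective could be evaluated for a candidate assignment, the Python function below sums price / 100 * LoanAmount over all loans. The dictionary layout, the function name, and the toy pool and servicer labels are assumptions made for illustration; they are not part of the provided data or scorer.

```python
def total_economic_value(assignment, price, loan_amount):
    """Sum P_ijk / 100 * L_i over all loans.

    assignment:  {loan_id: (pool, servicer)}  chosen option per loan
    price:       {(loan_id, pool, servicer): P_ijk}
    loan_amount: {loan_id: L_i}
    """
    total = 0.0
    for loan_id, (pool, servicer) in assignment.items():
        total += price[(loan_id, pool, servicer)] / 100 * loan_amount[loan_id]
    return total


# Toy data, made up for illustration only.
loan_amount = {"A": 200_000, "B": 100_000}
price = {
    ("A", "pool1", "retained"): 101.5,
    ("A", "pool2", "released"): 102.0,
    ("B", "pool1", "retained"): 100.5,
}
assignment = {"A": ("pool2", "released"), "B": ("pool1", "retained")}

tev = total_economic_value(assignment, price, loan_amount)
print(round(tev, 2))
```

Here loan A contributes 102.0 / 100 * 200,000 and loan B contributes 100.5 / 100 * 100,000, so the total is about 304,500.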
Constraints
There are 18 constraints in total. Because they involve many formulas, they are listed in this PDF: https://www.dropbox.com/s/1dpxw5ixd2ekatk/Constraints.pdf?dl=0
Additional Requirements
- The constraint values c1 to c18 can be user-selected and passed in as inputs.
- Each of the constraints above must be configurable, i.e. it can be turned on or off for a given run.
- Every loan must be assigned to one of its eligible combinations of j and k.
- The solution must output results within 25 minutes.
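One possible way to represent the configurable constraint values and the "every loan must be assigned" rule is sketched below in Python. The class and function names are hypothetical, and the threshold values are placeholders; the actual meaning of c1 to c18 is defined only in the constraints PDF.

```python
from dataclasses import dataclass


@dataclass
class ConstraintSetting:
    name: str             # "c1" ... "c18"
    value: float          # user-selected threshold (placeholder values below)
    enabled: bool = True  # constraints can be switched off per run


def active_constraints(settings):
    """Return {name: value} for the constraints switched on for this run."""
    return {s.name: s.value for s in settings if s.enabled}


def all_loans_eligibly_assigned(assignment, eligible):
    """Check that every loan has a choice drawn from its own eligible set.

    assignment: {loan_id: (pool, servicer)}
    eligible:   {loan_id: set of (pool, servicer)}
    """
    return all(
        loan_id in assignment and assignment[loan_id] in options
        for loan_id, options in eligible.items()
    )


settings = [
    ConstraintSetting("c1", 0.25),
    ConstraintSetting("c2", 700, enabled=False),  # turned off for this run
]
print(active_constraints(settings))
```

Keeping the on/off flag next to the value means a solver only has to iterate over `active_constraints(...)` when building its model, rather than branching on each constraint separately.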
Data Description
There are three CSV files of data. Data can be found here: https://www.dropbox.com/sh/6pb7fnfil52mlbb/AADpT2TEbA23MDycfl1mGIMIa?dl=0
The Eligible Pricing Combinations tab shows the available pool options j and servicing options k for each loan i, and the corresponding price. Each row is a unique eligible pool & servicing option for a loan.
The Loan Data tab contains relevant info for each loan i, including:
- LoanID – unique identifier for loan i
- LoanAmount – value of the loan, Li
- FICO – credit score
- DTI – debt-to-income ratio
- HighBalFlag – identifies whether a loan is high balance or not
- PropOcc – Property Occupancy type indicator. Three types: Primary, Secondary, NOO
- PropState – Property State indicator. Assume all 50 states possible.
- PropType – Property Type indicator. Thirteen types: 2 Unit, 3 Unit, 4 Unit, Co-op, CondoHi, CondoLo, CondoMid, DetCondo, Manu, Modular, NonWarrantCondo, SFR_At, SFR_Dt
- Purpose – Loan purpose indicator. Three types: Purchase, Cashout, Rate/Term
The Pool Option Data tab contains type information for each pool j, including:
- Pool Type – indicates whether the pool is Single-Issuer or Multi-Issuer
- Pool Balance Type – indicates whether pool is Standard Balance or High Balance
- Agency – indicates whether pool is from Freddie Mac or Fannie Mae
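The sketch below shows one way the loan and pricing files might be read with Python's standard `csv` module. `LoanID` and `LoanAmount` are the column names given above; the `Pool`, `Servicer`, and `Price` column names are assumptions, since the exact headers of the Eligible Pricing Combinations file are not listed here. Inline strings stand in for the real files so the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# Inline stand-ins for LoanData.csv and EligiblePricingCombinations.csv.
# The Pool / Servicer / Price headers are assumed, not taken from the real data.
loan_csv = io.StringIO("LoanID,LoanAmount,FICO\n1,200000,720\n2,150000,680\n")
combo_csv = io.StringIO(
    "LoanID,Pool,Servicer,Price\n"
    "1,P1,S1,101.38\n"
    "1,P2,S1,101.44\n"
    "2,P1,S2,100.90\n"
)


def load_loans(handle):
    """Map LoanID -> LoanAmount."""
    return {row["LoanID"]: float(row["LoanAmount"]) for row in csv.DictReader(handle)}


def load_eligible(handle):
    """Map LoanID -> {(pool, servicer): price}, one entry per eligible row."""
    eligible = defaultdict(dict)
    for row in csv.DictReader(handle):
        eligible[row["LoanID"]][(row["Pool"], row["Servicer"])] = float(row["Price"])
    return dict(eligible)


loan_amount = load_loans(loan_csv)
eligible = load_eligible(combo_csv)
print(len(loan_amount), {k: len(v) for k, v in eligible.items()})
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.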
Implementation
You must provide an interface taking the following parameters:
- LoanData.csv
- PoolOptionData.csv
- EligiblePricingCombinations.csv
- Constraint.csv
- Loan
- Pool
- Servicer
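As a rough sketch, the result of a run could be written out as a CSV with one row per loan. This assumes that Loan, Pool, and Servicer in the list above are the columns of the output file, which is an interpretation rather than something stated explicitly; the function name is also hypothetical.

```python
import csv
import io


def write_assignment(assignment, handle):
    """Write one row per loan, assuming Loan, Pool, Servicer output columns."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Loan", "Pool", "Servicer"])
    for loan_id, (pool, servicer) in sorted(assignment.items()):
        writer.writerow([loan_id, pool, servicer])


# Toy assignment, made up for illustration only.
buffer = io.StringIO()
write_assignment({"2": ("P1", "S2"), "1": ("P2", "S1")}, buffer)
output_text = buffer.getvalue()
print(output_text)
```

Sorting by loan identifier keeps the output deterministic between runs, which makes it easier to compare two solutions.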
Scoring
Your score will be the Total Economic Value, defined above. However, any solution that violates the constraints gets a score of 0. We provide the scorer for you to evaluate your solution locally. It can be found here: https://www.dropbox.com/sh/3u8r7umysdpou73/AACkt13mu8QN2mEsSmBC9e7na?dl=0
Requirements to Win a Prize
In order to receive a prize, you must do the following:
- Achieve a score in the top 4, according to system test results calculated using the test dataset.
- Within 7 days from the announcement of the contest winners:
  - Submit a complete 2-page (minimum) report that: (1) outlines your final algorithm and (2) explains the logic behind and steps of your approach.
  - Properly annotate and comment your code.
Submission format
This match uses a combination of the "submit data" and "submit code" submission styles. Your submission must be a single ZIP file with the following content:
/solution
    A.csv
    B.csv
/code
    dockerfile
    <your code>
where
- /solution/A.csv and /solution/B.csv are the outputs your algorithm generates on the provisional test set. The format of these files is described above in the Output file section.
- /code contains a dockerized version of your system that will be used to reproduce your results in a well defined, standardized way. This folder must contain a dockerfile that will be used to build a docker container that will host your system during final testing. How you organize the rest of the contents of the /code folder is up to you, as long as it satisfies the requirements listed below in the Final testing section.
Please carefully document your solution, so we can easily repeat your solution on an AWS VM. Your docker image will be built and run on a Linux AWS instance, having this configuration:
- m4.2xlarge
General Notes
- This match is unrated.
- Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.
- Teaming is not allowed. You must develop your solution on your own. Any communication between members beyond what is allowed by the forum rules is strictly forbidden.
- In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM. Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
- You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client.
- If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission.
- External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:
  - The external data and pre-trained models are unencumbered with legal restrictions that conflict with their use in the competition.
  - The data source or data used to train the pre-trained models is defined in the submission description.
  - As with the software licenses, the data must be unrestricted for commercial use.