Challenge Overview
INTRODUCTION
Welcome to the sparck Role Assignment challenge. As part of this challenge you will assemble a machine learning classifier and REST service that can identify names within a document and assign them to different roles.
REQUIREMENTS
For this challenge you will have two main deliverables: the classifier and the REST service. Both should be written in Java and be deployable to the AWS cloud. Spring Boot is the preferred way to run the solution.
Classifier
You will be creating a classifier capable of accepting training data consisting of document contents and expected results. From this data your classifier should be able to assign roles to names with at least 60% accuracy.
Person names are associated with one of two roles: "Insured Name" or "Agency Company Name". Each JSON input document will contain an ENTITIES key, which will in turn contain PERSON_NAME and/or COMPANY_NAME arrays. The classifier should take the entries in these arrays and attempt to associate them with a role based on the data in the document.
PERSON_NAME Identification Rules
Use these rules to associate names contained within the PERSON_NAME array.
INSURED_NAME
-- Insured Name is a Person or Company Name
-- Insured Name is identified as a string of alphabetic characters in reasonable proximity to { "name" } + { "insured", "borrower", "policy issued to", "applicant" }
-- If the Insured Name contains the word "and" which acts as a conjunction to join person names, or displays the names vertically (one above the other), the datapoint must be captured in its entirety and include all person names
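The "captured in its entirety" rule can be approximated by merging adjacent PERSON_NAME entries before assigning a role. The following is a minimal sketch, not part of the provided source. It assumes the names arrive in document order, that the raw document text is available as a single string, and that only the word "and" or a line break separates names that belong together. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helper (hypothetical name): merges person names joined by "and" or stacked vertically. */
class InsuredNameJoiner {

    // Accepted separators between two names: the word "and", or a line break (vertical layout).
    private static final Pattern JOINER =
            Pattern.compile("\\s+and\\s+|[ \\t]*\\R\\s*", Pattern.CASE_INSENSITIVE);

    /** Assumes names are listed in the order in which they occur in text. */
    static List<String> joinAdjacent(String text, List<String> names) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < names.size()) {
            String current = names.get(i);
            int start = text.indexOf(current);
            int end = start < 0 ? -1 : start + current.length();
            int j = i + 1;
            // Extend the span while the next name follows directly after an accepted separator.
            while (start >= 0 && j < names.size()) {
                int next = text.indexOf(names.get(j), end);
                if (next < 0 || !JOINER.matcher(text.substring(end, next)).matches()) {
                    break;
                }
                end = next + names.get(j).length();
                j++;
            }
            // Names that cannot be located in the text are passed through unchanged.
            out.add(start < 0 ? current : text.substring(start, end));
            i = j;
        }
        return out;
    }
}
```

The merged value is taken verbatim from the document, so a vertically stacked pair keeps its line break. Whether to normalize that whitespace depends on how the expected results in the training data are written.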
AGENCY_COMPANY_NAME
-- Agency Company Name is either a Person Name or Company Name
-- Agency Company Name is a string of alphabetic characters in reasonable proximity to { "agent", "agency", "producer", "broker" }
COMPANY_NAME Identification Rules
Use these rules to associate names contained within the COMPANY_NAME array.
AGENCY_COMPANY_NAME
-- Agency Company Name is either a Person Name or Company Name
-- Agency Company Name is a string of alphabetic characters in reasonable proximity to { "agent", "agency", "producer", "broker" }
CARRIER_NAME
-- Carrier Name is a string of alphabetic characters in reasonable proximity to { "carrier", "insurance company", "insurer company", "insuring company", "insurance group", "insurer", "company", "underwritten", "undersigned", "issued" }
-- Carrier Name is typically represented by a logo on the document
-- If Carrier Name cannot be found in the document, Carrier Name = "Company Not Identified"
-- If two company names are found in reasonable proximity to { "carrier", "insurance company", "insurer company", "insuring company", "insurance group", "insurer", "company", "underwritten", "undersigned" }, Carrier Name is typically the company name represented in a smaller font size
LOSS_PAYEE_NAME
-- Loss Payee Name is a string of alphabetic characters in reasonable proximity to { "name" } AND { "loss payee" }
MORTGAGEE_NAME
-- Mortgagee Name is a string of alphabetic characters in reasonable proximity to { "name" } AND { "mortgagee" }
-- If the document contains a positive selection indicator (e.g. checked box) in reasonable proximity to { "mortgagee" }, Mortgagee Name is a string of alphabetic characters in reasonable proximity to the positive selection indicator
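The proximity rules above could serve as a simple baseline, or as features for a trained classifier. The sketch below is one possible reading, not the reference solution. It assumes "reasonable proximity" means a cue word ending within a fixed character window before the name, it ignores the additional { "name" } requirement, and it does not model logos, font sizes or checked boxes. The class name, method names and the WINDOW value are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative keyword-proximity baseline (hypothetical names throughout). */
class ProximityRoleScorer {

    // Cue words copied from the identification rules, keyed by role.
    static final Map<String, List<String>> CUES = new LinkedHashMap<>();
    static {
        CUES.put("INSURED_NAME", List.of("insured", "borrower", "policy issued to", "applicant"));
        CUES.put("AGENCY_COMPANY_NAME", List.of("agent", "agency", "producer", "broker"));
        CUES.put("CARRIER_NAME", List.of("carrier", "insurance company", "insurer company",
                "insuring company", "insurance group", "insurer", "company",
                "underwritten", "undersigned", "issued"));
        CUES.put("LOSS_PAYEE_NAME", List.of("loss payee"));
        CUES.put("MORTGAGEE_NAME", List.of("mortgagee"));
    }

    // Assumption: "reasonable proximity" is approximated as at most 60 characters.
    static final int WINDOW = 60;

    /** Smallest gap between a cue that ends at or before the name and the name itself, or -1 if none. */
    static int distanceToPrecedingCue(String text, String name, String cue) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = name.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return -1;
        }
        int best = -1;
        for (int n = haystack.indexOf(needle); n >= 0; n = haystack.indexOf(needle, n + 1)) {
            // Last occurrence of the cue that ends at or before the start of the name.
            int c = haystack.lastIndexOf(cue, n - cue.length());
            if (c < 0) {
                continue;
            }
            int gap = n - (c + cue.length());
            if (best < 0 || gap < best) {
                best = gap;
            }
        }
        return best;
    }

    /** Returns the candidate role whose cue sits closest before the name, or null if none is in range. */
    static String bestRole(String text, String name, List<String> candidateRoles) {
        String bestRole = null;
        int bestGap = Integer.MAX_VALUE;
        for (String role : candidateRoles) {
            for (String cue : CUES.getOrDefault(role, List.of())) {
                int gap = distanceToPrecedingCue(text, name, cue);
                if (gap >= 0 && gap <= WINDOW && gap < bestGap) {
                    bestGap = gap;
                    bestRole = role;
                }
            }
        }
        return bestRole;
    }
}
```

Entries from PERSON_NAME would be scored against INSURED_NAME and AGENCY_COMPANY_NAME only, and entries from COMPANY_NAME against the four company roles. Only cues that precede the name are considered, on the assumption that a form label usually comes before its value; broad cues such as "company" are likely to need tuning against the training data.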
REST Service
You will be creating a REST service capable of accepting a full document in JSON format. The service will then run the JSON data against the classifier, returning any name assignments that it discovers. Each document could have multiple assigned names.
1. The exposed endpoint should be /assignroles
2. The JSON document should be POSTed in the body to this endpoint
3. Return JSON in the response body
{
"INSURED_NAME": "PERSON NAME",
"AGENT_COMPANY_NAME": "AGENT NAME",
"CARRIER_NAME": "CARRIER NAME",
"LOSS_PAYEE_NAME": "PAYEE NAME",
"MORTGAGEE_NAME": "NAME"
}
If no name is assigned to a role, return NULL for that role.
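A response body of this shape could be assembled as in the sketch below, which uses only the JDK so it stays independent of any particular JSON library; in a Spring Boot service the framework would normally handle serialization. Several points are assumptions: NULL is interpreted as the JSON literal null, the key spelling follows the sample body (AGENT_COMPANY_NAME, although the rules use AGENCY_COMPANY_NAME), and the "Company Not Identified" fallback from the carrier rules is taken to override null for CARRIER_NAME. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative response builder (hypothetical names); not a Spring Boot controller. */
class RoleResponse {

    // Key order and spelling follow the sample response body.
    static final List<String> KEYS = List.of(
            "INSURED_NAME", "AGENT_COMPANY_NAME", "CARRIER_NAME", "LOSS_PAYEE_NAME", "MORTGAGEE_NAME");

    /** Ensures every role key is present; roles without an assigned name map to null. */
    static Map<String, String> complete(Map<String, String> assigned) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : KEYS) {
            out.put(key, assigned.get(key));
        }
        // Assumption: the carrier fallback rule takes precedence over returning null.
        if (out.get("CARRIER_NAME") == null) {
            out.put("CARRIER_NAME", "Company Not Identified");
        }
        return out;
    }

    /** Minimal JSON rendering; escapes only backslashes and double quotes in values. */
    static String toJson(Map<String, String> roles) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : roles.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            if (e.getValue() == null) {
                sb.append("null");
            } else {
                String escaped = e.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
                sb.append('"').append(escaped).append('"');
            }
        }
        return sb.append('}').toString();
    }
}
```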
RESOURCES
Please see the previous challenge in this series where we identified policy numbers:
https://www.topcoder.com/challenge-details/30057066/?type=develop&noncache=true
You will find the source for this challenge in the forums. Use this as the starting point for completing this challenge. You will also find in the forums:
-- Training data for your classifier
-- The NewInput folder contains the input documents
-- The NewOutput folder contains the corresponding role assignments
Final Submission Guidelines
-- Java source for your solution with well-commented blocks where appropriate
-- Postman collection for testing your endpoints
-- Any dependencies required to run your solution
-- You are free to use third-party libraries as long as their licenses allow it
-- Spring Boot is the preferred method of delivery
-- You must include both deliverables as outlined above
-- Provide instructions/details on how to test your accuracy
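One way to make the accuracy instructions concrete is a small evaluation routine that compares predicted assignments with the expected results from the NewOutput folder. The challenge does not define how accuracy is measured, so the sketch below assumes one reading: the fraction of expected role values that are matched exactly after trimming, collapsing whitespace and ignoring case. Loading the files is left out, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Illustrative accuracy metric (hypothetical names); one entry per document in each list. */
class AccuracyCheck {

    /** Fraction of expected role values that the prediction reproduces for the same document. */
    static double accuracy(List<Map<String, String>> expected, List<Map<String, String>> predicted) {
        if (expected.size() != predicted.size()) {
            throw new IllegalArgumentException("expected and predicted must cover the same documents");
        }
        int total = 0;
        int correct = 0;
        for (int i = 0; i < expected.size(); i++) {
            for (Map.Entry<String, String> e : expected.get(i).entrySet()) {
                total++;
                String got = predicted.get(i).get(e.getKey());
                if (Objects.equals(normalize(e.getValue()), normalize(got))) {
                    correct++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Assumption: case and whitespace differences should not count as errors.
    static String normalize(String s) {
        return s == null ? null : s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

Under this reading, a result of 0.60 or higher on held-out documents would correspond to the 60% target in the classifier requirements.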