Challenge Overview
Our client came up with the following problem to solve. As part of an expense reimbursement process, employees submit soft copies of their expense receipts to an internal system (i.e. scanned versions of paper receipts, in jpeg / png / pdf / doc or some other file format, typically produced by regular scanners; an employee may submit multiple images to support a claim). Along with the receipts, employees manually enter basic information about the reimbursement claim: expense category and type, bill date, location(s), distance (for transport-related expenses), expense amount, etc. (see the screenshot of the submission screen below).
All these data are stored in a database of pending claims and await approval by the finance-processing manager. To approve an expense reimbursement, the manager must ensure that the submitted claim is not a duplicate of any past expense claim (from either the same or a different employee). Our client wants to develop an API system that will help the finance-processing manager with this verification. The system will take as input the employee's ID, the details of the expense claim, and the receipt images; it will try to match each image against the other receipt images in the database (from any time and any employee; a match may be only a partial duplicate); and it will respond with the result. If it does not detect any duplicates, the claim will be approved; otherwise the API will return the details of the potential duplicates, so that the matching past claims can be shown to the finance-processing manager side by side with the new claim for a manual cross-check. Note that we can probably use the metadata entered by the employee as part of the reimbursement claim to limit or optimize the duplicate search.
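As an illustration of the image-matching step, here is a minimal sketch of one possible technique, an "average hash" fingerprint compared by Hamming distance. It is not a method prescribed by the client, and on its own it is unlikely to catch partial duplicates (e.g. a cropped receipt). The function names and the `max_distance` threshold are hypothetical. The sketch assumes each receipt image has already been decoded, converted to grayscale, and downscaled to a small fixed grid (for example 8x8) by an image library that is not shown here.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Fingerprint a small grayscale grid: one bit per pixel,
    set when the pixel is brighter than the grid's mean brightness."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 5) -> bool:
    """Hypothetical decision rule: fingerprints that differ in at most
    max_distance bits are flagged for manual review."""
    return hamming_distance(a, b) <= max_distance
```

Because each bit only records whether a pixel is above the mean, a uniform brightness change between two scans of the same receipt leaves the fingerprint unchanged, while unrelated images tend to differ in many bits. The threshold would need tuning on real receipts.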
Technical Aspects
We want you to develop a proof-of-concept app solving this problem. The client wants to use our solution inside a .NET-based system; however, it can be built as a stand-alone service exposing the necessary functionality via a REST API, so you are free to use any appropriate tech stack, probably C++ or Python. It should take as input the claim data (employee ID, expense details, and the receipt images) and return these data if no duplicates are found, or otherwise return all data about this claim and all detected duplicate claims. It should also add the new claim's details to the database of past claims, and do any relevant processing to ensure that the API performs efficiently as the database of claims grows.
Of course, your code should follow standard coding best practices (it should be clear, clean, and well commented, so that it is easy to build upon your PoC solution).
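To make the expected request and response flow more concrete, here is a rough sketch of the submit-and-check logic that could sit behind such a REST endpoint. Everything in it is an assumption for illustration: the `Claim` fields, the `ClaimStore` class, the in-memory storage, the response keys, and the idea of bucketing past claims by category and bill month to keep lookups cheap as the database grows. It also assumes every receipt image has already been reduced to an integer fingerprint (for example a perceptual hash) by a separate step.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Claim:
    claim_id: str
    employee_id: str
    category: str
    bill_date: date
    amount: float
    fingerprints: List[int]  # one precomputed fingerprint per receipt image


class ClaimStore:
    """Hypothetical in-memory stand-in for the database of past claims."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance
        # Past claims grouped by (category, year, month) so that a lookup
        # scans only one bucket instead of the whole history.
        self._buckets: Dict[Tuple[str, int, int], List[Claim]] = defaultdict(list)

    @staticmethod
    def _bucket_key(claim: Claim) -> Tuple[str, int, int]:
        return (claim.category, claim.bill_date.year, claim.bill_date.month)

    def find_duplicates(self, claim: Claim) -> List[Claim]:
        """Return past claims in the same bucket with a close fingerprint."""
        matches = []
        for past in self._buckets.get(self._bucket_key(claim), []):
            if any(
                bin(new ^ old).count("1") <= self.max_distance
                for new in claim.fingerprints
                for old in past.fingerprints
            ):
                matches.append(past)
        return matches

    def submit(self, claim: Claim) -> dict:
        """Check the claim, store it, and build the response payload."""
        duplicates = self.find_duplicates(claim)
        self._buckets[self._bucket_key(claim)].append(claim)
        return {
            "claim": asdict(claim),
            "duplicates": [asdict(d) for d in duplicates],
            "approved": not duplicates,
        }
```

Bucketing by employee-entered metadata trades recall for speed: a duplicate submitted with a different category or bill month would be missed, so a real solution would likely widen the search or fall back to a fingerprint-only index. The returned dictionary still contains `date` objects and would need converting before being serialized as JSON.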
In case of any doubts or questions, don’t hesitate to ask in the challenge forum (or via the Contact Manager link in Online Review, in case of sensitive questions).