Challenge Overview
In the previous ideation challenge we developed a proof-of-concept solution based on Python, OpenCV, and PyTorch (provided in the challenge forum) for the automated recognition and machine reading of scanned passports and visas.
Our client praised the outcome of that challenge and decided to run this follow-up competition to further elaborate the idea and prototype app. The outlined focus points are:
- Improvements to data recognition and capture:
  - The current solution fails on some images, for example on tilted passport pages. It should be made more robust to image misalignment and poor image quality;
  - It reads data only from Machine-Readable Zones (MRZ). For example, the issue date is not encoded in the MRZ, although it is generally printed in all passports and visas. More generally, the client wants to be able to read all pages of a passport, even those that do not contain an MRZ;
  - Data capture should be zonal rather than template-based, i.e. we should avoid the need to manually provide a template for each possible kind of passport / visa page; instead we would like a solution that automatically detects and parses the different fields it finds on a page;
  - Some passports / visas may carry barcodes with encoded information (e.g. Brazilian passports); we should support reading them when available;
  - Make sure that the format of the extracted passport / visa data is uniform across all passports and visas, regardless of country.
- Technical improvements:
  - The current prototype app supports only JPEG images. The client wants to add support for the PNG and PDF formats;
  - Dates should be converted to the standard format (dd/mon/yyyy) across all passports and visas;
  - The solution should be deployable on Microsoft Azure.
- The client also wants to find a hardware solution / system that scans all pages of a passport without manual intervention. Keep in mind that we are interested in a specific, reasonably priced device (i.e. a big and expensive machine intended for the automated digitization of large library books most probably does not match the expectations).
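To illustrate the date requirement from the list above, here is a minimal sketch of normalizing recognized date strings to dd/mon/yyyy. It is not part of the existing prototype. It assumes that "mon" means the three-letter English month abbreviation (e.g. 07/Mar/2021) and that ambiguous numeric dates are day-first; the names `normalize_date`, `format_date`, and `INPUT_FORMATS` are hypothetical, and the list of input layouts would have to be extended for real documents.

```python
from datetime import date, datetime
from typing import Optional

# Assumption: "mon" in dd/mon/yyyy is the three-letter English month abbreviation.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical, non-exhaustive list of layouts an OCR step might return.
# Numeric dates are assumed to be day-first. "%b" / "%B" depend on the
# process locale, so month names are only parsed reliably in an English locale.
INPUT_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %b %Y", "%d %B %Y")


def format_date(d: date) -> str:
    """Render a date as dd/mon/yyyy without relying on the locale."""
    return f"{d.day:02d}/{MONTHS[d.month - 1]}/{d.year:04d}"


def normalize_date(text: str) -> Optional[str]:
    """Try each known layout in turn; return None if none of them matches."""
    cleaned = " ".join(text.split())  # collapse stray whitespace from OCR
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        return format_date(parsed.date())
    return None
```

Returning `None` instead of raising lets the caller flag a field as unreadable while still reporting the other fields of the page.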
Don’t hesitate to clarify any doubts via the challenge forum or, if you have sensitive questions that should not be shared with other competitors, via the Contact Managers option in the Online Review system.
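As an illustration of the input-format requirement (JPEG, PNG, and PDF), the sketch below shows one possible way to decide how to decode an uploaded file by inspecting its leading bytes rather than trusting the file extension. It is only a sketch under stated assumptions: the function name `detect_format` is hypothetical, and the actual decoding (for example rendering PDF pages to images) is left to whatever library the solution uses.

```python
from typing import Optional

# Well-known file signatures ("magic bytes") of the three required formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PDF_MAGIC = b"%PDF-"


def detect_format(data: bytes) -> Optional[str]:
    """Return 'jpeg', 'png' or 'pdf' based on the leading bytes, else None."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(PDF_MAGIC):
        return "pdf"
    return None  # unsupported or corrupted input
```

A caller could read the first few bytes of an upload, dispatch on the returned value, and reject files for which the function returns `None`.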