Challenge Overview
Topcoder has a relationship with a client that has several requests to digitize forms containing handwriting. We're somewhat reluctant, though, to invest in these projects without knowing what level of accuracy we can reasonably expect. The interpretation of handwritten digits seems to be a solvable problem; a trip to an automated teller machine that reads check amounts automatically confirms that. But what about the larger set of both cursive and printed handwritten English characters? In our case the handwriting can also be quite messy, such as scrawled text on comment cards. This is a far more difficult problem.
In this challenge we'd like you to provide a research document which discusses the following topics:
1. What level of accuracy can be expected in the interpretation of handwritten documents?
2. What open source or proprietary libraries are available to assist with this?
3. Can these libraries interpret both printed and cursive text? Please provide sample interpretations/examples if possible.
4. Are there publicly available data sets that can be used for training and testing?
5. What are the most promising techniques and technologies? Should we devise different processes for printed vs. cursive text? Are there suggested preprocessing steps?
The most impressive OCR handwriting example that I've witnessed personally is Ancestry.com's interpretation of old US Census records. They seem to be doing an admirable job of interpreting the fairly regular and consistent cursive text of the census takers who manually created records in the US in the early part of this century and the last one. It would be great if we could develop something similar for our clients, but we need some research to point us in the right direction.
Final Submission Guidelines
1. Please produce an MS Word or PDF document that covers the five topics discussed above.
2. You will be judged 75% on the depth of your analysis and 25% on the quality of your deliverables. We're not judging on your facility with English.
3. Graphical elements, tables, and process diagrams to illustrate your research are appreciated and encouraged.