Challenge Overview
Our client has extremely large Word documents and PDF files that need to be checked, using a variety of methods, for consistency across a potentially large number of variables.
There are constraints around obfuscating the data in the documents to be processed, and part of this project may be to suggest ways to sanitize that data before it is released to the wider community. This could involve removing designated sections of text (or replacing them with XXX, etc.) and making sure that data tables are present that can be checked to verify that the data was processed successfully.
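As a rough illustration of the "replace with XXX" idea, the sketch below masks designated text and then checks that nothing sensitive is left. It assumes the text has already been extracted from the Word or PDF file into a plain string. The patterns, the names `SENSITIVE_PATTERNS`, `redact`, and `is_sanitized`, and the sample sentence are all hypothetical; the real rules for what counts as sensitive would have to come from the client.

```python
import re

# Hypothetical examples of "designated" text; the client's actual rules are not known.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # ID-number-like strings
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),   # email addresses
]


def redact(text, patterns=SENSITIVE_PATTERNS, mask="XXX"):
    """Replace every match of every pattern with the mask.

    Returns the redacted text and the number of replacements made,
    so the count can be logged and checked afterwards.
    """
    total = 0
    for pattern in patterns:
        text, count = pattern.subn(mask, text)
        total += count
    return text, total


def is_sanitized(text, patterns=SENSITIVE_PATTERNS):
    """Return True if none of the patterns still matches the text."""
    return not any(pattern.search(text) for pattern in patterns)


if __name__ == "__main__":
    sample = "Contact jane@example.com, ID 123-45-6789."
    cleaned, replaced = redact(sample)
    print(cleaned, replaced, is_sanitized(cleaned))
```

Returning the replacement count alongside the text is one possible way to support the verification step: the count can be compared against an expected figure, and `is_sanitized` can be run again on the output before release.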
We want to create software that flags the areas of concern (discrepancies) within the Word and/or PDF documents.
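One possible shape for such a flagging check is sketched below. It is only an illustration under strong assumptions: the document text is already extracted into a list of paragraphs, variables appear as "name = value" or "name: value" with numeric values, and a discrepancy means the same name carrying different values in different places. The function name `find_discrepancies` and the sample data are hypothetical, and the client's actual consistency rules are likely to be richer.

```python
import re
from collections import defaultdict

# Assumed notation: an identifier, then "=" or ":", then a number.
PAIR = re.compile(r"\b(?P<name>[A-Za-z_]\w*)\s*[:=]\s*(?P<value>-?\d+(?:\.\d+)?)")


def find_discrepancies(paragraphs):
    """Flag variables that appear with more than one distinct value.

    Returns a dict mapping each inconsistent variable name to a list of
    (value, paragraph_index) pairs, ordered by paragraph index.
    Values are compared as strings, so "10" and "10.0" count as different.
    """
    seen = defaultdict(set)
    for index, paragraph in enumerate(paragraphs):
        for match in PAIR.finditer(paragraph):
            seen[match["name"].lower()].add((match["value"], index))

    flagged = {}
    for name, hits in seen.items():
        distinct_values = {value for value, _ in hits}
        if len(distinct_values) > 1:
            flagged[name] = sorted(hits, key=lambda hit: hit[1])
    return flagged


if __name__ == "__main__":
    paragraphs = [
        "max_dose = 50 mg per day",
        "Section 3 repeats max_dose = 55",
        "batch_size: 10",
        "batch_size: 10",
    ]
    for name, hits in find_discrepancies(paragraphs).items():
        print(name, hits)
```

Keeping the paragraph index with each value means a flagged variable can be traced back to the places in the document where the conflicting values occur.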