Challenge Overview
Project Overview
GE processes thousands of patent applications a year in the US alone. Its applications in other countries contain applicable information called "prior art." This prior art sometimes needs to be transposed from one application to another, a process that is done manually today and can take anywhere from five minutes to an hour. Repeated across thousands of applications, that manual effort is what this tool is meant to save.
We are running a series of POC challenges to build a tool that automates this process.
The final tool we want to build will have the following workflow:
- A user opens a web page that offers two functions:
  - Upload a patent application that contains the ‘prior art’ information.
  - Upload an optional file of key/value pairs representing additional information that replaces the default values used in building the final output.
- A PHP backend handles the uploads and delegates the logic to a Java command-line utility (packaged as a JAR file), passing the files to the utility.
- The utility parses the input arguments and determines the type of each of the two input files.
- For the patent application file, the utility extracts the ‘prior art’ information and constructs an XML document.
- For the optional file (let’s call it the ‘extra info’ file), the utility converts the file to XML.
- The utility uses a mapping file and the converted XML documents to create a PDF/XLS/CSV file, stores it locally, and returns the full path of the file to the PHP application.
- The PHP application reads the file and sends it back to the user.
Challenge Requirement
In this challenge you are building the Java command-line utility. Converting the input files to XML is out of scope for this challenge. The output of the utility will be a CSV file.
You are addressing the following in this challenge:
- Build a command-line Java utility.
- The utility accepts two XML input files as arguments:
  - The first XML file is the ‘prior art’ patent document. Its structure must match the provided XSD, and the utility must validate it against that XSD.
  - The second file can be structured like this:
    <items>
      <item name="key1" value="value1" />
      <item name="key2" value="value2" />
      <item name="key3" value="value3" />
    </items>
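The two input checks above can be sketched as follows. This is only a minimal illustration using the JDK's built-in XML APIs, not a required design: the class name `InputXmlSketch` and the method names `isValid` and `readItems` are hypothetical, and the real XSD is the one provided in the challenge forums.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper; class and method names are illustrative only. */
class InputXmlSketch {

    /** Returns true when the XML text validates against the XSD text. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Unusable schema, malformed XML, or a schema violation.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads <item name="..." value="..."/> elements into an ordered map. */
    static Map<String, String> readItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.put(item.getAttribute("name"), item.getAttribute("value"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read the key/value XML", e);
        }
    }
}
```

A real implementation would probably report the validation error message rather than a plain boolean, so the PHP caller can show the user what is wrong with the uploaded document.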
- The utility should use a configurable mapping file that maps the ‘prior art’ fields and the optional fields to the output file. Here are some notes to consider when designing the mapping file structure:
  - A field should have a flag attribute to indicate whether it is required or optional.
    - If a field is optional, a default value should be present.
  - A field should have a file attribute to indicate which XML file to read the field from.
  - A field should have an XPath expression used to read its value from the corresponding XML file. For example, if the field’s value should be taken from the ‘prior art’ XML, the XPath expression is evaluated against that XML to get the value. You need to provide a solution for fields that expect more than one value.
  - Each field/item in the mapping XML represents a CSV column.
  - The format of the file would look like this:
    <mapping>
      <item name="key1" required="true" file="prior-art" column-name="key-column-1" multiple="false"/>
      <item name="key2" required="false" file="extra-info" value="value2" column-name="key-column-2" multiple="true"/>
    </mapping>
    - ‘name’ represents the field name to read from the prior-art or extra-info XML.
    - ‘column-name’ represents the column name in the output CSV file.
    - ‘multiple’ is a flag that indicates whether there are multiple values to read from the XML.
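One possible way to apply such a mapping file is sketched below. It rests on several assumptions that are not part of the sample above: each mapping item carries a hypothetical `xpath` attribute holding the expression, the `value` attribute is treated as the default for optional fields, and the `/patent/citation` path in the usage note is invented, since the real paths depend on the provided XSD. The class name `MappingSketch` is also illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch; the 'xpath' attribute is an assumed extension of the sample. */
class MappingSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Resolves every mapping item to its column name and list of values. */
    static Map<String, List<String>> resolve(String mappingXml, Map<String, String> xmlByFile) {
        try {
            // Parse each source document once, keyed by the 'file' attribute value.
            Map<String, Document> sources = new HashMap<>();
            for (Map.Entry<String, String> e : xmlByFile.entrySet()) {
                sources.put(e.getKey(), parse(e.getValue()));
            }
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = parse(mappingXml).getElementsByTagName("item");
            Map<String, List<String>> columns = new LinkedHashMap<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                Document source = sources.get(item.getAttribute("file"));
                if (source == null) {
                    throw new IllegalArgumentException(
                            "Unknown file: " + item.getAttribute("file"));
                }
                NodeList hits = (NodeList) xpath.evaluate(
                        item.getAttribute("xpath"), source, XPathConstants.NODESET);
                boolean multiple = Boolean.parseBoolean(item.getAttribute("multiple"));
                List<String> values = new ArrayList<>();
                // Keep every match for 'multiple' fields, otherwise only the first.
                for (int j = 0; j < hits.getLength() && (multiple || j == 0); j++) {
                    values.add(hits.item(j).getTextContent());
                }
                if (values.isEmpty()) {
                    if (Boolean.parseBoolean(item.getAttribute("required"))) {
                        throw new IllegalArgumentException(
                                "Missing required field: " + item.getAttribute("name"));
                    }
                    values.add(item.getAttribute("value")); // assumed default value
                }
                columns.put(item.getAttribute("column-name"), values);
            }
            return columns;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot apply the mapping file", e);
        }
    }
}
```

For example, an item with `xpath="/patent/citation"` and `multiple="true"` would collect the text of every matching element into one column, while an optional extra-info item whose expression matches nothing falls back to its `value` attribute. Keeping the expressions in the mapping file means a change to the XSD needs only a configuration change.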
- We want to support two CSV output formats. A sample of each output format is provided in the challenge forums.
  - The output format should be passed to the utility as an input argument, either 'standard' or 'multiple', and the corresponding CSV formatter should be used based on the passed type.
  - Design the CSV formatter as a pluggable interface with two concrete implementations, each supporting one CSV format.
  - The output file will be parsed by other applications, so make sure the format is machine-readable and easy to parse.
  - For the "Multiple" sample file, make sure you add a delimiter between sections so they can be parsed easily.
  - We only provided an output sample for Example 3.
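The pluggable formatter could be shaped roughly as below. The interface and class names are hypothetical, and the two layouts are placeholders invented for illustration (one header row plus one data row for 'standard', and one section per column separated by a `#----` line for 'multiple'). The actual layouts must follow the samples in the challenge forums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical formatter contract; both layouts below are illustrative guesses. */
interface CsvFormatter {
    String format(Map<String, List<String>> columns);

    /** Picks the implementation for the 'standard' or 'multiple' argument. */
    static CsvFormatter forType(String type) {
        switch (type) {
            case "standard": return new StandardCsvFormatter();
            case "multiple": return new MultipleCsvFormatter();
            default: throw new IllegalArgumentException("Unknown output format: " + type);
        }
    }

    /** Quotes a field when it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}

/** Assumed layout: a header row, then one row with multi-values joined by ';'. */
class StandardCsvFormatter implements CsvFormatter {
    public String format(Map<String, List<String>> columns) {
        List<String> header = new ArrayList<>();
        List<String> row = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : columns.entrySet()) {
            header.add(CsvFormatter.escape(e.getKey()));
            row.add(CsvFormatter.escape(String.join(";", e.getValue())));
        }
        return String.join(",", header) + "\n" + String.join(",", row) + "\n";
    }
}

/** Assumed layout: one section per column, separated by a delimiter line. */
class MultipleCsvFormatter implements CsvFormatter {
    static final String SECTION_DELIMITER = "#----";

    public String format(Map<String, List<String>> columns) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : columns.entrySet()) {
            if (out.length() > 0) {
                out.append(SECTION_DELIMITER).append("\n");
            }
            out.append(CsvFormatter.escape(e.getKey())).append("\n");
            for (String value : e.getValue()) {
                out.append(CsvFormatter.escape(value)).append("\n");
            }
        }
        return out.toString();
    }
}
```

With this shape, supporting a third format later means adding one class and one `case`, without touching the code that reads the XML.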
- The tool should store the output file locally or in a predefined directory (which can be passed as an argument when calling the utility).
- The utility should return the full path to the output file.
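A minimal sketch of the last two points follows, assuming that "return" means printing the absolute path on standard output so the calling PHP code can capture it. The class name `PriorArtCliSketch`, the argument order, the `prior-art-` file name prefix, and the placeholder CSV text are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical entry point; argument order and file naming are assumptions. */
class PriorArtCliSketch {

    /** Writes the CSV text into outputDir and returns the file's absolute path. */
    static Path writeOutput(String csv, String outputDir) {
        try {
            Path dir = Files.createDirectories(Paths.get(outputDir));
            // A unique name avoids clashes when several requests run at once.
            Path file = Files.createTempFile(dir, "prior-art-", ".csv");
            Files.write(file, csv.getBytes(StandardCharsets.UTF_8));
            return file.toAbsolutePath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println(
                    "Usage: <prior-art.xml> <extra-info.xml> <standard|multiple> [output-dir]");
            System.exit(1);
        }
        // Fall back to the current directory when no output directory is given.
        String outputDir = args.length > 3 ? args[3] : ".";
        // Placeholder: a real run would validate, map, and format the inputs here.
        String csv = "key-column-1,key-column-2\n";
        System.out.println(writeOutput(csv, outputDir));
    }
}
```

Printing only the path on standard output, and sending any diagnostics to standard error, keeps the result easy for the PHP side to read.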
Take the following notes into account when building the tool:
- It is a Java command-line tool that will be invoked and executed by PHP code.
- The implementation should use the facade (interfaces) and adapter design patterns (and any other appropriate design patterns) so that the functionality can be extended in the future.
- You should use XPath to manipulate the XML.
- The XPath expressions used to manipulate XML content should not be hardcoded. The XSD might change in the future, so the expressions must be configured only in the mapping file.
- You can use an open source library to read the XML and mapping files and to build the output file.
Documents
The XSD file and three sample input files with their outputs are provided in the challenge forums.
Test
We will use the samples to test your solution, and the output is expected to match the provided output samples. Make sure to test your solution against them thoroughly.
Submission Deliverables
Below is an overview of the deliverables:
- A fully implemented tool with all the functionality defined by the requirements above.
- A complete and detailed README document explaining how to deploy the application, including configuration information.
Final Submission Guidelines