Challenge Overview
Challenge Objectives
- Extend the provided CLI tool to create relationships from the extracted DB tables.
- Extend the provided CLI tool to extract data based on the above relationships.
Tech Stack
- Java 11
- Maven 3.x
Code Access
Repo: https://gitlab.com/tc-owa/xml-cli
Branch: develop
You will find a self-registration link on the challenge forum which you can use to access our private Gitlab group.
Detailed Requirements
Our client has provided us with a large dataset in XML format that contains data exported by different applications. We are building a CLI application that extracts the data from this dataset and exports it into organized CSV files based on the relationships between entities, so that we can later use the data in an upcoming marathon match.
You will find the dataset (an XML file) attached to the challenge forum.
In the previous challenge, we created a CLI that reads an XML file and extracts all tag names (which represent DB table names) and their properties into JSON files.
As part of this challenge, you need to extend the existing CLI based on the following requirements:
1. Extract the current functionality into its own method.
The current functionality, which extracts the metadata from the XML file, must be moved into its own method so that each operation the CLI supports can have its own command.
Add a new `--extractMeta` flag to the CLI. If that flag is passed when executing the CLI, it should extract the metadata (current functionality).
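As an illustration of how the operation flags could be mapped to separate operations, here is a minimal sketch that parses the raw `String[] args` by hand. The class `OperationSelector` and its enum are hypothetical names, and the existing code base may already use a CLI parsing library, in which case the same idea applies through that library instead.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: maps the CLI flags named in this spec to operations.
class OperationSelector {

    // Declared in the order in which the operations are chained.
    enum Operation { EXTRACT_META, CREATE_RELATIONSHIPS, EXTRACT_DATA }

    // Returns the requested operations; EnumSet iterates in declaration order,
    // so the result is already in execution order regardless of flag order.
    static Set<Operation> select(String[] args) {
        List<String> flags = Arrays.asList(args);
        Set<Operation> ops = EnumSet.noneOf(Operation.class);
        if (flags.contains("--extractMeta")) {
            ops.add(Operation.EXTRACT_META);
        }
        if (flags.contains("--createRelationships")) {
            ops.add(Operation.CREATE_RELATIONSHIPS);
        }
        if (flags.contains("--extractData")) {
            ops.add(Operation.EXTRACT_DATA);
        }
        return ops;
    }
}
```

An empty result is a natural place to print the usage instructions.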
2. Create relationships
This is a new operation that will be invoked if the `--createRelationships` flag is passed when running the CLI.
This operation will run after the “Extract Metadata” operation is done.
For this operation, you need to come up with an algorithm that generates possible relationships between different tables. For example, it may be based on properties that share a common naming pattern: if multiple tables have the property abc_id, this could mean that all of those tables are related to each other.
There are no strict requirements here, and we do not know the actual relationships that exist in the source database(s).
We rely on your expertise and creativity to design an algorithm that infers that information.
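One possible starting point, based only on the abc_id example above, is to group tables by shared properties whose names end in `_id`. The sketch below is one heuristic among many, not a required approach; the class `SharedKeyRelationships` and the `_id` suffix rule are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical heuristic: tables that share a property ending in "_id" are treated as related.
class SharedKeyRelationships {

    // tableProps: table name -> property names (for example, loaded from the metadata JSON files).
    // Returns: shared key property -> sorted set of tables containing it (two or more tables only).
    static Map<String, Set<String>> find(Map<String, Set<String>> tableProps) {
        Map<String, Set<String>> tablesByKey = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : tableProps.entrySet()) {
            for (String prop : entry.getValue()) {
                String normalized = prop.toLowerCase(Locale.ROOT);
                if (normalized.endsWith("_id")) {
                    tablesByKey.computeIfAbsent(normalized, k -> new TreeSet<>()).add(entry.getKey());
                }
            }
        }
        // A key that appears in a single table does not relate anything.
        tablesByKey.values().removeIf(tables -> tables.size() < 2);
        return tablesByKey;
    }
}
```

A suffix match alone will produce false positives (for example, a generic `id` style column reused everywhere), so a real submission would likely combine it with other signals such as value overlap between columns.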
You need to store this information in JSON format, similar to the JSON files already exported by the first operation.
These relationship JSON files must be saved in the output directory (from the CLI parameters), in a sub-folder named “relationships”. Each file must be named using the pattern tableName1_tableName2_…_tableNameN.json. For example, for the related tables abc and def, the file name will be abc_def.json.
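The naming rule can be sketched as follows. Sorting the table names is an assumption added here so that the same set of tables always maps to the same file; the spec itself does not state an order. `RelationshipFileNames` is a hypothetical helper name.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper that builds <outputDir>/relationships/<t1>_<t2>_..._<tN>.json
class RelationshipFileNames {

    static Path resolve(Path outputDir, Collection<String> tables) {
        // TreeSet removes duplicates and sorts names (an assumption, see lead-in).
        String fileName = String.join("_", new TreeSet<>(tables)) + ".json";
        return outputDir.resolve("relationships").resolve(fileName);
    }
}
```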
You need to add support for an additional flag `--includeOnlyCommonProps` that controls whether the final JSON includes only the properties common to the related tables or all properties from the tables in the relationship.
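In set terms, the flag switches between an intersection and a union of the property sets. A minimal sketch for two tables, using the hypothetical helper name `RelationshipProps`:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: picks the properties to write into a relationship JSON file.
class RelationshipProps {

    static Set<String> select(Set<String> left, Set<String> right, boolean includeOnlyCommonProps) {
        Set<String> result = new TreeSet<>(left);
        if (includeOnlyCommonProps) {
            result.retainAll(right); // intersection: properties present in both tables
        } else {
            result.addAll(right);    // union: every property from either table
        }
        return result;
    }
}
```

For relationships spanning more than two tables, the same operation can be applied repeatedly across all property sets.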
3. Extract Data
This is the final operation in the chain and will be invoked if the `--extractData` flag is passed when running the CLI. It reads the JSON files from the first operation as well as the relationship files created by the previous operation, and extracts the data from the input XML file into the corresponding CSV files.
Add support for a flag to indicate whether or not to include the CSV column headers.
Add support for flags to decide whether the CLI will export data using only the base JSON files, the relationship JSON files or both.
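The header option can be sketched as a boolean passed to the CSV writer. The example below is an assumption-laden illustration: `CsvRows` and the `includeHeaders` parameter are hypothetical names (the spec does not name the flag), and the quoting follows the common RFC 4180 convention rather than anything mandated here.

```java
import java.io.PrintWriter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; includeHeaders would be driven by whatever flag name is chosen.
class CsvRows {

    // Quotes a field (RFC 4180 style) when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    static String toLine(List<String> fields) {
        return fields.stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    // Writes rows one at a time, so the caller can supply them lazily while parsing the XML.
    // Note: PrintWriter does not throw on I/O errors; check out.checkError() if that matters.
    static void write(PrintWriter out, List<String> columns,
                      Iterable<List<String>> rows, boolean includeHeaders) {
        if (includeHeaders) {
            out.print(toLine(columns));
            out.print('\n');
        }
        for (List<String> row : rows) {
            out.print(toLine(row));
            out.print('\n');
        }
        out.flush();
    }
}
```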
Important Notes
- The CLI must be able to process large files without issues.
- You need to update the existing README.md with any additional instructions/information needed to use the CLI.
- Running the CLI without any input parameters or with invalid input parameters should print the correct usage instructions.
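For the large-file requirement, one common option in Java is a streaming (StAX) parser, which reads the XML event by event instead of loading the whole document into memory. The sketch below only counts tag names to show the reading loop; `StreamingTagCounter` is a hypothetical class, and whether the existing CLI already parses this way is not stated in this spec.

```java
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: counts elements per tag name without building a DOM tree.
class StreamingTagCounter {

    static Map<String, Integer> countTags(Reader xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // The dataset comes from outside, so DTDs and external entities are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            // Only the current event is held in memory at any time.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML input", e);
        }
        return counts;
    }
}
```

In the real CLI the `Reader` would typically wrap a buffered file stream for the input XML path.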
Should you have any doubts, feel free to ask on the challenge forum!