Challenge Overview
We have a script that compares the data stored in Elasticsearch with the data stored in the database. It currently compares only some of the models, and in this challenge we would like to add more models to the comparison.
Project Background
Topcoder Project Service is the main backend service of Topcoder Connect, the client-facing application of Topcoder. We keep all the data in two places: in the DB (database) and in ES (Elasticsearch index). Every time we change data in the DB, the change is also reflected in Elasticsearch. Under some circumstances, the data in ES and the DB can become inconsistent. In a previous challenge, we implemented a script that compares data for some models in the DB and ES and creates an HTML report. We recommend going through the requirements of the previous challenge to understand how the current script works. The script already has general functionality for generating the HTML report and helper methods for manipulating data, and it uses jsondiffpatch as the base utility for data comparison. In the current challenge, we would like to update it to compare more models in the DB and ES.
Technology Stack
- Node.js
- PostgreSQL
- Elasticsearch
Code access
The work for this challenge has to be done in one repository:
- Project Service repo: https://github.com/topcoder-platform/tc-project-service, branch `feature/es-db-compare`, commit `129a11f289d387bec067abb2347812e09eb833c2` or later.
- Config for local setup is provided on the forum.
Individual requirements
Timelines index
In the previous challenge, we implemented the comparison of project objects together with the related models that are stored inside the `projects` index, such as `invites`, `members`, `attachments`, `phases` and `products`. Now we should also compare the `timelines` and `milestones` that are associated with projects. Each project may have several `phases`, and each phase may have several `products`. Each `product` may have an associated `timeline`, which can be found using the following criteria: `timeline.reference = "product"` and `timeline.referenceId = {product.id}`.
Timelines and milestones are stored in the `timelines` ES index. Each document in this index keeps the data of one Timeline DB model record and also contains a nested list of its `milestones` (Milestone DB model).
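The lookup criteria above can be sketched roughly as follows. This is only an illustration on plain objects: the helper names `collectProductIds` and `findProjectTimelines` are hypothetical and not taken from the repository, and the sample data contains only the fields mentioned in this specification.

```javascript
// Hypothetical helper: gather the ids of all products across all phases of a project.
function collectProductIds(project) {
  const phases = project.phases || [];
  return phases.flatMap((phase) => (phase.products || []).map((product) => product.id));
}

// Hypothetical helper: pick the timelines that reference one of the project's products,
// i.e. timeline.reference === 'product' and timeline.referenceId === product.id.
function findProjectTimelines(project, timelines) {
  const productIds = new Set(collectProductIds(project));
  return timelines.filter(
    (timeline) => timeline.reference === 'product' && productIds.has(timeline.referenceId)
  );
}

// Illustrative data only; real documents carry many more fields.
const project = {
  id: 1,
  phases: [
    { id: 10, products: [{ id: 100 }, { id: 101 }] },
    { id: 11, products: [] },
  ],
};
const timelines = [
  { id: 1000, reference: 'product', referenceId: 100, milestones: [{ id: 1 }] },
  // References a product that does not belong to this project.
  { id: 1001, reference: 'product', referenceId: 999, milestones: [] },
  // Made-up reference value, used only to show that non-product timelines are skipped.
  { id: 1002, reference: 'not-a-product', referenceId: 101, milestones: [] },
];

const projectTimelines = findProjectTimelines(project, timelines);
console.log(projectTimelines.map((timeline) => timeline.id));
```

Running the same lookup against both the DB data and the ES data would give two timeline lists per project that can then be compared like any other related model.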
Note that when we generate the report, we show the models related to a project next to the project they belong to for easier navigation, see screenshot. Mismatches for timelines and milestones should be shown in the report in the same way, see screenshot.
Metadata index
The script should check all the data we have in the `metadata` index. It contains the following DB models: https://github.com/topcoder-platform/tc-project-service/blob/develop/src/utils/es.js#L15-L46. Note that the `metadata` ES index contains only one document, which keeps all DB models as nested objects.
The list of mismatches in metadata models could be added after the projects in the HTML report, similar to the screenshot.
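Because the `metadata` index holds a single document with every model nested inside it, the comparison can loop over model names instead of over documents. Below is a minimal sketch under several assumptions: all function names are hypothetical, the model names `projectTemplates` and `forms` are used purely for illustration, every record is assumed to have an `id` field, and `JSON.stringify` stands in for the real diff utility.

```javascript
// Index a list of records by id for quick lookup (assumes every record has an `id`).
function indexById(items) {
  return new Map((items || []).map((item) => [item.id, item]));
}

// Compare one model's records from DB and ES and classify the mismatches.
function compareModel(dbItems, esItems) {
  const dbById = indexById(dbItems);
  const esById = indexById(esItems);
  const result = { missingInEs: [], missingInDb: [], different: [] };

  dbById.forEach((dbItem, id) => {
    if (!esById.has(id)) {
      result.missingInEs.push(id);
    } else if (JSON.stringify(dbItem) !== JSON.stringify(esById.get(id))) {
      // Naive comparison that is sensitive to key order; a real diff utility would replace it.
      result.different.push(id);
    }
  });
  esById.forEach((esItem, id) => {
    if (!dbById.has(id)) result.missingInDb.push(id);
  });
  return result;
}

// Walk every model nested inside the single metadata document.
function compareMetadata(dbData, esDocument, modelNames) {
  const report = {};
  modelNames.forEach((name) => {
    report[name] = compareModel(dbData[name], esDocument[name]);
  });
  return report;
}

// Illustrative data only.
const dbData = {
  projectTemplates: [{ id: 1, name: 'A' }, { id: 2, name: 'B' }],
  forms: [{ id: 5, key: 'f' }],
};
const esDocument = {
  projectTemplates: [{ id: 1, name: 'A changed' }, { id: 3, name: 'C' }],
  forms: [{ id: 5, key: 'f' }],
};

const metadataReport = compareMetadata(dbData, esDocument, ['projectTemplates', 'forms']);
console.log(JSON.stringify(metadataReport, null, 2));
```

The three buckets (`missingInEs`, `missingInDb`, `different`) are one possible way to classify mismatches; the existing script may already group them differently.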
General requirements
- We would like the code to be easy to extend for more models. Please separate the code that “knows” about a particular index and model structure (like methods to get data from ES/DB and schemas of the data structure) from the code that could be used to compare any kind of data (like general methods to manipulate/compare data, methods to generate a report with arbitrary data and so on).
  In particular, instead of creating methods that know about the structure of indexes, define the index/data structure using configs and create methods that use such configs, so that we can potentially reuse them for other indexes. This is similar to what we already do for the project index structure, see code.
- When possible, create reusable methods so that the logic of the script is easier to read and understand.
- No unit tests are required for the script.
- Existing unit tests should pass.
- Lint should pass (don’t disable eslint rules).
- Git patch should apply without errors or warnings.
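One way to read the config-driven requirement is sketched below. It is not the config format used in the repository: the shape of `timelinesIndexConfig` and the function `flattenByConfig` are assumptions made for illustration. The point is that the walker only understands the config shape, so supporting another index would mean writing a new config rather than a new method.

```javascript
// Hypothetical config: names a model and the nested collections stored inside its documents.
const timelinesIndexConfig = {
  name: 'timeline',
  associations: [
    { name: 'milestone', property: 'milestones', associations: [] },
  ],
};

// Generic walker: knows nothing about particular models, only about the config shape.
// Returns a flat list of entries, each with a path that shows where the record is nested.
function flattenByConfig(record, config, parentPath = '') {
  const path = `${parentPath}${config.name}#${record.id}`;
  const entries = [{ path, model: config.name, id: record.id }];
  config.associations.forEach((association) => {
    (record[association.property] || []).forEach((child) => {
      entries.push(...flattenByConfig(child, association, `${path}/`));
    });
  });
  return entries;
}

// Illustrative document only.
const timelineDocument = { id: 7, milestones: [{ id: 1 }, { id: 2 }] };

const flatEntries = flattenByConfig(timelineDocument, timelinesIndexConfig);
console.log(flatEntries.map((entry) => entry.path));
```

Flattening both the DB data and the ES data this way gives two lists keyed by path, which a model-agnostic comparison and report generator could consume.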
Verification guide
On the forum, we share the verification guide from the previous challenge together with the scripts that populate the DB and ES with demo data illustrating the possible kinds of mismatches. Please update the demo data and the scripts that populate it so that they include demo data for the new models. The demo data should show all the kinds of mismatches that are possible in the new models.
Hint: To create demo data for the new models, you may start Project Service locally as per the guide provided on the forum, use the Postman file included in the repository to create the data, and after that “break” the data in the DB or ES so that it has some mismatches.
If you have any questions or concerns, feel free to ask on the forum.
Final Submission Guidelines
Use the “SUBMIT” button on the challenge page above and submit a zip file that includes:
- Git patch with the changes you’ve made to the code in our repository.
- Verification guide (with updates if necessary).
- DB demo data with new models included.
- ES demo data with new models included.
- Scripts to populate demo data to DB and ES (with updates if necessary).
Additionally, the winner will be required to raise a pull request to the repository after the challenge is completed.