The challenge is finished.

Challenge Overview

The scope of this challenge has been reduced: only data in one ES index is in scope, and there is no need to upload the report, only to generate it.

Create a script that compares the data we have in the Elasticsearch index and in the database and creates an HTML report.

Project Background

Topcoder Project Service is the main backend service of Topcoder Connect – the client-facing application of Topcoder. We keep all the data in two places: in the DB (database) and in ES (Elasticsearch index). Every time we change data in the DB, the change is also reflected in Elasticsearch. Under some circumstances, data in ES and the DB can become inconsistent. We want a script that reports any mismatches in data between ES and the DB.

Technology Stack

  • Node.js

  • PostgreSQL

  • Elasticsearch

Code access

The work for this challenge has to be done in one repository:
- Project Service repo https://github.com/topcoder-platform/tc-project-service branch develop commit 89b82e9556c7fdea575ec71920a62c257c6e0464 or later.

- Config for local setup is provided on the forum.

Individual requirements

  • Create a script inside a new `scripts` directory with a corresponding npm command to run it: `npm run es-db-compare`.
    - The script should get all config values from the environment variables.

  • The main aim of this challenge is to find all the differences between ES and DB data and provide a report about them [major]:
    - it should report objects that have different data and in particular, properties that have mismatched values [major]
    - it should report objects that are present only in ES but not in DB, and vice versa: objects that are present in DB but not in ES [major]
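
A minimal sketch of the first step of such a comparison: splitting two lists of objects into "only in DB", "only in ES", and "in both". The function name `partitionById` and the shape of the result are hypothetical. It assumes that both sides expose the same `id` value with the same type.

```javascript
// Split DB and ES objects into three groups by their `id`.
// Assumption: ids have the same type on both sides (e.g. both numbers).
function partitionById(dbItems, esItems) {
  const esById = new Map(esItems.map((item) => [item.id, item]));
  const dbIds = new Set(dbItems.map((item) => item.id));

  const onlyInDb = [];
  const inBoth = [];
  for (const dbItem of dbItems) {
    if (esById.has(dbItem.id)) {
      // Present on both sides: keep the pair for a property-level comparison.
      inBoth.push({ id: dbItem.id, db: dbItem, es: esById.get(dbItem.id) });
    } else {
      onlyInDb.push(dbItem);
    }
  }
  const onlyInEs = esItems.filter((item) => !dbIds.has(item.id));

  return { onlyInDb, onlyInEs, inBoth };
}
```

The pairs in `inBoth` would then be compared property by property, while `onlyInDb` and `onlyInEs` could go to the report directly.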

  • The script should compare data of projects and related models which we keep in `projects` ES index. Each ES document in `projects` index is an object from Project DB model with additional data from related models:
    ES project document =
    {
      // "Project" DB object:
      id: 1234,                  
      …                            
      lastActivityUserId: 3333333,

      // plus related models:
      invites: [ProjectMemberInvite], // only invites with status “pending” or “requested” are included in the index
      members: [ProjectMember], // list of all project members
      attachments: [ProjectAttachment], // list of all project attachments
      phases: [ProjectPhase], // list of all project phases. Each ProjectPhase object also includes a list of products from the PhaseProduct model inside the “products: []” property.
    }
    All the data is indexed using a separate service https://github.com/topcoder-platform/legacy-project-processor which you may use for reference. 
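
One possible way to prepare the DB side for comparison is to assemble, from plain DB rows, the document we would expect to find in ES. The sketch below is only an illustration: the function name `buildExpectedDocument` and the foreign-key names `projectId` and `phaseId` are assumptions, not taken from the repository.

```javascript
// Invite statuses that are kept in the index, per the description above.
const INDEXED_INVITE_STATUSES = ['pending', 'requested'];

// Assemble the expected ES document for one project from plain DB rows.
// Assumption: related rows reference the project via `projectId`,
// and products reference their phase via `phaseId`.
function buildExpectedDocument(project, related) {
  const ofProject = (items) => items.filter((item) => item.projectId === project.id);

  const phases = ofProject(related.phases).map((phase) => ({
    ...phase,
    // Each phase carries its own products in a nested `products` list.
    products: related.products.filter((product) => product.phaseId === phase.id),
  }));

  return {
    ...project,
    invites: ofProject(related.invites)
      .filter((invite) => INDEXED_INVITE_STATUSES.includes(invite.status)),
    members: ofProject(related.members),
    attachments: ofProject(related.attachments),
    phases,
  };
}
```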

  • As we have many projects, it could be heavy to compare all the projects with related data on every script run. So the script should require a filter for projects:
    - `PROJECT_START_ID` and `PROJECT_END_ID` - if these env variables are provided, we should only check projects with ids in this range
    - `PROJECT_LAST_ACTIVITY_AT` - if this env variable is defined, we only have to check projects with `lastActivityAt` equal to or later than `PROJECT_LAST_ACTIVITY_AT`
    - If the script is run without any filters it should throw an error.
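
A minimal sketch of how these filters might be read from the environment. Only the three variable names come from the requirements; the function names and the shape of the returned object are hypothetical. Requiring `PROJECT_START_ID` and `PROJECT_END_ID` to be set together is an assumption of this sketch.

```javascript
// Parse a non-negative integer id from an env variable value.
function parseId(name, value) {
  const id = Number(value);
  if (!Number.isInteger(id) || id < 0) {
    throw new Error(`${name} must be a non-negative integer`);
  }
  return id;
}

// Build the project filter from env variables; throw if no filter is given.
function getProjectFilter(env) {
  const isSet = (name) => env[name] !== undefined && env[name] !== '';
  const filter = {};

  // Assumption: the id range is only valid when both bounds are provided.
  if (isSet('PROJECT_START_ID') !== isSet('PROJECT_END_ID')) {
    throw new Error('PROJECT_START_ID and PROJECT_END_ID must be provided together');
  }
  if (isSet('PROJECT_START_ID')) {
    filter.startId = parseId('PROJECT_START_ID', env.PROJECT_START_ID);
    filter.endId = parseId('PROJECT_END_ID', env.PROJECT_END_ID);
    if (filter.startId > filter.endId) {
      throw new Error('PROJECT_START_ID must not be greater than PROJECT_END_ID');
    }
  }
  if (isSet('PROJECT_LAST_ACTIVITY_AT')) {
    const date = new Date(env.PROJECT_LAST_ACTIVITY_AT);
    if (Number.isNaN(date.getTime())) {
      throw new Error('PROJECT_LAST_ACTIVITY_AT must be a valid date');
    }
    filter.lastActivityAt = date;
  }
  if (Object.keys(filter).length === 0) {
    throw new Error('At least one project filter must be provided');
  }
  return filter;
}
```

In the script this would typically be called once as `getProjectFilter(process.env)`, and the result used to build both the DB query and the ES query.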

  • The script should create a report in HTML format which should look something like this (please, see comments which explain what the numbers mean). Please, include in the report at least the information that is shown in the example. But the actual design and format may be different.
    - The important thing in the report is to see mismatched properties (deeply) in the JSON fields like “<project>.details.taas.items”.
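
To illustrate what "deeply" could mean here, below is a minimal sketch of a recursive comparison that returns the dotted paths of all mismatched leaf values. The function name `findMismatches` and the result format are hypothetical. It assumes that both inputs are plain JSON values (for example, DB objects already serialized so that dates are strings).

```javascript
// True for objects and arrays, false for null and primitives.
function isContainer(value) {
  return value !== null && typeof value === 'object';
}

// Recursively collect mismatched leaf values with their dotted paths.
// Assumption: both values are plain JSON (no Date, Map, or class instances).
function findMismatches(dbValue, esValue, path = '') {
  const bothContainers = isContainer(dbValue) && isContainer(esValue)
    && Array.isArray(dbValue) === Array.isArray(esValue);

  if (bothContainers) {
    // Visit every key (or array index) that exists on either side.
    const keys = new Set([...Object.keys(dbValue), ...Object.keys(esValue)]);
    const mismatches = [];
    for (const key of keys) {
      const childPath = path ? `${path}.${key}` : key;
      mismatches.push(...findMismatches(dbValue[key], esValue[key], childPath));
    }
    return mismatches;
  }

  // Leaves, or values of different kinds (e.g. array vs object).
  return dbValue === esValue ? [] : [{ path, db: dbValue, es: esValue }];
}
```

A property that is missing on one side is reported with `undefined` for that side, so a missing value and an explicit `null` count as a mismatch.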

  • In the HTML report, we want to use JavaScript to copy the JSON of objects to the clipboard. In practice, we would use these links to copy objects from ES and DB and paste them into some tool to compare JSONs, for example, http://jsondiff.com/. [major]
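
A minimal sketch of one way to do this: the Node.js side renders a link that carries the JSON in a `data-json` attribute, and a small browser-side handler copies it with `navigator.clipboard.writeText`. The names `renderCopyLink`, `COPY_SCRIPT`, and the `copy-json` class are hypothetical. The Clipboard API is only available in secure contexts, so whether it works for a report opened from a local file is an assumption to verify; a fallback may be needed.

```javascript
// Escape text so it is safe inside a double-quoted HTML attribute.
function escapeAttribute(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Render a link that carries the object's JSON in a data attribute.
// Assumption: `label` is a fixed, trusted string such as "Copy DB object".
function renderCopyLink(label, object) {
  const json = escapeAttribute(JSON.stringify(object));
  return `<a href="#" class="copy-json" data-json="${json}">${label}</a>`;
}

// Browser-side handler, meant to be embedded in the report inside <script>.
const COPY_SCRIPT = `
document.addEventListener('click', function (event) {
  var link = event.target.closest('.copy-json');
  if (!link) return;
  event.preventDefault();
  navigator.clipboard.writeText(link.dataset.json);
});
`;
```

With a templating library, `renderCopyLink` could be registered as a helper, and `COPY_SCRIPT` included once at the end of the report.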

  • Please, use some templating library to generate the HTML report, we prefer handlebars.

  • NOTE: We use soft deletion in DB, so deleted records are still present in DB with `deletedBy` and `deletedAt` defined, which indicates that the record is deleted. If a record is soft-deleted in DB, it is completely deleted from ES.
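
In practice this means that soft-deleted DB records should be excluded before comparing, so that an ES document still existing for such a record shows up as "present only in ES". A minimal sketch, with hypothetical function names, assuming a record counts as deleted when `deletedAt` is neither `null` nor `undefined`:

```javascript
// A record is treated as soft-deleted when `deletedAt` is set.
function isSoftDeleted(record) {
  return record.deletedAt !== null && record.deletedAt !== undefined;
}

// Keep only the records that should have a counterpart in ES.
function withoutSoftDeleted(records) {
  return records.filter((record) => !isSoftDeleted(record));
}
```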

In general, the exact report format and design are up to you. Feel free to concentrate on implementing robust functionality to find all possible mismatches in data rather than tuning HTML report.

General requirements

  • Create a guide on how to use the new script in `docs/es-db-compare.md`. It should list all the env variables to configure and guide on how to run it.

  • There could be some fields that always mismatch between ES and DB. For example, sometimes we might add `deletedBy`/`deletedAt` with “null” values to ES and sometimes not, while we always have them in DB. Also, in ES we additionally populate user detail data in the list of members https://github.com/topcoder-platform/project-processor-es/blob/develop/src/services/ProcessorServiceProjectMember.js#L58-L72 while we don’t have this data in DB. So we have to be able to configure, in one place, the JSON paths that have to be ignored in ES and DB when comparing data, to avoid such non-informative mismatches being reported.
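
A minimal sketch of such a single-place configuration, applied to a list of mismatches of the form `{ path: 'members.0.handle', ... }`. The pattern syntax (`*` matching exactly one path segment), the function names, and the example paths in `IGNORED_PATHS` are all hypothetical; the real list of member fields would have to be taken from the processor code linked above.

```javascript
// Paths to ignore when comparing; `*` matches any single segment,
// such as an array index. The entries here are illustrative only.
const IGNORED_PATHS = [
  'deletedAt',
  'deletedBy',
  'members.*.handle',
];

// Check whether a dotted path matches a single pattern.
function matchesPattern(path, pattern) {
  const pathParts = path.split('.');
  const patternParts = pattern.split('.');
  return pathParts.length === patternParts.length
    && patternParts.every((part, i) => part === '*' || part === pathParts[i]);
}

// Check whether a path matches any of the configured patterns.
function isIgnoredPath(path, patterns) {
  return patterns.some((pattern) => matchesPattern(path, pattern));
}

// Remove mismatches whose path is configured to be ignored.
function dropIgnored(mismatches, patterns) {
  return mismatches.filter((mismatch) => !isIgnoredPath(mismatch.path, patterns));
}
```

Filtering the mismatches after comparison, rather than deleting properties from the objects beforehand, keeps the original ES and DB objects intact for the report.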

  • Where possible, create reusable methods and keep them in the `src/utils` folder or the `src/utils.js` file.

  • Keep the code as general as possible, so we could extend it to be used with other models, but the first priority is not to miss any mismatches.

  • Follow the DRY principle.

  • No unit tests are required for the script.

  • Existing unit tests should pass.

  • Lint should pass (don’t disable eslint rules).

  • Git patch should be without errors or warnings.

Review guidelines

The crucial part of the review is to check that the script doesn’t miss any mismatches between ES and DB and lets us know about them in the report.

Verification guide

Please, provide a verification guide on how to check that the script works as expected. In particular, provide a snapshot of the ES indexes and a DB dump with demo data that illustrates all possible kinds of mismatches.
Describe how to apply these ES snapshots and the DB dump during the review for testing.

If you have any questions or concerns, please, raise a question on the forum.



Final Submission Guidelines

Submit a zip file which would include:

  • Git patch with changes you’ve made to the code in our repository.

  • Verification guide.

  • ES snapshots.

  • DB dump.

Additionally, the winner would be required to raise a pull request to the repository after the challenge is completed.

ELIGIBLE EVENTS:

2020 Topcoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30112184