Topcoder - Create CronJob For Populating Changed Challenges To Elasticsearch


Challenge Overview

Previously, we created the initial approach to populate a challenge or a list of challenges into Elasticsearch.

For this challenge, we'd like to create a cronjob that will do the following:

1. Please use https://github.com/spinscale/dropwizard-jobs as the job framework; if you have a better choice, please raise it in the forum and ask for approval.

The job should run at a regular interval to find the recently changed challenge ids with the following query:

                  SELECT DISTINCT project_id
                  FROM project
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>
                  UNION
                  SELECT DISTINCT project_id
                  FROM project_info
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>
                  UNION
                  SELECT DISTINCT project_id
                  FROM project_phase
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>
                  UNION
                  SELECT DISTINCT project_id
                  FROM upload
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>
                  UNION
                  SELECT DISTINCT project_id
                  FROM resource
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>
                  UNION
                  SELECT DISTINCT project_id
                  FROM prize
                  WHERE modify_date < sysdate AND modify_date > <<last run timestamp>>

The interval should be configurable in the YAML file and overridable by environment variables; see the sketch below.
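A minimal sketch of how the job could look with dropwizard-jobs follows. The class name, the 10-minute default interval, and the step comments are assumptions; the exact doJob signature differs between dropwizard-jobs versions, and the interval can alternatively be looked up from the Dropwizard YAML config as described in the dropwizard-jobs README.

    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;

    import de.spinscale.dropwizard.jobs.Job;
    import de.spinscale.dropwizard.jobs.annotations.Every;

    // Hypothetical sketch: runs every 10 minutes; the interval can instead be read from
    // the Dropwizard YAML config (and thus overridden via environment variables).
    @Every("10m")
    public class ChallengeSyncJob extends Job {

        @Override
        public void doJob(JobExecutionContext context) throws JobExecutionException {
            // 1. Try to acquire the distributed lock (item 2); skip this run if another
            //    instance already holds it.
            // 2. Read the last run timestamp from Redis (item 3), falling back to a very
            //    old date for the initial load.
            // 3. Run the query from item 1 to collect the changed challenge ids.
            // 4. Re-index the changed challenges into Elasticsearch in batches (item 5).
            // 5. Save this run's start time back to Redis and release the lock.
        }
    }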

2. The service will possibly be deployed on several machines behind a load balancer, so several instances of the job could run simultaneously. Use a distributed lock to ensure that only one cronjob runs at a time; the jobs are identical, so there is no need for more than one to run concurrently.

You can use Redisson to achieve this (a sketch follows below); see https://github.com/redisson/redisson/wiki/8.-Distributed-locks-and-synchronizers
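A sketch of how the lock could be used with Redisson, assuming a single-server Redis setup; the lock name, the 30-minute lease time, and the guard class itself are assumptions.

    import java.util.concurrent.TimeUnit;

    import org.redisson.Redisson;
    import org.redisson.api.RLock;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class JobLockGuard {

        private final RedissonClient redisson;

        public JobLockGuard(String redisAddress) {
            // redisAddress comes from configuration, e.g. "redis://127.0.0.1:6379" (item 4).
            Config config = new Config();
            config.useSingleServer().setAddress(redisAddress);
            this.redisson = Redisson.create(config);
        }

        // Runs the task only if this instance wins the lock; other instances simply skip
        // the run, since the jobs are identical and only one needs to execute.
        public void runExclusively(Runnable task) throws InterruptedException {
            RLock lock = redisson.getLock("challenge-sync-job-lock");
            // waitTime = 0: don't wait if another instance is already running the job.
            // leaseTime = 30 minutes: safety net so the lock is released if this instance dies.
            if (lock.tryLock(0, 30, TimeUnit.MINUTES)) {
                try {
                    task.run();
                } finally {
                    lock.unlock();
                }
            }
        }
    }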

3. The cronjob will store and retrieve the last run timestamp from Redis, so that the timestamp can be used in the query from item 1. If there is no last run timestamp, treat the run as an initial load and use a sufficiently old timestamp.

The time at which the new run started should then be saved back to Redis.

For this challenge, let's use the Redis cache to store this information (a sketch follows below). Note that for different environments the key should be different, so it is better to add a per-environment prefix, or make the prefix configurable too.
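A minimal sketch of the timestamp bookkeeping via Redisson; the key layout (environment prefix plus a fixed suffix) and the epoch-zero fallback for the initial load are assumptions.

    import java.util.Date;

    import org.redisson.api.RBucket;
    import org.redisson.api.RedissonClient;

    public class LastRunTimestampStore {

        private final RedissonClient redisson;
        private final String key;

        public LastRunTimestampStore(RedissonClient redisson, String envPrefix) {
            this.redisson = redisson;
            // e.g. "dev:challenge-sync:last-run" vs "prod:challenge-sync:last-run"
            this.key = envPrefix + ":challenge-sync:last-run";
        }

        // Returns the stored timestamp, or a sufficiently old date (epoch) to force an
        // initial full load when nothing has been stored yet.
        public Date getLastRun() {
            RBucket<Long> bucket = redisson.getBucket(key);
            Long millis = bucket.get();
            return millis == null ? new Date(0L) : new Date(millis);
        }

        // Saves the start time of the current run for the next execution to pick up.
        public void saveRunStart(Date startedAt) {
            RBucket<Long> bucket = redisson.getBucket(key);
            bucket.set(startedAt.getTime());
        }
    }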

4. The Redis cache settings should be configurable in the YAML file and overridable by environment variables (see the configuration sketch below).
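For example, the settings could be exposed through the Dropwizard configuration class roughly as sketched below; the field names and defaults are assumptions. Dropwizard's standard EnvironmentVariableSubstitutor can then be used so the YAML values can be overridden from environment variables.

    import javax.validation.constraints.NotNull;

    import com.fasterxml.jackson.annotation.JsonProperty;
    import io.dropwizard.Configuration;

    public class ChallengeSyncConfiguration extends Configuration {

        // Redis endpoint used for both the distributed lock and the last-run timestamp.
        @NotNull
        @JsonProperty("redisAddress")
        private String redisAddress = "redis://localhost:6379";

        // Per-environment key prefix (item 3), e.g. "dev", "qa", "prod".
        @JsonProperty("redisKeyPrefix")
        private String redisKeyPrefix = "dev";

        // Batch size for Elasticsearch updates (item 5).
        @JsonProperty("batchSize")
        private int batchSize = 100;

        public String getRedisAddress() { return redisAddress; }

        public String getRedisKeyPrefix() { return redisKeyPrefix; }

        public int getBatchSize() { return batchSize; }
    }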

5. It is possible that the query in item 1 returns a large list of challenge ids (for example, on the initial load). The job should be able to do the update in batches, retrieving a configurable number of ids (for example, 100) at a time and updating them in Elasticsearch.

When listing the challenge ids, be sure to use descending order so that the newer challenges are updated first; this is important for us. A batching sketch follows below.
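A sketch of the batching step. The indexChallenges(...) call stands in for the existing challenge-to-Elasticsearch loader from the previous challenge, and sorting by id descending is used here as a simple stand-in for "newest first"; both are assumptions.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class BatchIndexer {

        private final int batchSize; // configurable, e.g. 100

        public BatchIndexer(int batchSize) {
            this.batchSize = batchSize;
        }

        public void indexInBatches(List<Long> challengeIds) {
            // Process newer challenges first (descending order), as required above.
            List<Long> sorted = new ArrayList<>(challengeIds);
            sorted.sort(Comparator.reverseOrder());

            for (int from = 0; from < sorted.size(); from += batchSize) {
                int to = Math.min(from + batchSize, sorted.size());
                indexChallenges(sorted.subList(from, to));
            }
        }

        private void indexChallenges(List<Long> batch) {
            // Placeholder: delegate to the existing code that populates these challenges
            // into Elasticsearch.
        }
    }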

 

Final Submission Guidelines

- Code Changes
- Verification Steps

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

Review style: Final Review

Review: Community Review Board

Approval: User Sign-Off

ID: 30061798