Challenge Overview

1. Context

Project Context

  • The goal of this project is to introduce new AI-based capabilities to the Topcoder platform.

Challenge Context

    Within the project context mentioned above, the current challenge is the second in the series. (First Challenge).
    This challenge aims to deliver an implementation that uses the tagging capabilities already developed to create and maintain skill tag profiles of members, based on their challenge win history.

2. Challenge Details

Dataset

  • The dataset (in JSON format) can be found in the forum thread. The data is in the format:

    challenge_id: {
        "sub_track": sub_track of the challenge,
        "challenge_url": url of the challenge,
        "currentStatus": the status of the challenge (active/inactive),
        "challenge_spec": the challenge specification string, including HTML,
        "appealsEndDate": the appeals end date, i.e. the official time when the winners were confirmed,
        "challengeCommunity": the community to which the challenge belongs,
        "winners": list of objects corresponding to participants who got a winning placement
    }
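
    For illustration, a single entry might look like the following. The values here are made up, and the fields inside each winner object (handle, placement) are assumptions about the shape of the provided dataset, not part of this specification:

        "30098765": {
            "sub_track": "CODE",
            "challenge_url": "https://www.topcoder.com/challenges/30098765",
            "currentStatus": "inactive",
            "challenge_spec": "<p>Build a REST API that ...</p>",
            "appealsEndDate": "2019-11-02T09:00:00.000Z",
            "challengeCommunity": "develop",
            "winners": [
                { "handle": "member_one", "placement": 1 },
                { "handle": "member_two", "placement": 2 }
            ]
        }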

    It should be noted that although several attributes of the challenges have been provided, the final implementation will most likely only make use of the 'challenge_spec', 'winners' and 'appealsEndDate' attributes. More details are discussed below.

    Individual Requirements

    The goal of this challenge is to build an API which can be used to get the skill profile of any particular member, or of all members. Here, a skill profile is an array of skill objects, with each object containing the name of the skill and the predicted proficiency of the member in that particular skill.

    Here, the API should have four modules: the Raw Data Tagging Module, the Member Skill History Maintainer Module, and the Processing Module (the module that returns the final processed member profiles). Additionally, there will also be a Database Module that stores the Member Skill History Data, which can be thought of as the fourth module.


    The details of each of the modules are as follows:
    • Raw Data Tagging Module - This module pulls raw challenge data from the Topcoder API, assigns tags to it using the tagging tool built in the previous challenge, and returns the data. For this challenge, this module will use the provided JSON dataset as the base data. In the near future, this module will be updated to pull the data from the Topcoder API (version 5), so that everything happens automatically. When this module is called, it will return the list of challenge objects along with the assigned tags. The tags need to be assigned by the submitter using the provided tagging tool (read the documentation inside the tool to understand how to use it). This data will then be returned to the module that made the call; in addition, the data will also be cached in JSON format in the database along with the current timestamp, using the Database Module.

    • Important Note - As mentioned, in this challenge we will be using the provided dataset (which needs to be tagged using the provided tool) as the starting data. Because in the near future this will be modified to an automatic solution, please ensure that the importing of data is done via a dedicated function, which can later be easily modified to pull and process the data from an API. A minimal sketch of this function-based import is given below.
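
      A minimal sketch of this isolation, in Python. The function and method names here are hypothetical, and db stands for the Database Module described in the next bullet; the tagging step is only a placeholder for the tool delivered in the first challenge:

        import json

        def fetch_raw_challenges(source="challenges.json"):
            # For now, read the provided JSON dataset from disk. Later, only the
            # body of this function needs to change to pull the same data from
            # the Topcoder v5 API instead; callers stay untouched.
            with open(source) as f:
                return json.load(f)

        def tag_challenges(raw):
            # Placeholder: the real module runs the provided tagging tool over
            # each challenge's 'challenge_spec'; here we just attach an empty list.
            return {cid: dict(ch, tags=[]) for cid, ch in raw.items()}

        def get_tagged_challenges(db):
            # Pull, tag, cache (via the Database Module), and return the data.
            tagged = tag_challenges(fetch_raw_challenges())
            for cid, ch in tagged.items():
                db.update("tagged_challenges", cid, ch)
            return tagged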

    • Database Module - There should be a separate database module, whose job is to perform CRUD operations on a NoSQL or any other simple form of database. It will mostly store data cached by the Raw Data Tagging Module, and the member raw skill history profiles generated by the 'Member Skill History Maintainer Module'. Any NoSQL database tool that can be easily deployed on platforms like AWS, Azure, etc. is acceptable. If a particularly uncommon DB is being used, please confirm it in the forum. One possible shape of this module's interface is sketched below.
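
      One possible shape for the module, as a sketch only; the method and collection names are assumptions, and the in-memory dict is a stand-in for a real NoSQL client (e.g. MongoDB or DynamoDB):

        class DatabaseModule:
            """Thin CRUD wrapper so no other module talks to the store directly."""

            def __init__(self, store=None):
                # Stand-in for a real NoSQL client connection.
                self._store = store if store is not None else {}

            def create(self, collection, key, document):
                self._store.setdefault(collection, {})[key] = document

            def read(self, collection, key):
                return self._store.get(collection, {}).get(key)

            def update(self, collection, key, document):
                self._store.setdefault(collection, {})[key] = document

            def delete(self, collection, key):
                self._store.get(collection, {}).pop(key, None)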

    • Member Skill History Maintainer Module - This module will get the tagged data from the Raw Data Tagging Module and parse it sequentially, in ascending order of appealsEndDate. Note - this is an important detail, so please ensure that the parsing is done in the order of appealsEndDate, and not challengeId or any other attribute. This is because appealsEndDate is the official date on which the challenge is finally closed and the winners are officially finalised.

    • During parsing, the winners of each challenge should be collected, and then for each of the winning members, a new JSON object should be created and stored in the database. If a JSON already exists for that member in the database, then that JSON file should be updated. (Important Note - To check for and fetch an existing member JSON, and to create/update one, the Member Skill History Maintainer Module should interact with the Database Module. The Database Module is responsible for actually interacting with the database.)

      This member JSON file will contain some basic details of each challenge that the member has won: the challengeId and the timestamp of the appealsEndDate. These JSON files can be thought of as the 'raw skill files' of each discovered member. These raw skills will be processed into a more useful and compressed form by the Processing Module, which returns the final processed member profiles. More details of the processing are given in the relevant section below; an illustrative raw skill file is shown after this paragraph.
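
      A raw skill file could look like the following. The field names are illustrative, not prescribed, and appealsEndDate is assumed here to be stored as an epoch-seconds timestamp so the decay computation later is straightforward:

        {
            "handle": "member_one",
            "wins": [
                { "challengeId": "30098765", "appealsEndDate": 1572685200 },
                { "challengeId": "30099111", "appealsEndDate": 1580464800 }
            ]
        }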

      Important note - whenever the member profiles are updated, it should be done sequentially starting from the earliest appealsEndDate, and the profiles of all members encountered should be stored in the database. The timestamp at which the processing was started should be stored in advance in the database under the name 'LastRefreshedAt'. The next time the Member Skill History Maintainer Module is invoked, the 'LastRefreshedAt' value should be checked and the parsing should start with the first challenge whose appealsEndDate falls after that timestamp. This ensures that the entire database is not parsed every time a call is made; instead, only new, unseen challenges are parsed if the Module has already been run in the past. A sketch of this incremental update follows.
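
      A sketch of the incremental update described above, reusing the helpers and field names assumed in the earlier sketches (and assuming appealsEndDate is an epoch-seconds value; if the dataset stores it as a date string, it would need converting first):

        import time

        def update_member_histories(db):
            last = db.read("meta", "LastRefreshedAt") or 0
            # Store the start time in advance, as required, so the next
            # run knows where to resume from.
            db.update("meta", "LastRefreshedAt", time.time())

            challenges = get_tagged_challenges(db)
            # Only challenges not yet seen, in ascending appealsEndDate order.
            pending = sorted(
                ((cid, ch) for cid, ch in challenges.items()
                 if ch["appealsEndDate"] > last),
                key=lambda item: item[1]["appealsEndDate"],
            )
            for cid, ch in pending:
                for winner in ch.get("winners", []):
                    profile = db.read("raw_profiles", winner["handle"]) or {"wins": []}
                    profile["wins"].append(
                        {"challengeId": cid, "appealsEndDate": ch["appealsEndDate"]}
                    )
                    db.update("raw_profiles", winner["handle"], profile)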




    • Processing Module (the Module that returns the final processed member profiles) - This is the module that will actually interface with the client. It can receive 3 kinds of requests:

      1. Update the Raw Member Profiles of all members - Here, after this request, the 'Member Skill History Maintainer Module' should be called, so that all the challenges completed up to now are parsed and stored in the database. This will allow fast responses if either of the other two kinds of calls is made, as no time will be wasted in creating/updating the member skill raw profiles.

      2. Get Raw Member Profile of a particular member - Here, after this request, the entire member skill raw profile of that particular member will be fetched via the Database Module and returned to the client. Before this is done, as with each call, the Member Skill History Maintainer Module should update all the member files, so that any newly completed challenges are accounted for.

      3. Get Processed Member Profile of a particular member - Here, after this call, the module pulls the up-to-date member skill raw profile of that particular member and processes it, returning the final output of the processing. The exact details of the processing are described in the section below. A sketch of all three endpoints follows this list.
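
      A sketch of how the Processing Module could expose these three requests, shown here as a small Flask app. The route paths are assumptions, not part of this specification, and process_profile is the processing function sketched in the next section:

        from flask import Flask, jsonify

        app = Flask(__name__)
        db = DatabaseModule()

        @app.route("/profiles/refresh", methods=["POST"])
        def refresh_all():
            # Request 1: update the raw member profiles of all members.
            update_member_histories(db)
            return jsonify({"status": "ok"})

        @app.route("/profiles/<handle>/raw", methods=["GET"])
        def get_raw_profile(handle):
            # Request 2: refresh first, then return the stored raw profile.
            update_member_histories(db)
            return jsonify(db.read("raw_profiles", handle) or {"wins": []})

        @app.route("/profiles/<handle>", methods=["GET"])
        def get_processed_profile(handle):
            # Request 3: refresh, then process the raw profile into confidences.
            update_member_histories(db)
            raw = db.read("raw_profiles", handle) or {"wins": []}
            return jsonify(process_profile(raw, db))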

    Processing Raw Member History into final output

    For a member, the final processed profile to be returned will be an array of skill objects, each containing the name of the skill and the 'skill confidence'.

    The generation of the final output involves a bit of mathematics. To generate the final array, each challenge object in the member's profile should be parsed, and the challengeId and the timestamp extracted. Using the challengeId, the tags of that challenge are fetched from the database (these should have been cached by the Raw Data Tagging Module, as mentioned). Collect the names of all the skill tags of that challenge, and using the timestamp, calculate the 'decay score' of each skill tag. The exact mathematical detail of how to convert the timestamp to a decay score is discussed later. Once the decay scores for all the skill tags are calculated, the final 'skill confidence' of each skill is calculated using the formula:

    skill confidence = log(sum of all decay scores of that particular skill, found in the member's raw profile JSON).

    So, for instance, suppose a member has participated in 3 challenges, each with several associated tags. Now suppose that the tag 'Python' appeared in two of the three challenges, say challenges A and B; then the confidence of the 'Python' skill for that member will be calculated as: logarithm(decay_score of timestamp of challenge A + decay_score of timestamp of challenge B).

    Note - A 'skip list' of skills should be implemented in the config file, where the list of skills that should be skipped can be specified. Any skill appearing in this list should be skipped from consideration by the Processing Module (only the Processing Module; the other modules should continue to handle and store these skills).

    On calculating Decay Score:

    The decay score is just a score between 0 and 1 which indicates how old a challenge is. This allows us to slowly reduce the weightage of skills that were acquired long ago, and ensures that skills acquired recently have a higher impact.

    To be precise, the decay score should be calculated using the half-life function. Here the half life should be set to 450 days. That is, a skill acquired exactly 450 days ago will have a decay_score of exactly 0.5. In 900 days, this will be halved to 0.25. In 1350 days, it would be again halved to 0.125 and so on. 450 days is exactly 38880000 seconds, and every calculation should be done in seconds. Here the value will keep getting lower but it will never hit zero. The exact value of this half life should be kept variable and it should be possible to modify it via a config file.
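
    Both configurable values mentioned so far (the half-life here, and the skip list from the earlier note) could live together in one config file, for example as JSON. The key names below are hypothetical:

        {
            "half_life_seconds": 38880000,
            "skip_skills": ["Example Skill To Ignore"]
        }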

    The exact and simple formula is: decay_score = 0.5 ** (seconds since the 'appealsEndDate' / 38880000)
    (In other words, it is 0.5 raised to the ratio of the time elapsed to the half-life.)

    Here, seconds since the 'appealsEndDate' can be easily found by converting the timestamp to seconds elapsed till the current time.
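
    Putting the two formulas together, a sketch of the processing step in Python. The half-life defaults are shown inline but should come from the config file, and the field names follow the earlier (assumed) sketches:

        import math
        import time

        def decay_score(appeals_end_date, half_life=38880000):
            # 0.5 ** (elapsed / half_life): 0.5 at 450 days, 0.25 at 900, etc.
            elapsed = time.time() - appeals_end_date
            return 0.5 ** (elapsed / half_life)

        def process_profile(raw_profile, db, skip_skills=(), half_life=38880000):
            # Sum the decay scores per skill across all the member's wins,
            # then take the log of each sum to get the final skill confidence.
            totals = {}
            for win in raw_profile["wins"]:
                challenge = db.read("tagged_challenges", win["challengeId"]) or {}
                for tag in challenge.get("tags", []):
                    if tag in skip_skills:
                        continue  # skipped in the Processing Module only
                    score = decay_score(win["appealsEndDate"], half_life)
                    totals[tag] = totals.get(tag, 0.0) + score
            return [{"skill": s, "confidence": math.log(t)} for s, t in totals.items()]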
     

    3. Scorecard Aid

    Judging Criteria

    • The review of this challenge will be done mostly following the standard development scorecard.
      • Major - Here, issues that are central to the working of the entire functionality will be addressed. Anything that can make the solution inadequate or dysfunctional will be mentioned here.
      • Minor - In the minor section, issues that are considerable but not major enough to break the solution will be mentioned.
      • Importance of modularity - As mentioned, future development will build on the output of this challenge, so proper attention should be paid to keeping everything as modular as possible; any special numbers used should be configurable via config files.
      • On security - Currently we are not implementing authentication, but in the near future we most probably will. It is therefore advisable to structure things so that authentication can easily be added in subsequent development.


Final Submission Guidelines

  • The submission should include:
    • The codebase of the API with proper modularity as discussed above
    • Proper documentation of the API
    • A demo client CLI to demonstrate all three kinds of requests to the API. It should include a guide to help with its deployment.

ELIGIBLE EVENTS:

2020 Topcoder(R) Open

Review Style: Final Review (Community Review Board)

Approval: User Sign-Off

ID: 30131202