
Challenge Overview

1. Context

     

    Project Context


    This challenge is part of a project aimed at adding AI capabilities to Topcoder.
     

    Challenge Context

    Within the project context mentioned above, the current challenge aims to build a core program/system which can recommend a list of members for a particular challenge or task.

    Here, the challenge/task profile will be in the form of an object containing the list of skill tags associated with the challenge or task, or in fact any text from which a proper list of skill tags can be extracted. Similarly, the member profiles will also comprise skill tags, but unlike the challenge profiles, each skill in a member profile will have an associated 'confidence' value.

2. Challenge Details

     

    Overview


    Within the project context mentioned above, the current challenge aims to build a core program/system that can recommend a list of members for a particular challenge or task, taking as input the 'member skills profile' and the 'challenge skills profile' (both in the form of lists of skill tags).

    The output will be judged on the basis of how well the top N recommendations score in terms of their actual mean 'participation score'. Hence, although this is not directly a classification/regression task, the 'participation score' discussed below can be thought of as an intermediate ground truth/label.

    Provided Training Dataset

    In the forum, a .7z compressed folder can be found, which contains the following files (a minimal loading sketch follows the list):
    • input_challenge_profiles.json - This file contains the list of all relevant challenge objects. Each challenge object contains the id of the challenge and a property called 'skills', which holds the 'skill profile' of that challenge. Note - this file contains the challenge profiles of both the 'train' and the 'test' challenges. How to tell which challenges are for testing and which for training is explained below.
    • input_member_profiles.json - This file contains the profiles of members. Note that there is not a single set of member profiles: the file contains about 40 sets, each depicting the profiles at a particular point in time. More about these snapshots under the section 'About Member Profile Snapshots/Sets'. Note - just like the challenge profiles file, this file also contains the member profiles of both the 'training' and the 'test' members.
    • output_for_training.csv - This file contains the training participation data for each relevant challenge. Because this file contains the ground truth, only the training part has been shared; the testing ground truth has, of course, not been shared. Note that this file does not directly contain the 'participation score': the participation value needs to be calculated using the scheme described under the section 'Calculating the Participation Score'.
    • member_profile_snapshots_example.PNG - This image contains examples of the time periods taken into consideration when creating the snapshots of member profiles. More on this under the section 'About Member Profile Snapshots/Sets'.
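    For orientation, here is a minimal sketch of loading the three data files. The exact top-level JSON layouts and the CSV schema are assumptions here and should be verified against the actual files from the forum package.

    ```python
    import json

    import pandas as pd

    # Minimal loading sketch; the top-level layouts are assumptions.
    with open("input_challenge_profiles.json") as f:
        challenges = json.load(f)  # challenge objects, each with an id and a 'skills' profile

    with open("input_member_profiles.json") as f:
        member_snapshots = json.load(f)  # ~40 snapshots, each labeled by a date interval

    training = pd.read_csv("output_for_training.csv")  # raw participation data (see scoring below)
    ```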
     

    About Member Profile Snapshots/Sets

    Contestants are encouraged to go through the input_member_profiles.json file to understand its structure. One can notice that it has about 40 objects, each labeled with a particular date interval.

    Let's try to understand this using an example. One of the first of the roughly 40 snapshots is labeled '2016/09/14-2020/10/23'. This set contains the profiles of all relevant members as they looked on 2020/10/23. Only those members who won a prize in at least one challenge between 2016/09/14 and 2020/10/23 are included in this set, and only the challenges hosted between 2016/09/14 and 2020/10/23 are used to build the members' profiles.

    Now the question on many contestants' minds might be: "But why do we need multiple snapshots anyway?"
    The answer: unlike the challenge skills profiles (which are static in nature), the member skills profiles are not static; they evolve and change with time. A member's profile changes whenever they win a challenge. Hence, sets of member profiles are shared for different blocks of time, so that contestants get a chance to analyse how the member profiles evolve and can use this information in any way they like to build a robust system that predicts participation.

    In the image member_profile_snapshots_example.PNG provided in this contest, one can see some of the example time blocks/sets/snapshots.
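    To make the label convention concrete, here is a small sketch that parses the snapshot labels, assuming the top-level JSON object is keyed by labels of the form 'start-end' with dates formatted as YYYY/MM/DD (this keying is an assumption; verify it against the file). It reuses member_snapshots from the loading sketch above.

    ```python
    from datetime import datetime

    def parse_label(label):
        # Assumed label format: '2016/09/14-2020/10/23'
        start, end = label.split("-")
        fmt = "%Y/%m/%d"
        return datetime.strptime(start, fmt), datetime.strptime(end, fmt)

    for label in member_snapshots:
        start, end = parse_label(label)
        print(label, "spans", (end - start).days, "days")
    ```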

    Important Note on train/test split

    - The top 10 blocks shown in the image 'member_profile_snapshots_example.PNG', starting from '2016/09/14-2020/10/23' down to '2015/03/24-2019/05/02', will be used for testing. The remaining blocks can be used for training. This is the reason why no ground truth has been provided for challenges after 2019/05/02 in the file output_for_training.csv.
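    Assuming the label format sketched above, the split could be expressed as follows (the authoritative list of test slices remains the top 10 blocks shown in the image):

    ```python
    # Continues the parse_label sketch above; the cutoff is the end date
    # of the earliest test slice, per the note on the train/test split.
    CUTOFF = datetime(2019, 5, 2)

    test_labels = [l for l in member_snapshots if parse_label(l)[1] >= CUTOFF]
    train_labels = [l for l in member_snapshots if parse_label(l)[1] < CUTOFF]
    ```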

    Calculating the Participation Score

    The participation score of a particular member with respect to a particular challenge should be calculated as: A * if_submit + B * if_win, where A = 1 and B = 3. (Note - Here if_win means winning any prize, it doesn't have to be the first prize win.)

    So someone who submits will have a participation score of 1. Similarly, someone who wins a prize will have 1 + 3 = 4.
    (Note - 'registration only' will not be taken into consideration as a form of participation.)

    Contestants should generate these participation scores from output_for_training.csv and use them for their development.
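    A minimal sketch of deriving the score from the training file; the column names 'submitted' and 'won' are hypothetical placeholders, as the actual schema of output_for_training.csv is not described here.

    ```python
    import pandas as pd

    A, B = 1, 3  # weights from the formula above

    def participation_score(if_submit, if_win):
        # if_win means winning any prize, not necessarily first place
        return A * int(bool(if_submit)) + B * int(bool(if_win))

    df = pd.read_csv("output_for_training.csv")
    # 'submitted' and 'won' are assumed column names; adapt to the real schema.
    df["participation_score"] = [
        participation_score(s, w) for s, w in zip(df["submitted"], df["won"])
    ]
    ```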
     

    Review Metric - Important !!!

    Note - The contestants are advised to go through this section very carefully to understand the review metric.

    The review metric will be along the lines of recommender system evaluation. The core function of the submission should take as input a challenge profile and a set of member profiles (only one slice/block), and it should output a list of the top N member profiles.

    The submission will have to generate recommendations of the top N members for all the challenges that lie in the future from the point of view of the slice. So if the slice's label is '2015/03/24-2019/05/02', then every challenge that came after 2019/05/02 (the second date in the label) will be considered a 'test challenge' for that slice.

    Hence, for some challenges the predictions may have to be generated multiple times, but these predictions will most probably differ: although the challenge profile remains the same, the member profiles in each slice are different, and hence the recommendations might not be identical.

    So for each slice, the top 60 recommended members will have to be submitted for each challenge, sorted by recommendation strength (i.e., the first element in the list of 60 recommended members must be the most recommended and the last one the least recommended among the top 60). During review, the recommendation quality score for each challenge will simply be calculated as:

    challenge_recommendation_score = (actual participation score of the top recommended member) + (actual mean participation score of the top 10 recommended members) + (actual mean participation score of the top 60 recommended members)


    The final score will simply be the mean challenge_recommendation_score of all challenges. Note - in case any relevant challenge's output is not found, that challenge will receive a score of 0.
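    For clarity, the scoring rule can be expressed as a small sketch, assuming recs is the sorted list of 60 recommended member ids and actual maps a member id to that member's actual participation score for the challenge:

    ```python
    def challenge_recommendation_score(recs, actual):
        # recs: 60 member ids, most recommended first
        # actual: member id -> actual participation score
        scores = [actual.get(m, 0) for m in recs]  # scoring unknown members as 0 is an assumption
        return scores[0] + sum(scores[:10]) / 10 + sum(scores[:60]) / 60
    ```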

    Code Submission format

    The submission can contain any number of Python files, but it must contain the following components (a skeleton of this interface is sketched after the list):
    • A file with a function called recommend_members, which should take as input: a slice of member profiles, a challenge skill tags profile, and K. It should return the top K member recommendations for that challenge profile, where the members are chosen from those in the input slice of member profiles.
    • A Python file which imports the recommend_members function using the 'import' keyword, and which takes as input the filenames of the challenge profile objects and member profile objects, where the JSON structure of these files is identical to that of input_challenge_profiles.json and input_member_profiles.json provided in this challenge.
    • The output of the above Python file should be a CSV file containing the predictions. Each row should contain the following: the label of the time block/snapshot, the challenge id, the challenge skills profile, and the sorted list of the top 60 member recommendations for that challenge (the first member being the most recommended, the rest in decreasing order).
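    A hedged skeleton of this interface follows. The module name recommender, the profile field names ('id', 'skills', 'tag'), the snapshot layout, and the tag-overlap heuristic are all illustrative assumptions, not part of the specification.

    ```python
    # recommender.py (hypothetical module name)
    def recommend_members(member_profiles_slice, challenge_skills, k):
        """Return the top-k member ids from the slice, most recommended first."""
        scored = []
        for member in member_profiles_slice:
            # Toy placeholder heuristic: count overlapping skill tags.
            member_tags = {s["tag"] for s in member["skills"]}  # field names assumed
            scored.append((len(set(challenge_skills) & member_tags), member["id"]))
        scored.sort(reverse=True)
        return [member_id for _, member_id in scored[:k]]


    # predict.py (hypothetical): imports recommend_members and writes the CSV.
    import csv
    import json

    from recommender import recommend_members

    def write_predictions(challenge_file, member_file, out_file, k=60):
        with open(challenge_file) as f:
            challenges = json.load(f)
        with open(member_file) as f:
            snapshots = json.load(f)  # assumed: dict keyed by slice label
        with open(out_file, "w", newline="") as f:
            writer = csv.writer(f)
            for label, members in snapshots.items():
                for ch in challenges:  # in practice, only the slice's test challenges
                    recs = recommend_members(members, ch["skills"], k)
                    writer.writerow([label, ch["id"], ch["skills"], recs])
    ```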

    Types of acceptable technologies

    For these kinds of challenges, machine learning and recommender-system-based implementations usually render good results, but there is no rigid requirement on the type of programming that can be used. Any approach is acceptable, as long as its performance is good enough to win a prize and it follows the instructions mentioned above.

    Note - only Python-file-based submissions are allowed. Use of notebooks or other programming languages is not allowed. (If notebooks are used for any other purpose, such as data exploration and basic testing during development, they can be included in the submission as additional resources, but they won't receive any score.)

3. Scorecard Aid

    Judging Criteria

    In this challenge, the data science subjective scorecard will be used. The review will broadly be done on the basis of the following criteria (with weights):
    • The final score achieved by the submission (as described above in the 'Review Metric' section) - 75%
    • Overall code quality (subjectively determined) - 15%
    • Documentation (subjectively determined) - 10%



Final Submission Guidelines

A valid submission should contain the following:
  • A CSV file containing the predictions. Each row should contain the following: the label of the time block/snapshot, the challenge id, the challenge skills profile, and the sorted list of the top 60 member recommendations for that challenge (the first member being the most recommended, the rest in decreasing order).
  • Training code and the code used for generating the predictions
  • Documentation guide in Doc/PDF/Markdown/Text format, along with instructions to verify the submission
  • A document in Doc/PDF/Markdown/Text format - briefly describing the approach used to create the implementation.

ELIGIBLE EVENTS:

2021 Topcoder(R) Open

Review style: Final Review - Community Review Board

Approval: User Sign-Off

ID: 30148622