Challenge Overview

Fast 56 Hour Challenge!!!

Note - This challenge will have a 12 hour review phase and a 6 hour appeal phase.

1. Context

 

Project Context

 

Challenge Context

  • This challenge is part of a project meant to add AI capabilities to operations at Topcoder.
  • In this challenge, we will not work on any addition of new AI capabilities. Instead, this short challenge will focus on coming up with some scripts that interact with the already developed 'tagger' API, which is currently in the development-testing cycle.
 

2. Challenge Details

 

Overview

  • There is currently a 'tagger' API deployed for testing at https://api.topcoder-dev.com/platform-ai/tagger/ (root URL). It accepts any text through an API request and returns the skill tags extracted from that text. The documentation of this API can be found in the forum.

    Required Script(s)

    You need to write a script that takes a CSV file as input, parses it, collects tags from the 'tagger' API for each row, and stores the results in an output CSV file. The input CSV file can be in one of two formats:
    1. A column in the CSV file contains the text, which can be passed directly to the tagger API. Find a sample in the forum: gig_descriptions_sim_sample.csv.
    2. A column in the CSV file contains a URL. The URL either serves text directly (as normal HTML) or points to a hosted file such as a PDF, DOC, DOCX, or TXT file. The text must be extracted from the URL, either from the page itself or from the hosted file. Find a sample in the forum: Mock_Member_resumes_sim.csv.

    Important Note - The code should automatically determine the type of content hosted at the URL and extract the human-readable text accordingly. That is, it should detect on its own whether the URL serves a PDF, a DOC/DOCX, a plain text file, or an ordinary HTML page (you are free to use exception handling if needed). The algorithm should NOT rely on terms such as 'pdf' or 'doc' appearing in the URL, and it should not parse the URL string for any details. Doing so will lead to disqualification of the submission.
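    One possible way to satisfy this note is to classify the downloaded bytes by their leading signature rather than by anything in the URL. The sketch below is only an illustration: the function name sniff_document_type and the returned labels are hypothetical, and the actual text extraction (which would need a PDF/DOC/DOCX library of your choice) is left out.

```python
import io
import zipfile


def sniff_document_type(payload: bytes) -> str:
    """Classify downloaded bytes by content alone, never by the URL string.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if payload.startswith(b"%PDF-"):
        return "pdf"
    if payload.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file signature, used by legacy .doc files
        return "doc"
    if payload.startswith(b"PK\x03\x04"):
        # DOCX is a ZIP archive that contains word/document.xml
        try:
            with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
        return "unknown"
    sample = payload[:1024].lstrip().lower()
    if sample.startswith(b"<!doctype html") or b"<html" in sample:
        return "html"
    try:
        # Assumption: plain text files are UTF-8 encoded
        payload.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"
```

    A caller could download the bytes once, pass them to this helper, and then dispatch to the matching extractor, falling back to exception handling when the type is "unknown".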

    Contestants may submit either two separate scripts or a single script that supports both options (along with a config file to select the option).

    About the Tagger API

    As discussed, the tagger API takes as input a string and returns an object containing the skill tags found in the string. More details about the API can be found in the 'API GUIDE-TAGGER.pdf' in the forum.
    The request should be made to the /emsi/internal_no_refresh endpoint. Specifically, the request URL should be https://api.topcoder-dev.com/platform-ai/tagger/emsi/internal_no_refresh.

    Note - in the PDF documentation, ignore the 'get_tags/' in the path - this has been removed in the latest version of the API.
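    As a rough illustration of calling this endpoint, the sketch below builds a request with the Python standard library. The HTTP method (POST), the JSON body, and the field name "text" are all assumptions made for this example; they are not taken from the API guide and must be checked against 'API GUIDE-TAGGER.pdf' before use. The helper names are hypothetical.

```python
import json
import urllib.request

# Request URL as given in this challenge description
TAGGER_URL = "https://api.topcoder-dev.com/platform-ai/tagger/emsi/internal_no_refresh"


def build_tagger_request(text, url=TAGGER_URL, text_field="text"):
    """Build (but do not send) a request carrying the text to tag.

    ASSUMPTION: the API accepts a POST with a JSON body of the form
    {"text": "..."}. Confirm the real method and field name in the API guide.
    """
    body = json.dumps({text_field: text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tags(text, timeout=30):
    """Send the request and return the raw response body as a string."""
    request = build_tagger_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

    Keeping request construction separate from sending makes it possible to check the URL and payload without touching the network.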

    Output Format

    The output format should be the same as the input format, but with one additional column named 'Tags'. It should contain the tags object (in string format) returned by the API.
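    A minimal sketch of producing such an output with the standard csv module is shown here. The function name add_tags_column and the column name used in the usage note are hypothetical, and the tagger argument stands in for whatever function actually calls the API and returns the tags as a string.

```python
import csv
import io


def add_tags_column(source, destination, text_column, tagger):
    """Copy every input row to the output and append a 'Tags' column.

    `source` and `destination` are open text file objects; `tagger` is any
    callable that maps the text of one row to a string (hypothetical hook).
    """
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames or []) + ["Tags"]
    writer = csv.DictWriter(destination, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["Tags"] = tagger(row[text_column])
        writer.writerow(row)
```

    For example, with files opened using newline="" as the csv module recommends, add_tags_column(infile, outfile, "description", fetch_tags) would keep all original columns in order and add 'Tags' at the end, assuming the text lives in a column called "description".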

    Optional Existing Code

    There is some existing code that was used for testing during the development of the tagger API. It can be found in the 'CLI' folder in the forum. It takes as input a JSON file named 'dataset.json', in which each object contains text in the property 'challenge_spec'. The script passes this text to the API and returns the tags for every row of the input file.

    Contestants can use this as a starting point or as a reference, though its use is not mandatory. Contestants are free to start from scratch.

    Easy Configurability

    The parameters or any modes of the script should be easily configurable via a config file, and any configurable value should not be hardcoded.
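    One way to meet this requirement is an INI-style config file read with Python's configparser. The sketch below is an illustration only: the section names, keys, the "text"/"url" mode values, and the column and output file names are all hypothetical choices, not part of the challenge specification.

```python
import configparser

# Example config contents; every key and value here is an illustrative assumption
DEFAULT_CONFIG = """
[tagger]
url = https://api.topcoder-dev.com/platform-ai/tagger/emsi/internal_no_refresh
timeout_seconds = 30

[input]
mode = text
path = gig_descriptions_sim_sample.csv
column = description

[output]
path = tagged_output.csv
"""


def load_settings(config_text):
    """Parse config text and return the settings as a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    mode = parser.get("input", "mode")
    if mode not in ("text", "url"):
        raise ValueError(f"Unsupported input mode: {mode!r}")
    return {
        "tagger_url": parser.get("tagger", "url"),
        "timeout_seconds": parser.getint("tagger", "timeout_seconds"),
        "mode": mode,
        "input_path": parser.get("input", "path"),
        "column": parser.get("input", "column"),
        "output_path": parser.get("output", "path"),
    }
```

    In a real script the text would come from a file on disk (for instance via parser.read with a path given on the command line), so that switching between the two input formats requires no code change.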
 

3. Expected Outcome

 
  • The expected outcomes of this challenge are: script(s) to parse through the two types of input CSV files, pass the extracted string from each row to the tagger API, collect the responses and output a csv file.
 

4. Scorecard Aid

 

Judging Criteria

  • The review will be done using the standard code scorecard and the submission will be tested against some actual member resumes and other gigs (not shared in this challenge.)


Final Submission Guidelines

The submission should include:

  • Well-commented Code
  • Documentation
  • Sample output for the two included sample input files

ELIGIBLE EVENTS:

2021 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30160567