
Challenge Overview

  • Our client is an insurance company that currently provides its potential customers with forms, which they fill out to describe the insurance options they want to include in their policy.

  • The form contains many redundant and unstructured options, along with many free-text input fields. This makes it difficult for the client’s representatives and customers to easily find and select the options they need.

  • In the near future, the client is looking to build a simplified UI-based insurance options (benefits) configurator, where customers can select options according to their needs in a hierarchical manner. Example of the intended hierarchy (for general idea only, not based exactly on the provided dataset; a small data-structure sketch follows this example):

    • Q1 - Coverage for transplants?

      • Q1.1 - Kidney Transplant coverage?

        • Yes - 100% coverage

        • Copay - 20%

        • Copay - 10%

        • No

      • Q1.2 - Hip Transplant coverage?

        • Yes - 100% coverage

        • No
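
  • For clarity, here is a minimal, purely illustrative sketch of how such a question/answer hierarchy could be held as a nested Python structure. The question texts are taken from the example above, and the field names are assumptions; the actual output format is defined by the base code:

        # Illustrative only: the example hierarchy above as a nested Python structure.
        hierarchy = {
            "question": "Coverage for transplants?",            # Q1
            "children": [
                {
                    "question": "Kidney Transplant coverage?",  # Q1.1
                    "answers": ["Yes - 100% coverage", "Copay - 20%", "Copay - 10%", "No"],
                },
                {
                    "question": "Hip Transplant coverage?",     # Q1.2
                    "answers": ["Yes - 100% coverage", "No"],
                },
            ],
        }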

  • A similar kind of hierarchy was created for the client in one of the previous challenges of the series. In that challenge, an algorithm was created that could generate a hierarchy similar (but not identical) to the type shown above by analyzing the raw dataset. The raw dataset was anonymized historical configuration data, containing the choices made by customers on the unstructured options discussed above.

  • In the last challenge, the task was to add new columns to the raw dataset based on some new goals and also to compile the results of the earlier challenges into the columns of the raw dataset. To learn more about those goals, please refer to the ‘Goals of this challenge’ section of the last challenge.

  • Important - In this challenge we have two goals:
    1. Primary goal - Make the code as generalized and as free from hardcoded strings as possible.
    2. Secondary goal - Ensure that there are almost no bugs or errors in the code.
    More details are given in the ‘Goals of this challenge’ section below.

 

Results of the previous challenge

  • The winning submission of the last challenge can be found in the forum. Participation in the last challenge is not a hard requirement for this challenge, but it is advisable to go through the specs of the last challenge for the sake of clarity.

 

Goals of this challenge

  • Goal 1 - Make the code more generic/generalized

  • The primary goal of this challenge is to ensure that if a new raw dataset is used with this code in the future, it should still be possible to produce the same kind of processed CSV output that the current code produces.

  • It should be noted that the new data will have the same general structure, i.e., the same column names and the same data types in each column.

  • Apart from the column names, the values inside the dataset can change. For example, new kinds of categories can be added to the ‘benefit_classification’ column, or a new ‘Product 108’ can be added to the ‘product’ column. New values can also be introduced in any other column, such as html_directory, covg_code, sequence_id, answer_tag, answer, etc.

  • Flag columns like ‘type_of_tag’, ‘value_flag’, ‘top_50_flag’, ‘acct_class’ might not get new values because these are usually fixed flags. But if it is possible to handle new values even in these columns, that is certainly desirable.

  • Importantly, it should be noted that entirely new answers and strings can be introduced in the ‘answer’ column in the future. For instance, in a few years there could be an answer like ‘gene-editing covered’ or ‘personalized medicine covered for cancer treatment with 10% copay’ or something similar. These answers might have their own new values for the ‘product’ and ‘benefit_classification’ columns as well (an illustrative normalization sketch is given just below).
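
  • As an illustration only (this is not part of the base code), such free-text answers could be normalized with simple patterns rather than a fixed list of known answer strings. The function name and the returned labels below are hypothetical:

        import re

        def generic_answer(raw_answer):
            # Hypothetical sketch: derive a generic label from a free-text answer
            # without matching against a fixed list of known answer strings.
            text = raw_answer.strip().lower()
            copay = re.search(r"(\d+)\s*%\s*copay|copay\s*[-:]?\s*(\d+)\s*%", text)
            if copay:
                return "copay " + (copay.group(1) or copay.group(2)) + "%"
            if re.search(r"\b(no|not covered|excluded)\b", text):
                return "not covered"
            if re.search(r"\b(yes|covered)\b|100\s*%", text):
                return "covered"
            return "other"

        # e.g. generic_answer("personalized medicine covered for cancer treatment with 10% copay")
        # returns "copay 10%"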

  • Currently, the provided code uses many hardcoded rules, as well as some hardcoded/fixed supporting files, such as Benefits-equal.csv, stop.txt and synonyms.csv in the folder running/2/input, or all_word.csv and concepts.txt in the folder running/3/input.

  • Important - To achieve this goal, either of these two approaches can be taken:
    1. Make changes natively inside the code so that no hardcoded rules or fixed string matching are used.
    2. Create an ‘updater’ code which the client can occasionally run to update or generate fresh helper files such as synonyms.csv, concepts.txt, etc. mentioned above, and to update the logic inside the code accordingly.

  • Basically, it should be possible for the reviewers of this challenge, and then the client, to introduce a completely new dataset with completely new values (but the same column names) and get the same functionality as with the current code. That is, the code should be able to create the hierarchy from that new dataset, as well as the ‘intent’ and ‘generic_answer’ columns.

  • One recommended way to achieve this goal is to go through the code line by line, find any rules that are hardcoded or imported from a hardcoded/fixed file, and try to make them automatic. This is one recommended way, but contestants are free to try any method they feel can help achieve the goal. A purely illustrative sketch of the ‘updater’ approach is given below.
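
  • Purely as an illustration of the second (‘updater’) approach, a vocabulary/concepts helper file could be rebuilt from whatever values a new raw dataset actually contains, for example from the ‘answer’ column. The function name, frequency threshold and output format below are assumptions; the exact format expected by the base code (e.g. for concepts.txt) may differ:

        import re
        from collections import Counter
        import pandas as pd

        def rebuild_concepts(raw_csv, out_path="concepts.txt", min_count=5):
            # Hypothetical sketch: regenerate a concepts/vocabulary helper file from
            # the values present in the new raw dataset instead of shipping a fixed file.
            df = pd.read_csv(raw_csv)
            counts = Counter()
            for answer in df["answer"].dropna().astype(str):
                counts.update(re.findall(r"[a-z][a-z\-]+", answer.lower()))
            with open(out_path, "w", encoding="utf-8") as fh:
                for word, n in counts.most_common():
                    if n >= min_count:
                        fh.write(word + "\n")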
     

  • Goal 2 - (Optional) Find and fix as many bugs in the code as possible

  • This is a self-explanatory and optional requirement. Although it is assumed that some bugs are always fixed when new extensions are made to a code base, we want to ensure that this final code is absolutely ready for final delivery to the client. Hence, a particular effort should be made to find as many bugs as possible in the code, so that after the challenge completes the code can ideally be delivered directly to the client with only a handful of post-review fixes.

  • If a member chooses to find and fix existing bugs, they should include a document named bugs_fixed, which clearly describes all the bugs that were found and fixed in the existing code. The reviewer and copilot/admin will go through this document and, at their discretion, may award a bonus to the submitter in addition to any prize they might win for the primary goal.

  • It should be noted that, irrespective of whether Goal 2 is attempted or not, the submission will be reviewed as usual, and if any bugs are found in the newly added code, points will be deducted as usual.

  • Note to the QA community and to developers only interested in bug-fixing in this challenge - Members who are not interested in tackling Goal 1, but are interested in finding and fixing bugs in the code, are encouraged to submit just the bugs_fixed document mentioned above along with the 'bug-free' code. Submitters who do not attempt Goal 1 should include a readme file indicating that only the bug-fixing part has been attempted, in order to avoid any confusion during review.
    Payment will be awarded to deserving submissions independently of the prizes mentioned in the challenge, based solely on the discretion of the reviewer/copilot and admin, and on the number and severity of the bugs found and fixed.

 

Data and code access

 

The winning submission of the last challenge can be found in the forum.

Base Code - The participants have to use the winning submission from the last challenge as the starting point and make changes/improvements to it in order to achieve the goals mentioned above.

Expected technologies - The participants are free to use any technique they like to achieve the additional results, as long as everything is implemented in Python and the base code’s existing functionality is maintained or improved. The additional code can be a string-parsing based implementation, or it might include advanced techniques from NLP, Deep Learning, or Machine Learning (ML). If the ML route is chosen, participants are free to train models themselves or to use ready-made machine learning/deep learning models available online, as long as these are free of charge to use in commercial software.

 


Final Submission Guidelines

What to Submit

  • Updated code which, in addition to the existing capabilities, adds new functionality to achieve Goal 1 and optionally Goal 2.

  • A report in PDF/Word/Markdown format detailing the techniques and algorithms used to achieve the goals.

  • A README.md file detailing the deployment instructions.

  • Optional - a bugs_fixed file in PDF/DOC/TXT/MD format. (Refer to the Goal 2 details above.)

 

Technology Stack

  • Python 

Review Style: Final Review - Community Review Board

Approval: User Sign-Off

ID: 30099817