Challenge Overview
The Department of Veterans Affairs' (VA) National Cemetery Administration (NCA) seeks to create an interactive digital experience that enables virtual memorialization of the millions of people interred at VA national cemeteries. This online memorial space will allow visitors to honor, cherish, share, and pay their respects and permit researchers, amateurs, students, and professionals to share information about Veterans.
The final application will likely have a comments section, and it is very important that the language used there is appropriate. Therefore, we have two tasks in this challenge:
- Filter out comments containing profanity
- Sentiment analysis
For both tasks you are expected to suggest an approach (write a document) and implement a simple POC demonstrating the ideas. You can use cloud APIs or implement your own algorithms, but in either case you must explain why your approach is better than the alternatives. You are not limited to specific technologies for the POC.
Note that the tool only needs to support English for now.
Profanity filtering
The tool should accept a comment as a string and output two things: the list of all profane words found, and the comment with those words filtered out. You can suggest (or implement) a simple filter based on a local dictionary, use regex search, or use a third-party API. It is entirely up to you, but you must cover these points in the document:
- Is there any feedback loop (from manual input), and is there a need for one at all?
- Cases that your approach will fail to detect
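As an illustration of the input and output described above, here is a minimal sketch of the local-dictionary option, assuming a regex with word boundaries. The word list, the `filter_profanity` name, and the `mask` parameter are hypothetical placeholders, not part of the challenge; a real POC would load a curated dictionary.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder word list (assumption); mild stand-ins for real entries.
PROFANE_WORDS = {"darn", "heck"}

# Word boundaries keep "heck" from matching inside a word such as "checker".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(PROFANE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def filter_profanity(comment: str, mask: str = "*") -> Tuple[List[str], str]:
    """Return (profane words found, comment with those words masked)."""
    found = [m.group(0) for m in _PATTERN.finditer(comment)]
    cleaned = _PATTERN.sub(lambda m: mask * len(m.group(0)), comment)
    return found, cleaned


if __name__ == "__main__":
    print(filter_profanity("Well heck, that darn checker!"))
    # An obfuscated spelling slips through: one of the failure cases to document.
    print(filter_profanity("What the h3ck"))
```

A filter like this misses obfuscated spellings ("h3ck"), spaced-out letters, and words absent from the dictionary. One possible feedback loop is to let moderators add manually flagged words to the dictionary.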
Sentiment analysis
Sometimes a comment won’t contain any profane words, but its overall sentiment will be very negative, which does not fit the decorum of the site. The idea here is that we would have several sentiment levels (good to bad) and hide (or flag for human review) comments below a threshold. Again, you can use a third-party API or suggest implementing the analysis locally, but do discuss these points:
- How to incorporate manual feedback (i.e., someone manually flags the comment as inappropriate)
- What does the analysis do with long comments that can contain both positive and negative sentiment, and how can we tweak the behavior?
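One possible way to handle the threshold and mixed-sentiment questions above is to score each sentence separately and make the aggregation rule configurable. The sketch below is only an illustration: the tiny lexicon and `toy_score` stand in for a real model or cloud API, and the names `review_comment`, `threshold`, and `aggregate` are assumptions introduced here.

```python
import re
from statistics import mean
from typing import Callable, Dict, List

# Toy lexicon standing in for a real sentiment model or cloud API (assumption).
_TOY_LEXICON: Dict[str, float] = {
    "honor": 1.0, "grateful": 1.0, "hero": 1.0,
    "disgrace": -1.0, "hate": -1.0, "worthless": -1.0,
}


def toy_score(sentence: str) -> float:
    """Score in [-1, 1]: average polarity of known words, 0.0 if none are known."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [_TOY_LEXICON[w] for w in words if w in _TOY_LEXICON]
    return mean(hits) if hits else 0.0


def split_sentences(comment: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [p for p in re.split(r"(?<=[.!?])\s+", comment.strip()) if p]


def review_comment(
    comment: str,
    score: Callable[[str], float] = toy_score,
    threshold: float = -0.5,
    aggregate: str = "min",
) -> dict:
    """Score each sentence, aggregate, and flag the comment if below threshold.

    aggregate="min" flags a comment if any one sentence is very negative;
    aggregate="mean" lets positive sentences offset negative ones.
    """
    scores = [score(s) for s in split_sentences(comment)] or [0.0]
    overall = min(scores) if aggregate == "min" else mean(scores)
    return {"scores": scores, "overall": overall, "flagged": overall < threshold}


if __name__ == "__main__":
    text = "We honor his service. He was a disgrace."
    print(review_comment(text, aggregate="min"))
    print(review_comment(text, aggregate="mean"))
```

With `aggregate="min"` the mixed comment in the example is flagged because of its single negative sentence, while `aggregate="mean"` lets it pass. Manual flags could be incorporated by logging reviewed comments and using them to adjust `threshold` or retrain the scorer.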
Your POC implementation just needs to demonstrate the basic concepts. We might build on it later or start from scratch, but the document must contain enough detail for us to implement the final solution later on.
Final Submission Guidelines
- Submit a document explaining the approach
- Submit the source code for the POC