Challenge Overview
Challenge Objectives
- Implement the de-duplication process.
Project Background
Currently, the customer team is spending a lot of time resolving work items, which are created from each email received from consumers (end customers). To serve consumers more quickly and better, the customer team wants to optimize work item creation by eliminating the categorization of the emails.
Technology Stack
- Node 8+
- MongoDB
Code access
The latest codebase will be provided in the forum.
Individual requirements
1. Similar to the identification logic, TIBCO communicates with the Email App
{
  "version": "1.0",
  "source_system": "TIBCO",
  "target_system": "E-Mail Triage Dedupe Detection App",
  "res_trigger_by": "ABC1234",
  "res_trigger_dt": "2018-08-17,20:45:23",
  "email_box_content": [
    {
      "email_box_identifer": "EMAIL_IDENTIFIER_01",
      "email_content": [
        {
          "work_item_id": "1766jhgf900",
          "category": "RFP",
          "from": "Pam Palmisano <pam@xxxxx.com>",
          "sent": "Friday, August 17, 2018 1:14 PM",
          "to": "Pam Palmisano <yyy@yyyy.com>",
          "subject": "Developmental Services of Iowa, Inc. Life and DI RFP",
          "attachments": ["725332", "344555", "334444"],
          "body": "Example Body"
        }
      ]
    }
  ]
}
2. The Email app processes the request: it separates the body and the attachment names, and cleans the body text by removing unwanted code and signatures.
3. The app formats the body text by removing unnecessary noise and prepares it for Watson
Just as in the identification logic, we need to parse the body. We also need one more parameter alongside the body, “attachment”, which will contain the names of the attachments in semicolon (;) separated format. From the body we also get the work item ID and the work item status.
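The preparation described above can be sketched as follows. This is a minimal illustration only: the helper name prepareWatsonInput and its output shape are assumptions, while the input field names (work_item_id, attachments, body) are taken from the sample request in step 1. The real cleanup of code and signatures is expected to reuse the identification logic and is reduced here to whitespace normalization.

```javascript
'use strict';

// Hypothetical helper (not part of the existing codebase): turns one entry of
// "email_content" into the values needed for the Watson call.
function prepareWatsonInput(emailContent) {
  const attachments = Array.isArray(emailContent.attachments)
    ? emailContent.attachments
    : [];
  return {
    workItemId: emailContent.work_item_id,
    // Placeholder cleanup: collapse runs of whitespace and trim the ends.
    body: String(emailContent.body || '').replace(/\s+/g, ' ').trim(),
    // Attachment names joined in semicolon-separated format.
    attachment: attachments.join(';')
  };
}

const input = prepareWatsonInput({
  work_item_id: '1766jhgf900',
  attachments: ['725332', '344555', '334444'],
  body: '  Example   Body\n'
});
console.log(input.attachment); // 725332;344555;334444
```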
4. Invoke Watson and receive plan holder
Here is the sample code
// Watson Natural Language Understanding client from the watson-developer-cloud SDK.
var NaturalLanguageUnderstandingV1 = require('watson-developer-cloud/natural-language-understanding/v1.js');

var natural_language_understanding = new NaturalLanguageUnderstandingV1({
  'username': '58bc4205-0fa5-4e1e',
  'password': '6G1TCbx5iF1G',
  'version': '2018-03-16'
});

// Text to analyze; the custom model ID is left commented out, as in the original sample.
var parameters = {
  'text': 'FW: secure: Stylex Inc - eff 8/1/18 - DUE June 8',
  'features': {
    'entities': {
      //'model': '21d63af2-0616-4e6a-9f30-eee0dc624f38'
    }
  }
};

natural_language_understanding.analyze(parameters, function(err, response) {
  if (err) {
    console.log('error:', err);
  } else {
    console.log(JSON.stringify(response, null, 2));
  }
});
Or refer to this URL: https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/?node#relations
Please note: Here we need to make two requests, one after another. First we call with the above model (21d63af2-0616-4e6a-9f30-eee0dc624f38). If no plan holder is found, we request again without the model (empty entities). If the second request also finds no plan holder, we can raise an exception and skip the process for that request.
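The two-request fallback above could look roughly like the sketch below. The function names findPlanHolder, pickPlanHolder and promisifyAnalyze are hypothetical, and treating the first returned entity as the plan holder is an assumption; the actual entity type to look for should follow the identification logic.

```javascript
'use strict';

const CUSTOM_MODEL_ID = '21d63af2-0616-4e6a-9f30-eee0dc624f38';

// Assumption: the plan holder is the text of the first entity in the response.
function pickPlanHolder(response) {
  const entities = (response && response.entities) || [];
  return entities.length > 0 ? entities[0].text : null;
}

// Wraps the callback-style analyze() shown in the sample into a Promise.
function promisifyAnalyze(nlu) {
  return (parameters) => new Promise((resolve, reject) => {
    nlu.analyze(parameters, (err, response) => (err ? reject(err) : resolve(response)));
  });
}

// analyze(parameters) must return a Promise of the NLU response.
// First request uses the custom model, second request uses empty entities.
async function findPlanHolder(analyze, text) {
  const attempts = [
    { text: text, features: { entities: { model: CUSTOM_MODEL_ID } } },
    { text: text, features: { entities: {} } }
  ];
  for (const parameters of attempts) {
    const planHolder = pickPlanHolder(await analyze(parameters));
    if (planHolder) {
      return planHolder;
    }
  }
  // Neither request produced a plan holder: the caller should skip this request.
  throw new Error('No plan holder found');
}
```

With the client from the sample code, the call would be findPlanHolder(promisifyAnalyze(natural_language_understanding), bodyText). Passing analyze in as a parameter keeps the fallback logic testable without calling Watson.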
5. Invoke SFDC and get the opportunities for the same plan holder for the last 6 months; each result includes the Opportunity ID
The request format is exactly the same as in the identification logic (select query); the response format is available in the doc.
GET API Request (sample):
https://glic--UAT.cs47.my.salesforce.com/services/data/v43.0/query/?q=SELECT+Id,+Broker_Firm_Name__c,+CreatedDate+from+Opportunity+where+Planholder_Name__r.Name+=+'XXXXXXXXXX'+AND+CreatedDate+=+LAST_N_MONTHS:6
You should replace the XXXXXXXXXX with the plan holder name retrieved in step 4.
Note that similar to the "Feedback Logic" challenge, we will receive the SFDC token and instance_url from the TokenCallback API, and then use instance_url to construct the final SFDC API URL.
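A minimal sketch of how the query URL might be assembled from instance_url and the plan holder name. The helper names buildOpportunityQueryUrl and escapeSoql are hypothetical; the SOQL text and the v43.0 path are taken from the sample request above. Escaping the name is an added precaution, since plan holder names can contain apostrophes.

```javascript
'use strict';

// Escape backslashes and single quotes so the name is a valid SOQL string literal.
function escapeSoql(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// instanceUrl is the instance_url returned by the TokenCallback API.
function buildOpportunityQueryUrl(instanceUrl, planHolder) {
  const soql =
    'SELECT Id, Broker_Firm_Name__c, CreatedDate FROM Opportunity ' +
    "WHERE Planholder_Name__r.Name = '" + escapeSoql(planHolder) + "' " +
    'AND CreatedDate = LAST_N_MONTHS:6';
  // Drop any trailing slash on the instance URL, then URL-encode the whole query.
  return instanceUrl.replace(/\/+$/, '') +
    '/services/data/v43.0/query/?q=' + encodeURIComponent(soql);
}

console.log(buildOpportunityQueryUrl('https://glic--UAT.cs47.my.salesforce.com', 'Stylex Inc'));
```

The request itself is then a GET on this URL with the SFDC token in the Authorization header, as in the identification logic.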
6. API response from SFDC returns all opportunities from the last 6 months
7. Check if the opportunity count is > 0
8. If not, send an API request to SFDC stating that the item is not a duplicate (status: In Queue)
9. If yes, get the broker details
To do this, first separate each email into multiple message boxes (similar to the identification logic) and process them one after another from the top. For each message box (a single email may contain 100 message boxes), first separate the signature and the from field. The main configuration keeps a few items such as domain names and firm names; if the from field or the signature matches a configured domain or firm name, we must skip that message and go to the next one.
From the signature of the next message we can pick up the following four items:
- Name
- Firm Name
- Email Address
- Website Address
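The skip rule for configured domains and firm names can be sketched as below. The message shape ({ from, signature }), the config keys and both function names are assumptions made for illustration; splitting the email into message boxes and parsing the signature are expected to come from the identification logic.

```javascript
'use strict';

// Hypothetical configuration: our own domains and firm names, whose messages are skipped.
const skipConfig = {
  domains: ['internal.example.com'],
  firmNames: ['Internal Firm']
};

// True when the from field or the signature mentions a configured domain or firm name.
function isConfiguredSender(message, config) {
  const haystack = ((message.from || '') + '\n' + (message.signature || '')).toLowerCase();
  return config.domains
    .concat(config.firmNames)
    .some((needle) => haystack.includes(needle.toLowerCase()));
}

// Walks the message boxes from the top and returns the first one that is not skipped,
// or null when every message matches the configuration.
function firstBrokerMessage(messages, config) {
  return messages.find((message) => !isConfiguredSender(message, config)) || null;
}
```

The name, firm name, email address and website address would then be read from the signature of the message this returns.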
10. Compare the broker details of all opportunities with the present one
Watson returns only one result for the presently invoked email; you need to match it against all opportunities (there may be 100). If even one broker matches, we then need to match the attachment content from SIMON. We are in discussion with them, as we need sample input/output content; for the architecture you can suggest a format and continue. This area may need to be updated later.
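One possible shape for the broker comparison is sketched below. The matching rule (firm names equal after trimming and lowercasing) and the function names are assumptions, since the exact rule is not specified here; Broker_Firm_Name__c is the field selected in the SFDC query in step 5.

```javascript
'use strict';

function normalize(value) {
  return String(value || '').trim().toLowerCase();
}

// Returns every opportunity record whose broker firm matches the present broker.
// Assumption: a match means equal firm names after normalization.
function matchingOpportunities(broker, records) {
  const firm = normalize(broker.firmName);
  if (firm === '') {
    return [];
  }
  return records.filter((record) => normalize(record.Broker_Firm_Name__c) === firm);
}

const matches = matchingOpportunities(
  { firmName: ' Acme Brokers ' },
  [
    { Id: '1000002', Broker_Firm_Name__c: 'ACME BROKERS' },
    { Id: '1000003', Broker_Firm_Name__c: 'Other Firm' }
  ]
);
console.log(matches.map((record) => record.Id)); // [ '1000002' ]
```

An empty result corresponds to the "broker is not the same" branch; a non-empty result gives the opportunity IDs to use for document retrieval.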
11. If the broker is not the same, then
12. Update SFDC with the status not duplicate (status: In Queue)
13. If the broker is the same, then retrieve the documents
We need to get the document ids associated with the opportunity id from the SFDC response.
Here is the request format:
{ "Opportunity_ID": [ "1000002", "1000003", "1000004" ], "Workitem_ID": [ ] }
And here is the updated response format (here 1000002 or 1000003 is an example of an opportunity ID, and 345623, 876543 or 876549 is an example of a document ID):
{ "Opportunity_ID": [ { "1000002": [ "345623", "876549", "876543" ] }, { "1000003": [ "678345" ] }, { "1000004": [ "678954", "345612", "908765" ] } ], "Workitem_ID": [ { "7000002": [ "145623", "576549", "676543" ] }, { "7000003": [ "578345" ] } ] }
Please note: As we presently don't have an endpoint URL, create a method which accepts this input and replies with the mentioned output, so that once we have the endpoint we can replace this method with the live endpoint information.
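A minimal sketch of such a placeholder method. The name getDocumentIds and the fixture table are hypothetical; the fixture values are the example IDs from the response format. It returns a Promise so that swapping in a real HTTP call later should not change the callers. Returning an empty list for an unknown ID is an assumption.

```javascript
'use strict';

// Hypothetical fixture standing in for the real endpoint's data.
const DOCUMENT_IDS = {
  '1000002': ['345623', '876549', '876543'],
  '1000003': ['678345'],
  '1000004': ['678954', '345612', '908765'],
  '7000002': ['145623', '576549', '676543'],
  '7000003': ['578345']
};

// Maps each requested ID to an object of the form { "<id>": [document IDs] }.
function lookup(ids) {
  return (ids || []).map((id) => ({ [id]: DOCUMENT_IDS[id] || [] }));
}

// Stub for the document-ID endpoint; replace the body with the live call later.
async function getDocumentIds(request) {
  return {
    Opportunity_ID: lookup(request.Opportunity_ID),
    Workitem_ID: lookup(request.Workitem_ID)
  };
}
```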
14. Initiate the API request for document retrieval from SIMON
{ "Opportunity_ID": "1000003", "Document_ID": "567899" }
15. API response to get document back from SIMON
This will be the raw file data in binary format.
16. Compare documents
We retrieve the attachments of the present email from SIMON and, if the other information matches, the attachments of the matched email, so that we can compare their content. Basically we need to match files: if the files on both sides are the same, the email is a duplicate; if even one file is different, it is not a duplicate.
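The file comparison could be sketched with content hashes, as below. This assumes each document arrives as a Node Buffer (the raw binary data from step 15) and that file names and order do not matter; both points are assumptions, as is the function name sameDocuments. Comparing two empty sets returns true here, so the caller should decide separately how to treat emails without attachments.

```javascript
'use strict';

const crypto = require('crypto');

// SHA-256 digest of one file's binary content.
function digest(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// True only when both sides contain the same files, regardless of order.
// If even one file differs, or the counts differ, the sets do not match.
function sameDocuments(currentFiles, candidateFiles) {
  if (currentFiles.length !== candidateFiles.length) {
    return false;
  }
  const current = currentFiles.map(digest).sort();
  const candidate = candidateFiles.map(digest).sort();
  return current.every((hash, index) => hash === candidate[index]);
}

console.log(sameDocuments(
  [Buffer.from('file A'), Buffer.from('file B')],
  [Buffer.from('file B'), Buffer.from('file A')]
)); // true
```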
17. If the documents are not the same
Skip the process, as email is not duplicate
18. Item identified as duplicate
Update SFDC with duplicate status
PATCH API Request (sample):
URI: https://glic--UAT.cs47.my.salesforce.com/services/data/v43.0/sobjects/OpportunityWorkItem__c/<Opportunity Workitem Id>
(e.g. https://glic--UAT.cs47.my.salesforce.com/services/data/v43.0/sobjects/OpportunityWorkItem__c/a0V2a000000bzCZEAY)
Headers: Authorization: OAuth <access token>
Content-Type: application/json
{ "Dedupe_Process_Completed__c": true, "Duplicate_RFP__c": "Yes" }
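A small sketch that assembles the PATCH request above from the instance_url, the access token and the work item ID. The function name buildDuplicatePatch and the returned object shape are assumptions; the path, headers and body are taken from the sample. Sending the request is left to whichever HTTP client the codebase already uses.

```javascript
'use strict';

// Builds the pieces of the "mark as duplicate" PATCH request without sending it.
function buildDuplicatePatch(instanceUrl, accessToken, workItemId) {
  return {
    method: 'PATCH',
    url: instanceUrl.replace(/\/+$/, '') +
      '/services/data/v43.0/sobjects/OpportunityWorkItem__c/' +
      encodeURIComponent(workItemId),
    headers: {
      'Authorization': 'OAuth ' + accessToken,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      Dedupe_Process_Completed__c: true,
      Duplicate_RFP__c: 'Yes'
    })
  };
}

const patch = buildDuplicatePatch(
  'https://glic--UAT.cs47.my.salesforce.com',
  '<access token>',
  'a0V2a000000bzCZEAY'
);
console.log(patch.url);
```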
So if everything goes well, we send a 200 status code with no response body to TIBCO. If any error occurs, we send the error message back to TIBCO with the proper status code.
Deployment guide and validation document
Please update the existing deployment and verification guide.
Final Submission Guidelines
Updated source code (with updated Postman and Swagger files)
Updated deployment guide and verification guide
Working IBM Cloud deployment