Challenge Overview
Watson Pattern Explorer Node RED Flow Contest Specification
1. Project Overview
1.1 System Description
The goal is to produce a proof-of-concept pattern explorer application using Node RED. The application should take in data from a Watson Engagement Advisor instance, run it through a series of processing steps, and provide access to the processed data.
1.2 Competition Task Overview
1.2.1 NLC node
Develop a custom node that handles communication with an instance of the Natural Language Classifier (NLC) service.
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/nl-classifier/
The node should be able to train the classifier using the provided data (CSV file), query the trained classifier, delete classifiers, and list all classifiers. Use the NLC REST API to implement these functions - http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/apis/#!/natural-language-classifier
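The NLC training endpoint takes its examples as CSV, so the custom node will need to convert the incoming question set into that shape. A minimal sketch of that conversion, assuming the dataset format from section 1.2.2 and standard CSV escaping (fields containing commas or quotes wrapped in double quotes):

```javascript
// Build the "text,class" CSV payload the NLC training endpoint expects,
// one row per example. Field names (text, class) follow the input
// dataset format defined in section 1.2.2.
function toTrainingCsv(questions) {
    return questions.map(function (q) {
        var text = /[",\n]/.test(q.text)
            ? '"' + q.text.replace(/"/g, '""') + '"'
            : q.text;
        return text + ',' + q.class;
    }).join('\n');
}
```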
1.2.2 Custom flow
Develop a custom flow of a simple data processing pipeline. It should consist of the following:
- Exposed HTTP endpoint for triggering the flow
- Parsing input parameters
- Training the language classifier
- Testing the classifier
- Persisting the processing results
- Sending processing status updates
The endpoint should require the following parameters:
- jobID - the ID used to identify this particular processing job
- dataset - the data to use for training and testing the NLC node
- email address - where to send results and status updates
Save the jobID and email address in the global context so they will be available later for persisting the results and logging.
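The context-saving step can live in a function node right after the input parsing. A sketch of the logic, with the global context passed in as a parameter only so it can run stand-alone (inside an actual function node it is the built-in global context object, accessed via `global.set()` or `context.global` depending on the Node RED version):

```javascript
// Validate the required request parameters and stash jobID and email
// in the global context for the later persistence and logging steps.
function storeJobContext(msg, globalContext) {
    if (!msg.payload.jobID || !msg.payload.email) {
        throw new Error('jobID and email address are required parameters');
    }
    globalContext.set('jobID', msg.payload.jobID);
    globalContext.set('email', msg.payload.email);
    return msg; // pass the message on to the next node unchanged
}
```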
Split the dataset into train and test questions. Use the train questions to train a language classifier instance. A function node can be used for splitting the dataset.
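One possible function-node body for the split is sketched below. The 80/20 ratio is an assumption (the spec does not fix one); a deterministic "every Nth question goes to the test set" rule keeps runs reproducible:

```javascript
// Split the incoming questions into train and test sets.
// With testEvery = 5, every fifth question becomes a test question,
// giving an 80/20 train/test split.
function splitDataset(questions, testEvery) {
    var train = [], test = [];
    questions.forEach(function (q, i) {
        if ((i + 1) % testEvery === 0) {
            test.push(q);
        } else {
            train.push(q);
        }
    });
    return { train: train, test: test };
}
```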
Poll in a loop until the training is complete. Once it is, test the classifier with the test questions and report the test accuracy as the job result. Include the list of misclassified questions in the results.
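Once the classifier has answered all test questions, the scoring itself is straightforward. A sketch, assuming the predicted class for each test question has already been collected (e.g. from the NLC node's classify output) into an array parallel to the test set:

```javascript
// Score the classifier: compare each predicted class against the
// question's true class, returning overall accuracy and the list of
// misclassified questions for the job result.
function scoreClassifier(testQuestions, predictions) {
    var misclassified = [];
    testQuestions.forEach(function (q, i) {
        if (predictions[i] !== q.class) {
            misclassified.push({
                text: q.text,
                expected: q.class,
                predicted: predictions[i]
            });
        }
    });
    return {
        accuracy: (testQuestions.length - misclassified.length) / testQuestions.length,
        misclassified: misclassified
    };
}
```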
The input dataset will be in the following format:
{
  "questions": [
    {
      "text": String,
      "class": String
    }
  ]
}
1.2.3 Deliver the processing results
Develop a custom flow to deliver the results of the processing. Build this functionality as a subflow so it can be reused to persist results after any of the processing steps. The results should be delivered as an email message. To send the emails, use the SendGrid service available in Bluemix.
Additionally, persist the processing results in a Cloudant database. Create a new document type for persistence (jobResult or similar), and use the jobID from the global context to link it to the job request. This step should be optional, i.e. the user should be able to deliver results by email only and skip the database.
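A sketch of the function node that prepares the Cloudant document for the cf-cloudant "out" node (which stores msg.payload as the document). The jobResult field names here are assumptions; only the jobID link back to the request is required by the spec:

```javascript
// Build the jobResult document to persist in Cloudant. The jobID is
// read from the global context by the caller so the document can be
// linked back to the originating job request.
function buildJobResult(results, jobID) {
    return {
        type: 'jobResult',            // tag so results can be queried by type
        jobID: jobID,                 // links the document to the job request
        accuracy: results.accuracy,
        misclassified: results.misclassified,
        createdAt: new Date().toISOString()
    };
}
```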
The email with the final results should contain the following:
- Overall accuracy
- Classification matrix - truth classes on columns and predicted classes on rows, with counts in the cells
- List of the first 20 misclassified questions
- List of the 3 most and 3 least accurate question classes
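The matrix and the per-class ranking can be built in one pass over the test results. A sketch, assuming the test run has been collected into a list of { truth, predicted } pairs:

```javascript
// Build the classification matrix (truth on columns, prediction on
// rows) and rank classes by per-class accuracy to pick the most and
// least accurate ones for the results email.
function classificationMatrix(pairs) {
    var matrix = {};       // matrix[predicted][truth] -> count
    var perClass = {};     // perClass[truth] -> { correct, total }
    pairs.forEach(function (p) {
        matrix[p.predicted] = matrix[p.predicted] || {};
        matrix[p.predicted][p.truth] = (matrix[p.predicted][p.truth] || 0) + 1;
        perClass[p.truth] = perClass[p.truth] || { correct: 0, total: 0 };
        perClass[p.truth].total += 1;
        if (p.predicted === p.truth) perClass[p.truth].correct += 1;
    });
    var ranked = Object.keys(perClass).sort(function (a, b) {
        return perClass[b].correct / perClass[b].total -
               perClass[a].correct / perClass[a].total;
    });
    return {
        matrix: matrix,
        mostAccurate: ranked.slice(0, 3),
        leastAccurate: ranked.slice(-3)
    };
}
```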
1.2.4 Send processing status updates
Add a status node to the flow to catch node status updates, and send them as notifications to the email address stored in the global context.
1.2.5 Error handling
Create a catch node in the flow to handle possible errors. Update the processing status to "Failed" if the error is fatal (i.e. it breaks the flow); otherwise just log the error.
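A sketch of the handler wired after the catch node. Node-RED's catch node delivers the error details on msg.error; the `fatal` flag and the status strings below are assumptions to illustrate the fatal / non-fatal branch:

```javascript
// Decide how to react to a caught error: fatal errors flip the job
// status to "Failed", non-fatal ones are only logged.
function handleError(msg) {
    if (msg.error && msg.error.fatal) {
        return { status: 'Failed', message: msg.error.message };
    }
    return { status: 'Running', logged: msg.error ? msg.error.message : '' };
}
```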
1.4 Verification
For verification, trigger the flow and verify that it produces the required email messages and database changes. Make sure to verify the failure scenario as well. You can use the sample NLC weather data for testing the classifier:
https://github.com/watson-developer-cloud/natural-language-classifier-nodejs/blob/master/training/weather_data_train.csv
1.5 Technology overview
- Node.js https://nodejs.org
- Node RED http://nodered.org
- Cloudant https://cloudant.com
- cf-cloudant http://flows.nodered.org/node/node-red-node-cf-cloudant
Final Submission Guidelines
Submit a zip file with all the deliverables.