Challenge Overview

Background

According to Wikipedia, "Mud logging is the creation of a detailed record (well log) of a borehole by examining the cuttings of rock brought to the surface by the circulating drilling medium (most commonly drilling mud)." Quartz Energy has provided Topcoder with a set of mud logs, and we’re developing an application to extract structured meaning from these records. The documents are very interesting - they are even oil-well shaped! You can read more details about them here. If oil is revealed in a well hole sample, a "Show" may be recorded in the logs. This is one of the most important pieces of information in the mud logs. Our first step in gathering information from these files will be to find the relevant mud logging terms within their text.
 
In previous challenges such as this one, the Topcoder community developed a command-line Java application that extracts a set of phrases from a mud log image file using Optical Character Recognition (OCR) technology such as Tesseract and Google Vision. These images are typically in TIFF format. In our most recent challenge we implemented the following requirements to operationalize this application:
  1. The application is deployed to the Azure Cloud.
  2. We moved the extract command of the command-line application into an Azure (cloud) function.  Conveniently, Azure Functions support Java.
  3. The application writes the extracted image and phrase data to a Cosmos NoSQL database.
  4. Mud log images and supporting metadata files (SIF or LIC files) are loaded into Azure’s Blob Storage.
  5. Each Azure function execution processes one image.  Azure Functions should be triggered by both image and metadata files being loaded to Azure Blob Storage (a minimal trigger sketch follows this list).
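
By way of illustration, here is a minimal sketch of that trigger wiring, assuming the azure-functions-java-library annotations; the container name ("images") and connection setting ("AzureWebJobsStorage") are placeholders to be matched to your own configuration:

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.BindingName;
import com.microsoft.azure.functions.annotation.BlobTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class ExtractFunction {

    // Fires once per blob landing in the "images" container; the function
    // receives the blob bytes plus the blob name.
    @FunctionName("extractPhrases")
    public void run(
            @BlobTrigger(name = "content",
                         path = "images/{name}",
                         dataType = "binary",
                         connection = "AzureWebJobsStorage") byte[] content,
            @BindingName("name") String name,
            final ExecutionContext context) {
        context.getLogger().info("Processing blob: " + name + " (" + content.length + " bytes)");
        // TODO: run the OCR extraction (Tesseract / Google Vision) on the image,
        // pair it with its SIF/LIC metadata, and write phrase records to Cosmos DB.
    }
}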
 
It’s probably obvious from the requirements above, but the previous challenge laid the groundwork for the first steps of a serverless workflow on the Azure Cloud platform:
  1. Images and SIF or LIC files are loaded to a designated Blob Storage location.
  2. This triggers an Azure function execution to process the images.
  3. Data is extracted and written to Cosmos DB (see the sketch after this list).
  4. Further events then fire to execute the "remove outliers" and "generate marked images" functions.
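
For step 3, one way to persist the extracted phrases is with the azure-cosmos SDK (a Cosmos DB output binding would work equally well). A minimal sketch follows; the record shape, database name, and container name are illustrative placeholders, since the real schema comes from the previous challenge's code:

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;

public class PhraseStore {

    // Hypothetical shape for one extracted phrase record.
    public static class PhraseRecord {
        public String id;          // Cosmos DB items require an "id" field
        public String imageName;
        public String phrase;
        public int x, y;           // phrase position on the source image
    }

    private final CosmosContainer container;

    public PhraseStore() {
        // Endpoint, key, database, and container names are placeholders.
        CosmosClient client = new CosmosClientBuilder()
                .endpoint(System.getenv("COSMOS_ENDPOINT"))
                .key(System.getenv("COSMOS_KEY"))
                .buildClient();
        this.container = client.getDatabase("mudlogdb").getContainer("phrases");
    }

    public void save(PhraseRecord record) {
        container.createItem(record);
    }
}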
 

Requirements

In the current challenge, we’d like to focus on the following requirements:
  1. Removing Outliers - After some analysis, it appears we don’t actually need a separate Azure function to remove outliers from the extraction process.  We simply need to identify the “logging sections” of the images from the information provided by the SIF or LIC files before we extract the phrases themselves.  In other words, we’re only going to insert phrase records into our data store if the phrases fall within a logging section (a filtering sketch follows this list).
  2. Annotating Images - After phrase extraction, we should fire a processing-complete event using Azure’s Event Grid functionality and create a second Azure function which marks the images as the “generate marked images” command did in our original command-line application.  (Code can be found in the Code Document forum.)  The annotated images should be stored in a new marked-images Azure Blob Storage container (an Event Grid sketch follows this list).
  3. Error Handling - If there are errors in the annotation process, the app should copy the images to an error container (shown in the annotation sketch below).
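
For requirement 1, the outlier filter reduces to a bounds check against the logging sections. A sketch with hypothetical Phrase and LoggingSection types; in practice the section bounds come from the SIF/LIC metadata and the phrase positions from the OCR output:

import java.util.List;
import java.util.stream.Collectors;

public class OutlierFilter {

    // Hypothetical type: a logging section's vertical extent on the image,
    // derived from the SIF or LIC metadata.
    public static class LoggingSection {
        public final int top, bottom;
        public LoggingSection(int top, int bottom) { this.top = top; this.bottom = bottom; }
        boolean contains(Phrase p) { return p.y >= top && p.y <= bottom; }
    }

    // Hypothetical type: an extracted phrase and its position on the image.
    public static class Phrase {
        public final String text;
        public final int y;
        public Phrase(String text, int y) { this.text = text; this.y = y; }
    }

    // Keep only phrases that fall inside at least one logging section; these
    // are the only records that should be inserted into Cosmos DB.
    public static List<Phrase> withinSections(List<Phrase> phrases,
                                              List<LoggingSection> sections) {
        return phrases.stream()
                .filter(p -> sections.stream().anyMatch(s -> s.contains(p)))
                .collect(Collectors.toList());
    }
}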
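
For requirements 2 and 3, the extraction function can publish a processing-complete event to a custom Event Grid topic once it finishes an image. A sketch using the com.azure:azure-messaging-eventgrid SDK; the topic endpoint/key settings and the event type name are assumptions:

import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.BinaryData;
import com.azure.messaging.eventgrid.EventGridEvent;
import com.azure.messaging.eventgrid.EventGridPublisherClient;
import com.azure.messaging.eventgrid.EventGridPublisherClientBuilder;

public class ProcessingCompletePublisher {

    // Publishes a processing-complete event for one image to a custom topic.
    public static void publish(String imageName) {
        EventGridPublisherClient<EventGridEvent> client =
                new EventGridPublisherClientBuilder()
                        .endpoint(System.getenv("EVENTGRID_TOPIC_ENDPOINT"))
                        .credential(new AzureKeyCredential(System.getenv("EVENTGRID_TOPIC_KEY")))
                        .buildEventGridEventPublisherClient();

        client.sendEvent(new EventGridEvent(
                "mudlog/" + imageName,            // subject
                "MudLog.ProcessingComplete",      // event type (an assumption)
                BinaryData.fromObject(imageName), // payload
                "1.0"));                          // data version
    }
}

The second function then subscribes to that topic. A sketch of the trigger and the error-container fallback, where containerFor and annotate are hypothetical helpers and all container names are placeholders:

import com.azure.core.util.BinaryData;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.EventGridTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class MarkImagesFunction {

    @FunctionName("generateMarkedImage")
    public void run(@EventGridTrigger(name = "event") String event,
                    final ExecutionContext context) {
        // In practice, parse the image name out of the event's JSON payload.
        String imageName = event;
        BlobContainerClient images = containerFor("images");
        try {
            byte[] marked = annotate(images, imageName);
            containerFor("marked-images").getBlobClient(imageName)
                    .upload(BinaryData.fromBytes(marked), true);
        } catch (Exception e) {
            // Requirement 3: on annotation failure, copy the original image
            // to the error container.
            context.getLogger().warning("Annotation failed for " + imageName + ": " + e);
            String sourceUrl = images.getBlobClient(imageName).getBlobUrl();
            containerFor("error-images").getBlobClient(imageName).beginCopy(sourceUrl, null);
        }
    }

    private static BlobContainerClient containerFor(String name) {
        return new BlobServiceClientBuilder()
                .connectionString(System.getenv("AzureWebJobsStorage"))
                .buildClient()
                .getBlobContainerClient(name);
    }

    // Placeholder for the annotation logic ported from the command-line app's
    // "generate marked images" command.
    private static byte[] annotate(BlobContainerClient images, String imageName) {
        byte[] original = images.getBlobClient(imageName).downloadContent().toBytes();
        // TODO: draw the phrase markings as the original command did.
        return original;
    }
}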
 
Other Requirements
  1. Please carefully document your solution.  Screenshots of the Azure console configuration are essential. 
  2. Please record a screen share video of both your Azure configuration and the execution of the app. 
  3. Use of the Google Vision API and Azure Cloud is mandatory.  This is already implemented in the current codebase.
  4. Code from the previous challenges can be found in the Code Document forums along with sample data.


Final Submission Guidelines

Project Deliverables

  1. Please provide your Azure Function Java code and database schema creation scripts to Topcoder in zipped format. 
  2. Set up the complete workflow outlined above in a personal Azure Account.  This includes Blob Storage and the Cosmos NoSQL database with the working schema.
  3. Carefully document your configuration/solution either in the Readme.md or in a User Guide.doc.  Screenshots of the Azure console configuration are essential so we can duplicate your work.
  4. Also provide a script to automate this setup so that not every step has to be done manually, which is time-consuming and error-prone.
  5. Please record a screen share video of both your Azure configuration and the execution of the app. 
  6. Use of the Google Vision API is mandatory.  This is already implemented in the current codebase.
  7. Java code and Maven scripts from the previous challenges can be found in the Code Document forums along with sample data.  The current code does a significant amount of image processing to aid in character retrieval.  This processing should be ported to the new solution.

ELIGIBLE EVENTS: Topcoder Open 2019

Review style: Final Review
Community Review Board
Approval: User Sign-Off

ID: 30096381