Mud Log Image Tagging Challenge - Multi-line Adjustments


Challenge Overview

According to Wikipedia, “Mud logging is the creation of a detailed record (well log) of a borehole by examining the cuttings of rock brought to the surface by the circulating drilling medium (most commonly drilling mud).”  The documents are very interesting -- they are even oil-well shaped!  You can read more details about them here.  In a previous challenge, we annotated these images using the Topcoder community, but now our clients would like to expand the functionality of the tool, so we need to adjust our ground truth data to meet the new requirements.  We have a set of 390 mud log images that we need to annotate in order to validate the efficacy of our OCR extraction processes.  The majority of the annotations have already been completed, but we’ll need to make adjustments as indicated below:

Here are the steps to begin tagging:

1. Download the Mud Log Image Tagging application from the Code Document Forum.  It is a Python desktop application.

2. Follow the README.md instructions to deploy the tagging application locally.

3.  Our application/code has been configured to connect to our centralized database on AWS.  You can find those settings in the settings.py file.
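
For orientation, the database block of settings.py will look something like the sketch below.  Every key and value here is a placeholder; use the actual settings that ship with the application.

```python
# Illustrative sketch of the database settings -- the real keys and values
# ship with the application in settings.py; everything below is a placeholder.
DATABASE = {
    "engine": "postgresql",  # assumed engine; check settings.py for the real one
    "host": "<rds-endpoint>.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    "port": 5432,
    "name": "mudlog_annotations",  # placeholder database name
    "user": "<your-db-user>",      # placeholder credentials
    "password": "<see-settings>",
}
```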

4.  Point your app to the images by clicking the AWS logo in the toolbar and entering the following URL: https://s3.amazonaws.com/mud-log-images/.  The images are in a public S3 bucket.
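
If you want to sanity-check the image list from a script, a minimal sketch is below.  It assumes the bucket permits anonymous listing; if only downloads are allowed, fetch images by their known keys instead.

```python
import requests
import xml.etree.ElementTree as ET

# Minimal sketch: list object keys in the public bucket over HTTPS.
# Assumes anonymous ListBucket access is enabled; the ListObjectsV2 API
# returns at most 1,000 keys per request, which covers the ~390 images here.
BUCKET_URL = "https://s3.amazonaws.com/mud-log-images/"
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

resp = requests.get(BUCKET_URL, params={"list-type": "2"}, timeout=30)
resp.raise_for_status()
root = ET.fromstring(resp.content)
keys = [el.text for el in root.iter(S3_NS + "Key")]
print(f"Found {len(keys)} objects; first few: {keys[:3]}")
```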

5. Here is a list of images to adjust and annotate.  Please claim no more than 20 images at a time by putting your Topcoder handle into the “Assigned Member” column of the Mud Log document.  After you’ve completed marking those 20 images and submitted your results, you may proceed to the next 20.

6. Review the image that you are planning to mark.  We are looking for certain phrases which indicate that hydrocarbons might be present in a drilling operation.  There are four types of phrases that we’re looking for: Show, Stain, Trace, and Negative.  Here is a document which outlines the exact phrases to find.  Some documents have hundreds of phrases; others have none.
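
To make the four categories concrete, here is a small illustrative classifier.  It includes only the two phrases mentioned elsewhere in this brief (and assumes “No Stn” is a Negative type); the authoritative phrase list is in the document linked above.

```python
from enum import Enum
from typing import Optional

class PhraseType(Enum):
    SHOW = "Show"
    STAIN = "Stain"
    TRACE = "Trace"
    NEGATIVE = "Negative"

# Illustrative subset only: "OIL AND GAS CUT" is identified as a Stain type
# later in this brief; "No Stn" is assumed Negative here.  The full list is
# in the linked phrase document.
EXAMPLE_PHRASES = {
    "oil and gas cut": PhraseType.STAIN,
    "no stn": PhraseType.NEGATIVE,
}

def classify(text: str) -> Optional[PhraseType]:
    # Matching is case insensitive ("SHOW" == "show" == "Show"),
    # and extra whitespace is collapsed before lookup.
    return EXAMPLE_PHRASES.get(" ".join(text.lower().split()))
```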

7.  Select the Phrase Marking tool from the toolbar.  It looks like a “crop” tool in an image editor.

Using the tool, put a bounding box around each phrase that you identify.  Double-clicking on the bounding box will bring up the Phrase Editor Dialog.  In the dialog, update the OCR Phrase Type to the appropriate type and enter the OCR Phrase, then click “Apply”.  Your entry will be recorded in the database.
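
Conceptually, each entry you apply amounts to a record like the one sketched below.  The field names are illustrative and not the database’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one annotation record -- the real schema lives in
# the shared database; these field names are illustrative only.
@dataclass
class PhraseAnnotation:
    image_key: str        # which mud log image the box belongs to
    x: int                # bounding-box top-left corner, in image pixels
    y: int
    width: int
    height: int
    phrase_type: str      # "Show", "Stain", "Trace", or "Negative"
    ocr_phrase: str       # the phrase text as you read it
    assigned_member: str  # your Topcoder handle
```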

8. Most of the images that you are reviewing for this challenge already have numerous annotations.  Please add or adjust annotations if phrases are missing or if the boxes are misplaced.  The basic rule of thumb is that you should only mark a word if it is understandable to you; if it is really faded and hard to read, you don’t need to mark it.  Please review the extended phrase list above.  We’ve added some new phrases that would have been excluded in the previous tagging effort.

9.  Another condition that we’re now handling is the multi-line phrase.  We’ve updated our tagging tool to allow the identification of phrases that span more than one line.  This will probably be the single most common reason for adjustment from the set of currently marked images.  Here are the steps required to create a multi-line tag, using an image which contains the multi-line phrase “OIL AND GAS CUT” spread across two lines:

Select the part of the phrase on the first line with the “marking” tool (it looks like a cropping tool in the toolbar).  In this case the phrase is a “Stain” type.


Next, you’ll need to click the “Add” button in the Phrase Editor Dialog.

The Phrase Editor Dialog will close by itself.  Then draw the next box around the remainder of the phrase.

The tool automatically associates the parts of the phrase across multiple lines.
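
One way to picture the association: every box that belongs to the same phrase is grouped under a single phrase record, as in this sketch (a model of the behavior described above, not the tool’s actual internals).

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    x: int       # top-left corner, in image pixels
    y: int
    width: int
    height: int

@dataclass
class MultiLinePhrase:
    phrase_type: str  # e.g. "Stain"
    ocr_phrase: str   # the full phrase, e.g. "OIL AND GAS CUT"
    boxes: list[Box] = field(default_factory=list)  # one box per line

# The "OIL AND GAS CUT" example: two boxes, one phrase.
# Coordinates are made up for illustration.
phrase = MultiLinePhrase("Stain", "OIL AND GAS CUT")
phrase.boxes.append(Box(x=120, y=300, width=95, height=18))  # "OIL AND"
phrase.boxes.append(Box(x=118, y=322, width=88, height=18))  # "GAS CUT"
```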

Additional Information

  • We’ll pay $0.20 (twenty cents) for each phrase adjusted or added.  We’ll hash the data before and after the challenge to determine which phrases have been updated (see the sketch after this list).  There are about 20,000 phrases in this data set, but only a fraction of them will need to be changed.  There is no first prize for this contest -- this challenge will continue until our images have been processed.  You’ll be paid whether you mark one image or 100 images.  Some of the files have hundreds of phrases and some files don’t have any.

  • Please be on alert for phrases that cross lines.  For example, if the phrase “No Stn” occurs across two lines (“No” on the first line and “Stn” on the second line), use the steps above to capture both portions of the phrase.  You may need to delete an annotation box and re-enter it to capture a multi-line phrase correctly.  Previously our marking tool could not handle this scenario.

  • The phrases are case insensitive; either case is fine (e.g., “SHOW” is the same as “show” or “Show”).

  • If you can’t see a phrase clearly because the text is blurry, don’t mark it.

  • If a phrase is covered with grid markings, you’ll have to make a judgment about whether the phrase is legible or not.  If the letters aren’t comprehensible, don’t mark it.

  • Please make the bounding boxes as tight as possible to the words without obscuring them.

  • Please try to mark all the visible phrases on a particular image.  If there are no phrases found in a particular document, please just enter a zero in the phrase count column.

  • Try to access only the images that you are actually marking.  Unfortunately, our marking tool doesn’t have a filter, so this requires some care.
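
For the curious, the before/after comparison mentioned in the payment bullet above can be done along the lines below.  This is only a sketch of the general technique, not the exact scheme we use.

```python
import hashlib

def row_digest(row: dict) -> str:
    """Stable SHA-256 digest of one annotation row (illustrative only)."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_rows(before: dict, after: dict) -> set:
    """Ids of rows that were added or modified between two snapshots,
    where each snapshot maps annotation id -> row dict."""
    return {
        rid for rid, row in after.items()
        if rid not in before or row_digest(before[rid]) != row_digest(row)
    }
```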



Final Submission Guidelines

1.  Update the Mud Log Image List Google Sheet with the images you are planning to mark.

2.  Please update the phrase count column and file quality notes in the Google Sheet for the files you have processed.  This will help us make payments more easily.

3.  Challenge administration will validate that you've highlighted legitimate phrases in the documents.

4.  If you mark any images, submit a blank text file in your submission.zip for this challenge.  This makes payments to your Topcoder account straightforward.

 

ELIGIBLE EVENTS:

Topcoder Open 2019


ID: 30091707