The challenge is finished.

Challenge Overview

According to Wikipedia, “Mud logging is the creation of a detailed record (well log) of a borehole by examining the cuttings of rock brought to the surface by the circulating drilling medium (most commonly drilling mud).”  The documents are very interesting -- they are even oil-well shaped!  You can read more details about them here.  In a previous challenge, the Topcoder community annotated selected phrases from these images.  Our clients would now like to expand the functionality of the tool, so we need to adjust our ground truth data to meet the new requirements.  In this challenge we are not focused on selected phrases; instead, we want to annotate 150 random words from each image.  We have a set of 410 mud log images that we need to annotate in order to validate the efficacy of our OCR extraction processes.  Please scatter the annotations around each image -- some at the beginning, some in the middle, and some at the end of the document.  Each annotation should be one word and one word only.

Here are the steps to begin tagging:

1. Download the Mud Log Image Tagging application from the Code Document Forum.  It is a Python desktop application.

2. Follow the README.md instructions to deploy the tagging application locally.

3.  The application has been configured to connect to our centralized database on AWS.  You can find those settings in the settings.py file.
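As an illustration only, a settings.py for an app like this typically looks something like the sketch below. Every name and value here is a placeholder -- the real settings ship with the application and should not be changed except as the README directs.

```python
# Hypothetical sketch of a settings.py for the tagging app.
# All hosts, names, and credentials below are illustrative placeholders,
# NOT the real configuration shipped with the application.
DATABASE = {
    "host": "example.cluster.us-east-1.rds.amazonaws.com",  # placeholder host
    "port": 5432,
    "name": "mud_log_annotations",
    "user": "annotator",
}

S3_IMAGE_BASE_URL = "https://s3.amazonaws.com/mud-log-images/"

def database_url(settings: dict) -> str:
    """Assemble a connection URL from the settings dict."""
    return (
        f"postgresql://{settings['user']}@{settings['host']}"
        f":{settings['port']}/{settings['name']}"
    )

print(database_url(DATABASE))
```

If the app fails to connect, this file is the first place to check that the database host and S3 URL match what the forum post specifies.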

4.  Point your app to the images by clicking the AWS logo in the toolbar and entering the following URL:  https://s3.amazonaws.com/mud-log-images/.  The images are in a public S3 bucket.
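Because the bucket is public, you can also sanity-check its contents outside the app: an unauthenticated GET of the bucket URL returns a standard S3 ListBucketResult XML document. The sketch below parses such a listing; the sample XML and file names are illustrative, not the bucket's real contents.

```python
import xml.etree.ElementTree as ET

# Illustrative S3 ListBucketResult response. A public bucket such as
# https://s3.amazonaws.com/mud-log-images/ returns XML in this shape;
# the keys shown here are made up for the example.
SAMPLE_LISTING = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>mud-log-images</Name>
  <Contents><Key>log_0001.png</Key></Contents>
  <Contents><Key>log_0002.png</Key></Contents>
</ListBucketResult>"""

def image_keys(listing_xml: str) -> list:
    """Extract the object keys from an S3 ListBucketResult document."""
    ns = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
    root = ET.fromstring(listing_xml)
    return [c.findtext("s3:Key", namespaces=ns)
            for c in root.findall("s3:Contents", ns)]

print(image_keys(SAMPLE_LISTING))  # ['log_0001.png', 'log_0002.png']
```

Note that S3 returns at most 1,000 keys per listing page, so a full listing of a large bucket requires following continuation tokens.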

5. Here is a list of images to adjust and annotate.  Please claim no more than 20 images at a time by putting your Topcoder handle into the “Assigned Member” column of the Mud Log document.  After you’ve completed marking those 20 images and submitted the results to us, you may proceed to the next 20.

6.  Select the Phrase Marking tool from the toolbar.  It looks like a “crop” tool in an image editor.

7.  Using the tool, put a bounding box around each phrase that you identify.  Double-clicking on the bounding box will bring up a dialog box:

[Screenshot: the annotation dialog box]
You should enter the OCR phrase, then click “Apply”.  Your entry will be recorded in the database.
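Conceptually, each entry made through the dialog amounts to one record: the word you typed plus the bounding box around it. The sketch below is a hypothetical shape for such a record -- the app's real database schema may differ -- but it shows what the tool is capturing on your behalf.

```python
from dataclasses import dataclass, asdict

# Hypothetical shape of a single annotation record. The real schema lives
# inside the tagging application's database and may differ.
@dataclass
class WordAnnotation:
    image: str    # image file name the word appears in
    word: str     # the OCR phrase entered in the dialog
    x: int        # bounding-box top-left corner, in pixels
    y: int
    width: int    # bounding-box size, in pixels
    height: int

# Example: one word marked on an (illustrative) image.
ann = WordAnnotation(image="log_0001.png", word="sandstone",
                     x=120, y=455, width=88, height=20)
record = asdict(ann)  # plain dict, ready to insert into a database row
print(record)
```

Keeping the box tight around the word (per the guidelines below) means `width` and `height` describe the word itself rather than surrounding whitespace, which is what makes the record useful for validating OCR output.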

8. The basic rule of thumb is that you should only mark a word if it is understandable to you.  If it is really faded and hard to read, don’t mark it.

Additional Information

  • We’ll pay $15 for each image. We’ll hash the data before and after the challenge to determine how many phrases were added for each image. There is no first prize for this contest -- the challenge will continue until all our images have been processed. You’ll be paid whether you mark one image or 100 images.

  • If you can’t see a phrase clearly because the text is blurry, don’t mark it.

  • If a phrase is covered with grid markings you’ll have to make a judgement about whether the phrase is legible or not.  If the letters aren’t comprehensible don’t mark it.

  • Please make the bounding boxes as tight as possible to the words without obscuring them.

  • Please try to mark at least 150 visible phrases on each image.  If no phrases are found in a particular document, please enter a zero in the phrase count column.

  • Try to access only the images that you are actually marking.
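The payout and phrase-count rules above are simple enough to sanity-check in a few lines. This sketch just restates them as code; the rates come from this post, while the example numbers are illustrative.

```python
# Payout rule from the challenge: $15 per image marked, whether you
# complete one image or one hundred.
RATE_PER_IMAGE = 15
MIN_PHRASES = 150  # target phrase count per image

def payout(images_marked: int) -> int:
    """Total payment for a given number of completed images."""
    return RATE_PER_IMAGE * images_marked

def sheet_row_ok(phrase_count: int) -> bool:
    """A Google Sheet row is acceptable if it hits the 150-phrase target,
    or records an explicit zero for a document with no legible phrases."""
    return phrase_count == 0 or phrase_count >= MIN_PHRASES

print(payout(1), payout(100))  # prints: 15 1500
```

So a member who marks 20 images per batch earns $300 per completed batch, and a document with no legible phrases still counts once the zero is recorded.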



Final Submission Guidelines


1.  Update the Mud Log Image List Google Sheet with the images you are planning to mark.

2.  Please update the phrase count column and the file quality notes in the Google Sheet for the files you have processed.  This will help us make payments more easily.

3.  The challenge administrators will validate that you’ve highlighted legitimate phrases in the documents.

4.  If you have marked any images, submit a blank text file as your submission.zip for this challenge.  This makes payments to your Topcoder account straightforward.

ELIGIBLE EVENTS:

Topcoder Open 2019

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30096410