The challenge is finished.

Challenge Overview

According to Wikipedia, “Mud logging is the creation of a detailed record (well log) of a borehole by examining the cuttings of rock brought to the surface by the circulating drilling medium (most commonly drilling mud).” Quartz Energy has provided Topcoder with a set of mud logs, and we’re planning to develop an application to extract meaning from these records. The documents are very interesting: they are even oil-well shaped! You can read more details about them here. One issue we’re discovering, however, is that the well logs are generated in a large variety of formats. The mud log image files (500 of them) can be downloaded here.

Ultimately, we hope to parse data from these documents using image processing and OCR technology, but a logical first step is to cluster the documents so that those with similar visual characteristics are grouped together.
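As a rough illustration of the clustering step described above, here is a minimal sketch that turns each image into a small feature vector and groups the vectors with k-means. Everything in it is an assumption made for illustration, not part of the challenge: the images are assumed to be already decoded into rows of grayscale values (decoding the actual files would need an imaging library), the three features are arbitrary choices, and the names `image_features` and `kmeans` are hypothetical. A real solution would likely use richer features and a library implementation.

```python
import random
from typing import List, Sequence

# Assumption: an image is a list of rows of grayscale values,
# 0 (black) to 255 (white), already decoded from the image file.
Image = List[List[int]]


def image_features(img: Image, dark_threshold: int = 128) -> List[float]:
    """Summarize an image as three values, each between 0 and 1."""
    height = len(img)
    width = len(img[0])
    pixels = [p for row in img for p in row]
    mean_intensity = sum(pixels) / (255.0 * len(pixels))
    dark_fraction = sum(1 for p in pixels if p < dark_threshold) / len(pixels)
    # Mud logs tend to be tall and narrow, so shape may be informative.
    tallness = height / (height + width)
    return [tallness, mean_intensity, dark_fraction]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int,
           iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means; returns one cluster label per input point."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

Calling `kmeans([image_features(img) for img in images], k=5)` would return one label per image. The value of `k` is itself a guess here; inspecting a few members of each cluster by eye is a reasonable way to judge whether the grouping is visually recognizable.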

We have manually categorized a few of the images. The categorization displayed in the attached Show Categories.docx is based on how the “Show” information is displayed. If oil is revealed in a well hole sample, a “Show” is recorded. This is one of the most important pieces of information in the mud logs. Sometimes this information appears in a separate column labeled “Footnotes Shows”, “Oil Shows”, or “Hydrocarbons”. In other documents it appears only as free text on the log.
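To make the “Show” categorization above concrete, the following sketch checks OCR output for the three column labels quoted in this section. It assumes the text has already been extracted from the image by some OCR tool; the function name `find_show_label` is hypothetical. Finding a label in the text only suggests that a dedicated column exists, so treat the result as a hint rather than a classification.

```python
import re
from typing import Optional

# The column labels quoted in the challenge description.
SHOW_COLUMN_LABELS = ("Footnotes Shows", "Oil Shows", "Hydrocarbons")


def find_show_label(ocr_text: str) -> Optional[str]:
    """Return the first known Show column label found in the text, or None."""
    # OCR often breaks a header across lines, so collapse all whitespace
    # to single spaces and compare case-insensitively.
    normalized = re.sub(r"\s+", " ", ocr_text).lower()
    for label in SHOW_COLUMN_LABELS:
        if label.lower() in normalized:
            return label
    return None
```

Documents for which this returns `None` would fall into the "Show information in free text" group, or simply into a group where OCR failed to read the header.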

NOTE: This is only one of many ways that the documents might be categorized. I’m showing it as an example, not to be prescriptive. You might cluster the images by quality, by age, by engineering company, by OCR compatibility, or by some other useful characteristic. There are many possibilities. Your solution should group the documents in some visually recognizable and useful way.

Technology Overview

Python 3.6.x
APIs



Final Submission Guidelines

1. Please submit all code required by the application in your submission.zip

2. Document the build process for your code, including all dependencies (pip installs, etc.).

3. You may use any Python Open Source libraries or technologies provided they are available for commercial use.

4. If you use some other API or platform, please document all steps required to replicate your results.

 

ELIGIBLE EVENTS:

2017 TopCoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30058055