Key Information

Status: Cancelled (requirements infeasible)

Challenge Overview

What are we doing here?
Our client needs to analyze and process a set of LAS files on a daily and weekly basis.
 
What is a LAS File?
It is a fixed-width text file with multiple sections.  It contains header data about the well in question, the operator, the logging company, and many other attributes, as well as information about the types of logging data recorded in the file.  There is also an instrumentation data section of the document, which lists depth-registered instrumentation data.
 
A much better and more detailed explanation is available here -
http://www.cwls.org/las/
 
Where do I need to look in the LAS file for this project?
As part of this project we will be looking into 3 parts of the LAS file.
  1. Sections
  2. Attributes
  3. Metadata & Value
 
To explain a bit more here are some more details -
 
Section
Anything starting with “~” is a Section Name, and the subsequent data is part of that section.


Attribute Name
It appears at the extreme left of each and every field.
e.g. in the screenshot below, LNAM is an Attribute


Metadata & Value
For each attribute we have the corresponding data on the right-hand side, with a Metadata name.
e.g. in the screenshot below:
            Metadata:         NAME
            Value:            AIT/HILT/BHC
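The three pieces above (sections, attributes, metadata/value) can be pulled out of the text with a small parser. This is only an illustrative sketch: it assumes the common `MNEM.UNITS  DATA : DESCRIPTION` line layout from the CWLS LAS specification, and a production parser would need to handle more edge cases.

```python
import re

def parse_las_header(lines):
    """Parse LAS header lines into {section: {attribute: (value, description)}}.

    A minimal sketch: lines starting with "~" open a section, "#" lines are
    comments, and each data line has its attribute mnemonic left of the
    first dot and its value left of the last-field colon.
    """
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("#"):     # blank line or comment
            continue
        if line.startswith("~"):                 # new section name
            current = line[1:].strip()
            sections[current] = {}
            continue
        if current is None:
            continue
        # MNEM.UNITS  DATA : DESCRIPTION
        m = re.match(r"\s*([^.\s]+)\s*\.\S*\s+(.*?)\s*:\s*(.*)$", line)
        if m:
            attr, value, desc = m.groups()
            sections[current][attr] = (value, desc)
    return sections

# Hypothetical header fragment, mirroring the screenshot examples.
sample = [
    "~Well Information Block",
    "SRVC.           SCHLUMBERGER : SERVICE COMPANY",
    "LNAM.           AIT/HILT/BHC : LOG NAME",
]
hdr = parse_las_header(sample)
```

Running this on the fragment yields one section ("Well Information Block") whose SRVC value is "SCHLUMBERGER" and whose LNAM value is "AIT/HILT/BHC".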

 
Relevant attributes as part of this project -
There are two sets of attributes we are interested in predicting using machine learning.
 
Standard attributes (typically found in a standard LAS file from a vendor)
  • SRVC (service company)
  • SVCO (also service company)
Non-standard attributes (attributes we would like to define to make searching and querying our LAS database easier for the end users.)
  • LNAM – Log name
  • LACT – Log activity
  • DSRC – Digit source
  • PLVL – Processing Level
  • FTOL – Full Toolstring
  • CASE – Casedhole Flag
  • GTOL – Generic Tool String 
The existing non-standard attributes have been populated in and exported from an existing internal database.  The purpose of adding them to the LAS files was to simplify population of the metadata in a new database.  If this information is populated in an LAS file, it should be considered part of the training set. 
 
The attributes to be predicted will most likely be determined by associations of attributes in the various sections, especially the ~C section. 
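Since the predictions will mostly be driven by associations among attributes in the ~C section, one simple way to turn the curve mnemonics of a file into model features is a binary bag-of-mnemonics vector. This is just a sketch; the vocabulary and mnemonics below are hypothetical examples, not values from the dataset.

```python
def curve_feature_vector(curve_mnemonics, vocabulary):
    """Binary bag-of-mnemonics: 1 if the mnemonic appears in the file.

    `vocabulary` is the sorted list of all mnemonics seen in the
    training set; mnemonics not in the vocabulary are simply ignored.
    """
    present = set(curve_mnemonics)
    return [1 if m in present else 0 for m in vocabulary]

# Hypothetical vocabulary built from the training files' ~C sections.
vocab = sorted({"GR", "DT", "AT10", "AT90", "NPHI", "RHOB"})
vec = curve_feature_vector(["GR", "AT10", "AT90"], vocab)
```

Vectors like this can be fed to any off-the-shelf classifier, since unique curve mnemonics are often characteristic of a vendor and toolstring.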
                       
Which Attributes are part of this challenge?
SRVC and LNAM
 
 
What is the SRVC attribute ?
The SRVC attribute value is typically the company that collected the data and generated the LAS data file.  Many of the larger service companies use unique curve attributes in their files which can be predictive of both the vendor and the tool that was used to collect the data.  In other cases the curves may not be predictive but other attributes in the header information might point to certain vendors working in a certain area during a certain time period.
 
Things to consider in the SRVC attribute
 
SRVC may have the following issues
  1. It can be missing from the LAS file.
  2. It can hold a different kind of value, e.g. just a number (when it is a number, it is a defined number from a standards group).
  3. It can contain spelling mistakes.
How will you know whether the SRVC value in a file is the right SRVC or not?
 
There are a few steps to the process.
 
Spelling Mistakes & Other values
You can find a large set of company aliases in
- Logging Contractor Alias.xlsx (please find the link in the forum)
Just remember this is not an exhaustive list; values can vary much more. So the algorithm should be able to identify the company based on the data provided in the file.
 
This way you will be able to make a clear judgment about the right SRVC.
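One way to absorb aliases and spelling mistakes is an exact alias lookup followed by a fuzzy match. The alias table below is a hypothetical stand-in; the real mappings come from the Logging Contractor Alias.xlsx file linked in the forum.

```python
import difflib

# Hypothetical alias table; the real one should be loaded from
# "Logging Contractor Alias.xlsx" (link in the challenge forum).
ALIASES = {
    "SCHLUMBERGER": "SCHLUMBERGER",
    "SLB": "SCHLUMBERGER",
    "HALLIBURTON": "HALLIBURTON",
    "HLS": "HALLIBURTON",
}

def normalize_srvc(raw):
    """Map a raw SRVC value to a canonical company name.

    Exact alias lookup first, then a fuzzy match to absorb spelling
    mistakes; returns None when nothing is close enough.
    """
    key = raw.strip().upper()
    if key in ALIASES:
        return ALIASES[key]
    close = difflib.get_close_matches(key, ALIASES.keys(), n=1, cutoff=0.8)
    return ALIASES[close[0]] if close else None
```

Since the alias list is not exhaustive, anything `normalize_srvc` returns None for would still need to be resolved from other evidence in the file.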
 
Unknown Values
LAS files with an unknown SRVC, and with curve information insufficient to predict the LNAM, should be marked as Error.

Non-standard attributes

The non-standard attributes are not typically found in LAS files.  These attributes have been created and populated in an existing database using traditional logic statements in order to mass populate them.  These generated attributes are then inserted into an LAS file export in order to load the attributes into a new database.  These additional attributes will be used to simplify querying the new database using advanced search capabilities.  We would like to use a machine learning algorithm to predict these same attributes as new data is delivered into the database and improve existing metadata within the database. 
  
LNAM – “Log Name” is created by determining the combination of tools used to collect the data.  Tools are currently determined by knowing the SVCO and certain combinations of mnemonics found in the ~C Curve Information section.  Reference tables are available for some vendors.
 
Where does the Machine Learning part come in?
That’s the interesting part I guess. We are going to predict the value of LNAM as part of the challenge. The prediction will be based on the ML algorithm that you create from analysis of a set of LAS files.
 
What are we predicting for LNAM and why ?
The attribute LNAM is a non-standard field in an LAS file, but downstream processing needs the attribute to be present in every file. The current client system is not able to populate it with good accuracy, so the client team has to go into each file and manually update it as needed. This is a very time-consuming process, as there are hundreds of thousands of files to be processed.
 
So instead of checking each file manually and adding the LNAM where needed, we need the algorithm to predict those values.
 
 
How to predict LNAM?
The LNAM data is dependent on the curves data and the attribute SRVC. The curves data is under the header “~Curve Information Block”. The algorithm should parse the curve information and learn from the values present in the LNAM attribute of the training set, so that it can predict the LNAM attribute in files where it is missing.
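As a baseline for "learning from the training set", one stdlib-only sketch is a nearest-neighbour lookup: compare a file's curve-mnemonic set against files where LNAM is already populated, using Jaccard similarity. The threshold and the training pairs below are illustrative assumptions, not part of the spec.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of curve mnemonics."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_lnam(curves, training, min_sim=0.5):
    """Predict LNAM from the most similar training file's curve set.

    `training` is a list of (curve_mnemonics, lnam) pairs taken from
    files where LNAM is already populated.  Returns None (i.e. the
    Error flag) when no training file is similar enough; min_sim=0.5
    is an assumed cutoff, to be tuned against the real dataset.
    """
    best_lnam, best_sim = None, 0.0
    for train_curves, lnam in training:
        sim = jaccard(curves, train_curves)
        if sim > best_sim:
            best_lnam, best_sim = lnam, sim
    return best_lnam if best_sim >= min_sim else None

# Hypothetical training pairs extracted from files with LNAM present.
training = [
    ({"GR", "AT10", "AT90", "DT"}, "AIT/BHC"),
    ({"NPHI", "RHOB", "GR"}, "CNL/FDC"),
]
guess = predict_lnam({"GR", "AT10", "AT90"}, training)
```

A real solution would likely also condition on the (normalized) SRVC value and use a proper classifier, but the shape of the problem is the same.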
 
 
 

Final Submission Guidelines

Technology Stack
Python
(Any Open Source Machine Learning framework can be used)

What should be the prediction format ?
As part of this challenge you should produce a CSV file as output, with the columns below -
Timestamp,  File Name,  LNAM Value, Flag
 
The value of Flag must be one of: Existing / Predicted / Error
 
Existing - The LNAM already exists in the LAS file.
Predicted - The LNAM was predicted by the ML code you are building.
Error - The LNAM does not exist in the LAS file and the ML code is not able to predict it.
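The output format above can be produced with the stdlib csv module. Note the spec does not fix a timestamp format, so the ISO-8601 choice here is an assumption, as are the file names in the usage example.

```python
import csv
from datetime import datetime, timezone

def write_predictions(path, rows):
    """Write (file_name, lnam_value, flag) rows in the required format.

    Flag must be one of Existing / Predicted / Error; the ISO-8601
    timestamp is an assumed format, the spec does not pin one down.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Timestamp", "File Name", "LNAM Value", "Flag"])
        stamp = datetime.now(timezone.utc).isoformat()
        for file_name, lnam, flag in rows:
            assert flag in ("Existing", "Predicted", "Error")
            writer.writerow([stamp, file_name, lnam, flag])

# Hypothetical results: one row per processed LAS file.
write_predictions("predictions.csv", [
    ("well_001.las", "AIT/HILT/BHC", "Existing"),
    ("well_002.las", "CNL/FDC", "Predicted"),
    ("well_003.las", "", "Error"),
])
```

For Error rows, the LNAM Value column is left empty here, since the spec does not say what to put in it when no prediction is possible.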

Expected Accuracy
Expected accuracy is above 80% across the overall submission.

Dataset
Please check forum for the dataset link.

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off


ID: 30066792