Challenge Overview
What are we doing here?
Our client needs to analyze and process a set of LAS files on a daily and weekly basis.
What is an LAS File?
It is a fixed-width text data file divided into multiple sections. It contains header data about the well in question, the operator, the logging company and many other attributes, as well as information about the types of logging measurements recorded in the file. It also has an instrumentation data section, which lists depth-registered instrumentation data.
A more detailed explanation is available here:
http://www.cwls.org/las/
Where do I need to look in the LAS file for this project?
This project looks at three parts of the LAS file:
- Sections
- Attributes
- Metadata & Value
In more detail:
Section
Any line starting with “~” is a section name, and the data that follows belongs to that section.
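Because every section begins with a “~” line, grouping a file by section only takes a single line scan. The following is a minimal sketch. It assumes the LAS 2.0 conventions: only the first letter after “~” identifies the section (V, W, C, P, O, A), and lines starting with “#” are comments. The function name `split_sections` and the sample text are illustrative only.

```python
def split_sections(las_text):
    """Group the non-comment lines of an LAS file by section.

    A line starting with '~' opens a new section, keyed by the first
    letter after the '~' (e.g. 'C' for ~Curve Information Block).
    Lines before the first '~' line and '#' comment lines are ignored.
    """
    sections = {}
    current = None
    for raw in las_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("~"):
            current = line[1:2].upper()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


# Hypothetical sample in LAS 2.0 layout.
sample = """~Version Information
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
~Well Information Block
 LNAM.   AIT/HILT/BHC : NAME
~Curve Information Block
 DEPT.M      : DEPTH
 GR  .GAPI   : GAMMA RAY
"""

sections = split_sections(sample)
print(sorted(sections))  # ['C', 'V', 'W']
```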
Attribute Name
The attribute name is the leftmost item on each line.
e.g. in the screenshot below, LNAM is an attribute.
Metadata & Value
For each attribute (tag), the corresponding value and its metadata name appear to its right.
e.g. in the screenshot below:
Metadata: NAME
Value: AIT/HILT/BHC
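The following minimal sketch splits one header line into the attribute, value and metadata described above. It assumes the LAS 2.0 line layout `MNEM.UNIT  VALUE : DESCRIPTION`, and it assumes that the screenshot's LNAM line follows that layout (attribute `LNAM`, value `AIT/HILT/BHC`, metadata `NAME`). The value ends at the first colon, so values that themselves contain ':' are not handled. `parse_header_line` is a hypothetical helper name.

```python
import re

# Assumed LAS 2.0 header layout: MNEM.UNIT  VALUE : DESCRIPTION
# The unit follows the '.' directly; at least one space precedes the value.
HEADER_LINE = re.compile(r"^\s*([^.\s]+)\s*\.(\S*)\s+(.*?)\s*:\s*(.*?)\s*$")


def parse_header_line(line):
    """Split one header line into attribute, unit, value and metadata.

    Returns None for lines that do not match the assumed layout.
    The value stops at the first ':', so values containing ':' are unsupported.
    """
    match = HEADER_LINE.match(line)
    if match is None:
        return None
    mnemonic, unit, value, metadata = match.groups()
    return {"attribute": mnemonic.upper(), "unit": unit,
            "value": value, "metadata": metadata}


print(parse_header_line(" LNAM.   AIT/HILT/BHC : NAME"))
# {'attribute': 'LNAM', 'unit': '', 'value': 'AIT/HILT/BHC', 'metadata': 'NAME'}
```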
Attributes relevant to this project:
There are two sets of attributes we are interested in predicting using machine learning:
Standard attributes (typically found in a standard LAS file from a vendor)
- SRVC (service company)
- SVCO (also service company)
- LNAM – Log name
- LACT – Log activity
- DSRC – Digit source
- PLVL – Processing Level
- FTOL – Full Toolstring
- CASE – Casedhole Flag
- GTOL – Generic Tool String
The attributes to be predicted will most likely be determined by associations of attributes in the various sections, especially the ~C section.
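The ~C section is expected to carry most of the signal. A simple way to turn it into model input is to collect the set of curve mnemonics in each file. The sketch below uses the same LAS 2.0 line-layout assumption. `curve_mnemonics` and the sample text are hypothetical, and a real pipeline may also want units or attributes from other sections.

```python
def curve_mnemonics(las_text):
    """Return the curve mnemonics listed in the ~C (Curve Information) section.

    Assumes LAS 2.0 lines of the form 'MNEM.UNIT ... : DESCRIPTION'; the
    mnemonic is everything before the first '.', upper-cased.
    """
    mnemonics = set()
    in_curve_section = False
    for raw in las_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("~"):
            # Only the first letter after '~' identifies the section.
            in_curve_section = line[1:2].upper() == "C"
            continue
        if in_curve_section and "." in line:
            mnemonics.add(line.split(".", 1)[0].strip().upper())
    return frozenset(mnemonics)


# Hypothetical sample; the ~A data rows are ignored.
sample = """~Well Information Block
 LNAM.   AIT/HILT/BHC : NAME
~Curve Information Block
 DEPT.M      : DEPTH
 GR  .GAPI   : GAMMA RAY
 dt  .US/F   : SONIC
~A  DEPT  GR  DT
 1000.0  45.2  88.1
"""
print(sorted(curve_mnemonics(sample)))  # ['DEPT', 'DT', 'GR']
```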
Which Attributes are part of this challenge?
SRVC and LNAM
What is the SRVC attribute?
The SRVC attribute value is typically the company that collected the data and generated the LAS data file. Many of the larger service companies use unique curve attributes in their files, and these can be predictive of both the vendor and the tool used to collect the data. In other cases the curves may not be predictive, but other attributes in the header information might point to certain vendors working in a certain area during a certain time period.
Things to consider about the SRVC attribute
The SRVC value:
- may be missing from the LAS file.
- may hold a different value, e.g. just a number (a numeric value is a code defined by a standards group).
- may contain spelling mistakes.
Handling these cases involves a few steps.
Spelling Mistakes & Other Values
A large set of company aliases is available in:
- Logging Contractor Alias.xlsx (please find the link in the forum)
Keep in mind that this list is not exhaustive, and real files can vary much more. The algorithm should therefore be able to identify the company from the data provided in the file.
This way you can make a clear judgment about the correct SRVC.
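One way to resolve misspelled SRVC values is an exact alias lookup, followed by fuzzy matching with Python's standard `difflib.get_close_matches`. This is a minimal sketch with two caveats. First, the `ALIASES` entries are invented placeholders, not the contents of Logging Contractor Alias.xlsx. Second, the 0.8 similarity cutoff is an assumption to tune on the real data. Numeric SRVC codes would need their own lookup table, which is not shown here.

```python
import difflib

# Illustrative placeholder aliases only; the real mapping comes from
# "Logging Contractor Alias.xlsx" (link in the forum).
ALIASES = {
    "SCHLUMBERGER": "Schlumberger",
    "SCHLUMBERGER WIRELINE": "Schlumberger",
    "SLB": "Schlumberger",
    "HALLIBURTON": "Halliburton",
    "HALLIBURTON LOGGING SERVICES": "Halliburton",
}


def normalize_srvc(raw, aliases=ALIASES, cutoff=0.8):
    """Map a raw SRVC value to a canonical company name, or None if unknown.

    1. Normalise case and whitespace, then try an exact alias lookup.
    2. Otherwise take the closest alias by difflib similarity, if it
       scores at least `cutoff`.
    """
    if raw is None:
        return None
    key = " ".join(raw.upper().split())
    if not key:
        return None
    if key in aliases:
        return aliases[key]
    close = difflib.get_close_matches(key, list(aliases), n=1, cutoff=cutoff)
    return aliases[close[0]] if close else None


print(normalize_srvc("  schlumberger "))  # Schlumberger
print(normalize_srvc("SCHLUMBERGR"))      # Schlumberger (fuzzy match)
print(normalize_srvc("ACME LOGGING"))     # None -> treat SRVC as unknown
```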
Unknown Values
LAS files with an unknown SRVC and with insufficient curve information to predict the LNAM should be marked as Error.
Non-standard attributes
The non-standard attributes are not typically found in LAS files. They were created and populated in an existing database, using traditional logic statements to mass-populate them. These generated attributes are then inserted into an LAS file export so that they can be loaded into a new database. There, they will simplify querying through advanced search capabilities. We would like to use a machine learning algorithm to predict these same attributes as new data is delivered into the database, and to improve the existing metadata within the database.
- LNAM – “Log Name” is created by determining the combination of tools used to collect the data. Tools are currently determined by knowing the SVCO and certain combinations of mnemonics found in the ~C Curve Information section. Reference tables are available for some vendors.
Where does the Machine Learning part come in?
That’s the interesting part. In this challenge we are going to predict the value of LNAM, using an ML algorithm that you create from an analysis of a set of LAS files.
What are we predicting for LNAM and why?
LNAM is a non-standard field in an LAS file, but downstream processing needs the attribute to be present in every file. The client’s current system cannot populate it with good accuracy, so the client team has to open each file and update it manually as needed. This is very time-consuming, as there are hundreds of thousands of files to process.
So instead of checking each file manually and adding LNAM where needed, we need the algorithm to predict those values.
How to predict LNAM?
LNAM depends on the curve data and the SRVC attribute. The curve data is under the header “~Curve Information Block”. The algorithm should parse the curve information and learn from the LNAM values present in the training set. It can then predict LNAM in files where it is missing.
LNAM (“Log Name”) is created by determining the combination of tools used to collect the data. Tools are currently determined by knowing the SVCO and certain combinations of mnemonics found in the ~C Curve Information section. Reference tables are available for some vendors.
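The challenge does not prescribe an algorithm. As one illustrative choice, the sketch below uses k-nearest-neighbour voting with Jaccard similarity between sets of curve mnemonics, preferring training files from the same SRVC. Several details are hypothetical:
- the class name
- the `k` and `min_similarity` defaults
- the toy training rows (all mnemonics and the `TOOL_X` label are made up; only the LNAM value AIT/HILT/BHC appears in this brief)

Returning `None` when no neighbour is close enough maps naturally to the Error flag.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two mnemonic sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class MnemonicKNN:
    """Toy k-nearest-neighbour LNAM predictor over (SRVC, curve mnemonic set)."""

    def __init__(self, k=3, min_similarity=0.5):
        self.k = k
        self.min_similarity = min_similarity
        self.examples = []  # list of (srvc, frozenset of mnemonics, lnam)

    def fit(self, rows):
        """rows: iterable of (srvc, mnemonics, lnam) from files that already have LNAM."""
        self.examples = [(srvc, frozenset(m), lnam) for srvc, m, lnam in rows]
        return self

    def predict(self, srvc, mnemonics):
        """Return the predicted LNAM, or None if no neighbour is similar enough."""
        mnemonics = frozenset(mnemonics)
        # Restrict to files from the same service company when any exist.
        same_vendor = [e for e in self.examples if srvc is not None and e[0] == srvc]
        pool = same_vendor or self.examples
        scored = sorted(
            ((jaccard(mnemonics, m), lnam) for _, m, lnam in pool), reverse=True
        )
        neighbours = [(s, lnam) for s, lnam in scored[: self.k]
                      if s >= self.min_similarity]
        if not neighbours:
            return None
        votes = Counter()
        for s, lnam in neighbours:
            votes[lnam] += s  # similarity-weighted vote
        return votes.most_common(1)[0][0]


# Hypothetical training rows; mnemonics and the TOOL_X label are made up.
train = [
    ("Schlumberger", {"DEPT", "GR", "AT10", "AT90", "DT"}, "AIT/HILT/BHC"),
    ("Schlumberger", {"DEPT", "GR", "AT10", "AT90", "DT", "SP"}, "AIT/HILT/BHC"),
    ("Schlumberger", {"DEPT", "GR", "NPHI", "RHOB"}, "TOOL_X"),
]
model = MnemonicKNN(k=3, min_similarity=0.5).fit(train)
print(model.predict("Schlumberger", {"DEPT", "GR", "AT10", "AT90", "DT"}))  # AIT/HILT/BHC
print(model.predict("Schlumberger", {"DEPT", "CALI"}))  # None -> Error flag
```

In practice the SRVC passed in would be the normalised company name, and the mnemonic sets would come from the ~C section of each file.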
Final Submission Guidelines
Technology Stack
Python
(Any open-source machine learning framework can be used.)
What should the prediction format be?
Your solution should produce a CSV file as output, with the following columns:
Timestamp, File Name, LNAM Value, Flag
The Flag value must be one of: Existing / Predicted / Error
Existing - LNAM is already present in the LAS file
Predicted - LNAM was predicted by the ML code you are building
Error - LNAM does not exist in the LAS file and the ML code is unable to predict it
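This sketch produces the required CSV with the standard `csv` module. The column order and flag rules follow the list above. Three details are assumptions, since the brief does not specify them:
- the ISO 8601 UTC timestamp, shared by every row of a run
- the empty LNAM Value on Error rows
- the file names

```python
import csv
import io
from datetime import datetime, timezone


def flag_row(file_name, existing_lnam, predicted_lnam, timestamp):
    """Build one output row using the Existing / Predicted / Error rules."""
    if existing_lnam:
        return [timestamp, file_name, existing_lnam, "Existing"]
    if predicted_lnam:
        return [timestamp, file_name, predicted_lnam, "Predicted"]
    return [timestamp, file_name, "", "Error"]


def write_predictions(results, out):
    """Write the CSV; results is an iterable of (file_name, existing_lnam, predicted_lnam)."""
    # One run timestamp for every row (ISO 8601, UTC): an assumed format.
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    writer = csv.writer(out)
    writer.writerow(["Timestamp", "File Name", "LNAM Value", "Flag"])
    for file_name, existing, predicted in results:
        writer.writerow(flag_row(file_name, existing, predicted, timestamp))


buf = io.StringIO()
write_predictions(
    [
        ("well_a.las", "AIT/HILT/BHC", None),  # LNAM already in the file
        ("well_b.las", None, "TOOL_X"),        # filled in by the model
        ("well_c.las", None, None),            # unknown SRVC, weak curves
    ],
    buf,
)
print(buf.getvalue())
```

When writing to a file on disk, open it with `newline=""`, as the `csv` module documentation recommends.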
Expected Accuracy
The expected accuracy is more than 80% for the overall submission.
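The brief does not define how accuracy is computed. This sketch assumes it is the share of labelled held-out files whose predicted LNAM matches the known value, with Error rows counted as wrong. The file names and labels are hypothetical. Under this reading, exactly 80% does not meet a "more than 80%" target.

```python
def accuracy(predicted, actual):
    """Share of labelled files whose predicted LNAM equals the known LNAM.

    predicted, actual: dicts mapping file name -> LNAM. A missing or None
    prediction (an Error row) counts as incorrect.
    """
    if not actual:
        raise ValueError("no labelled files to evaluate")
    correct = sum(1 for name, lnam in actual.items() if predicted.get(name) == lnam)
    return correct / len(actual)


# Hypothetical held-out labels and predictions.
actual = {"a.las": "X", "b.las": "Y", "c.las": "Z", "d.las": "X", "e.las": "Y"}
predicted = {"a.las": "X", "b.las": "Y", "c.las": None, "d.las": "X", "e.las": "Y"}
score = accuracy(predicted, actual)
print(score, score > 0.80)  # 0.8 False
```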
Dataset
Please check the forum for the dataset link.