Challenge Overview
What are we doing here?
Our client needs to analyze and process a set of LAS files on a daily and weekly basis.
What is a LAS File?
A LAS file is a fixed-width text file organized into multiple sections. It contains header data about the well in question, the operator, the logging company, and many other attributes, as well as information about the types of logging data recorded in the file. It also contains an instrumentation data section that lists depth-registered instrumentation data.
A more detailed explanation is available here:
http://www.cwls.org/las/
Where do I need to look in the LAS file for this project?
In this project we will look at three parts of the LAS file:
- Sections
- Attributes
- Metadata & Value
In more detail:
Section
Any line starting with “~” is a section name, and the subsequent data is part of that section.
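For illustration, splitting a LAS file into its “~”-delimited sections might look like this (a minimal sketch; the sample snippet and its section names are hypothetical):

```python
def split_sections(las_text):
    """Split LAS text into {section_name: [lines]} using the '~' markers."""
    sections, current = {}, None
    for line in las_text.splitlines():
        if line.startswith("~"):
            # A new section begins; everything after '~' is the section name.
            current = line[1:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

# Hypothetical two-section snippet:
sample = "~Version\nVERS.  2.0 : Version\n~Well\nSRVC.  SCHLUMBERGER : Service"
print(sorted(split_sections(sample)))  # ['Version', 'Well']
```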
Attribute Name
The attribute name appears at the far left of each line, e.g. LNAM is an attribute.
Metadata & Value
For each attribute, the corresponding value appears on the right-hand side along with a metadata name, e.g.:
Metadata: NAME
Value: AIT/HILT/BHC
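Extracting the attribute, value, and metadata name from a header line can be sketched as follows. This assumes the standard CWLS layout, where the attribute mnemonic ends at the first “.” and the metadata name (description) follows the last “:”; exact spacing varies between vendors, and the sample line is hypothetical.

```python
def parse_las_header_line(line):
    """Parse a LAS header line of the form 'MNEM.UNIT  VALUE : DESCRIPTION'.

    Returns (attribute, value, metadata) or None for blank, comment,
    or section-marker lines.
    """
    line = line.rstrip()
    if not line or line.startswith("#") or line.startswith("~"):
        return None
    mnemonic, _, rest = line.partition(".")
    # Units (if present) run from the dot to the first space.
    _, _, rest = rest.partition(" ")
    value, _, description = rest.rpartition(":")
    return mnemonic.strip(), value.strip(), description.strip()

# The LNAM example discussed above:
print(parse_las_header_line("LNAM.   AIT/HILT/BHC : NAME"))
# ('LNAM', 'AIT/HILT/BHC', 'NAME')
```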
Relevant attributes for this project
There are two sets of attributes we are interested in predicting using machine learning.
Standard attributes (typically found in a standard LAS file from a vendor)
- SRVC (service company)
- SVCO (also service company)
- LNAM – Log name
- LACT – Log activity
- DSRC – Digit source
- PLVL – Processing Level
- FTOL – Full Toolstring
- CASE – Casedhole Flag
- GTOL – Generic Tool String
The attributes to be predicted will most likely be determined by associations of attributes in the various sections, especially the ~C section.
Which Attributes are part of this challenge?
SRVC and LNAM
What is the SRVC attribute?
The SRVC attribute value is typically the company that collected the data and generated the LAS data file. Many of the larger service companies use unique curve attributes in their files which can be predictive of both the vendor and the tool that was used to collect the data. In other cases the curves may not be predictive but other attributes in the header information might point to certain vendors working in a certain area during a certain time period.
Things to consider in the SRVC attribute
The SRVC attribute may have the following issues:
- It can be missing in the LAS file.
- It can hold a different kind of value, e.g. just a number (in that case, a code defined by a standards group).
- It can have spelling mistakes
Handling these cases involves a few steps, described below.
Spelling Mistakes & Other values
A set of Logging Contractor Aliases mapping name variants to companies is provided. Remember that this is not an exhaustive list; many more variants can occur. Your algorithm should identify the correct SRVC based on the data provided in the file.
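One way to handle misspellings and variants is to normalize raw SRVC values against the alias table using fuzzy string matching. The sketch below uses a small hypothetical subset of the alias mapping; the real mapping comes from Logging Contractor Alias.xlsx, and the 0.8 similarity cutoff is an assumption to tune.

```python
import difflib

# Hypothetical subset of the Logging Contractor Alias table.
ALIASES = {
    "SCHLUMBERGER": "SCHLUMBERGER",
    "SLB": "SCHLUMBERGER",
    "HALLIBURTON": "HALLIBURTON",
    "HALIBURTON": "HALLIBURTON",   # common misspelling
    "BAKER HUGHES": "BAKER HUGHES",
}

def normalize_srvc(raw):
    """Map a raw SRVC value to a canonical company name, tolerating misspellings."""
    key = raw.strip().upper()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching for spelling mistakes.
    close = difflib.get_close_matches(key, ALIASES.keys(), n=1, cutoff=0.8)
    return ALIASES[close[0]] if close else None

print(normalize_srvc("slb"))          # SCHLUMBERGER
print(normalize_srvc("HALLIBURTEN"))  # HALLIBURTON
```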
Non-standard attributes
The non-standard attributes are not typically found in LAS files. These attributes have been created and populated in an existing database using traditional logic statements in order to mass populate them. These generated attributes are then inserted into an LAS file export in order to load the attributes into a new database. These additional attributes will be used to simplify querying the new database using advanced search capabilities. We would like to use a machine learning algorithm to predict these same attributes as new data is delivered into the database and improve existing metadata within the database.
LNAM – “Log Name” is created by determining the combination of tools used to collect the data. Tools are currently determined by knowing the SVCO and certain combinations of mnemonics found in the ~C Curve Information section. Reference tables are available for some vendors.
What are we predicting for LNAM and why?
LNAM is a non-standard field in a LAS file, but the client needs it present in every file for downstream processing, and the current client system cannot generate it with good accuracy. If an LNAM value is already present in a file, you can ignore it; the values in the training file override these values. The client hopes to automate generation of the LNAM for each file, replacing the current manual process. Ultimately this application will be processing thousands of files.
How to predict LNAM?
The LNAM value depends on the curves data and the SRVC attribute. Each logging vendor uses a different set of LNAM labels in its LAS files. The curves data appears under the header “~Curve Information Block”. Your algorithm should parse the curve information and learn from the LNAM values present in the training set, so that it can predict the LNAM attribute in files where it is missing.
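Collecting the curve mnemonics from the ~C section, to use as features for LNAM prediction, can be sketched as follows (assuming the standard CWLS layout where each curve line starts with `MNEM.UNIT`; the sample data is hypothetical):

```python
def extract_curve_mnemonics(las_text):
    """Collect curve mnemonics from the ~Curve Information section of a LAS file."""
    mnemonics = []
    in_curves = False
    for line in las_text.splitlines():
        if line.startswith("~"):
            # Section boundary: track whether we are inside a '~C...' section.
            in_curves = line.upper().startswith("~C")
            continue
        if in_curves and line.strip() and not line.startswith("#"):
            # The mnemonic runs from the start of the line to the first '.'.
            mnemonics.append(line.split(".", 1)[0].strip())
    return mnemonics

sample = """~Curve Information Block
DEPT.F      : Depth
GR  .GAPI   : Gamma Ray
~ASCII
100.0 55.2
"""
print(extract_curve_mnemonics(sample))  # ['DEPT', 'GR']
```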
Challenge Input
You'll be provided the following data sources, which can be found in the Code Documents section of the forums.
- Training Set of LAS files.
- training.csv, which lists UWI numbers (well identifiers) and the corresponding LNAM value.
- Testing Set of LAS files.
- Logging Contractor Alias.xlsx.
Final Submission Guidelines
Technology Stack
Python 3
(Any Open Source Machine Learning framework can be used)
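As a starting point, LNAM prediction can even be sketched without an external framework, e.g. as a nearest-neighbor lookup on curve-mnemonic sets restricted to the same service company. This is a hypothetical baseline under the assumptions above, not the client's reference-table method, and the training tuples below are made up:

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of curve mnemonics."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_lnam(curves, srvc, training):
    """Predict LNAM by nearest neighbor over (curves, srvc, lnam) training tuples.

    Prefer training files from the same SRVC, then pick the one whose
    curve-mnemonic set is most similar to the input.
    """
    candidates = [t for t in training if t[1] == srvc] or training
    best = max(candidates, key=lambda t: jaccard(curves, t[0]))
    return best[2]

# Hypothetical training data:
training = [
    (["DEPT", "GR", "NPHI", "RHOB"], "SCHLUMBERGER", "AIT/HILT/BHC"),
    (["DEPT", "SP", "ILD"], "HALLIBURTON", "IEL"),
]
print(predict_lnam(["DEPT", "GR", "RHOB"], "SCHLUMBERGER", training))
# AIT/HILT/BHC
```

A real solution would replace this with a trained model, but the same feature idea (curve-mnemonic sets plus SRVC) carries over.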
Submission Guidelines
You should provide the following in your submission .zip file:
- Your source code
- Dependency management and build scripts (pip install, etc)
- Documentation - README.md
- testing.csv - a file with your predictions for the LNAM assignments in the testing set of LAS files.
What should the prediction format be?
As part of this challenge you should provide a testing.csv file as output that matches the format of the training.csv. Please include the following header in your file:
UWI, LNAM Value
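Writing the required output file with Python's standard csv module might look like this (the UWIs and predictions below are hypothetical placeholders for your model's output):

```python
import csv

# Hypothetical predictions keyed by UWI; real values come from your model.
predictions = {
    "100010106706W500": "AIT/HILT/BHC",
    "100010205505W400": "IEL",
}

with open("testing.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["UWI", "LNAM Value"])  # header required by the challenge
    for uwi, lnam in predictions.items():
        writer.writerow([uwi, lnam])
```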
Evaluation Guidelines
- The solutions will be ranked in order of accuracy of the predictions on the testing set of LAS files. We'll publish our test harness in the forums so you can use it to evaluate your own solution against the training data.
- Your solution must be flexible enough to accommodate new training and testing files. The client will ultimately use this solution against a broader data set and will need to retrain from time to time.