An analysis of log images
John Zhu
A report submitted for summer internship project
Well Data Management
Shell Exploration & Production Company
Summer 2013
Abstract
Well logs are the records of the geologic formations produced in different phases
of wells. Well logs stored in graphical formats are called log images. This is the
report of the project done by John Zhu during summer 2013, which analyzed
the log images Shell currently has in the well database. The project experimented
with a scripting approach under the Linux environment to monitor and manage the
log image files in the database, performed conversion tests on about 30,000 files,
analyzed potential optimization solutions, and created scripts and programs for
certain tasks on log image files.
Chapter 1
Overview
1.1 Notations
TYPE        EXAMPLE                       MEANING
bold        script.sh, files.txt          this is a file
italic      text                          this is a substitution
typewriter  /apps/3rdparty/justcgm/bin/   same text in command line
1.2 Directory Path
The Linux path to the well data directory is:
/glb/am/sepco/data/epw well/US
The path to each well’s directory is derived from its UWI [1] and has the format:
/<State Code>/<County Code>/<UWI>/
For example, a well with UWI "42127338430200" has its directory located at
path:
/glb/am/sepco/data/epw well/US/42/127/42127338430200
and all the log images are located in the subdirectories such as:
/glb/am/sepco/data/epw well/US/42/127/42127338430200/logs/
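The path rule above can be sketched in a few lines of Python. This is only an illustration: it assumes, from the single example given, that the state code is the first two digits of the UWI and the county code is the next three. The function names are hypothetical, and the root path is copied as printed above.

```python
from pathlib import PurePosixPath

# Root of the well data directory, copied verbatim from the text above.
WELL_ROOT = PurePosixPath("/glb/am/sepco/data/epw well/US")


def well_directory(uwi: str, root: PurePosixPath = WELL_ROOT) -> PurePosixPath:
    """Return <root>/<State Code>/<County Code>/<UWI> for a UWI string."""
    if not uwi.isdigit() or len(uwi) < 5:
        raise ValueError(f"not a usable UWI: {uwi!r}")
    state_code = uwi[:2]    # assumption: first two digits of the UWI
    county_code = uwi[2:5]  # assumption: next three digits of the UWI
    return root / state_code / county_code / uwi


def logs_directory(uwi: str) -> PurePosixPath:
    """Return the logs/ subdirectory that holds the log images of a well."""
    return well_directory(uwi) / "logs"


print(logs_directory("42127338430200"))
```

For the example UWI this reproduces the two paths shown above.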
[1] Unique Well Identifier.
1.3 File Distribution Statistics
A complete list of all the files under the /logs/ directories was first
generated [2]. The list was then sorted [3], and a separate list and a file
count were produced for each format [4]. Finally, the file attributes were
extracted and the total size of each format was summed [5]. The statistics are
shown below.
Table 1.1: Overall format distribution
FORMAT  AMOUNT  SIZE (KB)
BMP     2       128
CGM     21731   202012492
CSV     7       2340
CTM     2       2680
DB      4       24
DOC     1       128
DOCX    2       732
DOT     1       103788
DRA     4495    18112
EMF     62      1013620
GIF     3       1072
JPG     67      361952
LIC     994     3976
LOG     6       4276
META    9       53800
PDF     15056   55187616
PDS     44      312832
PNG     4       16
PPT     6       16
SIF     4364    36580
SQL     1       12
TIF     211752  938280624
TXT     3       16
XLS     100     237284
XLSX    2       708
ZIP     2856    4905644
[2] This used the script ListAllLogFiles.sh.
[3] This used the script sortByFormat.sh.
[4] This used the script extractFormat.sh.
[5] This used a combination of scripts: calAllSize.sh, calSize.sh,
listAllAttr.sh and listAttr.sh.
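The listing, sorting and summing steps of this section could be combined in one pass. The sketch below is not the shell scripts used in the project; it is a Python approximation with hypothetical function names. It measures sizes as bytes divided by 1024, which may differ from the way the original scripts measured size.

```python
import os
from collections import Counter
from pathlib import Path


def iter_log_files(root):
    """Yield (path, size in bytes) for every file below a directory named logs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "logs" not in Path(dirpath).parts:
            continue
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            yield full_path, os.path.getsize(full_path)


def format_distribution(files):
    """Count files and sum sizes (KB) per upper-cased file extension."""
    counts, sizes_kb = Counter(), Counter()
    for path, size_bytes in files:
        extension = Path(path).suffix.lstrip(".").upper() or "(none)"
        counts[extension] += 1
        sizes_kb[extension] += size_bytes / 1024
    return counts, sizes_kb


# Made-up entries standing in for the output of iter_log_files(root).
counts, sizes_kb = format_distribution(
    [("w/logs/a.tif", 2048), ("w/logs/b.TIF", 1024), ("w/logs/c.cgm", 512)]
)
print(dict(counts), dict(sizes_kb))
```

On the real database the first argument would be `iter_log_files(root)` with the well data directory as `root`.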
Figure 1.1: Format distribution in amount (bar chart; x-axis: FORMATS, the 26
formats of Table 1.1; y-axis: AMOUNT, scale ·10^5)
Figure 1.2: Format distribution in size (bar chart; x-axis: FORMATS, the 26
formats of Table 1.1; y-axis: Size (KB), scale ·10^9)
From the result, we can see that the three major formats are TIF, CGM and PDF.
Though there are 19067 files in formats other than those three, their total
size is less than 10 gigabytes. Most of them exist for historical reasons, and
we do not usually receive many files in those formats any more.
A summary of the size and amount of the three formats is shown below.
Figure 1.3: Three Major Formats in a pie (pie chart of the share of total size:
CGM 16.8%, TIF 78.02%, PDF 4.59%)
Table 1.2: Three major formats statistics
FORMAT  AM [a]  AM(%) [b]  TS(KB) [c]   TS(%) [d]  AFS(KB) [e]
CGM     21731   8.31       202,012,492  16.80      9296
PDF     15056   5.76       55,187,616   4.59       3665
TIF     211752  80.95      938,280,624  78.02      4431
[a] Total amount
[b] Total amount in percentage of all the files
[c] Total size in kilobytes
[d] Total size in percentage
[e] Average file size
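The derived columns of Table 1.2 can be recomputed from Table 1.1. The sketch below copies the counts and sizes from Table 1.1; the key DOT for the row whose label is missing is an assumption taken from the axis labels of Figures 1.1 and 1.2, and the function name is hypothetical.

```python
# (file count, total size in KB) for every row of Table 1.1.
TABLE_1_1 = {
    "BMP": (2, 128), "CGM": (21731, 202012492), "CSV": (7, 2340),
    "CTM": (2, 2680), "DB": (4, 24), "DOC": (1, 128), "DOCX": (2, 732),
    "DOT": (1, 103788),  # assumption: row label taken from the figure axes
    "DRA": (4495, 18112), "EMF": (62, 1013620), "GIF": (3, 1072),
    "JPG": (67, 361952), "LIC": (994, 3976), "LOG": (6, 4276),
    "META": (9, 53800), "PDF": (15056, 55187616), "PDS": (44, 312832),
    "PNG": (4, 16), "PPT": (6, 16), "SIF": (4364, 36580), "SQL": (1, 12),
    "TIF": (211752, 938280624), "TXT": (3, 16), "XLS": (100, 237284),
    "XLSX": (2, 708), "ZIP": (2856, 4905644),
}

TOTAL_COUNT = sum(count for count, _size in TABLE_1_1.values())
TOTAL_SIZE_KB = sum(size for _count, size in TABLE_1_1.values())


def major_format_row(fmt):
    """Return (AM, AM(%), TS(KB), TS(%), AFS(KB)) for one format."""
    count, size_kb = TABLE_1_1[fmt]
    return (
        count,
        round(100 * count / TOTAL_COUNT, 2),
        size_kb,
        round(100 * size_kb / TOTAL_SIZE_KB, 2),
        round(size_kb / count),
    )


for fmt in ("CGM", "PDF", "TIF"):
    print(fmt, *major_format_row(fmt))
```

The percentages are taken over all 26 rows of Table 1.1, not only over the three major formats.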
1.4 Vector Image and Raster Image
A "raster image" is a format in which an image’s information is stored as a grid
of pixels.
A "vector image" is a format in which an image is represented as a set of
descriptions of graphical elements such as dots and lines.
All the TIF images are raster images. CGM and PDF files can be either, but they
are mostly vector images in Shell’s well log database. The industry standard
for log images used to be the TIF format, and CGM is also very common because
it is produced by many well log processing applications. The PDF format, though
widely used for portable documents, does not perfectly match the well log
industry standard. The task of this project is to analyze the PDF and CGM
images Shell has and suggest potential optimizations.
1.5 Tasks Overview
The main tasks of this project are as follows:
• Investigate options to produce an automated process to manipulate image
formats, including conversion and optimization.
• Create a demo program that allows a user to select desired format and to
produce output via background conversion.
• Analyze and produce statistics on the efficiency of both formats through
a test on all the CGM and PDF files in the database.
Chapter 2
Conversion
Starting with interviews through MOC [1] with a set of petrophysicists, I got an
overview of the software that petrophysicists use with log images and the
issues they have with it. JustCGM is the most commonly used software,
especially for tasks like viewing CGM files or converting between CGM and PDF.
I therefore studied JustCGM thoroughly and created scripts that wrap its
functionality.
2.1 Interview with Petrophysicists
One of the reasons for this project is that there seemed to be a demand for
different formats of log images, so I reached out to log image users and asked
about the troubles they had with different formats. I found that most of the
problems came with plotting, and that the reason people need certain formats
is also plotting.
2.1.1 Issues with CGM
If one needs to plot entire logs with the Iterra Plotter, CGM is the only ideal
format. However, incorrect configuration of JustCGM and Iterra very often
causes plotting problems.
2.1.2 Issues with PDF
PDF files are desirable for viewing logs on mobile devices, or for printing a
small part of a log on normal printers. It is common for people to have trouble
with PDF files that have page breaks. This can be solved by JustCGM. Another
common problem with PDF is that software often crashes when viewing a large
PDF log.
[1] Microsoft Office Communicator
2.2 JustCGM
JustCGM is the software that Shell uses for multiple purposes with the CGM
format. This project used the following features of JustCGM 5.0.05:
• Conversion from PDF to CGM.
• Conversion from CGM to PDF.
• Merging pages of an image file.
• Plotting CGM logs on Iterra Plotter.
• Optimizing CGM files.
The software includes a main program that launches a graphical user interface,
where most of the functions can be accessed easily. There are also separate
executables that can be run with arguments on the command line.
2.2.1 JustCGM in Windows
In the Windows environment, the software can be installed on request through the
service section of Shell’s MyRequest website. The modules for the separate
functions are located in the /bin/ folder under the installation directory.
2.2.2 JustCGM in Linux
In the Linux environment, the software can be started by typing the command
justcgm, because the software has been installed on the server and the command
is resolved through an environment variable. However, the separate modules
cannot be called directly from the command line. One way of solving this is to
wrap each module in a script.
2.3 Module Wrapper Script
The modules used in this project are pdf2cgm, cgm2pdf and cgmop. Their wrapper
scripts are modified from the main program launcher, which sets the license
environment first and then calls the executable.
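The wrapper pattern (set the license environment, then call the executable) can be sketched as follows. The project's wrappers are scripts modified from the JustCGM launcher; this Python version is only an illustration. The variable name `JUSTCGM_LICENSE`, its value and the argument order are assumptions made for the example, and the binary directory is the example path from Section 1.1. The executable name and flag in the usage line are the ones listed for pdf2cgm in Chapter 6.

```python
import os
import posixpath
import subprocess

JUSTCGM_BIN = "/apps/3rdparty/justcgm/bin"  # example path from Section 1.1
LICENSE_VAR = "JUSTCGM_LICENSE"             # hypothetical variable name
LICENSE_VALUE = "/path/to/license"          # hypothetical placeholder value


def build_call(executable, args, input_file, base_env=None):
    """Return (argv, env) for one module call without running anything."""
    env = dict(os.environ if base_env is None else base_env)
    env[LICENSE_VAR] = LICENSE_VALUE  # step 1: set the license environment
    # step 2: the executable, its flags, then the file (order is an assumption)
    argv = [posixpath.join(JUSTCGM_BIN, executable), *args, input_file]
    return argv, env


def run_module(executable, args, input_file):
    """Run the module and return its exit code."""
    argv, env = build_call(executable, args, input_file)
    return subprocess.run(argv, env=env, check=False).returncode


argv, env = build_call("ps2cgm", ["-pjoin=vert"], "example.pdf", base_env={})
print(argv)
```

Keeping `build_call` separate from `run_module` makes it possible to inspect the command line without JustCGM installed.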
Chapter 3
Conversion Demo Program
In order to demonstrate the possibility of performing on-demand conversion of
log images, a demo program, logConversion.py, was written in Python [1]. The
program executes the following sequence:
1. Prompts the user for a valid UWI number
• If UWI is invalid, prompts the user with suggested UWI and then
prompts the user to try again.
• If UWI is valid, go to 2.
2. Displays a list of valid log images associated with numbers and prompts
the user to select a number.
• If number exceeds the maximum range, prompts the user to select
again.
• If user enters a non-digit, prompts the user to select a number.
• If the selection is valid, go to 3.
3. Prompts the user to choose from (convert CGM to PDF/convert PDF to
CGM/just download the file).
• If the user chooses 1, convert the file and go to 4.
• If the user chooses 2, convert the file and go to 4.
• If the user chooses 3, copy the file and go to 4.
• If the user selects none of the above, asks the user to choose again.
4. Asks the user whether to continue or exit.
• If continue, go to 2.
• If exit, then exit.
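The sequence above can be sketched as a small loop. This is not the source of logConversion.py; the four callables (`ask`, `list_images`, `convert`, `copy`) are hypothetical stand-ins for prompting, UWI lookup, conversion and file copy, and the suggested-UWI hint of step 1 is left out for brevity.

```python
def run_demo(ask, list_images, convert, copy):
    """Drive the prompt sequence; every callable is a hypothetical stand-in."""
    # Step 1: keep asking until the UWI resolves to a list of log images.
    while True:
        images = list_images(ask("UWI: "))
        if images is not None:
            break
    while True:
        # Step 2: select an image by number, asking again on bad input.
        choice = ask("Select a number: ")
        if not choice.isdigit() or not 1 <= int(choice) <= len(images):
            continue
        image = images[int(choice) - 1]
        # Step 3: choose a conversion or a plain download.
        while True:
            action = ask("1) CGM to PDF  2) PDF to CGM  3) download: ")
            if action in ("1", "2"):
                convert(image, action)
                break
            if action == "3":
                copy(image)
                break
        # Step 4: continue with another image or exit.
        if ask("Continue? (y/n): ").lower() != "y":
            return


# Scripted run: the answers replace keyboard input.
answers = iter(["bad", "42127338430200", "x", "9", "1", "7", "1", "n"])
actions = []
run_demo(
    ask=lambda _prompt: next(answers),
    list_images=lambda uwi: ["a.cgm", "b.pdf"] if uwi == "42127338430200" else None,
    convert=lambda image, action: actions.append(("convert", image, action)),
    copy=lambda image: actions.append(("copy", image)),
)
print(actions)
```

Passing the prompts and actions in as functions keeps the control flow testable without a terminal or JustCGM.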
[1] Python is a programming language.
4.1 Conversion Test
4.1.1 CGM Conversion Test
The test was conducted using the script cgmOptTest.sh and produced the raw test
result cgmOpResult.txt. The test results show the performance of CGM
optimization and conversion for all the CGM log images, 21617 files in total.
Each record is separated by commas into 9 fields, which indicate:
1. UWI
2. filename
3. original size
4. time of optimization
5. size after optimization
6. time of conversion from CGM to PDF
7. converted PDF size
8. time of conversion from PDF to CGM
9. converted CGM size
The raw TXT test result was also converted to an Excel workbook,
cgmOpReport.xlsx, and two more columns were added to evaluate space efficiency:
• ratio of optimized size to original size
• ratio of converted CGM size to original size
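A record of the raw result and its two extra columns can be sketched as below. The field order follows the list of 9 fields above; the sample line is made up, and the units (kilobytes and seconds) are an assumption based on Table 4.1.

```python
import csv
from typing import NamedTuple


class CgmRecord(NamedTuple):
    uwi: str
    filename: str
    original_kb: float
    optimize_s: float
    optimized_kb: float
    cgm2pdf_s: float
    pdf_kb: float
    pdf2cgm_s: float
    converted_cgm_kb: float

    @property
    def optimize_ratio(self):
        """Optimized size divided by original size."""
        return self.optimized_kb / self.original_kb

    @property
    def conversion_ratio(self):
        """Converted CGM size divided by original size."""
        return self.converted_cgm_kb / self.original_kb


def parse_records(lines):
    """Yield one CgmRecord per well-formed line; skip lines without 9 fields."""
    for row in csv.reader(lines):
        if len(row) == 9:
            yield CgmRecord(row[0], row[1], *map(float, row[2:]))


# Made-up sample line, not taken from cgmOpResult.txt.
sample = ["42127338430200,example.cgm,1000,1,900,2,400,30,1500"]
record = next(parse_records(sample))
print(record.optimize_ratio, record.conversion_ratio)
```

A ratio below 1 means the step saved space; a ratio above 1 means the file grew.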
Table 4.1 shows the overall statistics of the CGM conversion test.
Table 4.1: CGM Statistics Overview
FIELD     AVERAGE  MEDIAN  SUM
ORS [a]   8596     2636    183,332,064
OPT [b]   1.13     1       24,154
OPS [c]   7822     2404    166,824,516
CTCP [d]  2.12     1       45,307
CPS [e]   3864     808     82,405,300
CTPC [f]  43.87    25      935,723
CCS [g]   13440    8816    286,642,684
OPR [h]   0.9
CVR [i]   1.5
[a] original size (KB)
[b] optimization time (s)
[c] optimized size (KB)
[d] conversion time CGM→PDF (s)
[e] converted PDF size (KB)
[f] conversion time PDF→CGM (s)
[g] converted CGM size (KB)
[h] optimize ratio
[i] conversion ratio
4.1.2 PDF Conversion Test
The test was conducted using the script pdf2cgmTest.sh and produced the raw
test result pdf2cgmResult.txt. The test results show the performance of
PDF conversion for all the PDF log images, 21617 files in total. Each record is
separated by commas into 5 fields, which indicate:
1. UWI
2. filename
3. original size
4. time of conversion
5. converted size
The raw TXT test result was also converted to an Excel workbook,
pdf2cgmReport.xlsx, and two more columns were added to evaluate space efficiency:
• conversion ratio (converted size/original size ratio)
• Optimized size (minimum of converted and original size)
Table 4.2: PDF Statistics Overview
FIELD     AVERAGE  MEDIAN  SUM
ORS [a]   3640     1016    53,291,644
CTPC [b]  39       23      574,767
CCS [c]   11641    5952    170,414,140
CVR [d]   8.6      6.1     3.2
[a] original size (KB)
[b] conversion time PDF→CGM (s)
[c] converted CGM size (KB)
[d] conversion ratio
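The CVR row of Table 4.2 mixes two kinds of ratio: the AVERAGE and MEDIAN entries are taken over the per-file ratios, while the entry under SUM matches the ratio of the summed sizes (170,414,140 / 53,291,644 ≈ 3.2). The sketch below shows the difference on made-up numbers; the function name is hypothetical.

```python
from statistics import mean, median


def conversion_ratios(records):
    """records: iterable of (original size, converted size) pairs in KB."""
    records = list(records)
    per_file = [converted / original for original, converted in records]
    overall = sum(c for _o, c in records) / sum(o for o, _c in records)
    return mean(per_file), median(per_file), overall


# Made-up sizes: one small file that grows tenfold dominates the average.
sample = [(10, 100), (100, 200), (1000, 1500)]
average, middle, overall = conversion_ratios(sample)
print(average, middle, round(overall, 2))
```

Small files that grow a lot pull the average of the per-file ratios up, while the ratio of the sums is dominated by the large files. This is consistent with the observation in Section 4.5.2 that the files growing the most are mostly small.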
4.2 Time Efficiency
Figure 4.1: Time Efficiency Overview (bar chart; x-axis: Time Range (s) in the
bins "less than 10", "10 to 60" and "over 60"; y-axis: Percentage; series: CGM
Optimization, CGM2PDF, PDF2CGM)
The figure shows that the majority of CGM files take less than 10 seconds to
convert or optimize, whereas PDF conversion is much slower.
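The percentages in Figure 4.1 come from sorting each measured time into one of three bins. A minimal sketch of that bucketing is shown below; the handling of the exact boundaries (10 and 60 seconds) is an assumption, and the times are made up.

```python
from collections import Counter

BINS = ("less than 10", "10 to 60", "over 60")


def time_bin(seconds):
    """Map a duration in seconds to the bin labels used in Figure 4.1."""
    if seconds < 10:
        return BINS[0]
    if seconds <= 60:  # assumption: exactly 60 s counts as "10 to 60"
        return BINS[1]
    return BINS[2]


def bin_percentages(times):
    """Return the share of each bin in percent."""
    times = list(times)
    counts = Counter(time_bin(t) for t in times)
    return {label: 100 * counts[label] / len(times) for label in BINS}


shares = bin_percentages([1, 1, 25, 61])  # made-up times in seconds
print(shares)
```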
4.3 Space Efficiency
Figure 4.2: Space Efficiency Overview (bar chart; x-axis: Size Change (%) in the
bins "less than 10", "10 to 90", "90 to 120" and "120 to 1000"; y-axis:
Percentage; series: CGM Optimization, CGM2PDF, PDF2CGM)
According to the test results, CGM optimization can save a lot of space in most
cases. CGM2PDF and PDF2CGM conversion saved space in some cases, kept about the
same size in the majority of cases, and increased the size in others. PDF2CGM
performs the worst: in 28 percent of the cases it increased the file size to
more than 120 percent of the original. If, for each file, we choose to optimize,
convert or do nothing according to which option gives the smallest size, the
potential space that can be saved is as shown in Figure 4.3.
Figure 4.3: Potential Optimizable Space (bar chart; x-axis: Amount of Space Can
Be Saved (%), ticks from 20 to 40; y-axis: Format, with bars for CGM and PDF)
CGM files can achieve a saving of up to 42 percent, and PDF files can achieve
a saving of up to 18 percent.
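The "smallest of the options" rule behind the potential saving can be sketched as follows. The rows are made up, the function names are hypothetical, and treating a size of zero as a failed (blank) output that must not be chosen is an assumption based on Section 4.5.1.

```python
def best_size(original_kb, *alternative_sizes_kb):
    """Smallest of the original size and every usable alternative size."""
    usable = [size for size in alternative_sizes_kb if size > 0]
    return min([original_kb, *usable])


def potential_saving_percent(rows):
    """rows: (original, optimized, converted, ...) sizes in KB per file."""
    rows = list(rows)
    original_total = sum(row[0] for row in rows)
    best_total = sum(best_size(*row) for row in rows)
    return 100 * (original_total - best_total) / original_total


rows = [
    (1000, 900, 400),  # converting gives the smallest file
    (200, 210, 600),   # doing nothing is best
    (500, 0, 0),       # blank outputs are ignored, the original is kept
]
print(round(potential_saving_percent(rows), 2))
```

Because the original size is always a candidate, the computed saving can never be negative.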
4.4 Graphic Integrity
Log images are usually printed from CGM files using JustCGM on an Iterra
Plotter. A set of CGM images was plotted and mostly maintained the same
quality; however, a few relatively insignificant deficiencies in aspects such
as line transparency were noted.
4.5 Problems
4.5.1 Output Error
There were a few conversions in each test that failed to produce correct results
and gave blank outputs. File naming caused some of the problems, because spaces
and backslashes in Windows filenames cause problems in the Linux environment.
Others were caused by JustCGM failures; updating JustCGM solved some of
those.
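One common way to avoid the filename problem is to quote every path before it reaches the shell, or to bypass the shell altogether. The sketch below uses Python's standard `shlex.quote` and `subprocess.run`; it is not how the project's scripts handled the issue, and the filename is made up.

```python
import shlex
import subprocess


def quoted_command(executable, path):
    """Command line as a string that is safe to paste into a shell script."""
    return f"{shlex.quote(executable)} {shlex.quote(path)}"


def run_without_shell(executable, path):
    """Pass the path as a single argument so that no shell parses it."""
    return subprocess.run([executable, path], check=False)


# Made-up Windows-style name containing spaces and a backslash.
command = quoted_command("cgm2pdf", r"GR log\run 1.cgm")
print(command)
```

With the list form used in `run_without_shell`, spaces and backslashes need no quoting at all because the name is never split into words.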
4.5.2 Bad Conversion Performance
There are files that grow to more than 10 times the original size after
conversion, and files that take more than a minute to convert. The files that
are slow to convert are very often huge image log [1] files. The files that
grow much bigger are mostly small files with text. Through this, I also found
that many files such as reports and directional surveys are mixed in with the
logs. I also found many corrupted CGM files, but they are hard to identify from
the test results.
4.6 Summary
Compared with the PDF format, the CGM format converts faster and is more
amenable to optimization. The potential space saving on CGM files is 42 percent.
PDF performs worse than CGM: it takes longer to convert and can easily become
much bigger after conversion. The space saving achievable on PDF files is
18 percent.
[1] Image log is a kind of log that has rich color and complex shape.
Chapter 5
Scripting Flowchart
Figure 5.1: Flowchart of Scripts and Outputs (flowchart; recoverable nodes: Linux
Well Database; ListAllLogFiles.sh → listOfAllLogFiles.txt; sortByFormat.sh →
listOfLogFilesSorted.txt; extractFormat.sh → Format list.txt, including
PDFList.txt and CGMList.txt; listAllAttr.sh and listAttr.sh → Format ls.txt;
calAllSize.sh and calSize.sh; pdf2cgmTest.sh → pdf2cgmResult.txt →
pdf2cgmReport.xlsx; cgmOptTest.sh → cgmOpResult.txt → cgmOpReport.xlsx)
Chapter 6
Scripts and Files
All the related files should be placed in the same directory as this report. The
directory tree structure is shown in Figure 6.1. Each file is either commented
in Figure 6.1 or individually explained.
cgm2pdf <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable fscgmint. It converts the CGM file
<CGM file>.cgm and generates the PDF file <CGM file>.cgm.pdf.
cgm2tif <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable fscgmint. It converts the CGM file
<CGM file>.cgm and generates the TIF file <CGM file>.cgm.tiff.
cgmOpTest.sh <input file> <output file>
This is the script that runs the conversion test on all the CGM files. It takes
an input file that has the absolute path to a CGM file on each line, copies each
file to the /tmp/ directory, performs the conversion test, records the times
and sizes, and removes all the files generated in the /tmp/ directory.
The input file used in this project is CGMList.txt and the output file
produced is cgmOpResult.txt.
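The copy, convert, measure and clean-up cycle described above can be sketched in Python. This is not the content of cgmOpTest.sh: each step is a hypothetical callable that takes a path and returns the path of its output, standing in for a wrapper script, and every step here runs on the same scratch copy.

```python
import os
import shutil
import tempfile
import time


def timed_step(step, path):
    """Run one step on path; return (elapsed seconds, output size in KB)."""
    start = time.perf_counter()
    output = step(path)
    elapsed = time.perf_counter() - start
    exists = bool(output) and os.path.exists(output)
    return elapsed, (os.path.getsize(output) / 1024 if exists else 0.0)


def measure_file(source_path, steps):
    """Copy one file to a scratch directory, run every step, then clean up."""
    workdir = tempfile.mkdtemp()  # under /tmp on a typical Linux system
    try:
        local = shutil.copy(source_path, workdir)
        row = [os.path.basename(source_path), os.path.getsize(local) / 1024]
        for step in steps:
            row.extend(timed_step(step, local))
        return row
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


def fake_convert(path):
    """Hypothetical stand-in for a wrapper: writes a file twice as large."""
    output = path + ".out"
    with open(path, "rb") as src, open(output, "wb") as dst:
        dst.write(src.read() * 2)
    return output


with tempfile.TemporaryDirectory() as demo_dir:
    sample = os.path.join(demo_dir, "example.cgm")
    with open(sample, "wb") as handle:
        handle.write(b"\0" * 2048)
    row = measure_file(sample, [fake_convert])
    print(row)
```

A failed step that produces no file is recorded with size 0, which mirrors the blank outputs noted in Section 4.5.1.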
cgms <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable justtiff with the argument -tcgm. It
optimizes the CGM file <CGM file>.cgm and generates the optimized CGM file
<CGM file>X.cgm.
extractFormat.sh <format>
This script extracts the list of all the files with the given format. The outputs
are redirected to the <format> list.txt files in the project.
pdf2cgm <PDF file>
Called with a PDF file as its argument, it configures the JustCGM licensing
environment and calls the executable ps2cgm with the argument -pjoin=vert,
which joins the pages. It converts the PDF file <PDF file>.pdf and generates
the CGM file <PDF file>.pdf.cgm.
pdf2cgmTest.sh <input file> <output file>
This is the script that runs the conversion test on all the PDF files. It takes
an input file that has the absolute path to a PDF file on each line, copies each
file to the /tmp/ directory, performs the conversion test, records the times
and sizes, and removes all the files generated in the /tmp/ directory.
The input file used in this project is PDFList.txt and the output file produced
is pdf2cgmResult.txt.
Figure 6.1: Directory Tree
/
  calAllSize.sh             Script calculating the total size of all formats
  calSize.sh                Script calculating the total size of each format
  cgm2pdf
  cgm2tif
  CGMList.txt               List of all CGM files
  cgmOpReport.xlsx          Excel format CGM test results
  cgmOpResult.txt           Raw result of the CGM test
  cgmOpTest.sh
  cgms
  extractFormat.sh
  formats.txt               All the formats under /logs/ directories
  formatsStats.csv          Statistics of formats in CSV format
  formatsStats.txt          Statistics of formats in text
  listAllAttr.sh            Script calling listAttr.sh with each format
  listAttr.sh               Script that outputs information to format ls.txt
  listAllLogFile.sh         Script to get a complete list of files
  listOfAllLogFiles.txt     All the files under /logs/ directories
  listOfLogFilesSorted.txt  All the files sorted by extensions
  logConversion.py          The Python program discussed in Chapter 3
  pdf2cgm
  pdf2cgmReport.xlsx        Excel format PDF test results
  pdf2cgmResult.txt         Raw result of the PDF test
  pdf2cgmTest.sh
  PDFList.txt               List of all PDF files
  report.pdf                The report
  Lists                     The folder containing the lists of each format
    BMP List.txt
    BMP ls.txt
    ...
    ...
    ZIP List.txt
    ZIP ls.txt
Chapter 7
Conclusion and Suggestion
7.1 JustCGM
JustCGM has a variety of features for dealing with CGM files. It has an active
support team and receives regular updates. An update installed for JustCGM in
Linux during this project solved many conversion failures. There are still CGM
users who do not know many of JustCGM’s features. I suggest that detailed
instructions covering most of the features of JustCGM be delivered when
someone requests the software.
7.2 Plotting
Plotting with JustCGM as the software and the Iterra Plotter as the hardware is
tricky, since a few of the steps are very confusing. Instructions for
configuring the plotting environment are not easily accessible.
7.3 Format Preference
Keeping every log image in CGM format is the best choice for several reasons.
• CGM files are faster to convert to other formats
• CGM files are easier to optimize in size
• When a log is huge, CGM viewed in JustCGM performs the best
• CGM is the ideal format for plotting
7.4 Embedding Conversion
There is a need for multiple available formats, but maintaining multiple formats
of each file is not space efficient. In this project, conversion from CGM to
other formats proved to be fast enough, so it is possible to provide downloads
of the requested format through background conversion.
JustCGM is capable of converting most of the CGM files in the Linux
environment; determining which server-side software should perform the
conversion may need further research.
7.5 Log Image Standard
Vector images that look very similar can differ hugely in performance.
Receiving good images is crucial to maintenance. The current set of log images
in Shell’s well database is very diverse. The log images were produced in
different eras by different logging and service companies using different
equipment and software. It is very hard to track down the causes of bad log
image files.
To create a standard for receiving log images, it would be better to find out
what software is used to generate the log images and to require log providers
to use certain software.
7.6 Optimization and Graphic Identification
To optimize the log image database to the full extent, more precise graphic
identification is needed. During the testing phase, files other than logs, such
as reports and directional surveys, were found mixed in the log directories.
Badly corrupted CGM files were also found. To identify problematic files in the
database, we can either manually check each file or use software to detect
those issues. However, manual checking requires much time, and no effective
software has been found so far.
I suggest checking a selection of files based on the Excel test report. Files
with an extremely high or low ratio after conversion, or with an extremely long
conversion time, are usually highly likely to have issues. A thorough check on
those files may save a large amount of space.
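The selection suggested here could be automated along the following lines. The thresholds of 10 times the original size and 60 seconds are borrowed from Section 4.5.2; the lower bound of 0.1 is an arbitrary assumption, and the rows and function name are made up for the example.

```python
def flag_suspicious(rows, high_ratio=10.0, low_ratio=0.1, slow_seconds=60.0):
    """rows: (filename, size ratio after conversion, conversion time in s)."""
    flagged = []
    for filename, ratio, seconds in rows:
        reasons = []
        if ratio >= high_ratio:
            reasons.append("size ratio very high")
        elif ratio <= low_ratio:  # includes blank outputs with ratio 0
            reasons.append("size ratio very low")
        if seconds >= slow_seconds:
            reasons.append("conversion very slow")
        if reasons:
            flagged.append((filename, reasons))
    return flagged


# Made-up rows, not taken from the test reports.
rows = [("a.cgm", 1.1, 3), ("b.pdf", 14.0, 75), ("c.cgm", 0.02, 1)]
flagged = flag_suspicious(rows)
print(flagged)
```

The flagged list is only a shortlist for manual inspection; it does not decide by itself whether a file is corrupted or misfiled.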
An optimization process that optimizes files to a smaller size and removes
problematic files can be developed based on the scripts used in this project,
but that requires business decisions with more detailed goals and standards.