An analysis on log images
John Zhu
A report submitted for summer internship project
Well Data Management
Shell Exploration & Production Company
Summer 2013
Abstract
Well logs are records of geologic formations produced during the different
phases of a well. Well logs stored in graphical formats are called log images.
This report describes a project carried out by John Zhu during summer 2013
that analyzed the log images Shell currently holds in its well database. The
project experimented with a scripting approach under the Linux environment to
monitor and manage the log image files in the database, performed conversion
tests on about 30,000 files, analyzed potential optimization solutions, and
created scripts and programs for certain tasks on log image files.
Contents
1 Overview
1.1 Notations
1.2 Directory Path
1.3 File Distribution Statistics
1.4 Vector Image and Raster Image
1.5 Tasks Overview
2 Conversion
2.1 Interview with Petrophysicists
2.1.1 Issues with CGM
2.1.2 Issues with PDF
2.2 JustCGM
2.2.1 JustCGM in Windows
2.2.2 JustCGM in Linux
2.3 Module Wrapper Script
3 Conversion Demo Program
4 Conversion Test
4.1 Conversion Test
4.1.1 CGM Conversion Test
4.1.2 PDF Conversion Test
4.2 Time Efficiency
4.3 Space Efficiency
4.4 Graphic Integrity
4.5 Problems
4.5.1 Output Error
4.5.2 Bad Conversion Performance
4.6 Summary
5 Scripting Flowchart
6 Scripts and Files
7 Conclusion and Suggestion
7.1 JustCGM
7.2 Plotting
7.3 Format Preference
7.4 Embedding conversion
7.5 Log Image Standard
7.6 Optimization and Graphic Identification
List of Figures
1.1 Format distribution in amount
1.2 Format distribution in size
1.3 Three Major Formats in a pie
4.1 Time Efficiency Overview
4.2 Space Efficiency Overview
4.3 Potential Optimizable Space
5.1 Flowchart of Scripts and Outputs
6.1 Directory Tree
List of Tables
1.1 Overall format distribution
1.2 Three major formats statistics
4.1 CGM Statistics Overview
4.2 PDF Statistics Overview
Chapter 1
Overview
1.1 Notations
TYPE        EXAMPLE                        MEANING
bold        script.sh, files.txt           this is a file
italic      text                           this is a substitution
typewriter  /apps/3rdparty/justcgm/bin/    same text in commandline
1.2 Directory Path
The Linux path to the well data directory is:
/glb/am/sepco/data/epw well/US
The path to each well's directory is derived from its UWI (Unique Well
Identifier) and has the format:
/<State Code>/<County Code>/<UWI>/
For example, a well with UWI "42127338430200" has its directory located at
path:
/glb/am/sepco/data/epw well/US/42/127/42127338430200
and all its log images are located in subdirectories such as:
/glb/am/sepco/data/epw well/US/42/127/42127338430200/logs/
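The path rule above can be sketched in Python. This is a minimal illustration, not part of the project's scripts: the function name is hypothetical, the root path is copied as printed, and the split of the UWI into a two-digit state code and a three-digit county code is an assumption inferred from the single example given.

```python
from pathlib import PurePosixPath

# Root of the well data directory, copied as printed in Section 1.2.
WELL_DATA_ROOT = PurePosixPath("/glb/am/sepco/data/epw well/US")


def well_log_dir(uwi: str, root: PurePosixPath = WELL_DATA_ROOT) -> PurePosixPath:
    """Return the logs directory for a UWI.

    Assumption (inferred from the one example in the text): the state code
    is the first two digits and the county code is the next three digits.
    """
    if not (uwi.isdecimal() and len(uwi) >= 5):
        raise ValueError(f"not a plausible UWI: {uwi!r}")
    state_code, county_code = uwi[:2], uwi[2:5]
    return root / state_code / county_code / uwi / "logs"


print(well_log_dir("42127338430200"))
```

For the example UWI this reproduces the /42/127/42127338430200/logs location shown above.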
1.3 File Distribution Statistics
A complete list of all the files under the /logs/ directories was first
generated with the script ListAllLogFiles.sh. The list was then sorted with
the script sortByFormat.sh, and separate lists and counts for each format were
produced with extractFormat.sh. Finally, file information was extracted and
the total size of each format was summed using a combination of scripts:
calAllSize.sh, calSize.sh, listAllAttr.sh and listAttr.sh. The statistics
are shown below.
Table 1.1: Overall format distribution
FORMAT  AMOUNT     SIZE(KB)
BMP          2          128
CGM      21731    202012492
CSV          7         2340
CTM          2         2680
DB           4           24
DOC          1          128
DOCX         2          732
DOT          1       103788
DRA       4495        18112
EMF         62      1013620
GIF          3         1072
JPG         67       361952
LIC        994         3976
LOG          6         4276
META         9        53800
PDF      15056     55187616
PDS         44       312832
PNG          4           16
PPT          6           16
SIF       4364        36580
SQL          1           12
TIF     211752    938280624
TXT          3           16
XLS        100       237284
XLSX         2          708
ZIP       2856      4905644
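The counting and summing step can be sketched as follows. This is an illustration in Python rather than the project's shell scripts; the function name, the (path, size) input shape and the sample entries are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def tally_by_format(entries):
    """Count files and sum sizes (KB) per upper-cased file extension.

    `entries` is an iterable of (path, size_kb) pairs; this input shape is
    an assumption for illustration, not the format of the project's lists.
    """
    stats = defaultdict(lambda: [0, 0])  # extension -> [count, total KB]
    for path, size_kb in entries:
        ext = PurePosixPath(path).suffix.lstrip(".").upper() or "(none)"
        stats[ext][0] += 1
        stats[ext][1] += size_kb
    return {ext: tuple(values) for ext, values in sorted(stats.items())}


# Hypothetical sample entries, not real database contents.
sample = [
    ("/US/42/127/42127338430200/logs/run1.cgm", 9000),
    ("/US/42/127/42127338430200/logs/run1.pdf", 3000),
    ("/US/42/127/42127338430200/logs/run2.CGM", 1000),
]
print(tally_by_format(sample))
```

Upper-casing the extension merges files such as run1.cgm and run2.CGM into one format.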
Figure 1.1: Format distribution in amount
[Bar chart. x-axis: FORMATS (BMP to ZIP, as listed in Table 1.1); y-axis: AMOUNT (scale ·10^5).]
Figure 1.2: Format distribution in size
[Bar chart. x-axis: FORMATS (BMP to ZIP, as listed in Table 1.1); y-axis: Size(KB) (scale ·10^9).]
The results show that the three major formats are TIF, CGM and PDF.
Although Table 1.1 lists 13,035 files in formats other than these three, their
total size is less than 10 gigabytes. Most of them exist for historical reasons,
and we no longer receive many files in those formats.
A summary of the amount and size of the three major formats is shown below.
Figure 1.3: Three Major Formats in a pie
[Pie chart. TIF 78.02%, CGM 16.8%, PDF 4.59%.]
Table 1.2: Three major formats statistics
FORMAT  AM [a]  AM(%) [b]   TS(KB) [c]  TS(%) [d]  AFS(KB) [e]
CGM      21731       8.31  202,012,492      16.80         9296
PDF      15056       5.76   55,187,616       4.59         3665
TIF     211752      80.95  938,280,624      78.02         4431
[a] Total amount
[b] Total amount in percentage of all the files
[c] Total size in kilobytes
[d] Total size in percentage
[e] Average file size
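The figures in Table 1.2 follow arithmetically from Table 1.1. The sketch below recomputes them; the dictionary simply restates Table 1.1 (with the unnamed row taken to be DOT, as in the figure axis labels), and the function name is hypothetical.

```python
# (amount, size in KB) per format, copied from Table 1.1.
TABLE_1_1 = {
    "BMP": (2, 128), "CGM": (21731, 202012492), "CSV": (7, 2340),
    "CTM": (2, 2680), "DB": (4, 24), "DOC": (1, 128), "DOCX": (2, 732),
    "DOT": (1, 103788), "DRA": (4495, 18112), "EMF": (62, 1013620),
    "GIF": (3, 1072), "JPG": (67, 361952), "LIC": (994, 3976),
    "LOG": (6, 4276), "META": (9, 53800), "PDF": (15056, 55187616),
    "PDS": (44, 312832), "PNG": (4, 16), "PPT": (6, 16),
    "SIF": (4364, 36580), "SQL": (1, 12), "TIF": (211752, 938280624),
    "TXT": (3, 16), "XLS": (100, 237284), "XLSX": (2, 708),
    "ZIP": (2856, 4905644),
}


def summarize(fmt, table=TABLE_1_1):
    """Return the share of files, the share of size and the average size."""
    total_files = sum(amount for amount, _ in table.values())
    total_kb = sum(size_kb for _, size_kb in table.values())
    amount, size_kb = table[fmt]
    return {
        "AM(%)": round(100 * amount / total_files, 2),
        "TS(%)": round(100 * size_kb / total_kb, 2),
        "AFS(KB)": round(size_kb / amount),
    }


for fmt in ("CGM", "PDF", "TIF"):
    print(fmt, summarize(fmt))
```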
1.4 Vector Image and Raster Image
A "raster image" is a format in which an image's information is stored as a
grid of pixels.
A "vector image" is a format in which an image is represented as a set of
descriptions of graphical elements such as dots and lines.
All the TIF images are raster images. CGM and PDF files can be either, but they
are mostly vector images in Shell's well log database. The industry standard
for log images used to be the TIF format, and CGM is also very common because
it is produced by many well log processing applications. The PDF format, though
widely used for portable documents, does not perfectly match the well log
industry standard. The task of this project is to analyze the PDF and CGM
images Shell has and to suggest potential optimizations.
1.5 Tasks Overview
The main tasks of this project are as follows:
• Investigate options to produce an automated process to manipulate image
formats, including conversion and optimization.
• Create a demo program that allows a user to select desired format and to
produce output via background conversion.
• Analyze and produce statistics on the efficiency of both formats through
a test on all the CGM and PDF files in the database.
Chapter 2
Conversion
Starting with interviews with a set of petrophysicists, conducted through
Microsoft Office Communicator (MOC), I got an overview of the software that
petrophysicists use with log images and the issues they have with it. JustCGM
is the most commonly used software, especially for tasks like viewing CGM files
or converting between CGM and PDF. I therefore studied JustCGM thoroughly and
created scripts that wrap its functionality.
2.1 Interview with Petrophysicists
One of the reasons for this project is that there seemed to be a demand for
different formats of log images, so I reached out to log image users and asked
about the troubles they had with different formats. I found that most of the
problems came from plotting, and that plotting is also the reason people need
certain formats.
2.1.1 Issues with CGM
If one needs to plot entire logs with the Iterra Plotter, CGM is the only ideal
format. However, incorrect configuration of JustCGM and Iterra very often
causes trouble with plotting.
2.1.2 Issues with PDF
PDF files are desirable for viewing logs on mobile devices or for printing a
small part of a log on normal printers. It is common for people to have trouble
with PDF files that have page breaks; this can be solved with JustCGM. Another
common problem with PDF is that software often crashes when viewing a large
PDF log.
2.2 JustCGM
JustCGM is the software that Shell uses for multiple purposes with the CGM
format. This project used the following features of JustCGM 5.0.05:
• Conversion from PDF to CGM.
• Conversion from CGM to PDF.
• Merging pages of an image file.
• Plotting CGM logs on Iterra Plotter.
• Optimizing CGM files.
The software module contains a main program that launches the graphical user
interface, where most of the functions can be accessed easily. There are also
separate executables that can be run with arguments on the command line.
2.2.1 JustCGM in Windows
In the Windows environment, the software can be installed by request in the
service section of Shell's MyRequest website. The modules for the separate
functionalities are located in the /bin/ folder under the installation directory.
2.2.2 JustCGM in Linux
In the Linux environment, the software is already installed on the server and
can be started by typing the command justcgm, which is resolved through an
environment variable. However, the separate modules cannot be called directly
from the command line. One way to solve this is to wrap each module in a script.
2.3 Module Wrapper Script
The modules used in this project are pdf2cgm, cgm2pdf and cgmop. Their wrapper
scripts are modified from the main program launcher, which sets the license
environment first and then calls the executable.
Chapter 3
Conversion Demo Program
To demonstrate the possibility of performing on-demand conversion of log
images, a demo program, logConversion.py, was written in the Python
programming language. The program executes the following sequence:
1. Prompts the user for a valid UWI number.
• If the UWI is invalid, shows the user a suggested UWI and then
prompts the user to try again.
• If the UWI is valid, goes to 2.
2. Displays a numbered list of the valid log images and prompts the user to
select a number.
• If the number exceeds the maximum range, prompts the user to select
again.
• If the user enters a non-digit, prompts the user to select a number.
• If the selection is valid, goes to 3.
3. Prompts the user to choose from three numbered options: (1) convert CGM to
PDF, (2) convert PDF to CGM, (3) just download the file.
• If the user chooses 1, converts the file and goes to 4.
• If the user chooses 2, converts the file and goes to 4.
• If the user chooses 3, copies the file and goes to 4.
• If the user selects none of the above, asks the user to choose again.
4. Asks the user whether to continue or exit.
• If continue, goes to 2.
• If exit, exits.
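Steps 2 to 4 of the sequence above can be sketched as a small loop. This is not the code of logConversion.py: the function and parameter names are hypothetical, UWI validation is omitted, and the prompts are invented for illustration. Input is injected through a callable so the flow can be exercised without a terminal.

```python
def run_demo(ask, images, act):
    """Sketch of steps 2 to 4 of the sequence; UWI validation is omitted.

    ask(prompt) returns the user's reply, images is the list of file names,
    and act(choice, image) performs the conversion or copy. All three are
    hypothetical stand-ins, not the interface of logConversion.py.
    """
    while True:
        # Step 2: select a log image by number.
        reply = ask(f"Select a file (1-{len(images)}): ")
        if not reply.isdecimal() or not 1 <= int(reply) <= len(images):
            continue  # non-digit or out of range: prompt again
        image = images[int(reply) - 1]
        # Step 3: choose an action, repeating until the choice is valid.
        choice = ask("1) CGM to PDF  2) PDF to CGM  3) download: ")
        while choice not in ("1", "2", "3"):
            choice = ask("Please choose 1, 2 or 3: ")
        act(choice, image)
        # Step 4: continue with another file or exit.
        if ask("Continue? (y/n): ").lower() != "y":
            return


# Scripted replies instead of interactive input, for illustration.
replies = iter(["x", "9", "2", "5", "1", "n"])
done = []
run_demo(lambda prompt: next(replies), ["a.cgm", "b.cgm"],
         lambda choice, image: done.append((choice, image)))
print(done)
```

In interactive use, the built-in input function could be passed as `ask`.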
Chapter 4
Conversion Test
4.1 Conversion Test
4.1.1 CGM Conversion Test
The test was conducted using the script cgmOptTest.sh and produced the raw test
result cgmOpResult.txt. The test results show the performance of CGM
optimization and conversion for all the CGM log images, 21617 files in total.
Each record consists of 9 comma-separated fields:
1. UWI
2. filename
3. original size
4. time of optimization
5. size after optimization
6. time of conversion from CGM to PDF
7. converted PDF size
8. time of conversion from PDF to CGM
9. converted CGM size
The raw TXT test result was also converted to an Excel workbook,
cgmOpReport.xlsx, and two more columns were added to evaluate space efficiency:
• ratio of optimized size to original size
• ratio of converted CGM size to original size
Table 4.1 shows the overall statistics of the CGM conversion test.
Table 4.1: CGM Statistics Overview
FIELD     AVERAGE  MEDIAN          SUM
ORS [a]      8596    2636  183,332,064
OPT [b]      1.13       1       24,154
OPS [c]      7822    2404  166,824,516
CTCP [d]     2.12       1       45,307
CPS [e]      3864     808   82,405,300
CTPC [f]    43.87      25      935,723
CCS [g]     13440    8816  286,642,684
OPR [h]  0.9
CVR [i]  1.5
[a] original size (KB)
[b] optimization time (s)
[c] optimized size (KB)
[d] conversion time CGM→PDF (s)
[e] converted PDF size (KB)
[f] conversion time PDF→CGM (s)
[g] converted CGM size (KB)
[h] optimize ratio
[i] conversion ratio
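The nine-field record layout and the two added ratio columns can be sketched as below. The class and function names are hypothetical, the sample line is made up rather than taken from cgmOpResult.txt, and the sketch assumes integer sizes and times and file names that contain no commas.

```python
from dataclasses import dataclass


@dataclass
class CgmRecord:
    """One record of the CGM test, in the field order listed above."""
    uwi: str
    filename: str
    original_kb: int
    opt_seconds: int
    optimized_kb: int
    to_pdf_seconds: int
    pdf_kb: int
    to_cgm_seconds: int
    converted_cgm_kb: int

    @property
    def optimize_ratio(self) -> float:
        # Ratio of optimized size to original size.
        return self.optimized_kb / self.original_kb

    @property
    def conversion_ratio(self) -> float:
        # Ratio of converted CGM size to original size.
        return self.converted_cgm_kb / self.original_kb


def parse_record(line: str) -> CgmRecord:
    """Split one comma-separated line into a CgmRecord."""
    fields = line.strip().split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    uwi, filename, *numbers = fields
    return CgmRecord(uwi, filename, *(int(n) for n in numbers))


# Made-up record for illustration; not taken from cgmOpResult.txt.
rec = parse_record("42127338430200,example.cgm,1000,1,900,2,400,25,1500")
print(rec.optimize_ratio, rec.conversion_ratio)
```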
4.1.2 PDF Conversion Test
The test was conducted using the script pdf2cgmTest.sh and produced the raw
test result pdf2cgmResult.txt. The test results show the performance of
PDF conversion for all the PDF log images (Table 1.1 lists 15,056 PDF files).
Each record consists of 5 comma-separated fields:
1. UWI
2. filename
3. original size
4. time of conversion
5. converted size
The raw TXT test result was also converted to an Excel workbook,
pdf2cgmReport.xlsx, and two more columns were added to evaluate space efficiency:
• conversion ratio (converted size / original size)
• optimized size (minimum of converted and original size)
Table 4.2: PDF Statistics Overview
FIELD     AVERAGE  MEDIAN          SUM
ORS [a]      3640    1016   53,291,644
CTPC [b]       39      23      574,767
CCS [c]     11641    5952  170,414,140
CVR [d]       8.6     6.1          3.2
[a] original size (KB)
[b] conversion time PDF→CGM (s)
[c] converted CGM size (KB)
[d] conversion ratio
4.2 Time Efficiency
Figure 4.1: Time Efficiency Overview
[Bar chart. x-axis: Time Range (s): less than 10, 10 to 60, over 60; y-axis: Percentage; series: CGM Optimization, CGM2PDF, PDF2CGM.]
From the figure we can see that the majority of CGM files take less than 10
seconds to convert or optimize, whereas converting PDF files is much slower.
4.3 Space Efficiency
Figure 4.2: Space Efficiency Overview
[Bar chart. x-axis: Size Change (%): less than 10, 10 to 90, 90 to 120, 120 to 1000; y-axis: Percentage; series: CGM Optimization, CGM2PDF, PDF2CGM.]
According to the test results, CGM optimization can save a lot of space in most
cases. CGM2PDF and PDF2CGM conversion saved space in some cases, kept about
the same size in the majority of cases, and increased the size in others.
PDF2CGM performs the worst: in 28 percent of the cases it increased the file
size to more than 120 percent of the original. If, for each file, we choose
whether to optimize, convert or do nothing according to which option gives the
smallest size, the calculated potential space saving is as shown in the figure
below.
Figure 4.3: Potential Optimizable Space
[Horizontal bar chart. x-axis: Amount of Space Can Be Saved (%), ticks from 20 to 40; y-axis: Format (CGM, PDF).]
CGM files can achieve a saving of up to 42 percent, and PDF files can achieve
a saving of up to 18 percent.
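The "keep whichever variant is smallest" calculation described in this section can be sketched as follows. The function name and the record shape are assumptions, and the sizes are made up; the sketch shows the arithmetic only, not the project's spreadsheet.

```python
def potential_saving(records):
    """Fraction of space saved if each file keeps its smallest variant.

    Each record is (original_kb, *alternative_kb): the original size
    followed by the sizes of whatever optimized or converted variants
    exist. Doing nothing is always an option, so the original size takes
    part in the minimum.
    """
    original_total = sum(rec[0] for rec in records)
    best_total = sum(min(rec) for rec in records)
    return 1 - best_total / original_total


# Made-up sizes in KB: (original, optimized, converted).
sample = [(1000, 600, 1500), (2000, 2100, 1400), (500, 500, 900)]
print(f"{potential_saving(sample):.1%}")
```

Because the original size is included in the minimum, the result can never be negative.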
4.4 Graphic Integrity
Log images are usually printed from CGM files using JustCGM on an Iterra
Plotter. A set of CGM images from the test was plotted, and the plots mostly
maintained the same quality; however, a few relatively insignificant
deficiencies in aspects such as line transparency were noted.
4.5 Problems
4.5.1 Output Error
A few conversions in each test failed to produce correct results and gave
blank outputs. File naming caused some of the problems, because spaces and
backslashes in Windows filenames cause problems in the Linux environment.
Others were caused by JustCGM failures; updating JustCGM solved some of
those.
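One common way to avoid the file naming problem is to pass each path as a single argument rather than pasting it into a shell command line. The sketch below is in Python, not the shell scripts used in the project; the helper name and the sample path are hypothetical, and cgm2pdf stands for the wrapper script described in Chapter 6. Nothing is executed here.

```python
import shlex


def build_command(executable, path):
    """Return an argument list suitable for subprocess.run(args).

    With a list (and no shell), the path is passed as one argument no
    matter which spaces or backslashes it contains.
    """
    return [executable, path]


# Hypothetical path with a space and a Windows-style backslash in it.
awkward = r"/tmp/well logs/run 1\final.cgm"
args = build_command("cgm2pdf", awkward)

# If a shell command line is unavoidable, shlex.join quotes each argument.
print(shlex.join(args))
```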
4.5.2 Bad Conversion Performance
Some files grow to more than 10 times their original size after conversion,
and some take more than a minute to convert. The files that are slow to convert
are very often huge image log files (an image log is a kind of log that has
rich color and complex shapes). The files that grow much bigger are mostly
small files with text. Through this, I also found that many files such as
reports and directional surveys are mixed in with the logs. I also found many
corrupted CGM files, but they are hard to identify from the test results.
4.6 Summary
Compared with the PDF format, the CGM format converts faster and is more
amenable to optimization. The potential space saving on CGM files is 42
percent. PDF performs worse than CGM: PDF files take longer to convert and can
easily become much bigger after conversion. The space saving achievable on PDF
files is 18 percent.
Chapter 5
Scripting Flowchart
Figure 5.1: Flowchart of Scripts and Outputs
[Flowchart. Source: Linux Well Database. Scripts: ListAllLogFiles.sh, sortByFormat.sh, extractFormat.sh, listAllAttr.sh, listAttr.sh, calAllSize.sh, calSize.sh, pdf2cgmTest.sh, cgmOptTest.sh. Outputs: listOfAllLogFiles.txt, listOfLogFilesSorted.txt, Format list.txt, Format ls.txt, PDFList.txt, CGMList.txt, cgmOpResult.txt, pdf2cgmResult.txt, cgmOpReport.xlsx, pdf2cgmReport.xlsx.]
Chapter 6
Scripts and Files
All the related files should be placed in the same directory as this report. The
directory tree structure is shown in Figure 6.1. Each file is either annotated
in Figure 6.1 or individually explained below.
cgm2pdf <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable fscgmint. It converts the CGM file
<CGM file>.cgm and generates the PDF file <CGM file>.cgm.pdf.
cgm2tif <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable fscgmint. It converts the CGM file
<CGM file>.cgm and generates the TIF file <CGM file>.cgm.tiff.
cgmOpTest.sh <input file> <output file>
This is the script that runs the conversion test on all the CGM files. It takes
an input file that has the absolute path of a CGM file on each line. For each
file, it copies the file to the /tmp/ directory, performs the conversion test,
records the times and sizes, and removes all the files generated in the /tmp/
directory.
The input file used in this project is CGMList.txt and the output file
produced is cgmOpResult.txt.
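The copy, convert, measure and clean-up cycle described for cgmOpTest.sh can be sketched in Python. This is an illustration of the procedure, not the script itself: the function names are hypothetical, a throwaway temporary directory stands in for /tmp/, and a stand-in converter replaces the JustCGM wrapper scripts so the example runs anywhere.

```python
import os
import shutil
import tempfile
import time


def timed_test(source_path, convert):
    """Copy a file to a scratch directory, run `convert` on the copy, and
    return (elapsed_seconds, output_size_kb). The scratch directory and
    everything generated in it are removed afterwards.

    `convert(path)` is a hypothetical callable that returns the path of
    the file it produced; in the project the wrapper scripts do this job.
    """
    with tempfile.TemporaryDirectory() as scratch:
        work_copy = shutil.copy(source_path, scratch)
        start = time.monotonic()
        output_path = convert(work_copy)
        elapsed = time.monotonic() - start
        size_kb = os.path.getsize(output_path) // 1024
    return elapsed, size_kb


def fake_convert(path):
    """Stand-in converter: writes a 4 KB file next to the input file."""
    out = path + ".pdf"
    with open(out, "wb") as handle:
        handle.write(b"\0" * 4096)
    return out


# Create a small throwaway input file, run the cycle once, then clean up.
with tempfile.NamedTemporaryFile(suffix=".cgm", delete=False) as src:
    src.write(b"demo")
elapsed, size_kb = timed_test(src.name, fake_convert)
os.remove(src.name)
print(f"{elapsed:.3f} s, {size_kb} KB")
```

Working on a copy keeps the original files in the well database untouched if a conversion fails midway.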
cgms <CGM file>
Called with a CGM file as its argument, it configures the JustCGM licensing
environment and calls the executable justtiff with the argument -tcgm. It
optimizes the CGM file <CGM file>.cgm and generates the optimized CGM file
<CGM file>X.cgm.
extractFormat.sh <format>
This script extracts the list of all the files with the given format. The outputs
are redirected to the <format> list.txt files in the project.
pdf2cgm <PDF file>
Called with a PDF file as its argument, it configures the JustCGM licensing
environment and calls the executable ps2cgm with the argument -pjoin=vert,
which joins the pages. It converts the PDF file <PDF file>.pdf and generates
the CGM file <PDF file>.pdf.cgm.
pdf2cgmTest.sh <input file> <output file>
This is the script that runs the conversion test on all the PDF files. It takes
an input file that has the absolute path of a PDF file on each line. For each
file, it copies the file to the /tmp/ directory, performs the conversion test,
records the times and sizes, and removes all the files generated in the /tmp/
directory.
The input file used in this project is PDFList.txt and the output file produced
is pdf2cgmResult.txt.
Figure 6.1: Directory Tree
/
  calAllSize.sh             Script calculating the total size of all formats
  calSize.sh                Script calculating the total size of each format
  cgm2pdf
  cgm2tif
  CGMList.txt               List of all CGM files
  cgmOpReport.xlsx          Excel format CGM test results
  cgmOpResult.txt           Raw result of the CGM test
  cgmOpTest.sh
  cgms
  extractFormat.sh
  formats.txt               All the formats under /logs/ directories
  formatsStats.csv          Statistics of formats in CSV format
  formatsStats.txt          Statistics of formats in text
  listAllAttr.sh            Script calling listAttr.sh with each format
  listAttr.sh               Script that outputs information to format ls.txt
  listAllLogFile.sh         Script to get a complete list of files
  listOfAllLogFiles.txt     All the files under /logs/ directories
  listOfLogFilesSorted.txt  All the files sorted by extensions
  logConversion.py          The Python program discussed in Chapter 3
  pdf2cgm
  pdf2cgmReport.xlsx        Excel format PDF test results
  pdf2cgmResult.txt         Raw result of the PDF test
  pdf2cgmTest.sh
  PDFList.txt               List of all PDF files
  report.pdf                The report
  Lists                     The folder containing the lists of each format
    BMP List.txt
    BMP ls.txt
    ...
    ZIP List.txt
    ZIP ls.txt
Chapter 7
Conclusion and Suggestion
7.1 JustCGM
JustCGM has a variety of features for dealing with CGM files. It has an active
support team and receives regular updates. An update installed for JustCGM on
Linux during this project solved many conversion failures. There are still CGM
users who do not know many of JustCGM's features. I suggest that detailed
instructions covering most of JustCGM's features be delivered whenever
someone requests the software.
7.2 Plotting
Plotting with JustCGM as the software and the Iterra Plotter as the hardware is
tricky, since a few of the steps are very confusing. Instructions for
configuring the plotting environment are not easily accessible.
7.3 Format Preference
Keeping every log image in CGM format is the best choice for several reasons.
• CGM files are faster to convert to other formats
• CGM files are easier to optimize in size
• When a log is huge, CGM viewed in JustCGM performs the best
• CGM is the ideal format for plotting
7.4 Embedding conversion
There is a need for multiple available formats, but maintaining files in
multiple formats is not space-efficient. In this project, conversion from CGM
to other formats proved to be fast enough that it is possible to provide
downloads of the demanded format through background conversion.
JustCGM is capable of converting most of the CGM files in the Linux
environment; determining which server-side software should perform the
conversion may need further research.
7.5 Log Image Standard
Vector images that look very similar can differ hugely in performance.
Receiving good images is crucial to maintenance. The current set of log images
in Shell's well database is very diverse. The log images were produced in
different eras by different logging and service companies using different
equipment and software. It is very hard to track down the causes of bad log
image files.
To create a standard for receiving log images, it is better to find out what
software is used to generate the log images and to require log providers to
use certain software.
7.6 Optimization and Graphic Identification
To optimize the log image database to its full extent, more precise graphic
identification is needed. During the testing phase, files other than logs, such
as reports and directional surveys, were found mixed in the log directories.
Badly corrupted CGM files were also found. To identify problematic files in the
database, we can either check each file manually or use software to detect the
issues. However, manual checking requires much time, and no effective software
has been found so far.
I suggest checking a selection of files based on the Excel testing report.
Files with an extremely high or low ratio after conversion, or with an
extremely long conversion time, are highly likely to have issues. A thorough
check of those files may save a large amount of space.
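The selection suggested above can be sketched as a simple filter over the test records. The function name and record shape are hypothetical. The upper ratio bound of 10 and the time bound of 60 seconds echo the "more than 10 times" and "more than a minute" cases from Section 4.5.2, while the lower bound of 0.1 is purely an assumption; none of them is a threshold set by the project.

```python
def flag_suspicious(records, low=0.1, high=10.0, max_seconds=60):
    """Return the names of files whose conversion ratio or time is extreme.

    `records` holds (filename, original_kb, converted_kb, seconds) tuples
    with a non-zero original size. The thresholds are illustrative
    defaults, not values defined by the project.
    """
    flagged = []
    for name, original_kb, converted_kb, seconds in records:
        ratio = converted_kb / original_kb
        if ratio < low or ratio > high or seconds > max_seconds:
            flagged.append(name)
    return flagged


# Made-up records for illustration.
sample = [
    ("a.cgm", 1000, 900, 2),     # ordinary file
    ("b.cgm", 10, 400, 3),       # grows 40 times
    ("c.cgm", 5000, 5200, 240),  # takes four minutes
    ("d.cgm", 2000, 100, 1),     # shrinks to 5 percent
]
print(flag_suspicious(sample))
```

The flagged names form a shortlist for manual inspection, which is far smaller than the full database.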
An optimization process that reduces files to a smaller size and removes
problematic files can be developed based on the scripts used in this project,
but it requires a business decision with more detailed goals and standards.

More Related Content

Viewers also liked

Ontspanningsmassage
OntspanningsmassageOntspanningsmassage
OntspanningsmassageSanne Koomen
 
Presentation jdi baru paket 500 usd
Presentation jdi baru paket 500 usdPresentation jdi baru paket 500 usd
Presentation jdi baru paket 500 usdMogi Mukhtar
 
Jdi global indonesia
Jdi global indonesiaJdi global indonesia
Jdi global indonesiaMogi Mukhtar
 
Presentation Rho Kinase-2 Activation in HUVEC111
Presentation Rho Kinase-2 Activation in HUVEC111Presentation Rho Kinase-2 Activation in HUVEC111
Presentation Rho Kinase-2 Activation in HUVEC111Keren Ferris
 
Seo Isn’t Dead - On The Edge NFP Conference 2015
Seo Isn’t Dead - On The Edge NFP Conference 2015Seo Isn’t Dead - On The Edge NFP Conference 2015
Seo Isn’t Dead - On The Edge NFP Conference 2015Andy Williams SEO
 

Viewers also liked (8)

Jittu
Jittu Jittu
Jittu
 
Fortune 500 - Fortune
Fortune 500 - FortuneFortune 500 - Fortune
Fortune 500 - Fortune
 
Ontspanningsmassage
OntspanningsmassageOntspanningsmassage
Ontspanningsmassage
 
Presentation jdi baru paket 500 usd
Presentation jdi baru paket 500 usdPresentation jdi baru paket 500 usd
Presentation jdi baru paket 500 usd
 
Jdi global indonesia
Jdi global indonesiaJdi global indonesia
Jdi global indonesia
 
[#Mfgadvances] 9 warning signs
[#Mfgadvances] 9 warning signs[#Mfgadvances] 9 warning signs
[#Mfgadvances] 9 warning signs
 
Presentation Rho Kinase-2 Activation in HUVEC111
Presentation Rho Kinase-2 Activation in HUVEC111Presentation Rho Kinase-2 Activation in HUVEC111
Presentation Rho Kinase-2 Activation in HUVEC111
 
Seo Isn’t Dead - On The Edge NFP Conference 2015
Seo Isn’t Dead - On The Edge NFP Conference 2015Seo Isn’t Dead - On The Edge NFP Conference 2015
Seo Isn’t Dead - On The Edge NFP Conference 2015
 

Similar to WellLogsAnalysis (20)

Work Measurement Application - Ghent Internship Report - Adel Belasker
Work Measurement Application - Ghent Internship Report - Adel BelaskerWork Measurement Application - Ghent Internship Report - Adel Belasker
Work Measurement Application - Ghent Internship Report - Adel Belasker
 
diss
dissdiss
diss
 
Ashwin_Thesis
Ashwin_ThesisAshwin_Thesis
Ashwin_Thesis
 
Performance_Programming
Performance_ProgrammingPerformance_Programming
Performance_Programming
 
S Pii Plus+C+Library+Programmer+Guide
S Pii Plus+C+Library+Programmer+GuideS Pii Plus+C+Library+Programmer+Guide
S Pii Plus+C+Library+Programmer+Guide
 
S Pii Plus+C+Library+Programmer+Guide
S Pii Plus+C+Library+Programmer+GuideS Pii Plus+C+Library+Programmer+Guide
S Pii Plus+C+Library+Programmer+Guide
 
MIL-STD-498:1994
MIL-STD-498:1994MIL-STD-498:1994
MIL-STD-498:1994
 
Thesis
ThesisThesis
Thesis
 
3 g m gw
3 g m gw3 g m gw
3 g m gw
 
22024582
2202458222024582
22024582
 
report
reportreport
report
 
CS4099Report
CS4099ReportCS4099Report
CS4099Report
 
Swi prolog-6.2.6
Swi prolog-6.2.6Swi prolog-6.2.6
Swi prolog-6.2.6
 
BI Project report
BI Project reportBI Project report
BI Project report
 
Software guide 3.20.0
Software guide 3.20.0Software guide 3.20.0
Software guide 3.20.0
 
10.1.1.21.3147
10.1.1.21.314710.1.1.21.3147
10.1.1.21.3147
 
10.1.1.21.3147
10.1.1.21.314710.1.1.21.3147
10.1.1.21.3147
 
test6
test6test6
test6
 
Live chat srs
Live chat srsLive chat srs
Live chat srs
 
Data over dab
Data over dabData over dab
Data over dab
 

WellLogsAnalysis

  • 1. An analysis on log images John Zhu A report submitted for summer internship project Well Data Management Shell Exploration & Production Company Summer 2013
  • 2. Abstract Well logs are the records of the geologic formations produced in different phases of wells. Well logs stored in graphical formats are called log images. This is the report of the project done by John Zhu during summer 2013 which analyzed the current log images Shell has in the well database. The project experimented a scripting approach under the Linux environment to monitor and manage the log image files in the database, performed conversion tests on about 30,000 files, analyzed potential optimization solutions and created scripts and programs for certain tasks on log image files.
  • 3. Contents 1 Overview 1 1.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Directory Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.3 File Distribution Statistics . . . . . . . . . . . . . . . . . . . . . . 2 1.4 Vector Image and Raster Image . . . . . . . . . . . . . . . . . . . 4 1.5 Tasks Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Conversion 6 2.1 Interview with Petrophysicists . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Issues with CGM . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.2 Issues with PDF . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 JustCGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.1 JustCGM in Windows . . . . . . . . . . . . . . . . . . . . 7 2.2.2 JustCGM in Linux . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Module Wrapper Script . . . . . . . . . . . . . . . . . . . . . . . 7 3 Conversion Demo Program 8 4 Conversion Test 9 4.1 Conversion Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.1.1 CGM Conversion Test . . . . . . . . . . . . . . . . . . . . 10 4.1.2 PDF Conversion Test . . . . . . . . . . . . . . . . . . . . 11 4.2 Time Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4.3 Space Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4.4 Graphic Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.5.1 Output Error . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.5.2 Bad Conversion Performance . . . . . . . . . . . . . . . . 14 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5 Scripting Flowchart 15 6 Scripts and Files 16 7 Conclusion and Suggestion 19 7.1 JustCGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.2 Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.3 Format Preference . . . . . . 
. . . . . . . . . . . . . . . . . . . . 19 7.4 Embeding conversion . . . . . . . . . . . . . . . . . . . . . . . . . 19 i
  • 4. 7.5 Log Image Standard . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.6 Optimization and Graphic Identification . . . . . . . . . . . . . . 20 ii
  • 5. List of Figures 1.1 Format distribution in amount . . . . . . . . . . . . . . . . . . . 3 1.2 Format distribution in size . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Three Major Formats in a pie . . . . . . . . . . . . . . . . . . . . 4 4.1 Time Efficiency Overview . . . . . . . . . . . . . . . . . . . . . . 12 4.2 Space Efficiency Overview . . . . . . . . . . . . . . . . . . . . . . 13 4.3 Potential Optimizable Space . . . . . . . . . . . . . . . . . . . . . 13 5.1 Flowchart of Scripts and Outputs . . . . . . . . . . . . . . . . . . 15 6.1 Directory Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 iii
  • 6. List of Tables 1.1 Overall format distribution . . . . . . . . . . . . . . . . . . . . . 2 1.2 Three major formats statistics . . . . . . . . . . . . . . . . . . . . 4 4.1 CGM Statistics Overview . . . . . . . . . . . . . . . . . . . . . . 10 4.2 PDF Statistics Overview . . . . . . . . . . . . . . . . . . . . . . . 11 iv
  • 7. Chapter 1 Overview 1.1 Notations TYPE EXAMPLE MEANING bold script.sh, files.txt this is a file italic text this is a substitution typewriter /apps/3rdparty/justcgm/bin/ same text in commandline 1.2 Directory Path The Linux path to the well data directory is: /glb/am/sepco/data/epw well/US The path to each well’s directory is according to its UWI1 is in the format: /<State Code>/<County Code>/<UWI>/ For example, a well with UWI ”42127338430200” has its directory located at path: /glb/am/sepco/data/epw well/US/42/127/42127338430200 and all the log images are located in the subdirectories such as: /glb/am/sepco/data/epw well/US/42/127/42127338430200/logs/ 1Unique Well Identifier. 1
  • 8. 1.3 File Distribution Statistics

    A complete list of all the files under /logs/ directories was first generated (using the script ListAllLogFiles.sh). The list was then sorted (using sortByFormat.sh), and separate lists and file counts were produced for each format (using extractFormat.sh). Finally, the total size of each format was summed (using a combination of the scripts calAllSize.sh, calSize.sh, listAllAttr.sh and listAttr.sh). The statistics are shown below.

    Table 1.1: Overall format distribution

    FORMAT   AMOUNT   SIZE(KB)
    BMP      2        128
    CGM      21731    202012492
    CSV      7        2340
    CTM      2        2680
    DB       4        24
    DOC      1        128
    DOCX     2        732
    .        1        103788
    DRA      4495     18112
    EMF      62       1013620
    GIF      3        1072
    JPG      67       361952
    LIC      994      3976
    LOG      6        4276
    META     9        53800
    PDF      15056    55187616
    PDS      44       312832
    PNG      4        16
    PPT      6        16
    SIF      4364     36580
    SQL      1        12
    TIF      211752   938280624
    TXT      3        16
    XLS      100      237284
    XLSX     2        708
    ZIP      2856     4905644
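    The counting and summing steps described above were done with shell scripts in the project. The same idea can be sketched in Python as follows. The input layout, an iterable of (path, size in KB) pairs, is hypothetical and stands in for the listing files; the "(none)" label for files without an extension is also an assumption of this sketch.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def tally_by_format(entries):
    """Sum file count and total size per extension.

    `entries` is an iterable of (path, size_kb) pairs. This layout is a
    stand-in for the listing files produced by the project's scripts.
    """
    stats = defaultdict(lambda: [0, 0])  # extension -> [count, total KB]
    for path, size_kb in entries:
        # Upper-case the extension so that .tif and .TIF are counted together.
        ext = PurePosixPath(path).suffix.lstrip(".").upper() or "(none)"
        stats[ext][0] += 1
        stats[ext][1] += size_kb
    return {ext: tuple(values) for ext, values in sorted(stats.items())}


# Made-up sample entries, for illustration only.
sample = [
    ("a/logs/run1.tif", 4000),
    ("a/logs/run1.cgm", 9000),
    ("b/logs/run2.TIF", 5000),
    ("b/logs/README", 4),
]
for ext, (count, size_kb) in tally_by_format(sample).items():
    print(ext, count, size_kb)
```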
  • 9. Figure 1.1: Format distribution in amount (bar chart; x-axis: FORMATS, y-axis: AMOUNT)
    Figure 1.2: Format distribution in size (bar chart; x-axis: FORMATS, y-axis: Size (KB))
  • 10. From the result, we can see that the three major formats are TIF, CGM and PDF. Though there are 19067 files in formats other than those three, their total size is less than 10 gigabytes. Most of them exist for historical reasons, and we do not usually receive many files in those formats any more. A summary of the size and amount of the three formats is shown below.

    Figure 1.3: Three Major Formats in a pie (pie chart; TIF 78.02%, CGM 16.8%, PDF 4.59%)

    Table 1.2: Three major formats statistics

    FORMAT   AM       AM(%)   TS(KB)        TS(%)   AFS(KB)
    CGM      21731    8.31    202,012,492   16.80   9296
    PDF      15056    5.76    55,187,616    4.59    3665
    TIF      211752   80.95   938,280,624   78.02   4431

    AM: total amount. AM(%): total amount as a percentage of all the files. TS(KB): total size in kilobytes. TS(%): total size as a percentage. AFS(KB): average file size.

    1.4 Vector Image and Raster Image

    A "raster image" is a format in which an image's information is stored as a grid of pixels. A "vector image" is a format in which an image is represented as a set of descriptions of graphical elements such as dots and lines. All the TIF images are raster images. CGM and PDF can be either, but they are mostly vector images in Shell's well log database. The industry standard for log images used to be the TIF format, and CGM is also very common because it is produced by many well log processing applications. The PDF format, though widely used as a portable document, does not perfectly match the well log industry standard. The task of this project is to analyze the PDF and CGM images Shell has and to suggest potential optimizations.
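    Returning to Table 1.2: the TS(%) and AFS(KB) columns follow directly from the counts and sizes in Table 1.1. The short Python check below reproduces them. The constant OTHER_SIZE_KB is not quoted from the report; it is the sum of the SIZE(KB) column over all rows of Table 1.1 other than CGM, PDF and TIF, computed for this sketch.

```python
# Rows of Table 1.2: format -> (file count, total size in KB).
MAJOR = {
    "CGM": (21731, 202_012_492),
    "PDF": (15056, 55_187_616),
    "TIF": (211752, 938_280_624),
}

# Sum of SIZE(KB) for every other row of Table 1.1 (computed for this
# sketch, not a figure stated in the report).
OTHER_SIZE_KB = 7_059_736


def summarize(major=MAJOR, other_size_kb=OTHER_SIZE_KB):
    """Return format -> (share of total size in %, average file size in KB)."""
    total = sum(size for _, size in major.values()) + other_size_kb
    return {
        fmt: (round(100 * size / total, 2), round(size / count))
        for fmt, (count, size) in major.items()
    }


for fmt, (share, average) in summarize().items():
    print(fmt, share, average)
```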
  • 11. 1.5 Tasks Overview

    The main tasks of this project are as follows:
    • Investigate options to produce an automated process to manipulate image formats, including conversion and optimization.
    • Create a demo program that allows a user to select a desired format and to produce output via background conversion.
    • Analyze and produce statistics on the efficiency of both formats through a test on all the CGM and PDF files in the database.
  • 12. Chapter 2 Conversion

    Starting with interviews through MOC (Microsoft Office Communicator) with a set of petrophysicists, I got an overview of the software that petrophysicists use with log images and of the issues they have with it. JustCGM is the most commonly used software, especially for tasks like viewing CGM files or converting between CGM and PDF. I therefore did a thorough study of JustCGM and created scripts that wrap its functionality.

    2.1 Interview with Petrophysicists

    One of the reasons for this project is that there seemed to be a demand for different formats of log images, so I reached out to log image users and asked about the troubles they had with different formats. I found that most of the problems came with plotting, and that plotting is also the reason people need certain formats.

    2.1.1 Issues with CGM

    If one needs to plot entire logs with an Iterra Plotter, CGM is the only ideal format. However, incorrect configuration of JustCGM and Iterra very often causes trouble with plotting.

    2.1.2 Issues with PDF

    PDF files are desirable for viewing logs on mobile devices or for printing a small part of a log on normal printers. It is common for people to have trouble with PDF files that have page breaks. This can be solved by JustCGM. Another common problem with PDF is that software often crashes when viewing a large PDF log.
  • 13. 2.2 JustCGM

    JustCGM is the software that Shell uses for multiple purposes with the CGM format. This project used the following features of JustCGM 5.0.05:
    • Conversion from PDF to CGM.
    • Conversion from CGM to PDF.
    • Merging pages of an image file.
    • Plotting CGM logs on an Iterra Plotter.
    • Optimizing CGM files.
    The software module contains a main program that launches the graphical user interface, where most of the functions can be accessed easily. There are also separate executables that can be run with arguments on the command line.

    2.2.1 JustCGM in Windows

    In the Windows environment, the software can be installed by request in the service section of Shell's MyRequest website. The modules for the separate functions are located in the /bin/ folder under the installation directory.

    2.2.2 JustCGM in Linux

    In the Linux environment, the software has been installed on the server and can be started by typing the command justcgm, which is resolved through an environment variable. However, the separate modules cannot be called directly from the command line. One way of solving this is to wrap each module in a script.

    2.3 Module Wrapper Script

    The modules that this project used are pdf2cgm, cgm2pdf and cgmop. The wrapper scripts are modified versions of the main program launcher, which sets the license environment first and then calls the executable.
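    The wrapper idea in Section 2.3 (set the license environment, then call the executable) can be sketched as below. This is not the project's wrapper script. The executable names and options are the ones named in Chapter 6 of this report; the license variable name and value are hypothetical placeholders, the binary directory is the example path from Section 1.1 and is only assumed to be the install location, and passing the input file as the last argument is also an assumption.

```python
import os
import posixpath

# Example directory from Section 1.1; assumed here to hold the executables.
JUSTCGM_BIN = "/apps/3rdparty/justcgm/bin"

# Hypothetical placeholders: the real license variable is not named in the report.
LICENSE_VAR = "JUSTCGM_LICENSE"
LICENSE_VALUE = "/path/to/license"

# Executables and options as named in Chapter 6 of this report.
MODULES = {
    "cgm2pdf": ["fscgmint"],
    "pdf2cgm": ["ps2cgm", "-pjoin=vert"],
    "cgms": ["justtiff", "-tcgm"],
}


def build_call(module, input_file, base_env=None):
    """Return (argv, env) for one wrapped module without running it."""
    executable, *options = MODULES[module]
    env = dict(os.environ if base_env is None else base_env)
    env[LICENSE_VAR] = LICENSE_VALUE  # step 1: set the license environment
    # Step 2: the command line; the input file as last argument is an assumption.
    argv = [posixpath.join(JUSTCGM_BIN, executable), *options, input_file]
    return argv, env


argv, env = build_call("pdf2cgm", "example.pdf", base_env={})
print(argv)
```

    The returned pair could then be handed to subprocess.run(argv, env=env, check=True). Passing the arguments as a list rather than one shell string avoids word splitting on filenames that contain spaces.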
  • 14. Chapter 3 Conversion Demo Program

    In order to demonstrate the possibility of performing on-demand conversion of log images, a demo program, logConversion.py, was written in the Python programming language. The program executes in the following sequence:
    1. Prompts the user for a valid UWI number.
       • If the UWI is invalid, shows the user a suggested UWI and then prompts the user to try again.
       • If the UWI is valid, goes to 2.
    2. Displays a numbered list of valid log images and prompts the user to select a number.
       • If the number exceeds the maximum range, prompts the user to select again.
       • If the user enters a non-digit, prompts the user to select a number.
       • If the selection is valid, goes to 3.
    3. Prompts the user to choose from (convert CGM to PDF/convert PDF to CGM/just download the file).
       • If the user chooses 1, converts the file and goes to 4.
       • If the user chooses 2, converts the file and goes to 4.
       • If the user chooses 3, copies the file and goes to 4.
       • If the user selects none of the above, asks the user to choose again.
    4. Asks the user whether to continue or exit.
       • If continue, goes to 2.
       • If exit, then exits.
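    The input checks in steps 2 and 3 above can be sketched as two small functions. This is a minimal illustration, not the code of logConversion.py; the function names and the message texts are hypothetical.

```python
def parse_selection(text, count):
    """Validate a menu choice from step 2.

    Returns (zero-based index, None) on success or (None, message) when the
    user should be prompted again.
    """
    text = text.strip()
    if not text.isdecimal():
        return None, "Please enter a number."
    number = int(text)
    if not 1 <= number <= count:
        return None, f"Please choose a number from 1 to {count}."
    return number - 1, None


# The three choices offered in step 3.
ACTIONS = {
    "1": "convert CGM to PDF",
    "2": "convert PDF to CGM",
    "3": "copy the file",
}


def parse_action(text):
    """Return the chosen action, or None if the user must choose again."""
    return ACTIONS.get(text.strip())


print(parse_selection("2", 5))
print(parse_selection("abc", 5))
print(parse_action("3"))
```

    Keeping the validation separate from the input() calls makes each rule easy to check without running the interactive loop.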
  • 16. 4.1 Conversion Test

    4.1.1 CGM Conversion Test

    The test was conducted using the script cgmOptTest.sh and produced the raw test result cgmOpResult.txt. The test results show the performance of CGM optimization and conversion for all the CGM log images, 21617 files in total. Each record is separated by commas into 9 fields, which indicate:
    1. UWI
    2. filename
    3. original size
    4. time of optimization
    5. size after optimization
    6. time of conversion from CGM to PDF
    7. converted PDF size
    8. time of conversion from PDF to CGM
    9. converted CGM size
    The raw TXT test result was also converted to an Excel workbook, cgmOpReport.xlsx, and two more columns were added to evaluate space efficiency:
    • ratio of optimized size to original size
    • ratio of converted CGM size to original size
    Table 4.1 gives the overall statistics of the CGM conversion test.

    Table 4.1: CGM Statistics Overview

    FIELD   MEANING                     AVERAGE   MEDIAN   SUM
    ORS     original size (KB)          8596      2636     183,332,064
    OPT     optimization time (s)       1.13      1        24,154
    OPS     optimized size (KB)         7822      2404     166,824,516
    CTCP    conversion time CGM→PDF     2.12      1        45,307
    CPS     converted PDF size          3864      808      82,405,300
    CTPC    conversion time PDF→CGM     43.87     25       935,723
    CCS     converted CGM size          13440     8816     286,642,684

    OPR (optimize ratio): 0.9
    CVR (conversion ratio): 1.5
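    To make the 9-field record layout concrete, the sketch below parses one record and derives the two ratio columns that were added in the workbook. The class and function names are hypothetical, the sample record is made up, and the sketch assumes that filenames contain no commas.

```python
from dataclasses import dataclass


@dataclass
class CgmRecord:
    """One line of the raw CGM test result, in the field order listed above."""
    uwi: str
    filename: str
    original_kb: float
    opt_seconds: float
    optimized_kb: float
    to_pdf_seconds: float
    pdf_kb: float
    to_cgm_seconds: float
    converted_cgm_kb: float

    @property
    def optimize_ratio(self):
        # ratio of optimized size to original size
        return self.optimized_kb / self.original_kb

    @property
    def conversion_ratio(self):
        # ratio of converted CGM size to original size
        return self.converted_cgm_kb / self.original_kb


def parse_record(line):
    """Split one comma-separated record; assumes no commas in the filename."""
    fields = line.strip().split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    uwi, filename, *numbers = fields
    return CgmRecord(uwi, filename, *map(float, numbers))


# Made-up record for illustration; not taken from cgmOpResult.txt.
record = parse_record("42127338430200,example.cgm,1000,1,900,2,400,30,1500")
print(record.optimize_ratio, record.conversion_ratio)
```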
  • 17. 4.1.2 PDF Conversion Test

    The test was conducted using the script pdf2cgmTest.sh and produced the raw test result pdf2cgmResult.txt. The test results show the performance of PDF conversion for all the PDF log images, in total 21617 files. Each record is separated by commas into 5 fields, which indicate:
    1. UWI
    2. filename
    3. original size
    4. time of conversion
    5. converted size
    The raw TXT test result was also converted to an Excel workbook, pdf2cgmReport.xlsx, and two more columns were added to evaluate space efficiency:
    • conversion ratio (ratio of converted size to original size)
    • optimized size (minimum of converted and original size)

    Table 4.2: PDF Statistics Overview

    FIELD   MEANING                     AVERAGE   MEDIAN   SUM
    ORS     original size (KB)          3640      1016     53,291,644
    CTPC    conversion time PDF→CGM     39        23       574,767
    CCS     converted CGM size          11641     5952     170,414,140
    CVR     conversion ratio            8.6       6.1      3.2
  • 18. 4.2 Time Efficiency

    Figure 4.1: Time Efficiency Overview (bar chart; x-axis: Time Range (s) with bins less than 10, 10 to 60, over 60; y-axis: Percentage; series: CGM Optimization, CGM2PDF, PDF2CGM)

    From the figure we can see that the majority of CGM files take less than 10 seconds to convert or optimize, but PDF conversion is much slower.
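    The binning behind Figure 4.1 can be sketched as follows. The bin labels are the ones on the figure's axis; how the boundary values 10 and 60 are assigned is an assumption of this sketch, and the sample times are made up.

```python
def time_range_shares(seconds):
    """Return the percentage of files in each time range of Figure 4.1.

    Assumption: exactly 10 s and exactly 60 s both count as "10 to 60".
    """
    if not seconds:
        raise ValueError("no timings given")
    counts = {"less than 10": 0, "10 to 60": 0, "over 60": 0}
    for t in seconds:
        if t < 10:
            counts["less than 10"] += 1
        elif t <= 60:
            counts["10 to 60"] += 1
        else:
            counts["over 60"] += 1
    return {label: 100 * n / len(seconds) for label, n in counts.items()}


# Made-up timings in seconds, for illustration only.
print(time_range_shares([1, 1, 5, 30, 90]))
```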
  • 19. 4.3 Space Efficiency

    Figure 4.2: Space Efficiency Overview (bar chart; x-axis: Size Change (%) with bins less than 10, 10 to 90, 90 to 120, 120 to 1000; y-axis: Percentage; series: CGM Optimization, CGM2PDF, PDF2CGM)

    According to the test result, CGM optimization can save a lot of space in most cases. CGM2PDF and PDF2CGM conversion saved space in some cases, kept about the same size in the majority of the cases and increased the size in others. PDF2CGM performs the worst: in 28 percent of the cases the file size grew to more than 120 percent of the original size. If, for each file, we choose whether to optimize, convert or do nothing according to which option gives the smallest size, the potential space that can be saved is as shown in the figure below.

    Figure 4.3: Potential Optimizable Space (bar chart; x-axis: Amount of Space Can Be Saved (%); y-axis: Format, with bars for CGM and PDF)

    CGM files can achieve a saving of up to 42 percent, and PDF files can achieve a saving of up to 18 percent.
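    The selection rule described above (keep, optimize or convert, whichever gives the smallest file) and the resulting potential saving can be sketched like this. The function names, the option labels and the sample sizes are hypothetical; this is an illustration of the calculation, not the spreadsheet formulas used in the project.

```python
def best_choice(original_kb, alternatives):
    """Pick the smallest of the original size and the alternative sizes.

    `alternatives` maps an option name to its resulting size in KB.
    On a tie the original file is kept.
    """
    candidates = {"keep": original_kb, **alternatives}
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


def potential_saving(files):
    """Return the saved space in percent over (original_kb, alternatives) pairs."""
    original = sum(orig for orig, _ in files)
    best = sum(best_choice(orig, alts)[1] for orig, alts in files)
    return 100 * (original - best) / original


# Made-up sizes in KB, for illustration only.
files = [
    (1000, {"optimize": 900, "convert to PDF": 400}),
    (500, {"optimize": 520, "convert to PDF": 700}),
]
print(best_choice(*files[0]))
print(potential_saving(files))
```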
  • 20. 4.4 Graphic Integrity

    Log images are usually printed from CGM files using JustCGM on an Iterra Plotter. A set of CGM images was plotted, and they mostly maintained the same quality; however, a few relatively insignificant deficiencies in aspects such as line transparency were noted.

    4.5 Problems

    4.5.1 Output Error

    In each test, a few conversions failed to produce correct results and gave blank outputs. File naming caused some of the problems, because spaces and backslashes in Windows filenames cause problems in the Linux environment. Others were caused by JustCGM failures; updating JustCGM solved some of those.

    4.5.2 Bad Conversion Performance

    There are files that grow to more than 10 times the original size after conversion, and files that take more than a minute to convert. The files that are slow to convert are very often huge image log files (an image log is a kind of log that has rich color and complex shapes). The files that grow much bigger are mostly small files with text. Through this, I also found that many files such as reports and directional surveys are mixed in with the logs. I also found many corrupted CGM files, but they are hard to identify from the test results.

    4.6 Summary

    Compared with the PDF format, the CGM format converts faster and is more likely to benefit from optimization. The potential space saving on CGM files is 42 percent. PDF performs worse than CGM: it takes longer to convert and can easily get much bigger after conversion. The space saving achievable on PDF files is 18 percent.
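    For the file-naming failures described in Section 4.5.1, one possible mitigation is to give each file a Linux-friendly working name before conversion. The sketch below is not what the project's scripts did; the function name is hypothetical, and replacing the characters with underscores is only one possible convention (two different original names can end up with the same safe name).

```python
import re


def linux_safe_name(name):
    """Replace each run of spaces and backslashes with a single underscore."""
    return re.sub(r"[ \\]+", "_", name)


# Made-up Windows-style filename, for illustration only.
print(linux_safe_name(r"GR log\run 1.cgm"))
```

    A complementary measure is to pass filenames to external programs as separate list items (for example with subprocess.run) instead of building one shell command string, so that spaces are not treated as argument separators.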
  • 21. Chapter 5 Scripting Flowchart

    Figure 5.1: Flowchart of Scripts and Outputs (flowchart; nodes: Linux Well Database, ListAllLogFiles.sh, listOfAllLogFiles.txt, sortByFormat.sh, listOfLogFilesSorted.txt, extractFormat.sh, Format list.txt, PDFList.txt, CGMList.txt, calAllSize.sh, calSize.sh, listAllAttr.sh, listAttr.sh, Format ls.txt, pdf2cgmTest.sh, cgmOptTest.sh, pdf2cgmResult.txt, cgmOpResult.txt, pdf2cgmReport.xlsx, cgmOpReport.xlsx)
  • 22. Chapter 6 Scripts and Files

    All the related files should be placed in the same directory as this report. The directory tree structure is shown in Figure 6.1. Each file is either commented in Figure 6.1 or individually explained below.

    cgm2pdf <CGM file>
    Called with a CGM file as its argument, it configures the JustCGM licensing environment and calls the executable fscgmint. It converts the CGM file <CGM file>.cgm and generates the PDF <CGM file>.cgm.pdf.

    cgm2tif <CGM file>
    Called with a CGM file as its argument, it configures the JustCGM licensing environment and calls the executable fscgmint. It converts the CGM file <CGM file>.cgm and generates the TIF <CGM file>.cgm.tiff.

    cgmOpTest.sh <input file> <output file>
    This is the script that runs the conversion test on all the CGM files. It takes an input file that has an absolute path to a CGM file on each line. For each file, it copies the file to the /tmp/ directory, performs the conversion test, records the times and sizes, and removes all the files generated in the /tmp/ directory. The input file used in this project is CGMList.txt and the output file produced is cgmOpResult.txt.

    cgms <CGM file>
    Called with a CGM file as its argument, it configures the JustCGM licensing environment and calls the executable justtiff with the argument -tcgm. It optimizes the CGM file <CGM file>.cgm and generates the optimized CGM <CGM file>X.cgm.

    extractFormat.sh <format>
    This script extracts the list of all the files with the given format. The outputs are redirected to the <format> list.txt files in the project.
  • 23. pdf2cgm <PDF file>
    Called with a PDF file as its argument, it configures the JustCGM licensing environment and calls the executable ps2cgm with the argument -pjoin=vert, which joins the pages. It converts the PDF file <PDF file>.pdf and generates the CGM <PDF file>.pdf.cgm.

    pdf2cgmTest.sh <input file> <output file>
    This is the script that runs the conversion test on all the PDF files. It takes an input file that has an absolute path to a PDF file on each line. For each file, it copies the file to the /tmp/ directory, performs the conversion test, records the times and sizes, and removes all the files generated in the /tmp/ directory. The input file used in this project is PDFList.txt and the output file produced is pdf2cgmResult.txt.
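    The procedure shared by the two test scripts (copy to a temporary directory, convert, record time and sizes, clean up) can be sketched in Python as below. This is a stand-in for the shell scripts, not a port of them. The convert argument is a hypothetical callable that takes the path of the working copy and returns the path of the file it produced; in practice it would call one of the wrapper scripts. The UWI field of the real records is omitted here, and sizes are rounded up to whole kilobytes, which is an assumption.

```python
import shutil
import tempfile
import time
from pathlib import Path


def size_kb(path):
    """File size rounded up to whole kilobytes (rounding rule is an assumption)."""
    return -(-path.stat().st_size // 1024)


def run_conversion_test(list_file, result_file, convert):
    """Run `convert` on a temporary copy of every file named in `list_file`.

    One comma-separated line is written per file:
    filename, original size (KB), conversion time (s), converted size (KB).
    """
    with open(list_file) as sources, open(result_file, "w") as results:
        for line in sources:
            if not line.strip():
                continue  # skip blank lines
            source = Path(line.rstrip("\n"))
            # The temporary directory and everything in it is removed afterwards.
            with tempfile.TemporaryDirectory() as tmp:
                work = Path(shutil.copy(source, tmp))
                start = time.monotonic()
                produced = Path(convert(work))
                elapsed = time.monotonic() - start
                results.write(
                    f"{source.name},{size_kb(work)},{elapsed:.2f},{size_kb(produced)}\n"
                )
```

    Working on a copy keeps the database files untouched, which matches the reason the scripts in this project copied each file to /tmp/ first.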
  • 24. Figure 6.1: Directory Tree

    /
      calAllSize.sh               Script calculating the total size of all formats
      calSize.sh                  Script calculating the total size of each format
      cgm2pdf
      cgm2tif
      CGMList.txt                 List of all CGM files
      cgmOpReport.xlsx            Excel format CGM test results
      cgmOpResult.txt             Raw result of the CGM test
      cgmOpTest.sh
      cgms
      extractFormat.sh
      formats.txt                 All the formats under /logs/ directories
      formatsStats.csv            Statistics of formats in CSV format
      formatsStats.txt            Statistics of formats in text
      listAllAttr.sh              Script calling listAttr.sh with each format
      listAttr.sh                 Script that outputs information to Format ls.txt
      listAllLogFile.sh           Script to get a complete list of files
      listOfAllLogFiles.txt       All the files under /logs/ directories
      listOfLogFilesSorted.txt    All the files sorted by extensions
      logConversion.py            The Python program discussed in Chapter 3
      pdf2cgm
      pdf2cgmReport.xlsx          Excel format PDF test results
      pdf2cgmResult.txt           Raw result of the PDF test
      pdf2cgmTest.sh
      PDFList.txt                 List of all PDF files
      report.pdf                  The report
      Lists                       The folder containing the lists of each format
        BMP List.txt
        BMP ls.txt
        ...
        ZIP List.txt
        ZIP ls.txt
  • 25. Chapter 7 Conclusion and Suggestion

    7.1 JustCGM

    JustCGM has a variety of features for dealing with CGM files. It has an active support team and receives regular updates. An update installed for JustCGM in Linux during this project solved many conversion failures. There are still CGM users who do not know many of JustCGM's features. I suggest that detailed instructions covering most of the features of JustCGM be delivered whenever someone requests the software.

    7.2 Plotting

    Plotting with JustCGM as the software and an Iterra Plotter as the hardware is tricky, since a few of the steps are very confusing. Instructions for configuring the plotting environment are not easily accessible.

    7.3 Format Preference

    Keeping every log image in CGM format is the best choice for several reasons:
    • CGM files are faster to convert to other formats.
    • CGM files are easier to optimize in size.
    • When a log is huge, CGM viewed in JustCGM performs the best.
    • CGM is the ideal format for plotting.

    7.4 Embedding conversion

    There is a need for multiple available formats, but maintaining multiple formats of the same files is not space efficient. In this project, conversion from CGM to other formats proved to be fast enough, and it is possible to provide a download of the demanded format through background conversion. JustCGM is capable of converting most of the CGM files in the Linux
  • 26. environment; determining which server-side software should perform the conversion may need further research.

    7.5 Log Image Standard

    Vector images that look very similar can differ hugely in performance. Receiving good images is crucial to maintenance. The current set of log images in Shell's well database is very diverse. Log images were produced in different eras by different logging and service companies using different equipment and software. It is very hard to track down the causes of bad log image files. To create a standard for receiving log images, it is better to find out what software is used to generate those log images and to require log providers to use certain software.

    7.6 Optimization and Graphic Identification

    To optimize the log image database to its full extent, more precise graphic identification is needed. During the testing phase, files other than logs, such as reports and directional surveys, were found to be mixed into the log directories. Badly corrupted CGM files were also found. To identify problematic files in the database, we can either manually check each file or use software to detect those issues. However, manual checking requires much time, and no effective software has been found so far.

    I suggest checking a selection of files based on the Excel testing report. Files with an extremely high or low ratio after conversion, or with an extremely long conversion time, are usually highly likely to have issues. A thorough check of those files may save a large amount of space.

    An optimization process that optimizes files to a smaller size and removes problematic files can be developed based on the scripts used in this project; that requires a business decision with more detailed goals and standards.
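    The selection suggested in Section 7.6 can be sketched as a simple filter over the test report. The thresholds are assumptions of this sketch: the upper ratio of 10 and the time limit of 60 seconds echo the figures mentioned in Section 4.5.2, while the lower ratio of 0.1 is purely illustrative. The row layout (filename, conversion ratio, conversion time) and the sample rows are hypothetical.

```python
def flag_suspicious(rows, max_ratio=10.0, min_ratio=0.1, max_seconds=60.0):
    """Return (filename, reasons) for rows that deserve a manual check.

    `rows` is an iterable of (filename, conversion_ratio, seconds) tuples.
    All three thresholds are illustrative defaults, not project settings.
    """
    flagged = []
    for filename, ratio, seconds in rows:
        reasons = []
        if ratio > max_ratio:
            reasons.append("extremely high ratio")
        if ratio < min_ratio:
            reasons.append("extremely low ratio")
        if seconds > max_seconds:
            reasons.append("extremely long conversion")
        if reasons:
            flagged.append((filename, reasons))
    return flagged


# Made-up rows, for illustration only.
rows = [
    ("normal.cgm", 1.2, 3.0),
    ("report.pdf", 14.0, 5.0),
    ("imagelog.pdf", 2.0, 240.0),
    ("corrupt.cgm", 0.01, 1.0),
]
for filename, reasons in flag_suspicious(rows):
    print(filename, reasons)
```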