2. Open Source Scientific Software (OSSS) Group at ResearchGate.net
One occasionally encounters the need to extract information from graphical
representations of data; perhaps the information is sourced from historical documentation
for which the tabulated data are no longer available, or perhaps the original data are not
publicly available in a format easily accessed. We present two Open Source tools that can
facilitate this process.
1 The Tools
The process here will be conducted with two Open Source tools. The first is g3data
(http://www.frantz.fi/software/g3data.php), which we will use to extract the data from the
image of a graph. There are a number of alternative Open Source tools available for this
function, such as engauge-digitizer (http://digitizer.sourceforge.net/index.php?c=2), which
may be more suitable for different needs. Both of these tools are available from the
Ubuntu package repositories, which are my preferred source for adding applications to my
system. This may not provide the latest available versions of the software, but it
generally ensures that the software has been tested with the operating system I am using,
and most of these packages comply with the GPL license, although there are exceptions.
The second tool is Eureqa (http://ccsl.mae.cornell.edu/eureqa), which is “a software tool for
detecting equations and hidden mathematical relationships in your data,” according to the
web site. This program is currently available only as a freely downloadable Windows
binary, which runs fine on a Linux machine with Wine. I have been unable to determine
the actual license under which this particular package is distributed, so it may not fully
comply with strict Open Source criteria; however, it appears that the software is open,
judging from the availability of a page dedicated to the Eureqa API
(http://code.google.com/p/eureqa-api/). We will be using this tool to convert our extracted
data to a mathematical model.
2 Extracting the Data
As an example, we will extract information from a generic plot of Kinematic Viscosity as a
function of Temperature, with the motivation of extrapolating the given information into
higher temperature ranges. For our source graph, we will use an image downloaded from
The Engineer's ToolBox, which provides us with the basic generic information. Our image is
in *.png format; g3data, being designed for use with scanned images, can probably work
with other formats as well, but we have not tested them. If g3data cannot handle the image
format you prefer, the engauge-digitizer package will extract data from *.bmp, *.gif, *.jpg,
*.png, *.pnm, and *.xpm image formats. Our raw image:
Manual ((Add Nr.)): ((Add Title)) -2 -
We note that the vertical scale is logarithmic, with units of centistokes (cSt). We also note
that we will be looking at an exponential function. The temperature scale only takes us up
to 100 °C, and we would like a function that can give us an extrapolation to higher
temperatures. We open the image in the g3data GUI and begin by defining the axes:
Note the Zoom area to the left of the screenshot; this helps locate the points accurately.
Once we have the axes defined to our satisfaction, we next define the data points of
interest:
In this image, we have toggled the Zoom View off (from the “View” menu) to give us
access to the “Action” items. Note that we have designated the y-axis as logarithmic.
We are going to export the data to a file in space-delimited ASCII format. This format
will facilitate the next phase of our procedure, which is to convert the data into a
mathematical model.
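
To illustrate why the logarithmic designation matters, the following is a minimal Python
sketch of how a pixel position between two calibration points can be mapped to a data
value. It is not taken from g3data; the function name pixel_to_value and the pixel and
axis values are hypothetical, chosen only to show the principle of interpolating in log
space instead of linear space.

```python
import math


def pixel_to_value(pixel, pixel_a, pixel_b, value_a, value_b, logarithmic=False):
    """Map a pixel coordinate to a data value using two calibration points."""
    # Fractional position of the pixel between the two calibration points.
    fraction = (pixel - pixel_a) / (pixel_b - pixel_a)
    if logarithmic:
        # Interpolate between the base-10 logarithms, then convert back.
        log_a = math.log10(value_a)
        log_b = math.log10(value_b)
        return 10 ** (log_a + fraction * (log_b - log_a))
    # Plain linear interpolation for a linear axis.
    return value_a + fraction * (value_b - value_a)


# Hypothetical calibration: pixel row 400 marks 1 cSt and pixel row 100 marks 1000 cSt.
midpoint_log = pixel_to_value(250, 400, 100, 1.0, 1000.0, logarithmic=True)
midpoint_linear = pixel_to_value(250, 400, 100, 1.0, 1000.0)
print(midpoint_log, midpoint_linear)
```

With these assumed calibration points, a point halfway up the axis reads about 31.6 cSt
on a logarithmic axis but 500.5 cSt if the axis were mistakenly treated as linear.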
3 Building a Formula from Raw Data
There are any number of curve-fitting tools available, but we have found the Eureqa
platform to be very sophisticated and versatile. We begin this phase by entering the raw
data from Step 2 into Eureqa. One could accomplish this by manually typing the data into
the Eureqa spreadsheet (in this particular example, with only 11 data pairs, this would
possibly be the quickest route), but being somewhat lazy and somewhat challenged in our
typing skills, we first import the data into OpenOffice Calc, then copy and paste into
Eureqa.
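
For readers who would like to inspect the exported file outside of a spreadsheet, the
following is a minimal Python sketch that reads space-delimited x y pairs and prints them
as tab-separated columns. The function name read_xy_pairs and the sample values are
hypothetical; the sample is a stand-in for an exported file, not data taken from the plot,
and we assume one data pair per line.

```python
import io


def read_xy_pairs(handle):
    """Read whitespace-delimited x y pairs, skipping blank and '#' comment lines."""
    pairs = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Only the first two columns are used: x (temperature) and y (viscosity).
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


# Hypothetical sample standing in for an exported file; not values from the plot.
sample = io.StringIO("0.0 100.0\n50.0 10.0\n\n100.0 1.0\n")
pairs = read_xy_pairs(sample)
for temperature, viscosity in pairs:
    # Print each pair as two tab-separated columns.
    print(f"{temperature}\t{viscosity}")
```

To read a real file, one would pass an open file object, for example
open("exported.dat"), in place of the in-memory sample; that file name is likewise only
a placeholder.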
We note that our raw data probably have far too many significant digits to reflect the
actual accuracy of their source, and we could edit the data accordingly if we so chose,
but for this demonstration, this is not critical. Once we have the data entered, we
next define the model we are looking for:
In the “Using Building Blocks” section, we ensure that we have selected “Exponential”, and
deselected the trigonometric functions (we don't expect the data to fit a standard
trigonometric model). Finally, we start the search. Sit back and let it run for a while,
maybe go off and have a cup of coffee, and see what it turns up. After we have let the
program run for a bit, we can pause the process and check the progress:
On the left, we see a list of possible solutions, and on the upper right, we see the data
plotted against the selected equation (in this case, equation 22). If we are unhappy with
the fit, we can continue the search for a bit longer. Finally, once we are happy with our
results, we can export them into a *.html file which provides us with a list of all the current
solutions and information about the goodness of fit for further evaluation and final
selection. In our case, knowing the limitations of the source of the raw data, and looking
for a simple relationship that gives us a good approximation of the data, we ultimately
selected Equation 9 from the above. If we were dealing with a better data source,
and looking for more insight into the processes underlying the data, we might choose one of
the more extensive models.
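
As a simple cross-check on an exponential model, one can fit a single exponential by
ordinary least squares on the logarithm of the viscosity and then evaluate it beyond the
plotted range. The Python sketch below does this; it is not Eureqa's search method and it
does not reproduce Equation 9. The function name fit_exponential is hypothetical, and the
data are synthetic, generated from an assumed relationship nu = 100 * exp(-0.03 * T)
purely so that the result can be checked.

```python
import math


def fit_exponential(temperatures, viscosities):
    """Least-squares fit of ln(nu) = ln(a) + b * T; returns (a, b)."""
    n = len(temperatures)
    logs = [math.log(v) for v in viscosities]
    mean_t = sum(temperatures) / n
    mean_log = sum(logs) / n
    # Slope and intercept of the straight line through (T, ln(nu)).
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (l - mean_log) for t, l in zip(temperatures, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_t)
    return a, b


# Synthetic data from an assumed model; these are not values read from the plot.
temps = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
visc = [100.0 * math.exp(-0.03 * t) for t in temps]

a, b = fit_exponential(temps, visc)
# Extrapolate to 150 degrees C, beyond the 100 degree limit of the source graph.
predicted_150 = a * math.exp(b * 150.0)
print(a, b, predicted_150)
```

Because the synthetic data follow the assumed model exactly, the fit recovers a close to
100 and b close to -0.03. Digitized data will scatter around any fitted curve, so the
recovered coefficients and the extrapolated value will be correspondingly less certain.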