Osss manual-5-extract data


Extract Data from published graphs and generate mathematical expressions to describe the plots

Published in: Technology

  1. 1. Open Source Scientific Software (OSSS) Group at ResearchGate.net
     Manual 5: Extracting Data from Graphs and Converting to a Mathematical Model
     Version: ((Add Date))
     Author and ©: Charles Warner, Panama
     ➔ This group provides a forum to introduce useful Open Source Scientific Software (OSSS) to the community, as well as a forum for mutual support and the exchange of ideas and experiences. It is intended to cover packages such as OpenFOAM for CFD, structural mechanics and PDEs, R for statistics and ODEs, Maxima for computer algebra, Salome for grid generation, ParaView for postprocessing, gmesh for CAE and CAD, and so on.
     ➔ If you like, join our group at http://www.researchgate.net/group/Open_Source_Scientific_Software_OSSS/
  2. 2. One occasionally encounters the need to extract information from graphical representations of data; perhaps the information comes from historical documentation for which the tabulated data are no longer available, or perhaps the original data are not publicly available in an easily accessed format. We present two Open Source tools that can facilitate this process.

     1 The Tools

     The process here will be conducted with two Open Source tools. The first is g3data (http://www.frantz.fi/software/g3data.php), which we will use to extract the data from the image of a graph. There are a number of alternative Open Source tools available for this function, such as engauge-digitizer (http://digitizer.sourceforge.net/index.php?c=2), which may be more suitable for different needs. Both g3data and engauge-digitizer are available from the Ubuntu package repositories, which are my preferred source for adding applications to my system. This may not provide the latest available version of the software, but it generally ensures that the software has been tested with the operating system I am using, and one can be assured that most of these packages comply with the GPL license, although there are exceptions.

     The second tool is Eureqa (http://ccsl.mae.cornell.edu/eureqa), “a software tool for detecting equations and hidden mathematical relationships in your data,” according to the web site. This program is currently available only as a freely downloadable Windows binary, which runs fine on a Linux machine with Wine. I have been unable to determine the actual license under which this package is distributed, so it may not fully comply with strict Open Source criteria; however, it appears that the software is open, judging from the availability of a page dedicated to the Eureqa API (http://code.google.com/p/eureqa-api/).
     We will be using this tool to convert our extracted data to a mathematical model.

     2 Extracting the Data

     As an example, we will extract information from a generic plot of kinematic viscosity as a function of temperature, with the motivation of extrapolating the given information into higher temperature ranges. For our source graph, we will use an image downloaded from The Engineers ToolBox, which provides us with the basic generic information. We are working with a *.png image; g3data, being designed for use with scanned images, can probably work with other formats as well, but we have not tested them. If g3data cannot handle the image format you prefer, the engauge-digitizer package will extract data from *.bmp, *.gif, *.jpg, *.png, *.pnm, and *.xpm image formats. Our raw image:
  3. 3. We note that the vertical scale is logarithmic, in units of centistokes (cSt). We also note that we will be looking at an exponential function. The temperature scale only takes us up to 100 ºC, and we would like a function that can give us an extrapolation to higher temperatures. We open the image in the g3data GUI and begin by defining the axes:
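Conceptually, defining an axis means pairing two pixel positions with known axis values, so that any clicked pixel can be converted to a data value by interpolation. On a logarithmic axis the interpolation is done on the logarithm of the values. The following is a minimal sketch of that idea, not g3data's actual code; the function name `make_axis_mapper` and all pixel positions are hypothetical.

```python
import math


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, logarithmic=False):
    """Return a function that converts a pixel coordinate to a data value.

    Two reference points (pixel position, known axis value) define the axis.
    For a logarithmic axis the interpolation is linear in log10(value).
    """
    if pixel_a == pixel_b:
        raise ValueError("the two reference pixels must differ")
    if logarithmic:
        if value_a <= 0 or value_b <= 0:
            raise ValueError("logarithmic axes need positive reference values")
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        interpolated = value_a + (pixel - pixel_a) * scale
        return 10.0 ** interpolated if logarithmic else interpolated

    return to_value


# Hypothetical calibration: pixel positions are made up for illustration.
# x-axis: pixel 100 is 0 degC, pixel 600 is 100 degC (linear).
x_of = make_axis_mapper(100, 0.0, 600, 100.0)
# y-axis: pixel 500 is 1 cSt, pixel 100 is 1000 cSt (logarithmic).
y_of = make_axis_mapper(500, 1.0, 100, 1000.0, logarithmic=True)

if __name__ == "__main__":
    print(x_of(350), y_of(300))
```

Note that image pixel rows usually increase downward, which is why the larger y value is paired with the smaller pixel number in this example.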
  4. 4. Note the Zoom area to the left of the screenshot; this helps locate the points accurately. Once we have the axes defined to our satisfaction, we next define the data points of interest:

     In this image, we have toggled the Zoom View off (from the “View” menu) to give us access to the “Action” items. Note that we have designated the y-axis as logarithmic. We are going to export the data to a file; the file format will be space-delimited ASCII, which will facilitate the next phase of our procedure: converting the data into a mathematical model.

     3 Building a Formula from Raw Data

     There are any number of curve-fitting tools available, but we have found the Eureqa platform to be very sophisticated and versatile. We begin this phase by entering the raw data from Step 2 into Eureqa. One could do this by typing the data into the Eureqa spreadsheet by hand (in this example, with only 11 data pairs, this would possibly be the quickest route), but being somewhat lazy and somewhat challenged in our typing skills, we first import the data into Open Office Calc, then copy and paste into Eureqa.
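As an alternative to the spreadsheet detour, the space-delimited export can be converted to tab-separated text, which spreadsheet-style grids generally accept on paste. This is a sketch under assumptions: the export is taken to hold one "x y" pair per line, the helper names `read_pairs` and `to_tsv` are hypothetical, and the sample values are placeholders rather than data read from the graph.

```python
import csv
import io


def read_pairs(text):
    """Parse whitespace-delimited 'x y' lines into a list of float pairs.

    Blank lines and lines starting with '#' are skipped.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def to_tsv(pairs):
    """Format the pairs as tab-separated text, one pair per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return out.getvalue()


# Placeholder content standing in for the exported file (not real data).
sample = "10 100\n20 50\n\n30 25\n"

if __name__ == "__main__":
    print(to_tsv(read_pairs(sample)), end="")
```

To use it on a real export, read the file contents with `open(path).read()` and pass the text to `read_pairs`.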
  5. 5. We note that our raw data probably have far more significant digits than the accuracy of the source graph justifies, and we could edit the data accordingly if we so chose, but for this demonstration this is not critical. Once we have the data entered, we next define the model we are looking for:
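If one did want to trim the digitized values to a more honest precision, rounding to a fixed number of significant figures is one way to do it. The sketch below is illustrative only; the helper name `round_sig` and the choice of three significant figures are assumptions, not part of the original procedure.

```python
import math


def round_sig(value, digits=3):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 1 for 19.8 and -3 for 0.0012.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


if __name__ == "__main__":
    # Made-up values with the kind of excess digits a digitizer produces.
    for raw in (19.8765432, 0.00123456, 123456.0):
        print(raw, round_sig(raw))
```

How many figures to keep depends on how finely the points could be located on the image, so the default here should be treated as a starting point.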
  6. 6. In the “Using Building Blocks” section, we ensure that we have selected “Exponential” and deselected the trigonometric functions (we don't expect the data to fit a standard trigonometric model). Finally, we start the search. Sit back and let it run for a while, maybe go off and have a cup of coffee, and see what it turns up. After we have let the program run for a bit, we can pause the process and check the progress:

     On the left, we see a list of possible solutions, and on the upper right, we see the data plotted against the selected equation (in this case, equation 22). If we are unhappy with the fit, we can continue the search for a bit longer. Finally, once we are happy with our results, we can export them into a *.html file, which provides us with a list of all the current solutions and information about the goodness of fit for further evaluation and final selection. In our case, knowing the limitations of the source of the raw data, and looking for a simple relationship that gives us a good approximation of the data, we ultimately selected Equation 9 from the above. If we were dealing with a better data source, and looking for more insight into the processes underlying the data, we might choose one of the more extensive models.
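For readers without access to Eureqa, a simple exponential model can also be fitted by ordinary least squares on the logarithm of the data. This is a different and much cruder technique than Eureqa's search over candidate equations, and it is not the equation selected above. The sketch assumes the model y = a·exp(b·x); the function names are hypothetical, and the data are synthetic, generated from known parameters purely to show that the fit recovers them.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a straight-line least-squares fit to ln(y)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    logs = [math.log(y) for y in ys]  # requires every y > 0
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b


def r_squared_log(xs, ys, a, b):
    """Coefficient of determination of the fit, measured in ln(y) space."""
    logs = [math.log(y) for y in ys]
    mean_l = sum(logs) / len(logs)
    ss_res = sum((l - (math.log(a) + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    return 1.0 - ss_res / ss_tot


# Synthetic data (NOT digitized values): 11 points from y = 2 * exp(-0.02 x).
xs = [10.0 * i for i in range(11)]
ys = [2.0 * math.exp(-0.02 * x) for x in xs]

a, b = fit_exponential(xs, ys)
extrapolated = a * math.exp(b * 150.0)  # beyond the 0..100 range of the data

if __name__ == "__main__":
    print(a, b, r_squared_log(xs, ys, a, b), extrapolated)
```

As with any fitted model, values extrapolated beyond the range of the source graph should be treated with caution, since the data give no check on the model there.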