Open Source Scientific Software (OSSS) Group
                    at ResearchGate.net




                                   Manual 5:
                          Extracting Data from
                          Graphs and Converting
                            to a Mathematical
                                  Model
                                 Version: ((Add Date))
                                    Author and © :
                                Charles Warner, Panama




➔This group provides a forum to introduce useful Open Source Scientific Software (OSSS)
to the community, as well as a forum for mutual support and the exchange of ideas and
experiences. It is intended to cover packages such as OpenFOAM for CFD, structural
mechanics and PDEs, R for statistics and ODEs, Maxima for computer algebra, Salome
for grid generation, ParaView for postprocessing, Gmsh for CAE and CAD, and so on.



➔If you like, join our group at
      http://www.researchgate.net/group/Open_Source_Scientific_Software_OSSS/


One occasionally needs to extract information from graphical representations of data:
perhaps the information comes from historical documentation for which the tabulated data
are no longer available, or perhaps the original data are not publicly available in an
easily accessible format. We present two Open Source tools that can facilitate this
process.

1 The Tools
The process here uses two Open Source tools. The first is g3data
(http://www.frantz.fi/software/g3data.php), which we will use to extract the data from the
image of a graph. A number of alternative Open Source tools are available for this
function, such as engauge-digitizer (http://digitizer.sourceforge.net/index.php?c=2), which
may be better suited to other needs. Both tools are available from the Ubuntu package
repositories, which are my preferred source for adding applications to my system (this may
not yield the latest available versions of the software, but it generally ensures that the
software has been tested with the operating system I am using, and most of these packages
are distributed under the GPL, although there are exceptions).
The second tool is Eureqa (http://ccsl.mae.cornell.edu/eureqa), which, according to its
website, is “a software tool for detecting equations and hidden mathematical relationships
in your data.” This program is currently available only as a freely downloadable Windows
binary, which runs fine on a Linux machine under Wine. I have been unable to determine the
license under which this package is distributed, so it may not fully comply with strict
Open Source criteria; however, the software appears to be open, judging from the
availability of a page dedicated to the Eureqa API (http://code.google.com/p/eureqa-api/).
We will use this tool to convert our extracted data into a mathematical model.

2 Extracting the Data
As an example, we will extract information from a generic plot of kinematic viscosity as a
function of temperature, with the aim of extrapolating the given information to higher
temperatures. For our source graph, we use an image downloaded from The Engineering
ToolBox, which provides the basic generic information. We are working with a *.png image;
since g3data is designed for use with scanned images, it can probably handle other formats
as well, although we have not tested them. If g3data cannot handle the image format you
prefer, the engauge-digitizer package will extract data from *.bmp, *.gif, *.jpg, *.png,
*.pnm, and *.xpm images. Our raw image:




                        Manual ((Add Nr.)): ((Add Title))                            -2 -




We note that the vertical scale is logarithmic, with units of centistokes (cSt). We also
anticipate that the relationship will be exponential. The temperature scale only extends to
100 °C, and we would like a function that can be extrapolated to higher temperatures. We
open the image in the g3data GUI and begin by defining the axes:
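Conceptually, a digitizer turns each clicked pixel into data coordinates by interpolating between the reference points set on each axis; on a logarithmic axis, the interpolation is done on the logarithm of the value. The sketch below illustrates that mapping under those assumptions. It is not g3data's actual code, and the function name `pixel_to_data` and the calibration pixel positions in the usage are hypothetical.

```python
import math

def pixel_to_data(px, py, x_ref, y_ref, y_log=False):
    """Map a pixel position (px, py) to data coordinates (hypothetical helper).

    x_ref, y_ref: ((pixel1, value1), (pixel2, value2)) calibration pairs,
    i.e. two points per axis whose values are read off the graph.
    If y_log is True, the y axis is treated as base-10 logarithmic.
    """
    (xp1, xv1), (xp2, xv2) = x_ref
    (yp1, yv1), (yp2, yv2) = y_ref

    # Linear interpolation along the x (temperature) axis.
    x = xv1 + (px - xp1) * (xv2 - xv1) / (xp2 - xp1)

    if y_log:
        # On a log axis, pixel distance is proportional to log10(value).
        l1, l2 = math.log10(yv1), math.log10(yv2)
        y = 10 ** (l1 + (py - yp1) * (l2 - l1) / (yp2 - yp1))
    else:
        y = yv1 + (py - yp1) * (yv2 - yv1) / (yp2 - yp1)
    return x, y

# Hypothetical calibration: 0 °C at pixel x=50, 100 °C at x=550;
# 1 cSt at pixel y=400, 100 cSt at y=100 (pixel y grows downward).
# pixel_to_data(300, 250, ((50, 0.0), (550, 100.0)),
#               ((400, 1.0), (100, 100.0)), y_log=True)  # -> (50.0, 10.0)
```

Because the log-axis value is interpolated on log10, a point halfway between the 1 cSt and 100 cSt marks maps to 10 cSt rather than 50.5 cSt, which is why the y-axis must be flagged as logarithmic before the points are picked.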






Note the Zoom area to the left of the screenshot; it helps locate points accurately. Once
the axes are defined to our satisfaction, we next define the data points of interest:




In this image, we have toggled the Zoom View off (from the “View” menu) to give us access
to the “Action” items. Note that we have specified that the y-axis is logarithmic. We will
export the data to a file in space-delimited ASCII format, which simplifies the next phase
of our procedure: converting the data into a mathematical model.
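A space-delimited ASCII file of this kind is also easy to read programmatically. The minimal sketch below assumes each line holds an x value followed by a y value separated by whitespace; whether g3data writes additional columns (for example, error estimates) is an assumption to check against your own export, so only the first two columns are used. The helper name `read_xy` and the file name `viscosity.dat` are hypothetical.

```python
def read_xy(lines):
    """Parse whitespace-delimited text into a list of (x, y) float pairs.

    Only the first two columns are used; blank lines and lines
    starting with '#' are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs

# Usage with an exported file (hypothetical file name):
# with open("viscosity.dat") as f:
#     data = read_xy(f)
```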


3 Building a Formula from Raw Data
There are any number of curve-fitting tools available, but we have found the Eureqa
platform to be very sophisticated and versatile. We begin this phase by entering the raw
data from Section 2 into Eureqa. One could simply type the data into the Eureqa
spreadsheet (in this example, with only 11 data pairs, this might be the quickest route),
but being somewhat lazy and somewhat challenged in our typing skills, we first import the
data into OpenOffice Calc, then copy and paste it into Eureqa.








We note that our raw data probably carry far more significant digits than the accuracy of
the source graph justifies; we could round the values accordingly if we chose, but for this
demonstration this is not critical. Once the data are entered, we next define the model we
are looking for:
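Rounding the digitized values to a precision that matches how accurately points can be read off the graph is straightforward before entering them into Eureqa. A minimal sketch follows, assuming a fixed number of significant digits; the right number depends on the resolution of the source image, and the helper name `round_sig` is hypothetical.

```python
import math

def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 3 for 1234.5, -3 for 0.0012.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)

# round_sig(1234.5678, 3)  -> 1230.0
```

Note that the result is still a binary float, so printed values may occasionally show representation noise; for display, formatting with a `g` format specifier is an alternative.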






In the “Using Building Blocks” section, we ensure that “Exponential” is selected and that
the trigonometric functions are deselected (we do not expect the data to fit a
trigonometric model). Finally, we start the search. Sit back and let it run for a while
(perhaps go off and have a cup of coffee) and see what it turns up. After the program has
run for a bit, we can pause the process and check the progress:
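Eureqa searches over many candidate expression forms built from the selected building blocks. As a point of comparison, the simpler technique below fits just one fixed form, y = a·exp(b·x), by ordinary least squares on ln(y). This is a minimal sketch, not Eureqa's method. Fitting in log space weights approximately relative rather than absolute errors, which suits data read from a logarithmic axis. The helper name `fit_exponential` is hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    Returns (a, b). All ys must be positive.
    """
    n = len(xs)
    ln_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_ln = sum(ln_ys) / n
    # Slope and intercept of the straight line ln(y) = ln(a) + b * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ln) for x, ly in zip(xs, ln_ys))
    b = sxy / sxx
    a = math.exp(mean_ln - b * mean_x)
    return a, b

# Usage with digitized (temperature, viscosity) pairs:
# a, b = fit_exponential(temps, viscosities)
```

If a single exponential leaves a systematic pattern in the residuals, that is a sign that a richer form, such as those Eureqa explores, is needed.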




On the left, we see a list of candidate solutions, and on the upper right, the data plotted
against the selected equation (in this case, Equation 22). If we are unhappy with the fit,
we can continue the search a bit longer. Once we are happy with the results, we can export
them to an *.html file, which lists all the current solutions together with goodness-of-fit
information for further evaluation and final selection. In our case, knowing the
limitations of the source data and looking for a simple relationship that approximates them
well, we ultimately selected Equation 9 from the list above. With a better data source, and
if we were looking for more insight into the processes underlying the data, we might choose
one of the more complex models.
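Before relying on a selected equation, especially for extrapolation beyond 100 °C, it can be useful to check its fit independently of Eureqa's own report. The minimal sketch below computes the mean absolute error and the coefficient of determination (R²). Here, `model` is any Python function standing in for the chosen equation. The helper name `goodness_of_fit` and the variable names in the usage comment are hypothetical, and the equation itself is not reproduced here.

```python
def goodness_of_fit(model, xs, ys):
    """Return (mean absolute error, R^2) of model(x) against the data ys."""
    preds = [model(x) for x in xs]
    n = len(ys)
    mae = sum(abs(p - y) for p, y in zip(preds, ys)) / n
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for p, y in zip(preds, ys))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, r2

# Usage, with `candidate` a function implementing the selected equation:
# mae, r2 = goodness_of_fit(candidate, temps, viscosities)
```

These statistics describe the fit only within the digitized range. Values obtained by evaluating `model` above 100 °C are extrapolations that the digitized data themselves cannot confirm.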



