A REST API for
The IUPAC Solubility Data Series:
A ‘Skunkworks’ Project
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Fall ACS Meeting
 Motivation
 What is Website ‘Scraping’?
 What are REST and API?
 Project Process
 NIST Website Analysis
 Database Definition
 Data Ingestion
 Project Website Design
 Using the Website
 Future Plans
 Conclusion
Outline
 Linked Open Data (LOD) is important for science
 Defining a process for grabbing high quality science
data and making it semantically available is useful
 Providing a REST API makes information easy to find
 Providing unique REST URLs for data allows linking
 A semantic description of data makes it more useful
 Increase value added -> link data to other available data
 SDS data is fundamentally important to chemistry
Motivation
(1) http://en.wikipedia.org/wiki/Linked_data
 Data in web pages is available for users to copy/paste
 When the amount of available data is large, automation
via scripts is necessary
 ‘Scraping’ is the processing of web page data using a
scripting language
 Data can be captured and stored in any format
 Most useful to capture data in a relational database
so that it can be repurposed at another website
 This is usually done without the permission of the
authors of the ‘scraped’ web page(s)
What is Website Scraping?
 Representational State Transfer (REST) is…
“a software architectural style consisting of a coordinated
set of architectural constraints applied to components,
connectors, and data elements, within a distributed
hypermedia system” (2)
 REST is applied to websites as a style for providing URL
access to information in a structured human readable way
 Application Programming Interface (API) is…
A standardized way for one computer/software system to
talk to another. For REST this is a set of remote (HTTP-based)
calls to predefined URLs
What are REST and API?
(2) http://en.wikipedia.org/wiki/Representational_state_transfer
(3) http://en.wikipedia.org/wiki/API
 Analysis of current NIST Solubility Database website
 Definition of database tables needed
 Code generation to automate data scraping
 Data cleanup
 REST API definition and description
 REST API development
 Output file format generation
 Addition of bells and whistles (if there’s time)
Project Process
 http://srdata.nist.gov/solubility/dataSeries.aspx
contains links to all the volumes that are available => volID
 http://srdata.nist.gov/solubility/sys_category.aspx
contains all the system types as part of a select list => typeID
 http://srdata.nist.gov/solubility/sol_sys_lst.aspx?sysID=<typeID>&FROM=SSN
contains the different datasets for a specific system type => sysID
 http://srdata.nist.gov/solubility/sol_detail.aspx?sysID=<sysID>
contains details of the system: citation, data tables, refs, preparer, etc.
 http://srdata.nist.gov/solubility/sol_2casno.aspx?STR1=<CASRN1>&STR2=<CASRN2>&OPTION=CASNO
allows searching by chemical CASRN (also by name (OPTION=CHEM) or formula (OPTION=MOL))
 http://srdata.nist.gov/solubility/citation_detail.aspx?REF_NO=<?REFNO?>
allows searching system data by paper
NIST Website Analysis
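The URL patterns listed on this slide can be composed programmatically. The following is a minimal Python sketch (the talk itself used PHP); the function names are hypothetical, and passing an empty STR2 for a single-chemical search is an assumption about how the NIST site behaves, not something stated in the slides.

```python
from urllib.parse import urlencode

# Base path taken from the URLs on the slide.
BASE = "http://srdata.nist.gov/solubility"


def system_list_url(type_id):
    """URL of the page listing the datasets for one system type.

    Note that the NIST site passes the system type in a parameter named sysID.
    """
    return f"{BASE}/sol_sys_lst.aspx?" + urlencode({"sysID": type_id, "FROM": "SSN"})


def system_detail_url(sys_id):
    """URL of the detail page for a single system."""
    return f"{BASE}/sol_detail.aspx?" + urlencode({"sysID": sys_id})


def chemical_search_url(str1, str2="", option="CASNO"):
    """URL for a search by CASRN (CASNO), name (CHEM) or formula (MOL).

    Assumption: an empty STR2 is acceptable for a one-chemical search.
    """
    if option not in {"CASNO", "CHEM", "MOL"}:
        raise ValueError("option must be CASNO, CHEM or MOL")
    query = urlencode({"STR1": str1, "STR2": str2, "OPTION": option})
    return f"{BASE}/sol_2casno.aspx?{query}"


if __name__ == "__main__":
    print(system_list_url(12))
    print(system_detail_url("20_135"))
    print(chemical_search_url("7647-14-5"))
```

Using `urlencode` rather than string concatenation keeps names or formulas containing spaces or special characters correctly escaped.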
 What types of data are available and how should it be
organized?
 By Volume => volID
 By System Type => typeID
 By System => sysID
 By Chemical => CASRN, name, formula
 By Citation => refNO
 By Author (new)
 Also added Tables and Variables during development
 Note: the NIST site uses the sysID parameter both for a
system type and for a particular set of data about a system type
Database Definition
 Data was imported into MySQL either from tab-delimited
text files or by insertion via PHP scripts
 The volume IDs were scraped from
http://srdata.nist.gov/solubility/dataSeries.aspx and the HTML
was cleaned up to generate a tab-delimited text file => 18 rows
 Similarly, the system types were scraped from
http://srdata.nist.gov/solubility/sys_category.aspx
into a tab-delimited text file => 2564 rows
Data Ingestion
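The tab-delimited import described on this slide might look like the sketch below. It is an illustration only: it uses Python with an in-memory SQLite database as a stand-in for the MySQL database used in the project, and the table name, column names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_tsv(conn, table, columns, tsv_text):
    """Insert the rows of a tab-delimited text into `table`; return the row count.

    `table` and `columns` are trusted constants here (they are interpolated
    into the SQL); the cell values are passed as bound parameters.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (vol_id TEXT PRIMARY KEY, title TEXT)")
sample = "20\tHypothetical volume title A\n\n21\tHypothetical volume title B\n"
count = load_tsv(conn, "volumes", ["vol_id", "title"], sample)

if __name__ == "__main__":
    print(count, "rows imported")
```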
 Individual systems with data were scraped using a PHP
script which involved
 Lookup of system type and retrieval of typeID
 Construction of system type page URL
http://srdata.nist.gov/solubility/sol_sys_lst.aspx?sysID=<typeID>&FROM=SSN
 Retrieval of the page content (HTML) into a PHP variable
 PCRE regular expression match for the sysID of each system
 Creation of a new entry in the system database table
=> 4817 rows
Data Ingestion
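The regular expression step above can be sketched as follows, using Python's `re` module in place of PHP's PCRE functions. The sample HTML and the assumed shape of the links (an `href` pointing at `sol_detail.aspx?sysID=...`) are hypothetical; the real page markup may differ and the pattern would need adjusting to it.

```python
import re

# Assumption: each system is linked via sol_detail.aspx?sysID=<sysID>,
# and a sysID consists of letters, digits and underscores.
SYS_ID_PATTERN = re.compile(r"sol_detail\.aspx\?sysID=([0-9A-Za-z_]+)")


def extract_sys_ids(html):
    """Return the distinct sysID values found in `html`, in order of appearance."""
    return list(dict.fromkeys(SYS_ID_PATTERN.findall(html)))


# Hypothetical page fragment, for illustration only.
sample_html = """
<a href="sol_detail.aspx?sysID=20_135">System A</a>
<a href="sol_detail.aspx?sysID=20_136">System B</a>
<a href="sol_detail.aspx?sysID=20_135">System A (repeated link)</a>
"""

if __name__ == "__main__":
    for sys_id in extract_sys_ids(sample_html):
        print(sys_id)
```

Deduplicating before insertion avoids creating two database rows when a page links to the same system more than once.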
 System details were scraped using a PHP script by
 Lookup of system and retrieval of sysID
 Construction of system detail page URL
http://srdata.nist.gov/solubility/sol_detail.aspx?sysID=<sysID>
 Retrieval of the page content (HTML) into a PHP variable
 Processing of HTML to retrieve
citation, variables, data analysis and tables, method, source,
errors, references
 Saving of details to systems table and related tables
Data Ingestion
 In addition to data extraction
 Chemical InChI strings were retrieved from NIH CIR (1)
 Citation DOIs were retrieved from CrossRef (2) and saved
(article titles and full author names were also added)
 Data tables were converted to JSON format for storage
and reproduction
 Table notes, sources, and additional refs were converted to
JSON for storage
Data Ingestion
(1) http://cactus.nci.nih.gov/chemical/structure
(2) http://www.crossref.org
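Two of the steps on this slide lend themselves to a short sketch. The Python below builds a CIR lookup URL, assuming the resolver's `<identifier>/stdinchi` path pattern on top of the base URL in footnote (1), and round-trips a data table through JSON. The JSON layout (`headers` and `rows` keys) and the function names are hypothetical; the slides do not describe the structure actually stored.

```python
import json
from urllib.parse import quote

# Base URL from footnote (1).
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_inchi_url(identifier):
    """URL that asks CIR for the standard InChI of a name or CASRN.

    Assumption: the resolver follows the /<identifier>/<representation> scheme
    with 'stdinchi' as the representation.
    """
    return f"{CIR_BASE}/{quote(identifier, safe='')}/stdinchi"


def table_to_json(headers, rows):
    """Serialise a data table to a JSON string (hypothetical layout)."""
    return json.dumps({"headers": headers, "rows": rows})


def table_from_json(text):
    """Rebuild (headers, rows) from the JSON string for display."""
    data = json.loads(text)
    return data["headers"], data["rows"]


if __name__ == "__main__":
    print(cir_inchi_url("7647-14-5"))
    stored = table_to_json(["T/K", "x1"], [[298.15, 0.1], [303.15, 0.2]])
    print(table_from_json(stored))
```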
Database
 Constructed using the CakePHP framework (PHP)
 Index (listing) and view pages for each of
 Authors
 Chemicals
 Citations
 Systems
 System Types
 Volumes
 Search functionality provided via the homepage
 Example URL
http://chalk.coas.unf.edu/solubility/systems/view/20_135
Project Website Design
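The example URL follows the conventional CakePHP `/controller/action/parameter` layout, which is what makes each record addressable by a unique REST-style URL. The Python sketch below splits such a URL into its parts; the function name and the `app_root` argument are hypothetical, and the fallback to an `index` action mirrors the index (listing) pages mentioned on the slide.

```python
from urllib.parse import urlparse


def parse_route(url, app_root="solubility"):
    """Split a /<app_root>/<controller>/<action>/<params...> URL into its parts."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == app_root:
        parts = parts[1:]  # drop the application's base directory
    controller = parts[0] if parts else None
    action = parts[1] if len(parts) > 1 else "index"  # listing page by default
    return {"controller": controller, "action": action, "params": parts[2:]}


if __name__ == "__main__":
    print(parse_route("http://chalk.coas.unf.edu/solubility/systems/view/20_135"))
```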
 Get this project funded
 Clean up references and link to DOIs
 Clean up authors and link to ORCIDs
 Add procedural references
 Convert table data into searchable/linked format
 Add measurement type, unit, error, and variables
 Provide searching and plotting of data
 Automated calculation of additional parameters
e.g. solubility in different units, mole ratio
 Create solubility ontology => add RDF + searching
 Add microdata (1) to each web page
 Next phase? => Add the other volumes
Future Plans
(1) http://www.w3.org/TR/microdata/
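As an illustration of the planned "automated calculation of additional parameters", the sketch below converts a solubility given as mass of solute per mass of solvent into a mole ratio and a mole fraction. It is written in Python with hypothetical function names; the sodium chloride figures are rounded textbook values used only to exercise the code and are not taken from the Solubility Data Series.

```python
def mole_ratio(mass_solute_g, molar_mass_solute, mass_solvent_g, molar_mass_solvent):
    """Moles of solute per mole of solvent."""
    n_solute = mass_solute_g / molar_mass_solute
    n_solvent = mass_solvent_g / molar_mass_solvent
    return n_solute / n_solvent


def mole_fraction(mass_solute_g, molar_mass_solute, mass_solvent_g, molar_mass_solvent):
    """Mole fraction of solute in a binary solute/solvent mixture."""
    n_solute = mass_solute_g / molar_mass_solute
    n_solvent = mass_solvent_g / molar_mass_solvent
    return n_solute / (n_solute + n_solvent)


if __name__ == "__main__":
    # Illustrative input: about 36 g NaCl (58.44 g/mol) per 100 g water (18.015 g/mol).
    x = mole_fraction(36.0, 58.44, 100.0, 18.015)
    r = mole_ratio(36.0, 58.44, 100.0, 18.015)
    print(f"mole fraction = {x:.4f}, mole ratio = {r:.4f}")
```

The two quantities are related by r = x / (1 - x), which gives a quick consistency check when both are stored.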
 A RESTful version of the IUPAC-NIST Solubility Series
Database was successfully created and made available
 Metrics
 20 Volumes
 2564 System Types
 4817 Systems
 1484 Chemicals
 1247 References
 1968 Authors
 Database size: 11 MB
 One week’s worth of work
Conclusion
 schalk@unf.edu
 Phone: 904-620-5311
 Skype: stuartchalk
 LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
 ORCID: http://orcid.org/0000-0002-0703-7776
 ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?

Richard's aventures in two entangled wonderlands
Richard Gill
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 

Recently uploaded (20)

Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 

ACS 248th Paper 108 NIST-IUPAC Solubility Data

  • 1. A REST API for The IUPAC Solubility Data Series: A ‘Skunkworks’ Project
      Stuart J. Chalk, Department of Chemistry, University of North Florida
      schalk@unf.edu
      2014 Fall ACS Meeting
  • 2. Outline
      • Motivation
      • What is Website ‘Scraping’?
      • What are REST and API?
      • Project Process
      • NIST Website Analysis
      • Database Definition
      • Data Ingestion
      • Project Website Design
      • Using the Website
      • Future Plans
      • Conclusion
  • 3. Motivation
      • Linked Open Data (LOD) is important for science
      • Defining a process for grabbing high-quality science data and making it semantically available is useful
      • Providing a REST API makes information easy to find
      • Providing unique REST URLs for data allows linking
      • A semantic description of data makes it more useful
      • Increase value added -> link data to other available data
      • SDS data is fundamentally important to chemistry
      (1) http://en.wikipedia.org/wiki/Linked_data
  • 4. What is Website Scraping?
      • Data in web pages is available for users to copy/paste
      • When the amount of available data is large, automation using scripts is necessary
      • ‘Scraping’ is the processing of web page data using a scripting language
      • Data can be captured and stored in any format
      • It is most useful to capture data in a relational database so that it can be repurposed at another website
      • This is usually done without the permission of the authors of the ‘scraped’ web page(s)
  • 5. What are REST and API?
      • Representational State Transfer (REST) is “a software architectural style consisting of a coordinated set of architectural constraints applied to components, connectors, and data elements, within a distributed hypermedia system” (2)
      • REST is applied to websites as a style for providing URL access to information in a structured, human-readable way
      • Application Programming Interface (API) is a standardized way for one computer/software system to talk to another. For REST this is a set of remote (HTTP) calls to pre-defined URLs
      (2) http://en.wikipedia.org/wiki/Representational_state_transfer
      (3) http://en.wikipedia.org/wiki/API
  • 6. Project Process
      • Analysis of current NIST Solubility Database website
      • Definition of database tables needed
      • Code generation to automate data scraping
      • Data cleanup
      • REST API definition and description
      • REST API development
      • Output file format generation
      • Addition of bells and whistles (if there’s time)
  • 7. NIST Website Analysis
      • http://srdata.nist.gov/solubility/dataSeries.aspx contains links to all the volumes that are available => volID
      • http://srdata.nist.gov/solubility/sys_category.aspx contains all the system types as part of a select list => typeID
      • http://srdata.nist.gov/solubility/sol_sys_lst.aspx?sysID=<typeID>&FROM=SSN contains the different datasets for a specific system type => sysID
      • http://srdata.nist.gov/solubility/sol_detail.aspx?sysID=<sysID> contains details of a system: citation, data tables, refs, preparer, etc.
      • http://srdata.nist.gov/solubility/sol_2casno.aspx?STR1=<CASRN1>&STR2=<CASRN2>&OPTION=CASNO allows searching by chemical CASRN (also by name (OPTION=CHEM) or formula (OPTION=MOL))
      • http://srdata.nist.gov/solubility/citation_detail.aspx?REF_NO=<REFNO> allows searching system data by paper
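      The URL patterns on the NIST Website Analysis slide can be illustrated with a short Python sketch. This is not the author's code (the project used PHP); the function names are hypothetical, and only the paths and query parameter names are taken from the slide.

```python
from urllib.parse import urlencode

# Base path taken from the slide; everything else here is an illustrative helper.
BASE = "http://srdata.nist.gov/solubility"


def system_list_url(type_id):
    """URL listing the datasets (systems) for one system type."""
    return f"{BASE}/sol_sys_lst.aspx?" + urlencode({"sysID": type_id, "FROM": "SSN"})


def system_detail_url(sys_id):
    """URL of the detail page for one system."""
    return f"{BASE}/sol_detail.aspx?" + urlencode({"sysID": sys_id})


def casrn_search_url(casrn1, casrn2=""):
    """URL searching by one or two CAS Registry Numbers."""
    query = {"STR1": casrn1, "STR2": casrn2, "OPTION": "CASNO"}
    return f"{BASE}/sol_2casno.aspx?" + urlencode(query)
```

      Building the query string with `urlencode` rather than string concatenation keeps identifiers containing reserved characters safely escaped. The identifier values passed in are whatever the NIST pages supply; none are assumed here.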
  • 8. Database Definition
      • What types of data are available and how should they be organized?
          • By Volume => volID
          • By System Type => typeID
          • By System => sysID
          • By Chemical => CASRN, name, formula
          • By Citation => refNO
          • By Author (new)
      • Also added Tables and Variables during development
      • Note: the actual site uses sysID both for the system type and for a particular set of data about a system type
  • 9. Data Ingestion
      • Data was imported into MySQL either from a tab-delimited text file or by insertion via PHP scripts
      • The volume IDs were scraped from http://srdata.nist.gov/solubility/dataSeries.aspx and the HTML was cleaned up to generate a tab-delimited text file => 18 rows
      • Similarly, the system types were scraped from http://srdata.nist.gov/solubility/sys_category.aspx into a tab-delimited text file => 2564 rows
  • 10. Data Ingestion
      • Individual systems with data were scraped using a PHP script, which involved
          • Lookup of system type and retrieval of typeID
          • Construction of the system type page URL http://srdata.nist.gov/solubility/sol_sys_lst.aspx?sysID=<typeID>&FROM=SSN
          • Retrieval of the page content (HTML) into a PHP variable
          • PCRE regular expression match for the sysID of each system
          • Creation of a new entry in the system database table
      • => 4817 rows
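      The regular expression step above can be sketched as follows. This is a Python illustration, not the original PHP script, and it assumes the listing page links to each system as `sol_detail.aspx?sysID=<sysID>`; the real NIST markup and the sample identifiers below are assumptions.

```python
import re

# Assumed link shape: href="sol_detail.aspx?sysID=<sysID>" (not confirmed by the slides).
SYSID_LINK = re.compile(r"sol_detail\.aspx\?sysID=([A-Za-z0-9_]+)", re.IGNORECASE)


def extract_sys_ids(html):
    """Return each sysID found in the page once, in order of first appearance."""
    seen = []
    for sys_id in SYSID_LINK.findall(html):
        if sys_id not in seen:
            seen.append(sys_id)
    return seen


# Hypothetical page fragment standing in for the retrieved HTML.
sample_html = (
    '<a href="sol_detail.aspx?sysID=1_1">System A</a>'
    '<a href="sol_detail.aspx?sysID=1_2">System B</a>'
    '<a href="sol_detail.aspx?sysID=1_1">System A (repeat)</a>'
)
```

      Each sysID returned would then become one new row in the system table. Deduplicating before insertion avoids duplicate rows when a page links to the same system more than once.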
  • 11. Data Ingestion
      • System details were scraped using a PHP script by
          • Lookup of system and retrieval of sysID
          • Construction of the system detail page URL http://srdata.nist.gov/solubility/sol_detail.aspx?sysID=<sysID>
          • Retrieval of the page content (HTML) into a PHP variable
          • Processing of the HTML to retrieve citation, variables, data analysis and tables, method, source, errors, references
          • Saving of details to the systems table and related tables
  • 12. Data Ingestion
      • In addition to data extraction
          • Chemical InChI strings were retrieved from NIH CIR (1)
          • Citation DOIs were retrieved from CrossRef (2) and saved (article titles and full author names were also added)
          • Data tables were converted to JSON format for storage and reproduction
          • Table notes, sources, and additional refs were converted to JSON for storage
      (1) http://cactus.nci.nih.gov/chemical/structure
      (2) http://www.crossref.org
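      Storing a data table as JSON, as described above, might look like the following Python sketch. The layout (a `columns` list plus a list of row records) is an assumption rather than the schema used in the project, and the numbers are placeholders, not real solubility data.

```python
import json


def table_to_json(headers, rows):
    """Serialize a table as JSON, keeping the column order explicit."""
    records = [dict(zip(headers, row)) for row in rows]
    return json.dumps({"columns": headers, "rows": records}, ensure_ascii=False)


def table_from_json(text):
    """Rebuild (headers, rows) from the JSON produced by table_to_json."""
    data = json.loads(text)
    headers = data["columns"]
    rows = [[record[h] for h in headers] for record in data["rows"]]
    return headers, rows


# Placeholder table: hypothetical column names and values.
headers = ["T/K", "x1"]
rows = [[298.15, 0.001], [308.15, 0.002]]
stored = table_to_json(headers, rows)
```

      Keeping the column list alongside the records lets the table be reproduced with its columns in the original order, which matters when the stored JSON is later rendered back into an HTML table.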
  • 15. Project Website Design
      • Constructed using the CakePHP framework (PHP)
      • Index (listing) and view pages for each of
          • Authors
          • Chemicals
          • Citations
          • Systems
          • System Types
          • Volumes
      • Search functionality provided via the homepage
      • Example URL http://chalk.coas.unf.edu/solubility/systems/view/20_135
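      The example URL follows CakePHP's conventional /controller/action/argument layout. The Python sketch below splits such a URL into its parts; it illustrates the URL structure only, is not code from the project, and the function name and `app_root` parameter are hypothetical.

```python
from urllib.parse import urlparse


def parse_rest_url(url, app_root="solubility"):
    """Split a URL path into (controller, action, args) after the app root."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == app_root:
        parts = parts[1:]
    controller = parts[0] if parts else None
    # With no action segment, assume the listing ("index") page.
    action = parts[1] if len(parts) > 1 else "index"
    args = parts[2:]
    return controller, action, args


example = parse_rest_url("http://chalk.coas.unf.edu/solubility/systems/view/20_135")
```

      For the example URL this yields the `systems` controller, the `view` action, and the single argument `20_135`. Because every record has a stable URL of this shape, other sites can link directly to an individual system.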
  • 18. Future Plans
      • Get this project funded
      • Clean up references and link to DOIs
      • Clean up authors and link to ORCIDs
      • Add procedural references
      • Convert table data into searchable/linked format
      • Add measurement type, unit, error, and variables
      • Provide searching and plotting of data
      • Automated calculation of additional parameters, e.g. solubility in different units, mole ratio
      • Create solubility ontology => add RDF + searching
      • Add microdata (1) to each web page
      • Next phase? => Add the other volumes
      (1) http://www.w3.org/TR/microdata/
  • 19. Conclusion
      • A RESTful version of the IUPAC-NIST Solubility Series Database was successfully created and made available
      • Metrics
          • 20 Volumes
          • 2564 System Types
          • 4817 Systems
          • 1484 Chemicals
          • 1247 References
          • 1968 Authors
          • 11 MB size of database
          • One week’s worth of work
  • 20. Questions?
      • schalk@unf.edu
      • Phone: 904-620-5311
      • Skype: stuartchalk
      • LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
      • ORCID: http://orcid.org/0000-0002-0703-7776
      • ResearcherID: http://www.researcherid.com/rid/D-8577-2013