ChemConnect is a database that interconnects fine-grained information extracted from chemical kinetic and thermodynamic sources such as
CHEMKIN mechanism files, NASA polynomial files, and even the information behind automatic generation files.
The key to the interconnection is the Resource Description Framework (RDF) from Semantic Web technologies. An RDF statement is a triple in which an object item (first) is associated through a descriptor (second) with a subject item (third).
In this way the information of the object is connected (through the descriptor) to the subject.
In ChemConnect the object is a word (text) and the subject can be text or a database item. The search mechanism within ChemConnect uses the object and subject text as search strings.
The presentation also contains a brief introduction to cloud computing.
This was presented at the COST Action 1404 SMARTCATS workshop on Databases and Systems Use Cases (http://www.smartcats.eu/wg4ws1dp/).
9. Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Examples: Google App Engine, SalesForce CRM, LotusLive
Adapted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance
12. These are types of services provided by Google as a cloud service provider
For ChemConnect the services of interest are:
To run the Java-based website (the ‘App’)
The ‘NoSQL’ database (for large amounts of information)
Storage (data files)
17. CLIENT: User interface on browser, tablet or phone (adjustable for each)
SERVER: ChemConnect generates the interface and handles computing and responses
18. Example: ChemConnect is written in Java
Eclipse: uses a ‘standard’ (public domain) environment to write code
Local debug and then deploy to Google Cloud
19. Local Environment: Local Deploy, Local client Interface
Google Cloud: Deploy to Cloud, Web client Interface
The community: Testing, feedback
22. Not restricted to ‘accepted’ published data
Recognize interdependencies between data
Database as an analytical tool
Fine-grained
26. Subject: The subject of the description
Predicate: The description of the relationship between subject and object
Object: The object of the description
Subject -> (Predicate) -> Object
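As a minimal sketch of the subject, predicate, object structure described on this slide, a triple can be held in a small named tuple. The class name `Triple` and the species and reaction strings are illustrative assumptions; only the predicate names `isAReactant` and `isAProduct` come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str  # 'object' would shadow the Python builtin, hence 'obj'


# Hypothetical example data (not taken from ChemConnect itself).
statements = [
    Triple("C3H8", "isAReactant", "C3H8+OH=N-C3H7+H2O"),
    Triple("N-C3H7", "isAProduct", "C3H8+OH=N-C3H7+H2O"),
]

# Every statement reads as a short sentence.
sentences = [f"{t.subject} {t.predicate} {t.obj}" for t in statements]
```

Because the tuple is immutable and hashable, the same statement arriving from two sources collapses to one entry when the statements are stored in a set.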
28. Passive Connection:
You don’t need to know which structures you want to connect to
If they share an RDF subject or an RDF object, then they are connected!!
Keyword: Passive
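The passive connection can be sketched as a simple set intersection: two sources are connected by whichever terms they both use as a subject or an object, without either source naming the other. The function name `shared_terms` and the mechanism and species names are assumptions for illustration only.

```python
def terms(triples):
    """Collect every term that appears as a subject or an object."""
    return {term for s, _p, o in triples for term in (s, o)}


def shared_terms(source_a, source_b):
    """Terms through which two independent sources become connected."""
    return terms(source_a) & terms(source_b)


# Hypothetical data: two mechanisms loaded independently of each other.
mechanism_a = [("MechanismA", "hasSpecies", "C3H7"),
               ("MechanismA", "hasSpecies", "CH4")]
mechanism_b = [("MechanismB", "hasSpecies", "C3H7"),
               ("MechanismB", "hasSpecies", "H2O")]

links = shared_terms(mechanism_a, mechanism_b)
```

Here `links` contains only `"C3H7"`: nothing was declared about the relationship between the two mechanisms, yet they are joined through the shared term.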
29. In one sense, standards are only important for the initial parsing of the data and maybe outputting the data
But not within the database itself
If new standards come up, they can supplement the data
(thinking of the keys, identifiers, meta-data keys, DOIs, etc.)
34. Adds ‘meaning’ to the independent sources of information
Gives ‘relationships’ between the pieces of information
35. French Language data: http://…isbn/2020386682, Le palais des miroirs
f:auteur -> f:nom: Ghosh, Amitav
f:traducteur -> f:nom: Besse, Christianne
Refers to http://…isbn/000651409X
English Language data: http://…isbn/000651409X, The Glass Palace, 2000, London, Harper Collins
a:author -> a:name: Ghosh, Amitav
a:author -> a:homepage: http://www.amitavghosh.com
Common URL! Connecting sets of Concepts
36. http://…isbn/2020386682, Le palais des miroirs
f:original -> http://…isbn/000651409X
f:auteur -> f:nom: Ghosh, Amitav
f:traducteur -> f:nom: Besse, Christianne
http://…isbn/000651409X, The Glass Palace, 2000, London, Harper Collins
a:author -> a:name: Ghosh, Amitav
a:author -> a:homepage: http://www.amitavghosh.com
Two independent data sources (which did not know about each other) become connected
Passive
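The merge on this slide can be sketched with plain sets of triples. The predicates (`f:original`, `f:auteur`, `f:traducteur`, `f:nom`, `a:author`, `a:name`, `a:homepage`) and the URLs are the ones shown on the slides, with the elided ISBN URLs kept exactly as they appear. The node identifiers starting with `_:` and the helper names are assumptions added for illustration.

```python
BOOK_EN = "http://…isbn/000651409X"   # elided URL kept as shown on the slide
BOOK_FR = "http://…isbn/2020386682"

# '_:...' identifiers are hypothetical stand-ins for the unnamed person nodes.
english = {
    (BOOK_EN, "a:author", "_:en_author"),
    ("_:en_author", "a:name", "Ghosh, Amitav"),
    ("_:en_author", "a:homepage", "http://www.amitavghosh.com"),
}
french = {
    (BOOK_FR, "f:original", BOOK_EN),
    (BOOK_FR, "f:auteur", "_:fr_author"),
    ("_:fr_author", "f:nom", "Ghosh, Amitav"),
    (BOOK_FR, "f:traducteur", "_:fr_translator"),
    ("_:fr_translator", "f:nom", "Besse, Christianne"),
}

# Merging is just a set union; the common URL does the connecting.
merged = english | french


def objects(graph, subject, predicate):
    """All objects reachable from 'subject' through 'predicate'."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def homepage_of_original_author(graph, book):
    """Follow f:original, then a:author, then a:homepage."""
    pages = []
    for original in objects(graph, book, "f:original"):
        for author in objects(graph, original, "a:author"):
            pages.extend(objects(graph, author, "a:homepage"))
    return pages
```

Asked of the French data alone, `homepage_of_original_author(french, BOOK_FR)` finds nothing. Asked of `merged`, it returns the homepage that only the English source knew about, which is the passive connection in action.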
37. Extraction of all the bits of information within the data object
CHEMKIN model:
Extract set of molecules (with isomer, thermodynamic data)
Extract set of reactions (with ‘isomer’, kinetic data)
Extract relationships between
molecules and molecules (related through reactions)
molecules and reactions (reactants, products, etc.)
reactions and reactions (reaction network information)
Other Sources:
Automatic Generation:
Mechanism with the information as above, plus
2D-structure, reaction class information, substructure information
Thermodynamic Calculators: more thermodynamic information (plus 2D-structures)
Have to have database capacity to store this immense amount of info
To be demonstrated today
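A rough sketch of the extraction step for one reaction line, assuming the common CHEMKIN layout of a reaction equation followed by three Arrhenius parameters. This is not ChemConnect's parser: the function name is hypothetical, and stoichiometric coefficients, third bodies such as (+M), auxiliary lines and species names containing '+' are deliberately not handled.

```python
def parse_reaction_line(line):
    """Split one simple reaction line into molecule/reaction triples.

    Returns (triples, (A, beta, Ea)). Only plain 'A+B=C+D' style
    equations are supported in this sketch.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected an equation followed by three rate parameters")
    equation = "".join(fields[:-3])           # tolerate spaces in the equation
    a, beta, ea = (float(x) for x in fields[-3:])

    for arrow in ("<=>", "=>", "="):          # longest arrow first
        if arrow in equation:
            left, right = equation.split(arrow, 1)
            break
    else:
        raise ValueError("no reaction arrow found in: " + equation)

    triples = [(sp, "isAReactant", equation) for sp in left.split("+")]
    triples += [(sp, "isAProduct", equation) for sp in right.split("+")]
    return triples, (a, beta, ea)


# Hypothetical input line; the rate parameters are made-up numbers.
triples, arrhenius = parse_reaction_line("C3H8+OH=N-C3H7+H2O  1.0E10  1.0  1600.0")
```

Each reaction line yields one triple per reactant and one per product, which is what makes the later molecule-to-molecule and reaction-to-reaction relationships derivable from the stored data.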
38. Sources: Chemkin Model I, Chemkin Model II, Automatically Generated CHEMKIN Model, 2-D Structure, Computational Chemistry Calculations
Species labels: 1-Butyl-3-hydroperoxide, C4H11O2, ch2ch2ch(ooh)ch3, 1-c4hh8-3-ooh
Relationships: hasSpecies, isIsomer, hasThermo (to Thermo)
49. Database as analytic device
Species -> isAReactant -> Reaction -> isAProduct -> Species (repeated along a chain of reactions)
Establishes a further relationship between two species
Could even supplement the database:
Species1 PathTo Species2
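One way to read this slide is as a reachability computation over the stored triples: a reactant of a reaction leads to that reaction's products, and chaining such steps gives new `PathTo` statements. The sketch below uses a breadth-first search; the function name and the species and reaction identifiers are assumptions, and only the predicate names come from the slide.

```python
from collections import defaultdict, deque


def derive_path_to(triples):
    """Derive (species, 'PathTo', species) triples from reactant/product links."""
    reactants = defaultdict(set)   # reaction -> species consumed
    products = defaultdict(set)    # reaction -> species formed
    for s, p, o in triples:
        if p == "isAReactant":
            reactants[o].add(s)
        elif p == "isAProduct":
            products[o].add(s)

    # One-step edges: reactant -> product of the same reaction.
    edges = defaultdict(set)
    for reaction, species in reactants.items():
        for reactant in species:
            for product in products.get(reaction, ()):
                if product != reactant:
                    edges[reactant].add(product)

    # Breadth-first search from every species that starts an edge.
    derived = set()
    for start in list(edges):
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in edges.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                    derived.add((start, "PathTo", nxt))
    return derived


# Hypothetical two-step chain: SpeciesA -> SpeciesB -> SpeciesC.
chain = [
    ("SpeciesA", "isAReactant", "Reaction1"),
    ("SpeciesB", "isAProduct", "Reaction1"),
    ("SpeciesB", "isAReactant", "Reaction2"),
    ("SpeciesC", "isAProduct", "Reaction2"),
]
new_statements = derive_path_to(chain)
```

The derived triples could then be written back to the database, so the analysis supplements the data it started from.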
50. Database as analytic device
CHEMKIN Mechanism: species are labels
Only know atomic composition (NASA polynomial), not structure
Species labels: C3H7, N-C3H7, i-C3H7, each with its Reactions (asProduct) and Reactions (asReactant)
Compare reactions (species as isomers)
The set with the most similarities: wins
51. Database as analytic device
C3H7 and N-C3H7, each with its Reactions (asProduct) and Reactions (asReactant)
The set with the most similarities: wins
A new relationship can be established
For the cautious: the relationship can be qualified with a probability (related to degree of matching)
For more certainty: one can extend the comparison through a larger network (path through two or more reactions)
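The comparison can be sketched as follows, under stated assumptions: each reaction is reduced to a signature in which the label under test is replaced by a placeholder, and the degree of matching is measured with the Jaccard index. The Jaccard index is a choice made here for illustration, not necessarily what ChemConnect uses, and the sketch assumes that all other species carry the same labels in both mechanisms. The reactions listed are hypothetical example data.

```python
def signatures(reactions, label):
    """Reactions involving 'label', with the label replaced by '*'."""
    out = set()
    for reactants, products in reactions:
        if label in reactants or label in products:
            out.add((
                frozenset("*" if s == label else s for s in reactants),
                frozenset("*" if s == label else s for s in products),
            ))
    return out


def jaccard(a, b):
    """Degree of matching between two signature sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(label, reactions_a, candidates, reactions_b):
    """Score each candidate label; the set with the most similarities wins."""
    target = signatures(reactions_a, label)
    scores = {c: jaccard(target, signatures(reactions_b, c)) for c in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical mechanisms: (reactants, products) per reaction.
mechanism_a = [(("C3H8", "OH"), ("C3H7", "H2O")),
               (("C3H7",), ("C2H4", "CH3"))]
mechanism_b = [(("C3H8", "OH"), ("N-C3H7", "H2O")),
               (("N-C3H7",), ("C2H4", "CH3")),
               (("C3H8", "OH"), ("i-C3H7", "H2O")),
               (("i-C3H7",), ("C3H6", "H"))]

winner, scores = best_match("C3H7", mechanism_a, ["N-C3H7", "i-C3H7"], mechanism_b)
```

With this data the winner is `N-C3H7`, and the score stored alongside the new relationship plays the role of the qualifying probability. Extending the signatures to paths of two or more reactions would follow the same pattern with longer tuples.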
52. Database as analytic device
If one of the mechanisms is automatically generated, then the 2D structure is available
The species goes from a ‘label’ to a species with a structure
(can be further classified with substructures)
55. Account Sign in:
Query: Which data do you have access to?
Data input: How is your data shared?
Security: Inhibit hacking
Social media concepts: groups
Each data point has sharing and ownership parameters
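A minimal sketch of what sharing and ownership parameters on each data point might look like. The field names (`owner`, `groups`, `public`) and the access rule are assumptions for illustration; the slides do not specify ChemConnect's actual scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """A stored item with hypothetical ownership and sharing parameters."""
    value: str
    owner: str
    groups: frozenset = frozenset()   # groups the item is shared with
    public: bool = False


def can_read(point, user, user_groups):
    """A user sees public items, their own items, and items shared with their groups."""
    return point.public or point.owner == user or bool(point.groups & set(user_groups))


def visible(points, user, user_groups):
    """Filter query results down to what this user has access to."""
    return [p for p in points if can_read(p, user, user_groups)]


# Hypothetical users, groups and data.
points = [
    DataPoint("mechanism-1", owner="alice", public=True),
    DataPoint("mechanism-2", owner="alice", groups=frozenset({"combustion"})),
    DataPoint("mechanism-3", owner="bob"),
]
for_carol = visible(points, "carol", ["combustion"])
```

Applying the filter at query time answers the question "which data do you have access to", while the fields set at data input answer "how is your data shared".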
56. Transactions:
How, by whom and when was the data entered (or analysed)?
How was the database used: which queries?
Why?
Have to filter which query results are shown and order them
Both personal and in general
General Field (computer science):
Recommendation Systems
Each Google search (from different people) gives different results
eCommerce sites use this to
57. Some basic functionality is present:
Reading in CHEMKIN mechanisms from many sources
Management of RDFs
Simple Query (single keyword search)
Data Sources:
Automatically generated mechanisms (mechanism)
Data behind automatic generation (reaction classes, 2-D (sub)structures)
Independent thermodynamic data
Computational chemistry results
Query
More complex searches
multiple keywords
interpretation/preprocessing of keyword expression before search
Ordering and filtering results (passive and with check boxes)
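The planned multiple-keyword search could look roughly like this: the keyword expression is preprocessed (lower-cased and split on whitespace), each triple's subject and object text is used as the search string, and hits are ordered by how many keywords they match. The function name, the substring matching rule and the example data are assumptions; the ranking is only one possible ordering.

```python
def search(triples, query):
    """Return triples whose subject/object text matches any keyword, best first."""
    keywords = query.lower().split()          # simple preprocessing step
    scored = []
    for s, p, o in triples:
        text = f"{s} {o}".lower()             # subject and object are the search strings
        score = sum(1 for k in keywords if k in text)
        if score > 0:
            scored.append((score, (s, p, o)))
    # Stable sort: more matched keywords first, original order otherwise.
    scored.sort(key=lambda item: -item[0])
    return [triple for _score, triple in scored]


# Hypothetical stored triples.
stored = [
    ("MechanismA", "hasSpecies", "N-C3H7"),
    ("MechanismB", "hasSpecies", "C3H7"),
    ("MechanismA", "hasSpecies", "CH4"),
]
hits = search(stored, "MechanismA C3H7")
```

Substring matching means `c3h7` also finds `N-C3H7`, which is convenient for isomer labels but can over-match; a stricter tokenizer would be a reasonable alternative.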
58. See you there!
If the gods of the internet
(and the demon, the ‘demo effect’)
allow,
you can try it out
Editor's Notes
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Cloud computing customers do not own the physical infrastructure.
Cloud computing users avoid capital expenditure (CapEx) on hardware, software, and services when they pay a provider only for what they use.
Low shared infrastructure and costs, low management overhead, and immediate access to a broad range of applications
Take the poll:
Have you used the cloud for one, two, three, or more of these services?
IaaS delivers computer infrastructure, typically a platform virtualization environment, as a service. Rather than purchasing servers, software, data center space or network equipment, clients instead buy those resources as a fully outsourced service.
PaaS delivers a computing platform where developers can develop their own applications.
SaaS is a model of software deployment where the software applications are provided to the customers as a service.