Importing linked life science databases into
Neo4j
Simon Jupp
Sample Phenotypes and Ontologies Team
European Bioinformatics Institute
jupp@ebi.ac.uk
Purpose of the workshop
• Introduce two alternative graph models
• RDF graphs
• Property graph
• Demonstrate a simple data integration use-case
• Show how Neo4j data import tools can be used to rapidly
import life science data from public APIs
• Example Cypher for querying biological data
• Introduction to Neo4j sandboxes, APOC procedures and
tips for creating your own Neo4j guide
Some biological questions
“Differentially expressed genes in adult mice, bred in oxygen rich vs
oxygen poor environments? Of this set, which biological processes
(GO) are enriched?”
“Where are genes with antigen binding function differentially
expressed, which disease and which associated pathways?”
“Get metformin associated pathways with differentially expressed
genes, find any proteins that are targets for known diabetes drugs”
How do you go about answering these kinds of questions?...
… you go to the data
Literature & ontologies
•Experimental Factor Ontology
•Gene Ontology
•BioStudies
•Europe PMC
Chemical biology
•ChEBI
•ChEMBL
•SureChEMBL
Molecular structures
•Protein Data Bank in Europe
•Electron Microscopy Data Bank
Gene, protein & metabolite expression
•Expression Atlas
•Metabolights
•PRIDE
•RNA Central
Protein sequences,
families & motifs
•InterPro
•Pfam
•UniProt
Genes, genomes & variation
•Ensembl
•Ensembl Genomes
•GWAS Catalog
•Metagenomics portal
Systems
•BioModels
•BioSamples
•Enzyme Portal
•IntAct
•Reactome
Molecular Archives
•European Nucleotide Archive
•European Variation Archive
•European Genome-phenome Archive
•ArrayExpress
Data integration challenges
• Heterogeneous formats and identifiers
• We invest heavily in mapping and cross-linking
resources, but it’s still hard to integrate and query across
internal/external resources.
• Lots of effort goes into mapping, and each group duplicates
these efforts
Standardise data publishing
• What if we could standardise the way we publish data?
• Global identification systems (so we can identify the things
in our data)
• Common semantics (talking about the same things)
• A common query language to the data
Original vision of the Web
Information Management: A Proposal, Tim Berners-Lee, CERN, March 1989, May 1990,
http://www.w3.org/History/1989/proposal.html
Relations
“Things”
Vocabularies
Early Web
Semantic Web (or Linked Data)
"The Semantic Web is a webby way to
link data"
“Turning the web into a global API”
“The existing web links documents, the
semantic web links data”
“Shared meaning through ontologies”
The Linking Open Data cloud 2017
http://lod-cloud.net
RDF is for describing graphs
• 1995-2004: the W3C develops a specification for a vocabulary
for Web metadata called the Resource Description
Framework (RDF)
http://en.wikipedia.org/wiki/Barack_Obama
Web
Document
Structured data
Publishing data as a graph
dbpedia:Barack_Obama
Human President of the United States
Honolulu
1961-08-04
birthplace
birthdate
position_held
type
Anatomy of a triple statement
• All triples are composed of a subject, predicate and an
object
Subject: Barack Obama
Predicate: birth place
Object: Honolulu
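A triple maps naturally onto a three-field record. The sketch below is only an illustration of the statement above: the `Triple` class and the `objects_of` helper are hypothetical names, not part of any RDF library, and plain labels stand in for URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement; field names follow the slide."""
    subject: str
    predicate: str
    object: str


# The example statement from the slide, using plain labels rather than URIs.
statement = Triple("Barack Obama", "birth place", "Honolulu")

# A graph is simply a collection of such statements.
graph = [
    statement,
    Triple("Barack Obama", "type", "Human"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects_of(graph, "Barack Obama", "birth place"))
```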
Identify things on the web
• Build on existing Web technology
• global identifiers for resources (things) using URIs
• URIs should resolve
Subject: http://dbpedia.org/page/Barack_Obama
Predicate: http://dbpedia.org/property/birthPlace
Object: http://dbpedia.org/page/Honolulu
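Once every part of a statement is a URI, the triple can be written down as a single line of text. As a rough sketch, the hypothetical helper `to_ntriples` below formats the three URIs above in the N-Triples style (`<subject> <predicate> <object> .`). It assumes all three values are URIs and does not handle literals or escaping.

```python
def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one all-URI triple as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(
    "http://dbpedia.org/page/Barack_Obama",
    "http://dbpedia.org/property/birthPlace",
    "http://dbpedia.org/page/Honolulu",
)
print(line)
```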
Turning relational data to RDF –
EBI Gene Expression Atlas database
Relational Data to RDF graph conversion
•Give “things” URIs
•Type “things” with ontologies
•Link “things” to other related “things”
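The three conversion steps above can be sketched for a single relational row. Everything specific here is an assumption made for illustration: the `http://example.org/` namespace, the column names, the `Gene` class and the `expressedIn` property are hypothetical and are not taken from the Gene Expression Atlas schema. Only the `rdf:type` and `rdfs:label` URIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(row: dict) -> list:
    """Turn one relational row into (subject, predicate, object) triples."""
    # Step 1: give "things" URIs.
    gene = BASE + "gene/" + row["gene_id"]
    experiment = BASE + "experiment/" + row["experiment_id"]
    return [
        # Step 2: type "things" with an ontology class (hypothetical class).
        (gene, RDF_TYPE, BASE + "ontology/Gene"),
        (gene, RDFS_LABEL, row["symbol"]),
        # Step 3: link "things" to other related "things".
        (gene, BASE + "ontology/expressedIn", experiment),
    ]


# A made-up row; the identifiers are placeholders, not real accessions.
row = {"gene_id": "g1", "symbol": "GENE1", "experiment_id": "e42"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```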
Stardog
Apache Jena
Sesame
Virtuoso
AllegroGraph
OWLIM
Storing and querying RDF
• Optimized databases for RDF data
• SPARQL query language
Querying RDF with SPARQL
• W3C standard query language for querying RDF data
• Query language for matching graph patterns in RDF
• SPARQL endpoints – common API to query RDF data
• “Get all presidents of the United States” from
https://query.wikidata.org/
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX position_held: <http://www.wikidata.org/prop/direct/P39>
PREFIX potus: <http://www.wikidata.org/entity/Q11696>
SELECT ?label WHERE {
  ?subject position_held: potus: .
  ?subject rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
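Because a SPARQL endpoint is an ordinary HTTP API, a query like the one above can be sent from any language. The sketch below only constructs the request URL and does not send it. It assumes the Wikidata endpoint at `https://query.wikidata.org/sparql` and uses a `format=json` parameter to ask for JSON (an `Accept` header is the standard alternative). `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX position_held: <http://www.wikidata.org/prop/direct/P39>
PREFIX potus: <http://www.wikidata.org/entity/Q11696>
SELECT ?label WHERE {
  ?subject position_held: potus: .
  ?subject rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with any HTTP client should return the bindings for `?label`; the call is left out here so the sketch runs without network access.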
RDF and the Property graph
RDF graphs
dbpedia:Barack_Obama
Human President of the United States
Honolulu
“1961-08-04”^^xsd:date
birthplace
birthdate
position_held
type
Every statement adds a new edge to the graph
All nodes are resources (with URIs) or literals (with types)
Property graphs (Neo4j)
“Barack_Obama”xsd:string
name
dbpedia:Barack_Obama
{
name: “Barack Obama”
Type: “Human”
}
Nodes and edges have internal structure
Honolulu
Birthplace
{ Birthdate: “1961-08-04” }
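The difference between the two models can be made concrete with a small conversion. The following is a minimal sketch of one possible mapping, not the only one: triples whose object is a literal are folded into properties of the subject node, and all other triples become edges. The slide instead attaches the birth date to the Birthplace edge, which is an equally valid modelling choice. The function name and the four-field triple layout are assumptions made for this example.

```python
def to_property_graph(triples):
    """Fold literal-valued triples into node properties; keep the rest as edges.

    Each input triple is (subject, predicate, object, object_is_literal).
    Returns (nodes, edges): nodes maps a node id to its property dict,
    edges is a list of (source, relationship, target).
    """
    nodes = {}
    edges = []
    for subject, predicate, obj, is_literal in triples:
        properties = nodes.setdefault(subject, {})
        if is_literal:
            # A literal becomes internal structure of the node.
            properties[predicate] = obj
        else:
            # A resource becomes its own node, connected by an edge.
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


triples = [
    ("dbpedia:Barack_Obama", "name", "Barack Obama", True),
    ("dbpedia:Barack_Obama", "birthdate", "1961-08-04", True),
    ("dbpedia:Barack_Obama", "birthplace", "dbpedia:Honolulu", False),
]
nodes, edges = to_property_graph(triples)
print(nodes)
print(edges)
```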
Working with RDF and Neo4j
• RDF is great for publishing data
• SPARQL gives flexible access to query data
• But…
• RDF schemas are often (necessarily) complex
• Expose the full underlying data semantics
• RDF comes with baggage that can be a turn-off for newcomers
• Neo4j is for graphs
• Easier to grasp for beginners
• Powerful query language (Cypher)
• Excellent third-party tools, community and developer
integrations
Working with RDF and Neo4j
• In this tutorial we will harness the publishing power of
RDF
• Combine with the simplicity and querying power of Neo4j
• Use Neo4j data import tools to rapidly import data from
public SPARQL endpoints
• Simplify the graph schema to fit a specific use case
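The import step can be sketched as follows. SPARQL endpoints can return results in the standard SPARQL JSON format, where each row is a set of variable bindings. The code flattens those bindings into plain rows that could be passed as the `$rows` parameter of a Cypher statement. The sample result, the `Gene` and `Disease` labels, the `ASSOCIATED_WITH` relationship and the `example.org` URIs are all hypothetical, and the workshop guide itself may perform this step differently, for example with APOC procedures inside Neo4j.

```python
# A hand-written sample in the SPARQL JSON results layout (made-up values).
SPARQL_RESULT = {
    "head": {"vars": ["gene", "disease"]},
    "results": {
        "bindings": [
            {
                "gene": {"type": "uri", "value": "http://example.org/gene/g1"},
                "disease": {"type": "uri", "value": "http://example.org/disease/d1"},
            }
        ]
    },
}

# Simplified target schema: two node labels and one relationship type
# (all hypothetical names chosen for this sketch).
CYPHER = """
UNWIND $rows AS row
MERGE (g:Gene {uri: row.gene})
MERGE (d:Disease {uri: row.disease})
MERGE (g)-[:ASSOCIATED_WITH]->(d)
"""


def bindings_to_rows(result: dict) -> list:
    """Reduce each SPARQL binding to a {variable: value} dictionary."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


rows = bindings_to_rows(SPARQL_RESULT)
print(rows)
```

Dropping the `type` and datatype information at this point is the schema simplification: only the values the use case needs are carried into the property graph.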
Use-case
• Build a simple graph of gene-disease and drug-disease
associations
• Data from public resources (Ensembl, GWAS, ChEMBL)
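To show the kind of question such a graph answers, here is a small in-memory sketch that walks from a gene to its associated diseases and then to the drugs linked to those diseases. The gene, disease and drug names are placeholders, not real records from Ensembl, GWAS or ChEMBL, and `drugs_linked_to_gene` is a hypothetical helper.

```python
# Placeholder associations; in the workshop these would come from public data.
gene_disease = [("geneA", "disease1"), ("geneB", "disease2")]
drug_disease = [("drugX", "disease1"), ("drugY", "disease1"), ("drugZ", "disease3")]


def drugs_linked_to_gene(gene: str) -> list:
    """Follow gene -> disease <- drug and return the drugs, sorted by name."""
    diseases = {d for g, d in gene_disease if g == gene}
    return sorted({drug for drug, d in drug_disease if d in diseases})


print(drugs_linked_to_gene("geneA"))
print(drugs_linked_to_gene("geneB"))
```

In a graph database the same two-hop traversal is expressed as a single path pattern rather than two explicit lookups.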
Setup for workshop
• Sandbox Neo4j instance from https://neo4j.com/sandbox-v2/
• Optionally run your own local installation, but you’ll need
the APOC procedures installed to run the guide
• Run the Neo4j guide
:play https://guides.neo4j.com/life-science-import

Editor's Notes

  1. This slide lists the core resources at the EBI, to show the range of data you can access through it.