A Graph platform for the Life Scientists
Anjani K. Dhrangadhariya
Realizing Neo4j
• September 2015 – Start of the thesis
• December 2015 – No implementation success!
• 1 solution, 3 technologies, 3 failures!
My solution
Graph + Neo4j: successful
Even bigger realization
Thesis
Changing your view
Illustration by David Pohl
Reductionist view
Systems view
Reductionism vs. Systems
Reductionism: dissection
Systems: integration
Source: http://sysbiol.cnb.csic.es/SysBiol/sysbiol.html
Reductionist view
Experiment: Microarray
Sample: Huntington’s disease
Drug: Drug_X
Action: Increased
Protein: Protein_A, Protein_B
Experiment: Yeast two hybrid
Sample: Alzheimer’s disease
Protein1: Protein_A
Protein2: Mutated_Protein_X
Action: Binding, deactivation
Experiment: Mass spectrometry
Sample: Alzheimer’s disease, late stage
Observation: Mutated_Protein_X
Biomarker: Brain atrophy
DB: DrugBank
Microarray
Huntington’s disease
Drug_X
Protein_A, Protein_B
Action: Increased
DB: BioGRID
Yeast 2 hybrid screening
Alzheimer’s disease
Protein1: Protein_A
Protein2: Mutated_Protein_X
Effect: Binding, deactivation
DB: MassBank
Mass spectrometry
Alzheimer’s disease Late st.
Protein: Mutated_Protein_X
Phenotype: Brain atrophy
Effect: Increase
Brain Atrophy
• Nerve cell death
• Tissue loss throughout the brain
Systems View
DB: DrugBank
Microarray
Huntington’s disease
Drug_X
Protein_A, Protein_B
Action: Increase
DB: BioGRID
Yeast 2 hybrid screening
Alzheimer’s disease
Protein1: Protein_A
Protein2: Mutated_Protein_X
Action: Binding, deactivation
DB: MassBank
Mass spectrometry
Alzheimer’s disease Late St.
Protein: Mutated_Protein_X
Phenotype: Brain atrophy
Effect: Increase
Drug_X –Increases→ Protein_A, Protein_B
Protein_A –Binds→ Mutated_Protein_X
Mutated_Protein_X –Increases→ Brain atrophy
So if we use Drug_X to increase Protein_A, which in turn
binds to Mutated_Protein_X and deactivates it, will
it also reduce brain atrophy in Alzheimer’s disease?
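The connected view above can be sketched in Cypher, the query language introduced later in this deck. This is only an illustration: the labels (Drug, Protein, Phenotype), the relationship types and the source property are assumptions made here, since the slides name only the entities and their actions.

```cypher
// Hypothetical labels, relationship types and property keys.
// Each relationship records the database the observation came from.
CREATE (drug:Drug {name: "Drug_X"}),
       (a:Protein {name: "Protein_A"}),
       (b:Protein {name: "Protein_B"}),
       (x:Protein {name: "Mutated_Protein_X"}),
       (atrophy:Phenotype {name: "Brain atrophy"}),
       (drug)-[:INCREASES {source: "DrugBank"}]->(a),
       (drug)-[:INCREASES {source: "DrugBank"}]->(b),
       (a)-[:BINDS {effect: "deactivation", source: "BioGRID"}]->(x),
       (x)-[:INCREASES {source: "MassBank"}]->(atrophy);

// Run as a second statement: follow the chain from the drug to the phenotype.
MATCH path = (:Drug {name: "Drug_X"})-[*1..3]->(:Phenotype {name: "Brain atrophy"})
RETURN path;
```

The second query returns the chain Drug_X, Protein_A, Mutated_Protein_X, Brain atrophy as one path, which is the hypothesis that the three separate tables could not show.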
Database statistics
• Number of interactions in BioGRID: 80,050
• Total drugs in DrugBank: 10,562
• Total drug-target interactions in DrugBank: 16,959
• Total spectra studies in MassBank: 41,092
Source: BioGRID, DrugBank, MassBank
Difference?
1. Table vs. graph
2. Separate vs. connected
3. No relationship vs. relationship
1 - Introduction
Why Graph?
• Data explosion
– Genomics, Proteomics, Metabolomics, Transcriptomics,
Metagenomics, Lipidomics, Foodomics, Glycomics, …
• Most of the biological data is stored in such tabular
databases or flat files.
• No connections between the already available
biological data stored in tables.
• No bigger picture!
Why Graph? (contd.)
• Connections are very important in Biology.
• Biology does not happen in tables!
• Graphs are the closest to the natural way of
representing biological situations.
A Graph?
• Pie graph?
• Bar graph?
• Line graph?
What is a Graph?
Schematic representation of a graph
Vertex 1, Vertex 2, Vertex 3, Vertex 4
Graph = a set/collection of vertices and edges
Mathematical notation
G = (V, E)
Graph or Network
Graph (mathematical representation)    Network (applied representation)
Vertex                                 Node
Edge                                   Relationship/link
Schematic representation of a network
Directed and undirected
A B
Real world
• The main elements of a graph are nodes,
which are connected by relationships just
as they are connected in the real world.
• Model a real-world scenario as a graph
Model?
Graphs are Everywhere
General examples
• Community/ Social
• Internet
• Terrorist network
• World Wide Web
• Air-route connection
Examples in Biology
• Protein-protein Interaction
• Gene-regulatory network
• Metabolic network
• Neuronal network
• Drug mechanism of action network
Internet graph
The network structure of the
Internet by Hal Burch and Bill
Cheswick. Copyright Lumeta
Corporation 2009.
Source URL: http://www.mathaware.org/mam/04/6_Internet_structure.html
https://www.lucidchart.com/blog/network-diagram-templates
Simple internet graph: A network of
connections (wired/wireless) between
devices like laptop, phone, printer, etc.
Social graph
Facebook friendships between people
across the globe. A visualization created by
Paul, an engineering intern at Facebook,
using the R programming language.
Source: https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
Twitter graph
Generated using the yEd graph editor (yWorks)
The WWW
Terrorist Network
Saddam Hussein Network (2003)
The Universe
C. Wilson. Searching for Saddam: a five-part
series on how the US military
used social networking to
capture the Iraqi dictator. 2010.
Source: https://www.slideshare.net/mathieu-bastian/visualize-big-graph-data
Airport connection
This infographic is a map of the
3,275 global airports and all of
the connecting flight routes.
Designed by Martin Grandjean,
each bubble represents an
individual airport and the
bubble size represents the
number of flight routes (37,153
routes in total) based
on OpenFlights.org data.
Source: http://coolinfographics.com/blog/tag/network
Life Sciences
• Graphs are the closest to the natural way of
representing biological situations.
• Biology, biochemistry, pharmaceuticals or
even healthcare, anatomy, neurosciences
Protein-Protein Interaction
Image source: https://www.nature.com/articles/nrg1272
A protein-protein interaction network for
yeast. A network of interactions between
proteins in the single-celled organism
Saccharomyces cerevisiae (baker’s yeast),
as determined using, primarily, two-hybrid
screen experiments. From Jeong et al.
Copyright Macmillan Publishers Ltd.
Protein-protein Interaction
Fig: Schematic Protein
interaction
Fig: Protein interaction network of
insulin in human Source: STRING
Database
In a protein interaction
network, nodes denote
protein molecules and
links denote physical
interactions between
proteins.
Metabolic network
Source: Kyoto Encyclopedia of Genes and Genomes
URL: http://www.genome.jp/kegg-bin/show_pathway?map01100
Global metabolic pathway
Entry: map01100
Neuronal network
White matter tracts within a
human brain, as visualized
by MRI tractography.
In a neuronal network, the
neurons are the nodes and
the synapses are the links
between them. These
networks are usually studied
using graph theory and
machine learning.
Source: https://en.wikipedia.org/wiki/Connectome
Human Connectome Project
Graph Data Model
Labeled Property Graph model
ANN -[IS_MARRIED_TO]- DAN
ANN -[LIVES_WITH]- DAN
ANN (label: Person)
name: Ann Mueller
DOB: May, 1980
Twitter: @ann
DAN (label: Person)
name: Dan Mueller
DOB: Dec, 1981
Twitter: @dan
Legend: NODE, LABEL, RELATIONSHIPS, PROPERTIES
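As a rough sketch, the Ann and Dan example could be written in Cypher as shown here. The property keys dob and twitter and the direction of the two relationships are assumptions made for illustration; the slide shows only the display names and values.

```cypher
// Two Person nodes with key/value properties, joined by two relationships.
// Relationship direction (ann -> dan) is an assumption; the slide does not show it.
CREATE (ann:Person {name: "Ann Mueller", dob: "May, 1980", twitter: "@ann"}),
       (dan:Person {name: "Dan Mueller", dob: "Dec, 1981", twitter: "@dan"}),
       (ann)-[:IS_MARRIED_TO]->(dan),
       (ann)-[:LIVES_WITH]->(dan)
RETURN ann, dan
```

Every element of the model appears once here: the node variables (ann, dan), the label (Person), the properties in curly braces, and the relationship types in square brackets.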
Another familiar example
ANJANI -[STUDIED_AT]-> CHRIST COLLEGE
ANJANI (labels: Person, Student)
name: Anjani KD
DOB: May, 1980
Twitter: @anjani
CHRIST COLLEGE (label: Place)
name: Christ College
established: 2002
Twitter: @christClg
Legend: NODE, LABEL, RELATIONSHIPS, PROPERTIES
Insulin – INSR Interaction
INS -[LINKED_TO]- INSR
INS (labels: Protein, Hormone)
name: Insulin
ID: ENSP00000250971
alias: IDDM, IDDM1
org: Homo sapiens
INSR (label: Protein)
name: Insulin receptor
ID: ENSP00000303830
alias: CD220, HHF5
org: Homo sapiens
LINKED_TO (relationship properties)
link_Evi: 0.866
coMention_scr: 0.900
binding_scr: 0.900
activation_scr: 0.900
total_scr: 0.999
Source: STRING database: © STRING CONSORTIUM 2017
URL: https://string-db.org/cgi/network.pl?taskId=XRAdWcZivE1K
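The insulin example also shows that relationships can carry properties. A minimal sketch in Cypher follows, assuming the key id for the identifier, a list for the aliases, and a direction from INS to INSR, none of which the slide specifies.

```cypher
// Property values are taken from the slide; key spelling `id`, the alias lists
// and the relationship direction are assumptions for illustration.
CREATE (ins:Protein:Hormone {name: "Insulin", id: "ENSP00000250971",
                             alias: ["IDDM", "IDDM1"], org: "Homo sapiens"}),
       (insr:Protein {name: "Insulin receptor", id: "ENSP00000303830",
                      alias: ["CD220", "HHF5"], org: "Homo sapiens"}),
       (ins)-[:LINKED_TO {link_Evi: 0.866, coMention_scr: 0.900,
                          binding_scr: 0.900, activation_scr: 0.900,
                          total_scr: 0.999}]->(insr);

// Run as a second statement: keep only high-confidence links.
MATCH (a:Protein)-[l:LINKED_TO]->(b:Protein)
WHERE l.total_scr > 0.9
RETURN a.name, b.name, l.total_scr;
```

Storing the scores on the relationship rather than on either protein keeps the evidence attached to the interaction it describes.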
Node
• Nodes are the main data elements
• Nodes are connected to other nodes
via relationships
• Nodes can have one or
more properties (i.e., attributes
stored as key/value pairs)
• Nodes can have one or more labels that
describe their role in the graph
ANJANI (labels: Person, Student)
name: Anjani KD
DOB: May, 1980
Twitter: @anjani
Relationships
• Relationships connect two nodes
• Relationships are directional
• Nodes can have multiple, even
recursive relationships
• Relationships can have one or
more properties (i.e., attributes
stored as key/value pairs)
Properties
• Properties are named values
where the name (or key) is a
string.
– Key:Value
• Properties are used to describe
nodes and relationships.
INS
name: Insulin
ID: ENSP00000250971
alias: IDDM, IDDM1
org: Homo sapiens
Labels
• Labels are used to group nodes into
sets
• A node may have multiple labels
• Labels are indexed to accelerate
finding nodes in the graph
INS
Protein
Hormone
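Labels can be combined in a pattern and changed after a node exists. The sketch below assumes an INS node with a name property of "Insulin" is already in the graph.

```cypher
// Add a second label to an existing node.
MATCH (p:Protein {name: "Insulin"})
SET p:Hormone
RETURN p.name, labels(p);

// Run as a second statement: find every node that carries both labels.
MATCH (p:Protein:Hormone)
RETURN p.name, labels(p);
```

The labels() function returns the list of labels on a node, which is a quick way to check how a node has been grouped.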
Task: Graph modeling
Model any biological scenario as a labeled
property graph
• Rules
1. Should have at least 15 nodes
2. Be creative
• Hint
• Glycolysis!
Database
Database
Collection of data,
stores data
Genomics – Genome sequencing
Microarray databases - GEO
Crystallography - PDB
Literature databases - PubMed
Protein-protein interaction data – STRING, BioGRID
Human biological pathway - KEGG
Chemical databases – ChEMBL, DrugBank
Database
• CRUD – Creating, Reading, Updating, Deleting
• Data security
• Transaction data – Secure data transfer without loss
• Backup data (unlike on paper)
• …
Relational databases: Tables
Non-relational databases: XML files, Graphs
Relational vs. Graph
Name Stud_id
Alice 111
Fatema 222
Sonya 333

Dept_id Dept_name
001 Chemistry
002 Physics
003 English

Stud_id Dept_id
111 003
222 001
333 002

In which department does Alice study?
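In the relational model this question needs a lookup through all three tables. As a sketch of the graph alternative, the same rows can be stored as nodes and relationships. The labels Student and Department and the relationship type STUDIES_IN are hypothetical names chosen here.

```cypher
// Hypothetical labels and relationship type; the values mirror the three tables.
// The join table becomes a relationship between a student and a department.
CREATE (alice:Student {name: "Alice", stud_id: 111}),
       (fatema:Student {name: "Fatema", stud_id: 222}),
       (sonya:Student {name: "Sonya", stud_id: 333}),
       (chem:Department {dept_id: "001", dept_name: "Chemistry"}),
       (phys:Department {dept_id: "002", dept_name: "Physics"}),
       (eng:Department {dept_id: "003", dept_name: "English"}),
       (alice)-[:STUDIES_IN]->(eng),
       (fatema)-[:STUDIES_IN]->(chem),
       (sonya)-[:STUDIES_IN]->(phys);

// Run as a second statement: one hop from Alice to her department.
MATCH (:Student {name: "Alice"})-[:STUDIES_IN]->(d:Department)
RETURN d.dept_name;
```

With the rows above the query returns English, found by following a single relationship instead of matching identifiers across tables.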
Graph database
• Online database management system
• CRUD - Create, Read, Update, Delete
• Data model – Storage for graphs
• Connections are first class citizens of a graph
database
What is Neo4j?
• Database, store information
• Data model – Graph or labeled property graph
• Cypher is its query language
• Easy modeling
• Active Community
Labeled property graph model
Who uses Neo4j?
Who uses Neo4j for research?
Google trends – Neo4j
Source: Google trends
Query language
The bridge between our data model or data and
a database
What is Cypher?
• Declarative query language for Neo4j
• Allows creating, querying and updating database
Cypher
Property graph Database
Cypher - properties
• Expressive and efficient
• Simple and powerful
• Human-friendly – Suitable for developers and non-developers alike
• Based on the English language
• Readable
Getting started
1. Download Neo4j
2. Install Neo4j
3. Run Neo4j
4. Change the password (don’t forget it!)
Start Neo4j
• Find the Neo4j application and double click it.
Explore the empty interface
Node info, Relation info, Property info
Load: The Movie Database
:play movie graph
Node info, Relation info, Property info
Movie database schema
A Triple
• A graph triple is composed of three elements: two
nodes and a relationship.
Cypher syntax - Node
()
(x)
(x:Person)
(x:Person {name:"Your name"})
(x:Person {name:"Your name", born:birthYear})
Node variable
Node label
Property – Key:Value
Variable
• A variable can take any value
• Similarly, a “Node variable” can be any Node from
the graph.
– E.g. “x” student in classroom
(node:Movie {title:"Alien", tagline:"In space…", released:1979})
Exercise: Cypher Node syntax
Cypher syntax - Properties
Property pattern - {Key:Value}
Property types comprise
• String values {name:"Kalpana Chawla"}
• Numerical Integer {born:1989}
• Numerical Float {height_ft:6.5}
• Boolean value {switch_on:true}
Exercise: Cypher Label syntax
(x:Person:Actor {name:"Jannis Niewohner", born:1992})
Cypher syntax - Relationship
(x:Person {name:"Abdul"})-[rel:STUDENT_IN {id: 560914}]->(y:University {uni_name: "BMSIT"})
Exercise: Cypher Relationship syntax
(x:Actor {name:"Charlize Theron", born:1975})-[rel:ACTED_IN]->
(y:Movie {title:"The Devil's Advocate",
tagline:"Evil has its own winning ways", released:1997})
Add a Node
• CREATE clause – To create a Node or a Relationship
• RETURN clause – Returns the requested values
CREATE (me:Person {name: "My Name"})
RETURN me
MATCH a node
• Fetch the last node you created with MATCH clause.
• MATCH clause – Specify the patterns to match in the
database
MATCH (me:Person {name:"My Name"})
RETURN me.name
The All Nodes Query
• If you want to retrieve all the nodes in the graph
Any
Node
MATCH (n) RETURN n
MATCH a node
• Fetch the last node “My Name” that you created
using WHERE.
MATCH (me:Person)
WHERE me.name="My Name"
RETURN me.name
Exercise: Create a Node
• Create a node with label “Movie” where title of the
movie is “Mystic River” and the release year is 1993.
Fetch the node.
CREATE (n:Movie {title:"Mystic River", released:1993})
RETURN n
Add a property
• Add the “We bury our sins here, Dave. We wash
them clean” tagline to the movie Mystic River.
• First fetch the movie Mystic river and then set its
“tagline” property.
MATCH (n:Movie)
WHERE n.title = "Mystic River"
SET n.tagline = "We bury our sins here, Dave. We wash them clean"
RETURN n.title, n.tagline
Exercise: Update a Node property
• The movie “Mystic river” was released in 2003, not
1993. Update this property.
• Hint: The patterns for adding a property and
updating it are the same.
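One possible answer, assuming the Mystic River node was created with the title and released properties used in the earlier exercise:

```cypher
// SET overwrites the existing value of `released`.
MATCH (n:Movie {title: "Mystic River"})
SET n.released = 2003
RETURN n.title, n.released
```

SET creates the property when it is missing and overwrites it when it exists, which is why adding and updating look identical.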
Schema free
Because Neo4j is schema-free,
you can add any property you
want to any node or
relationship.
Format for adding relationship
(n)-[:REL_TYPE {prop: value}]->(m)
Add a Relationship
• Now, find the actor Kevin Bacon and the
movie Mystic River and add the relationship
between the movie and the actor to the dataset.
MATCH (kevin:Person) WHERE kevin.name = "Kevin Bacon"
MATCH (mystic:Movie) WHERE mystic.title = "Mystic River"
CREATE (kevin)-[rel:ACTED_IN {roles:["Sean"]}]->(mystic)
RETURN mystic, rel, kevin
Exercise: Update a Relation property
• Change the role of Kevin Bacon in Mystic
River from ["Sean"] to ["Sean Devine"].
• Hint: The pattern for adding a property and
updating it are the same.
Answer
MATCH (kevin:Person) WHERE kevin.name = "Kevin Bacon"
MATCH (mystic:Movie) WHERE mystic.title = "Mystic River"
MATCH (kevin)-[r:ACTED_IN {roles:["Sean"]}]->(mystic)
SET r.roles = ["Sean Devine"]
RETURN kevin.name, r.roles, mystic.title
Exercise: Add a relationship
• Create a relationship between “Yourself” and the
movie “Mystic river” where you are the reviewer of
the movie.
MATCH (me:Person), (movie:Movie)
WHERE me.name="My Name" AND movie.title="Mystic River"
CREATE (me)-[r:REVIEWED {rating:80, summary:"tragic character movie"}]->(movie)
RETURN me, r, movie
Two nodes, One relationship
• Find all the nodes that have a relationship between
them.
MATCH (n)-[relationship]->(m)
RETURN n, relationship, m
Exercise
• Add Clint Eastwood as the director of Mystic River.
1. Add a “Clint Eastwood” node
2. Create a relationship between “Clint Eastwood”
and “Mystic River”
Answer
• Fetch the “Clint Eastwood” node and the “Mystic River”
node. Then create a relationship between them
where Clint Eastwood is the director of Mystic River.
MATCH (n:Person {name:"Clint Eastwood"})
MATCH (m:Movie {title:"Mystic River"})
CREATE (n)-[rel:DIRECTED]->(m)
RETURN n,rel,m
Delete a Node
• You already added yourself to the graph. Now you
want to delete yourself. Please run the following
query.
• Did you delete yourself? No: a node that still has
relationships cannot be deleted with DELETE alone.
MATCH (n:Person {name:"Your name"})
DELETE n
Delete a Node
• Deleting a node with relationship
MATCH (n:Person {name:"Your name"})
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
OPTIONAL MATCH also returns the node when it has no
relationships (r is then null), so the query works either way.
DETACH DELETE
• Deleting a node the easy way
MATCH (emil:Person {name:"Emil Eifrem"})
DETACH DELETE emil
Exercise
• RETURN a list of all the characters in the
movie ”The Matrix”.
Hint
1. Find out all the actors that played in the movie
“The Matrix”.
2. List their roles from the movie
Answer
MATCH (n:Person)-[r:ACTED_IN]->(m:Movie {title:"The Matrix"})
RETURN n.name, r.roles, m.title
ORDER BY
• Display the oldest people in our database.
• ORDER BY – Allows you to order the returned results
MATCH (person:Person)
RETURN person.name, person.born
ORDER BY person.born
LIMIT
• LIMIT - To limit the number of results returned.
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
RETURN actor.name, movie.title
LIMIT 10;
Exercise
• Return the five oldest people in the database
MATCH (person:Person)
RETURN person ORDER BY person.born
LIMIT 5;
Using DISTINCT
• Often you find yourself wanting to return only
distinct results for a query. For example, let’s look at
the list of the oldest actors. Initially, we might try the
following:
MATCH (actor:Person)-[:ACTED_IN]->()
RETURN actor ORDER BY actor.born
LIMIT 5
Using DISTINCT
• But if any of the five oldest actors were in more than
one movie, we’ll get them multiple times. So the
query we really want to run is:
• DISTINCT – Returns unique values by removing the
duplicate entries
MATCH (actor:Person)-[:ACTED_IN]->()
RETURN DISTINCT actor
ORDER BY actor.born
LIMIT 5
Using conditions!
• WHERE
• ><=
Filter comparisons
• Find all the actors that acted with Tom Hanks and are
older than him.
MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(actor:Person)
WHERE tom.name="Tom Hanks" AND actor.born < tom.born
RETURN actor.name AS Name
Filter comparisons
• Find all the actors that acted with Gene Hackman.
MATCH (gene:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(other:Person)
WHERE gene.name="Gene Hackman"
RETURN DISTINCT other
Filter comparisons
• Find all the actors that acted with Gene Hackman
who are also directors
MATCH (gene:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(other:Person)
WHERE gene.name="Gene Hackman" AND exists( (other)-[:DIRECTED]->() )
RETURN DISTINCT other
Filter comparisons
• Gene Hackman and not Robin Williams
• Find actors who worked with Gene Hackman, but not
when he was also working with Robin Williams in the
same movie.
MATCH (gene:Person {name:"Gene Hackman"})-[:ACTED_IN]->(movie:Movie),
      (other:Person)-[:ACTED_IN]->(movie),
      (robin:Person {name:"Robin Williams"})
WHERE NOT exists( (robin)-[:ACTED_IN]->(movie) )
RETURN DISTINCT other
Filter comparisons
• Find all the movies that Tom Hanks acted in which
were released after the year 2000.
MATCH (tom:Person)-[:ACTED_IN]->(movie)
WHERE tom.name="Tom Hanks" AND movie.released > 2000
RETURN movie.title AS `Movie Title`
Filter comparisons
• Find all movies that Keanu Reeves acted in.
MATCH (keanu:Person)-[r:ACTED_IN]->(movie)
WHERE keanu.name="Keanu Reeves"
RETURN movie.title
Filter comparisons
• Find all movies in which Keanu Reeves played the
role Neo.
MATCH (keanu:Person)-[r:ACTED_IN]->(movie)
WHERE keanu.name="Keanu Reeves" AND "Neo" IN r.roles
RETURN movie.title
Path
• Path: Series of connected nodes and
relationships.
Exercise: Path finding
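One way to approach path finding on the movie graph is the shortestPath function. The sketch below assumes both people exist in the loaded movie dataset; any two names from the graph can be substituted.

```cypher
// Bind the whole path to `p`; [*] allows any number of hops over any
// relationship type, in either direction.
MATCH p = shortestPath(
  (bacon:Person {name: "Kevin Bacon"})-[*]-(meg:Person {name: "Meg Ryan"})
)
RETURN p, length(p)
```

The result is a series of connected nodes and relationships, and length(p) gives the number of relationships in it.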
Collection
• Listing the movie titles that an actor participated in.
• For every Person who has acted in at least one
movie, the query will RETURN their name and an
array of strings containing the movie titles.
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN person.name, collect(movie.title);
Exercise: Collection
• Return the names of all the directors each actor has
worked with.
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person)
RETURN person.name, collect(director.name);
Count
• Return the count of movies that each actor has
worked in.
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN person.name, count(movie);
Exercise: Count
• Return the count of movies in which an actor and
director have jointly worked.
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person)
RETURN person.name, director.name, count(movie);
Top n
• If we were interested in the top ten actors who acted
in the most movies, the query would look like this.
1. Find the number of movies each actor played in. (COUNT)
2. Arrange the results in order (ORDER BY) (DESC)
3. Get only top 10 results (LIMIT)
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN person.name, count(movie) AS movies
ORDER BY movies DESC
LIMIT 10;
Exercise
• Who are the five busiest actors?
MATCH
Let’s play another game
• Hands-on
• Create a small biological graph in Neo4j (Maybe
the one that you designed earlier on the paper?)
• Rules
1. There must be at least 15 nodes
2. The scenario should be biological
• Be creative
The HetNet Awakens
Dr. Daniel Himmelstein
Uni Penn.
https://neo4j.het.io/browser/
HetioNet
• Hetionet is a network of biology, disease, and
pharmacology.
• Knowledge from millions of biomedical studies
over the last half century has been encoded into
a single hetnet.
• Version 1.0 contains 47,031 nodes of 11 types
and 2,250,197 relationships of 24 types.
HetioNet – Neo4j
Explore: HetioNet
CALL db.schema()
• Analyze the HetioNet by yourself
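A few generic queries can help with a first look at an unfamiliar graph such as HetioNet. This sketch uses only built-in procedures and functions and assumes nothing about the HetioNet schema. On Neo4j 4.x and later the schema procedure is, as far as I know, named db.schema.visualization() rather than db.schema().

```cypher
// List the node labels and relationship types present in the database.
CALL db.labels();
CALL db.relationshipTypes();

// Count nodes per label combination, largest first.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Count relationships per type, largest first.
MATCH ()-[r]->()
RETURN type(r) AS type, count(*) AS relationships
ORDER BY relationships DESC;
```

Run the statements one at a time. The counts can be compared with the node and relationship totals quoted for version 1.0.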
Explore: HetioNet
Neo4j Resources
• Books: https://neo4j.com/books/
• Graph tours: https://neo4j.com/graphtour/
• Best learning resources: https://neo4j.com/blog/top-13-resources-graph-theory-algorithms/
• Neo4j training: Neo4j certification, Udemy, PluralSight
Get connected with Neo4j
Get connected with me
(graphs)-[:ARE]->(everywhere)

Exploiting biomedical literature to mine out a large multimodal dataset of ra...Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Anjani Dhrangadhariya
 

More from Anjani Dhrangadhariya (6)

Weakly supervised PICO information extraction using Snorkel
Weakly supervised PICO information extraction using SnorkelWeakly supervised PICO information extraction using Snorkel
Weakly supervised PICO information extraction using Snorkel
 
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour...
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour...DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour...
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour...
 
Machine Learning Assisted Citation Screening for Systematic Reviews
Machine Learning Assisted Citation Screening for Systematic ReviewsMachine Learning Assisted Citation Screening for Systematic Reviews
Machine Learning Assisted Citation Screening for Systematic Reviews
 
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
 
Classification of prostate cancer pathology reports using natural language pr...
Classification of prostate cancer pathology reports using natural language pr...Classification of prostate cancer pathology reports using natural language pr...
Classification of prostate cancer pathology reports using natural language pr...
 
Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...
 

Recently uploaded

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 

Recently uploaded (20)

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 

Introduction to graph databases: Neo4j and Cypher

  • 1. A Graph platform for the Life Scientists Anjani K. Dhrangadhariya
  • 2. Realizing Neo4j • September 2015 – Start of the thesis • December 2015 – No implementation success! • 1 solution, 3 technologies, 3 failures!
  • 6. Changing your view Illustration by David Pohl Reductionist view Systems view
  • 7. Reductionism vs. Systems Reductionism: Dissection Systems: Integration Source: http://sysbiol.cnb.csic.es/SysBiol/sysbiol.html
  • 8. Reductionist view Experiment: Microarray Sample: Huntington’s disease Drug: Drug_X Action: Increased Protein: Protein_A, Protein_B Experiment: Yeast two hybrid Sample: Alzheimer’s disease Protein1: Protein_A Protein2: Mutated_Protein_X Action: Binding, deactivation Experiment: Mass spectrometry Sample: Alzheimer’s disease late stage Observation: Mutated_Protein_X Biomarker: Brain atrophy DB: DrugBank Microarray Huntington’s disease Drug_X Protein_A, Protein_B Action: Increased DB: BioGRID Yeast 2 hybrid screening Alzheimer’s disease Protein1: Protein_A Protein2: Mutated_Protein_X Effect: Binding, deactivation DB: MassBank Mass spectrometry Alzheimer’s disease late stage Protein: Mutated_Protein_X Phenotype: Brain atrophy Effect: Increase
  • 9. Brain Atrophy • Nerve cell death • Tissue loss throughout the brain
  • 10. Systems View DB: DrugBank Microarray Huntington’s disease Drug_X Protein_A, Protein_B Action: Increase DB: BioGRID Yeast 2 hybrid screening Alzheimer’s disease Protein1: Protein_A Protein2: Mutated_Protein_X Action: Binding, deactivation DB: MassBank Mass spectrometry Alzheimer’s disease late stage Protein: Protein_X Phenotype: Brain atrophy Effect: Increase Drug_X Protein_A Protein_B Protein_X Binds Brain atrophy Increases So if we use Drug_X to increase Protein_A, which in turn binds to Mutated_Protein_X and deactivates it, will it also reduce brain atrophy in Alzheimer’s disease?
  • 11. Database statistics • Number of interactions in BioGRID: 80,050 • Total drugs in DrugBank: 10,562 • Total drug-target interactions in DrugBank: 16,959 • Total spectra studies in MassBank: 41,092 Source: BioGRID, DrugBank, MassBank
  • 12. Difference? 1. Table vs. graph 2. Separate vs. connected 3. No relationship vs. relationship
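The reasoning on the Systems View slide can be sketched in a few lines of plain Python. This is only an illustration, not the presenter's implementation: the relationship names (`INCREASES`, `BINDS_AND_DEACTIVATES`) and the `find_path` helper are assumptions made for this sketch. Once the three separate records are merged into one edge list, a simple breadth-first search finds the chain from the drug to the phenotype.

```python
from collections import deque

# Each edge: (source node, relationship, target node, originating database).
# Relationship names are illustrative assumptions, not taken from the databases.
EDGES = [
    ("Drug_X", "INCREASES", "Protein_A", "DrugBank"),
    ("Drug_X", "INCREASES", "Protein_B", "DrugBank"),
    ("Protein_A", "BINDS_AND_DEACTIVATES", "Mutated_Protein_X", "BioGRID"),
    ("Mutated_Protein_X", "INCREASES", "Brain atrophy", "MassBank"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the edges on a shortest path, or None."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for edge in outgoing.get(node, []):
            target = edge[2]
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [edge]))
    return None


path = find_path(EDGES, "Drug_X", "Brain atrophy")
for source, relationship, target, database in path:
    print(f"{source} -[{relationship}]-> {target}  (from {database})")
```

No single source database contains this path; it only appears after the three edge sets are connected.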
  • 13. Why Graph? • Data explosion – Genomics, Proteomics, Metabolomics, Transcriptomics, Metagenomics, Lipidomics, Foodomics, Glycomics, … • Most of the biological data is stored in such tabular databases or flat files. • No connections between the already available biological data stored in tables. • No bigger picture!
  • 15. Why Graph? (contd.) • Connections are very important in Biology. • Biology does not happen in tables! • Graphs are the closest to the natural way of representing biological situations.
  • 16. A Graph? • Pie graph? • Bar graph? • Line graph?
  • 17. What is a Graph? Graph = A set/collection of Vertices and Edges. Mathematical notation: G = (V, E). Schematic representation of a graph: Vertex 1, Vertex 2, Vertex 3, Vertex 4
  • 18. Graph or Network Graph (mathematical representation) vs. Network (applied representation): Vertex = Node; Edge = Relationship/link. Schematic representation of a network: Node, Node, Node, Node
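As a minimal sketch of the notation G = (V, E), the snippet below stores a graph as a set of vertices and a set of undirected edges. The slide does not say which vertices are connected, so the four edges here are an assumption chosen for illustration, as is the `neighbours` helper.

```python
# Vertices of a small undirected graph.
V = {"Vertex 1", "Vertex 2", "Vertex 3", "Vertex 4"}

# Edges as unordered pairs of vertices (assumed connections, for illustration).
E = {
    frozenset({"Vertex 1", "Vertex 2"}),
    frozenset({"Vertex 1", "Vertex 3"}),
    frozenset({"Vertex 2", "Vertex 4"}),
    frozenset({"Vertex 3", "Vertex 4"}),
}

G = (V, E)


def neighbours(vertex, edges):
    """Return every vertex that shares an edge with the given vertex."""
    return {other for edge in edges if vertex in edge for other in edge if other != vertex}


print(sorted(neighbours("Vertex 1", E)))
```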
  • 20. Real world • The main elements in a graph are Nodes and these elements are connected by relationships as they are connected in the real world. • Model real world scenario into a graph
  • 22. Graphs are Everywhere General examples • Community/ Social • Internet • Terrorist network • World Wide Web • Air-route connection Examples in Biology • Protein-protein Interaction • Gene-regulatory network • Metabolic network • Neuronal network • Drug mechanism of action network
  • 23. Internet graph The network structure of the Internet by Hal Burch and Bill Cheswick. Copyright Lumeta Corporation 2009. Source URL: http://www.mathaware.org/mam/04/6_Internet_structure.html https://www.lucidchart.com/blog/network-diagram-templates Simple internet graph: A network of connections (wired/wireless) between devices like laptop, phone, printer, etc.
  • 24. Social graph Facebook friendships between people across the globe. A visualization created by Paul, an engineering intern at Facebook, using R programming language. Source: https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  • 25. Twitter graph Generated using the yEd graph editor (yWorks)
  • 27. Terrorist Network Saddam Hussein Network (2003) The Universe C. Wilson. Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. Source: https://www.slideshare.net/mathieu-bastian/visualize-big-graph-data
  • 28. Airport connection This infographic is a map of the 3,275 global airports and all of the connecting flight routes. Designed by Martin Grandjean, each bubble represents an individual airport and the bubble size represents the number of flight routes (37,153 routes in total) based on OpenFlights.org data. Source: http://coolinfographics.com/blog/tag/network
  • 29. Life Sciences • Graphs are the closest to the natural way of representing biological situations. • Biology, biochemistry, pharmaceuticals or even healthcare, anatomy, neurosciences
  • 30. Protein-Protein Interaction Image source: https://www.nature.com/articles/nrg1272 A protein-protein interaction network for yeast. A network of interactions between proteins in the single-celled organism Saccharomyces cerevisiae (baker's yeast), as determined using, primarily, two-hybrid screen experiments. From Jeong et al. Copyright Macmillan Publishers Ltd.
  • 31. Protein-protein Interaction Fig: Schematic Protein interaction Fig: Protein interaction network of insulin in human Source: STRING Database In a protein interaction network, nodes denote protein molecules and the links denote physical interactions between proteins.
  • 32. Metabolic network Source: Kyoto Encyclopedia of Genes and Genomes URL: http://www.genome.jp/kegg-bin/show_pathway?map01100 Global metabolic pathway Entry: map01100
  • 33. Neuronal network White matter tracts within a human brain, as visualized by MRI tractography. In a neuronal network, the neurons are the nodes and the synapses are the links between them. These networks are usually studied using Graph theory and machine learning. Source: https://en.wikipedia.org/wiki/Connectome
  • 36. Labeled Property Graph model Node ANN: label Person; properties name: Ann Mueller, DOB: May, 1980, Twitter: @ann. Node DAN: label Person; properties name: Dan Mueller, DOB: Dec, 1981, Twitter: @dan. Relationships: IS_MARRIED_TO, LIVES_WITH
  • 37. Another familiar example Node ANJANI: labels Person, Student; properties name: Anjani KD, DOB: May, 1980, Twitter: @anjani. Node CHRIST COLLEGE: label Place; properties name: Christ College, established: 2002, Twitter: @christClg. Relationship: STUDIED_AT
  • 38. Insulin – INSR Interaction Node INS: labels Protein, Hormone; properties name: Insulin, ID: ENSP00000250971, alias: IDDM, IDDM1, org: Homo sapiens. Node INSR: label Protein; properties name: Insulin receptor, ID: ENSP00000303830, alias: CD220, HHF5, org: Homo sapiens. Relationship LINKED_TO: properties link_Evi: 0.866, coMention_scr: 0.900, binding_scr: 0.900, activation_scr: 0.900, total_scr: 0.999 Source: STRING database: © STRING CONSORTIUM 2017 URL: https://string-db.org/cgi/network.pl?taskId=XRAdWcZivE1K
  • 39. Node • Nodes are the main data elements • Nodes are connected to other nodes via relationships • Nodes can have one or more properties (i.e., attributes stored as key/value pairs) • Nodes have one or more labels that describe their role in the graph Example: node ANJANI with labels Person, Student and properties name: Anjani KD, DOB: May, 1980, Twitter: @anjani
  • 40. Relationships • Relationships connect two nodes • Relationships are directional • Nodes can have multiple, even recursive relationships • Relationships can have one or more properties (i.e., attributes stored as key/value pairs)
  • 41. Properties • Properties are named values where the name (or key) is a string. – Key:Value • Properties are used to describe nodes and relationships. Example: node INS with name: Insulin, ID: ENSP00000250971, alias: IDDM, IDDM1, org: Homo sapiens
  • 42. Labels • Labels are used to group nodes into sets • A node may have multiple labels • Labels are indexed to accelerate finding nodes in the graph INS Protein Hormone
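The four building blocks described above (nodes, relationships, properties, labels) can be mirrored in a small plain-Python sketch. This is not how Neo4j stores data; the `Node` and `Relationship` classes and the `with_label` helper are hypothetical names introduced only to make the model concrete, using the Insulin – INSR example from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # e.g. {"Protein", "Hormone"}
    properties: dict = field(default_factory=dict)  # key/value pairs


@dataclass
class Relationship:
    start: Node                                     # relationships are directional
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


ins = Node({"Protein", "Hormone"},
           {"name": "Insulin", "ID": "ENSP00000250971", "org": "Homo sapiens"})
insr = Node({"Protein"},
            {"name": "Insulin receptor", "ID": "ENSP00000303830", "org": "Homo sapiens"})
linked = Relationship(ins, "LINKED_TO", insr, {"total_scr": 0.999})


def with_label(nodes, label):
    """Labels group nodes into sets: return the nodes carrying the given label."""
    return [node for node in nodes if label in node.labels]


proteins = with_label([ins, insr], "Protein")
hormones = with_label([ins, insr], "Hormone")
print([node.properties["name"] for node in proteins])
print([node.properties["name"] for node in hormones])
```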
  • 43. Task: Graph modeling Model any biological scenario into a labeled property graph • Rules 1. Should have at least 15 nodes 2. Be creative • Hint • Glycolysis!
  • 45. Database Collection of data, stores data Genomics – Genome sequencing Microarray databases - GEO Crystallography - PDB Literature databases - PubMed Protein-protein interaction data – STRING, BioGRID Human biological pathway - KEGG Chemical databases – ChEMBL, DrugBank
  • 46. Database • CRUD – Creating, Reading, Updating, Deleting • Data security • Transaction data – Secure data transfer without loss • Backup data (unlike on paper) • … Relational databases: Tables. Non-relational databases: XML files, Graphs
  • 47. Relational vs. Graph Table 1 (Name, Stud_id): Alice 111; Fatema 222; Sonya 333. Table 2 (Dept_id, Dept_name): 001 Chemistry; 002 Physics; 003 English. Table 3 (Stud_id, Dept_id): 111 003; 222 001; 333 002. In which department does Alice study?
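To make the contrast concrete, here is a rough Python sketch that answers the slide's question both ways. In the tabular form the answer needs two lookups across three tables; in the graph form the relationship is stored directly on the node. The function names and the `STUDIES_IN` relationship type are assumptions made for this illustration.

```python
# Tabular form: three separate tables, linked only through ID columns.
students = [("Alice", "111"), ("Fatema", "222"), ("Sonya", "333")]
departments = [("001", "Chemistry"), ("002", "Physics"), ("003", "English")]
enrolment = [("111", "003"), ("222", "001"), ("333", "002")]


def department_via_tables(name):
    """Follow the IDs through all three tables (the equivalent of two joins)."""
    stud_id = next(sid for n, sid in students if n == name)
    dept_id = next(did for sid, did in enrolment if sid == stud_id)
    return next(dname for did, dname in departments if did == dept_id)


# Graph form: each student node points straight at its department node.
graph = {
    "Alice": {"STUDIES_IN": "English"},
    "Fatema": {"STUDIES_IN": "Chemistry"},
    "Sonya": {"STUDIES_IN": "Physics"},
}


def department_via_graph(name):
    """Follow a single relationship from the student node."""
    return graph[name]["STUDIES_IN"]


print(department_via_tables("Alice"), department_via_graph("Alice"))
```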
  • 48. Graph database • Online database management system • CRUD - Create, Read, Update, Delete • Data model – Storage for graphs • Connections are first class citizens of a graph database
  • 49. What is Neo4j? • Database, store information • Data model – Graph or labeled property graph • Cypher is its query language • Easy modeling • Active Community Labeled Property graph model
  • 51. Who uses Neo4j for research?
  • 52. Google trends – Neo4j Source: Google trends
  • 53. Query language The bridge between our data model or data and a database
  • 54. What is Cypher? • Declarative query language for Neo4j • Allows creating, querying and updating database Cypher Property graph Database
  • 55. Cypher - properties • Expressive and efficient • Simple and powerful • Human-friendly – Suitable for developers and non-developers alike • Based on English language • Readable
  • 56. Getting started 1. Download Neo4j 2. Install Neo4j 3. Run Neo4j 4. Password change (Don’t forget it!)
  • 57. Start Neo4j • Find the Neo4j application and double click it.
  • 58. Explore the empty interface Node info Relation info Property info
  • 59. Load: The Movie Database :play movie graph
  • 64. A Triple • A graph triple is composed of three elements: two nodes and a relationship.
  • 65. Cypher syntax - Node () (x) (x:Person) (x:Person {name:"Your name"}) (x:Person {name:"Your name", born:birthYear}) Node Variable Node Label Property – Key:Value
  • 66. Variable • A variable can take any value • Similarly, a “Node variable” can be any Node from the graph. – E.g. “x” student in classroom
  • 67. (node:Movie {name:"Alien", tagline:"In space…", released:1979}) Exercise: Cypher Node syntax
  • 68. Cypher syntax - Properties Property pattern - {Key:Value} Property types comprise • String values {name:"Kalpana Chawla"} • Numerical Integer {born:1989} • Numerical Float {height_ft:6.5} • Boolean value {switch_on:true}
  • 69. Exercise: Cypher Label syntax (x:Person:Actor {name:"Jannis Niewohner", born:1992})
  • 70. Cypher syntax - Relationship (x:Person {name:"Abdul"})-[rel:STUDENT_IN {id: 560914}]->(y:University {uni_name: "BMSIT"})
  • 71. Exercise: Cypher Relationship syntax (x:Actor {name:"Charlize Theron", born:1977})-[rel:ACTED_IN]->(y:Movie {title:"The Devil's Advocate", tagline:"Evil has its own winning ways", released:1997})
  • 73. Add a Node • CREATE clause – To create a Node or a Relationship • RETURN clause – Returns the requested values CREATE (me:Person {name: "My Name"}) RETURN me
  • 74. MATCH a node • Fetch the last node you created with MATCH clause. • MATCH clause – Specify the patterns to match in the database MATCH (me:Person {name:"My Name"}) RETURN me.name
  • 75. The All Nodes Query • If you want to retrieve all the nodes in the graph Any Node MATCH (n) RETURN n
  • 76. MATCH a node • Fetch the last node “My Name” that you created using WHERE. MATCH (me:Person) WHERE me.name="My Name" RETURN me.name
  • 77. Exercise: Create a Node • Create a node with label “Movie” where title of the movie is “Mystic River” and the release year is 1993. Fetch the node. CREATE (n:Movie {title:"Mystic River", released:1993}) RETURN n
  • 78. Add a property • Add the “We bury our sins here, Dave. We wash them clean” tagline to the movie Mystic River. • First fetch the movie Mystic River and then set its “tagline” property. MATCH (n:Movie) WHERE n.title = "Mystic River" SET n.tagline = "We bury our sins here, Dave. We wash them clean" RETURN n.title, n.tagline
  • 79. Exercise: Update a Node property • The movie “Mystic river” was released in 2003, not 1993. Update this property. • Hint: The pattern for adding a property and updating it are the same.
  • 80. Schema free Because Neo4j is schema-free, you can add any property you want to any node or relationship.
  • 81. Format for adding relationship (n)-[:REL_TYPE {prop: value}]->(m)
  • 82. Add a Relationship • Now, find the actor Kevin Bacon and the movie Mystic River and add the relationship between the movie and the actor to the dataset. MATCH (kevin:Person) WHERE kevin.name = "Kevin Bacon" MATCH (mystic:Movie) WHERE mystic.title = "Mystic River" CREATE (kevin)-[rel:ACTED_IN {roles:["Sean"]}]->(mystic) RETURN mystic, rel, kevin
  • 83. Exercise: Update a Relation property • Change the role of Kevin Bacon in Mystic River from ["Sean"] to ["Sean Devine"]. • Hint: The pattern for adding a property and updating it are the same.
  • 84. Answer MATCH (kevin:Person) WHERE kevin.name = "Kevin Bacon" MATCH (mystic:Movie) WHERE mystic.title = "Mystic River" MATCH (kevin)-[r:ACTED_IN {roles:["Sean"]}]->(mystic) SET r.roles = ["Sean Devine"] RETURN kevin.name, r.roles, mystic.title
  • 85. Exercise: Add a relationship • Create a relationship between “Yourself” and the movie “Mystic River” where you are the reviewer of the movie. MATCH (me:Person), (movie:Movie) WHERE me.name="My Name" AND movie.title="Mystic River" CREATE (me)-[r:REVIEWED {rating:80, summary:"tragic character movie"}]->(movie) RETURN me, r, movie
  • 86. Two nodes, One relationship • Find all the nodes that have a relationship between them. MATCH (n)-[relationship]->(m) RETURN n, relationship, m
  • 87. Exercise • Add Clint Eastwood as the director of Mystic River. 1. Add a “Clint Eastwood” node 2. Create a relationship between “Clint Eastwood” and “Mystic River”
  • 88. Answer • Fetch the “Clint Eastwood” node and the “Mystic River” node. Then create a relationship between them where Clint Eastwood is the director of Mystic River. MATCH (n:Person {name:"Clint Eastwood"}) MATCH (m:Movie {title:"Mystic River"}) CREATE (n)-[rel:DIRECTED]->(m) RETURN n,rel,m
  • 89. Delete a Node • You already added yourself to the graph. Now you want to delete yourself. Please run the following query. • Did you delete yourself? No. MATCH (n:Person {name:"Your name"}) DELETE n
  • 91. Delete a Node • Deleting a node with relationship MATCH (n:Person {name:"Your name"}) OPTIONAL MATCH (n)-[r]-() DELETE n,r OPTIONAL MATCH also matches the node's relationships if there are any, so that they can be deleted together with the node.
  • 92. DETACH DELETE • Deleting a node the easy way MATCH (emil:Person {name:"Emil Eifrem"}) DETACH DELETE emil
  • 93. Exercise • RETURN a list of all the characters in the movie “The Matrix”. Hint 1. Find out all the actors that played in the movie “The Matrix”. 2. List their roles from the movie
  • 95. ORDER BY • Display the oldest people in our database. • ORDER BY – Allows you to order the returned results MATCH (person:Person) RETURN person.name, person.born ORDER BY person.born
  • 96. LIMIT • LIMIT - To limit the number of results returned. MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie) RETURN actor.name, movie.title LIMIT 10;
  • 97. Exercise • Return the five oldest people in the database MATCH (person:Person) RETURN person ORDER BY person.born LIMIT 5;
  • 98. Using DISTINCT • Often you find yourself wanting to return only distinct results for a query. For example, let’s look at the list of the oldest actors. Initially, we might try the following: • MATCH (actor:Person)-[:ACTED_IN]->() RETURN actor ORDER BY actor.born LIMIT 5
  • 99. Using DISTINCT • But if any of the five oldest actors were in more than one movie, we’ll get them multiple times. So the query we really want to run is: • DISTINCT – Returns unique values by removing the duplicate entries MATCH (actor:Person)-[:ACTED_IN]->() RETURN DISTINCT actor ORDER BY actor.born LIMIT 5
  • 101. Filter comparisons • Find all the actors that acted with Tom Hanks and are older than him. MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(actor:Person) WHERE tom.name="Tom Hanks" AND actor.born < tom.born RETURN actor.name AS Name
  • 103. Filter comparisons • Find all the actors that acted with Gene Hackman. MATCH (gene:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(other:Person) WHERE gene.name="Gene Hackman" RETURN DISTINCT other
  • 104. Filter comparisons • Find all the actors that acted with Gene Hackman who are also directors MATCH (gene:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(other:Person) WHERE gene.name="Gene Hackman" AND exists( (other)-[:DIRECTED]->() ) RETURN DISTINCT other
  • 105. Filter comparisons • Gene Hackman and not Robin Williams • Find actors who worked with Gene Hackman, but not when he was also working with Robin Williams in the same movie. MATCH (gene:Person {name:"Gene Hackman"})-[:ACTED_IN]->(movie:Movie), (other:Person)-[:ACTED_IN]->(movie), (robin:Person {name:"Robin Williams"}) WHERE NOT exists( (robin)-[:ACTED_IN]->(movie) ) RETURN DISTINCT other
  • 106. Filter comparisons • Find all the movies that Tom Hanks acted in which were released after the year 2000. MATCH (tom:Person)-[:ACTED_IN]->(movie) WHERE tom.name="Tom Hanks" AND movie.released > 2000 RETURN movie.title AS `Movie Title`
  • 107. Filter comparisons • Find all movies that Keanu Reeves acted in. MATCH (keanu:Person)-[r:ACTED_IN]->(movie) WHERE keanu.name="Keanu Reeves" RETURN movie.title
  • 108. Filter comparisons • Find all movies in which Keanu Reeves played the role Neo. MATCH (keanu:Person)-[r:ACTED_IN]->(movie) WHERE keanu.name="Keanu Reeves" AND "Neo" IN r.roles RETURN movie.title
  • 109. Path • Path: Series of connected nodes and relationships.
  • 111. Collection • Listing the movie titles that an actor participated in. • For every Person who has acted in at least one movie, the query will RETURN their name and an array of strings containing the movie titles. MATCH (person:Person)-[:ACTED_IN]->(movie:Movie) RETURN person.name, collect(movie.title);
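For readers new to aggregation, the grouping that `collect()` performs can be imitated in plain Python. This is only an analogy under stated assumptions: the `acted_in` pairs are a tiny hand-written sample, and `collect_titles` is a hypothetical helper, not a Neo4j function.

```python
from collections import defaultdict

# Sample (person, movie title) pairs standing in for ACTED_IN relationships.
acted_in = [
    ("Tom Hanks", "Cast Away"),
    ("Keanu Reeves", "The Matrix"),
    ("Tom Hanks", "Apollo 13"),
]


def collect_titles(pairs):
    """Group the titles by person, one list of titles per person."""
    grouped = defaultdict(list)
    for person, title in pairs:
        grouped[person].append(title)
    return dict(grouped)


for person, titles in collect_titles(acted_in).items():
    print(person, titles)
```

Each person appears once in the result, with all of their titles gathered into a single list, which is the shape of the rows returned by the query on the slide.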
  • 112. Exercise: Collection • Return the names of all the directors each actor has worked with. MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person) RETURN person.name, collect(director.name);
  • 113. Count • Return the count of movies that each actor has worked in. MATCH (person:Person)-[:ACTED_IN]->(movie:Movie) RETURN person.name, count(movie);
  • 114. Exercise: Count • Return the count of movies in which an actor and director have jointly worked. MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person) RETURN person.name, director.name, count(movie);
  • 115. Top n • If we were interested in the top ten actors who acted in the most movies, the query would look like this. 1. Find the number of movies each actor played in. (count) 2. Arrange the results in descending order (ORDER BY ... DESC) 3. Get only the top 10 results (LIMIT) MATCH (person:Person)-[:ACTED_IN]->(movie:Movie) RETURN person.name, count(movie) AS movies ORDER BY movies DESC LIMIT 10;
  • 116. Exercise • Who are the five busiest actors? MATCH
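One possible answer to the exercise above, assuming "busiest" means "acted in the most movies" and assuming the same movie graph as before. The aliases `actor` and `movies` are illustrative names only.

```cypher
// Count movies per actor, sort descending, keep the first five rows
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN person.name AS actor, count(movie) AS movies
ORDER BY movies DESC
LIMIT 5;
```

Ties at the fifth position are cut off arbitrarily by LIMIT; adding a second sort key such as `actor` would make the result deterministic.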
  • 117. Let's play another game • Hands-on • Create a small biological graph in Neo4j (maybe the one that you designed earlier on paper?) • Rules 1. There must be at least 15 nodes 2. The scenario should be biological • Be creative
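A minimal starting point for the hands-on task might look like the sketch below. It reuses the Drug_X / Protein_A / Mutated_Protein_X example from the systems-view slides. The labels (`Drug`, `Protein`, `Phenotype`), the relationship types (`INCREASES`, `BINDS`), and the `effect` property are hypothetical choices, not a prescribed schema, and the graph is deliberately smaller than the 15 nodes the rules ask for.

```cypher
// Create the nodes and relationships in a single statement.
// All labels and relationship types here are illustrative assumptions.
CREATE (drug:Drug {name: "Drug_X"}),
       (pa:Protein {name: "Protein_A"}),
       (pb:Protein {name: "Protein_B"}),
       (mx:Protein {name: "Mutated_Protein_X"}),
       (atrophy:Phenotype {name: "Brain atrophy"}),
       (drug)-[:INCREASES]->(pa),
       (drug)-[:INCREASES]->(pb),
       (pa)-[:BINDS {effect: "deactivation"}]->(mx),
       (mx)-[:INCREASES]->(atrophy);

// Run separately: find paths of up to three hops from the drug to the phenotype
MATCH p = (:Drug {name: "Drug_X"})-[*1..3]->(:Phenotype {name: "Brain atrophy"})
RETURN p;
```

Unless multi-statement mode is enabled in the Neo4j Browser, the two statements should be run one at a time.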
  • 118. The HetNet Awakens Dr. Daniel Himmelstein Uni Penn. https://neo4j.het.io/browser/
  • 119. HetioNet • Hetionet is a network of biology, disease, and pharmacology. • Knowledge from millions of biomedical studies over the last half century has been encoded into a single hetnet. • Version 1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types.
  • 122. Explore: HetioNet • Analyze the HetioNet by yourself
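As a possible first step for exploring the network, the schema-agnostic queries below summarise a graph without assuming any particular label or relationship name. They are a sketch to run in the Hetionet browser linked on slide 118; no expected output is given here.

```cypher
// Count nodes per label (slide 119 reports 11 node types)
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Run separately: count relationships per type (slide 119 reports 24 types)
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(*) AS total
ORDER BY total DESC;
```

Both are read-only queries. On a graph of this size the second one may take a moment, since it touches every relationship.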
  • 123. Neo4j Resources • Books: https://neo4j.com/books/ • Graph tours: https://neo4j.com/graphtour/ • Best learning resources: https://neo4j.com/blog/top-13-resources-graph-theory-algorithms/ • Neo4j training: Neo4j certification, Udemy, PluralSight

Editor's Notes

  1. The reductionist approach is based on dividing a biological system into its constituent parts: from studying whole organisms to dividing them into organ systems, then organs, then tissues and biological assemblies, and finally biomolecules such as DNA, RNA, proteins, and metabolites. Researchers started studying individual biomolecular phenomena: protein-protein interactions, drug-target interactions, protein profiling, transcript/RNA profiling, etc. But that is where it stops. How does the system play out as a whole? How does one pathway connect to another to produce a function of the cell? How do different disease pathways, such as those in cancer, interact in a way that kills an entire organism?
  2. The movie database has two types of nodes: nodes with the Person label and nodes with the Movie label. Each node has its own properties. A Person node has two properties: the person's name and his/her birth date. A Movie node has three properties: the title, the tagline, and the release date of the movie. Person nodes are connected to Movie nodes by relationships. There are 6 kinds of relationships in the graph: a person acted in, directed, produced, wrote, or reviewed a movie, and a person follows another person. Some of the relationships have their own properties, which include the role a person played in the movie, the summary of the movie, and the movie rating. This graph was already made by someone else, but you can build it yourself too. For that, you need to know how to work with the Cypher Query Language.
  3. This query tries to MATCH a node “n” with the label “Person” whose n.name property equals your name; after fetching this specific node, it returns it to the user.
  4. The query is doing a full graph search. It visits every single node to see whether it matches the pattern of (n). In this concrete case, the pattern is simply a node that may or may not have a label or relationships, so it will match every single node in the graph. The RETURN clause then returns all of the information about each of those nodes, including all of their properties.
  5. This query tries to MATCH a node “n” with the label “Person” whose n.name property equals your name; after fetching this specific node, it returns it to the user.
  6. Let’s say we wanted to add a tagline to the Mystic River :Movie node we’ve just added. First, we have to locate the single movie again by its title, then SET the tagline property.
  7. Let’s say we wanted to return all of the nodes that have relationships to another node. This is still going to return every single node that has a relationship to another node, along with the other node. But it’s moving us in an important direction, so stay with us for a little longer.
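The queries described in notes 4, 6, and 7 might look roughly like the sketch below. It assumes the movie graph from the slides and that a Movie node titled "Mystic River" has already been created; the tagline string is a placeholder rather than the film's actual tagline.

```cypher
// Note 4: full graph scan, returns every node with all of its properties
MATCH (n)
RETURN n;

// Note 6: locate the movie by its title, then SET the tagline property
MATCH (movie:Movie {title: "Mystic River"})
SET movie.tagline = "Placeholder tagline"
RETURN movie.title, movie.tagline;

// Note 7: every node that has a relationship to another node, with that other node
MATCH (a)--(b)
RETURN a, b;
```

Because the pattern in the last query has no direction, each connected pair is returned twice, once from each side. Each statement is meant to be run on its own.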