Neo4j for Discovering Drugs and Biomarkers

— CONFIDENTIAL—
MICROBIOME TO MEDICINE™
Helios2 (Neo4j) for Discovering Drugs and Biomarkers
Satish Viswanatham, Head of Data Engineering
Brendan, Cesar, Divya, Jin and Richard

Outline of the talk
• Technical Terms will be explained briefly as they are encountered
• Links provided
• Why Microbiome?
• Challenges in Microbiome data
• High Level Architecture
• Implementation Highlights
• Future Work
• Lastly, more examples from the industry.

The microbiome is a rich source of biomarkers and potent bacterial peptides
100 TRILLION BACTERIA! >25,000,000 genes*
• Untapped library of novel drugs
• Rich data source of host:microbial interactions
• New “organ” to re(de)fine patients and medical practice
BIOLOGICAL FUNCTIONS INFLUENCED:
• GI health
• Immune function
• Metabolism
• Glucose/lipids
• Pathogens
• Cancer
*PMID:31415755. Compare to 25,000 human genes

The sg-4sight Platform Summary
We built sg-4sight to
• Collect clinical microbiome data
• Conduct multi-technology (16S, MTT/MTG) meta-analysis (differential abundance)
• Find bacterial biomarkers (Gene, Strain, Peptide, ...)
• Select bacterial polypeptide therapeutic candidates in a data-driven manner
• Efficiently prepare and screen them through in vitro and in vivo models of disease
• Find the human targets through which they exert their therapeutic effect
sg-4sight platform
Federated Data Engine - SGKnowledgeBase (Helios/Neo4J, Buho/Athena, …)

sg-4sight’s approach to drug discovery

Our sg-4sight tech-powered drug discovery engine is built to disrupt drug discovery
*The Data Engine evolved continually as new technology was added, so each program was analyzed with the Data Engine as it stood at that time.
SG KnowledgeBase™: a proprietary database that organizes -omics data and clinical metadata for systematic mining. AWS: Amazon Web Services. MS: Mass Spectrometry. sg-4sight is the proposed platform name, along with multiple variations submitted for trademark approval.

Why Neo4j
• Flexible schema (NoSQL)
• Graph queries
• Cypher query language is easy to learn: gentle learning curve
• Query performance better than SQL on relationship-heavy queries
  ○ Up to 1000 times faster
• Community Edition
• Neo4j was already in use on another experimental project
• Great community!
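
To make the graph-query and Cypher points concrete, here is a minimal sketch of a parameterized Cypher traversal issued from Python. Strain and Peptide are node labels used in Helios, but the Disease label, the ENCODES and ASSOCIATED_WITH relationship types, and the property names are assumptions made for illustration, not the actual Helios schema. The function only expects a session object with a run() method, which the official Neo4j Python driver provides.

```python
# Hypothetical schema: the Disease label, relationship types and property
# names below are illustrative assumptions, not the real Helios model.
PEPTIDES_FOR_DISEASE = """
MATCH (d:Disease {name: $disease})<-[:ASSOCIATED_WITH]-(s:Strain)-[:ENCODES]->(p:Peptide)
RETURN s.name AS strain, p.sequence AS peptide
LIMIT $limit
"""


def peptides_for_disease(session, disease, limit=25):
    """Run the two-hop traversal and return each row as a plain dict.

    `session` is any object exposing run(query, **params), such as a
    session from the official Neo4j Python driver.
    """
    result = session.run(PEPTIDES_FOR_DISEASE, disease=disease, limit=limit)
    return [record.data() for record in result]
```

In SQL the same question would typically need joins across strain, peptide and disease tables; in Cypher the path is written much as it would be drawn on a whiteboard.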

Data from multiple clinical sources are compiled in the SGKnowledgeBase for powerful cross-cohort discovery
[Diagram: data flow into the SGKnowledgeBase]
• Sources: Second Genome proprietary datasets; public datasets
• Processing: metadata standardization, data quality control/sanity checks, custom data loaders
• Odessa (Django): constraint/sanity checks, vocab/onto, data loading
• Destination: Second Genome KnowledgeBase & Helios2
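
The metadata standardization and sanity-check steps in this diagram can be pictured with a small sketch. It is not the Odessa code: the vocabulary, the sample_type field and the function names are all hypothetical, chosen only to show how free-text metadata might be mapped onto controlled terms before loading, with unrecognized values set aside for review.

```python
# Hypothetical controlled vocabulary: canonical term -> accepted synonyms.
VOCAB = {
    "stool": {"stool", "feces", "faeces", "fecal sample"},
    "biopsy": {"biopsy", "tissue biopsy"},
}


def standardize(value, vocab=VOCAB):
    """Return the canonical term for `value`, or None if it is unknown."""
    key = value.strip().lower()
    for canonical, synonyms in vocab.items():
        if key in synonyms:
            return canonical
    return None


def check_records(records, field_name="sample_type"):
    """Split records into (clean, rejected).

    Clean records carry the canonical term; rejected records are returned
    unchanged so they can be reviewed rather than silently loaded.
    """
    clean, rejected = [], []
    for rec in records:
        term = standardize(rec.get(field_name, ""))
        if term is None:
            rejected.append(rec)
        else:
            clean.append({**rec, field_name: term})
    return clean, rejected
```

Running the check before the loaders touch the graph keeps cohorts from different sources comparable, which is what cross-cohort discovery depends on.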

Helios Nodes and Relationships

Node Label                 Node Count   Average Number of Relationships
Dataset                    Any          Millions
Phage_display              Any          Millions
Meta_analysis              Any          High Thousands
Meso_scale_discovery       Any          Thousands
...                        Any          Hundreds
Bin                        Thousands    Any
NCBI_assembly_accession    Thousands    Any
Strain                     Thousands    Any
Peptide                    Millions     Any

Our schema is centered around peptides:
● With every experiment we add to the knowledge around that protein.
● Every mtma, lab assay, and phage display adds more information on how the peptide appears in a set of published studies, an immune assay, or a binding assay.
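
One way to picture how each experiment adds knowledge around the same peptide is a Cypher MERGE built from Python. This is a sketch under assumptions: the experiment labels come from the node table above, but the OBSERVED_IN relationship type and the sequence and id property keys are hypothetical, and the function only builds the statement and its parameters without contacting a database.

```python
# Experiment labels taken from the Helios node table; the OBSERVED_IN
# relationship type and the `sequence`/`id` property keys are assumptions.
EXPERIMENT_LABELS = {"Dataset", "Phage_display", "Meta_analysis", "Meso_scale_discovery"}


def link_peptide_statement(label, sequence, experiment_id, properties):
    """Build a Cypher statement that attaches one experiment to a peptide.

    MERGE reuses the Peptide node if it already exists, so evidence from
    each new experiment accumulates around the same node.
    """
    # Labels cannot be passed as Cypher parameters, so validate before formatting.
    if label not in EXPERIMENT_LABELS:
        raise ValueError(f"unknown experiment label: {label}")
    query = (
        "MERGE (p:Peptide {sequence: $sequence}) "
        f"MERGE (e:{label} {{id: $experiment_id}}) "
        "MERGE (p)-[r:OBSERVED_IN]->(e) "
        "SET r += $properties"
    )
    params = {
        "sequence": sequence,
        "experiment_id": experiment_id,
        "properties": properties,
    }
    return query, params
```

Because the peptide is matched on its sequence, a phage display run and a meta-analysis that mention the same peptide end up as two relationships on one node rather than two disconnected records.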

Helios is the largest known database of interactions
● Connects high-throughput past observations to accelerate future discovery
○ Between microbial peptides and host cells
○ Between microbial taxa and disease states
○ Between microbial functional genes and disease states
● Enables discovery of common peptide features which predict a desired functionality.

What did we build in Helios2?
• DevSecOps - Data confidentiality controls
• Partial updates
• Constraint system
• Two-phase commits
• Automatic backups
  ○ Daily, weekly and monthly, to a remote region
• Security/SSL, logs to Fluentd
• Domain name & AWS Security Group via CloudFormation
• DevOps - Alerts
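
The slide does not describe how Helios2 implements its constraint system, partial updates or two-phase commits. As a generic illustration of how the three can fit together, the sketch below uses in-memory stand-ins: each participant checks a simple constraint and stages the batch in phase 1, and the batch becomes visible in phase 2 only if every participant voted yes. The Participant class, the required_keys constraint and the function names are hypothetical.

```python
class Participant:
    """In-memory stand-in for one store that takes part in a load."""

    def __init__(self, name, required_keys):
        self.name = name
        self.required_keys = set(required_keys)  # a very simple constraint
        self.committed = {}   # node id -> properties visible to readers
        self._staged = None   # batch accepted in phase 1, not yet visible

    def prepare(self, batch):
        """Phase 1: check constraints, stage the batch, and vote."""
        ok = all(self.required_keys.issubset(row) for row in batch.values())
        self._staged = dict(batch) if ok else None
        return ok

    def commit(self):
        """Phase 2: merge staged properties into the visible state."""
        for node_id, row in self._staged.items():
            # Partial update: only the supplied properties change.
            self.committed.setdefault(node_id, {}).update(row)
        self._staged = None

    def rollback(self):
        """Discard anything staged in phase 1."""
        self._staged = None


def two_phase_commit(participants, batch):
    """Commit on every participant only if all of them vote yes."""
    votes = [p.prepare(batch) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The point of the pattern is that a batch which violates a constraint anywhere leaves every store untouched, so a failed load never produces a half-updated graph.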

Future
The design of Helios and the underlying Neo4j graph database allows for the easy integration of additional layers of biomedical data, such as
• pharmacological action of drugs
• non-small-molecule drugs
• disease information
• target development categories
We also intend to integrate more cheminformatics and network analysis features into the platform in the future.
• Schema optimizations!
  ○ Labels vs. properties, super nodes
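
On the super-node item: a super node is a node with a very large number of relationships, such as the Dataset nodes that average millions of relationships in the Helios table. A first step in any such optimization is finding them. The sketch below counts node degree from a plain edge list; the edge format and the threshold are assumptions for illustration, and a production check would more likely run inside the database.

```python
from collections import Counter


def super_nodes(edges, threshold):
    """Return {node: degree} for nodes with at least `threshold` relationships.

    `edges` is an iterable of (source, target) pairs; direction is ignored
    because a dense node slows traversals from either side.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return {node: count for node, count in degree.items() if count >= threshold}
```

Nodes flagged this way are candidates for the labels-versus-properties question: whether a heavily shared value should remain a node of its own or become a property or label on the nodes that point to it.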

References
● We also want to give a shout-out to the CKG project (Clinical Knowledge Graph) for uploading a dump of their database, which can be used to easily create a Neo4j graph database harmonizing 9 ontologies and 26 relevant biomedical databases. Experimental studies included in the publication are also included as CKG reports.
○ https://ckg.readthedocs.io/en/latest/project_report/project-report.html
● https://reactome.org/dev/graph-database/extract-participating-molecules
● https://neo4j.com/blog/integrating-biology-public-neo4j-database/
● https://cytoscape.org/what_is_cytoscape.html
● https://www.researchgate.net/publication/304407871_Using_Neo4j_for_Mining_Protein_Graphs_A_Case_Study (PPI paper)
● https://link.springer.com/article/10.1186/s13321-020-0409-9

Q&A
● Thanks for your time.
○ https://www.secondgenome.com/development-platform
○ Our turnkey platform - partnering@secondgenome.com
● Please email if you want to continue the conversation: satish@secondgenome.com
● Second Genome is proud to be named one of the Top 10 Best Places to Work in Biopharma
● We are hiring!
○ https://www.secondgenome.com/culture-careers/careers
