Graphs to fight diabetes
Dr. Alexander Jarasch
Head of Data and Knowledge Management
The German Center for Diabetes Research (DZD)
Evolutionary advantage becomes disadvantage
energy storage: essential for survival upon lack of food
energy storage upon food abundance: the former advantage becomes a disadvantage
What is diabetes mellitus?
• metabolic disease
• insulin production in the pancreas is reduced, or the body responds poorly to insulin
(insulin = the hormone the body needs to move glucose from the bloodstream into the cells)
• consequences:
• less absorption of sugar into the cells
• sugar is not stored in liver and muscle cells
• persistently high levels of sugar in the blood (hyperglycemia)
• severe complications
• currently not curable (only treatable)
T1D diabetes • T2D diabetes • gestational diabetes • special types
Diabetes TYPE 1 (T1D)
• approx. 5-10 % of diabetes patients have T1D
• often starts in childhood
• autoimmune reaction
• independent of lifestyle
• patients need an external insulin source throughout their life
• approx. 20 genes involved
• currently, T1D is not curable
Diabetes TYPE 2 (T2D)
• approx. 90-95 % of diabetes patients have T2D (mostly after age 40)
• insulin resistance; the pancreas is not able to produce enough insulin
• symptoms develop slowly
• >150 genes have been identified that increase the risk
• “the cocktail of evil”: predisposition + overweight + physical inactivity
Some numbers (worldwide)
• 1 in 11 adults has diabetes (425 million)
• the number of people with diabetes has quadrupled since 1980
• 12 % of global health expenditure is spent on diabetes ($727 billion)
• over 1 million children and adolescents have type 1 diabetes
• two-thirds of people with diabetes are of working age (327 million) (2017)
• three quarters of people with diabetes live in low- and middle-income countries (2017)
• 1 in 2 adults with diabetes is undiagnosed (212 million)
International Diabetes Federation (IDF)
Some numbers (USA and Germany)
USA
• 30 million have diabetes (9.4 % of adults)¹
• +1,500,000 per year
• 84 million with prediabetes²
• $327 billion costs per year¹ ($237 bn medical costs, $90 bn reduced productivity)²
Germany
• 7 million have diabetes (7.4 % of adults)¹
• +500,000 per year
• ~7 million with prediabetes or undiagnosed diabetes
• €16 billion costs per year¹
¹ www.statistica.com ² American Diabetes Association
Overweight/obesity in the US (1985-2009): obese adults in the US (BMI* ≥ 30)
*BMI ≈ 30: 5'11" and 220.46 lbs (180 cm and 100 kg, which gives a BMI of about 30.9)
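The footnote's figures follow from the usual definition BMI = weight [kg] / (height [m])². A minimal Python sketch of the arithmetic (the function names are illustrative) shows that 100 kg at 1.80 m gives a BMI just under 31, and that the BMI-30 threshold at that height lies at about 97 kg:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in metres."""
    return weight_kg / height_m ** 2


def weight_for_bmi(target_bmi: float, height_m: float) -> float:
    """Weight in kilograms at which a person of the given height reaches target_bmi."""
    return target_bmi * height_m ** 2


LB_TO_KG = 0.45359237   # exact definition of the avoirdupois pound
INCH_TO_M = 0.0254      # exact definition of the inch

height_m = 71 * INCH_TO_M          # 5'11" = 71 inches
weight_kg = 220.46 * LB_TO_KG
print(round(height_m, 2), round(weight_kg, 1))   # 1.8 100.0
print(round(bmi(100, 1.80), 1))                  # 30.9
print(round(weight_for_bmi(30, 1.80), 1))        # 97.2
```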
Complications develop after many years
• kidney: diabetic nephropathy, 40 % of kidney failure/dialysis
• feet: 70 % of all foot amputations
• eyes: diabetic retinopathy, 30 % of loss of sight
• brain: 2-4 fold increased risk for stroke
• acute cardiac death: main cause of death of diabetic patients (33 % of all heart attacks)
• nerves: diabetic neuropathy, amputations of extremities
Complex emergence / complex disease
lifestyle • genes • epigenetics • metabolism • cellular processes • environment
Inherited lifestyle
genetically identical, epigenetically different
Epigenetics – across generations
[Figure: body weight [g] vs. age [weeks] of daughters of obese mice with diabetes and daughters of healthy mice]
Huypens and Beckers, Nat Genet. 2016
The German Center for Diabetes Research
funded by the Federal Ministry of Education and Research and the federal states
5 partners, 5 associated partners – 400 researchers (basic research and university hospitals)
DZD bundles competencies so that those affected benefit more quickly from research results.
academic, non-profit
The German Center for Diabetes Research
hospitals • prevention • nutrition / diet • beta cells • genetics • therapy • clinical studies • cohorts • basic research • healthcare
diabetes treatment • diabetes prevention • prevention of complications
Goal:
better diabetes prevention and therapy
personalized prevention and therapy
identify and cluster diabetes subtypes
individualized treatment of subtypes
How do we fight diabetes with graphs?
The challenge
Easy question -> Complex query
Find information within our organisation
Originally different research areas
Hospitals • Basic Research • Data Analysis
We all “serve” the same “customer”
Hospitals • Basic Research • Data Analysis
But we all see the “customer” a little differently
“Patient” (64 kg, 178 cm, male) • “Gene” (AAGCTTCACATGG) • “Study” • “Metabolite” (C6H12O6) • “drug” (Metformin) • “statistics”
T2D • cell • insulin resistance • inactive mice • prediabetic pig • microscope image • complications
Look at our “customer” in a new way
“Patient” (64 kg, 178 cm, male) • “Gene” (AAGCTTCACATGG) • “Study” • “Metabolite” (C6H12O6) • “drug” (Metformin) • “statistics”
T2D • cell • insulin resistance • inactive mice • prediabetic pig • microscope image • complications
Look at our “customer” from many perspectives simultaneously – connect data
Hospitals • Basic Research • Data Analysis • data
Connect data – one option
Hospitals • Basic Research • Data Analysis
“Patient” (64 kg, 178 cm, male) • “drug” (Metformin) • “Study” • “Gene” (AAGCTTCACATGG) • “Metabolite” (C6H12O6) • “statistics”
T2D • insulin resistance • cell • inactive mice • prediabetic pig • microscope image • complications
Connect data – better option
“Patient” (64 kg, 178 cm, male) • “drug” (Metformin) • “Study” • “statistics” • “Gene” (AAGCTTCACATGG) • “Metabolite” (C6H12O6)
T2D • insulin resistance • cell • inactive mice • prediabetic pig • microscope image • complications
DZDConnect – a Neo4j graph database
Graph that can help answer biomedical questions
• across locations
• across disciplines
• across species
• extendable
• scalable
• visualizable
Homogeneous and heterogeneous data
(First) connect data at a meta level
[Diagram: five separate RAW DATA sources]
Classify types of data
[Diagram: the raw data classified as clinical study, statistics, RNA/DNA, images, chemistry, patient, biosample, wet lab and drug data]
Connect types of data
[Diagram: the data types statistics, RNA/DNA, images, chemistry, patient, wet lab, drug, biosample and clinical study connected to each other]
Build graph model
[Diagram: graph model with the node types clinical study, statistics, RNA/DNA, images, biosample, wet lab, chemistry, drug and patient]
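To make the idea of a graph model concrete, here is a minimal sketch in plain Python rather than Neo4j. The node labels mirror the data types on the slide; the relationship types (ENROLLS, GAVE, TAKES and so on) are hypothetical, chosen only for illustration. A breadth-first search then shows how a question that starts at a drug can reach sequencing data by following connections:

```python
from collections import deque

# Hypothetical meta-level graph model: node labels follow the data types on the
# slide; the relationship types are illustrative assumptions, not DZDConnect's schema.
EDGES = [
    ("ClinicalStudy", "ENROLLS", "Patient"),
    ("Patient", "GAVE", "BioSample"),
    ("Patient", "TAKES", "Drug"),
    ("BioSample", "SEQUENCED_AS", "RNA_DNA"),
    ("BioSample", "IMAGED_AS", "Image"),
    ("BioSample", "ANALYSED_IN", "WetLab"),
    ("Drug", "DESCRIBED_BY", "Chemistry"),
    ("ClinicalStudy", "ANALYSED_BY", "Statistics"),
]


def neighbours(edges):
    """Build an undirected adjacency list so paths can be followed in either direction."""
    adj = {}
    for start, rel, end in edges:
        adj.setdefault(start, []).append((rel, end))
        adj.setdefault(end, []).append((rel, start))
    return adj


def shortest_path(edges, source, target):
    """Breadth-first search returning the node labels on a shortest path, or None."""
    adj = neighbours(edges)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for _rel, nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(EDGES, "Drug", "RNA_DNA"))
# ['Drug', 'Patient', 'BioSample', 'RNA_DNA']
```

In Neo4j, the same traversal would be written as a Cypher pattern over labelled nodes and typed relationships instead of hand-written search code.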
Why graph?
• in “biology” everything is connected anyway
• data is connected
• human-readable – easy to understand for non-computer scientists
• easy to query: queries resemble human questions
• scalable
• easy to adopt and extend
• visualization
Metadata
• name: IL-6, unit: mg/ml, sample: blood, organism: pig, amount: 50 ml, aliquots: 362, location: Freezer68
• name: pancreas dissection, format: TIFF, dimension: 3840x2160, amount: 125, staining: no staining, microscope: Zeiss Light sheet Z1, location: Dresden
• title: “about diabetes and Alzheimer’s”, PMID: 1255864, doi: http://doi.102r3d, year: 2016, journal: Diabetes
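In a property graph, metadata like this is stored as key/value properties on the nodes themselves. The sketch below, assuming plain Python dictionaries and illustrative label values (BioSample, Image, Publication are not taken from the slide), filters nodes by property, much as a Cypher MATCH on node properties would:

```python
# Hypothetical node records: property names and values come from the metadata
# examples above; the "label" values are illustrative assumptions.
nodes = [
    {"label": "BioSample", "name": "IL-6", "unit": "mg/ml", "sample": "blood",
     "organism": "pig", "amount": "50ml", "aliquots": 362, "location": "Freezer68"},
    {"label": "Image", "name": "pancreas dissection", "format": "TIFF",
     "dimension": "3840x2160", "amount": 125, "staining": "no staining",
     "microscope": "Zeiss Light sheet Z1", "location": "Dresden"},
    {"label": "Publication", "title": "about diabetes and Alzheimer's",
     "PMID": "1255864", "doi": "http://doi.102r3d", "year": 2016, "journal": "Diabetes"},
]


def find(nodes, **criteria):
    """Return all nodes whose properties match every key/value pair in criteria."""
    return [n for n in nodes if all(n.get(k) == v for k, v in criteria.items())]


for n in find(nodes, organism="pig", sample="blood"):
    print(n["name"], n["aliquots"], n["location"])   # IL-6 362 Freezer68
```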
Extend graph
Literature • protein database • other diseases • Electronic Laboratory Notebook
lipid metabolism
Diabetes is a metabolic disease
Extending our graph
RNA-seq • proteomics • associations
~800 million nodes
~800 million relationships
Dr. Martin Preusse
Dr. Nikola Müller
Extending our graph
Dr. Jan Krumsiek, Assistant Professor, Weill Cornell Medicine, NYC
metabolic pathway data from 15-20 very rich data sources
~900,000 nodes
~1.7 million relationships
phenotype association studies
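Pulling in 15-20 external sources means the same entity (a metabolite, an enzyme) often arrives several times. One way to avoid duplicates, sketched here in plain Python after the idea behind Cypher's MERGE clause, is to key nodes by (label, identifier) and relationships by (start, type, end). The source contents, the PART_OF/ACTS_ON relationship types and the B-001 identifier are all hypothetical:

```python
def integrate(sources):
    """MERGE-style integration: each node and relationship is stored only once."""
    nodes, rels = {}, set()
    for source in sources:
        for label, node_id, props in source["nodes"]:
            # A node seen before gains new properties instead of being duplicated.
            nodes.setdefault((label, node_id), {}).update(props)
        for start, rel_type, end in source["rels"]:
            rels.add((start, rel_type, end))
    return nodes, rels


# Two hypothetical sources that overlap on the glucose node and one relationship.
source_a = {
    "nodes": [("Metabolite", "glucose", {"formula": "C6H12O6"}),
              ("Pathway", "glycolysis", {})],
    "rels": [(("Metabolite", "glucose"), "PART_OF", ("Pathway", "glycolysis"))],
}
source_b = {
    "nodes": [("Metabolite", "glucose", {"source_b_id": "B-001"}),
              ("Enzyme", "hexokinase", {})],
    "rels": [(("Metabolite", "glucose"), "PART_OF", ("Pathway", "glycolysis")),
             (("Enzyme", "hexokinase"), "ACTS_ON", ("Metabolite", "glucose"))],
}

nodes, rels = integrate([source_a, source_b])
print(len(nodes), len(rels))              # 3 2
print(nodes[("Metabolite", "glucose")])   # {'formula': 'C6H12O6', 'source_b_id': 'B-001'}
```

Keying on stable identifiers is what keeps node and relationship counts meaningful when many sources describe the same entities.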
Summary
“Patient” (64 kg, 178 cm, male) • “drug” (Metformin) • “Study” • “statistics” • “Gene” (AAGCTTCACATGG) • “Metabolite” (C6H12O6)
T2D • insulin resistance • cell • inactive mice • prediabetic pig • microscope image • complications
Examples
How many biosamples were acquired in visit 17 of ‘PLIS’ and which parameters were measured?
Goals:
1. Connect data from our clinical studies and biobanks
2. Researchers can easily browse through measured parameters and available biosamples
3. Metadata of parameters helps to assess which samples are comparable
node labels: Study, Person, Visit, BioSample, Experiment, Parameter
• name: HMGU
• Person – name: AJ, position: data mgmt
• Study – name: PLIS, multi-center: true, recruiting: closed, analysis: on-going, no. of patients: 1105
• Visit – visit: 17
• BioSample – name: blood, type: OGTT, number of samples: 3436, organism: Human
• Experiment – name: laboratory
• Parameter
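The PLIS question is a path through the graph: Study → Visit → BioSample → Experiment → Parameter. A minimal sketch in plain Python follows that path. Node properties are taken from the slide; the node ids, the relationship types (HAS_VISIT, COLLECTED, USED_IN, MEASURED) and the parameter names are hypothetical placeholders, not the parameters actually measured in PLIS:

```python
# Nodes: id -> (label, properties). Ids, relationship types and parameter
# names are hypothetical; the other property values come from the slide.
NODES = {
    1: ("Study", {"name": "PLIS", "multi-center": True, "no. of patients": 1105}),
    2: ("Visit", {"visit": 17}),
    3: ("BioSample", {"name": "blood", "type": "OGTT",
                      "number of samples": 3436, "organism": "Human"}),
    4: ("Experiment", {"name": "laboratory"}),
    5: ("Parameter", {"name": "parameter_A"}),
    6: ("Parameter", {"name": "parameter_B"}),
}
EDGES = [(1, "HAS_VISIT", 2), (2, "COLLECTED", 3), (3, "USED_IN", 4),
         (4, "MEASURED", 5), (4, "MEASURED", 6)]


def out(node_id, rel_type, label):
    """Targets of outgoing rel_type edges from node_id that carry the given label."""
    return [end for start, rel, end in EDGES
            if start == node_id and rel == rel_type and NODES[end][0] == label]


def samples_and_parameters(study_name, visit_no):
    """Count biosamples acquired at one visit of a study and list the measured parameters."""
    total, params = 0, set()
    for sid, (label, props) in NODES.items():
        if label != "Study" or props.get("name") != study_name:
            continue
        for vid in out(sid, "HAS_VISIT", "Visit"):
            if NODES[vid][1]["visit"] != visit_no:
                continue
            for bid in out(vid, "COLLECTED", "BioSample"):
                total += NODES[bid][1]["number of samples"]
                for eid in out(bid, "USED_IN", "Experiment"):
                    for pid in out(eid, "MEASURED", "Parameter"):
                        params.add(NODES[pid][1]["name"])
    return total, sorted(params)


print(samples_and_parameters("PLIS", 17))   # (3436, ['parameter_A', 'parameter_B'])
```

In Neo4j, the nested loops collapse into a single Cypher MATCH pattern along the same path, which is what makes such "easy questions" cheap to ask.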
Can human T2D genes be studied in the pre-diabetic pig model?
Goals:
1. Connect data from different species (mice, pig, human)
2. Connect multi-omics data
3. Researchers can easily find links between human data and comparable data from animal models
genomics • transcriptomics • metabolomics • proteomics
Human GWAS catalog (diabetes) → ENSEMBL → gene names (human) → KEGG gene IDs → KEGG Enzyme → KEGG compounds → Biocrates IDs → targeted metabolomics analysis in prediabetic pig
counts along the mapping: 103 genes, 97 genes, 96 genes, 16 enzymes, 63 compounds, 31 compounds, 7 compounds; 16 metabolites
7/16 metabolites: Xxaa C11:0, Xxaa C11:1, Xxaa C11:2, Xxaa C11:3, Xxaa C11:4, Xxaa C11:5, Xxaa C11:6
genomics • transcriptomics • proteomics • metabolomics • pathway analysis
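The chain of identifiers above can be read as a series of mapping steps, each of which can drop or multiply entries before the final overlap with the metabolites measured in the pig. A minimal sketch with toy, hypothetical identifiers (not the real GWAS, ENSEMBL, KEGG or Biocrates data):

```python
def map_through(ids, mapping):
    """Follow one mapping step; unmapped ids drop out, one id may map to several."""
    return {target for i in ids for target in mapping.get(i, [])}


# All identifiers and mappings are hypothetical toy values.
gwas_genes = {"GENE1", "GENE2", "GENE3"}
gene_to_ensembl = {"GENE1": ["ENSG_A"], "GENE2": ["ENSG_B"]}       # GENE3 unmapped
ensembl_to_kegg_gene = {"ENSG_A": ["KG_1"], "ENSG_B": ["KG_2"]}
kegg_gene_to_enzyme = {"KG_1": ["ENZ_1"]}                          # KG_2 encodes no enzyme
enzyme_to_compound = {"ENZ_1": ["CPD_1", "CPD_2", "CPD_3"]}
compound_to_biocrates = {"CPD_1": ["BC_1"], "CPD_2": ["BC_2"]}
measured_in_pig = {"BC_1", "BC_9"}                                 # targeted metabolomics panel

steps = [gene_to_ensembl, ensembl_to_kegg_gene, kegg_gene_to_enzyme,
         enzyme_to_compound, compound_to_biocrates]
ids, funnel = gwas_genes, [len(gwas_genes)]
for mapping in steps:
    ids = map_through(ids, mapping)
    funnel.append(len(ids))

overlap = ids & measured_in_pig
print(funnel)                                                  # [3, 2, 2, 1, 3, 2]
print(sorted(overlap), f"{len(overlap)}/{len(measured_in_pig)}")  # ['BC_1'] 1/2
```

Stored as nodes and relationships in the graph, each mapping step becomes a hop, so the same funnel can be re-run whenever a source is updated.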
Outlook
Automatically learn from large literature texts
Natural language processing (NLP) example
Identification of genetic elements in metabolism by high-throughput mouse
phenotyping.
Metabolic diseases are a worldwide problem but the underlying
genetic factors and their relevance to metabolic disease remain
incompletely understood. Genome-wide research is needed to
characterize so-far unannotated mammalian metabolic genes.
Here, we generate and analyze metabolic phenotypic data
of 2016 knockout mouse strains under the aegis of the
International Mouse Phenotyping Consortium (IMPC) and find 974
gene knockouts with strong metabolic phenotypes. 429 of those
had no previous link to metabolism and 51 genes remain functionally completely
unannotated. We compared human orthologues of these uncharacterized genes in
five GWAS consortia and indeed 23 candidate genes, like ABC1, XYZ2, are associated
with metabolic disease. We further identify common regulatory elements in promoters
of candidate genes. As each regulatory element is composed of several transcription
factor binding sites, our data reveal an extensive metabolic phenotype-associated
network of co-regulated genes.
Our systematic mouse phenotype analysis thus paves the way for full functional
annotation of the genome. Metabolic disorders, including obesity and type 2 diabetes
mellitus, are major challenges for public health.
Rozman and Hrabe de Angelis, Nat Commun. 2018
NLP method by GraphAware
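As a rough illustration of turning literature into graph edges, the sketch below uses a hand-made term dictionary and links terms that occur in the same sentence. This simple dictionary-and-co-occurrence approach is swapped in for illustration only; it is not the GraphAware NLP method named on the slide. The TERMS dictionary and the CO_OCCURS_WITH relationship type are hypothetical:

```python
import re
from itertools import combinations

# Hypothetical term dictionary: term -> node label it would get in the graph.
TERMS = {
    "ABC1": "Gene",
    "XYZ2": "Gene",
    "metabolic disease": "Disease",
    "obesity": "Disease",
    "type 2 diabetes mellitus": "Disease",
}


def find_terms(sentence):
    """Dictionary terms occurring in a sentence as whole words (case-insensitive)."""
    return sorted(term for term in TERMS
                  if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE))


def extract_edges(text):
    """Co-occurrence edges between terms mentioned in the same sentence."""
    edges = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(find_terms(sentence), 2):
            edges.add((a, "CO_OCCURS_WITH", b))
    return edges


abstract = ("We compared human orthologues of these uncharacterized genes in five GWAS "
            "consortia and indeed 23 candidate genes, like ABC1, XYZ2, are associated "
            "with metabolic disease. Metabolic disorders, including obesity and type 2 "
            "diabetes mellitus, are major challenges for public health.")

for a, rel, b in sorted(extract_edges(abstract)):
    print(f"({TERMS[a]}: {a}) -[{rel}]- ({TERMS[b]}: {b})")
```

Co-occurrence only suggests a connection; a real pipeline would need curated vocabularies and relation extraction before such edges are trusted.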
Alzheimer’s • cancer • cardiovascular diseases • diabetes • lung diseases • infectious diseases
Find connections...
Machine learning for personalized prevention and therapy
identify and cluster diabetes subtypes
individualized treatment of subtypes
Expert Knowledge
validation of personalized treatment
Graph Technology
DDPC – Digital Diabetes Prevention Center
• pattern recognition in huge amounts of data
• (un)supervised ML methods to identify subtypes of diabetes
• developing/validating individualized prevention/therapy
transparency to people • benefit for people • benefit for society
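To illustrate the unsupervised part, the sketch below clusters hypothetical patient feature vectors with plain k-means (Lloyd's algorithm). k-means is swapped in here only as the simplest clustering method; the DDPC's actual methods and features are not specified on the slide, and the two-dimensional, pre-scaled features are assumptions:

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means starting from the given initial centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical, already scaled features for six patients (e.g. two clinical markers).
patients = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(patients, centroids=[patients[0], patients[3]])
for i, members in enumerate(clusters):
    print(f"candidate subtype {i}: {members}")
```

Each cluster would only be a candidate subtype; as the machine-learning slide above notes, it still needs validation with expert knowledge before it informs treatment.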
Next level in diabetes prevention and treatment
Hospitals • Basic Research • Data Analysis
Acknowledgements
The scientists of the DZD at: Funding by:
Thank you

Neo4j GraphDay Munich - Graphs to fight Diabetes


Editor's Notes

  • #12 The parents' lifestyle before conception already influences their children's risk of obesity through epigenetic effects, BEFORE pregnancy.
  • #15 Cover all aspects of diabetes research, from molecular studies in cell models and animal models to clinical investigations in patients and health care research.