Abstract
The Type 2 Diabetes Knowledge Portal (T2DKP; type2diabetesgenetics.org), produced by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), is an open-access resource that aims to facilitate the translation of genomic data into actionable knowledge for the understanding and treatment of T2D and its complications. The supporting data and software platform is a modular system for data aggregation, analysis, and display. It includes: software for managing and tracking the transfer of data from contributors; automated analysis of individual-level data (i.e., genotypes and phenotypes) or association summary statistics via statistical genetic or bioinformatic methods; storage of this information in a database accessible through a collection of Representational State Transfer (REST) APIs; and a web interface for visualizing these data. The T2DKP currently contains 84 datasets with genetic associations for 191 traits. Users can browse these associations by gene, variant, or genomic region, or by phenotype in Manhattan plots. The portal presents distilled at-a-glance summaries for genes and regions, and users can also drill down to the details of individual variant associations. The T2DKP also integrates epigenomic annotations and the results of computational methods with GWAS results to help researchers prioritize variants, genes, and tissues for further research. Interactive tools allow users to perform custom association analyses that securely access and compute on individual-level data without ever exposing the raw data. All datasets are fully documented, and summary statistics files may be made available for download from the T2DKP at the request of the study authors. The data and software platform have been applied to 4 additional open-access resources for cardio-metabolic diseases, including cardiovascular disease, cerebrovascular disease, and sleep disorders. We aim to release a companion resource for Type 1 Diabetes in 2020.
All these resources share two defining features. First, they provide access to authoritative results supplied by the research community that generated them. Second, they are powered by a single underlying software system, which allows future integration into a common resource for common cardio-metabolic diseases.
Questions you can address with the T2DKP
1. What are the genome-wide associations for T2D and related traits from the definitive T2D genetic community datasets?
2. What is the most up-to-date, curated list of predicted T2D effector genes, along with the supporting lines of evidence?
3. What is the credible set of variants to study for functional follow-up from any GWAS locus for T2D?
4. What are the results of computational approaches and relevant genomic annotations to assist in prioritization of a variant or gene from GWAS loci?
Presenters:
Noël Burtt and Dr. Jason Flannick, Broad Institute of Harvard and MIT
dkNET Webinar: The Type 2 Diabetes Knowledge Portal 02/28/2020
1. The AMP-T2D Knowledge Portal
February 28, 2020
Jason Flannick
Assistant Professor of Pediatrics, Division of Genetics
and Genomics, Boston Children’s Hospital &
Harvard Medical School
Associate Member, Broad Institute of Harvard & MIT
Noël Burtt
Director, Operations & Development,
Diabetes Research & Knowledge Portals
Program in Medical & Population Genetics
Metabolism Program, The Broad Institute
2. The AMP-T2D Knowledge Portal
An open access resource providing data & tools to promote understanding of type 2 diabetes & its complications
type2diabetesgenetics.org
4. Motivation: rooted in human genetics
• Understanding disease biology or developing therapies requires experiments
• Human genetics: experiments of nature (variants) that perturb the function of a gene in the human system
• We have a means to test these natural perturbations:
– Testing for association between variants and a collection of phenotypes produces effect sizes (direction, magnitude) and p-values (significance) for each variant (or gene)
(Figure: genetic variation informs prediction, prevention, and treatment)
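The association-testing bullet above produces an effect size and a p-value for each variant. The following minimal Python sketch makes those two outputs concrete. It runs a single-variant allelic test on a 2×2 table of allele counts, using a Wald test on the log odds ratio. The function name and the counts are hypothetical, and this is not the portal's pipeline. Production GWAS usually fit regression models with covariates to adjust for population structure and other confounders.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 test for one variant.

    Arguments are alternate/reference allele counts in cases and controls.
    Returns (odds_ratio, se_log_or, p_value).
    """
    # Effect size: odds ratio for the alternate allele, cases vs. controls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) (Woolf's formula); requires non-zero cells
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    # Wald z-statistic; its sign gives the direction of effect
    z = math.log(odds_ratio) / se_log_or
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, se_log_or, p_value


# Hypothetical counts: 300/700 alt/ref alleles in cases, 200/800 in controls
or_, se, p = allelic_test(300, 700, 200, 800)
print(f"OR={or_:.2f}, SE(logOR)={se:.3f}, p={p:.1e}")
```

An odds ratio above 1 with a small p-value indicates that the alternate allele is enriched in cases.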
5. We are NOT limited by the amount of data
• Routine GWAS studies now include ~1,000,000 samples
• National & international efforts are generating large-scale genetic datasets linked with medical records and rich phenotype data
(Image: www.ebi.ac.uk)
7. Genetic discovery for T2D
• 403 distinct association signals for T2D
• Latest T2D genetic association analysis (GWAS) includes ~900,000 samples
Mahajan et al. Nat Gen. 2018
8. Similar story across complex diseases
>100,000 associations, including*:
Type 2 diabetes 403
Inflammatory bowel disease 273
Coronary artery disease 166
Schizophrenia 208
Atrial Fibrillation 100
*ICDA white paper
J. Engreitz
9. But…
• In few cases have associations been translated to specific genes or variants
• Common variant GWAS produce associated regions
• Most associations have not yet been translated into new biological insights
• We do not know the molecular, cellular, and physiological mechanism(s)
• In most cases a therapeutic hypothesis does not easily follow from the association
• What is the target? Mechanism of action (MoA)? Directional relationship?
11. There are success stories…
…but we have hundreds of loci to do this for in T2D alone
12. Addressing a gap:
Making genetic & related genomic data more broadly accessible & useful could have a significant impact on our ability to understand or treat human disease
13. A catalyzing collaboration to address these opportunities: Accelerating Medicines Partnership (AMP) T2D
Public-private partnership (government, industry, non-profit organizations, universities) to advance the use of human genetics in designing new medicines
Mission:
– Generate data for T2D and complications
– Make genetic information more broadly accessible
– Create a knowledge portal containing comprehensive phenotype & genotype data for T2D and complications
– Enable use of genetic information to support drug discovery
14. Unique scientific & software development collaboration
• Data Coordinating Center (DCC): coordinate data, analysis, and operations; define standards; build the full software stack; build & administer the Knowledge Portal
• DATA: 14 sites to deposit and generate new data
• FEDERATED NODES:
– EBI: technical replicate of the DCC to seamlessly store/serve data on the Portal
– DGA/UCSD: functional annotation database
• TOOLS & METHODS: 5 labs/sites to develop a suite of methods, tools & visualizations
• WORKING GROUPS: collaborate across activities and define directions
• OPPORTUNITY POOL: 13 sites for complementary research and data
• T2D Genetics Community: datasets & consortia
(AMP Type 2 Diabetes Knowledge, T2DK)
15. What is needed?
• Community engagement, collaboration, representation
• Data ingest, warehousing, harmonization
• Automated QC, analysis, & implementation of best practice methods from the community
• Federation with other Data Coordination sites
• Representation, visualization & knowledge delivery
Modular & flexible software system
17. Data Coordination & Intake Platform
Engagement & Procedures → Intake & Inventory → QC → Harmonization → Analysis → Deposition → Release
Inputs: genotype/sequence data, summary statistics, extensive phenotype data
Data storage: knowledgebase (traits, summary statistics)
Methods: sample QC, variant QC, phenotype QC, population structure, annotation, confounders, single-variant association, gene-based association, gene set association, result integration, result presentation
Collaborator & research team interaction
Designed a robust & reproducible approach to analyze & represent authoritative data from the research community
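Harmonization is one step in the intake flow above. Part of that step is making every dataset report effects for the same allele. The following minimal sketch assumes that effects should be expressed on the alternate allele of a reference genome. It simply drops strand-ambiguous A/T and C/G SNPs. The names (SumStat, harmonize) are hypothetical, and the portal's actual harmonization rules are not described in these slides.

```python
from dataclasses import dataclass
from typing import Optional

# Watson-Crick complements, used to detect strand flips
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class SumStat:
    variant_id: str
    effect_allele: str
    other_allele: str
    beta: float


def harmonize(row: SumStat, ref_allele: str, alt_allele: str) -> Optional[SumStat]:
    """Re-express a summary statistic so that the effect allele is alt_allele.

    Returns None when the alleles cannot be matched unambiguously.
    """
    ea, oa = row.effect_allele.upper(), row.other_allele.upper()
    # A/T and C/G SNPs look identical on both strands; this sketch drops them
    if COMPLEMENT.get(ea) == oa:
        return None
    # Try the alleles as reported, then on the opposite strand
    for e, o in [(ea, oa), (COMPLEMENT.get(ea), COMPLEMENT.get(oa))]:
        if (e, o) == (alt_allele, ref_allele):
            return SumStat(row.variant_id, alt_allele, ref_allele, row.beta)
        if (e, o) == (ref_allele, alt_allele):
            # Effect was reported for the reference allele: swap and flip the sign
            return SumStat(row.variant_id, alt_allele, ref_allele, -row.beta)
    return None


print(harmonize(SumStat("var1", "G", "A", 0.2), ref_allele="G", alt_allele="A"))
```

A real pipeline would typically resolve some palindromic SNPs using allele frequencies rather than discarding them. That choice is left out here for brevity.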
18. Scaled to an integrated analysis system to ingest raw data & summary statistics, then process & represent results
Methods: sample QC, variant QC, phenotype QC, population structure, annotation, confounders, single-variant association, gene-based association, gene set association, result integration, result presentation
Pipeline APIs, distributed APIs, method implementations, association results
LoamStream
19. A means to share data across geographic boundaries & domain expertise
Knowledge Portal
• Knowledge Base (US data), with LoamStream
• Knowledge Base (UK data), with LoamStream
• Diabetes EpiGenome Atlas (DGA), UCSD: storing & processing annotations, ChIP-seq, Hi-C data, etc.
Federation
20. A means to store & represent relationships between data types
Layering of ‘omics data to enhance the interpretation of association statistics
• Neo4j “knowledge graph” with links among biological entities, integrating ‘omics data and results of computational methods (LD score regression, METAL, eQTL methods, DEPICT, etc.)
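METAL, listed above, combines per-study results for the same variant. The following minimal sketch shows the fixed-effect inverse-variance weighting that METAL offers (its STDERR scheme). It assumes that the inputs are already harmonized to one effect allele. The function name and the numbers are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect, inverse-variance-weighted meta-analysis.

    betas, ses: per-study effect estimates and standard errors for one
    variant, already harmonized to the same effect allele.
    Returns (pooled_beta, pooled_se, p_value).
    """
    # Each study is weighted by the inverse of its sampling variance
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled_beta, pooled_se, p_value


# Hypothetical: three studies reporting log-odds ratios for the same variant
print(ivw_meta([0.12, 0.08, 0.10], [0.04, 0.03, 0.05]))
```

Larger, more precise studies get proportionally more weight. The pooled standard error shrinks as studies are added, which is why meta-analysis increases power.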
21. The AMP-T2DKP Data & Software Platform
Data ingest → Data analysis & integration → Data visualization → Data sharing via federation (DGA, EBI)
Methods: sample QC, variant QC, phenotype QC, population structure, annotation, confounders, single-variant association, gene-based association, gene set association, result integration, result presentation
Pipeline APIs, distributed APIs, method implementations
• LoamStream: semi-automated pipeline for QC and association analysis
• Results shared via REST services
• Enables secure interaction with data that may not be transferred from a federated site
• JavaScript libraries draw data from REST services
• Results and content largely configurable by text “metadata”
• All Portals deployed from the same codebase
Storage (MySQL): individual-level data*, association statistics, and ‘omics data to support interpretation
Neo4j “knowledge graph” with links among biological entities, integrating ‘omics data and results of computational methods (LD score regression, METAL, etc.)
Aggregator
*Individual-level data are not directly accessible to users.
22. The AMP-T2D Knowledge Portal
• Soft-launched in February 2015 to original contributors
• Open access in October 2015 with 9 datasets & 25 traits
• Only a Google-based email account is needed to log in
type2diabetesgenetics.org
27. Largest collection of ‘definitive’ T2D genetics data in the world
• T2D genetics community
• Public GWAS datasets of value
• AMP-T2D funded datasets
• Disease-agnostic resources
• International biobank data
28. Rich curation of related traits & measures
Filter by data type & trait
46. Summarize human genetics data by phenotype
• What are the genome-wide significant hits?
• For my favorite traits, what are the top gene-based results?
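Both questions above amount to filtering and ranking results for one phenotype. The following minimal sketch assumes that results arrive as simple dictionaries of p-values. The threshold of 5×10⁻⁸ is the conventional genome-wide significance threshold. The data and function names are hypothetical.

```python
GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS threshold


def genome_wide_hits(variant_pvalues, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Variants with p below the threshold, most significant first."""
    hits = [(v, p) for v, p in variant_pvalues.items() if p < threshold]
    return sorted(hits, key=lambda item: item[1])


def top_gene_results(gene_pvalues, n=3):
    """The n genes with the smallest gene-based p-values."""
    return sorted(gene_pvalues.items(), key=lambda item: item[1])[:n]


# Hypothetical per-trait results
variants = {"var1": 3e-12, "var2": 4e-6, "var3": 1e-9}
genes = {"GENE_A": 2e-7, "GENE_B": 0.03, "GENE_C": 5e-4, "GENE_D": 0.4}
print(genome_wide_hits(variants))
print(top_gene_results(genes))
```

In practice, nearby significant variants in linkage disequilibrium would usually be grouped into loci before display. This sketch does not do that grouping.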
48. Interactive analysis & gene-based results
• Run a custom aggregation test based on a user-defined set of variants & phenotypic criteria.
• The platform accesses protected individual-level data to compute results, exposing only the results to the user
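The interactive analysis above computes on protected individual-level data and returns only results. The following sketch illustrates that pattern with a simple carrier burden test: a Pearson chi-square on carriers vs. non-carriers in cases and controls. The class, fields, and test are hypothetical stand-ins, not the portal's implementation. A real deployment would need further safeguards, for example against queries that isolate very small subgroups.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Sample:
    is_case: bool
    age: int
    genotypes: Dict[str, int] = field(default_factory=dict)  # variant -> alt allele count


class ProtectedDataset:
    """Wraps individual-level data; callers only ever get summary results back."""

    def __init__(self, samples: List[Sample]):
        self._samples = samples  # never returned or exposed directly

    def burden_test(self, variants: Set[str], include: Callable[[Sample], bool]) -> dict:
        """Is carrying any of the listed variants associated with case status?"""
        case_carriers = case_non = ctrl_carriers = ctrl_non = 0
        for s in self._samples:
            if not include(s):  # user-defined phenotypic criteria
                continue
            carrier = any(s.genotypes.get(v, 0) > 0 for v in variants)
            if s.is_case and carrier:
                case_carriers += 1
            elif s.is_case:
                case_non += 1
            elif carrier:
                ctrl_carriers += 1
            else:
                ctrl_non += 1
        a, b, c, d = case_carriers, case_non, ctrl_carriers, ctrl_non
        n = a + b + c + d
        # Pearson chi-square (1 df) on the 2x2 carrier-by-status table
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        p_value = math.erfc(math.sqrt(chi2 / 2))
        # Only aggregate numbers leave this method
        return {"n": n, "case_carriers": a, "control_carriers": c, "p_value": p_value}


cohort = ProtectedDataset(
    [Sample(True, 61, {"var1": 1}), Sample(False, 58, {}), Sample(True, 45, {})]
)
print(cohort.burden_test({"var1"}, include=lambda s: s.age >= 50))
```

The user supplies the variant set and the inclusion criteria. The raw genotypes stay inside the object.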
51. Addressed the first opportunity & challenge, NOW a new opportunity & challenge
Motivated T2D genetics research community → identification of 403 association signals → research needed in many areas
Figure courtesy M. McCarthy; Mahajan et al. Nat Gen. 2018
52. Shift in focus over the coming years: function
• What is the variant?
• What is the regulatory effect and in what tissue?
• What is the gene?
• What is the pathway?
• What is the mechanism?
60. New set of gaps to address with this opportunity
• What are the needed datasets/types (validation)?
• What information must be captured/retained & represented?
• What methods need to be run and how are they validated?
• How do you express relationships between these outputs?
• How do you represent results from experimental work in a computational framework/open access resource?
61. Catalog & integrate these resources & visualize these together
(Figure: association, causal variant, tissue, gene, pathway, function, disease)
Aim 1: Data
- Association statistics
- Reference chromatin state
- Transcription factor binding sites
- eQTLs
- Chromatin capture
- Gene expression
- Gene function
- Networks
- Pathways
- Cellular models
- Animal models
Aim 2: Methods
- Meta-analysis
- Fine mapping
- Variant effect predictors
- Regulatory element prediction
- Tissue-specific enrichment
- Gene prioritization
- eQTL colocalization
- Chromosome contact prediction
- Gene set enrichment
Aim 3: Access
- SmartAPIs and web portals
M. Claussnitzer et al, 2020; J. Flannick
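Fine mapping, listed under Aim 2, produces the credible sets asked about at the start of this webinar. The following minimal sketch makes several assumptions: a single causal variant per locus, uniform priors across variants, and Wakefield approximate Bayes factors computed from each variant's effect estimate and standard error. The prior variance of 0.04 and all names are assumptions for illustration.

```python
import math


def wakefield_log_abf(beta, se, prior_var=0.04):
    """Log approximate Bayes factor (association vs. null), after Wakefield (2009)."""
    v = se ** 2
    r = prior_var / (v + prior_var)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1 - r) + 0.5 * z2 * r


def credible_set(variants, coverage=0.95, prior_var=0.04):
    """variants: dict variant_id -> (beta, se) for one locus.

    Returns (credible_set_ids, posterior_probabilities).
    """
    log_abfs = {v: wakefield_log_abf(b, s, prior_var) for v, (b, s) in variants.items()}
    # Normalize with the maximum subtracted to avoid overflow
    top = max(log_abfs.values())
    total = sum(math.exp(x - top) for x in log_abfs.values())
    posteriors = {v: math.exp(x - top) / total for v, x in log_abfs.items()}
    # Add variants in decreasing posterior order until coverage is reached
    chosen, cumulative = [], 0.0
    for v, pp in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(v)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen, posteriors


locus = {"var1": (0.30, 0.04), "var2": (0.28, 0.04), "var3": (0.05, 0.04)}
ids, pp = credible_set(locus)
print(ids, {v: round(p, 3) for v, p in pp.items()})
```

A small credible set points to a few variants for functional follow-up. A large one signals that the association cannot yet be resolved to a single variant.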
63. Example data resource: promoter capture Hi-C
Miguel-Escalada et al, Nat Genet, 2019
• Predict variant-to-gene links
• Used for non-coding annotations
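Predicting variant-to-gene links from promoter capture Hi-C can be reduced to an interval overlap. A variant is linked to a gene when it falls in a fragment that contacts that gene's promoter. The following sketch assumes half-open intervals in the same coordinate system as the variant positions and an arbitrary score cutoff of 5. The data structures and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PromoterContact:
    chrom: str
    start: int    # start of the non-promoter ("other end") fragment, inclusive
    end: int      # end of that fragment, exclusive
    gene: str     # gene whose promoter the fragment contacts
    score: float  # interaction score from the calling pipeline


def link_variants_to_genes(
    variants: Dict[str, Tuple[str, int]],
    contacts: List[PromoterContact],
    min_score: float = 5.0,
) -> Dict[str, List[str]]:
    """Map each variant to the genes whose promoters contact a fragment containing it."""
    links = {}
    for variant_id, (chrom, pos) in variants.items():
        genes = {
            c.gene
            for c in contacts
            if c.chrom == chrom and c.start <= pos < c.end and c.score >= min_score
        }
        links[variant_id] = sorted(genes)
    return links


contacts = [
    PromoterContact("chr10", 100, 200, "GENE_A", 6.0),
    PromoterContact("chr10", 100, 200, "GENE_B", 2.0),
]
print(link_variants_to_genes({"v1": ("chr10", 150)}, contacts))
```

A linear scan is fine for illustration. Genome-scale data would use an interval tree or sorted-interval join instead.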
64. Example method resource: colocalized eQTLs
• Clues to function
• Predict variants that affect both a trait and expression of a gene
Hormozdiari et al, AJHG, 2016
(Figure panels: high colocalization vs. low colocalization)
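Hormozdiari et al. (2016) describe eCAVIAR, which scores colocalization with a colocalization posterior probability (CLPP). For each variant, the CLPP is the product of its posterior probability of being causal for the trait and its posterior probability of being causal for gene expression. The following minimal sketch covers only that scoring step. It assumes that the per-variant posteriors have already been computed by fine-mapping each signal separately; the fine-mapping itself is not shown. The function names and numbers are hypothetical.

```python
def clpp(gwas_pp, eqtl_pp):
    """Per-variant colocalization posterior probability (CLPP).

    gwas_pp, eqtl_pp: dicts mapping variant -> posterior probability of being
    causal, from fine-mapping the trait and the expression signal separately.
    """
    shared = gwas_pp.keys() & eqtl_pp.keys()
    return {v: gwas_pp[v] * eqtl_pp[v] for v in shared}


def best_colocalized_variant(gwas_pp, eqtl_pp):
    """The variant with the highest CLPP and its score."""
    scores = clpp(gwas_pp, eqtl_pp)
    return max(scores.items(), key=lambda item: item[1])


# Hypothetical loci mirroring the two figure panels
high = best_colocalized_variant({"v1": 0.8, "v2": 0.2}, {"v1": 0.9, "v2": 0.1})
low = best_colocalized_variant({"v1": 0.9, "v2": 0.1}, {"v1": 0.05, "v2": 0.95})
print(high, low)
```

When both signals concentrate on the same variant, the CLPP is high. When they point to different variants, every product stays small.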
65. Knowledge Graph
Layering of ‘omics data & systematic application of GWAS mining methods
• Enhance the interpretation of association statistics
• Neo4j system with links among biological entities, integrating ‘omics data and results of computational methods (LD score regression, METAL, eQTL methods, etc.)
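The knowledge graph links biological entities so that questions become path traversals, for example from a variant to a gene to a tissue. Neo4j would answer these with Cypher queries. To stay self-contained, the following sketch uses an in-memory Python adjacency list instead. The node and relation names are hypothetical rather than the portal's schema.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Directed graph with typed edges (relation names are hypothetical)."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(relation, target)]

    def add(self, source, relation, target):
        self._edges[source].append((relation, target))

    def neighbors(self, node, relation):
        """Targets reachable from node via one edge of the given type."""
        return [t for r, t in self._edges[node] if r == relation]

    def path_targets(self, start, relations):
        """Follow a chain of relation types from start; return the end nodes."""
        frontier = {start}
        for rel in relations:
            frontier = {t for node in frontier for t in self.neighbors(node, rel)}
        return frontier


kg = KnowledgeGraph()
kg.add("variant:v1", "ASSOCIATED_WITH", "trait:T2D")
kg.add("variant:v1", "EQTL_FOR", "gene:GENE_A")
kg.add("variant:v1", "CONTACTS_PROMOTER_OF", "gene:GENE_B")
kg.add("gene:GENE_A", "EXPRESSED_IN", "tissue:islet")
# Which tissues express a gene whose expression this variant affects?
print(kg.path_targets("variant:v1", ["EQTL_FOR", "EXPRESSED_IN"]))
```

Each added data layer (eQTLs, chromatin contacts, expression) becomes a new edge type. This is what lets separate 'omics resources be queried together.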
75. API
Enable programmatic access to the results
79. What questions can you ask?
• What are the genome-wide associations for T2D & related traits from the definitive T2D genetic community datasets?
• What is the credible set of variants to study for functional follow-up from any GWAS locus for T2D?
• What is the most up-to-date & curated list of predicted T2D effector genes, along with the supporting lines of evidence?
• What are the results of computational approaches and relevant genomic annotations to assist in prioritization of a variant or gene from a GWAS locus?
80. Who is using the T2DKP?
• Average daily users: ~109
• Average session time: 7.17 minutes
• Over 80 citations
• Over 17 data releases synchronized with publication
• Over 20,000 visitors since its inception; 15,000 registered users
81. Number of users per week
• 10/4-10/15: 1,011 users (official site launch)
• 1/24-30/16: 968 users (launch & workshop in Mexico)
• 6/5-11/16: 828 users (workshop at American Diabetes Association Scientific Sessions)
• 7/10-16/16: 631 users (publication of Nature paper on genetic architecture of T2D)
• 4/30-5/6/17: 459 users (release of multiple new datasets and new interactive Data page)
• 10/15-21/17: 380 users (American Society of Human Genetics conference invited session)
• 6/17-23/18: 378 users (talk and exhibit booth at American Diabetes Association Scientific Sessions)
• 10/14-20/18: 502 users (American Society of Human Genetics conference and Portal workshop)
• 3/24-30/19: 567 users (large release of new data and features; first webinar)
• 5/19-25/19: 902 users (publication of AMP T2D-GENES exome sequence paper and results in Portal)
• 7/14-20/19: 524 users (2nd webinar)
User activity is driven by data releases & public events
82. The ethos of the T2D Knowledge Portal is contagious
• Scientific: other complex disease communities share the same scientific goals, using human genetics for target validation and functional investigation
• Mission & collaborative alignment
• Data sharing was a barrier for many communities, but our work helped make it possible for other communities
• Communities desire a means to share ‘definitive’ results
• Our community/data platform & modular software was readily adaptable to other disease areas
83. Flexible platform allows us to respond to more communities
The same data & software platform, redeployed as a “Disease X Knowledge Portal” with API access
84. Open access portals for cardio-metabolic diseases
Cerebrovascular Disease Knowledge Portal (cerebrovascularportal.org)
• Launched August 2017
• Largest collection of definitive stroke genetics results in the world
• 4,739+ users
Cardiovascular Disease Knowledge Portal (broadcvdi.org)
• Launched November 2017
• Largest collection of definitive CVD genetics results in the world
• 8,479+ users
Sleep Disorder Knowledge Portal (sleepdisordergenetics.org)
• Launched October 2018
• Largest collection of definitive results for sleep traits in the world
• 1,234+ users
86. Coming 2020: Resource for Common Metabolic Diseases
New interface integrating the current cardio-metabolic sites & adding more data (T2D, kidney disease, etc.)
87. Other portals in new disease areas
All portals are powered by the same data and software platform.
All documentation, API access, etc. are available at kp4cd.org
89. DCC & Knowledge Portal Team
Knowledge Portal Team
Benjamin Alexander
Lizz Caulkins
Maria Costanzo
Marc Duby
Clint Gilbert
Quy Hoang
DK Jang
Alexandria Kluge
Ryan Koesterer
Jeffrey Massung
Oliver Ruebenacker
Preeti Singh
Marcin von Grotthuss
Leadership
Noël Burtt
Jason Flannick
Jose Florez
90. AMP T2D Knowledge Portal Development
T2DKP and DCC: Jose Florez, Jason Flannick, Noël Burtt, Ben Alexander, Lizz Caulkins, Maria Costanzo, Marc Duby, Clint Gilbert, Quy Hoang, DK Jang, Alexandria Kluge, Ryan Koesterer, Jeffrey Massung, Oliver Ruebenacker, Preeti Singh, Marcin von Grotthuss, Josep Mercader, Miriam Udler
AMP Federated Nodes
• EBI Federated Node: Paul Flicek, Mark McCarthy, Gil McVean, Thomas Keane, Dylan Spalding
• DGA: Kyle Gaulton, Parul Kudtarkar, Ying Sun, Samuel Morabito
Method and Tool Development Teams
• AMP Enhanced Diabetes Portal (EDP): Michael Boehnke, Gonçalo Abecasis, Christopher Clark, Matthew Flickinger, Daniel Taliun, Ryan Welch
• Daniel MacArthur, Benjamin Neale, Jonathan Bloom, Konrad Karczewski, Cotton Seed, Matthew Solomonson
AMP Type 2 Diabetes Knowledge (T2DK)
91. DCC & Portal Staff
• Ben Alexander
• Lizz Caulkins
• Maria Costanzo
• Marc Duby
• Clint Gilbert
• Quy Hoang
• DK Jang
• Alexandria Kluge
• Ryan Koesterer
• Jeffrey Massung
• Oliver Ruebenacker
• Preeti Singh
• Marcin von Grotthuss
Analysis Contributors
• Josep Mercader
• Miriam Udler
The following consortia contributed
data to establish the AMP T2D Portal
• T2D-GENES
• GoT2D
• SIGMA
• DIAGRAM
A special thanks to all of the individuals whose participation in scientific studies makes
discovery possible
Foundation for the National Institutes of
Health
• David Wholley, FNIH
• Tania Kamphaus, FNIH
• Sidra Iqbal, FNIH
NIH and FNIH funded investigators
• Mark McCarthy
• Anna Gloyn
• Andrew Morris
• Maggie Ng
• Donald Bowden
• Bing Ren
• Kelly Frazer
• Maike Sander
• Ravindranath Duggirala
• John Blangero
• Karen Mohlke
• Stephen Parker
• James B Meigs
• Jerome Rotter
• Jose C Florez
• Michael Boehnke
• Noël Burtt
• Jason Flannick
• Kasper Lage
• Robert Sladek
• John Chambers
• Xueling Sim
• Kyle Gaulton
• Duc Dong
• Kim Seung
• Marcel den Hoed
• Kerrin Small
• Patrick MacDonald
• Melina Claussnitzer
• Suzanne Jacobs
• Jesse Engreitz
• Brent Richards
• Rany Salem
• Miriam Udler
• Ines Cebola
• Katalin Susztak
• Laura Scott
AMP Type 2 Diabetes Steering
Committee
Chairs:
• Philip Smith, NIDDK
• Melissa Thomas, Lilly
Members:
• Hartmut Ruetten, Sanofi
• Dermot Reilly, Janssen
• Julia Brosnan, Pfizer
• Melissa Miller, Pfizer
• Eric Fauman, Pfizer
• Caroline Fox, Merck
• Audrey Chu, Merck
• Beena Akolkar, NIDDK
AMP-T2D Partnership