dkNET Webinar: The Type 2 Diabetes Knowledge Portal 02/28/2020

The AMP-T2D Knowledge Portal
February 28, 2020
Jason Flannick
Assistant Professor of Pediatrics, Division of Genetics
and Genomics, Boston Children’s Hospital &
Harvard Medical School
Associate Member, Broad Institute of Harvard & MIT
Noël Burtt
Director, Operations & Development,
Diabetes Research & Knowledge Portals
Program in Medical & Population Genetics
Metabolism Program, The Broad Institute

The AMP-T2D
Knowledge
Portal
An open access resource
providing data & tools to
promote understanding
of type 2 diabetes & its
complications
type2diabetesgenetics.org

Today:
•Motivation
•How we built the resource
•What you can do with the AMP-T2DKP
•What is to come & related resources

Motivation: rooted in human genetics
• Understanding disease biology or
developing therapies requires
experiments
• Human genetics: experiments of
nature (variants) that perturb the
function of a gene in the human
system
• We have a means to test these
natural perturbations:
– Testing for association between
variants and a collection of
phenotypes produces effect sizes
(direction, magnitude) and p-values
(significance) for each variant (or
gene)
Genetic variation → prediction, prevention, treatment
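The association test described above can be sketched for a single variant in a case/control study. This is a minimal sketch that assumes allele counts for one biallelic variant and uses a Wald test on the log odds ratio. Production GWAS pipelines typically fit regression models with covariates, such as ancestry principal components. The function and field names here are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class AssociationResult:
    odds_ratio: float  # magnitude of the effect
    beta: float        # log odds ratio; its sign gives the direction of effect
    se: float          # standard error of beta
    p_value: float     # two-sided significance of the association


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int) -> AssociationResult:
    """Wald test on the log odds ratio from a 2x2 table of allele counts."""
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    beta = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    # Woolf's standard error of the log odds ratio
    se = math.sqrt(sum(1.0 / c for c in counts))
    z = beta / se
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return AssociationResult(math.exp(beta), beta, se, p_value)


if __name__ == "__main__":
    # Hypothetical counts: the alternate allele is enriched in cases
    print(allelic_association(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800))
```

The sign of `beta` gives the direction of the effect, `odds_ratio` gives its magnitude, and `p_value` gives its significance. These are the three quantities that the bullets above attribute to each tested variant.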

We are NOT limited by the amount of data
www.ebi.ac.uk
• Routine GWAS now include ~1,000,000
samples
• National & International efforts to
generate large scale genetic datasets linked
with medical records, and rich phenotype
data

Opportunity in the community of T2D genetics
LuCAMP

Genetic discovery for T2D
• Over 403 loci
associated with T2D
• Latest T2D genetic
association analysis
(GWAS) includes
~900,000 samples
Mahajan et al. Nat Gen. 2018

Similar story across complex diseases
>100,000 associations, including*:
Type 2 diabetes 403
Inflammatory bowel disease 273
Coronary artery disease 166
Schizophrenia 208
Atrial Fibrillation 100
*ICDA white paper
J. Engreitz

But…
• In few cases have associations been translated to specific genes or
variants
• Common variant GWAS produce associated regions
• Most associations have not yet been translated into new biological
insights
• We do not know the molecular, cellular, and physiological
mechanism(s)
• In most cases a therapeutic hypothesis does not easily follow from the
association
• What is the target? MoA? Directional relationship?

Why is this so hard?
Figure (M. McCarthy): evidence needed to move from association to mechanism
• GWAS: causal variants & elements; effector genes
• Exome sequencing: causal genes
• Existing biology: biological candidacy
• Network biology: context
• Human data: genotype recall, RCTs
• Cellular screens, functional assays & animal models: mechanism

There are success stories…
…but we have
hundreds of loci to do
this for in T2D alone

Addressing a gap:
Making genetic & related genomic data more broadly
accessible & useful could have a significant impact on
our ability to understand or treat human disease

A catalyzing collaboration to address these opportunities:
Accelerating Medicines Partnership (AMP) T2D
Public-private partnership (government, industry, non-profit
organizations, universities) to advance the use of human genetics in
designing new medicines
Mission:
– Generate data for T2D and complications
– Make genetic information more broadly accessible
– Create a knowledge portal containing comprehensive phenotype & genotype data for T2D
and complications
– Enable use of genetic information to support drug discovery

Unique scientific & software development collaboration
Data Coordinating Center (DCC): coordinate data, analysis & operations; define standards; build the full software stack; build & administer the Knowledge Portal
• DATA: 14 sites to deposit and generate new data
• FEDERATED NODES:
– EBI: technical replicate of the DCC to seamlessly store/serve data on the Portal
– DGA/UCSD: functional annotation database
• TOOLS & METHODS: 5 labs/sites to develop a suite of methods, tools & visualizations
• WORKING GROUPS: collaborate across activities and define directions
• OPPORTUNITY POOL: 13 sites for complementary research and data
• T2D Genetics Community: datasets & consortia
T2DK

What is needed?
• Community engagement, collaboration, representation
• Data ingest, warehousing, harmonization
• Automated QC, analysis, & implementation of best
practice methods from the community
• Federation with other Data Coordination sites
• Representation, visualization & knowledge delivery
Modular & flexible software system

Opportunity: motivated research community
LuCAMP

Data Coordination & Intake Platform
• Workflow: engagement & procedures → intake & inventory → QC → harmonization → analysis → deposition → release
• Inputs: genotype/sequence data, summary statistics, extensive phenotype data (traits)
• Data storage & knowledgebase
• Methods: sample QC, phenotype QC, variant QC, population structure, annotation, confounders, single-variant association, gene-based association, gene-set association, result integration, result presentation
• Collaborator & research team interaction
Designed a robust & reproducible approach to analyze & represent authoritative data from the research community

Scaled to an integrated analysis system to ingest raw data
& summary statistics, then process & represent results
• Methods: sample QC, phenotype QC, variant QC, population structure, annotation, confounders, single-variant association, gene-based association, gene-set association, result integration, result presentation
• Pipeline APIs, distributed APIs & method implementations
• Association results
LoamStream

A means to share data across geographic boundaries
& domain expertise (federation)
• Knowledge Portal drawing on a Knowledge Base of US data and a Knowledge Base of UK data, each processed with LoamStream
• Diabetes Epigenome Atlas (DGA), UCSD: storing & processing annotations, ChIP-seq, Hi-C data, etc.

A means to store & represent relationships between data
types
Layering of ‘omics data to
enhance the interpretation of
association statistics
• Neo4J “Knowledge graph” with
links among biological entities,
integrating ‘omics data and
results of computational
methods (LD score regression,
METAL, eQTL methods, DEPICT,
etc.)
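METAL, listed above, combines per-study association results across cohorts. One scheme it supports is fixed-effects inverse-variance weighting. The sketch below shows that calculation for a single variant and assumes the effect estimates are already aligned to the same effect allele. It does not reproduce METAL's other features, such as sample-size weighting, genomic control, or heterogeneity statistics. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effects inverse-variance weighted meta-analysis of one variant.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    Assumes all betas refer to the same effect allele.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / se_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


if __name__ == "__main__":
    # Hypothetical per-cohort log odds ratios and standard errors
    print(inverse_variance_meta([0.12, 0.08, 0.10], [0.05, 0.04, 0.06]))
```

More precise studies receive larger weights. The pooled standard error shrinks as cohorts are added, which is why combining consortia increases power.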

The AMP-T2DKP Data & Software Platform
• Data ingest → data analysis & integration → data visualization; data sharing via federation (DGA, EBI)
• Methods: sample QC, phenotype QC, variant QC, population structure, annotation, confounders, single-variant association, gene-based association, gene-set association, result integration, result presentation
• Pipeline APIs, distributed APIs & method implementations
• LoamStream: semi-automated pipeline for QC and association analysis
• Results shared via REST services
• Enables secure interaction with data that may not be transferred from a federated site
• JavaScript libraries draw data from REST services
• Results and content largely configurable by text “metadata”
• All Portals deployed from the same codebase
• Storage: association results in MySQL; Neo4J “knowledge graph” with links among biological entities, integrating ‘omics data and results of computational methods (LD score regression, METAL, etc.)
• Data layers: individual-level data*, association statistics, ‘omics data to support interpretation
• Aggregator
*Individual-level data are not directly accessible to users.

The AMP-T2D
Knowledge
Portal
• Soft-launched in February
2015 to original contributors
• Open access October 2015
with 9 datasets & 25 traits
• Only need a Google-based
email account to log in
type2diabetesgenetics.org

Getting familiar with the resource

Signature Features: Curated & definitive genetics data

Largest collection of ‘definitive’ T2D genetics data in the
world, drawn from:
• T2D genetics community
• Public GWAS datasets of value
• AMP-T2D funded datasets
• Disease-agnostic resources
• International biobank data

Rich curation of related traits & measures
Filter by data
type & trait

Comprehensive
documentation &
cohort descriptions
• Access to direct download
• Citations
• Study design, cohort
information
• Traits preparation
• Extensive QC/analysis
reports

Distilled presentation of genetic data for non-experts

What is known about my disease/trait of interest?

Summarize human genetics
data by phenotype
• What are the genome-wide
significant hits?
• For my favorite traits, what
are the top gene-based
results?

Can I perform dynamic custom analysis?

• Run a custom aggregation
test based on a user-defined
set of variants & phenotypic
criteria.
• The platform accesses
protected individual-level
data to compute results,
exposing only the results to the
user
Interactive analysis
&
Gene-based results
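The custom aggregation test above can be sketched as a simple burden test. It collapses the user-selected variants into a carrier indicator per sample, compares carrier rates between cases and controls, and returns only summary counts and a p-value. This is a minimal sketch that assumes in-memory genotypes, a binary phenotype, and Fisher's exact test. The source does not specify the Portal's actual test or its covariate handling, and the names used here (`burden_test`, the variant and sample IDs) are hypothetical.

```python
import math
from typing import Dict, Set


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one
    p = sum(q for q in (table_prob(x) for x in range(lo, hi + 1))
            if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def burden_test(genotypes: Dict[str, Set[str]],
                is_case: Dict[str, bool],
                mask: Set[str]) -> Dict[str, float]:
    """Collapse the variants in `mask` per sample and test carrier rates.

    `genotypes` maps sample ID -> IDs of the variants that sample carries.
    Only aggregate counts and the p-value leave this function.
    """
    n_case = sum(1 for s in genotypes if is_case[s])
    n_ctrl = len(genotypes) - n_case
    carriers = [s for s, variants in genotypes.items() if variants & mask]
    case_carriers = sum(1 for s in carriers if is_case[s])
    ctrl_carriers = len(carriers) - case_carriers
    p = fisher_exact_two_sided(case_carriers, n_case - case_carriers,
                               ctrl_carriers, n_ctrl - ctrl_carriers)
    return {"case_carriers": case_carriers, "cases": n_case,
            "control_carriers": ctrl_carriers, "controls": n_ctrl,
            "p_value": p}


if __name__ == "__main__":
    # Hypothetical protected data: 10 cases (8 carry var1), 10 controls
    genotypes = {f"case{i}": ({"var1"} if i < 8 else set()) for i in range(10)}
    genotypes.update({f"ctrl{i}": set() for i in range(10)})
    is_case = {s: s.startswith("case") for s in genotypes}
    print(burden_test(genotypes, is_case, mask={"var1", "var2"}))
```

The design point mirrors the slide. Sample-level genotypes stay inside `burden_test`, and the caller receives only aggregate counts and a p-value.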

Current features for distilled presentation of genetic
data for non-experts

LuCAMP
We addressed the first opportunity & challenge; NOW a new
opportunity & challenge
Courtesy M. McCarthy
Motivated T2D Genetics
Research community
Identification of over
403 loci
Research needed in
many areas
Mahajan et al. Nat Gen. 2018

Shift in focus over the coming years: function
•What is the variant?
•What is the regulatory effect and in what tissue?
•What is the gene?
•What is the pathway?
•What is the mechanism?

What do we know about a given disease?

What is known today for T2D?
Predicted T2D effector transcripts (A. Mahajan)
• CAUSAL (n=41): ABCC8, ANGPTL4, ANKH, APOE, CDKN1B, GCK, GCKR, GIPR, GLIS3, GLP1R, HNF1A, HNF1B, HNF4A, IGF2, INS, IRS2, KCNJ11, LPL, MC4R, MNX1, MTNR1B, NEUROG3, NKX2-2, PAM, PATJ, PAX4, PDX1, PLCB3, PNPLA3, POC5, PPARG, QSER1, RREB1, SLC16A11, SLC30A8, SLC5A1, TBC1D4, TM6SF2, WFS1, WSCD2, ZNF771
• STRONG (n=17): ABCB9, BCAR1, C2CD4B, CAMK1D, CCND2, DGKB, INSR, IRS1, IRX3, IRX5, KLF14, KLHL42, LMNA, SLC2A2, STARD10, TCF7L2, ZMIZ1
• MODERATE (n=21): ADCY5, AGPAT2, AGTR2, AP3S2, BCL11A, CISD2, FAM63A, FOXA2, GPSM1, IGF2BP2, JAZF1, KCNK17, MACF1, MADD, NKX6-3, PDE8B, PLIN1, SGSM2, SPRY2, UBE2E2, VPS13C
• POSSIBLE (n=13): ANK1, ASCC2, CALCOCO2, FADS1, HMG20A, IL17REL, MRPS30, PRC1, PTRF, SCD5, SNAPC4, ST6GAL1, TP53INP1
• WEAK (n=6): ABO, CARD9, CDK2AP1, CTNNAL1, DNZL, ITGB6
• T2D_related (n=34): ADRA2A, AKT2, APPL1, BLK, BSCL2, CAV1, CEL, EIF2AK3, ERAP2, FOXP3, G6PC2, G6PD, GATA4, GATA6, GCG, GRB10, IER3IP1, IGF1, KLF11, NAT2, NEUROD1, PAX6, PCBD1, PCSK1, POLD1, PPP1R15B, PTF1A, RFX6, SIX2, SIX3, SLC19A2, TRMT10A, WARS, ZFP57

Complete list of each gene & its supporting
evidence

Documentation of the approach for each class of
evidence

Investigate the underlying data & rationale

New set of gaps to address with this opportunity
• What are the needed datasets/types (validation)?
• What information must be captured/retained & represented?
• What methods need to be run and how are they validated?
• How do you express relationships between these outputs?
• How do you represent results from experimental work in a
computational framework/open access resource?

Catalog & integrate these resources & visualize them
together, linking association, causal variant, tissue, gene,
pathway, function & disease
Aim 1:
Data
- Association
statistics
- Reference chromatin state
- Transcription factor binding sites
- eQTLs
- Chromatin capture
- Gene expression
- Gene function
- Networks
- Pathways
- Cellular
models
- Animal
models
Aim 2:
Methods
- Meta-analysis
- Fine mapping
- Variant effect
predictors
- Regulatory element prediction
- Tissue-specific enrichment
- Gene prioritization
- eQTL colocalization
- Chromosome
contact prediction
- Gene set
enrichment
Aim 3:
Access
SmartAPIs and web portals
M. Claussnitzer et al, 2020
J. Flannick

Example data resource:
ATAC-seq
Bysani et al, Sci Rep, 2019
•Chromatin accessibility
•Predict variants with
regulatory effects

Example data resource:
promoter capture Hi-C
Miguel-Escalada et al, Nat Genet, 2019
• Predict variant-to-gene links
• Used for non-coding annotations

Example method resource:
colocalized eQTLs
•Clues to function
•Predict variants that
affect both a trait and
expression of a gene
Hormozdiari et al, AJHG, 2016
High colocalization Low colocalization
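The colocalization idea can be illustrated with the colocalization posterior probability (CLPP) used by eCAVIAR (Hormozdiari et al., 2016). For each variant, CLPP is the probability that the variant is causal in both the GWAS and the eQTL study, computed as the product of its per-study posterior causal probabilities. This minimal sketch assumes those per-variant posteriors have already been computed by fine-mapping each study. It omits eCAVIAR's modeling of LD and multiple causal variants, and the variant IDs are hypothetical.

```python
from typing import Dict, Tuple


def clpp(gwas_pip: Dict[str, float],
         eqtl_pip: Dict[str, float]) -> Dict[str, float]:
    """Per-variant colocalization posterior probability.

    Each input maps variant ID -> posterior probability that the variant is
    causal in that study; the two studies are treated as independent.
    """
    shared = gwas_pip.keys() & eqtl_pip.keys()
    return {v: gwas_pip[v] * eqtl_pip[v] for v in shared}


def top_colocalized_variant(gwas_pip: Dict[str, float],
                            eqtl_pip: Dict[str, float]) -> Tuple[str, float]:
    """Variant with the highest CLPP at the locus."""
    scores = clpp(gwas_pip, eqtl_pip)
    if not scores:
        raise ValueError("no variants in common between the two studies")
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    gwas = {"var1": 0.90, "var2": 0.10}
    print(top_colocalized_variant(gwas, {"var1": 0.80, "var2": 0.20}))  # shared signal
    print(top_colocalized_variant(gwas, {"var1": 0.05, "var2": 0.95}))  # distinct signals
```

The two calls correspond to the slide's "high" and "low" colocalization cases. When the strongest GWAS and eQTL variants coincide, the top CLPP is large. When they differ, every product stays small.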

Knowledge Graph
• Enhance the interpretation of association statistics
• Neo4J system with links among biological entities, integrating ‘omics data
and results of computational methods (LD score regression, METAL, eQTL
methods, etc.)
Layering of ‘omics data & systematic application of
GWAS mining methods
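The knowledge graph idea can be illustrated with a small in-memory graph: typed links between biological entities, each annotated with the data or method that supports it. This is a toy sketch, not the Portal's Neo4J schema. The relation names (`EQTL_FOR`, `CONTACTS_PROMOTER_OF`, `ASSOCIATED_WITH`), entity IDs, and evidence labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # type of link, e.g. a variant-to-gene relation
    evidence: str  # dataset or method that produced the link


class KnowledgeGraph:
    """Directed graph of typed, evidence-annotated links between entities."""

    # Hypothetical relation types treated as variant-to-gene links
    VARIANT_TO_GENE = {"EQTL_FOR", "CONTACTS_PROMOTER_OF"}

    def __init__(self) -> None:
        self._out: Dict[str, List[Edge]] = defaultdict(list)

    def add(self, source: str, target: str, relation: str, evidence: str) -> None:
        self._out[source].append(Edge(source, target, relation, evidence))

    def neighbors(self, node: str, relations: Optional[Set[str]] = None) -> List[Edge]:
        edges = self._out.get(node, [])
        return [e for e in edges if relations is None or e.relation in relations]

    def genes_for_variant(self, variant: str) -> Dict[str, Set[str]]:
        """Genes linked to a variant, with the supporting evidence for each."""
        genes: Dict[str, Set[str]] = defaultdict(set)
        for e in self.neighbors(variant, self.VARIANT_TO_GENE):
            genes[e.target].add(f"{e.relation}: {e.evidence}")
        return dict(genes)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("var1", "T2D", "ASSOCIATED_WITH", "GWAS meta-analysis")
    kg.add("var1", "GENE_A", "EQTL_FOR", "islet eQTL")
    kg.add("var1", "GENE_A", "CONTACTS_PROMOTER_OF", "islet promoter capture Hi-C")
    print(kg.genes_for_variant("var1"))
```

Keeping the evidence on each edge lets a query return both a candidate gene and the lines of evidence behind it. The Portal's effector-gene pages present the same combination.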

What can you do in the T2DKP today?

ATAC-seq results in tissues of interest for a given
disease

Access these results from the DGA resource
diabetesepigenome.org


Results of
computational
methods for
genes in a locus
A means to prioritize genes from GWAS loci

COLOC and other eQTL colocalization methods for variant-based
predictions

Distilled presentation of genetic & genomic data for
non-experts

Enable programmatic access to the results
Figure: the AMP-T2DKP data & software platform (data ingest, analysis & integration, visualization, and data sharing via federation with DGA and EBI; methods, pipeline APIs and distributed APIs; MySQL association results; Aggregator), with its results exposed through an API

Signature Features: APIs for all portal results

Documentation & commands to access all portal results

What questions can you ask?
• What are the genome-wide associations for T2D & related
traits from the definitive T2D genetics community datasets?
• What is the credible set of variants to study for functional
follow-up at any GWAS locus for T2D?
• What is the most up-to-date, curated list of predicted
T2D effector genes, along with the supporting lines of
evidence?
• What are the results of computational approaches and
relevant genomic annotations to assist in prioritization of a
variant or gene from a GWAS locus?
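One common way to derive a credible set like the one referenced above works in three steps. First, convert each variant's z-score and standard error to a Wakefield approximate Bayes factor. Next, normalize these to posterior probabilities, assuming a single causal variant at the locus. Finally, take the smallest set of variants whose posteriors sum to the target coverage. This is a generic illustration, not necessarily the method behind the Portal's credible sets. The prior standard deviation of 0.2 on the log odds ratio and the variant IDs are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def wakefield_log_abf(z: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (association vs. null) for one variant."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    # ABF = sqrt(1 - r) * exp(r * z^2 / 2)
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def credible_set(variants: Sequence[Tuple[str, float, float]],
                 coverage: float = 0.95,
                 prior_sd: float = 0.2) -> Tuple[List[str], Dict[str, float]]:
    """variants: (variant ID, z-score, standard error) for every variant at a locus.

    Returns the credible set and each variant's posterior causal probability,
    assuming exactly one causal variant and equal prior odds across variants.
    """
    log_abf = {vid: wakefield_log_abf(z, se, prior_sd) for vid, z, se in variants}
    top = max(log_abf.values())  # subtract the max for numerical stability
    scaled = {vid: math.exp(l - top) for vid, l in log_abf.items()}
    total = sum(scaled.values())
    pip = {vid: s / total for vid, s in scaled.items()}

    # Add variants in decreasing posterior order until coverage is reached
    selected, cumulative = [], 0.0
    for vid, p in sorted(pip.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(vid)
        cumulative += p
        if cumulative >= coverage:
            break
    return selected, pip


if __name__ == "__main__":
    locus = [("varA", 9.5, 0.02), ("varB", 8.9, 0.02), ("varC", 2.1, 0.02)]
    print(credible_set(locus))
```

A sharp signal yields a small credible set and a good candidate for functional follow-up. Variants in strong LD with similar z-scores share the posterior mass and widen the set.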

Who is using the T2DKP?
• Average daily Users ~109
• Average session time 7.17 minutes
• Over 80 citations
• Over 17 synchronized data releases with
publication
• Over 20,000 Visitors since its
inception, 15,000 registered Users

Number of users per week
• 10/4-10/15: 1,011 users (official site launch)
• 1/24-30/16: 968 users (launch & workshop in Mexico)
• 6/5-11/16: 828 users (workshop at American Diabetes Association Scientific Sessions)
• 7/10-16/16: 631 users (publication of Nature paper on genetic architecture of T2D)
• 4/30-5/6/17: 459 users (release of multiple new datasets and new interactive Data page)
• 10/15-21/17: 380 users (American Society of Human Genetics conference invited session)
• 6/17-23/18: 378 users (talk and exhibit booth at American Diabetes Association Scientific Sessions)
• 10/14-20/18: 502 users (American Society of Human Genetics conference and Portal workshop)
• 3/24-30/19: 567 users (large release of new data and features; first webinar)
• 5/19-25/19: 902 users (publication of AMP T2D-GENES exome sequence paper and results in Portal)
• 7/14-20/19: 524 users (2nd webinar)
Our users are driven by data releases & public events

The ethos of the T2D Knowledge Portal is contagious
• Scientific
• Other complex disease communities with the same scientific goals: use of
human genetics for target validation and functional investigation
• Mission & collaborative alignment
• Data sharing was a barrier for many communities, but our work helped
make it possible for other communities
• Communities desire a means to share ‘definitive’ results
• Our community/data platform & modular software was readily adaptable
to other disease areas

Flexible platform allows us to respond to more
communities
Figure: the same data & software platform (data ingest, analysis & integration, visualization, federation with DGA and EBI, methods, pipeline and distributed APIs, MySQL association results) deployed as a "Disease X Knowledge Portal" with its own API

Open access portals for cardio-metabolic diseases
Cerebrovascular Disease Knowledge Portal
• Launched: August 2017
• Largest collection of definitive stroke genetics results in the world
• 4,739+ users
cerebrovascularportal.org
Cardiovascular Disease Knowledge Portal
• Launched: November 2017
• Largest collection of definitive CVD genetics results in the world
• 8,479+ users
broadcvdi.org
Sleep Disorder Knowledge Portal
• Launched: October 2018
• Largest collection of definitive genetics results for sleep traits in the world
• 1,234+ users
sleepdisordergenetics.org

Connected ecosystem of portals from our homepages

Coming 2020:
Resource for
Common Metabolic
Diseases
New Interface
integrating the current
cardio-metabolic sites
together & adding
more data (T2D,
Kidney disease, etc.)

Other portals in
new disease
areas
All portals are
powered by the same
data and software
platform.
All documentation,
API access, etc. are
available here:
kp4cd.org

DCC & Knowledge Portal Team
Knowledge Portal Team
Benjamin Alexander
Lizz Caulkins
Maria Costanzo
Marc Duby
Clint Gilbert
Quy Hoang
DK Jang
Alexandria Kluge
Ryan Koesterer
Jeffrey Massung
Oliver Ruebenacker
Preeti Singh
Marcin von Grotthuss
Leadership
Noël Burtt
Jason Flannick
Jose Florez

Jose Florez
Jason Flannick
Noël Burtt
Ben Alexander
Lizz Caulkins
Maria Costanzo
Marc Duby
Clint Gilbert
Quy Hoang
DK Jang
Alexandria Kluge
Ryan Koesterer
Jeffrey Massung
Oliver Ruebenacker
Preeti Singh
Marcin von Grotthuss
Josep Mercader
Miriam Udler
T2DKP and DCC
AMP Federated
Nodes
Method and Tool Development Teams
EBI Federated Node
Paul Flicek
Mark McCarthy
Gil McVean
Thomas Keane
Dylan Spalding
AMP Enhanced
Diabetes Portal
(EDP)
Michael Boehnke
Gonçalo Abecasis
Christopher Clark
Matthew Flickinger
Daniel Taliun
Ryan Welch
DGA
Kyle Gaulton
Parul Kudtarkar
Ying Sun
Samuel Morabito
Daniel MacArthur
Benjamin Neale
Jonathan Bloom
Konrad Karczewski
Cotton Seed
Matthew Solomonson
AMP Type 2 Diabetes
Knowledge (T2DK)
AMP T2D Knowledge Portal Development

DCC & Portal Staff
• Ben Alexander
• Lizz Caulkins
• Maria Costanzo
• Marc Duby
• Clint Gilbert
• Quy Hoang
• DK Jang
• Alexandria Kluge
• Ryan Koesterer
• Jeffrey Massung
• Oliver Ruebenacker
• Preeti Singh
• Marcin von Grotthuss
Analysis Contributors
• Josep Mercader
• Miriam Udler
The following consortia contributed
data to establish the AMP T2D Portal
• T2D-GENES
• GoT2D
• SIGMA
• DIAGRAM
A special thanks to all of the individuals whose participation in scientific studies makes
discovery possible
Foundation for the National Institutes of
Health
• David Wholley, FNIH
• Tania Kamphaus, FNIH
• Sidra Iqbal, FNIH
NIH and FNIH funded investigators
• Mark McCarthy
• Anna Gloyn
• Andrew Morris
• Maggie Ng
• Donald Bowden
• Bing Ren
• Kelly Frazer
• Maike Sander
• Ravindranath Duggirala
• John Blangero
• Karen Mohlke
• Stephen Parker
• James B Meigs
• Jerome Rotter
• Jose C Florez
• Michael Boehnke
• Noël Burtt
• Jason Flannick
• Kasper Lage
• Robert Sladek
• John Chambers
• Xueling Sim
• Kyle Gaulton
• Duc Dong
• Kim Seung
• Marcel den Hoed
• Kerrin Small
• Patrick MacDonald
• Melina Claussnitzer
• Suzanne Jacobs
• Jesse Engreitz
• Brent Richards
• Rany Salem
• Miriam Udler
• Ines Cebola
• Katalin Susztak
• Laura Scott
AMP Type 2 Diabetes Steering
Committee
Chairs:
• Philip Smith, NIDDK
• Melissa Thomas, Lilly
Members:
• Hartmut Ruetten, Sanofi
• Dermot Reilly, Janssen
• Julia Brosnan, Pfizer
• Melissa Miller, Pfizer
• Eric Fauman, Pfizer
• Caroline Fox, Merck
• Audrey Chu, Merck
• Beena Akolkar, NIDDK
AMP-T2D Partnership
