BRCA Exchange: Sharing Global Knowledge about BRCA1/2
1. Sharing Global Knowledge about BRCA1/2
Current State, Challenges & Opportunities
Gunnar Rätsch
ETH Zürich
(MSKCC New York)
Presentation at the Human Genome Meeting 2018, Yokohama, Japan
@gxr #GA4GH #pi
#BRCAExchange
2. Motivation for the BRCA Challenge
BRCA variation is relatively common, with well-known medical implications
[Figure labels: Familial BC, No Familial BC, General Population]
[Figure: BRCA variants per database. ClinVar: 7961 variants; LOVD: 3276 variants; UMD: 3675 variants. Other counts shown in the figure: 1041, 2107, 1191, 1778.]
Problem 1: Many variants lack clear interpretation
Problem 2: Variation databases are disjoint
Problem 3: Too little available data for effective curation
[Figure label: VUS]
Shown: the BRCA variants in ClinVar as of 6/22/17
5. [Diagram: groups contributing to the BRCA Exchange]
BRCA Exchange Core
BRCA Interpretation Group
ENIGMA Consortium
BRCA Challenge members
GA4GH Technical Work Streams
Regulatory & Ethics, Security
Bioinformaticians
Software Engineers
Clinicians
6. Types of Data
Variant-level (data annotated to a variant)
● Genotype, classification, allele frequencies, etc.; the majority of our data.
● Well-structured, most problems from inaccurate/ambiguous variant specs.
● Easy to share.
Case-level (data annotated to a case/patient)
● Currently a small percentage of our data. Hoping to grow!
● Detailed data of cancer history, molecular features, family history, pedigrees.
● Heterogeneous clinical data.
● Privacy concerns; may need controlled-access or other privacy-enhancing mechanisms.
7. BRCA Exchange: Variant Exchange Platform
Highlights:
● Federated network for variant data exchange: ClinVar, LOVD, BIC, ExAC, ...
● Uniform variant processing and identification.
● Open source, cloud based, fully automatic (https://github.com/BRCAChallenge/brca-exchange).
● Public access, monthly releases and versioning support.
● Programmatic access via GA4GH interfaces.
● Largest public (federated) repository of BRCA1/2 variants.
8. What does the BRCA Exchange pipeline do?
1. Merges data on the same variant from different sources
2. Merges data on equivalent variants
○ (e.g., chr13:g.32362600:T>TA and chr13:g.32362601:A>AA)
3. Checks the data quality
○ Verifies the reference allele against the genome
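The three pipeline steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BRCA Exchange implementation: the reference string is a toy sequence (not real chr13), the source names and record fields are hypothetical, and the normalization is the generic "trim shared bases, re-anchor to the left" procedure commonly used for indels.

```python
from collections import defaultdict

# Toy reference with 1-based positions; NOT real chr13 sequence (assumption).
TOY_REFERENCE = "GCTAGC"


def reference_matches(reference, pos, ref):
    """Step 3: check that the stated reference allele agrees with the genome."""
    return reference[pos - 1:pos - 1 + len(ref)] == ref


def normalize(reference, pos, ref, alt):
    """Step 2: reduce a variant to a left-aligned, minimal representation."""
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    if not reference_matches(reference, pos, ref):
        raise ValueError(f"reference allele mismatch at position {pos}")
    changed = True
    while changed:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Re-anchor an empty allele on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend past the sequence start")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def merge_records(reference, records):
    """Steps 1 and 2: group per-source records under one normalized key."""
    merged = defaultdict(dict)
    for source, pos, ref, alt, fields in records:
        merged[normalize(reference, pos, ref, alt)][source] = fields
    return dict(merged)


# Two hypothetical sources describe the same insertion in different ways.
records = [
    ("SourceA", 3, "T", "TA", {"classification": "pathogenic"}),
    ("SourceB", 4, "A", "AA", {"allele_frequency": 0.0001}),
]
merged = merge_records(TOY_REFERENCE, records)
print(merged)
```

Both records collapse onto a single key because inserting an A after the T or after the following A yields the same sequence, which mirrors the T>TA / A>AA example on the slide.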
9. Each repository contributes distinct information on BRCA1/2 variation
Combined, BRCA Exchange has 19,432 individual deduplicated variants (3/2018, monthly release)
Largest BRCA1/2 public variant database worldwide.
(As of 10/2016)
10. Monthly Release Cycle & Data Versioning
Motivation
● Curation efforts need to know exactly which data was used to make a pathogenicity classification.
● Regulatory bodies, such as the FDA, will require versioning.
What is in Versioning?
● Summary of new data/software releases.
● Ability to see which variants are new, which ones have a new classification, etc.
● Current version 16.
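To illustrate what a per-release change summary involves, here is a minimal sketch that compares two releases. It assumes each release can be reduced to a mapping from a variant identifier to its classification; the identifiers and classifications below are invented for illustration and are not BRCA Exchange data.

```python
def diff_releases(old, new):
    """Compare two releases given as {variant_id: classification} mappings."""
    shared = set(old) & set(new)
    return {
        # Variants that appear only in the newer release.
        "added": sorted(set(new) - set(old)),
        # Variants that were dropped from the newer release.
        "removed": sorted(set(old) - set(new)),
        # Variants present in both releases whose classification changed.
        "reclassified": {
            v: (old[v], new[v]) for v in sorted(shared) if old[v] != new[v]
        },
    }


# Hypothetical release contents (assumption, for illustration only).
previous_release = {
    "variant_A": "uncertain significance",
    "variant_B": "benign",
    "variant_C": "pathogenic",
}
current_release = {
    "variant_A": "likely benign",
    "variant_B": "benign",
    "variant_D": "uncertain significance",
}

changes = diff_releases(previous_release, current_release)
print(changes)
```

Storing such a summary with every release lets a curator state exactly which release a classification was based on and what changed since.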
11. How do I access the data?
BRCA Exchange website
● Browse to your variant of interest
● Download all the variants and evidence from a particular version
Programmatic access via GA4GH APIs
● An example of how to access these data is available as a Python notebook
● Can be integrated into other workflows
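As a rough idea of what programmatic access could look like, the sketch below builds a GA4GH-style variant search request and follows result pages. Everything specific here is an assumption: the base URL is a placeholder, the variant set identifier is made up, and the field names (variantSetId, referenceName, start, end, pageSize, pageToken, nextPageToken) follow the GA4GH variant search schema as I understand it. Consult the project's Python notebook for the actual endpoint and parameters. The HTTP call is injected as a function so the example runs offline against a fake server.

```python
BASE_URL = "https://example.org/ga4gh"  # placeholder, not the real endpoint


def build_search_request(variant_set_id, reference_name, start, end,
                         page_size=100, page_token=None):
    """Assemble the JSON body for a variants/search call (assumed schema)."""
    return {
        "variantSetId": variant_set_id,
        "referenceName": reference_name,
        "start": start,
        "end": end,
        "pageSize": page_size,
        "pageToken": page_token,
    }


def search_variants(post, variant_set_id, reference_name, start, end):
    """Yield variants across pages; post(url, body) must return a dict."""
    token = None
    while True:
        body = build_search_request(variant_set_id, reference_name,
                                    start, end, page_token=token)
        response = post(f"{BASE_URL}/variants/search", body)
        yield from response.get("variants", [])
        token = response.get("nextPageToken")
        if not token:
            break


def fake_post(url, body):
    """Stand-in for an HTTP POST; serves two canned pages of results."""
    if body["pageToken"] is None:
        return {"variants": [{"id": "v1"}, {"id": "v2"}], "nextPageToken": "p2"}
    return {"variants": [{"id": "v3"}], "nextPageToken": None}


found = list(search_variants(fake_post, "hypothetical-set", "chr13",
                             32362000, 32363000))
print([v["id"] for v in found])
```

In real use, `fake_post` would be replaced by a function that sends the body as JSON over HTTPS and returns the decoded response.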
14. Aim 1: Enable Finding Variant Classifications
● One place for all known BRCA1/2 variants
● Highlight expert-panel reviewed variant interpretations for clinical use
● Simple-to-use user interface
21. Aim 2: Research Data & Curation Environment
● Information necessary for variant classification (allele frequencies, priors, …)
● Data from many different, possibly disagreeing sources
● Curation tools & partially automatic variant classification
● All public data!
22. Automate Parts of Variant Curation, Streamline Rest
● Automatically collect relevant variant info
● Precompute allele frequencies, priors, splice sites, etc.
● Workflow support for pushing variants through the decision process
○ Perform preliminary classifications
○ Find relevant literature
○ Approve classifications
26. Aim 3: Case Level Data Exchange
● Provide infrastructure to collect & store case-level data
● Genotypes, clinical data, family history, etc.
● Analysis tools: based on family history; multi-factorial
● Controlled-access mechanisms for protected data
29. Case-level Data: Who, What, Why, How?
● Identify individuals who carry a VUS in order to collect more information that facilitates variant classification
○ Connect to genetic counselors & physicians
○ Connect to genetic testing laboratories & health care systems
○ Connect to national sequencing efforts and cancer registries
● Which data?
○ Existence of a patient in an institution: GA4GH Beacon
■ To follow up manually with genetic counselor to get relevant information
○ Obtain family history data
■ Allows posterior calculations for VUS when enough families provide family history info
○ Detailed clinical & genetic information
■ Tumor pathology, demographics, co-segregation, etc. help multifactorial likelihood model
● Technical & Legal Challenges
○ Develop secure case-level storage
○ Establish analysis workflow to distill summary data used in curation
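The posterior calculations and multifactorial likelihood model mentioned above can be sketched as a Bayesian odds update: a prior probability of pathogenicity is converted to odds, multiplied by one likelihood ratio per independent line of evidence (family history, co-segregation, tumor pathology, and so on), and converted back to a probability. The prior and likelihood ratios below are invented numbers. The mapping to five classes uses the IARC thresholds (Plon et al., 2008); the slides do not say which thresholds are applied, so treat that part as an assumption.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with independent likelihood ratios via Bayes' rule."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Posterior odds = prior odds times the product of likelihood ratios.
    odds = prior / (1 - prior) * prod(likelihood_ratios)
    return odds / (1 + odds)


def iarc_class(probability):
    """Map a posterior probability to an IARC class (assumed thresholds)."""
    if probability > 0.99:
        return 5  # pathogenic
    if probability >= 0.95:
        return 4  # likely pathogenic
    if probability >= 0.05:
        return 3  # uncertain
    if probability >= 0.001:
        return 2  # likely not pathogenic
    return 1      # not pathogenic


# Hypothetical evidence for one VUS: a prior plus three likelihood ratios.
p = posterior_probability(0.1, [10, 5, 4])
print(round(p, 4), iarc_class(p))
```

The product form assumes the evidence types are independent given the variant's true status, which is why each additional family history contribution can move a VUS toward a definite class.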
30. Family History Data Sharing Pilot Project
Lab 1
Lab 2
…
Lab 3
BRCA Exchange
Family history data
• Train on cases associated with benign and pathogenic variants
• Compute pathogenicity probability for VUSs
Pathogenic Probabilities
Publicly available, forwarded to others
• Curators
• ENIGMA
• ClinGen, etc.
Variant-level summaries
31. Centralized Model: Central Analysis and Exchange of Datasets
BRCA Case level Data
...
• Institutional data is assumed protected/identifiable
• BRCA Exchange centrally stores data in a secure environment
• Curation is based on case-level data and other data within the secure environment
• Variant-level information is published
Institution X_1
Institution X_n
Curation
BRCA Exchange
32. Decentralized Model: Local Analysis & Exchange of Minimal Datasets
BRCA Case level Data
...
• Institutional data is assumed protected-access/identifiable
• BRCA Exchange container de-identifies data and transmits “condensed, public” data to BRCA Exchange
• Curation is based on condensed data and published on BRCA Exchange
Institution X_1
Institution X_n
Curation
BRCA Exchange
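One way to picture the "condensed, public" data in the decentralized model is a local aggregation step that runs inside each institution and only emits per-variant counts. The sketch below is an illustration under assumptions, not the container's actual method: the record fields are hypothetical, and the minimum-count threshold of 5 is an arbitrary choice. Aggregation with small-count suppression alone is not a complete de-identification guarantee.

```python
from collections import Counter

MIN_COUNT = 5  # assumed suppression threshold, not taken from the slides


def condense(case_records, min_count=MIN_COUNT):
    """Reduce identifiable case records to per-variant summary counts."""
    counts = Counter()
    for record in case_records:
        # Only the variant and one coarse feature leave the institution;
        # patient identifiers are never copied into the summary.
        counts[(record["variant"], record["family_history"])] += 1
    # Suppress small cells that could point to individual patients.
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical local case-level data (assumption, for illustration only).
local_cases = [
    {"patient_id": f"P{i}", "variant": "variant_A", "family_history": True}
    for i in range(6)
] + [
    {"patient_id": f"Q{i}", "variant": "variant_B", "family_history": False}
    for i in range(2)
]

summary = condense(local_cases)
print(summary)
```

Only `summary` would be transmitted; the two `variant_B` cases fall below the threshold and are withheld.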
33. We need your help!
● Submit your variant-level data to public repositories (ClinVar, LOVD, etc.) or directly to us:
○ Classifications, literature, assays, variant allele frequencies, splicing predictions, etc.
○ Connect clinical testing labs to GA4GH Beacons!
● Talk to us about participating in the Family History Sharing project
○ Share family histories of tested individuals
● Help us connect to national initiatives & cancer registries
○ Implement ways to collect case-level or summary data
Contact: raetsch@ethz.ch or Rachel.Liao@ga4gh.org
34. Summary & Outlook
Aims of BRCA Exchange:
● Share variant information with clinicians/physicians
● Provide a platform to facilitate research & variant curation
● Collect data from case-level data repositories to help curation of VUS
We need your help!
● Help connect us to large case-level repositories, national initiatives, and consortia.
● Come talk to me after the session, during lunch or dinner. raetsch@ethz.ch
Technical, legal, organizational challenges are similar for other diseases:
● Relatively easy to replicate BRCA Exchange for other diseases/genes
○ MMR/InSIGHT variant database
○ Other hereditary cancers
Please contact me or Rachel Liao to talk!
35. Acknowledgements
BRCA Challenge Steering Committee
BRCA Challenge Evidence Gathering Group
BRCA Challenge Interpretation Group
Open Source: https://github.com/BRCAChallenge/brca-exchange/
Contact: raetsch@ethz.ch or Rachel.Liao@ga4gh.org
Charlie Markello
Benedict Paten
Mary Goldman
Gunnar Rätsch
Melissa Cline
Zack Fischmann
Faisal Alquaddoomi
Rachel Liao
Stephen Chanock
Sir John Burn