1. MIT Licensed
Customized by Mountain Genomics in collaboration
with the bioinformatics community.
Version 1.0
Addressing Privacy Concerns
in the Age of Federated Data
Access
2. MIT Licensed
Customized for Mountain Genomics in collaboration
with the bioinformatics community.
Version 1.0
Sections
Part 1: Federated Computing and Identity
Part 2: Data Indexing
Part 3: Data Structures
Part 4: Extracting Knowledge
3. Disclosures
The idea and material presented here is not the view of NCBI, NLM, NIH or any other federal agency. Some of
the underlying work was supported by the Intramural Research Program of the NLM and various other federal
agencies.
Ben Busby is also an advisor to:
● Johns Hopkins
● Ariel Precision Medicine
● Deloitte
through Mountain Genomics, a company headquartered in Pittsburgh, PA.
Ben Busby will work primarily for DNANexus starting February 3rd, 2020; with limited other engagements
6. Make it worth it for
investigators and
stakeholders to identify
themselves
7. Making Compliance and Authentication Worthwhile:
Four Illustrative Topic Areas
1
Indexing Data for Federated Discovery on
any Platform, Anywhere in the World
Flexible presentation (API) of viral protein
domains, host-pathogen interactions and
eventually graph loops as proof-of-principle
data federation.
2
Setting up Federated Compute for Secure
and Reproducible Science
Unified control and accounting of compute
from a single node
3
Genome Graphs
Generation of simple, usable systems to not
compare an individual patient or organism to
another -- or small group of -- individual, but
to an entire community, dramatically
compress data, and immediately find “toxic
paths”.
4
Extracting Knowledge from Data
Annotation of haplotypes, instead of graphs,
to allowing investigators to query complex
disease
20. Making it worth it for
investigators and
stakeholders to identify
themselves
21.
22.
23.
24. Indexing Data for Federated Discovery on
any Platform, Anywhere in the World!
25. Indexing Data for Federated Discovery on
any Platform, Anywhere in the World!
26. Indexing Data for Federated Discovery on
any Platform, anywhere in the World!
Christiam Camacho, Sej
Modha, Alex Efremov,
Joan Marti-Carreras
27. Indexing Data for Federated Discovery on
any Platform, anywhere in the World!
Christiam Camacho, Sej
Modha, Alex Efremov,
Joan Marti-Carreras
28. A Brief Illustration of Utility for
Human Data, using
Graph Genomes
as an Example
29. Genome Graphs
Initially leveraging standard workflows for annotation of primary datasets
in cloud infrastructure mapped to linear genomes, we will likely transition to
genome graphs over the next ten years. This will allow us to compress
reads, do phasing automatically, do reference-guided assembly on complex
genomes, and detect “toxic paths” automatically. Most importantly, this will
allow us to see complex genotype - phenotype interactions easily.
40. A Brief Proposal for Graph-Based Sequence
Compression and Phasing (personal opinion)
++
41. A Brief Proposal for Graph-Based Sequence
Compression and Phasing (personal opinion)
=
(no longer available on
CD-ROM)
In this case, the sequence
data is not PII without
linked metadata
50. Thanks to a lot of fancy people for their work on this!
Rob Edwards
Ahmed Moustafa
Surya Saha
Rob Davey
Jason Williams
Fernanda Forterre
Juerven Bolleman
Nuala O’leary
Nick Cooley
Joyce Lee
Yuriy Skripchenko
Felix Shaw
Adrien Cantu
Sonia Gavasso
Michael Crusoe
Lisa Federer
Bhavya Papudeshi
Jordan Izenga
Alexis Norris
Others cited in the talk!
So Many Hackathoners!
We’ll continue working on it at the
upcoming Baylor SV/Pangenomics
Hackathon and other events!
https://www.hgsc.bcm.edu/events/hacka
thon
https://biohackathons.github.io