Variation and Assembly Resources at EMBL-EBI

Laura Clarke
Variant Discovery and Genome Assembly
Wednesday November 1st
Genome Informatics

EVA
PDX Finder
Oxford Nanopore
MARC ReadUntil
BlobToolKit
GWAS Catalog

Looking back
• 1982 EMBL and GenBank established
• 1982 Data sharing and standardization collaboration put in place
• 1983 first full phage genome published
• First public in November 1982
• Enterobacteria phage T7
https://www.ebi.ac.uk/ena/data/view/V01146
Credit: Ana Toribio

Assembly archiving today
• UI and API submission interfaces
• Reads and Assemblies accepted
• Chlamydia trachomatis A2497 serovar A
Comprehensive global genome dynamics of Chlamydia trachomatis show ancient diversification followed by contemporary mixing and recent lineage expansion.
563 full genomes (455 novel)
J Hadfield et al. Genome Res. 2017 Jul;27(7):1220-1229. doi: 10.1101/gr.212647.116.
https://www.ebi.ac.uk/ena/data/view/FM872306
Credit: Ana Toribio

Accessing managed human data
EGA By the numbers
● 1,698 studies
● 3,591 datasets
● 777 data providers
● >10,000 requestors
● EMBL-EBI and CRG
By volume
● 4.7 Petabytes
https://ega-archive.org/
Credit: Thomas Keane

EGA ~2015

Looking Forward

htsget
What is it?
• An efficient, non-file-based API for accessing read data
• Separates the backend storage implementation from the interface
• A bridge from existing file formats to an API client/server model
Progress
• Launch of v1.0 at GA4GH plenary October 2017!
• Demonstrations of integration with AAI+secure transfer
http://samtools.github.io/hts-specs/
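
The client side of this model can be sketched in a few lines of Python. This is a minimal illustration, assuming the ticket layout described in the htsget specification (a JSON object with an `htsget.urls` list whose entries are either inline `data:` URIs or URLs to fetch). The base URL, the read ID and the `fetch` callback are hypothetical placeholders, not part of any EMBL-EBI service.

```python
import base64
from urllib.parse import urlencode


def build_htsget_url(base_url, read_id, reference_name=None, start=None, end=None, fmt="BAM"):
    """Build a reads request URL for one genomic region."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{base_url.rstrip('/')}/reads/{read_id}?{urlencode(params)}"


def assemble_blocks(ticket, fetch):
    """Concatenate the data blocks listed in a JSON ticket, in order.

    `fetch(url, headers)` is a caller-supplied function returning bytes,
    so the storage backend stays hidden behind the interface.
    """
    chunks = []
    for entry in ticket["htsget"]["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # Inline block; this sketch assumes a base64-encoded payload.
            _, payload = url.split(",", 1)
            chunks.append(base64.b64decode(payload))
        else:
            chunks.append(fetch(url, entry.get("headers", {})))
    return b"".join(chunks)
```

The client only ever sees URLs and byte blocks, which is what lets a server change its storage layout without breaking existing tools.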

Beacon project
• Allele-based genotype queries
• Each beacon determines its own access policy
• The data returned depends on the access tier
• Allele frequency
• Data set
• Population
• Sample
• Phenotype
• Anything else?
• EGA Beacon has 3 tiers of access
• Public
• Registered
• Controlled
https://beacon-network.org/#/
Credit: Thomas Keane
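
Tiered access can be illustrated with a small Python sketch. The tier names follow the three EGA Beacon tiers listed above; which fields each tier may see is a hypothetical policy chosen for illustration, since each beacon sets its own. The in-memory `variants` dictionary stands in for a real variant store.

```python
# Hypothetical policy: which response fields each access tier may see.
TIER_FIELDS = {
    "public": {"exists"},
    "registered": {"exists", "allele_frequency", "dataset"},
    "controlled": {
        "exists", "allele_frequency", "dataset",
        "population", "sample", "phenotype",
    },
}


def answer_query(variants, chrom, pos, ref, alt, tier):
    """Answer an allele query, returning only the fields allowed for `tier`."""
    allowed = TIER_FIELDS[tier]
    record = variants.get((chrom, pos, ref, alt))
    # Every tier learns whether the allele has been observed.
    response = {"exists": record is not None}
    if record is not None:
        for field in sorted(allowed - {"exists"}):
            if field in record:
                response[field] = record[field]
    return response
```

With this layout, a public query returns only a yes/no answer, while a controlled-tier query for the same allele can also return sample and phenotype details.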

European Variation Archive
• European Variation Archive
• Established in 2014
• Accepts VCF submissions (no archive specific format)
• Can link to ENA read submissions
• Taking over non-human RS assignment from dbSNP
Credit: Cristina Yenyxe Gonzalez
https://www.ebi.ac.uk/eva/

Non-human RS number assignment and releases
• EVA to assign rs (locus) and ss (submission) numbers for non-human variants
• Existing accessions will remain in use
• Continuous rolling release of variants as submitted
• Bi-annual merging of submitted variants into loci
• Always connected to existing rs numbers on search
• Per-species VCFs released
• API and streaming access available
• EVA continues to broker human variants to dbSNP
Credit: Cristina Yenyxe Gonzalez
https://www.ebi.ac.uk/eva/
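
The merging of submitted variants (ss) into loci (rs) can be sketched as a grouping step. This is a simplified illustration, not EVA's actual clustering logic: it assumes that a locus is identified by assembly, contig, start position and variant type, and the record fields and the ID counter are hypothetical. Existing rs numbers are reused, in line with existing accessions remaining in use.

```python
def cluster_submitted_variants(submitted, existing_rs=None, next_rs=1):
    """Map each ss ID to an rs ID, reusing rs IDs for known loci.

    `submitted` is a list of dicts with keys ss, assembly, contig, start, type.
    `existing_rs` maps a locus tuple to an already assigned rs ID.
    """
    known = dict(existing_rs or {})
    assignment = {}
    for record in sorted(submitted, key=lambda r: r["ss"]):
        # Assumed locus definition for this sketch.
        locus = (record["assembly"], record["contig"], record["start"], record["type"])
        if locus not in known:
            known[locus] = f"rs{next_rs}"
            next_rs += 1
        assignment[record["ss"]] = known[locus]
    return assignment
```

Two submissions at the same locus end up with the same rs number, while each keeps its own ss number.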

VCF specification and validation
• Maintained by GA4GH file formats group
• EVA validates against official specification
• www.github.com/ebivariation/vcf-validator
• Proposal in place to improve SV structure in VCFs
• Maintainers of variation archives
• Structural variation caller methods developers
• Pull request on https://github.com/samtools/hts-specs/pull/231
• Please give feedback
Credit: Cristina Yenyxe Gonzalez
https://www.github.com/samtools/hts-specs
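
To give a flavour of what validating against the specification involves, here is a minimal Python sketch of a few structural checks. It is not the vcf-validator and covers only a small part of the specification: the `##fileformat` first line, the eight fixed header columns, and an integer POS field. The function name and error messages are made up for illustration.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_lines(lines):
    """Return a list of error messages for a few basic structural checks."""
    errors = []
    lines = [line.rstrip("\n") for line in lines]
    if not lines or not lines[0].startswith("##fileformat=VCFv"):
        errors.append("first line must be ##fileformat=VCFv<version>")
    header_seen = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("##"):
            continue  # meta-information lines are not inspected here
        fields = line.split("\t")
        if line.startswith("#"):
            header_seen = True
            if fields[:8] != REQUIRED_COLUMNS:
                errors.append(f"line {number}: header must start with the 8 fixed columns")
            continue
        if not header_seen:
            errors.append(f"line {number}: data line before #CHROM header")
        if len(fields) < 8:
            errors.append(f"line {number}: expected at least 8 tab-separated fields")
        elif not fields[1].isdigit():
            errors.append(f"line {number}: POS must be a non-negative integer")
    if not header_seen:
        errors.append("missing #CHROM header line")
    return errors
```

An empty result means only that these few checks passed; a full validator also checks meta-information definitions, INFO and FORMAT fields, and genotype columns.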

[Figure: variant consequence types along a gene: regulatory, 5’ UTR, coding (missense, synonymous), splice site, intronic, 3’ UTR, downstream]
http://www.ensembl.org/info/genome/variation/sources_documentation.html
Credit: Emily Perry

Phenotype and disease data can be searched by ontology term to retrieve aggregated results.
Improved allele frequency views with more data available
Credit: Sarah Hunt

The GWAS Catalog
• Public catalog of Genome Wide Association Studies
• Curated from the literature
• Now with summary statistics
• > 3000 publications
• > 44,000 variant-trait associations
https://www.ebi.ac.uk/gwas/downloads/summary-statistics
Credit: Fiona Cunningham

Turning pathogen data collection into actionable information
• Risk-assessment models and risk-based sampling
• From samples and metadata to comparable data
• From comparable data to actionable information
• Pathogen identification and characterization
• Outbreak detection
• Outbreak investigation
• Outbreak prediction
• Building a common data platform and analysis framework
• Risk communication

PDX Finder
[Figure: timeline with the years 2010, 2011 and 2016]
Credit: Terry Meehan

PDX Finder
Build a comprehensive global catalogue of PDX models and their data available for researchers
www.pdxfinder.org
JAX and EMBL-EBI co-developed resource
Carol Bult – Helen Parkinson/Terry Meehan
NCI funding
EC EuroPDX
Credit: Terry Meehan

Questions
We are almost always hiring
https://www.ebi.ac.uk/about/jobs
