Ensembl, ELIXIR and engineering interconnections
e-ROSA stakeholders’ meeting July 6th-7th 2017
Paul Kersey
What is EMBL-EBI?
• Europe’s home for biological data services, research
and training
• A trusted data provider for the life sciences
• Part of the European Molecular Biology Laboratory,
an intergovernmental research organisation
• International: 600 members of staff from 60 nations
• Home of the ELIXIR Technical hub.
• Providers of open (FAIR) data since 1982!
Data resources at EMBL-EBI
Literature & ontologies
• Experimental Factor Ontology
• Gene Ontology
• BioStudies
• Europe PMC
Chemical biology
• ChEBI
• ChEMBL
• SureChEMBL
Molecular structures
• Protein Data Bank in Europe
• Electron Microscopy Data Bank
Gene, protein & metabolite expression
• Expression Atlas
• MetaboLights
• PRIDE
• RNAcentral
Protein sequences, families & motifs
• InterPro
• Pfam
• UniProt
Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
• Metagenomics portal
Systems
• BioModels
• BioSamples
• Enzyme Portal
• IntAct
• Reactome
Molecular Archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• ArrayExpress
Big data, big demand
• ~27 million requests to EMBL-EBI websites every day
• 120 petabytes of storage capacity in our data centres
• EMBL-EBI delivered 152 million jobs to its users in 2016
• Scientists at over 3.2 million unique IP addresses use EMBL-EBI websites
pkersey@ebi.ac.uk e-ROSA stakeholders' meeting, Montpellier, 6th-7th July 2017 (29.08.2017)
Ensembl
• A modular suite of software for genome analysis and visualisation
• Originally used just for vertebrate genomes; since 2009 also used for
genomes from across the taxonomic space
• This extension to non-vertebrate genomes is “Ensembl Genomes”
(Figure labels: vertebrates, metazoa, plants, protists, fungi, bacteria)
Ensembl
• Integrates access to a wide range of genome-scale
data, including:
• Sequence
• Structural and Functional Annotation
• Comparative and population data
• A powerful toolset for the storage and analysis of genetic
variation
Species of Agricultural Interest in Ensembl
• Farm animals
• Crop and forest plants
• Pathogens and other symbionts
• Pollinators
• Pests
• Vectors
ELIXIR
ELIXIR unites Europe’s leading life science organisations in managing and
safeguarding the data generated by publicly funded research.
It coordinates, integrates and sustains data resources across its member
states, enabling scientists to access services that are vital for their research.
ELIXIR: The Challenges
• ELIXIR is the largest project to emerge from the ESFRI
process, with over 20 countries having signed up to develop
a data infrastructure for the life sciences
• Across dozens of biology domains, institutions, funding
agencies and nations, we need to:
• Agree data access and interoperability policies
• Make a strong commitment to FAIR data, and more
• Secure sustainable, justified funding for long term resources
• Agree divisions of labour to maximise total impact
The View From the Molecular Sciences
• Be better than FAIR. Open data – available without
restriction – has been a key driver of the molecular biology
revolution. Some data is properly private, but universal data
access is the first choice (especially for publicly funded
data).
• Don’t settle for sweet F.A.: without data structure,
Interoperability and Re-usability are very limited.
• Choice between lightweight RESTful and RDF-based
approaches.
• Development of standards for representation of both data
and metadata is critically important.
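To make the "lightweight RESTful" option above concrete, here is a minimal Python sketch of how a client might retrieve structured data about a gene from the public Ensembl REST service. It is an illustration only and was not part of the talk: the helper names (`lookup_url`, `summarise`, `fetch`) are hypothetical, and the record fields used in `summarise` are assumed to be present in the JSON response, with missing ones shown as "?".

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base address of the public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id: str) -> str:
    """Build the URL for the `lookup/id` endpoint, asking for JSON output."""
    query = urlencode({"content-type": "application/json"}, safe="/")
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?{query}"


def summarise(record: dict) -> str:
    """Reduce a lookup record to one line (hypothetical helper).

    The field names are assumptions about the response; any missing
    field is rendered as '?' rather than raising an error.
    """
    name = record.get("display_name", "?")
    species = record.get("species", "?")
    region = record.get("seq_region_name", "?")
    start = record.get("start", "?")
    end = record.get("end", "?")
    return f"{name} ({species}) {region}:{start}-{end}"


def fetch(stable_id: str) -> dict:
    """Perform the HTTP request and decode the JSON body (needs network access)."""
    with urlopen(lookup_url(stable_id), timeout=30) as response:
        return json.load(response)
```

With network access, `summarise(fetch("ENSG00000157764"))` would print a one-line description of that gene; the identifier is just an example. The point of the sketch is that a structured, documented response is what makes the data re-usable by a script rather than only findable by a person.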
Relationship of ELIXIR and other
communities working with biological data
• ELIXIR has a governance structure and extensive buy-in
• ELIXIR has a broad perspective
• Could specific communities enter the ELIXIR framework as
new modules, rather than establishing independent
bureaucracy?
