ELIXIR: European infrastructure for
biological information
Data infrastructure for Europe’s life-science research
www.elixir-europe.org @ELIXIREurope
Platforms: Data, Interoperability, Tools, Compute, Training
Use cases: Marine metagenomics, Human data, Crop and forest plants, Rare diseases
A network of data Nodes
• ELIXIR Nodes are funded nationally
• ELIXIR Nodes build on national strengths and priorities
• ELIXIR Nodes provide a national framework for long-term resource management
de.NBI - The German Network for Bioinformatics Infrastructure
de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
• designated national German
node in ELIXIR
www.denbi.de
Common Services, Common Standards
• Data deposition: ENA, EGA, PDBe, EuropePMC, …
• Bioinformatics tools: bio.tools, containers, Galaxy
• Data interoperability: standards, identifiers, ontologies
• Compute: secure data transfer, cloud computing, AAI
• Community partnerships: human access-controlled data, plants, marine, rare diseases, proteomics, metabolomics, industry outreach
• Training: TeSS, Data Carpentry, eLearning
• Data management: genome annotation, data management plans
• Added-value data: UniProt, Ensembl, Orphanet, …
Open data requires infrastructure
What are ELIXIR Core Data Resources?
• Fundamental importance
• Complete collections of generic value
• High levels of usage, scientific quality and service
ELIXIR Core Data Resources – fundamentally important to life-science research
• The ELIXIR Deposition Databases meet the technical quality and governance criteria expected of ELIXIR Core Data Resources
• ELIXIR is committed to Open Access as a core principle for publicly funded research.
• ELIXIR Core Data Resources should reflect this commitment and have terms of use or a licence that enables the reuse and remixing of data.
• See “Identifying ELIXIR Core Data Resources”
• Agreed collectively by 21 Node directors
https://www.elixir-europe.org/platforms/data/core-data-resources
Eligible for shared international support:
• Determine which data resources are of fundamental importance
• Define indicators to establish the core data resources
• Design and implement an international plan for long-term sustainability
Towards a global effort
• Significant interest internationally, e.g. NIH/NSF
• Global coalition organized by HFSP, presented by W. Anderson/Eric Green at HIRO (June 2017)
Changing landscape with many actors
• Highly distributed data generation and monitoring
• Distributed analysis requires reference datasets (organized centrally, locally or in distributed networks)
• Legal requirements must be managed in transnational settings
Actors: international resources, national data centres, institutional data centres
ELIXIR’s principles on FAIR data management
• Open sharing of research data is a core principle for publicly-funded research and ELIXIR
encourages all funders to adopt Open Data mandates.
• Data management is a crucial part of good scientific practice and research excellence.
• Whenever possible, biological research data should be submitted to the recommended
community deposition databases.
• All data submitted to Open Data archives must be annotated in accordance with
community-defined standards.
• ELIXIR Nodes are the national implementation of a harmonised FAIR Data Management
programme for the life sciences.
• FAIR data management requires professional skills and adequate resources.
• Good research data management requires appropriate funding for data infrastructures.
ELIXIR position paper on FAIR data management in the life sciences
(doi: 10.7490/f1000research.1114985.1)
“Whenever possible, biological research data should be submitted to
the recommended community deposition databases”
• The ELIXIR Deposition Databases meet the
technical quality and governance criteria
expected of ELIXIR Core Data Resources
• See “Identifying ELIXIR Core Data Resources”
• Agreed collectively by 21 Node directors
• International collaborative effort
https://elixir-europe.org/platforms/data/elixir-deposition-databases
“All data submitted to Open Data archives must be annotated in
accordance with community-defined standards”
https://elixir-europe.org/platforms/interoperability
“FAIR data management requires professional skills and adequate resources”
Bring your own data workshops
• Problem-centred workshops
• Bring together integration experts, data resources and users
• Run with national Nodes or pan-European projects
“ELIXIR Nodes are the national implementation of a harmonised FAIR
Data Management programme for the life sciences”
Findability
How do you find a needle in a federated haystack?
Bioschemas
“schema.org markup for life sciences – minimum
properties needed for finding data”
http://bioschemas.org
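A minimal sketch of what Bioschemas markup can look like on a dataset landing page: schema.org `Dataset` properties serialised as JSON-LD inside a `<script type="application/ld+json">` element. We assume the minimum property set of the Bioschemas Dataset profile is name, description, identifier, keywords, license and url; check the current profile version on bioschemas.org before relying on it. The dataset values, the `example.org` URLs and the helper names `dataset_jsonld` and `script_tag` are hypothetical.

```python
import json

# Assumed "minimum" properties of the Bioschemas Dataset profile (verify
# against the profile version you target on bioschemas.org).
MINIMUM_PROPERTIES = ("name", "description", "identifier", "keywords", "license", "url")


def dataset_jsonld(**props):
    """Build a schema.org Dataset record, refusing to emit one that lacks a minimum property."""
    missing = [p for p in MINIMUM_PROPERTIES if not props.get(p)]
    if missing:
        raise ValueError(f"missing minimum properties: {', '.join(missing)}")
    return {"@context": "https://schema.org", "@type": "Dataset", **props}


def script_tag(doc):
    """Wrap a JSON-LD record in the script element that crawlers look for."""
    return '<script type="application/ld+json">\n' + json.dumps(doc, indent=2) + "\n</script>"


# Hypothetical dataset, for illustration only.
example = dataset_jsonld(
    name="Example marine metagenome assemblies",
    description="Hypothetical dataset used only to illustrate the markup.",
    identifier="https://example.org/datasets/42",
    keywords=["metagenomics", "marine"],
    license="https://creativecommons.org/licenses/by/4.0/",
    url="https://example.org/datasets/42",
)
print(script_tag(example))
```

The output is pasted into the page's HTML as-is; no separate API or feed is needed for search engines or registries to pick it up.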
Bioschemas.org
Search engines, registries, data aggregators
• Standardised metadata
• Metadata published and harvested without APIs or special feeds
• Feeds bio registries and aggregators
A community initiative built on top of Schema.org to improve Findability and Accessibility in the Life Sciences
• Rapid markup
• Exposed to harvesting
• Find both major data resources and smaller datasets, each marked up with Bioschemas
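Because the markup sits inside ordinary web pages, a registry or aggregator can harvest it with a plain page fetch and an HTML parser, without a dedicated API or feed. Below is a minimal sketch of the parsing step using only Python's standard library; the names `JsonLdExtractor`, `harvest` and `datasets` are hypothetical, and a production harvester would also need fetching, politeness (robots.txt, rate limits) and handling of `@graph` or list-valued JSON-LD.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the whole harvest


def harvest(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


def datasets(records):
    """Keep only top-level schema.org Dataset records."""
    return [r for r in records if isinstance(r, dict) and r.get("@type") == "Dataset"]


page = (
    "<html><head>"
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
    "</script>"
    '<script type="application/ld+json">{not valid json}</script>'
    "</head><body><p>Dataset landing page</p></body></html>"
)
records = harvest(page)
print(datasets(records))
```

Skipping malformed blocks is a design choice: one broken page should not stop an aggregator from indexing the rest of a site.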
Data Repository and Datasets Descriptions
• Information about repositories with consistent structured data
• Align overlapping registry efforts around certain metadata
• Help with consistency of metadata collected by registries
With: omicsDI, Bioschemas.org
Early adopters: omicsDI
Google research blog: Facilitating the discovery of public datasets
“Science schemas” as an emerging federation architecture in EOSC
[Diagram: several domains (Earth, Life, ...) each expose a dataset index, scientific files and PIDs behind a common access layer; shared data services (compute, storage, transfer, …) sit alongside the EOSC Catalogue]
Find and Access human genomic data
• Accept that much sensitive data is stored locally
• Data discovery & data access services
• EGA is central to the ELIXIR approach
• Underpinned by GA4GH standards
Human genomics in research & health needs:
• Standards
• Networks of trust
• Reference archives
Reliable electronic identification of users (ELIXIR ID) is
needed to access the key services and capacities of ELIXIR
• Existing user accounts can be used to create your ELIXIR
ID today at www.elixir-europe.org. ELIXIR AAI allows
users to continue using their federated academic,
corporate or social media identity by linking it to a
personal ELIXIR ID.
• In production since January 2017 (running since August 2016)
• 292 identity providers
• 584 ELIXIR identities in 101 groups
• 16 relying ELIXIR services
• ELIXIR AAI credential accepted by e-infrastructures: EGI Check-in and a pilot with EUDAT B2ACCESS
ELIXIR AAI
• ELIXIR service providers connected to ELIXIR AAI benefit from centralised user identity and access management services
• Permanent ID, even if the user's affiliation changes
• Protocols: SAML2, OpenID Connect
• Contact: Mikael Linden, Michal Prochazka
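Services relying on an OpenID Connect provider such as ELIXIR AAI typically use the standard authorization code flow. The sketch below builds the authorization request and checks the `state` value on the callback. The endpoint URL, client ID, redirect URI and function names are hypothetical placeholders; a real relying service would read its endpoints from the provider's `/.well-known/openid-configuration` discovery document, exchange the code for tokens at the token endpoint and validate the ID token (including `nonce`), none of which is shown here.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; take the real one from the provider's discovery document.
AUTHORIZATION_ENDPOINT = "https://aai.example.org/oidc/authorize"


def authorization_request(client_id, redirect_uri, scopes=("openid", "profile", "email")):
    """Return the URL to send the user to, plus the state and nonce to remember in the session."""
    state = secrets.token_urlsafe(16)  # binds the callback to this browser session (CSRF protection)
    nonce = secrets.token_urlsafe(16)  # later compared with the nonce claim in the ID token
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "nonce": nonce,
    }
    return f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}", state, nonce


def handle_callback(callback_url, expected_state):
    """Extract the authorization code after checking state and error parameters."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible cross-site request forgery")
    if "error" in query:
        raise ValueError(f"authorization failed: {query['error'][0]}")
    return query["code"][0]


url, state, nonce = authorization_request("my-service", "https://service.example.org/callback")
print(url)
```

Because the user authenticates at their own identity provider and the service only receives a code, the same flow works whether the user logs in via eduGAIN, ORCID or a social identity.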
Federated AAI: What is a “registered ELIXIR user”?
• Identification (ELIXIR ID)
• Group/role and attributes (such as the researcher's home organisation)
• Authentication (via GÉANT/eduGAIN, social media or ORCID)
• Strong step-up authentication (for sensitive services)
• Personal authorisation management (for datasets that require DAC approval)
• International mutual recognition: codes of conduct, policies
• Institutional maturation models (cf. OECD)
• Bona fide researcher status management (e.g. for restricted services)
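The attributes listed above can be read as inputs to an access decision at a relying service. Below is a minimal sketch, assuming a hypothetical data model in which these attributes have already been released to the service (for example as SAML2 or OpenID Connect claims); the class names, field names, identifiers and the order of checks are illustrative only and are not ELIXIR AAI's actual interface or policy.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class RegisteredUser:
    # Hypothetical model of the attributes a relying service might receive.
    elixir_id: str                                   # persistent ID, survives affiliation changes
    home_organisation: Optional[str] = None
    groups: Set[str] = field(default_factory=set)
    step_up_authenticated: bool = False              # stronger authentication done this session
    bona_fide: bool = False                          # bona fide researcher status
    dac_approved: Set[str] = field(default_factory=set)  # dataset IDs approved by a DAC


@dataclass
class Resource:
    name: str
    sensitive: bool = False                # requires step-up authentication
    bona_fide_only: bool = False           # restricted to bona fide researchers
    dataset_id: Optional[str] = None       # set when access is controlled by a DAC
    required_group: Optional[str] = None


def check_access(user: RegisteredUser, resource: Resource) -> Tuple[bool, str]:
    """Return (decision, reason); every check must pass for access to be granted."""
    if not user.elixir_id:
        return False, "no ELIXIR ID"
    if resource.required_group and resource.required_group not in user.groups:
        return False, f"not a member of {resource.required_group}"
    if resource.sensitive and not user.step_up_authenticated:
        return False, "step-up authentication required"
    if resource.bona_fide_only and not user.bona_fide:
        return False, "bona fide researcher status required"
    if resource.dataset_id and resource.dataset_id not in user.dac_approved:
        return False, "no DAC approval for this dataset"
    return True, "granted"


user = RegisteredUser("user-1", step_up_authenticated=True, dac_approved={"dataset-A"})
print(check_access(user, Resource("controlled dataset", sensitive=True, dataset_id="dataset-A")))
```

Returning a reason alongside the decision makes it easier to tell a user whether they need to step up authentication, apply to a DAC, or request bona fide status.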
ELIXIR Cloud WG: towards interoperable clouds
ELIXIR Industry & SME programme
ELIXIR Innovation and SME programme: 2017/2018
Previous events | Upcoming events
• Paris, France, 14-15 November 2017: Rare diseases and personalized medicine
• Cambridge, UK, 24-25 January 2018: Discovery of data, tools and training
CONFIRMED SPEAKERS:
• Jean-François Deleuze (Director of the Centre National de Génotypage, CNG-IG-CEA)
• Daria Julkowska (Programme coordinator, eRARE)
• Frederic Revah (President, Yposkesi; CEO, Genethon)
• Ana Rath (Orphanet)
DATA RESOURCE SHOWCASE | TRAINING | FLASH-TALK
Registration OPEN: https://sme_paris.eventbrite.co.uk
[Chart: total participation and private-sector share (%) at previous events: Copenhagen 2014, Wageningen 2015, Basel 2015, Oslo 2016, Helsinki 2017, Barcelona 2017]
• Work with industry clusters
  • European
  • Regional
Building a network of life-science data actors
• >200 companies in network
• Regular updates: industry newsletter
• Case study of business models in progress
ELIXIR in numbers
• ~180 institutes involved
• 600+ staff
• 11 Implementation Studies currently in operation
• 10 papers in the ELIXIR F1000Research channel
• 223 live events in TeSS
• 200 companies attended the Innovation and SME programme
www.elixir-europe.org/excelerate
ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.
Thank you!
ELIXIR Platforms and Use cases leads
Interoperability:
Chris Evelo (ELIXIR NL), Carole Goble
(ELIXIR UK), Helen Parkinson (ELIXIR EMBL-EBI)
Tools:
Søren Brunak (ELIXIR DK),
Alfonso Valencia (ELIXIR ES)
Compute:
Tommi Nyrönen (ELIXIR FI), Luděk Matyska
(ELIXIR CZ), Steven Newhouse (ELIXIR EMBL-EBI)
Data:
Jo McEntyre (ELIXIR EMBL-EBI),
Christine Durinx (ELIXIR CH)
Training:
Celia van Gelder (ELIXIR NL), Gabry Rustici (ELIXIR
UK), Patricia Palagi (ELIXIR CH)
Human data:
Jordi Rambla (ELIXIR ES), Thomas Keane (ELIXIR
EMBL-EBI)
Plants:
Celia Miguel (ELIXIR PT), Paul
Kersey (ELIXIR EMBL-EBI)
Rare diseases:
Ivo Gut (ELIXIR ES),
Marco Roos (ELIXIR NL)
Marine metagenomics:
Nils Willassen (ELIXIR NO),
Rob Finn (ELIXIR EMBL-EBI)
ELIXIR Compute: 3 core pan-ELIXIR services for data sharing
Link national resources using
common standards, shared
services and user management
protocols

ELIXIR at de.NBI meeting


Editor's Notes

  • #2 ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. ELIXIR coordinates, integrates and sustains bioinformatics resources across its member states and enables users in academia and industry to access vital data, tools, standards, compute and training services for their research. ELIXIR provides data infrastructure for Europe’s 500,000 life-science researchers. ELIXIR is organised in five technical platforms: Data, Interoperability, Tools, Compute and Training. The four Use Cases of ELIXIR connect the technical activities to the real needs of user communities in the life sciences: Marine metagenomics, Crop and forest plants, Human data, Rare diseases. The ELIXIR network currently counts 17 Members and 2 Observers. The Network is coordinated from the ELIXIR Hub, based alongside EMBL-EBI in Hinxton, UK.
  • #3 Five new ELIXIR Members in 2015-2016: France, Spain, Belgium, Italy, Slovenia, Luxembourg, Germany and Ireland; Hungary joined in January 2017
  • #4 Creating a robust infrastructure for biological information is a bigger task than any individual organisation or nation can take on alone. These are issues of such complexity that no single institution or country can tackle them alone. ELIXIR Nodes ensure local bioinformatics capacity throughout Europe. Represented by SIB - Swiss Institute of Bioinformatics: the largest ELIXIR Node (48 research groups, 650 scientists); Co-Leader of the ELIXIR Platform on Data Services; at the heart of long-term sustainability objectives. www.sib.swiss
  • #5 ELIXIR’s services include databases, tools, standards and guidance on achieving interoperability between standards, training courses, compute resources, data management support, and industry cooperation and technology transfer. The list of services is constantly evolving. Data deposition: ELIXIR Nodes run several data deposition archives where researchers can save their data. Deposition is usually done online and is free, as a public resource. The major deposition archives run by ELIXIR Nodes include ENA, EGA, PDBe and EuropePMC. Added-value databases: added-value databases process, analyse and annotate data (adding comments and other information) from deposition archives and make them accessible to the wider scientific community. The added value comes from data processing, additional annotation, or mapping to standardised vocabularies or ontologies (a formal specification of terms and the relationships among them). They facilitate the discovery and use of data. Examples of the major knowledge bases run by ELIXIR Nodes include UniProt, Ensembl and Orphanet. Compute: using compute services, researchers can carry out computationally intensive modelling and simulation studies and make the most of the big data revolution in the life sciences. Several ELIXIR Nodes offer cloud computing services, which enable researchers to use compute resources on demand, without the need to manage their own hardware or data centre in-house. Services currently available: Computerome (ELIXIR Denmark), ePouta and cPouta (ELIXIR Finland), Embassy Cloud (EMBL-EBI). Training: ELIXIR Training trains European developers, trainers and researchers within ELIXIR communities. Developers are trained to achieve better performance and to use relevant methodologies when implementing software. Trainers are trained to deliver courses on using ELIXIR services. Researchers are trained to use the tools and services offered by ELIXIR effectively.
ELIXIR's training portal TeSS collects training-related materials: users can browse, discover and organise life-science training resources aggregated automatically from ELIXIR Nodes and third-party providers. Data management: several ELIXIR Nodes offer practical help to research teams in developing and implementing their data management plans. The support ranges from practical help to research projects in drafting their data management plans to services Industry: ELIXIR's Innovation and SME programme is a series of specialised events that bring together operators of ELIXIR services with industry and SME representatives, who have the opportunity to learn how to use the bioinformatics tools and services offered by ELIXIR effectively. Tools:
  • #7 We defined the ELIXIR Core Data Resources as A set of data resources that are of [click] fundamental importance to the broad life science community and the long-term preservation of biological data [click] They provide complete collections of generic value to life science, [click] and show high levels of usage, scientific quality and service.
  • #9 8
  • #20 Information about repositories with consistent structured data to help index, search & tools Align overlapping registry efforts to collect & curate certain metadata. Help with consistency of metadata collected by registries Use Case Properties
  • #24 Data sharing requires compatible technology. GA4GH plays an incredibly important role as the global, neutral body that looks after and preserves standards. From our perspective it has been a pleasure to collaborate. It is important to recognise the role of global, open, community-driven organisations in setting the technical and scientific standards. As an international organisation supported by government funders, we can support, facilitate and help sustain, but the standards need to be driven by the community. Networks of trust: sharing of human-derived, potentially identifiable data requires trust between the parties. This is one of the key aspects of the European General Data Protection Regulation: it is positive towards sharing research data, but it sets out a number of legal requirements for this to happen. The data controller is responsible for ensuring that these are met by the recipient, so we need to broker trust. (Just like in the mobile phone case: you go roaming and AT&T trusts that Vodafone will reimburse it. Vodafone trusts that you pay the bills, and you display trustworthiness via your credit rating and by signing a contract.) We also need the reference databases: lighthouses that help us navigate an ocean of data. BRCA Exchange, but also GenBank, genome builds and nomenclature committees, without which it is impossible to do genomics research.
  • #32 Largest of all ESFRI RIs The output and activities to the
  • #35 In production since January 2017 (running since August 2016). 292 identity providers. 584 ELIXIR identities in 101 groups. 16 relying ELIXIR services. ELIXIR AAI credential accepted, including by two private companies that provide cloud brokering services (nationally).