Secure data management, analysis, infrastructure and policy in an international context


Presentation from Dr Steven Newhouse at the Human Brain Project conference 2018 on data governance in international neuro-ICT collaborations

Published in: Science

  1. 1. Secure data management, analysis, infrastructure and policy in an international context • Steven Newhouse, Head of Technical Services, EMBL-EBI
  2. 2. International Collaborative Data Analysis • Distributed data generation • Distributed data analysis • Distributed (in)formal governance • Increasingly sensitive data • Increasingly valuable analysis resources • Increasingly moving closer to production
  3. 3. Some Examples • Worldwide Large Hadron Collider Computing Grid (WLCG) • A worldwide federation of federated sites • EMBL-EBI and ELIXIR • Infrastructure to support multiple communities • Global Alliance for Genomics and Health (GA4GH) • International collaboration to support
  4. 4. WLCG Collaboration (WLCG Workshop, Manchester, 19 June 2017) • April 2017: 63 MoUs; 167 sites; 42 countries • 985 PB storage: 395 PB disk, 590 PB tape
  5. 5. Security Policy & Operations in e-Infrastructures • e-Infrastructures: • Generally federation of clusters/clouds in research community • Structured geographically nationally and/or regionally • Make local resources available to remote users • Build trust around common policies • Site Security Policy: What a site commits to • Acceptable Use Policy: What a user commits to • Security Operations • Monitor use to contain & eliminate any security breach
  6. 6. WISE • WISE: Wise Information Security for e-Infrastructures • Community activity driven by the e-Infrastructures • Supporting user communities that span e-Infrastructures • Active Working Groups • Security for Collaborating e-Infrastructures • Security Training and Awareness • Risk Assessment • Security in Big and Open Data
  7. 7. Security for Collaborating e-Infrastructures • Build a trust framework to enable interoperation between e-Infrastructures and to manage cross-infrastructure security risks • Manage risk through mitigation & countermeasures • Minimise impact of a security incident • Identify the cause of incidents to stop repeats • Identify users & services to control access to resources
  8. 8. Building trust by exposing maturity • Expose Maturity across different Capabilities • Operational Security, Incident Response & Traceability • Participant Responsibilities • Data Protection • Capability Maturity Levels • 0: Not implemented for critical services • 1: Implemented for critical services but not documented • 2: Implemented and documented for critical services • 3: Implemented, documented and reviewed
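The maturity levels on the slide above lend themselves to a simple comparison across capabilities. The following is a minimal Python sketch, not part of the presentation: the `Maturity` enum mirrors the four levels listed, while `INFRA_A`, its ratings, and the `gaps` helper are hypothetical names and values chosen only for illustration.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Capability maturity levels, numbered as on the slide."""
    NOT_IMPLEMENTED = 0  # not implemented for critical services
    IMPLEMENTED = 1      # implemented for critical services but not documented
    DOCUMENTED = 2       # implemented and documented for critical services
    REVIEWED = 3         # implemented, documented and reviewed


# Hypothetical self-assessment; the levels are illustrative, not real data.
INFRA_A = {
    "Operational Security": Maturity.REVIEWED,
    "Incident Response": Maturity.DOCUMENTED,
    "Traceability": Maturity.DOCUMENTED,
    "Participant Responsibilities": Maturity.DOCUMENTED,
    "Data Protection": Maturity.IMPLEMENTED,
}


def gaps(assessment, required):
    """Return, sorted by name, the capabilities rated below `required`."""
    return sorted(cap for cap, level in assessment.items() if level < required)


print(gaps(INFRA_A, Maturity.DOCUMENTED))
```

Because the levels are ordered integers, a collaborating infrastructure could state a minimum level and see at a glance which capabilities fall short of it.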
  9. 9. EMBL sites – over 1600 people and more than 80 nationalities • Hamburg: Structural biology • Heidelberg: Life sciences • Rome: Epigenetics and neurobiology • Cambridge (EMBL-EBI): Bioinformatics • Grenoble: Structural biology • Barcelona: Tissue biology and disease modelling
  10. 10. Data Resources at EMBL-EBI • Literature & ontologies: Experimental Factor Ontology, Gene Ontology, BioStudies, Europe PMC • Chemical biology: ChEBI, ChEMBL, SureChEMBL • Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank • Gene, protein & metabolite expression: Expression Atlas, MetaboLights, PRIDE, RNAcentral • Protein sequences, families & motifs: InterPro, Pfam, UniProt • Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal • Systems: BioModels, BioSamples, Enzyme Portal, IntAct, Reactome • Molecular Archives: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, ArrayExpress
  11. 11. Big Data, Big Demand • ~25 million requests to EMBL-EBI websites every day • Scientists at over 5 million unique sites use EMBL-EBI websites • 200 petabytes of scientific data managed by EMBL
  12. 12. Storage Use Cases are Evolving • Evolving away from ‘simple’ archiving • Challenge used to be scale, now tackling diversity • Not just diversity in type, but diversity in access • Common use case • Public data embargoed before publication • Hosting sensitive data • European Genome-phenome Archive (EGA) • Analysing sensitive data • Formal access to named individuals for specific research goals
  13. 13. Classifying and controlling the data • What data do we store? • Personal, Scientific Research, Administrative, Professional, Private • How sensitive is the data? • Controlled, Confidential, Restricted, Public • What are the storage options? • ‘Vault’, Managed, Standard, Any Cloud, EU Cloud, Hosting End up with a matrix describing what can go where!
  14. 14. Data Sensitivity Classification • Columns: Data Type, then On Site (inc. Embassy Cloud) and Off-Site, each split into three sensitivity columns: Confidential or Controlled Restricted Restricted Public or Controlled Public • Scientific Research: On Site: Vault (as contains Personal Data) / Managed / Standard; Off-Site: EMBL Hosting / EMBL Hosting or as specified by the Data Access agreement / Any • Professional: On Site: N/A / Standard / Standard; Off-Site: N/A / EMBL Hosting / Any • Administrative: On Site: SAP Facility (as contains Personal Data) / Managed / Standard; Off-Site: EMBL Hosting / EMBL Hosting / EMBL Hosting • Private: On Site: Standard / Standard / Standard; Off-Site: Any / Any / Any • Personal: On Site: Only as part of the Vault (Scientific Data) or SAP Facility (Administrative Data); Off-Site: EMBL Hosting
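A "what can go where" matrix of this kind can be held as a plain lookup table. The sketch below is an illustration only: the values are transcribed from the slide, but the `PLACEMENT` and `storage_option` names are hypothetical, the three sensitivity columns are addressed by position (0 to 2) because the transcript does not make the column headings unambiguous, and the ordering from most to least sensitive is an assumption. The Personal row is omitted because it spans several columns on the slide.

```python
# Storage options per data type, transcribed from the slide's matrix.
# Each tuple follows the slide's three sensitivity columns in order,
# assumed here to run from most sensitive (0) to least sensitive (2).
# None stands for "N/A".
PLACEMENT = {
    "Scientific Research": {
        "on_site": ("Vault", "Managed", "Standard"),
        "off_site": (
            "EMBL Hosting",
            "EMBL Hosting or as specified by the Data Access agreement",
            "Any",
        ),
    },
    "Professional": {
        "on_site": (None, "Standard", "Standard"),
        "off_site": (None, "EMBL Hosting", "Any"),
    },
    "Administrative": {
        "on_site": ("SAP Facility", "Managed", "Standard"),
        "off_site": ("EMBL Hosting", "EMBL Hosting", "EMBL Hosting"),
    },
    "Private": {
        "on_site": ("Standard", "Standard", "Standard"),
        "off_site": ("Any", "Any", "Any"),
    },
}


def storage_option(data_type, location, column):
    """Look up the storage option for one cell of the matrix.

    `location` is "on_site" or "off_site"; `column` is the 0-based
    position of the sensitivity column. Returns None where the slide
    says N/A.
    """
    if column not in (0, 1, 2):
        raise ValueError("column must be 0, 1 or 2")
    return PLACEMENT[data_type][location][column]


print(storage_option("Scientific Research", "on_site", 0))
```

Keeping the matrix as data rather than branching logic means a policy change is a one-line edit that can be reviewed against the published table.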
  15. 15. European Genome-Phenome Archive • Data hosted by EMBL-EBI and CRG • Several PB and growing • Data sets managed through individual Data Access Committees • EMBL-EBI data stored in the ‘vault’ • Isolated network area in ISO27K leased data centre space • Requires 2 factor auth to access • Data encrypted at rest • Data released to specific individuals • Encrypted with unique individual key
  16. 16. ELIXIR – Research Infrastructure for Life Science • Compute: Access, Exchange & Compute on sensitive data • Data: Sustain core data resources • Tools: Services & connectors to drive access and exploitation • Standards: Integration and interoperability of data and services • Training: Professional skills for managing and exploiting data
  17. 17. ELIXIR: European Open Science Cloud • Cloud activities to support BMS Research Infrastructures • Commercial cloud providers: Helix Nebula Science Cloud, … • Community cloud providers: EMBL-EBI, CSC, de.NBI, … • Sensitive data may have complex requirements • Not to leave institution or legal jurisdiction • National legal requirements • Specific data protection requirements • Compile maturity matrix around key security features • Map user requirements to compliant cloud providers
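Mapping user requirements to compliant providers can be sketched as a filter over a feature matrix. This is a minimal Python illustration under stated assumptions: the `Provider` class, the `compliant` function, the provider names, the jurisdictions, and the feature labels are all hypothetical and do not describe any real provider or the actual ELIXIR matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """One row of a hypothetical security-feature matrix."""
    name: str
    jurisdiction: str
    features: frozenset


# Illustrative entries only; not real providers or real capabilities.
PROVIDERS = [
    Provider("provider-a", "EU",
             frozenset({"encryption_at_rest", "two_factor_auth", "audit_logging"})),
    Provider("provider-b", "EU", frozenset({"encryption_at_rest"})),
    Provider("provider-c", "US",
             frozenset({"encryption_at_rest", "two_factor_auth"})),
]


def compliant(providers, required_features, allowed_jurisdictions=None):
    """Return names of providers that offer every required feature and,
    if a jurisdiction constraint is given, sit inside an allowed one."""
    return [
        p.name
        for p in providers
        if required_features <= p.features
        and (allowed_jurisdictions is None or p.jurisdiction in allowed_jurisdictions)
    ]


needs = frozenset({"encryption_at_rest", "two_factor_auth"})
print(compliant(PROVIDERS, needs, {"EU"}))
```

The jurisdiction check is kept separate from the feature set because a requirement such as "not to leave institution or legal jurisdiction" is a constraint on location rather than a capability a provider can add.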
  19. 19. Data Security Work Stream
  20. 20. Overview • Data Security Work Stream helps assess the security risks associated with new GA4GH standards • At Project Start: Assessment of security risks associated with use case(s) to be addressed • Prior to Standard Release: Assessment of how standard has addressed identified risks, and identification of residual risk • Timeline labels: Work Stream; Standards-Development Activity
  21. 21. Breach Response Strategy • Projected timeline: Begun at 2017 Plenary, projected end date TBD • Milestones 1) Write Scope and Principles document 2) Inventory practices in place with Driver Projects 3) Define a policy for sharing breach data 4) Develop protocol for sharing breach data 5) Define strategy for responding to breaches associated with GA4GH standards
  22. 22. Authentication and Authorization Infrastructure (AAI) • Projected timeline: Identification/authentication development begun in 2017; end date TBD • Milestones 1) Document OpenID Connect profile developed for and implemented by ELIXIR Beacons 2) Define authorization use cases 3) Document standard GA4GH OAuth 2.0 authorization profile for RESTful APIs
  23. 23. Linkages with Other Work Streams • Breach Information Exchange protocol will be informed by legal, regulatory, and ethical guidance provided by Regulatory and Ethics Work Stream • AAI profiles will consume vocabulary and ontology developed by Data Use and Research Identities (DURI) Work Stream • AAI use cases will be based on APIs being defined by Genomic Knowledge Sharing (GKS), Clinical and Phenotypic Data Capture, and Discovery Work Streams
  24. 24. Conclusions • One size does not fit all • But there are some common approaches that can be adopted • Challenge is to build scalable trust networks • ‘Tea and biscuits’ strategy • Having confidence in those running sites & services • Security is just one aspect of data protection • Understand the data and what you are protecting it from