Open Data and Cross Disciplinary Research - EUDAT Summer School (Brian Matthews, EOSC)

The European Open Science Cloud (EOSC) has become a driving force behind the current evolution of e-Infrastructure to support research. The EOSC offers the vision of an integrated ecosystem of data, services and expertise providing a common platform for open cross-community research in Europe and beyond. In this session, I shall consider the aims of the EOSC and discuss some of the opportunities it offers and the barriers it needs to overcome to realise the vision. I shall introduce the EOSC-Pilot project, which aims to pave the way towards the EOSC by exploring the opportunities and barriers and by proposing how the EOSC should evolve, both technically, including its architecture, and organisationally, including how it should be managed. Participants will be invited to consider what the issues of the EOSC are and how it might affect their own domain.
Visit: https://www.eudat.eu/eudat-summer-school

Published in: Data & Analytics

  1. Open Data and Cross-disciplinary Research: The European Open Science Cloud. Brian Matthews, Science and Technology Facilities Council. www.eudat.eu. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065.
  2. EUDAT Summer School, 3-7 July 2017, Crete. Contents: 1. Why Open Science? 2. Research Infrastructures and open science 3. Towards the European Open Science Cloud 4. What do we need to do to build an EOSC? 5. The EOSC Pilot Project
  3. OPEN SCIENCE
  4. Open Science. Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional (Wikipedia).
  5. Open Science is not new. http://www.darwinproject.ac.uk/darwins-letters
  6. The Age of the Journals. But the world became too big. Journals became the main mechanism for scientific communication: • Printing • Quality control • Dissemination • Priority • Permanent record • Credit. Worked pretty well for ~100 years, particularly since the 1950s. Main basis of evaluation: citation and impact factors. But it is a narrow, controlled viewpoint.
  7. Disruptive Technology. Computing technology has changed the way people do research: • Generating large amounts of data • Aggregating large amounts of data • Processing large amounts of data • Visualising large amounts of data. It has changed the way people talk about research: • Email, websites, newsgroups, blogs, social media, presentations … Open science offers new ways to do science: • Meet the new challenges: handling and processing large amounts of data • Back to the “old ways”, but on a much larger scale
  8. Aspects of Open Science
  9. Why Open Science? Opportunities for Data Exchange (ODE), EC FP7 Project, 2010-12: workshops and interviews; conceptual model; drivers, barriers and enablers to data sharing. R. Darby, S. Lambert, B. Matthews, M. Wilson, K. Gitmans, S. Dallmeier-Tiessen, S. Mele, J. Suhonen. Enabling Scientific Data Sharing and Re-use. IEEE Conf. on E-Science, Chicago, Oct 2012. http://www.alliancepermanentaccess.org/index.php/community/current-projects/ode/
  10. Drivers for Open Science • Better scrutiny of research • Validation and verification • Opening up peer review • Reproducing results • Over 70% of researchers have tried and failed to reproduce others' experiments (Nature, May 2016, https://www.nature.com/polopoly_fs/1.19970!/menu/main/topColumns/topLeftColumn/pdf/533452a.pdf) • Prevalence of irreproducible preclinical research exceeds 50% (PLOS Biology 2015, https://doi.org/10.1371/journal.pbio.1002165) • Confidence in the scientific method
  11. Drivers for Open Science • Better reuse of research • Easier to Find, Access, Interoperate, Reproduce data • Not regenerating data needlessly • Can try data in new situations • Multidisciplinary science • Publicly funded research belongs to the public. More science impact: funders see it as a way of getting more research for the same money.
  12. Barriers to open science? • Availability of a sustainable data management infrastructure • And expertise • And ease of use • http://cameronneylon.net/blog/as-a-researcher-im-a-bit-bloody-fed-up-with-data-management/ • Not knowing where data is • Not being able to access it • Not being able to understand it sufficiently to reuse it • Trustworthiness of the data • Data usability • Finance • Funding • Legislation/Regulation
  13. Cultural Barriers to Data Sharing. Publisher practices: journal articles do not describe available data as a publication; data not recognised as a citable publication; lack of data reviewers to assess data quality. Personal data confidentiality: anonymity of subjects, in medical and social science in particular; perceived conflicts between data protection and FOI; thus unrestricted data access has ethical implications. Research assessment: publication and citation of data not tracked; not counted as part of performance evaluation for careers. Academic defensiveness: fear that others will benefit from their data and gain priority for results; fear that their results will not be validated; fear that misuse of data will harm the data contributor; fear that data will be used to support arguments the data contributor disagrees with.
  14. INFRASTRUCTURES FOR OPEN SCIENCE
  15. WLCG: a Global Infrastructure. Varied distributed data model for multi-petabyte datasets. Either: 1. Move, cache and locally process 2. Remote data access (AAA or FAX) 3. Hybrid of 1 and 2 (mainly cached) 4. Event put services for opportunistic HPC and cloud computing. Which is used depends on many factors, but there is ever-growing exploitation of wide area networks for remote data access. 30GB/s. Global collaboration: • 42 countries • 170 computer centres • 300PB disk • 380PB tape • 400,000 cores (1 usable exaflop?) • >2 million jobs/day. LHC data placement service.
  16. Open Data Infrastructure (Nov 11–Apr 14): ... to construct and operate a shared data infrastructure for Photon and Neutron laboratories ... Neutron diffraction, X-ray diffraction, high-quality structure refinement. • Common data catalogue • Integration of users' data from different facilities • Track provenance of data through analysis stages • Deploy standards for long-term curation • Support scalability through parallelisation • Deploy infrastructure in three different techniques
  17. PaN-Data Integration: Shared Data Policy Framework; Federated User Authentication; Federated Data Catalogue; Common Data Format NeXus. Common data environment, common user experience.
  19. ELIXIR connects national bioinformatics centres and EMBL-EBI into a sustainable European infrastructure for biological research data
  20. ADVOCACY AND POLICY
  21. RCUK Principles on Data Policy. Common Principles: 1. Public good 2. Preservation 3. Discoverability 4. Confidentiality 5. First use 6. Recognition 7. Costs. There is a tension between these principles. http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property. RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.
  22. Context • G8 Ministerial Communiqué, 2013: “… [publicly funded] scientific research data should be open…” • G7 Ministerial Communiqué, October 2015: Research Data Alliance; Open-data Science Environment • EC Communication on European Science Cloud Initiatives, 19th April 2016: European Open Science Cloud (EOSC); European Data Infrastructure (EDI) • High Level Expert Group on EOSC
  24. Why Europe is not fully tapping into the potential of data: • Data not always open, and lack of incentives and rewards for data sharing • Lack of interoperability required for data sharing, noting deep-rooted walls between disciplines • Fragmentation between data infrastructures that are split by scientific and economic domains, countries and governance models • Surging demand for High Performance Computing at a scale above single member state resources • Data reuse employing advanced analysis techniques • Adequate protection of personal data, considering the forthcoming revision of copyright legislation
  25. Proposed a European Open Science Cloud: • Make all scientific data produced by the Horizon 2020 programme open by default • Raise awareness and change incentive structures for academics, industry and public services to share their data • Develop specifications for interoperability and data sharing across disciplines and infrastructures • Create a fit-for-purpose pan-European governance structure to federate scientific data infrastructures and overcome fragmentation • Develop cloud-based services for open science, supported by the necessary data infrastructure • Enlarge the scientific user base to researchers and innovators from all disciplines
  26. High Level Expert Group for the "European Open Science Cloud". http://ec.europa.eu/research/openscience/pdf/hleg/hleg-eosc-first-report_(draft).pdf
  27. Definitions • European: research and innovation are global, so the EOSC cannot be built exclusively in and for Europe. Europe is in a strong position to lead this initiative as it is already distributed and collaborative. • Open: not all data and tools can be open, e.g. because of confidentiality and privacy. Open is also often confused with ‘for free’; free data and services do not exist. Intelligently open is what we mean. • Science: explicitly includes all disciplines, including the arts and humanities; also societal innovation and productivity, supporting broad societal participation in Open Innovation and Open Science. • Cloud: can be misinterpreted to indicate that the EOSC is mostly about hard ICT infrastructure, but it is much more a commons of data, software, standards, expertise and policy related to data-driven science and innovation.
  28. Evolution of infrastructure
  29. WHAT DO WE NEED TO BUILD AN EOSC?
  30. EOSCpilot Challenges. Three types of challenges addressed by the EOSCpilot: • Scientific Challenges: deploying the EOSC to deliver Open Science • Technical Challenges: developing technical solutions that meet the scientific needs • Cultural Challenges: adopting new, more open ways of working. Scientific Challenges are really Opportunities; Technical Challenges are Barriers to overcome; Cultural Challenges are also Barriers.
  31. Challenges: Interoperability. Accessing and understanding data within and across disciplines. Interoperability of data, tools and services ‒ Common services, common APIs, service catalogues ‒ Common formats, common metadata ‒ Persistent Identifiers: constancy of reference for data, people, software, things … Deepening understanding ‒ Context and provenance: assessing the quality of data ‒ Comparison between experiment/observation and simulation ‒ Preserving the record of science ‒ Reproducible science. Working internationally ‒ Crossing borders and communities ‒ Across the world
  32. Challenges: Social and cultural. Changing culture to make the most of open science. Sharing of data and services ‒ Data: “You can use mine” ‒ Services: “I will use yours”. Developing skills ‒ Data scientists: data engineers, data custodians, data analysts ‒ Expertise in quality software engineering. Credit where credit’s due ‒ Recognition for sharing ‒ Recognition for contributing ‒ Rewards should follow the contribution
  33. Challenges: Infrastructure. Accessing shared resources to realise the promise of data-intensive open science. Accessing Data ‒ Storing, accessing and integrating data at scale: common data centres and services ‒ Moving data at scale: limitations of networks ‒ Keeping data for the long term: digital preservation. Accessing Compute ‒ Access to scarce large-scale computing architectures (HPC, HTC, HPDA) ‒ Co-location of data and compute ‒ Cloud interfaces and Virtual Research Environments ‒ User identity and trusted work-spaces. Accessing Software ‒ Complex code for computational modelling and simulation ‒ Adapting code to large-scale computing architectures ‒ Data analysis algorithms becoming more sophisticated ‒ Sustainability of software for the long term. www.eoscpilot.eu
  34. So what do we need to do? • Bring the current Research Infrastructures together: we do not want to replace their work • Bring the e-Infrastructure projects together: GEANT, PRACE, EGI, EUDAT, OpenAIRE • Open up their services: catalogue of services; allow people to select services to build new infrastructures • Open up their data: FAIR services; interoperable standards and metadata • Allow new resources to be added: cloud providers, HPC providers, data providers; within the common governance and resourcing processes • Need some set of core services and processes to hold the EOSC together
  36. EOSC PILOT
  37. EOSC-Pilot Project: setting the EOSC in the right direction. First of the EOSC projects. 10M€ over 2 years • Jan 2017 – Dec 2018. 33 partners + 15 3rd parties • Led by STFC • A range of e-Infrastructure providers, research institutes and research consortia, across disciplines • EGI, EUDAT, OpenAIRE, PRACE, GEANT • ELIXIR, ICOS, ECRIN, BBMRI, DESY, CERN, XFEL, CEA • STFC, CNR, DANS, DCC, BSC, MPG, CNRS. Try to answer some basic questions • What is the EOSC going to provide? • How is the EOSC going to operate? • How is the EOSC going to change how science is done? www.eoscpilot.eu
  38. EOSCpilot: High Level Aims. The EOSCpilot project will support the first phase in the development of the EOSC. It will: • Establish the governance framework for the EOSC and contribute to the development of European open science policy and best practice • Develop a number of demonstrators functioning as high-profile pilots that integrate services and infrastructures to show interoperability and its benefits in a number of scientific domains • Engage with a broad range of stakeholders, crossing borders and communities, to build the trust and skills required for adoption of an open approach to scientific research. (More detailed objectives later)
  39. Workpackages 1. Governance: propose a governance framework 2. Policy: devise a policy environment 3. Demonstrators: use real demonstrators to drive the requirements for the EOSC 4. Services: specify service architecture, catalogue and pilot services 5. Interoperability: identify interfaces and standards to drive interoperability 6. Skills: specify a skills and competencies framework for the EOSC 7. Engagement: involve as many stakeholders as possible
  40. Science Demonstrators. First 5 demonstrators • Environmental & Earth Sciences - ENVRI Radiative Forcing Integration, to enable harmonised data access and integration across multiple research communities • High Energy Physics - WLCG: large-scale, long-term preservation and re-use of HEP data in the EOSC, open to other researchers • Humanities - TEXTCROWD: collaborative semantic enrichment of text-based datasets by making new software available on the EOSC • Life Sciences - Pan-Cancer Analyses & Cloud Computing within the EOSC, to accelerate genomic analysis on the EOSC • Physics - The photon-neutron community, to improve the community’s computing facilities by creating a virtual platform for all users
  41. 2nd Set of Demonstrators • HPCaaS for Fusion - Culham Science Centre, UK • Life Science: leveraging EOSC to offload updating and standardizing life sciences datasets and to improve studies' reproducibility, reusability and interoperability - CRG, Spain • Seismology: EPOS Virtual Earthquake and Computational Earth Science e-science environment in Europe - University of Liverpool, UK • CryoEM: linking distributed data and data analysis resources as workflows in Structural Biology with cryo-Electron Microscopy, for interoperability and reuse - CSIC, Spain • Astronomy: Open Science Cloud access to LOFAR data - ASTRON, NL • 5 more demonstrators to be selected in the autumn.
  42. Governance: Approach. The Governance framework will: • enable and encourage engagement from the key stakeholder communities: European e-Infrastructures, Data and Research Initiatives, Service and cloud providers, Research funders, Research Communities and Institutions, Research Infrastructures, Policy makers • enable interoperability and co-ordination within a number of different domains: legal interoperability, interoperability of organisational processes, technical interoperability, operational interoperability, data and information interoperability
  43. Governance: Status and Next Steps. Undertaken: • Stakeholder mapping exercise • Progressing with a framework which will help conceptualise the range of stakeholders and interoperability objectives • Assessing different governance approaches across these, and how these may fit together. Next Steps: • Planning to have a strawman framework late summer • To gather feedback from a broader community • Feedback initially via online tools and forum • Then via workshops, including EOSCpilot stakeholder event at end Nov.
  44. Service Infrastructure for the EOSC • EOSC Architecture • “Systems of Systems” approach • EOSC Service Portfolio • Rules of Engagement • Service demonstrators • with the Science Demos • EGI and EUDAT Services
  45. Interoperability • Service interoperability ‒ Gap analysis of service frameworks • Data interoperability • Recommendations on how to make data interoperable in the EOSC ‒ Exploring how FAIR principles apply to EOSC ‒ Baseline interoperability metadata ‒ Schema.org
  46. Service Interoperability: Gap Analysis. Reasons for gaps: Gap1: Diversity and incompatibility of the AAIs. Gap2: Network services. Gap3: Diversity of services and providers. Gap4: Diversity of access policies. Gap5: Low awareness of the e-infrastructures and services. Gap6: Lack of expertise, training, easy tools, human networks.
  47. Service Interoperability: Bridging the gaps. Gap1: Global AAI. Gap2: Network services improvement. Gap3: Services technical interoperability. Gap4: Multidisciplinary mutualised space. Gap5: Common vocabulary, global services catalogue, dissemination. Gap6: Foster adoption, expertise sharing, user friendly tools, human networks.
  48. Architecture: Systems of Systems Approach. The EOSC needs to be developed as a data infrastructure commons: • an eco-system of infrastructures • building on existing capacity and expertise where possible. In a System of Systems (SoS): • Operational and managerial independence: each system is independent and achieves its purposes by itself and for its own objectives rather than for the purposes of the SoS • Geographical distribution: a SoS is distributed over a large geographic extent • Emergent behavior: a SoS has capabilities and properties that do not reside in the component systems • Evolutionary development: a SoS evolves with time and experience • Heterogeneity of constituent systems: a SoS consists of multiple, heterogeneous, operating systems embedded in networks at multiple levels. Components: existing and emerging RIs, e-Infras, data repositories, registries, …
  49. Architecture: schematic. As-a-service provision mode. A work in progress.
  50. Skills and Training. Need to identify: • What skills individuals need to have to work with the EOSC • What competencies organisations should have to effectively take part in the EOSC. Some first recommendations: • Produce FAIR training material • Provide “training-as-a-service” • Highlight the relevance of enabling and rewarding data skills development • Refine a Skills Framework in the development of careers and expertise in data stewardship
  51. Next Steps. EOSC Pilot goes on to December 2018 - Governance and Policy Framework - Service architecture, portfolio, rules of engagement, trials - Interoperability recommendations and trials - Training and capability recommendations. EC setting the governance and funding framework with the EC. Next set of EOSC projects announced soon: - Setting the core service provision. EC 2018-21 research infrastructure work programme built around EOSC.
  52. Upcoming events: DI4R conference, 30 Nov-1 Dec, Brussels. www.eoscpilot.eu The European Open Science Cloud for
  53. Thank you! Brian.Matthews@stfc.ac.uk
