
DIRISA for Open Data and Open Science/Anwar Vahed

Presented during the SA-EU Dialogue Facility, 15-16 May 2018.

Published in: Data & Analytics

  1. DIRISA for Open Data and Open Science. SA-EU Open Science Dialogue Project, 15 May 2018
  2. National Integrated Cyberinfrastructure System (NICIS)
     • Advanced integrated cyber platform offering services for:
       • HPC
       • Data
       • Networking
       • Priority science and education
     • Overarching coordination implemented by CSIR
     • Diagram labels: Core services; Networked resources; Skills & expertise; Computing Services (CHPC +); Networking Services (SANReN); Data Services (DIRISA); e-Research environments (Cloud); Materials & Manuf.; Energy; Earth & Environment; Phy Sci & Eng.; Humans & Society; Health, Bio & Food
     © CSIR, 2018
  3. DIRISA Objectives
     • Build national data infrastructure
       • Build and maintain Tier 1 nodes and services
       • Start Tier 2 domain nodes
     • Develop human capital and skills
       • e-Science postgraduate programs
       • Conferences and training workshops
     • Research data management
       • DMP tool
       • PID service
       • User policies and practices
     • Advocate and coordinate
       • R&D initiatives
       • Stakeholder workshops
  4. National Data Infrastructure
     • “I just want to store/preserve my data (reliably)”
     • “I just want to share my data (in a controlled way)”
     • “I just want to process my data”
     • NICIS-DIRISA role:
       • Link into Tier 0
       • Build and maintain Tier 1
       • Support starting up Tier 2
       • Link into Tier 3
     • One-Stop-Shop: Federated access to research data
  5. Underpinning Open Data & Open Science
     1. National infrastructure and services for Open data
       • DIRISA Tier 1 (8 PB) store & Research Data Management services
       • Regional Tier 2 Node
     2. Human capital development
       • National e-Science Masters
       • Data Science training
     3. Data management
       • PID Allocation: Handle and DOI registries
       • SA_DMP: SA Data Management Planning tool
       • Policies across data life cycle
     4. Outreach and coordination
       • Conferences and workshops: SA Data Conference 19-21 June
       • Africa Open Science Platform
       • Big Data strategy
  6. South African National Data Infrastructure (SANDI)
     • DSubscribe: Subscribe as DIRISA user
     • DataDrop: Deposit and store data reliably
     • FindGet: Discover, download data sets
     • SafeShare: Safely share data with users
     • DataStage: Prepare data for HPC
     • User documentation
     • Help & support
     • Core services (DMP, PID)
     • Phase 1: Research Data Management
       • My data management plans
       • My workflows
       • My data sets and outputs
       • My communities
     • Phase 2: Collaborative Research Environments
     (References: EUDAT, ANDS, JISC, Data.gov)
  7. Data Access Spectrum: Open by default
     • Small – Medium – Big data
     • Personal – Business – Government
     • Closed – Shared – Open
     • Internal access
       • Private
       • Confidential
       • Sensitive
       • Surveillance data
     • Named access
       • Assigned by contract
       • Regulation authorised
       • Drivers licences
     • Group based access
       • Project assigned
       • Selected membership
       • Genomic data
     • Public access
       • Licence that limits use
       • Terms and conditions
       • Geospatial data
     • Anyone
       • Open to public
       • No limits on use
       • Weather data
     (ODI)
  8. Actions
     Data custodians/stewards: individuals; institutions; groups/consortia; government; business
     • Advocate and promote: Increase visibility and benefits of open data
     • Clarify Open data and related concepts: Governance, Stewardship, Custodianship, IP, Copyright,…
     • Change accreditation model: data citation recognition, altmetrics, etc.
     • Develop policies: institutional strategies, standards, protocols, principles and recommended practices
     • Change training: Include Open data concepts in (data) science curricula
     • Funding model: incentives/requirements for Open data principles
     • Harmonise privacy and openness regulation
  9. Beyond FAIR
     • FAIR: Findable, Accessible, Interoperable, Reusable
     • FAIReR: FAIR + Reproducible
     • FAIReST: FAIR + Stewardship + Transparency (truth and trust) [Liz Lyon, University of Pittsburgh]
     Open Science => New roles…
  10. Thank you
      www.dirisa.ac.za
      dirisa@csir.co.za
  13. Accelerating Data Intensive Research
      “We need to get greater value (benefit, impact) from our investments in data”
  14. Architecture
      • Open (FAIR) Data & Open Science
      • Federated locally and globally (“One-stop-shop” catalogue)
      • Certified as Trusted Repository
      • Linked to funder systems
      • Suite of services for RDM and data intensive analytics
      • Diagram labels (software defined storage hierarchy, from small and fast to big and slow): 40 PB; 2 PB; Archival data & staging: VM access; 8 PB; Active data: near real time interactive access; 0.5 PB; Services & staging between DIRISA and CHPC storage systems; Storage Virtualisation Server; CHPC Lustre or Posix storage systems; CHPC compute systems; * PB; iRODS; DIRISA cloud portal
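
The five access levels on the Data Access Spectrum slide can be read as an ordered scale that falls into three bands (Closed, Shared, Open). The sketch below is only an illustration of that reading, not part of the presentation or of any DIRISA service: the names `AccessLevel`, `spectrum_band` and `SLIDE_EXAMPLES` are hypothetical, and the grouping of the three middle levels under "Shared" is an assumption based on the ODI data spectrum the slide cites.

```python
from enum import Enum


class AccessLevel(Enum):
    """Access levels from the slide, ordered from most closed to most open."""
    INTERNAL = 1  # private, confidential, sensitive
    NAMED = 2     # assigned by contract or authorised by regulation
    GROUP = 3     # project assigned or selected membership
    PUBLIC = 4     # public, but under a licence or terms that limit use
    ANYONE = 5    # open to the public, no limits on use


def spectrum_band(level: AccessLevel) -> str:
    """Map an access level to its band (assumed grouping: see lead-in)."""
    if level is AccessLevel.INTERNAL:
        return "Closed"
    if level is AccessLevel.ANYONE:
        return "Open"
    return "Shared"


# Example data types given on the slide for each level.
SLIDE_EXAMPLES = {
    "Surveillance data": AccessLevel.INTERNAL,
    "Drivers licences": AccessLevel.NAMED,
    "Genomic data": AccessLevel.GROUP,
    "Geospatial data": AccessLevel.PUBLIC,
    "Weather data": AccessLevel.ANYONE,
}

if __name__ == "__main__":
    for name, level in SLIDE_EXAMPLES.items():
        print(f"{name}: {level.name.title()} access ({spectrum_band(level)})")
```

Encoding the levels as an ordered enum keeps "open by default" easy to express: a repository could start every data set at `ANYONE` and require a stated reason to move it towards `INTERNAL`.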
