DIRISA for Open Data and
Open Science
SA-EU Open Science Dialogue Project
15 May 2018
National Integrated Cyberinfrastructure System (NICIS)
• Advanced integrated cyber platform offering services for
• HPC
• Data
• Networking
• Priority science and education
• Overarching coordination implemented by CSIR
© CSIR, 2018 2
[Diagram: NICIS structure. Layer labels: Core services; Networked resources; Skills & expertise. Components: Computing Services (CHPC +); Networking Services (SANReN); Data Services (DIRISA); e-Research environments (Cloud). Domains: Materials & Manuf.; Energy; Earth & Environment; Phy Sci & Eng.; Humans & Society; Health, Bio & Food.]
DIRISA Objectives
Build national data infrastructure
• Build and maintain Tier 1 nodes and services
• Start Tier 2 domain nodes
Develop human capital and skills
• e-Science postgraduate programs
• Conferences and training workshops
Research data management
• DMP tool
• PID service
• User policies and practices
Advocate and coordinate
• R&D initiatives
• Stakeholder workshops
National Data Infrastructure
• “I just want to store/preserve my data (reliably)”
• “I just want to share my data (in a controlled way)”
• “I just want to process my data”
• NICIS-DIRISA role:
• Link into Tier 0
• Build and maintain Tier 1
• Support starting up Tier 2
• Link into Tier 3
One-Stop-Shop: Federated access to research data
Underpinning Open Data & Open Science
1. National infrastructure and services for Open data
• DIRISA Tier 1 (8 PB) store & Research Data Management services
• Regional Tier 2 Node
2. Human capital development
• National e-Science Masters
• Data Science training
3. Data management
• PID Allocation: Handle and DOI registries
• SA_DMP: SA Data Management Planning tool
• Policies across data life cycle
4. Outreach and coordination
• Conferences and workshops: SA Data Conference 19-21 June
• Africa Open Science Platform
• Big Data strategy
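As an illustration of the PID allocation item above, the following is a minimal sketch of turning a DOI or Handle into a resolver URL. It assumes the public resolvers at doi.org and hdl.handle.net; the function name `pid_to_url` and both sample identifiers are hypothetical and are not DIRISA's actual service or identifiers.

```python
from urllib.parse import quote

# Public resolvers (assumed here; the slides do not name a resolver).
DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "https://hdl.handle.net/"


def pid_to_url(pid: str) -> str:
    """Return a resolver URL for a DOI or Handle of the form prefix/suffix."""
    pid = pid.strip()
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {pid!r}")
    # DOI prefixes begin with "10."; anything else is treated as a Handle.
    base = DOI_RESOLVER if prefix.startswith("10.") else HANDLE_RESOLVER
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return base + quote(pid, safe="/")


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(pid_to_url("10.1234/example-dataset"))
    print(pid_to_url("20.500.12345/6789"))
```

Treating every non-"10." prefix as a Handle is a simplification made for this sketch.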
South African National Data Infrastructure (SANDI)
• DSubscribe: Subscribe as DIRISA user
• DataDrop: Deposit and store data reliably
• FindGet: Discover, download data sets
• SafeShare: Safely share data with users
• DataStage: Prepare data for HPC
User documentation; Help & support; Core services (DMP, PID)
Phase 1: Research Data Management
• My data management plans
• My workflows
• My data sets and outputs
• My communities
Phase 2: Collaborative Research Environments
(References: EUDAT, ANDS, JISC, Data.gov)
Data Access Spectrum: Open by default
Small – Medium – Big data
Personal – Business – Government
Closed – Shared – Open
Internal access
• Private
• Confidential
• Sensitive
• Surveillance data
Named access
• Assigned by contract
• Regulation authorised
• Drivers licences
Group based access
• Project assigned
• Selected membership
• Genomic data
Public access
• Licence that limits use
• Terms and conditions
• Geospatial data
Anyone
• Open to public
• No limits on use
• Weather data
(ODI)
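The spectrum above can be sketched as a small data structure. This is an illustrative model only: the grouping of the five access levels into Closed, Shared and Open is an assumption based on the ODI Data Spectrum, and the names `Access`, `BAND` and `default_access` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    """The five access levels named on the slide."""
    INTERNAL = "Internal access"
    NAMED = "Named access"
    GROUP = "Group based access"
    PUBLIC = "Public access"
    ANYONE = "Anyone"


# Assumed grouping into the Closed / Shared / Open bands (not stated on the slide).
BAND = {
    Access.INTERNAL: "Closed",
    Access.NAMED: "Shared",
    Access.GROUP: "Shared",
    Access.PUBLIC: "Shared",
    Access.ANYONE: "Open",
}


def default_access(restriction: Optional[Access] = None) -> Access:
    """'Open by default': use ANYONE unless a restriction is explicitly given."""
    return restriction if restriction is not None else Access.ANYONE


if __name__ == "__main__":
    for level in (default_access(), default_access(Access.GROUP)):
        print(f"{level.value}: {BAND[level]}")
```

Modelling "open by default" as a fallback value means a data set only moves toward the closed end when a custodian states a reason.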
Actions
Data custodians/stewards: individuals; institutions; groups/consortia; government; business
• Advocate and promote: Increase visibility and benefits of open data
• Clarify Open data and related concepts: Governance, Stewardship, Custodianship, IP, Copyright,…
• Change accreditation model: data citation recognition, altmetrics, etc.
• Develop policies: institutional strategies, standards, protocols, principles and recommended practices
• Change training: Include Open data concepts in (data) science curricula
• Funding model: incentives/requirements for Open data principles
• Harmonise privacy and openness regulation
Beyond FAIR
• FAIR: Findable, Accessible, Interoperable, Reusable
• FAIReR: FAIR + Reproducible
• FAIReST: FAIR + Stewardship + Transparency (truth and trust)
[Liz Lyon, University of Pittsburgh]
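The three definitions above compose naturally, which a short sketch can make concrete. The checklist function `missing_principles` and the sample assessment are hypothetical; they only illustrate how the extended sets build on FAIR.

```python
# Principle sets as defined on the slide.
FAIR = ("Findable", "Accessible", "Interoperable", "Reusable")
FAIRER = FAIR + ("Reproducible",)
FAIREST = FAIR + ("Stewardship", "Transparency")


def missing_principles(principles, satisfied):
    """Return the principles not yet satisfied, in their original order."""
    return [p for p in principles if p not in satisfied]


if __name__ == "__main__":
    # Hypothetical self-assessment of a data set.
    satisfied = {"Findable", "Accessible", "Reusable"}
    print(missing_principles(FAIR, satisfied))
    print(missing_principles(FAIRER, satisfied))
    print(missing_principles(FAIREST, satisfied))
```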
Open Science => New roles…
Thank you
www.dirisa.ac.za
dirisa@csir.co.za
Accelerating Data Intensive Research
“We need to get greater value (benefit, impact) from our investments in data”
Architecture
• Open (FAIR) Data & Open Science
• Federated locally and globally (“One-stop-shop” catalogue)
• Certified as Trusted Repository
• Linked to funder systems
• Suite of services for RDM and data intensive analytics
[Diagram: software defined storage hierarchy, ranging from small, fast to big, slow, with iRODS and the DIRISA cloud portal. Tier labels: archival data & staging (VM access); active data (near real time interactive access); services & staging between DIRISA and CHPC storage systems. Capacities shown: 40 PB, 2 PB, 8 PB, 0.5 PB, * PB. Other components: storage virtualisation server; CHPC Lustre or POSIX storage systems; CHPC compute systems.]

DIRISA for Open Data and Open Science/Anwar Vahed
