The African Open Science Platform
Presented by Susan Veldsman
Director: Scholarly Publishing Programme
Academy of Science of South Africa (ASSAf)
Research Infrastructure Workshop, 14 May 2018
Data Driven World
Trusted Research & Data
• Trust is at the centre of the process of science
• Trust research & researchers who have your
best interest at heart
• Build new research on existing research/data
• To be trusted, it needs to be managed
Square Kilometre Array (SKA)
• Data collection on a massive scale
• Telescope array to consist of 250,000 radio
antennas between Australia & SA
• Investment in machine learning and artificial
intelligence software tools to enable data analysis
• 400+ engineers and technicians in infrastructure,
fibre optics, data collection
• Supercomputers to process data (IBM)
• To come: supercomputer with 3x the power of the
world’s current fastest computer (Tianhe-2) to cope
with SKA data
Africa participation
• Botswana
• Ghana
• Kenya
• Madagascar
• Mauritius
• Mozambique
• Namibia
• Zambia
H3ABioNet (H3Africa)
30 institutions, 15 African countries, 2 partners
outside Africa
• African human genomic research; Central node at University of
Cape Town
• Using NetMap to monitor connectivity
• Data transfer: Africa Globus Online (668,622 files transferred
between Rhodes University & UCT; 140TB data transferred from
USA to SA)
• Challenges: slow & unstable Internet, unreliable power supply,
continent-wide obsolete computer infrastructure that ranges
from medium-scale server infrastructure to a small number of
workstations, with multiple operating systems, lack of centralized,
secure data storage
• Other: database of participants (H3APRDB, REDCap), data analysis
incl. Galaxy, Job Management System, eBiokits, REDCap,
WebProtege, Pipelines for data execution, data repository
(European Genome-Phenome Archive)
Open Science Defined
“Open Science is the practice of science in such a
way that others can collaborate and contribute,
where research data, lab notes and other
research processes are freely available, under
terms that enable reuse, redistribution and
reproduction of the research and its
underlying data and methods.” - FOSTER Project,
funded by the European Commission
Open Data Repositories (re3data - 16)
https://index.okfn.org/place/#map
African Open Science Platform
• Platform = opportunity to engage in dialogue,
create awareness, connect all, provide continental
view
• Funded by SA Dept. of Science & Technology
through National Research Foundation
• 3 years (1 Nov. 2016 – 31 Oct. 2019)
• Managed by Academy of Science of South Africa
(ASSAf)
• Through ASSAf hosting ICSU Regional Office for Africa
(ICSU ROA)
• Direction from CODATA
http://africanopenscience.org.za/
Accord on Open Data in a
Big Data World
• Values of open data in
emerging scientific culture
of big data
• Need for an international
framework
• Proposes comprehensive
set of principles
• FAIR Principles
• Provides framework & plan
for African data science
capacity mobilization
initiative
• Proposes African Platform
Call to Endorse
Key Stakeholders
• Global Network of Science Academies (IAP)
• International Council for Science (ICSU)
• The World Academy of Sciences (TWAS)
• Research Data Alliance (RDA)
• NRENs (Internet Service Providers for Education)
• Association of African Universities (AAU)
• Network of African Science Academies (NASAC)
• African Research Councils (incl. DIRISA, funders)
• African Universities
• African Governments
• Other
Database Experts & Initiatives
800+
Landscape Survey: Countries & Initiatives
567
Concentration of Activities
Click to view Initiatives/Country
https://www.targetmap.com/viewer.aspx?reportId=56245
Please note: this is only a preview; the data still need to be cleaned,
updated and corrected.
AOSP Focus Areas
• Policy
• Infrastructure
• Capacity Building
• Incentives
Policy Framework
• Policies provide guidance & see to the well-being of
all citizens - political will
• Policies to address (also see existing policies):
• FAIR Principles
• Raw vs Processed/other data
• Licensing
• Sensitive data
• Intellectual Property Issues
Policy Framework
• JKUAT (Kenya) Institutional Open Data Policy
• Uganda Draft Open Data Policy
• Madagascar Lobbying for Open Data Policy
• Towards a White Paper on Open Research Data
Strategy in Botswana
• White Paper on Science, Technology and Innovation in
South Africa
• South Africa Open Science Framework—EU/SA
dialogue
• Funder Policy: National Research Foundation (NRF) (SA)
• OECD Principles & Guidelines for Access to Research
Data from Public Funding
Capacity Building Framework
• Data collector vs data user vs data manager
Therefore the following are core aspects of capacity building:
• Research Data Management Planning
• Repositories
• Command Line Interpretation
• Software Development
• Data Organisation
• Data Cleaning
• Data Management & Databases
• Data Analysis & Visualisation (incl. programming)
Capacity Building Framework
• Engineers, Statisticians, Data Scientists, Librarians, Data
Curators, Researchers, System Administrators,
Policymakers, Auditors, Data Centre Managers, Data
Architects – Wim Hugo
• Different skills for different categories of data workers
• Existing workshops presented
• Tertiary curricula need to adapt more rapidly
• Never too early to learn to work with data and to program
Incentives Framework
• Funder requirements changing
• Mechanisms that acknowledge publication of
datasets and promote data sharing
• How do we deal with difficulties in sharing
data, and what are the solutions?
• Why is sharing essential?
• How do we make sharing successful?
• How do we allay fears and ensure buy-in?
African NRENs
Source: https://www.geant.org/News_and_Events/PublishingImages/WorldBank_Map2.jpg
• Cluster 1: Southern and
Eastern Africa: UbuntuNet Alliance
• Cluster 2: Western and
Central Africa: WACREN
• Cluster 3: North Africa: ASREN
South African Research Infrastructure
Roadmap, DST
• Focus on global infrastructure in South Africa
• South African Research Infrastructure
Roadmap (SARIR) has been developed to
facilitate a research infrastructure investment
programme
• “SARIR is intended to provide a strategic,
rational, medium to long term framework for
planning, implementing, monitoring and
evaluating the provision of research
infrastructures (RIs) necessary for a
competitive and sustainable national system of
innovation” (DST, 2016:2).
National Integrated
Cyberinfrastructure Service (NICIS)
• Research Infrastructure is dependent on cyber-
infrastructure.
• This dependency refers to access to physical sites, data
sharing, curation, provenance, protection, and developing
interoperability and metadata standards.
• From the start of any RI programme, E-science and cyber-
infrastructure need to allow virtual access and open access
to national and international data.
• Centre for High Performance Computing
• South African National Research Network
• Data Intensive Research Initiative of South Africa
• Together these “will provide the necessary cyber-infrastructure capabilities
for the successful operation of all the RIs on a generic basis,
this will be known as the National Integrated
Cyberinfrastructure Service” (DST, 2016:55)
Ilifu
• http://www.researchsupport.uct.ac.za/ilifu
• Consortium of 6 Western Cape institutions
• Data-centric, high-performance computing facility
for data-intensive research
• Prototyping distributed, federated cloud-based
infrastructure as a platform for data-intensive
research (African Research Cloud)
• Data-processing pipelines and e-science research
tools for big data analysis, visualisation and analytics
• Development and implementation of research data
management systems and tools
• Development of platforms, portals and middleware
to support access and collaborative research by
distributed teams on data-intensive projects
SADC Cyber-Infrastructure Framework
• Towards strategy and action plan,
implementation plan and governance structure
• Support strategic plans on Science, Technology,
Innovation
• Guide on creating an enabling environment to
harness science, technology and innovation
• Impact socio-economic development & industrialization
• Enhance education in developing & using technologies
• Support collaborative research development &
innovation
• Cyber-infrastructure is a key driver for a
knowledge based economy
• Comprises technologies, skills, people and
policies which support generation, analysis,
transport, sharing, stewardship of information
(incl. data)
• Framework provides Roadmap towards Cyber-
infrastructure Strategy
[Source: Colin Wright SADC/ET-ST1/1/2016/11 Document]
Closing Remarks
• Collaborate & learn from one another –
strength in diversity
• Take ownership & collect/curate data in ethical
way
• Downloaders vs Uploaders
• Trusted & valid data managed in trusted way
• Exploit data for the benefit of society (Minister
Naledi Pandor)
• Tell the African story, in an African way
http://internationaldataweek.org/
Objectives
• Identify the issues that need to be dealt with when
drafting framework/roadmap of RI
• Raise awareness within RENs regarding services
beyond connectivity
• Consult with YOU as experts to shape:
• Our own ideas about the way forward
• Possible Phase two of project
• Identify a “writer”
• Identify a “team” to assist with a
framework/roadmap
Thank you!
Susan Veldsman
susan@assaf.org.za
http://africanopenscience.org.za

Editor's Notes

  • #3 We are living in an increasingly data-driven world: Facebook, Twitter, Airbnb, Uber. Malaria outbreak 2014-2015. World Economic Forum 2018. How to get rid of fake data.
  • #7 Collaborative projects in Biomedical Sciences: genomics research, catching up with outbreaks (Ebola, malaria and more). Bioinformatics leg of H3Africa (Human Heredity and Health in Africa). Work among 30 institutions, 15 African countries, 2 partners outside Africa.
  • #12 To get Africa talking to one another
  • #19 Engineers, Statisticians, Data Scientists, Librarians, Data Curators, Researchers, System Administrators, Policymakers, Auditors, Data Centre Managers, Data Architects – Wim Hugo
  • #28 Ilifu is a consortium of Western Cape institutions that together will establish and operate a data-centric, high-performance computing facility for data-intensive research. The partner institutions are: Cape Peninsula University of Technology; Stellenbosch University; Sol Plaatje University; South African Radio Astronomy Observatory (SARAO, formerly SKA South Africa); University of Cape Town (lead institute); University of the Western Cape. In addition to establishing and operating a data-intensive computing facility, the consortium will, in collaboration with local and international collaborators and partners, undertake research and development programmes for: prototyping a distributed, federated cloud-based infrastructure as a platform for data-intensive research, the African Research Cloud; development of data-processing pipelines and e-science research tools for big data analysis, visualisation and analytics; development and implementation of research data management systems and tools; development of platforms, portals and middleware to support access and collaborative research by distributed teams on data-intensive projects.