Policy, infrastructure, skills and incentives driving
African data sharing
The African Open Science Platform Project
Policy | Infrastructure | Skills | Incentives
Tshiamo Motshegwa, Computer Science, University Of Botswana
On behalf of the
Academy of Science of South Africa (ASSAf)
PV2018 Conference
About UB
• Largest established university in Botswana,
• 7 Established Faculties & Schools
- Science, Engineering and Technology, Faculty of Health Sciences, School of Medicine,
Business, Humanities, Social Sciences.
• School of graduate Studies & Well resourced Library
- 5 Stories high, 460,000 Books, 123,000 Full Text Journals, 200 Workstations
• Office of Research & Development
- Small commercialization unit, Research management systems, Digital
Repository for university research
• 9 Research Centres & Institutes,
• A detailed University Research Strategy approved 2008,
• Strategic goals and vision for research intensive institution.
INTERNATIONAL DATA WEEK
IDW 2018
Gaborone, Botswana: 5-8 November 2018
Information: http://internationaldataweek.org/
Deadline for abstracts, 31 May:
https://www.scidatacon.org/IDW2018/
Upcoming Data Events
Digital Frontiers of Global Science
Frontier issues for research in a global and digital age.
Applications, progress and challenges of data intensive research.
Data infrastructure and enabling practices for international and
collaborative research.
Data, development and innovation: data as an interface between
research, industry, government, society and development.
Themes: research and data; data science and data analysis;
data stewardship; policy and practice of data in research;
education and data; data, society, ethics and politics; open data,
innovation and development; data and cybersecurity
SciDataCon part of
International Data Week
 SciDataCon aims to help this community ensure that it has a concrete scientific
record of its work: peer reviewed abstracts > presentations > Special Collection in
the Data Science Journal.
 Themes and Scope: see
https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/
 Approved Sessions:
https://www.scidatacon.org/conference/IDW2018/approved_sessions/
 Incredibly rich range of topics. If you do not find a topic there you can submit an
abstract to the general submissions.
 Abstracts can be submitted to Approved Sessions or to General Submissions. Will
be peer reviewed and distributed into the programme.
 Abstracts for presentations and lightning talks/posters.
 Deadline is 31 May:
https://www.scidatacon.org/conference/IDW2018/call_for_papers/
170+ Submitted , 65 Accepted, ~45% Developing Countries Relevant Themes
Part 0
Collaborations -Knowledge,
Networks & Nations
Patterns & Trends - research Collaborations
Part I
Revolutions, Paradigm Shifts, Open
Data Open Science and SDGs
On Revolutions & Paradigm Shifts
On Revolutions & Paradigm Shifts
Source: William Genovese, Huawei
On Revolutions & Paradigm Shifts
https://www.weforum.org/agenda/2016/04/africa-doesn-t-just-need-a-digital-revolution-it-needs-a-cultural-one-too/
Digital What What
Source: Serges Nanfack -The Learning Experience @ Cisco Networking Academy
Doing Science
• Scientific method not going anywhere
• …But
• Experimental - > Theoretical - > Data Intensive?
Software Processes
• Then -> Future/Now
• Source Code -> Data
• Compiler -> Deep Learning
• Executable -> Predictor
Mike Houston – NVIDIA Distinguished Engineer – Production
Deep Learning and Scale, SC’17 Keynote
Data Driven World
Fake Data, Fake Research
http://www.bbc.com/news/science-environment-39357819
• Official data: 30
Allegations of
research misconduct
in 2012 – 2015
• BBC obtained
figures under FoI
rules: Hundreds in
23 Universities over
same period
• Global concern over
research integrity
• House of Commons
enquiry to reassure
public
Getting more value from data
© CSIR, 2017
25
How can we improve data sharing?
How to make data sharing work?
• Interoperability?
• Data science expertise and skills?
• Security, trust and identity?
• Findability, accessibility and services?
• Business models, sustainability and
policies?
Emerging Policy Consensus? FAIR Data
• FAIR Data (see original guiding principles at
https://www.force11.org/node/6062
• Findable: have sufficiently rich metadata and a unique and persistent
identifier.
• Accessible: retrievable by humans and machines through a standard
protocol; open and free by default; authentication and authorization where
necessary.
• Interoperable: metadata use a ‘formal, accessible, shared, and broadly
applicable language for knowledge representation’.
• Reusable: metadata provide rich and accurate information; clear usage
license; detailed provenance.
What is Open Science? (1)
 Open access to research literature.
 Data that is as Open as possible, as closed as
necessary.
 FAIR Data (Findable, Accessible, Interoperable,
Reusable).
 Data is a recognised and important output of
research.
 A culture and methodology of open discussion and
enquiry (including methodology, lab notebooks, pre-
prints).
 Data code and analysis processes are shared for
reproducibility.
 Engagement with society and the economy in
research activities (citizen science, co-design /
transdisciplinary research, interface between
research, development and innovation).
Why Open Science / FAIR Data?
• Good scientific practice depends on communicating the
evidence.
• Open research data are essential for reproducibility, self-
correction.
• Academic publishing has not kept up with age of digital data.
• Danger of an replication / evidence / credibility gap.
• Boulton: to fail to communicate the data that supports
scientific assertions is malpractice
• Open data practices have transformed certain areas of research.
• Genomics and related biomedical sciences; crystallography;
astronomy; areas of earth systems science; various disciplines
using remote sensing data…
• FAIR data helps use of data at scale, by machines,
harnessing technological potential.
• Research data often have considerable potential for reuse,
reinterpretation, use in different studies.
• Open data foster innovation and accelerate scientific discovery
through reuse of data within and outside the academic system.
• Research data produced by publicly funded research are a
public asset.
Open Science (incl. Data) Defined
“Open Science is the practice of science in such a
way that others can collaborate and contribute,
where research data, lab notes and other
research processes are freely available, under
terms that enable reuse, redistribution and
reproduction of the research and its
underlying data and methods.” - FOSTER Project,
funded by the European Commission
Open Science and FAIR Data:
Benefits for Stakeholders
 Government and Innovation / Development
 Increased impact from investment in activities relating to data; economic, innovation and
research benefits.
 Partnerships for research, development and innovation around co-design, Open Science
and FAIR data.
 Research Institutions:
 Development of data capacity and data skills;
 Not losing valuable data (stored on hard drives, not annotated or reusable);
 Shop window of research activities and expertise (Open Access, Open Data / FAIR Data)
 Capacity to build research schools around data assets and skills, attract international
collaboration and investment.
 Build case for ‘data sovereignty’, data (re-)patriation.
 Researchers:
 Increased data skills, expertise in FAIR data builds competitive edge.
 Citation advantage of Open Access / Open Data.
 Culture of certain research disciplines is already strongly in favour of Open Data / Open
Science.
Open Data, Open Science &
Research Lifecycle (Foster)
Original Research Data Lifecycle image from University of California, Santa Cruz
http://guides.library.ucsc.edu/datamanagement/
Repositories
Repositories
Tools
Plan
Policy&Infrastructure
Data & Sustainable Development
Goals
9: Industry, Innovation Infrastructure
https://theodi.org/news/new-report-reveals-how-open-data-is-fuelling-problemsolving-in-the-developing-
world-from-mapping-ebola-to-protecting-banana-crops
Data in support of the SDGs
Protecting banana farmers’
livelihoods (Uganda)
Using maps to increase access
to education (Kenya)
Monitoring child malnutrition
(Uganda)
http://theconversation.com/what-115-years-of-data-tells-us-about-africas-battle-with-malaria-past-and-
present-85482
The prevalence of malaria infection in sub-
Saharan Africa today is at the lowest point since
1900.
http://www.nature.com/news/data-sharing-make-
outbreak-research-open-access-1.16966
Data innovations Examples – Earth
Observation
• Planet Labs - https://www.planet.com/
• Use cases – Agriculture, Mapping, Defense,Forestry
Data innovations Examples – Precision
Agriculture
• Crop Yield Monitoring & Modelling
Princeton University researchers use a low-cost, low-energy device to collect environmental data in
Burkina Faso and Zambia that can help with crop-yield monitoring and modeling.
Data innovations Examples – Earth
Observation
• Radiant Earth - https://radiant.earth/
• Upcoming Botswana Mapathon@UB
Rapidly Developing Thunderstorms
• MeteoSat Second
Generation
satellite data
• Merged with
numerical
weather
prediction data
• Track
thunderstorms
Weather & Climate
**Slides courteous of Dr Mary-Jane Bopape, SAWS
Part I
About the African Open Science
Platform
Observation
“Several open science activities are underway
across Africa, but a great deal will be gained if, in
the context of developing inter-regional links,
these activities were to be coordinated and
developed through such a coordinating
initiative.” - CODATA
Accord on Open Data in a
Big Data World
• Proposes
comprehensive set of
principles
• FAIR Principles
• Data as open possible,
as closed necessary
• Provides framework &
plan for African data
science capacity
mobilization initiative –
AOSP
Call to Endorse
African Open Science Platform (AOSP)
• Platform = opportunity to engage in dialogue,
create awareness, connect all, provide continental
view
• Funded by SA Dept. of Science & Technology
through National Research Foundation
• 3 years (1 Nov. 2016 – 31 Oct. 2019)
• Managed by Academy of Science of South Africa
(ASSAf)
• Through ASSAf hosting ICSU Regional Office
for Africa (ICSU ROA)
• Direction from CODATA
http://africanopenscience.org.za/
 Key deliverables of the pilot project will be foundations for the platform in these four
key area:
1. Frameworks and guidance to assist policy development at national and
institutional level.
2. Study and recommendations to reduce barriers and provide constructive
incentives for Open Science.
3. Framework for data science training (including RDM, data stewardship and
science of data); curriculum framework, training materials, recommendations
for training initiatives.
4. Framework and roadmap for data infrastructure development: emphasising
partnerships and de-duplication between national systems, economies of scale,
institutions and domain initiatives.
Framework for Policies, Incentives,
Training and Technical
Infrastructures
Vision and Mission of an
African Open Science Platform
 African scientists are at the cutting edge of contemporary, data-intensive science as a
fundamental resource for a modern society.
 A digital ecosystem with five complementary aims governed by a set of common
principles and practices:
1. A virtual space for scientists to find, deposit, manage, share and reuse data,
software and metadata;
2. A means of continually developing capacities at all levels of national science
systems and amongst professionals and their institutions operating in the public
and private domain;
3. A basis for multi-stakeholder consortia that wish to utilise powerful digital
tools in addressing major common problems, and for work in the trans-
disciplinary mode;
4. A forum for exchange of ideas, best practices and opportunities amongst
Platform partners and with the international data-science community.
5. An African Data Science Institute, to advance the frontiers of data science and
provide support for interdisciplinary research domains where there are
particularly strong data assets in Africa.
AOSP Focus Areas (Frameworks &
Roadmaps)
Policy Infrastructure
Capacity
Building
Incentives
SA, Uganda,
Botswana,
Madagascar,
Kenya, Ethiopia
Expected Deliverables
1. A framework for each focus areas, to promote the
development and coordination of data policies, data
training, data incentives and data infrastructure;
2. A database with information on open science (incl. open
data) initiatives and key role players/experts in open
science, across all disciplines;
3. A policy toolkit guiding governments on implementing
open science policies;
4. An advocacy toolkit to assist representatives from various
African countries to create more awareness of the
benefits of open science;
5. A competency index for all working with data; and
6. A training toolkit, building on existing resources.
Key Stakeholders
• Global Network of Science Academies (IAP)
• International Council for Science (ICSU)
• The World Academy of Sciences (TWAS)
• Research Data Alliance (RDA)
• NRENs (Internet Service Providers for Education)
• Association of African Universities (AAU)
• Network of African Science Academies (NASAC)
• African Research Councils (incl. DIRISA, funders)
• African Universities
• African Governments
• Other
Advocate & create awareness of the need for open science/open
data policies & research data to be FAIR and as open possible, and
well managed for the unforeseeable future.
Outcomes: Meetings, Workshops, Conferences, Training
Conduct a landscape study to better understand and provide a
coherent view of open data activities on the African Continent.
Outcomes: Database, Visualisation of data activities &
stakeholders
Phase1Year1
Phase1Year2
Develop, test and publish frameworks and roadmaps to guide
African governments on developing Policies, ICT Infrastructure,
Skills and Incentives, in support of open research data sharing.
Outcomes: Frameworks & Roadmaps
Continue creating awareness of the need for research data to be
open, and profile African initiatives globally, also through
IDW2018.
Outcomes: Meetings, Workshops, Conference
Phase1Year3
Tobeapproved
Conduct a series of CODATA-RDA-AOSP Data Schools, funded
through the AOSP project.
Outcomes: Data Schools
Consolidating & finalizing work conducted during Phase 1. Work
towards a conceptual plan as basis for an African Open Science
Commons.
Outcomes: Conceptual Plan for an African Open Science Commons
Part III
Landscape Survey
Some Precedence - Africa Data
Consensus Study
• Adopted in March 2015 at High Level Conference
on Data Revolution
• Strategy for implementing data revolution in
Africa
• Plan of action to be guided by United Nations
Economic Commission for Africa (UNECA), African
Union Commission (AUC), African Development
Bank (AfDB), supported by UN Development
Programme (UNDP), UN Populations Fund
(UNFPA)
• Implemented in collaboration with partner
institutions from public & private sectors, civil
society organisations
AOSP Study of Data Activities in
Africa
AOSP Study of Data Activities in Africa:
Initial Results - Coverage
https://www.targetmap.com/viewer.aspx?reportId=56245
Please note: this
is just a preview
and data still to
be cleaned and
updated and
corrected.
Projects & Initiatives
Results - Some Selected Initiatives
Tunisia
Data Computing Centre el Khawarizmi
Kenya
Data Centre & Services @ Research & Education
Network (KENET)
South Africa
DIRISA, Data Intensive Research Cloud Infrastructure
Initiatives – ARC, SADIRC, Ilifu
High Performance Computing Centres
Botswana, Lesotho, Mozambique, SA, Tanzania,
Zambia, Zimbabwe
Open Data for Africa
African Development Bank
Initiatives,HPCCs,Services
Example National Institutional
Initiative – Ilifu, South Africa
• http://www.researchsupport.uct.ac.za/ilifu
• Consortium of 6 Western Cape institutions
• Data-centric, high-performance computing facility
for data-intensive research
• Proto-typing distributed, federated cloud-based
infrastructure as a platform for data-intensive
research (African Research Cloud)
• Data-processing pipelines and e-science research
tools for big data analysis, visualisation and
analytics
• Development and implementation of research data
management systems and tools
• Development of platforms, portals and middleware
to support access and collaborative research by
distributed teams on data-intensive projects
Regional SADC
Cyberinfrastructure Framework[SADC
member states & Working Group
Approved 30th June 2016 at Joint Meeting of Ministers of Education& Training And
Science & Technology]
SADC 15
Members
Cyberinfrastructure Ecosystem
Source: Colin Wright,SADC CI Framework
Components of a CI
• National Research Networks - Specialized broadband
infrastructure networks and service providers for
education, research and innovation ,
• Computational Resources - Ranging from HPC to
other computing capabilities ,
• Data - tools and facilities (including repositories) to
enable sharing and efficient data driven discoveries,
technologies and innovations,
• Policies - To enable optimal establishment and
utilization of cyber-infrastructure, generation, analysis,
transport as well as stewardship of information, and
• Human Capital - To make effective use of the
Cyberinfrastructure.
Data infrastructure
Source: E‐Infrastructure Roadmap Engineering and Physical
Sciences Research Council (EPSRC)
Immediate Opportunities:
International Collaborations
H3ABioNet (H3Africa)
Bioinformatics Network - 32 institutions, 15
African countries, 2 partners outside Africa
Immediate Opportunities:
International Collaborations
Source: SKA Africa
SKA Science Case
Source: SKA Africa
SKA Science Goals
• How do galaxies evolve? What is dark
energy?
• Does General Relativity theory break down in
strong gravity?
• What generates giant magnetic fields in
space?
• How were the first black holes and stars
formed?
• Are we alone?
[https://www.skatelescope.org/science/]
SKA Data Requirements
Source: SKA Africa
Figure 2 Showing SKA Data requirements [ Source HPCWire and Philip Diamond, Director General of SKA, and
Rosie Bolton, SKA Regional Centre Project Scientist Supecomputing, SC17, Keynote talk 17th November, 2017,
Denver, Colorado, USA].
https://www.hpcwire.com/2017/11/17/sc17-keynote-hpc-powers-ska-efforts-peer-deep-cosmos/
Astronomers
estimate that the
project will
generate 35,000-
DVDs-worth of
data every second.
This is equivalent
to “the whole
world wide web
every day,” said
Fanaroff.”
Immediate Opportunities:
Continental Collaborations – African Very
Long Base Interferometry Network
Source: SKA Africa
Immediate Regional Opportunities
Immediate Opportunities:
International Collaborations
Immediate Opportunities
National – sharing instruments?( Botswana)
Focus Areas – To be unpacked in
implementation
• Policy/Strategy development, institutionalisation, implementation
and support;
• Education, Research & Development and Innovation;
• Cyber-Infrastructure support for education;
• Support existing and new Centres of Excellence (CoE) and provide them
with tools;
• Cyber-Infrastructure support for research by promoting research
collaborations and supporting regional flagship projects; and
• Cyber-Infrastructure support for innovation
• Human Capital Development;
• Infrastructure development;
• Resource mobilisation;
• Communication, Awareness & Advocacy; and
• Strategic Partnerships.
Progress
- Connectivity
-High Performance Computing
-Data
Human capital Development
World Bank Report on Role of NRENs
in Africa
CMM of NRENs in SADC Region
[World Bank Report - “The role and status of African National Research and Education Networks” (Foley, 2016).]
External Connectivity
ICTInfrastructure(NRENs) ASREN
WACREN UbuntuNet
HPC Capacity Ecosystems
Development
SA CHPC
Cambridge HPCS
TACC – University Of Texas
TACC Ranger Voyage Story
Example - University Of Botswana – CS
Dept.
HPC ECOSYSTEMS GLOBAL MAP (2017-10+)
Slide from HPC Ecosystems Project [Bryan Johnston, AceLab CHPC]
Human Capital Development
A cyber-infrastructure training map [Source Neil Chue Hong, Software Sustainability Institute]
19 Southern African scholars attend workshop at TACC – Source Stem-Trek
SADC-TACC Workshop@TACC 2015
SADC Delegates @ SC’15
US/Pan-African Workshop: HPC
On Common Ground @ SC16
URISC@ SC’17 Denver,Colorado
Understanding Risk in Shared
Cyberecosystems Source: Stem-Trek
[Datascience Training: @International Center
For Theoretical Physics]
CODATA-RDA Datascience Schools@ICTP
CODATA-RDA Datascience Schools@ICTP
[2016 School -Image Source : CODATA]
Part III Upcoming Events
INTERNATIONAL DATA WEEK
IDW 2018
Gaborone, Botswana: 5-8 November 2018
Information: http://internationaldataweek.org/
Deadline for abstracts, 31 May:
https://www.scidatacon.org/IDW2018/
Digital Frontiers of Global Science
Frontier issues for research in a global and digital age.
Applications, progress and challenges of data intensive research.
Data infrastructure and enabling practices for international and
collaborative research.
Data, development and innovation: data as an interface between
research, industry, government, society and development.
Themes: research and data; data science and data analysis;
data stewardship; policy and practice of data in research;
education and data; data, society, ethics and politics; open data,
innovation and development; data and cybersecurity
SciDataCon part of
International Data Week
 SciDataCon aims to help this community ensure that it has a concrete scientific
record of its work: peer reviewed abstracts > presentations > Special Collection in
the Data Science Journal.
 Themes and Scope: see
https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/
 Approved Sessions:
https://www.scidatacon.org/conference/IDW2018/approved_sessions/
 Incredibly rich range of topics. If you do not find a topic there you can submit an
abstract to the general submissions.
 Abstracts can be submitted to Approved Sessions or to General Submissions. Will
be peer reviewed and distributed into the programme.
 Abstracts for presentations and lightning talks/posters.
 Deadline is 31 May:
https://www.scidatacon.org/conference/IDW2018/call_for_papers/
170+ Submitted , 65 Accepted, ~45% African-centric
Next Steps
Phase 2 of the project will be focussing on
developing an actual African Open Science
Commons, similar to the European Open Science
Cloud (EOSC),
• integrating open science initiatives,
• policy,
• infrastructure,
• skills development and
• incentives.
Summary
1. Ongoing phased African Open Science platform project on data, data
policies and development of data infrastructure in the African continent
2. A landscape survey has been conducted - focus areas - policy,
infrastructure, capacity building and incentives
3. The survey reveals a significant number of initiatives and developments in
infrastructure and human capital development, primarily driven by project
needs and preparedness
4. There is however limited advances in policy and incentive schemes
5. The main challenge in the project was to direct the focus on research data
vs open government data
6. Policy development and slow and limited,
7. Phase 2 of the project will be focussing on developing an actual African
Open Science Commons, similar to the European Open Science Cloud
(EOSC), integrating open science initiatives, policy, infrastructure, skills
development and incentives.
Summary
“You can have data without information, but you
cannot have information without data.” –
Daniel Keys Moran, an American computer programmer and science
fiction writer.
“Data is a precious thing and will last longer than
the systems themselves.” – Tim Berners-Lee, inventor of the
World Wide Web.
Thank you
Ina Smith
Project Manager, African Open Science Platform Project, Academy of
Science of South Africa (ASSAf)
ina@assaf.org.za
Tshiamo Motshegwa
Computer Science Department, University Of Botswana
Tshiamo.motshegwa@mopipi.ub.bw
Acknowledgments
Simon Hodson – executive director, CODATA
simon@codata.org
Visit http://africanopenscience.org.za

The African Open Science Platform: Policy, Infrastructure, Skills and Incentives driving Open Science/Tshiamo Motshegwa

  • 1.
    Policy, infrastructure, skillsand incentives driving African data sharing The African Open Science Platform Project Policy | Infrastructure | Skills | Incentives Tshiamo Motshegwa, Computer Science, University Of Botswana On behalf of the Academy of Science of South Africa (ASSAf) PV2018 Conference
  • 2.
    About UB • Largestestablished university in Botswana, • 7 Established Faculties & Schools - Science, Engineering and Technology, Faculty of Health Sciences, School of Medicine, Business, Humanities, Social Sciences. • School of graduate Studies & Well resourced Library - 5 Stories high, 460,000 Books, 123,000 Full Text Journals, 200 Workstations • Office of Research & Development - Small commercialization unit, Research management systems, Digital Repository for university research • 9 Research Centres & Institutes, • A detailed University Research Strategy approved 2008, • Strategic goals and vision for research intensive institution.
  • 5.
    INTERNATIONAL DATA WEEK IDW2018 Gaborone, Botswana: 5-8 November 2018 Information: http://internationaldataweek.org/ Deadline for abstracts, 31 May: https://www.scidatacon.org/IDW2018/ Upcoming Data Events
  • 6.
    Digital Frontiers ofGlobal Science Frontier issues for research in a global and digital age. Applications, progress and challenges of data intensive research. Data infrastructure and enabling practices for international and collaborative research. Data, development and innovation: data as an interface between research, industry, government, society and development.
  • 7.
    Themes: research anddata; data science and data analysis; data stewardship; policy and practice of data in research; education and data; data, society, ethics and politics; open data, innovation and development; data and cybersecurity
  • 8.
    SciDataCon part of InternationalData Week  SciDataCon aims to help this community ensure that it has a concrete scientific record of its work: peer reviewed abstracts > presentations > Special Collection in the Data Science Journal.  Themes and Scope: see https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/  Approved Sessions: https://www.scidatacon.org/conference/IDW2018/approved_sessions/  Incredibly rich range of topics. If you do not find a topic there you can submit an abstract to the general submissions.  Abstracts can be submitted to Approved Sessions or to General Submissions. Will be peer reviewed and distributed into the programme.  Abstracts for presentations and lightning talks/posters.  Deadline is 31 May: https://www.scidatacon.org/conference/IDW2018/call_for_papers/
  • 9.
    170+ Submitted ,65 Accepted, ~45% Developing Countries Relevant Themes
  • 10.
  • 11.
    Patterns & Trends- research Collaborations
  • 15.
    Part I Revolutions, ParadigmShifts, Open Data Open Science and SDGs
  • 16.
    On Revolutions &Paradigm Shifts
  • 17.
    On Revolutions &Paradigm Shifts Source: William Genovese, Huawei
  • 18.
    On Revolutions &Paradigm Shifts https://www.weforum.org/agenda/2016/04/africa-doesn-t-just-need-a-digital-revolution-it-needs-a-cultural-one-too/
  • 19.
    Digital What What Source:Serges Nanfack -The Learning Experience @ Cisco Networking Academy
  • 20.
    Doing Science • Scientificmethod not going anywhere • …But • Experimental - > Theoretical - > Data Intensive?
  • 21.
    Software Processes • Then-> Future/Now • Source Code -> Data • Compiler -> Deep Learning • Executable -> Predictor Mike Houston – NVIDIA Distinguished Engineer – Production Deep Learning and Scale, SC’17 Keynote
  • 22.
  • 23.
    Fake Data, FakeResearch http://www.bbc.com/news/science-environment-39357819 • Official data: 30 Allegations of research misconduct in 2012 – 2015 • BBC obtained figures under FoI rules: Hundreds in 23 Universities over same period • Global concern over research integrity • House of Commons enquiry to reassure public
  • 25.
    Getting more valuefrom data © CSIR, 2017 25 How can we improve data sharing?
  • 26.
    How to makedata sharing work? • Interoperability? • Data science expertise and skills? • Security, trust and identity? • Findability, accessibility and services? • Business models, sustainability and policies?
  • 27.
    Emerging Policy Consensus?FAIR Data • FAIR Data (see original guiding principles at https://www.force11.org/node/6062 • Findable: have sufficiently rich metadata and a unique and persistent identifier. • Accessible: retrievable by humans and machines through a standard protocol; open and free by default; authentication and authorization where necessary. • Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’. • Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance.
  • 28.
    What is OpenScience? (1)  Open access to research literature.  Data that is as Open as possible, as closed as necessary.  FAIR Data (Findable, Accessible, Interoperable, Reusable).  Data is a recognised and important output of research.  A culture and methodology of open discussion and enquiry (including methodology, lab notebooks, pre- prints).  Data code and analysis processes are shared for reproducibility.  Engagement with society and the economy in research activities (citizen science, co-design / transdisciplinary research, interface between research, development and innovation).
  • 29.
    Why Open Science/ FAIR Data? • Good scientific practice depends on communicating the evidence. • Open research data are essential for reproducibility, self- correction. • Academic publishing has not kept up with age of digital data. • Danger of an replication / evidence / credibility gap. • Boulton: to fail to communicate the data that supports scientific assertions is malpractice • Open data practices have transformed certain areas of research. • Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data… • FAIR data helps use of data at scale, by machines, harnessing technological potential. • Research data often have considerable potential for reuse, reinterpretation, use in different studies. • Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system. • Research data produced by publicly funded research are a public asset.
  • 30.
    Open Science (incl.Data) Defined “Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.” - FOSTER Project, funded by the European Commission
  • 31.
    Open Science andFAIR Data: Benefits for Stakeholders  Government and Innovation / Development  Increased impact from investment in activities relating to data; economic, innovation and research benefits.  Partnerships for research, development and innovation around co-design, Open Science and FAIR data.  Research Institutions:  Development of data capacity and data skills;  Not losing valuable data (stored on hard drives, not annotated or reusable);  Shop window of research activities and expertise (Open Access, Open Data / FAIR Data)  Capacity to build research schools around data assets and skills, attract international collaboration and investment.  Build case for ‘data sovereignty’, data (re-)patriation.  Researchers:  Increased data skills, expertise in FAIR data builds competitive edge.  Citation advantage of Open Access / Open Data.  Culture of certain research disciplines is already strongly in favour of Open Data / Open Science.
  • 32.
    Open Data, OpenScience & Research Lifecycle (Foster)
  • 33.
    Original Research DataLifecycle image from University of California, Santa Cruz http://guides.library.ucsc.edu/datamanagement/ Repositories Repositories Tools Plan Policy&Infrastructure
  • 34.
    Data & SustainableDevelopment Goals 9: Industry, Innovation Infrastructure
  • 35.
  • 36.
    Protecting banana farmers’ livelihoods(Uganda) Using maps to increase access to education (Kenya) Monitoring child malnutrition (Uganda)
  • 37.
  • 38.
  • 39.
    Data innovations Examples– Earth Observation • Planet Labs - https://www.planet.com/ • Use cases – Agriculture, Mapping, Defense,Forestry
  • 40.
    Data innovations Examples– Precision Agriculture • Crop Yield Monitoring & Modelling Princeton University researchers use a low-cost, low-energy device to collect environmental data in Burkina Faso and Zambia that can help with crop-yield monitoring and modeling.
  • 41.
    Data innovations Examples– Earth Observation • Radiant Earth - https://radiant.earth/ • Upcoming Botswana Mapathon@UB
  • 42.
    Rapidly Developing Thunderstorms •MeteoSat Second Generation satellite data • Merged with numerical weather prediction data • Track thunderstorms Weather & Climate **Slides courteous of Dr Mary-Jane Bopape, SAWS
  • 43.
    Part I About theAfrican Open Science Platform
  • 44.
    Observation “Several open scienceactivities are underway across Africa, but a great deal will be gained if, in the context of developing inter-regional links, these activities were to be coordinated and developed through such a coordinating initiative.” - CODATA
  • 45.
    Accord on OpenData in a Big Data World • Proposes comprehensive set of principles • FAIR Principles • Data as open possible, as closed necessary • Provides framework & plan for African data science capacity mobilization initiative – AOSP Call to Endorse
  • 46.
    African Open SciencePlatform (AOSP) • Platform = opportunity to engage in dialogue, create awareness, connect all, provide continental view • Funded by SA Dept. of Science & Technology through National Research Foundation • 3 years (1 Nov. 2016 – 31 Oct. 2019) • Managed by Academy of Science of South Africa (ASSAf) • Through ASSAf hosting ICSU Regional Office for Africa (ICSU ROA) • Direction from CODATA http://africanopenscience.org.za/
  • 47.
     Key deliverablesof the pilot project will be foundations for the platform in these four key area: 1. Frameworks and guidance to assist policy development at national and institutional level. 2. Study and recommendations to reduce barriers and provide constructive incentives for Open Science. 3. Framework for data science training (including RDM, data stewardship and science of data); curriculum framework, training materials, recommendations for training initiatives. 4. Framework and roadmap for data infrastructure development: emphasising partnerships and de-duplication between national systems, economies of scale, institutions and domain initiatives. Framework for Policies, Incentives, Training and Technical Infrastructures
  • 48.
    Vision and Missionof an African Open Science Platform  African scientists are at the cutting edge of contemporary, data-intensive science as a fundamental resource for a modern society.  A digital ecosystem with five complementary aims governed by a set of common principles and practices: 1. A virtual space for scientists to find, deposit, manage, share and reuse data, software and metadata; 2. A means of continually developing capacities at all levels of national science systems and amongst professionals and their institutions operating in the public and private domain; 3. A basis for multi-stakeholder consortia that wish to utilise powerful digital tools in addressing major common problems, and for work in the trans- disciplinary mode; 4. A forum for exchange of ideas, best practices and opportunities amongst Platform partners and with the international data-science community. 5. An African Data Science Institute, to advance the frontiers of data science and provide support for interdisciplinary research domains where there are particularly strong data assets in Africa.
  • 49.
    AOSP Focus Areas(Frameworks & Roadmaps) Policy Infrastructure Capacity Building Incentives SA, Uganda, Botswana, Madagascar, Kenya, Ethiopia
  • 50.
    Expected Deliverables 1. Aframework for each focus areas, to promote the development and coordination of data policies, data training, data incentives and data infrastructure; 2. A database with information on open science (incl. open data) initiatives and key role players/experts in open science, across all disciplines; 3. A policy toolkit guiding governments on implementing open science policies; 4. An advocacy toolkit to assist representatives from various African countries to create more awareness of the benefits of open science; 5. A competency index for all working with data; and 6. A training toolkit, building on existing resources.
  • 51.
    Key Stakeholders • GlobalNetwork of Science Academies (IAP) • International Council for Science (ICSU) • The World Academy of Sciences (TWAS) • Research Data Alliance (RDA) • NRENs (Internet Service Providers for Education) • Association of African Universities (AAU) • Network of African Science Academies (NASAC) • African Research Councils (incl. DIRISA, funders) • African Universities • African Governments • Other
  • 52.
    Advocate & createawareness of the need for open science/open data policies & research data to be FAIR and as open possible, and well managed for the unforeseeable future. Outcomes: Meetings, Workshops, Conferences, Training Conduct a landscape study to better understand and provide a coherent view of open data activities on the African Continent. Outcomes: Database, Visualisation of data activities & stakeholders Phase1Year1 Phase1Year2 Develop, test and publish frameworks and roadmaps to guide African governments on developing Policies, ICT Infrastructure, Skills and Incentives, in support of open research data sharing. Outcomes: Frameworks & Roadmaps Continue creating awareness of the need for research data to be open, and profile African initiatives globally, also through IDW2018. Outcomes: Meetings, Workshops, Conference Phase1Year3 Tobeapproved Conduct a series of CODATA-RDA-AOSP Data Schools, funded through the AOSP project. Outcomes: Data Schools Consolidating & finalizing work conducted during Phase 1. Work towards a conceptual plan as basis for an African Open Science Commons. Outcomes: Conceptual Plan for an African Open Science Commons
  • 53.
  • 54.
    Some Precedence -Africa Data Consensus Study • Adopted in March 2015 at High Level Conference on Data Revolution • Strategy for implementing data revolution in Africa • Plan of action to be guided by United Nations Economic Commission for Africa (UNECA), African Union Commission (AUC), African Development Bank (AfDB), supported by UN Development Programme (UNDP), UN Populations Fund (UNFPA) • Implemented in collaboration with partner institutions from public & private sectors, civil society organisations
  • 55.
    AOSP Study ofData Activities in Africa
  • 56.
    AOSP Study ofData Activities in Africa: Initial Results - Coverage https://www.targetmap.com/viewer.aspx?reportId=56245 Please note: this is just a preview and data still to be cleaned and updated and corrected.
  • 57.
  • 58.
    Results - SomeSelected Initiatives
  • 59.
    Tunisia Data Computing Centreel Khawarizmi Kenya Data Centre & Services @ Research & Education Network (KENET) South Africa DIRISA, Data Intensive Research Cloud Infrastructure Initiatives – ARC, SADIRC, Ilifu High Performance Computing Centres Botswana, Lesotho, Mozambique, SA, Tanzania, Zambia, Zimbabwe Open Data for Africa African Development Bank Initiatives,HPCCs,Services
  • 60.
    Example National Institutional Initiative– Ilifu, South Africa • http://www.researchsupport.uct.ac.za/ilifu • Consortium of 6 Western Cape institutions • Data-centric, high-performance computing facility for data-intensive research • Proto-typing distributed, federated cloud-based infrastructure as a platform for data-intensive research (African Research Cloud) • Data-processing pipelines and e-science research tools for big data analysis, visualisation and analytics • Development and implementation of research data management systems and tools • Development of platforms, portals and middleware to support access and collaborative research by distributed teams on data-intensive projects
  • 61.
    Regional SADC Cyberinfrastructure Framework[SADC memberstates & Working Group Approved 30th June 2016 at Joint Meeting of Ministers of Education& Training And Science & Technology]
  • 62.
  • 63.
  • 64.
    Components of aCI • National Research Networks - Specialized broadband infrastructure networks and service providers for education, research and innovation , • Computational Resources - Ranging from HPC to other computing capabilities , • Data - tools and facilities (including repositories) to enable sharing and efficient data driven discoveries, technologies and innovations, • Policies - To enable optimal establishment and utilization of cyber-infrastructure, generation, analysis, transport as well as stewardship of information, and • Human Capital - To make effective use of the Cyberinfrastructure.
  • 65.
    Data infrastructure Source: E‐InfrastructureRoadmap Engineering and Physical Sciences Research Council (EPSRC)
  • 66.
  • 67.
    H3ABioNet (H3Africa) Bioinformatics Network- 32 institutions, 15 African countries, 2 partners outside Africa
  • 68.
  • 69.
  • 70.
    SKA Science Goals •How do galaxies evolve? What is dark energy? • Does General Relativity theory break down in strong gravity? • What generates giant magnetic fields in space? • How were the first black holes and stars formed? • Are we alone? [https://www.skatelescope.org/science/]
  • 71.
    SKA Data Requirements Source:SKA Africa Figure 2 Showing SKA Data requirements [ Source HPCWire and Philip Diamond, Director General of SKA, and Rosie Bolton, SKA Regional Centre Project Scientist Supecomputing, SC17, Keynote talk 17th November, 2017, Denver, Colorado, USA]. https://www.hpcwire.com/2017/11/17/sc17-keynote-hpc-powers-ska-efforts-peer-deep-cosmos/ Astronomers estimate that the project will generate 35,000- DVDs-worth of data every second. This is equivalent to “the whole world wide web every day,” said Fanaroff.”
  • 72.
    Immediate Opportunities: Continental Collaborations– African Very Long Base Interferometry Network Source: SKA Africa
  • 73.
  • 74.
  • 75.
    Immediate Opportunities National –sharing instruments?( Botswana)
  • 76.
    Focus Areas –To be unpacked in implementation • Policy/Strategy development, institutionalisation, implementation and support; • Education, Research & Development and Innovation; • Cyber-Infrastructure support for education; • Support existing and new Centres of Excellence (CoE) and provide them with tools; • Cyber-Infrastructure support for research by promoting research collaborations and supporting regional flagship projects; and • Cyber-Infrastructure support for innovation • Human Capital Development; • Infrastructure development; • Resource mobilisation; • Communication, Awareness & Advocacy; and • Strategic Partnerships.
  • 77.
    Progress - Connectivity -High PerformanceComputing -Data Human capital Development
  • 78.
    World Bank Reporton Role of NRENs in Africa
  • 79.
    CMM of NRENsin SADC Region [World Bank Report - “The role and status of African National Research and Education Networks” (Foley, 2016).]
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
    Example - UniversityOf Botswana – CS Dept.
  • 88.
    HPC ECOSYSTEMS GLOBALMAP (2017-10+) Slide from HPC Ecosystems Project [Bryan Johnston, AceLab CHPC]
  • 89.
    Human Capital Development Acyber-infrastructure training map [Source Neil Chue Hong, Software Sustainability Institute]
  • 91.
    19 Southern Africanscholars attend workshop at TACC – Source Stem-Trek SADC-TACC Workshop@TACC 2015
  • 92.
  • 93.
    US/Pan-African Workshop: HPC OnCommon Ground @ SC16
  • 94.
    URISC@ SC’17 Denver,Colorado UnderstandingRisk in Shared Cyberecosystems Source: Stem-Trek
  • 95.
    [Datascience Training: @InternationalCenter For Theoretical Physics]
  • 96.
  • 97.
  • 99.
    [2016 School -ImageSource : CODATA]
  • 103.
  • 104.
    INTERNATIONAL DATA WEEK IDW2018 Gaborone, Botswana: 5-8 November 2018 Information: http://internationaldataweek.org/ Deadline for abstracts, 31 May: https://www.scidatacon.org/IDW2018/
  • 105.
    Digital Frontiers ofGlobal Science Frontier issues for research in a global and digital age. Applications, progress and challenges of data intensive research. Data infrastructure and enabling practices for international and collaborative research. Data, development and innovation: data as an interface between research, industry, government, society and development.
  • 106.
    Themes: research anddata; data science and data analysis; data stewardship; policy and practice of data in research; education and data; data, society, ethics and politics; open data, innovation and development; data and cybersecurity
  • 107.
    SciDataCon part of InternationalData Week  SciDataCon aims to help this community ensure that it has a concrete scientific record of its work: peer reviewed abstracts > presentations > Special Collection in the Data Science Journal.  Themes and Scope: see https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/  Approved Sessions: https://www.scidatacon.org/conference/IDW2018/approved_sessions/  Incredibly rich range of topics. If you do not find a topic there you can submit an abstract to the general submissions.  Abstracts can be submitted to Approved Sessions or to General Submissions. Will be peer reviewed and distributed into the programme.  Abstracts for presentations and lightning talks/posters.  Deadline is 31 May: https://www.scidatacon.org/conference/IDW2018/call_for_papers/
  • 108.
    170+ Submitted ,65 Accepted, ~45% African-centric
  • 109.
    Next Steps Phase 2of the project will be focussing on developing an actual African Open Science Commons, similar to the European Open Science Cloud (EOSC), • integrating open science initiatives, • policy, • infrastructure, • skills development and • incentives.
  • 110.
    Summary 1. Ongoing phasedAfrican Open Science platform project on data, data policies and development of data infrastructure in the African continent 2. A landscape survey has been conducted - focus areas - policy, infrastructure, capacity building and incentives 3. The survey reveals a significant number of initiatives and developments in infrastructure and human capital development, primarily driven by project needs and preparedness 4. There is however limited advances in policy and incentive schemes 5. The main challenge in the project was to direct the focus on research data vs open government data 6. Policy development and slow and limited, 7. Phase 2 of the project will be focussing on developing an actual African Open Science Commons, similar to the European Open Science Cloud (EOSC), integrating open science initiatives, policy, infrastructure, skills development and incentives.
  • 111.
    Summary “You can havedata without information, but you cannot have information without data.” – Daniel Keys Moran, an American computer programmer and science fiction writer. “Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.
  • 112.
    Thank you Ina Smith ProjectManager, African Open Science Platform Project, Academy of Science of South Africa (ASSAf) ina@assaf.org.za Tshiamo Motshegwa Computer Science Department, University Of Botswana Tshiamo.motshegwa@mopipi.ub.bw Acknowledgments Simon Hodson – executive director, CODATA simon@codata.org Visit http://africanopenscience.org.za