Successfully reported this slideshow.
Your SlideShare is downloading. ×

Perspectives from the African Open Science Platform (AOSP)/Ina Smith


Check these out next

1 of 52 Ad

More Related Content

Slideshows for you (20)

Similar to Perspectives from the African Open Science Platform (AOSP)/Ina Smith (18)


More from Academy of Science of South Africa (ASSAf) (20)

Recently uploaded (20)


Perspectives from the African Open Science Platform (AOSP)/Ina Smith

  1. 1. Perspectives from the African Open Science Platform (AOSP) Presented by Ina Smith @ismonet
  2. 2. About AOSP • Funded by the National Research Foundation (NRF) (SA Dept. of Science and Technology) • 3 years (1 Nov. 2016 – 31 Oct. 2019) • Open Data: Where are we? Where do we want to be? What do we need to get there? • Managed by Academy of Science of South Africa (ASSAf) • Through ASSAf hosting ICSU Regional Office for Africa (ICSU ROA) • Direction from CODATA
  3. 3. About ASSAf • Recognise scholarly achievement & excellence • Mobilise members in the service of society • Conduct systematic & evidence-based studies on issues of national importance (ASSAf OA Repository) • Promote the development of an indigenous system of South African research • Publish science-focused journals (SciELO SA) • Training in Open Journal Systems (OJS) • Criteria for high quality OA journals • Ambassador for Directory of Open Access Journals (DOAJ)
  4. 4. • Develop productive partnerships with national, regional and international organisations to build capacity within the National System of Innovation (NSI) • Create diversified sources of funding for sustainable functioning and growth of a national academy • Communicate with relevant stakeholders • Association of African Universities (AAU) DATAD-R harvester of OA repositories • Evaluation instrument – harvesting IRs adhering to criteria for best practice (ISO 16363, Data Seal of Approval, WDS etc.)
  5. 5. African Academies of Science
  6. 6. AOSP Governance • Advisory Council (Chair: Prof Khotso Mokhele) • Terms of Reference • Technical Advisory Board (Chair: Prof Joseph Wafula) • Terms of Reference • Working Group • Project Team • Project Administrator
  7. 7. Key Stakeholders • Global Network of Science Academies (IAP) • International Council for Science (ICSU) • Regional Office for Africa (ROA) • Committee on Data for Science and Technology (CODATA) • World Data System (WDS) • The World Academy of Sciences (TWAS) • Research Data Alliance (RDA) • Association of African Universities (AAU)
  8. 8. • Network of African Science Academies (NASAC) • African Research Councils (incl. DIRISA, funders) • African Universities • African Governments • NRENs (Internet Service Providers for Education) • Other
  9. 9. Progress re Openness in Africa • Open Institutional Research Repositories (Webometrics - 74) • Open Educational Resources (E.g. UCT OER) • Massive Open Online Courses (MOOCs) • Open Access Journals (DOAJ) (11 SA universities hosting their own)
  10. 10. • Open Monographs • Open Conference Proceedings (E.g. SUNConferences) • Open Patents • Open Source Software & Open Standards • Open Access & Open Science Policies (ROARMAP) • Open Science • Research Data Management Planning (RDM) • Data sets & accompanying data instruments
  11. 11. Open Data Repositories (re3data - 16)
  12. 12. Data Repositories vs Social Media • Social media sites: • Connect researchers sharing interests • Marketing data • Sites belong to third parties • Repository: • Supports export/harvesting of metadata • Offers long-term preservation • Non-profit – no advertisements • Uses open standards
  13. 13. • OA2020 (2017) • Dakar declaration on Open Science in Africa (2016) • Open Data in a Big Data World Accord (2015) • Open Science for the 21st century A declaration of ALL European Academies (2012) • Salvador Declaration on Open Access Cape Town Declaration (2010) • Cape Town Open Education Declaration (2008) • Kigali Declaration on the Development of an Equitable Information Society in Africa Africa Supporting Open Science
  14. 14. What is Open Science/ Open Data?
  15. 15. Open Science Defined “Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.” - FOSTER Project, funded by the European Commission
  16. 16. “Open Science moves beyond open access research articles, towards encompassing other research objects such as data, software codes, protocols and workflows. The intention is for people to use, re-use and distribute content without legal, technological or social restrictions. In some cases, Open Science also entails the opening up of the entire research process from agenda-setting to the dissemination of findings.” - Open and Collaborative Science in Development Network project, funded by IDRC
  17. 17. Open Data, Open Science & Research Lifecycle (Foster)
  18. 18. Why African Open Science/ Open Data?
  19. 19. vaccines-caused-autism-children/#cZz6lzEjivPoTZ80.99
  20. 20. Intellectual Property “In many African countries, intellectual property protection is undeveloped, ineffective, expensive and unenforced and in some African countries there exists uncertainty on protection of IP and the threat of innovation being stolen away from inventors.” innovation-in-africa/
  21. 21. GIS Drone Workflow
  22. 22.
  23. 23. Accord on Open Data in a Big Data World • Values of open data in emerging scientific culture of big data • Need for an international framework • Proposes comprehensive set of principles • FAIR Principles • Provides framework & plan for African data science capacity mobilization initiative • Proposes African Platform Call to Endorse
  24. 24. Value of an African Platform • Collective view of Open Data activities • Create awareness • Showcase African research • Contribute to global knowledgebase • Increase return on investment (re-use) • Identify lack of data/opportunities/gaps
  25. 25. • Identify needs e.g. skills development, infrastructure, policy formulation, etc. • Act as conduit for links with international open data and open science programmes and standards • Use data across disciplines/studies • Establish relationships between data • Manage Intellectual Property (IP)
  26. 26. • Make data more discoverable/visible (metadata) • Encourage collaboration between scientific & private sectors, citizens • Participate in collective problem-solving • Allow verification of existing data, predict trends • Accelerate discovery – speed is everything (e.g. outbreaks) • Attract funders
  27. 27. 4 Focus Areas AOSP Policy Capacity Building Infrastr ucture Incentiv es
  28. 28. Policy Framework • JKUT (Kenya) Institutional Open Science Policy • Uganda Draft Open Data Policy • Madagascar Draft Open Data Policy • White Paper on Open Research Data Strategy in Botswana • White Paper on Open Research in South Africa • Funder Policy: National Research Foundation (NRF)(SA) • OECD Principles & Guidelines for Access to Research Data from Public Funding
  29. 29. 30 Nov – 1 Dec 2017
  30. 30. Infrastructure Framework • NRENs – Level 6 Elaborated Service Offering An NREN Capability Maturity Model – Duncan Greaves (2015, Tertiary Education Network) • Richly connected at high speed to many other networks/resources • Value added services e.g. grid & cloud computing resources, user controlled light paths, videoconferencing, federated identity services, and more • Deep culture of collaboration
  31. 31. • Roadmaps needed • “Some of the early stages in NREN formation cannot take place at all if the policy or regulatory environment is unduly hostile. In such cases, policy and regulation are themselves the spaces in which intervention is required.” – Duncan Greaves (2015) • Centres for High Performance Computing • Example: SKA telescope
  32. 32. “Construction of the SKA is due to begin in 2018 and finish sometime in the middle of the next decade. Data acquisition will begin in 2020, requiring a level of processing power and data management know-how that outstretches current capabilities. Astronomers estimate that the project will generate 35,000-DVDs- worth of data every second. This is equivalent to “the whole world wide web every day,” said Fanaroff. The project is investing in machine learning and artificial intelligence software tools to enable the data analysis. In advance of construction of the vast telescope - which will consist of some 250,000 radio antennas split between sites in Australia and South Africa - SKA already employs more than 400 engineers and technicians in infrastructure, fibre optics and data collection.” prepares-for-the-ultimate-big-data-challenge
  33. 33. African NRENs Source: WorldBank_Map2.jpg
  34. 34.
  35. 35. African SKA Partner Countries
  36. 36. Capacity Building Framework • Research Data Management Planning • Repositories • Command Line Interpretation • Software Development • Data Organisation • Data Cleaning • Data Management & Databases • Data Analysis & Visualisation (incl. programming)
  37. 37. Challenges re Capacity Building
  38. 38. Incentives Framework • Mechanisms that acknowledge publication of datasets and to promote data sharing • Funder requirements changing • Set of data can result in many papers, innovations, patents • Benefit other researchers where no funding • Acknowledge & reward impact of data: citations, downloads, views
  39. 39. AOSP Series of Webinars on Incentives October 2017
  40. 40. Actions & Deliverables Year 1 • AOSP Side Event to the SFSA 2016 - Launch • AOSP Workshop AAU 2017 (Ghana) - Policy • Madagascar Meeting & Workshop 2017 – Policy & Capacity Building • Botswana National Open Data Open Science Forum 2017 - Policy • UbuntunetConnect Ethiopian Workshops 2017 – Policy & Infrastructure • Database & Networks – Surveys etc
  41. 41. Actions & Deliverables Year 2 • Capacity Building – 3x AOSP Workshops • Batch/Grid Computing: Condor, HTCondor, SGE, PBS/Torque, LSF, SLURM • Computing Workflow: DAGMan, Pegasus, Makeflow • Data Repositories: Figshare, Zenodo,, Dataverse, DSpace etc. • RDM: DMPRoadmap • Command Line Interface Interpretation: UNIX Shell, bash (incl. editors e.g. Nano, Emacs, Vim, etc.)
  42. 42. • Software Development: Git, GitHub, Mercurial • Data organization: Spreadsheets • Data Cleaning: OpenRefine • Data Management & Databases: SQL, Hadoop(for really large datasets), MySQL, PostgreSQL • Data Analysis & Visualisation: R, R Markdown, ggplot2, Python, MATLAB, C, C++ • Machine Learning: R, Python • Artificial Neural Networks: R, Python, Tensorflow, deep convolutional networks
  43. 43. African Landscape: Outreach 2016-2017 – Top 10
  44. 44. Closing Remarks • Collaborate & learn from one another – strength in diversity • Take ownership & collect/curate data in ethical way • Downloaders vs Uploaders • Trusted & valid data managed in trusted way • Exploit data for the benefit of society (Min Naledi Pandor) • Tell the African story, in an African way
  45. 45. Thank you Ina Smith @AOSP_Official Copyright 2017 SA Dept. of Science and Technology License: CC-BY Please cite as follows: Smith, Ina. 2017. Perspectives from the African Open Science Platform. URL:

Editor's Notes

  • The open source software is there, the infrastructure is improving, but we need to start manage our research output, data sets and more. Demonstrate impact of research conducted with tax payers money, funders money.
  • William Edwards Deming (October 14, 1900 – December 20, 1993) was an American engineer, statistician, professor, author, lecturer, and management consultant. Educated initially as an electrical engineer and later specializing in mathematical physics, he helped develop the sampling techniques still used by the U.S. Department of the Census and the Bureau of Labor Statistics. In his book The New Economics for Industry, Government, and Education,[1] Deming championed the work of Walter Shewhart, including statistical process control, operational definitions, and what Deming called the "Shewhart Cycle"[2] which had evolved into PDSA (Plan-Do-Study-Act). This was in response to the growing popularity of PDSA, which Deming viewed as tampering with the meaning of Shewhart's original work.[3] Deming is best known for his work in Japan after WWII, particularly his work with the leaders of Japanese industry. That work began in August 1950 at the Hakone Convention Center in Tokyo when Deming delivered a speech on what he called "Statistical Product Quality Administration". Many in Japan credit Deming as one of the inspirations for what has become known as the Japanese post-war economic miracle of 1950 to 1960, when Japan rose from the ashes of war on the road to becoming the second largest economy in the world through processes partially influenced by the ideas Deming taught:[4]
    Better design of products to improve service
    Higher level of uniform product quality
    Improvement of product testing in the workplace and in research centers
    Greater sales through side [global] markets
    Deming is best known in the United States for his 14 Points (Out of the Crisis, by W. Edwards Deming, preface) and his system of thought he called the "System of Profound Knowledge". The system includes four components or "lenses" through which to view the world simultaneously:
    Appreciating a system
    Understanding variation
    Epistemology, the theory of knowledge[5]
    Deming made a significant contribution to Japan's reputation for innovative, high-quality products, and for its economic power. He is regarded as having had more impact on Japanese manufacturing and business than any other individual not of Japanese heritage. Despite being honored in Japan in 1951 with the establishment of the Deming Prize, he was only just beginning to win widespread recognition in the U.S. at the time of his death in 1993.[6] President Ronald Reagan awarded him the National Medal of Technology in 1987. The following year, the National Academy of Sciences gave Deming the Distinguished Career in Science award.

  • Last April, five months into the largest Ebola outbreak in history, an international group of researchers sequenced three viral genomes, sampled from patients in Guinea1. The data were made public that same month. Two months later, our group at the Broad Institute in Cambridge, Massachusetts, sequenced 99 more Ebola genomes, from patients at the Kenema Government Hospital in Sierra Leone.
    Uncertainties over whether the information belongs to local governments or data collectors present further barriers to sharing. So, too, does the absence of patient consent, common for data collected in emergencies — especially given the vulnerability of patients and their families to stigmatization and exploitation during outbreaks. Ebola survivors, for instance, risk being shunned because of fears that they will infect others.
    Related stories
    Tensions linger over discovery of coronavirus
    Dreams of flu data
    Nature special: Ebola outbreak
    More related stories
    We immediately uploaded the data to the public database GenBank (see Our priority was to help curb the outbreak. Colleagues who had worked with us for a decade were at the front lines and in immediate danger; some later died. We were amazed by the surge of collaboration that followed. Numerous experts from diverse disciplines, including drug and vaccine developers, contacted us. We also formed unexpected alliances — for instance, with a leading evolutionary virologist, who helped us to investigate when the strain of virus causing the current outbreak arose.
    The genomic data confirmed that the virus had spread from Guinea to Sierra Leone, and indicated that the outbreak was being sustained by human-to-human transmission, not contact with bats or some other carrier. They also suggested new probable routes of infection and, importantly, revealed where and how fast mutations were occurring2. This information is crucial to designing effective diagnostics, vaccines and antibody-based therapies.
    What followed was three months of stasis, during which no new virus sequence information was made public (see 'Gaps in the data'). Some genomes are known to have been generated during this time from patients treated in the United States3. The number is likely to have been much larger: thousands of samples were transferred to researchers' freezers across the world.
    Sources: Sequences, NCBI/; Ebola cases, WHO
    In an increasingly connected world, rapid sequencing, combined with new ways to collect clinical and epidemiological data, could transform our response to outbreaks. But the power of these potentially massive data sets to combat epidemics will be realized only if the data are shared as widely and as quickly as possible. Currently, no good guidelines exist to ensure that this happens.
    Speed is everything
    Researchers working on outbreaks — from Ebola to West Nile virus — must agree on standards and practices that promote and reward cooperation. If these protocols are endorsed internationally, the global research community will be able to share crucial information immediately wherever and whenever an outbreak occurs.
    The rapid dissemination of results during outbreaks is sporadic at best. In the case of influenza, an international consortium of researchers called GISAID established a framework for good practice in 2006. Largely thanks to this, during the 2009 H1N1 influenza outbreak, the US National Center for Biotechnology Information created a public repository that became a go-to place for the community to deposit and locate H1N1 sequence information4. By contrast, the publishing of sequence information in the early stages of the 2012 Middle East respiratory syndrome (MERS) outbreak in Saudi Arabia highlighted uncertainties about intellectual-property rights, and the resulting disputes hampered subsequent access to samples.
    Hasan Jamali/AP
    Pilgrims in Saudi Arabia try to protect themselves from Middle East respiratory syndrome (MERS) virus.
    Sharing data is especially important and especially difficult during an outbreak. Researchers are racing against the clock. Every outbreak can mobilize a different mixture of people — depending on the microbe and location involved — bringing together communities with different norms, in wildly different places. Uncertainties over whether the information belongs to local governments or data collectors present further barriers to sharing. So, too, does the absence of patient consent, common for data collected in emergencies — especially given the vulnerability of patients and their families to stigmatization and exploitation during outbreaks. Ebola survivors, for instance, risk being shunned because of fears that they will infect others.
    Nature special: Ebola outbreak
    Fortunately, useful models for responsible data sharing have been developed by the broader genomics community. In 1996, at a summit held in Bermuda, the heads of the major labs involved in the Human Genome Project agreed to submit DNA sequence assemblies of 1,000 bases or more to GenBank within 24 hours of producing them5, 6. In exchange, the sequencing centres retained the right to be the first to publish findings based on their own complete data sets, by laying out their plans for analyses in 'marker' papers.
    This rapid release of genomic data served the field well. New information on 30 disease genes, for instance, was published before the release of the complete human genome sequence. Since 1996, the Bermuda principles have been extended to other types of sequence data and to other fields that generate large data sets, such as metabolite research.
    Guidelines for sharing
    More-recent policies on data release similarly seek to align the interests of different parties, including funding agencies, data producers, data users and analysts, and scientific publishers. Since January, for example, the US National Institutes of Health has required grantees to make large-scale genomics data public by the time of publication at the latest, with earlier deadlines for some kinds of data7.
    We urge those at the forefront of outbreak research to forge similar agreements, taking into account the unique circumstances of an outbreak.
    First, incentives and safeguards should be created to encourage people to release their data quickly into the public domain. One possibility is to request that data users (and publishers) honour the publication intentions of data producers — the questions and analyses that they want to pursue themselves — for, say, six months. These intentions could be broadcast through several channels, including citable marker papers, disclaimer notices on data repositories such as GenBank, and online forums, such as and the EpiFlu database. Alternatively, data producers could publish an announcement about their data and their intentions on online forums as a resource that can be used by others as long as they cite the original source.
    “We urge researchers working on outbreaks to embrace a culture of openness.”
    Second, ethical, rigorous and standardized protocols for the collection of samples and data from patients should be established to facilitate the generation and sharing of that information. A global consortium involving the leading health and research agencies and the ministries of health of engaged nations should work together towards establishing these. Ethicists should be involved to safeguard subjects' privacy and dignity. Biosecurity experts will also be needed to address potential dual-use research and other safety concerns. A helpful analogue is the approach used by the Human Heredity and Health in Africa (H3Africa) Initiative, which aims to apply genomics to improving the health of African populations. Since August 2013, H3Africa has used standard consent-form guidelines8 for collecting DNA samples from subjects for genomic studies, regardless of their country of origin.
    Toshifumi Kitamura/AFP/Getty
    Quarantine officers rush to test passengers at Tokyo's Narita airport amid the 2009 swine-flu outbreak.
    Lastly, any preparation for future outbreaks should include provisions for rapidly building new bridges and establishing community norms. Successful collaborations in genomics and historical data-sharing agreements have tended to involve a fairly stable group of individuals and organizations, making norms of behaviour relatively easy to establish and sustain. By contrast, outbreaks can involve a new cast of characters each time, and cases in which the pathogen is new to science call for whole new fields of research.
    The Kenema way
    As a first step, we call on health agencies such as the World Health Organization, the US Centers for Disease Control and Prevention and Médecins Sans Frontières, as well as genome-sequencing centres and other research institutions, to convene a meeting this year — similar to that held in Bermuda in 1996. Attendees must include scientists, funders, ethicists, biosecurity experts, social scientists and journal editors.
    We urge researchers working on outbreaks to embrace a culture of openness. For our part, we have released all our sequence data as soon as it has been generated, including that from several hundred more Ebola samples we recently received from Kenema. We have listed the research questions that we are pursuing at and through GenBank, and we plan to present our results at as we generate them, for others to weigh in on. We invite people either to join our publication, or to prepare their own while openly laying out their intentions online. We have also made clinical data for 100 patients publicly available and have incorporated these into a user-friendly data-visualization tool, Mirador, to allow others to explore the data and uncover new insights.
    Kenema means 'translucent, clear like a river stream' or 'open to the public gaze'9. To honour the memory of our colleagues who died at the forefront of the Ebola outbreak, and to ensure that no future epidemic is as devastating, let's work openly in outbreaks.
    Nature 518, 477–479 (26 February 2015) doi:10.1038/518477a

  • Promote development & adoption of data policies, principles, practices, standards
    Determine infrastructure available
    Address issues of incentives, best practice, benefits
    Foster training & capacity building activities
    Create an awareness, stimulate dialogue (frontiers)
  • marks a fully mature NREN of the kind that characterises Europe, North America and comparable contexts. The NREN is richly connected at high speed to many other networks and resources. Numerous valueadded services are available, such as grid and cloud computing resources, usercontrolled lightpaths, videoconferencing, and federated identity services. The NREN’s value proposition lies primarily in these services, since bandwidth pricing in such contexts is transparently cost-related. Many institutions will purchase commodity bandwidth from a commercial provider in addition to NREN-specific bandwidth. A culture of collaboration is deeply established.
  • The NSRC cultivates collaboration among a community of peers to build and improve a global Internet that benefits all parties. We facilitate the growth of sustainable Internet infrastructure via technical training and engineering assistance to enrich the network of networks. Our goal is to connect people.
  • Includes SFSA 2016, AAU June 2017, Madagascar 2017, Database, Survey Phase 1, Webinar OA Week 2017 (incentives)
    Excludes: Mailing List members