
Open Science and Open Data for Librarians



Presented during a librarian workshop to create awareness and sensitise librarians.

Published in: Data & Analytics

  1. Open Science & Open Data for Librarians. 13 July 2018, University of the Free State, South Africa. Presented by Ina Smith. #ismonet #aosp_africa
  2. Programme • Introduction to Open Science/Open Data • Library Research Data Service • African Open Science Platform Project
  3. Data Activity 1: Data Collection & Visualisation
  4. Introduction to Open Science/Open Data
  5. Social Media Data
  6. Research Data: "Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created."
  7. Data-Driven Research
  8. Fake Data, Fake Research
  9. Open Science (incl. Data) Defined: “Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.” (FOSTER Project, funded by the European Commission)
  10. Open Data, Open Science & the Research Lifecycle (FOSTER)
  11. Open Notebook Science: “A laboratory notebook (lab notebook/lab book) is a primary record of research. Researchers use a lab notebook to document their hypotheses, experiments and initial analysis or interpretation of these experiments.”
  12. infections-in-search-of-the-elephant-in-the-room/
  13. Research Data Lifecycle (original image from the University of California, Santa Cruz). Figure labels: Plan, Tools, Repositories, Research Output
  14. Working with Data • Using R, Python, ggplot and more … • Collection, e.g. surveys • Normalisation & Cleaning, e.g. OpenRefine • Analysis • Visualisation • Preservation • Mining
  15. Data Cleaning
  16. Data Visualisation • Static: Visualizations-MasterList-R-Code.html • Dynamic: data-visualization-tools-for-big-data/
  17. ;19.160;3&l=temperature&t=20160714/ 08
  18. Data Mining • A set of methods for analysing data from various dimensions and perspectives: finding previously unknown, hidden patterns, classifying and grouping the data, and summarising the identified relationships. The tasks of data mining are twofold: • Predictive: using features to predict unknown or future values of the same or another feature • Descriptive: finding interesting, human-interpretable patterns that describe the data
  19.
  20. unt/index.html
  21. Data Pipelines (pipelines-786f6746a59a): “Set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.” Create – Process – Clean – Mine – Analyse – Visualise
  22. Research Methodology: “It is a science of studying how research is to be carried out. Essentially, the procedures by which researchers go about their work of describing, explaining and predicting phenomena are called research methodology. It is also defined as the study of methods by which knowledge is gained.”
  23. Benefits of Open Research Data (1) • Predicts trends, helps people make informed decisions and informs policy • Collaboration advances science and discovery • Drives development and improves the livelihoods of citizens • Increases return on investment for funders and avoids duplication, since research is expensive • More and more entrepreneurs are using data in innovative ways, creating jobs that are much needed on our continent
  24. Benefits of Open Research Data (2) • Helps improve service delivery, e.g. mobile apps, robots, artificial intelligence (AI) • Provides evidence for the research conducted • Open data can potentially lead to far more outcomes, with a higher impact • Can be used for tenure, promotion and measuring researchers’ contributions (data citation) • Open data reduces redundancy • And more …
  25. Fears Researchers Experience • Getting scooped • Time & effort required of the researcher • Someone else finding a path-breaking application of the data that the researcher hasn’t considered • Fear of problems/errors in the measurement process being exposed • Confidentiality/privacy of respondents and ethics clearance • Intellectual Property Rights: signed away, little understanding, no IP in place
  26. Research Data in Support of SDGs • Protecting banana farmers’ livelihoods (Uganda) • Using maps to increase access to education (Kenya) • Monitoring child malnutrition (Uganda)
  27. present-85482: The prevalence of malaria infection in sub-Saharan Africa today is at the lowest point since 1900.
  28. outbreak-research-open-access-1.16966
  29. H3ABioNet (H3Africa): 30 institutions, 15 African countries, 2 partners outside Africa
  30. Square Kilometre Array (SKA) • Data collection on a massive scale • Telescope array to consist of 250,000 radio antennas split between Australia & South Africa • Investment in machine learning and artificial intelligence software tools to enable data analysis • 400+ engineers and technicians in infrastructure, fibre optics and data collection • Supercomputers to process the data (IBM) • To come: a supercomputer with three times the power of the world’s current fastest computer (Tianhe-2) to cope with SKA data
  31. Testing Albert Einstein’s general theory of relativity; imaging neutral hydrogen (the building blocks for stars) in the distant universe; and examining galaxies that were formed billions of years ago. “Construction of the SKA is due to begin in 2018 and finish sometime in the middle of the next decade. Data acquisition will begin in 2020, requiring a level of processing power and data management know-how that outstretches current capabilities. Astronomers estimate that the project will generate 35,000-DVDs-worth of data every second. This is equivalent to ‘the whole world wide web every day,’ said Fanaroff.”
  32. Data Activity 2: Ornithology • Go to • Browse Tracks • Search for studies that contain data sets for: Hooded Vulture Africa • Open in Studies Page • What can you do with the study & related data? • Download the data • Sort according to ground speed • How many were spotted in Northern Kruger?
  33. Library Research Data Service
  34. Data Stakeholders • Governments (policy) • Institutions (policy & strategy) • Research Offices (reporting, impact) • Researchers (collecting data in an ethical and trusted way so that it can be re-used) • Research Ethics Committees (safeguarding the dignity, rights, safety, and well-being of all trial participants) • Statisticians (processing, analysing and visualising data) • System engineers (maintaining networks and allowing data to be transmitted digitally) • Librarians (managing and organising the data, and making sure it is digitally preserved for the long term)
  35. Why Librarians as Data Partners? • Information standards • Organisational skills • Setting up file structures (organising information) • Knowledge of workflows • Knowledge of collection management • Describing data using established metadata schemes & controlled vocabularies • Collection curation/preservation
  36. Data Skills for Librarians (1) • Data terminology • Unix-style command line interface, allowing librarians to work efficiently with directories and files, and to find and manipulate data • Cleaning and enhancing data in OpenRefine and spreadsheets • The Git version control system and the GitHub collaboration tool • Web scraping and extracting data from websites • Scientific writing in useful, powerful, and open mark-up languages such as LaTeX, XML, and Markdown • Formulating and managing citation data, publication lists, and bibliographies in open formats such as BibTeX, JSON and XML, and using open source reference management tools such as JabRef and Zotero
  37. Data Skills for Librarians (2) • Transforming metadata documenting research outputs into open plain-text formats for easy reuse in research information systems, in support of funder compliance mandates and institutional reporting • Scholarly identity with ORCID, and managing reputation with ORCID-enabled scholarly sharing platforms such as ScienceOpen • Authorship, contributorship, and copyright ownership in collaborative research projects • Demonstrating best practices in attribution, acknowledgement, and citation, particularly for non-traditional research outputs (software, datasets) • Identifying reputable Open Access publications and Open Institutional/Open Data repositories • Scholarly annotation and open peer review • Investigating and managing the copyright status of a work, and evaluating conditions for Fair Use
  38. Role of Librarians • Initiating conversations on Open Science/Open Data policy & strategy, and implementing them • Developing their own data skills (including knowledge of copyright, licensing and citation) • Advocating for transparency and openness in research and for access to data, and providing support • Recommending trusted data repositories • Managing & registering trusted data repositories • Increasing the visibility of research data • Promoting & supporting proper research data management planning among researchers
  39. Strategy & Policy
  40. Open Science Open Data Statement
  41.
  42. Example: British Library Data Strategy (pdf). Strategy: a high-level plan to achieve one or more goals under conditions of uncertainty. Where are you? Where do you want to be? And how will you get there? • Data Management Planning • Data Curation • Data Archiving & Preservation • Data Access, Discovery and Reuse
  43. Example: UCT Research Data Management Policy (/image_tool/images/346/TGO_Policy_Research_Data_Management_2018_V6.pdf) • Introduction • Purpose Statement • Definitions • Objectives of the Policy: Benefits of Data Availability & Reuse • Scope of the Policy • Criteria for Selection of Research Data • Stakeholder Roles & Responsibilities • Provision of Research Data Management Infrastructure • Data Management Planning • Discovery & Reuse • Recognition & Reward for Data Providers • Monitoring & Reporting Requirements • Related Policies
  44. Open Science Open Data Policy: content/uploads/red_LEARN_Elements_of_the_Content_of_a_RDM_Policy.pdf
  45. Job Description/Work Agreement/KPAs: “developing a flexible curriculum on data management; meeting with researchers in individual and group settings to consult on projects, planning, and best practices; exploring and piloting base-line services in curation practices and techniques; and creating documentation and guidelines related to scholars’ emerging data management needs. Other activities may include ongoing assessment and monitoring of researcher needs, proactive development of knowledge and expertise in data management issues across disciplines and domains, and advising researchers on how to meet the data management and open data requirements of publishers and federal funding agencies. This individual will be central to efforts to design appropriate data repository and storage infrastructure for researchers across the University.” 0FAxNkEAL
  46. Business Plan • How will the service be aligned & implemented? • Describe the service • How will it be marketed? • Financial forecasting • Etc. • Pilot with champions • Budget
  47. Upskilling & CPD
  48. Self- & Lifelong Learning • Bachelor of Science in Data Science, Sol Plaatje University (South Africa) • Coursera Data Science • Coursera Research Data Management and Sharing* • FOSTER Open Science Courses* • MANTRA for Researchers • MANTRA for Librarians* • Author Carpentry • Data Carpentry • Library Carpentry • WDS Training Resources • UCT eResearch
  49. data-standards/list
  50. Advocacy & Marketing
  51. Manage & Register Trusted Data Repositories
  52. Data Repositories vs Social Media • Social media sites/3rd-party software: connect researchers sharing interests; marketing of data; the sites (and the data) belong to third parties • Repositories: support export/harvesting of metadata; offer long-term preservation; non-profit, with no advertisements; use open standards and protocols; copyright
  53. Implement & Manage Trusted Data Repositories • IP (Copyright), CC Licensing, Citations, Persistent Identifiers (DOIs), Metadata Standards • DSpace • Dataverse • CKAN • DKAN • Nesstar
  54. “At Princeton we maintain several data collections in our DataSpace instance. With the help of our librarians we devised a custom submission form tailored towards collecting metadata for data sets. In addition we have best practice recommendations, like: add a README file, stick to formats commonly used in your discipline. The library developed a Research Data Management Guide with a section on file formats and data organization.”
  55. Open Data Repositories (re3data: 16)
  56. Register & Recommend Data Repositories • • Open Data Barometer • Global Open Data Index • African Open Science Platform • Dataverse … and more …
  57. Data Activity 3: Find Data Repositories. Find data repositories in a specific discipline and list at:
  58.
  59. content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf
  60. Research Data Management Plans
  61. What is a Research Data Management Plan (DMP)? • A document that outlines what the researcher will do with data during & after the research project • Avoids duplication of effort, plans how to collect data, addresses ethical issues, and preserves data as evidence & for re-use • Complies with funder requirements
  62. • Types of data: What is the source of your data? In what formats are your data? Will your data be fixed or will it change over time? How much data will your project produce? • Contextual details (metadata): How will you document and describe your data? • Storage, backup and security: How and where will you store and secure your data? • Provisions for protection/privacy: What privacy and confidentiality issues must you address? • Policies for re-use: How may other researchers use your data? • Access and sharing: How will you provide other researchers with access to your data? How will others discover your data? • Archiving and providing access: What are your plans for preserving the data and providing long-term access?
  63. Research Data Management • Research Proposal • Ethics Committee • Funder • Data Server & Repository • Etc.
  64. DIRISA DMPTool
  65. Data Activity 4: Data Management Plan. Work in groups and compile a brief Research Data Management Plan
  66. African Open Science Platform Project: Phase 1 & 2
  67. African Open Science Platform (AOSP) • Platform: an opportunity to engage in dialogue, create awareness, connect all stakeholders and provide a continental view • Funded by the South African Department of Science & Technology through the National Research Foundation • 3 years (1 Nov. 2016 to 31 Oct. 2019) • Managed by the Academy of Science of South Africa (ASSAf) • Through ASSAf hosting the ICSU Regional Office for Africa (ICSU ROA) • Direction from CODATA
  68. Accord on Open Data in a Big Data World • Proposes a comprehensive set of principles • FAIR Principles • Data as open as possible, as closed as necessary • Provides a framework & plan for an African data science capacity mobilisation initiative (AOSP) • Call to Endorse
  69. AOSP Focus Areas • Policy • Infrastructure • Capacity Building • Incentives
  70. African Open Science Platform (AOSP) Landscape Study. Please note: this is just a preview; the data still has to be cleaned, updated and corrected.
  71. Phase 1 Deliverables • Frameworks & Roadmaps • Open Science & RDM Policy • Open Science & RDM Research & ICT Infrastructure • Open Science & RDM Incentives • Open Science & RDM Capacity Building • Library Framework
  72. Rationale for a Library Framework • Research is becoming increasingly data-driven • There is a push towards science and research data being open and accessible, to advance science in support of the SDGs • Librarians increasingly play a role in managing research output through institutional research repositories in a FAIR way (findable, accessible, interoperable, re-usable) • In addition, the growing volume of research data must be managed/curated in a trusted way; librarians have the necessary skills to add value, which also helps them remain relevant
  73.
  74. Conclusion • Only if research and data are open and democratised, so that all have equal access, will it be possible to work towards achieving the 2030 Sustainable Development Goals • Librarians need to adapt service delivery to the new way of doing research (systemic changes) by providing data-related support to researchers
  75. Thank you. Ina Smith, Project Manager, African Open Science Platform Project, Academy of Science of South Africa (ASSAf). Visit
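Slides 14 and 15 mention normalisation and cleaning with OpenRefine. OpenRefine can find inconsistent spellings by key-collision clustering with a "fingerprint" key. The Python below roughly approximates that idea and is not OpenRefine's own code. It trims and lower-cases each value, strips accents and punctuation, and then sorts the unique tokens. As a result, values that differ only in case, punctuation or word order end up with the same key. The institution names in the usage example are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def fingerprint(value: str) -> str:
    """Approximate an OpenRefine-style fingerprint key (a sketch, not its exact rules)."""
    value = value.strip().lower()
    # Decompose accented characters and drop the combining marks, e.g. "é" -> "e".
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove ASCII punctuation such as commas and full stops.
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique whitespace-separated tokens so word order no longer matters.
    return " ".join(sorted(set(value.split())))


def cluster(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group values by fingerprint; keep only keys shared by differing spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


if __name__ == "__main__":
    names = [
        "University of the Free State",
        "university of the free state ",
        "Free State, University of the",
        "Univ. of Cape Town",
    ]
    for key, members in cluster(names).items():
        print(key, "->", members)
```

Abbreviations such as "Univ." still end up in their own group, so the key only suggests candidates. As in OpenRefine, a person still needs to review each cluster and decide which value to keep.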
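Slide 21 defines a data pipeline as processing elements connected in series, where each element's output becomes the next element's input. Below is a minimal Python sketch of that idea. It covers only part of the slide's Create – Process – Clean – Mine – Analyse – Visualise chain: the raw list plays the "Create" step, and Mine and Visualise are left out. The stage functions (`process`, `clean`, `analyse`) and the sample readings are hypothetical. The stages also run one after another in a single process; the slide notes that real pipelines often run in parallel, and this sketch does not attempt that.

```python
from functools import reduce
from statistics import mean
from typing import Any, Callable, List, Optional

Stage = Callable[[Any], Any]


def pipeline(*stages: Stage) -> Stage:
    """Chain stages in series: each stage's output is the next stage's input."""
    def run(data: Any) -> Any:
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


def process(raw: List[str]) -> List[Optional[float]]:
    """Parse raw text readings; entries that cannot be parsed become None."""
    parsed: List[Optional[float]] = []
    for item in raw:
        try:
            parsed.append(float(item))
        except ValueError:
            parsed.append(None)
    return parsed


def clean(values: List[Optional[float]]) -> List[float]:
    """Drop missing values."""
    return [v for v in values if v is not None]


def analyse(values: List[float]) -> dict:
    """Summarise the cleaned values."""
    return {"n": len(values), "mean": mean(values) if values else None}


if __name__ == "__main__":
    run = pipeline(process, clean, analyse)
    print(run(["12.5", "n/a", " 7.5 ", "10"]))  # {'n': 3, 'mean': 10.0}
```

Each stage is a plain function, so you can test it on its own or swap it out (for example, replacing `clean` with a stricter rule) without touching the others.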
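Data Activity 2 on slide 32 asks participants to download the vulture tracking data, sort it by ground speed and count sightings in Northern Kruger. A spreadsheet is enough for this. For anyone practising the Python skills from slide 14, here is a sketch built on several assumptions. The column names (`individual`, `lat`, `lon`, `ground_speed`) and the sample rows are invented, so check the headers of the file you actually download. The "Northern Kruger" bounding box is a hypothetical rectangle, not an official park boundary.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# Made-up rows for illustration only; they are not real observations.
SAMPLE = """individual,timestamp,lat,lon,ground_speed
V1,2016-07-14 08:00,-22.90,31.40,4.2
V1,2016-07-14 09:00,-23.10,31.50,12.8
V2,2016-07-14 08:30,-24.80,31.60,0.0
V3,2016-07-14 10:00,-22.70,31.20,7.1
"""

# Hypothetical bounding box (min_lat, max_lat, min_lon, max_lon); not an official boundary.
NORTHERN_KRUGER: Tuple[float, float, float, float] = (-23.5, -22.3, 30.8, 32.0)


def load(text: str) -> List[Dict]:
    """Parse CSV text and convert the numeric columns; blank speeds become None."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for r in rows:
        r["lat"], r["lon"] = float(r["lat"]), float(r["lon"])
        r["ground_speed"] = float(r["ground_speed"]) if r["ground_speed"] else None
    return rows


def sort_by_speed(rows: List[Dict]) -> List[Dict]:
    """Fastest first; rows without a recorded speed go last."""
    return sorted(rows, key=lambda r: (r["ground_speed"] is None, -(r["ground_speed"] or 0)))


def birds_in_box(rows: List[Dict], box: Tuple[float, float, float, float]) -> Set[str]:
    """Distinct individuals with at least one fix inside the box."""
    min_lat, max_lat, min_lon, max_lon = box
    return {
        r["individual"]
        for r in rows
        if min_lat <= r["lat"] <= max_lat and min_lon <= r["lon"] <= max_lon
    }


if __name__ == "__main__":
    rows = load(SAMPLE)
    print([r["ground_speed"] for r in sort_by_speed(rows)])
    print(len(birds_in_box(rows, NORTHERN_KRUGER)), "birds in the box")
```

The sketch counts distinct birds rather than GPS fixes, because one vulture can be recorded many times in the same area. If the question is meant as "how many sightings", count the matching rows instead.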
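Slide 31 quotes an estimate that the SKA will generate 35,000 DVDs' worth of data every second. The back-of-the-envelope conversion below makes that figure more concrete. It assumes a single-layer DVD of 4.7 GB and decimal units (1 TB = 1,000 GB, 1 EB = 10^9 GB). Both are assumptions rather than anything stated on the slide, and a different DVD size would change the result proportionally.

```python
DVD_CAPACITY_GB = 4.7      # assumption: single-layer DVD, decimal gigabytes
DVDS_PER_SECOND = 35_000   # figure quoted on slide 31
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_second = DVDS_PER_SECOND * DVD_CAPACITY_GB             # about 164,500 GB
tb_per_second = gb_per_second / 1_000                          # about 164.5 TB
eb_per_day = gb_per_second * SECONDS_PER_DAY / 1_000_000_000   # about 14.2 EB

print(f"{tb_per_second:,.1f} TB per second, {eb_per_day:,.1f} EB per day")
```

Under these assumptions, the quoted rate works out to roughly 164.5 TB per second, or about 14.2 EB per day.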
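Slide 53 lists persistent identifiers (DOIs) and metadata standards among the concerns of a trusted data repository. The DataCite Metadata Schema, widely used when minting DOIs for datasets, defines six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. A librarian might want to check a record for these before submission, and the sketch below does that. The snake_case keys are simplified stand-ins chosen for illustration; they are not DataCite's actual XML or JSON field names. The DOI, creator and publisher values are placeholders.

```python
import re
from typing import Dict, List

# Simplified stand-ins for the DataCite mandatory properties
# (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType).
# These keys are illustrative, not DataCite's own serialisation.
MANDATORY_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def check_record(record: Dict) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [f"missing {field}" for field in MANDATORY_FIELDS if not record.get(field)]
    year = record.get("publication_year")
    if year and not re.fullmatch(r"\d{4}", str(year)):
        problems.append("publication_year should be a four-digit year")
    return problems


if __name__ == "__main__":
    record = {
        "identifier": "10.0000/hypothetical-dataset",  # placeholder, not a registered DOI
        "creators": ["Researcher, Example"],
        "titles": ["Example survey data"],
        "publisher": "Example University Data Repository",
        "publication_year": 2018,
        "resource_type": "Dataset",
    }
    print(check_record(record) or "basic checks passed")
```

This only checks that fields are present and that the year is a four-digit number. It does not check that the record matches the full schema, which the official DataCite documentation and validation tools cover.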