CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 5-20 June 2014: Overview of things learned
Presentation at NeDICC Meeting on 16 July 2014. Feedback from CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 5-20 June 2014
Trust threads: Provenance for Data Reuse in Long Tail Science (Beth Plale)
Invited Colloquium talk, Apr 23, 2015, Dept of Information and Library Science, School of Informatics and Computing, Indiana University. Abstract: The world contains a vast amount of digital information which grows vaster ever more rapidly. This makes it possible to do many things on an unprecedented scale: spot social trends, prevent diseases, increase fresh water supplies, accelerate innovation, and so on. As science and technology innovation is essential to improved public health and welfare, the growing sources of data can unlock more secrets. But the rapid growth of data makes accountability and transparency of research increasingly difficult. Data that are not adequately described are not useable except within the research lab that produced it. Data that are intentionally or unintentionally inaccessible or difficult to access and verify are not available to contribute to new forms of research. In this talk I show that data can carry with it thin threads of information that connect it to both its past and its future, forming its lineage particularly as it transitions into a shareable dataset residing in a public repository. In carrying this minimal provenance, the data becomes more trustworthy. This thread of trust is a critical element to the successful sharing, use, and reuse of big data in science and technology research in the future.
Trust threads: Active Curation and Publishing in SEAD (Beth Plale)
Describes Trust Threads, minimalist approach to provenance capture to enhance trustworthiness of published data. Implemented as part of SEAD's Active Curation and Publishing Services. At National Data Integrity Conference, Ft. Collins, Colorado, May 2015.
HathiTrust Research Center Secure Commons (Beth Plale)
Introduces HTRC secure commons, expanded secure infrastructure and services for text mining of HT digital data. Shows results comparing n-gram discovery using a Solr full-text index and a framework using MapReduce. Compute time over 1 million digital volumes is 1 day with 1024 cores. Weaknesses of Solr in n-gram identification are explored.
This is an updated version of an invited talk I presented at the European Research Council-Brussels (Scientific Seminar): "Love for Science or 'academic prostitution'".
It has been updated to be presented at the Document Freedom Day 2014, during the activities organized by the Oficina de Software Libre de la Universidad de Granada (26th March).
I have included some new slides and revised others.
I present a personal revision (sometimes my own vision) of some issues that I consider key for doing Science. It was focused on the expected audience, mainly Scientific Officers with background in different fields of science and scholarship, but also Agency staff.
Abstract: In a recent special issue of Nature concerning science metrics it was claimed that "Research reverts to a kind of 'academic prostitution' in which work is done to please editors and referees rather than to further knowledge." If this is true, funding agencies should try to avoid falling into the trap of their own system. By perpetuating this 'prostitution' they risk funding not the best research but the best-sold research.
Given the current period of economic crisis, in which researchers questing for funds are forced into a competitive game of pandering to panelists, it seems a good time for deep reflection about the entire scientific system.
With this talk I aim to provoke extra critical thinking among the committees who select evaluators, and among the evaluators, who in turn require critical thinking from the candidates when selecting excellent science.
I will present some initiatives (e.g. new tracers of impact for the Web era- 'altmetrics'), and on-going projects (e.g. how to move from publishing advertising to publishing knowledge), that might enable us to favor Science over marketing.
Introductory course on Open Science principles, initiatives, OA routes, OA publishing, Horizon 2020, OpenAIRE for PhD students delivered at the University of Milano Bicocca
A talk at the Urban Science workshop at the Puget Sound Regional Council July 20 2014 organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Labs and the University of Washington.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
"Open Science, Open Data" training for participants of Software Writing Skills for Your Research - Workshop for Proficient, Helmholtz Centre Potsdam - GFZ German Research Centre for Geosciences, Telegrafenberg, December 16, 2015
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona (Elsevier)
The Open Data report is a result of a year-long, co-conducted study between Elsevier and the Centre for Science and Technology Studies (CWTS), part of Leiden University, the Netherlands. The study is based on a complementary methods approach consisting of a quantitative analysis of bibliometric and publication data, a global survey of 1,200 researchers and three case studies including in-depth interviews with key individuals involved in data collection, analysis and deposition in the fields of soil science, human genetics and digital humanities.
The Evolution of e-Research: Machines, Methods and Music (David De Roure)
David De Roure's Inaugural Lecture on 28th October at Oxford e-Research Centre, University of Oxford, UK
10 years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artefacts, both for 'big science' and the 'long tail scientist'.
Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before?
This talk will describe three generations of e-Research, using the myExperiment social website as a lens to glimpse future research practice, and focusing on a web-scale computational musicology project as an illustration of 3rd generation thinking.
Also available from http://wiki.myexperiment.org/index.php/Presentations
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts (Beth Plale)
Invited talk at TRUST Women’s Institute for Summer Enrichment (WISE), Cornell, NY Jun 16, 2014. Infrastructure support for text mining research of big data repository like HathiTrust raises challenges in access and security when the bulk of the repository is protected by copyright.
Thinking about Open Science practices, data sharing and lifetime, and communication from Climate Scientists. Slides based on a presentation given at the Lunchtime talk sessions from the MetOS Section, Department of Geosciences, University of Oslo, November 12th 2015.
Open science curriculum for students, June 2019 (Dag Endresen)
Living Norway seminar on Open Science in Trondheim 12th June 2019.
https://livingnorway.no/2019/04/26/living-norway-seminar-2019/
https://www.gbif.no/events/2019/living-norway-seminar.html
What is e-research?
Enhancing research practice
e-Research Methods, Strategies, and Issues
Tips For Finding Useful Information
Some Search Tools for doing e-research
Research Design
Quantitative Research
Qualitative Research
Ethics & The e-Researcher
How The Net Complicates Ethics?
Privacy, Confidentiality, Autonomy, And The Respect For Persons
Tips For Ethical e-Research
Collaboration Tools
Why Consensus?
Net-based dissemination of E-research results
Dissemination through peer-reviewed articles
Advantages of a peer-reviewed article
Dissemination through email lists or Usenet groups
Dissemination through a virtual conference
Open Data in a Big Data World: easy to say, but hard to do? (LEARN Project)
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
Data, Responsibly: The Next Decade of Data Science
Big data, new epistemologies and paradigm shifts (robkitchin)
This presentation examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines.
The State of Open Data Report by @figshare.
A selection of analyses and articles about open data, curated by Figshare
Foreword by Professor Sir Nigel Shadbolt
OCTOBER 2016
Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497
See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html
From Open Data to Open Science, by Geoffrey Boulton (LEARN Project)
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Geoffrey Boulton, University of Edinburgh & CODATA
What is Open Science and what role does it play in Development? (Leslie Chan)
The talk begins with a review of current understanding of open science and its alleged role in providing new opportunities for addressing long-standing development challenges. I then introduce the newly launched Open and Collaborative Science in Development Network, funded by IDRC Canada, and in collaboration with iHub Nairobi, Kenya. The rationale, funding modalities, and the short and long term objectives of the network will be discussed.
Going Full Circle: Research Data Management @ University of Pretoria (Johann van Wyk)
Presentation delivered at the eResearch Africa Conference, held 23-27 November 2014, at the University of Cape Town, Cape Town, South Africa. Various approaches to Research Data Management at Higher Education Institutions focus on an aspect or two of the research data cycle. At the University of Pretoria the approach has been to support researchers throughout the research process covering the whole research data cycle. The idea is to facilitate/capture the research data throughout the research cycle. This will give context to the data and will add provenance to the data. The University of Pretoria uses the UK Data Archive’s research data cycle model, to align its Research Data Management project-development. This model identifies the stages of a research data cycle as: creating data, processing data, analysing data, preserving data, giving access to data, and reusing data. This paper will give a short overview of the chronological development of research data management at the University of Pretoria. The overview will also highlight findings of two surveys done at the University, one in 2009 and one in 2013. This will be followed by a discussion of a number of pilot projects at the University, and how the needs of researchers involved in these projects are being addressed in a number of the stages of the research data cycle. The discussion will also give a short overview of how the University plans to support those stages not currently being addressed. The second part of the presentation will focus on the projects and technology (software and hardware) used. The University of Pretoria has adopted an Enterprise Content Management (ECM) approach to manage its Research Data. ECM is not a singular platform or system but rather a set of strategies, tools and methodologies that interoperate with each other to create a comprehensive management tool. These sets create an all-encompassing process addressing document, web, records and digital asset management. 
At the University of Pretoria we address all these processes with different software suites and tools to create a complete management system. Each process presented its own technical challenges. These had to be addressed, while keeping in mind the end objective of supporting researchers throughout the whole research process and data life cycle. Various platforms and standards have been adopted to meet the University of Pretoria’s criteria. To date three processes have been addressed namely, the capturing of data during the research process, the dissemination of data and the preservation of data.
This presentation was delivered at the Elsevier Library Connect Seminar on 6 October 2014 in Johannesburg, 7 October 2014 in Durban and 9 October 2014 in Cape Town and gives an overview of the potential role that librarians can play in research data management
Communities of Practice in an academic library: a run on the wild side? (Johann van Wyk)
Presentation by Johann van Wyk at the 5th ICAHIS Conference held on 4-7 July 2005 at Onderstepoort, University of Pretoria, South Africa
CoPs in Information Service Organisations: a wild goose chase? (Johann van Wyk)
Communities of Practice in Information Service Organisations: a wild goose chase? Paper Presentation by Johann van Wyk at the Health Information Community of South Africa (HICSA) Meeting held on 2 November 2005
Web 2 presentation LIASA ILLIG Workshop 21 June 2011 (Johann van Wyk)
Presentation about Web 2.0 that was delivered at the LIASA Gauteng North Interlibrary Loans Workshop held on 21 June 2011 at the National Library of South Africa
Presentation delivered on 8 February 2011 to the Information Specialists at the University of Pretoria's Library Services on the topic of Web 2.0 and Information Professionals
2.0 Scout report: what is out there that we can use? (Johann van Wyk)
The presentation was delivered at the Special Libraries and Information Services (SLIS) Meeting, titled "Information Professionals in high gear: developing social media savvy" held on 14 October 2010 at the Knowledge Commons, CSIR, Pretoria, South Africa. The presentation takes the viewer on a tour of the different types of Web 2.0 tools that currently exist, and illustrates how some of these tools have been used by the Library Services of the University of Pretoria, South Africa. The presentation also highlights the value each tool can have in a library setting, and ends with possible future developments that are on the horizon.
Engaging Academia Through Library 2.0 tools: a case study: Education Library,... (Johann van Wyk)
Presentation by Johann van Wyk at the African Digital Scholarship and Curation Conference held from 12-14 May 2009 at CSIR Conference Centre, Pretoria, South Africa
1. CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 5-20 June 2014
Johann van Wyk
Presentation at NeDICC Meeting on 16 July 2014
Overview of things learned
3. Participants
21 People from the following countries:
South Africa 3
Brazil 1
Kenya 4
Tanzania/Zanzibar 1
Uganda 1
India 5
Vietnam 2
Indonesia 2
Mongolia 2
4. Programme
The programme was held at the Computer Network Information Center of the Chinese Academy of Sciences, and consisted of lectures by various international scholars and by scholars from the Chinese Academy of Sciences.
5. Workshop on Big Data for International Scientific Programmes: Challenges and Opportunities, 8-9 June 2014
We also attended this two-day workshop/conference on 8-9 June 2014 in Beijing, which brought together experts on Big Data management from across the world.
6. Organisations represented
Center for International Earth Science Information Network,
EARTH INSTITUTE, COLUMBIA UNIVERSITY
Computer Network Information Center, CAS
World Data Center for Microorganisms
Institute of Remote Sensing and Digital Earth, CAS
Dept of Earth Sciences
Institute for Environment and Human Security
Tetherless World Constellation
International Society for Digital Earth
7. Topics Covered
A very good overview of international collaborations, developments and applications of big data and its management in various disciplines internationally and in China
Earth Observation
Microbial Big Data
Biodiversity Informatics
Geospatial Data
Global Change Research
Visualisation
Data Mining
Phylogenetics
Astronomical Data
Climatology
Data Sharing
Data Analytics
Disaster Management
Bioinformatics
Data Infrastructures
Data Clouds/Cloud Computing
Data Preservation
Data Publishing
Big Data Policy
Microbiology
Data Citation
High Energy Physics
Genomics
Remote Sensing
Knowledge Discovery
Crowdsourcing
Physical Sciences
Metadata
Integrating Social and Natural Science Data
Data Science as new discipline
Linked Data
Virtual Observatory Data
8. Things learned/of value/of interest
A clearer understanding of the concept of Big Data and all its facets
Prof GUO Huadong’s presentation on 8 June at Workshop on Big Data:
“Current characteristics of big data:
•Relative characteristics: denotes those datasets which cannot be acquired, managed or processed on common devices within an acceptable time
•Absolute characteristics: defines big data through Volume, Variety, Veracity and Velocity”
Data Intensive Research -
Inter- & Intra-disciplinary
Paradigm shift from model-driven to data-driven science
9. Thousand years ago – Experimental Science
• Description of natural phenomena
Last few hundred years – Theoretical Science
• Newton’s Laws, Maxwell’s Equations…
Last few decades – Computational Science
• Simulation of complex phenomena
Today – Data-Intensive Science
• Scientists overwhelmed with data sets
from many different sources
• Captured by instruments
• Generated by simulations
• Generated by sensor networks
Emergence of a Fourth Research Paradigm
eScience is the set of tools and technologies
to support data federation and collaboration
• For analysis and data mining
• For data visualization and exploration
• For scholarly communication and dissemination
Courtesy of Prof. Tony Hey
From Presentation by Chenzhou Cui on 18 June 2014 at
CODATA Workshop, Beijing China
10. A Modern Scientific Discovery Process
Courtesy of Prof. S. G. Djorgovski
From Presentation by Chenzhou Cui on 18 June 2014 at CODATA Workshop, Beijing China
11. Things learned/of value/of interest
More deliberations on the concept of Big Data:
•What is your understanding on Big Data & Data Science?
•What are changing or will be changed in the Big data era when we do science?
•What should we do in order to embrace this new opportunity?
From Presentation by Jianhui LI, 11 June 2014, at CODATA Workshop, Beijing China
12. Some Comments/Views on Big Data
•High dimensionality may lead to wrong statistical inference and false scientific conclusions.
•Big Data creates issues of heterogeneity, experimental variations, and statistical biases, and requires us to develop more adaptive and robust procedures. To handle the challenges of Big Data, we need new statistical thinking and computational methods.
•Old style science coped with nature’s complexities by seeking the underlying simplicities in the sparse data acquired by experiments. But Big Data forces scientists to confront the entire repertoire of nature’s nuances and all their complexities.
https://www.sciencenews.org/blog/context/why-big-data-bad-science
From Presentation by Jianhui LI, 11 June 2014, at CODATA Workshop, Beijing China
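The first comment on this slide, that high dimensionality can produce misleading statistical inference, can be illustrated numerically. The sketch below is an added illustration and not material from the presentation: it draws purely random, unrelated variables and reports the largest absolute correlation between a random target and any of them. The function names, the sample size of 30 and the variable counts are all assumptions chosen for the example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_samples, n_variables, seed=0):
    """Largest |r| between a random target and n_variables unrelated variables."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # The more unrelated variables are screened, the stronger the best
    # "relationship" found by chance alone.
    for n_variables in (10, 100, 1000):
        r = max_spurious_correlation(n_samples=30, n_variables=n_variables)
        print(n_variables, round(r, 3))
```

Because none of the variables is related to the target, any strong correlation reported here is an artefact of searching many dimensions with few samples.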
13. Some Comments/Views on Big Data
•Prediction and understanding have been intimately tied together in science, but the influence of big data in science is now breaking them apart. HOW CAN YOU PREDICT something without understanding it? Simple: Find some other phenomenon that tends to occur with the event you’re trying to predict.
•With big data, it turns out that almost everything in nature and society has a tale, one that can be discovered with sophisticated computer models that run on inexpensive hardware and crunch through terabytes of data. If you measure enough variables, it doesn’t matter whether you understand the relationship between cause and effect; all you need is a relationship between one variable and another.
http://www.psmag.com/navigation/nature-and-technology/big-data-changing-science-society-69650/
From Presentation by Jianhui LI, 11 June 2014, at CODATA Workshop, Beijing China
14. Some Comments/Views on Big Data
•Hypothesis-driven research is designed to answer specific questions about cause and effect
•A big data scientist considers hypothesis-driven science too limiting. For example, cancer is a complex disease, involving many genes, and we’ll never understand it if we get bogged down in the time-consuming process of testing cause-and-effect relationships one at a time. Instead, we can tackle cancer much more effectively by measuring as many variables as possible in as many cancers as we can collect, without being biased by preconceived ideas.
•It’s not a question of data correlation versus theory. The use of data for correlations allows one to test theories and refine them.
http://www.psmag.com/navigation/nature-and-technology/big-data-changing-science-society-69650/
The Promise and Peril of Big Data, by the Aspen Institute
From Presentation by Jianhui LI, 11 June 2014, at CODATA Workshop, Beijing China
15. Some Comments/Views on Big Data
•Data is not just a back-office, accounts-settling tool any more. It is increasingly used as a real-time decision-making tool.
•New forms of computation combining statistical analysis, optimization and artificial intelligence are able to construct statistical models from large collections of data to infer how the system should respond to new data.
•Big data suddenly changes the whole game of how you look at the ethereal odd data sets. Instead of identifying outliers and “cleaning” datasets, theory formation using big data allows you to “craft an ontology and subject it to tests to see what its predictive value is.”
•In other words, the data once perceived as “noise” can now be reconsidered with the rest of the data, leading to new ways to develop theories and ontologies. Look at how you can invent the “theory behind the noise” in order to de-convolve it, and to find the pattern that you weren’t supposed to find.
From Presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
16. Some Comments/Views on Big Data
•Big data are big in four ways: Volume, Variety, Veracity and Velocity.
•Volume: The scale of data that systems must ingest, process and disseminate;
•Variety: the complexity of the types of information handled (many sources and types of data both structured and unstructured)
•Velocity: the pace at which data flows in and out from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices
•Veracity: refers to the biases, noise and abnormality in data
http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
17. 4 H’s of Scientific Big Data
•Heterogeneity and Non-reproducibility
“Everything changes and nothing remains still ... and ... you
cannot step twice into the same stream” – Heraclitus of Ephesus
Statistics may or may not work
•High uncertainty
Observation, sampling, record, uncertain models, . . .
•High dimension Multi-sourced
Mathematical models: Fourier Transform, Wavelet, Sparse
representation, Machine learning, . . .
•High computational complexity
From Lizhe Wang’s presentation at CODATA Workshop on 9 June 2014, Beijing, China
18. Data as Resource
•Ubiquitous: available anytime and anywhere
•Non-rivalrous: one person’s use of it does not impede another’s
•Hyper-renewable: data do not diminish when they are used; they can be processed again and again and their consumption creates even more data
•Accumulative: Data’s value usually increases when it is used
•Big Data is like Oil or Soil?
•Data is Resource
From Jianhui LI’s presentation at CODATA Workshop, 11 June 2014
- He used information from Yike GUO’s lecture
19. How to use this resource
•“Mining ”
to extract the actionable knowledge from the data by
understanding the correlations, causalities and to predict
future, just like digging for nuggets
•“Transduction”
to explore the optional use of the data to gain new value, just
like transforming energy
•“Interaction”
to enable the “data chemistry” approach to unleash the value
of combined data resource; just like organic reaction and
mineral composition
From presentation by Jianhui Li at CODATA 11 June 2014, Beijing, China
- He used information from Lecture in CNIC by Yike GUO
20. Discovery science – we need Abduction!
- a method of logical inference introduced by C. S. Peirce which comes prior to induction and deduction, for which the colloquial name is to have a “hunch”
Human intuition is needed in interacting with large-scale data
From presentation by Peter Fox on 8 June at CODATA Workshop on Big Data for International Scientific Programmes, Beijing, China
21. Managing Big Data - CAS
From presentation by Danhuai Guo at CODATA Workshop on 13 June 2014, Beijing, China
22. Data Science as a new discipline
•Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and analytical hypothesis analysis.
•Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the Big Data lifecycle (through action) to deliver value.
From presentation by Wo Chang on 9 June at CODATA Workshop on Big Data for International Scientific Programmes, Beijing, China
23. Data Scientist
From presentation by Jianhui Li at CODATA Training Workshop on Big Data for Science on 11 June 2014, Beijing, China
He used information from Lecture in CNIC by Yike GUO
24. How to be a data scientist
From presentation by Jianhui Li at CODATA Training Workshop on Big Data for Science on 11 June 2014, Beijing, China
He used information from Lecture in CNIC by Yike GUO
25. Cloud Computing for CAS data services
•Chinese Academy of Sciences Data Cloud (CAS Data Cloud) is focused on cloud technology to provide facilitated ways for scientists to make use of powerful information infrastructure, massive scientific data and rich scientific software
•It is a mixed evolution of grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, etc.
From presentation by Yuanchun Zhou at CODATA Workshop on 6 June 2014, Beijing, China
26. 3 Cloud Service Models
•Cloud Infrastructure as a Service (IaaS)
The capability provided to the consumer is to rent processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly select networking components (e.g., firewalls, load balancers).
•Cloud Platform as a Service (PaaS)
The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created applications using programming languages and tools supported by the provider (e.g., Java, Python, .Net). The consumer does not manage or control the underlying cloud infrastructure, network, servers, operating systems, or storage, but the consumer has control over the deployed applications and possibly application hosting environment configurations.
•Cloud Software as a Service (SaaS)
The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure and accessible from various client devices through a thin client interface such as a Web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
From presentation by Yuanchun Zhou at CODATA Workshop on 6 June 2014, Beijing, China
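The division of control set out in the three definitions on this slide can be summarised as a small lookup table. This is only a sketch added for illustration: the layer names and the `consumer_controls` helper are hypothetical, and the "possibly" cases in the definitions (select networking components under IaaS, limited user-specific settings under SaaS) are recorded as `limited` rather than decided either way.

```python
# Who controls each layer under each service model, following the definitions
# quoted on the slide. Values: "consumer", "provider", or "limited"
# (the definition grants the consumer only possible or restricted control).
CONTROL = {
    "IaaS": {
        "infrastructure": "provider",
        "network": "limited",        # "possibly select networking components"
        "operating_systems": "consumer",
        "storage": "consumer",
        "applications": "consumer",
    },
    "PaaS": {
        "infrastructure": "provider",
        "network": "provider",
        "operating_systems": "provider",
        "storage": "provider",
        "applications": "consumer",  # plus possibly hosting environment settings
    },
    "SaaS": {
        "infrastructure": "provider",
        "network": "provider",
        "operating_systems": "provider",
        "storage": "provider",
        "applications": "limited",   # user-specific configuration settings only
    },
}


def consumer_controls(model):
    """Return the sorted layers the consumer fully controls under a model."""
    return sorted(layer for layer, who in CONTROL[model].items() if who == "consumer")


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        print(model, consumer_controls(model))
```

Read from IaaS to SaaS, the set of layers under consumer control shrinks, which is the main contrast the three definitions draw.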
27. 3 Cloud Service Models
From presentation by Yuanchun Zhou at CODATA Workshop 6 June 2014, Beijing, China
28. From presentation by Yuanchun Zhou at CODATA Workshop on 6 June 2014, Beijing, China
29. Data Analysis
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
•maximize insight into a data set;
•uncover underlying structure;
•extract important variables;
•detect outliers and anomalies;
•test underlying assumptions;
•develop parsimonious models;
•and determine optimal factor settings.
Exploratory Data Analysis (EDA)
From presentation by Danhuai Guo at CODATA Workshop on 13 June 2014, Beijing, China
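Two of the EDA goals listed on this slide, summarising a data set and detecting outliers, can be sketched in a few lines. This is an added illustration using only the Python standard library. The sample values, the function names and the 1.5 × IQR fence (a common convention, not something stated in the presentation) are assumptions.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    sample = [12, 14, 14, 15, 16, 17, 18, 19, 21, 58]  # made-up measurements
    print(five_number_summary(sample))
    print(iqr_outliers(sample))
```

In practice the same numbers would usually be shown graphically, for example as a box plot, which is in keeping with the mostly graphical emphasis of EDA.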
30. EDA Techniques
•Most EDA techniques are graphical in nature with a few quantitative techniques.
The main role of EDA is to open-mindedly explore, and graphics gives
the analysts unparalleled power to do so, enticing the data to reveal
its structural secrets, and being always ready to gain some new, often
unsuspected, insight into the data.
•Graphics provide unparalleled power to apply natural pattern-recognition capabilities. The particular graphical techniques employed in EDA are:
•Plotting the raw data (such as data traces, histograms, bihistograms, probability plots, lag plots, block plots, and Youden plots).
•Plotting simple statistics such as mean plots, standard deviation plots, box plots, and main effects plots of the raw data.
•Positioning such plots so as to maximize our natural pattern-recognition abilities, such as using multiple plots per page.
From presentation by Danhuai Guo at CODATA Workshop on 13 June 2014, Beijing, China
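Two of the plots named on this slide, the histogram and the lag plot, rest on simple data preparation that can be sketched without a plotting library. The code below is an added illustration rather than anything shown in the presentation: it bins values for a text histogram and builds the (y[i], y[i+lag]) pairs that a lag plot would display. All names and the sample series are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # maximum goes in last bin
        counts[index] += 1
    return counts


def lag_pairs(values, lag=1):
    """Return pairs (y[i], y[i + lag]); plotted as points they form a lag plot."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return list(zip(values[:-lag], values[lag:]))


if __name__ == "__main__":
    series = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up observations
    for count in histogram_counts(series, n_bins=4):
        print("#" * count)
    print(lag_pairs(series)[:3])
```

A lag plot with no visible structure suggests the observations are not serially dependent, which is one of the underlying assumptions EDA sets out to test.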
31. Visualisation of Data
Napoleon’s March to Moscow, Charles J. Minard, 1869
Not Something New
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
32. Visualisation of Data: Examples
Cross-References in Bible
Kevin Hulsey Illustration, INC
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
33. Visualisation of Data: Examples
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
34. Visualisation of Data: Examples
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
35. More examples
Scalable Multi-variate Analytics of Seismic
and Satellite-based Observational Data
Health Sciences Field
Wind flow
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
36. Challenges!
•More and more unseen data
•Faster Creation and Collection
•Fast Dissemination
•5 exabytes of new information in 2002 [Lyman03], equivalent to 37,000 Libraries of Congress
•161 exabytes in 2006 [Gantz07]
•988 exabytes in 2010
•Need better tools and algorithms to visually convey information
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
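The three volume estimates on this slide imply very high growth rates. As a worked check (an added calculation, not part of the presentation), the compound annual growth rate between two estimates is (end / start) ** (1 / years) - 1. The estimates come from different sources, so treating them as one comparable series is itself an assumption.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Exabytes of new information per year, as quoted on the slide.
ESTIMATES = {2002: 5, 2006: 161, 2010: 988}

if __name__ == "__main__":
    years = sorted(ESTIMATES)
    for first, second in zip(years, years[1:]):
        rate = annual_growth_rate(ESTIMATES[first], ESTIMATES[second], second - first)
        print(f"{first}-{second}: {rate:.0%} per year")
```

On these figures the implied growth is roughly 138% per year for 2002-2006 and roughly 57% per year for 2006-2010.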
37. Why Visualisations?
Reasons for Visualisations
•To expose ideas/relationships
•To answer questions
•To make an argument
•To make decisions
•To observe trends
•To see data in context
•To summarize/aggregate data
•To expand memory
•For archiving
•To support graphical calculation
•To create trust
•To find patterns
•To advertise ideas
•To present an argument
•For exploratory data analysis
•To tell a story
•To inspire
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
38. Three Functions of Visualisations
•Record information
Photographs
Blueprints, etc
•Support reasoning about information (analyse)
Processing and calculations
Reasoning about data
Feedback and interaction
•Convey information to others (present)
Share and persuade
Collaborate and revise
Emphasize important aspects of data
From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014, Beijing, China
40. Data Publishing: Re-using World data sources
•Have a clear idea of what you want to do
creative idea
•Look for existing datasets (world data infrastructure
– GEOSS, WDS, GCMD …)
access the data
•Evaluate how this data is suitable for re-using data
processing for integrating
•Add your local knowledge and integrate it with the data
algorithm
•Get your data product analysed
discovery
•Publish your research products
publish paper and data
From LIU Chuang’s presentation at CODATA Workshop on 16 June 2014, Beijing, China
41. Data Publishing
•Big data means not only huge volumes of data, but also large numbers of scientists using and creating data.
•Every scientist in their research uses data, also creates new data.
•Most of the datasets are at “sleep” in individual scientists’ offices
•Scientists’ IP is recognized and rewarded – data publishing is the best way to get data author(s)’ contribution to science recognized and rewarded.
Why should data be published?
From LIU Chuang’s presentation at CODATA Workshop on 16 June 2014, Beijing, China
42. Data Publishing- A New Research Data Sharing Model
•Make data persistently available on the Web:
•maybe with a procedure of quality control;
•So that they can be accessed, downloaded, analysed and reused by anyone for research or other purposes
•Including
•Academic Publication
•Commercial Publication
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
43. Data Publishing: A Different Publication Model
•Resource Sharing Model
•Data collected with public funding should be shared with the public
•Data Centres and Libraries (Public Repository)
•Big Science, and especially observation data held by Big Science Facilities (Astronomy, High Energy Physics) or Organisations (USGS/NOAA etc.)
•Evidence/Supplement Publishing Model
•Data used by a scientific paper should be open as evidence, so that it can be reviewed by peer reviewers and accessed (even reanalysed) by readers
•Data supplementing a journal paper should be submitted to a public repository (GenBank, etc.) before the paper is published
•Result Publishing Model (Data paper)
•Data paper + data repository
•Rewards to data author
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
44. Data Publishing: Fundamental requirements for a Research Data Publication
•Persistent Unified Identifier
•Persistently Accessible
•Archived in one, or preferably, more than one independent document, or data repository, so that it will be available to be accessed persistently
•Understandable
•Necessary Metadata that are Machine Readable, and Understandable
•Rich Metadata/description that are human readable and understandable
•Reusable
•Quality, fit for use
•Easy for reuse (format, approach)
•Citable
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
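The requirements on this slide can be read as a checklist that a repository could apply to each record before treating it as a citable publication. The sketch below is hypothetical: the field names, the example record (including its placeholder identifier) and the citation layout are assumptions made for illustration and do not follow any particular metadata schema.

```python
# Hypothetical minimum fields, loosely mirroring the slide: an identifier,
# descriptive metadata, and at least one archive location.
REQUIRED_FIELDS = (
    "identifier",
    "title",
    "creators",
    "year",
    "publisher",
    "archive_locations",
)


def missing_requirements(record):
    """Return the names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def format_citation(record):
    """Build a human-readable citation; raise ValueError if fields are missing."""
    missing = missing_requirements(record)
    if missing:
        raise ValueError("record is not citable, missing: " + ", ".join(missing))
    authors = "; ".join(record["creators"])
    return (
        f"{authors} ({record['year']}). {record['title']}. "
        f"{record['publisher']}. {record['identifier']}"
    )


if __name__ == "__main__":
    example = {  # invented record, for illustration only
        "identifier": "doi:10.0000/example",
        "title": "Example rainfall dataset",
        "creators": ["Doe, J.", "Roe, R."],
        "year": 2014,
        "publisher": "Example Data Repository",
        "archive_locations": ["repository-a", "repository-b"],
    }
    print(format_citation(example))
```

Keeping the same record machine readable (a dictionary here) and renderable as a citation reflects the slide's distinction between metadata for machines and description for humans.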
45. Data Journals
•Ecological Archives
•From 2000 to now, published 88 data papers - data paper’s average citation number is almost 11
•Earth System Science Data
•From 2009 to now, published 89 data papers – data paper’s average citation number is almost 12
•Biodiversity Data Journal
•From 2011 to now, published 11 data papers
•Nature Scientific Data
•Launched last month
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
46. Registry of Data Repositories
Popular data registries: Databib and re3data.org
•Databib connects to 978 data repositories and databases (agriculture, geosciences, social sciences, biological sciences)
•re3data.org currently lists 634 research data repositories from different disciplines and 586 of these are described in detail using the re3data.org schema.
•In future, Databib and re3data.org will be merged into one service.
From ARD Prasad’s presentation at CODATA Workshop on 9 June 2014, Beijing, China
47. Public Research Data Repository
•Dryad
•PANGAEA
•Dataverse Network
•FigShare
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
48. Dryad, PANGAEA, DataVerse

| | Dryad | PANGAEA | DataVerse |
|---|---|---|---|
| Domain | Any, now more in ecology and life science | Earth and Life Sciences | All disciplines worldwide |
| Founders | Non-profit membership organisation | Hosted by AWI and MARUM in Germany, supported by several projects | Harvard University, Institute of Quantitative Social Science |
| Data type/Format | Any type | Any type | All file formats with maximum size of 2GB per file |
| Quality Control | Journal peer-reviewed | Data Editorial ensures integrity and authenticity | By data owners |
| Data Providers | Research article authors | Authors for journal ESSD and various scientific journals related to earth system research | Any researcher worldwide (faculty, postdoc, student or staff) |
| Business Model | Member fees and DPCs | Free of charge, but appreciate financial contribution of average 300 € | Free to deposit data |
| Data Identifier | DOI | DOI | DOI |
| Data Package Download | Yes | Yes | Yes |
| API | A few | None | Data sharing API/Data deposit API |
| Citable | Yes, to data and paper | Yes, to data directly | Yes, to data directly |
From presentation by Jianhui LI at CODATA Workshop on 11 June 2014, Beijing, China
49. Challenge: how to evaluate scientists’ contributions with regards to the data
•Who is the data author? Not very clear in some
cases in China
•Is the data reliable? Not very clear in some
datasets in China
•How to evaluate scientists’ contribution with
regards to their data? No standard and no
practices even;
•….;
From LIU Chuang’s presentation at CODATA Workshop on 16 June 2014, Beijing, China
50. DOI Technology and Standard
•DOI: ISO 26324 (May 2012)
•DOI: DOI/Handle system and technology worldwide
available
•This made data publishing possible and is the key
solution for data sharing in big data science
From LIU Chuang’s presentation at CODATA Workshop on 16 June 2014, Beijing, China
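A DOI name consists of a prefix beginning with `10.` and a suffix, separated by a slash, and it can be resolved through the `https://doi.org/` proxy. The sketch below, added for illustration, splits a DOI name into its two parts and builds the resolver URL. The function names are hypothetical and the example DOI is invented, not a registered identifier.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI name into (prefix, suffix), accepting an optional 'doi:' label."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Return the doi.org URL for a DOI name, percent-encoding the suffix."""
    prefix, suffix = split_doi(doi)
    return "https://doi.org/" + prefix + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    example = "doi:10.3974/example.2014.01"  # invented suffix, for illustration
    print(split_doi(example))
    print(resolver_url(example))
```

Because the identifier stays fixed while the resolver redirects to wherever the data currently lives, a citation that uses the DOI remains valid when a repository reorganises its holdings.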
51. Data Publishing
[Figure: Procedure and Flowchart of Global Change Research Data Publishing and Repository (DOI:10.3974). Data submission and publishing system: http://www.geodoi.ac.cn/. Components shown: authors, submission, editing, peer reviewers, peer review system, archive system, data visualisation system, data and data paper publishing system, data services system for users, data download records, data citation retrieval system, digital library, data services, impact factors, Alliance of Data Publ., China GEOSS.]
From LIU Chuang’s presentation at CODATA Workshop on 16 June 2014, Beijing, China
53. Things learned/of value/of interest
A clearer understanding of CODATA, all its activities, Workgroups, and Task groups
Data Policy
•International and national aspects of data policy
•Data policy committee: setting an international agenda for data policy, expert forum, advice and consultancy
•Coordinating with national committees
Data Science
•Long-standing activities: fundamental constants.
•Strategic working groups; community-driven task groups
•Disciplinary and interdisciplinary data challenges, Big Data
Capacity Building
•Longstanding work on data preservation and access with developing countries
•Executive Committee Task Force on Capacity Building: setting an international agenda for capacity building; Early Career WG
Data for International Science
•Support for ICSU Mission
•Data issues and challenges in international, interdisciplinary science programmes
From presentation by Simon Hodson at CODATA Workshop on 6 June 2014, in Beijing, China
54. Task groups in CODATA
•Advancing Informatics for Microbiology
•Anthropometric Data and Engineering
•Data at Risk
•Data Citation Standards and Practices
•Earth and Space Science Data Interoperability
•Exchangeable Materials Data Representation to Support Scientific Research and Education
•Fundamental Physical Constants
•Global Information Commons for Science Initiative
•Linked Open Data for Global Disaster Risk Research
•Octopus: Mining Space and Terrestrial Data for Improved Weather, Climate and Agriculture Predictions
•Global Roads Data Development
•Preservation of and Access to Scientific and Technical Data in/for/with Developing Countries (PASTD)
From presentation by Simon Hodson at CODATA Workshop on 6 June 2014, in Beijing, China
55. Invitation
Apply for corresponding membership to CODATA Task group on Preservation of and Access to Scientific and Technical Data in/for/with Developing Countries (PASTD)
http://www.codata.org/task-groups/preservation-of-and-access-to-scientific-and-technical-data-in-for-with-developing-countries-pastd
56. Invitation
Join CODATA Early Career Data Professionals Group (Initiative of CODATA EC Task Force on Capacity Building)
•This initiative is in line with two of the aims of NICIS (National Integrated Cyberinfrastructure System) of South Africa
“develop training for the human expertise necessary for data scientists,
including data management, sharing and analysis”; and
“actively participate in international forums related to Data Services to
promote South African activities and gain knowledge from other
international efforts”
•This initiative is in line with one of the aims of NeDICC (Network of Data and Information Curation Centres)
“to train and develop skills in research data management”
57. Events/Opportunities coming up
International Workshop on Open Data for Science and Sustainability in Developing Countries, Nairobi, Kenya, 4-8 August 2014: http://www.codata-pastd.org
59. Events/Opportunities coming up
•Pivotal – The Spatial Edge Master Class for Resilient and Rapid Response, Brisbane, Australia: 29 June – July 2015
Hands-on Workshop on Visualisation
grounded on real-world solutions
http://pivotal2015.org
60. Actions
Network with participants in the CODATA Workshop
South Africa
India
Vietnam
Indonesia
Kenya
Uganda
Mongolia
Brazil
Tanzania (Zanzibar)
Colombia/ SA
61. Actions
Act as bridge between the various researchers in
different disciplines at University of Pretoria and
the presenters at the CODATA Training Workshop
• create awareness of initiatives
• link-up researchers with presenters
62. Actions
Create awareness among networks in South Africa
about the activities, workgroups and taskgroups of
CODATA
NICIS
DIRISA
63. Actions
Work in closer collaboration with the National Research Foundation (the national member of CODATA) with regard to Data Management Initiatives
64. Actions
Include knowledge gained in the formulation of the
new UP Data Management Policy
65. Bibliography
•CHANG, WO. 2014. NIST Big Data Public Working Group and standardization activities. Presentation on 8 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: challenges and opportunities, held in Beijing, China, 8-9 June 2014.
•CHENZHOU, CUI. 2014. Virtual Observatory, a global platform for astronomical data. Presentation on 18 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.
•CHUANG, LIU. 2014. Data re-use and publishing for sciences and sustainability in developing countries. Presentation on 16 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.
•DANHUAI, GUO. 2014. Exploratory analysis and visualization of spatio-temporal data. Presentation on 13 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.
•The Fourth Paradigm: data-intensive scientific discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. Redmond, WA : Microsoft Research, c2009.
66. Bibliography (2)
•FOX, PETER. 2014. Environmental Informatics: leveraging big data to solve big problems. Presentation on 8 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: challenges and opportunities, held in Beijing, China, 8-9 June 2014.
•HODSON, SIMON. 2014. Global collaboration in data science: an introduction to CODATA. Presentation on 6 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.
•HUADONG, GUO. 2014. Big data, big science, towards big discovery. Presentation on 8 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: challenges and opportunities, held in Beijing, China, 8-9 June 2014.
•JIANHUI, LI. 2014. Understanding big data and data science: a dialogue with participants of the training workshop. Presentation on 11 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.
•NORMANDEAU, KEVIN. 2013. Beyond volume, variety and velocity is the issue of big data veracity. Inside BIGDATA, 12 September 2013. [Online] available at http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/ (Accessed 15 July 2014).
67. Bibliography (3)
•PRASAD, ARD. 2014. Metadata in big data. Presentation on 9 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: challenges and opportunities, held in Beijing, China, 8-9 June 2014.
•SIEGFRIED, TOM. 2013. Why big data is bad for science. Science News, 26 November 2013. [Online] available at https://www.sciencenews.org/blog/context/why-big-data-bad-science (Accessed 15 July 2014).
•WANG, LIZHE. 2014. Data-intensive scientific discovery in Digital Earth. Presentation on 9 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: challenges and opportunities, held in Beijing, China, 8-9 June 2014.
•WHITE, MICHAEL. 2013. How big data is changing science and [society]. Pacific Standard, 8 November 2013. [Online] available at http://www.psmag.com/navigation/nature-and-technology/big-data-changing-science-society-69650/ (Accessed 15 July 2014).
•YUANCHUN, ZHOU. 2014. Cloud concept, overview and CAS Data Cloud. Presentation on 6 June 2014 at the CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, held in Beijing, China, 5-20 June 2014.