TWC 
Why Data Science Matters 
Xiaogang (Marshall) Ma 
Tetherless World Constellation 
Rensselaer Polytechnic Institute 
Email: max7@rpi.edu; Twitter: @MarshallXMa 
ICSU-WDS Data Stewardship Award Lecture 
SciDataCon 2014, New Delhi, India, Nov. 02-05
Acknowledgements
• Dr. Mustapha Mokrane and Dr. Simon Hodson 
• Colleagues at TWC/RPI, CODATA-ECDP, ESIP, CGI-IUGS, 
AGU/ESSI, ICSU-WDS, RDA, ITC, and more 
• My mentor Prof. Peter Fox 
• My family 
• All of you
Outline
• Technical trends 
– Data management, publication & citation 
• Methodology 
– Interoperability & Provenance 
• Data management is just a start 
– Data analysis 
– Semantic eScience 
Data Management
data work 
Image courtesy Randy Glasbergen
Data Management Plan
• Data Management Plan 
– A formal document that outlines what you will do with your data 
during and after you complete your research 
• Resources/Tools help create DMPs: 
– NSF Data Management Plan Requirements: 
http://www.nsf.gov/eng/general/dmp.jsp 
– DCC Data Management Plans: 
http://www.dcc.ac.uk/resources/data-management-plans 
– DMPTool: https://dmptool.org 
– DCC DMPOnline: https://dmponline.dcc.ac.uk 
Data Publication
• Data as first class products of research 
– e.g., NSF bio-sketches can include data publications 
See: http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp 
Image from j4h.net
“All data necessary to understand, assess, and extend the conclusions of 
the manuscript must be available to any reader of Science.”
“…authors are required to make materials, data and associated protocols 
promptly available to readers without undue qualifications.” 
“…authors must make materials, data, and associated protocols available 
to readers.” 
“…it is a condition of publication that authors make available the data and 
research materials supporting the results in the article.” 
“…require authors to make all data underlying the findings described in 
their manuscript fully available without restriction…” 
“Earth and space science data should be widely accessible in multiple 
formats and long‐term preservation of data is an integral responsibility of 
scientists and sponsoring institutions.” 
“…support the principle that research data should be made freely 
available to all researchers…” 
“…recommends depositing data that correspond to journal articles in 
reliable data repositories…”
• Ways of data publication 
– Data as supplemental material of a paper 
– Standalone data 
– Data paper: data in a repository + descriptive ‘data paper’ 
Examples: 
• Standalone data journals: Nature Scientific Data, Geoscience Data 
Journal, Ecological Archives, Data in Brief … 
• Journals that publish data papers: Earth and Space Science, 
GigaScience, F1000 Research, Internet Archaeology … 
Strasser, GeoData 2014 Workshop Presentation (2014)
An isolated data island?!
Image from nature.com
Data Citation
• Data Citation Index 
– Indexes the world's leading data repositories 
– Connects datasets to related refereed literature indexed in 
the Web of Science™ 
– Efficient access to data across subjects and regions 
Image courtesy http://wokinfo.com
Data interoperability
Interoperability: 
“Data should be discoverable, accessible, decodable, 
understandable and usable, and data sharing should be 
legal and ethical for all participants.” 
Ma et al., Nature Geoscience (2011)
Original image from: http://ehna.org
Provenance of research
Provenance documentation 
“Linking a range of observations and model outputs, research 
activities, people and organizations involved in the production of 
scientific findings with the supporting data sets and methods 
used to generate them” 
Image from nature.com 
Ma et al., Nature Climate Change (2014) 
http://data.globalchange.gov
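The idea of provenance documentation quoted above can be illustrated with a minimal sketch. It is not the implementation behind the cited work; the `Node` and `ProvenanceGraph` classes, the example names, and the relation labels (borrowed here from the W3C PROV vocabulary purely as labels) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One thing in the provenance record (hypothetical structure)."""
    kind: str  # e.g. "finding", "dataset", "activity", "person", "organization"
    name: str


@dataclass
class ProvenanceGraph:
    """A list of (subject, relation, object) links between nodes."""
    edges: list = field(default_factory=list)

    def link(self, subject, relation, obj):
        self.edges.append((subject, relation, obj))

    def trace(self, start):
        """Return every node reachable from `start` by following links."""
        seen, stack = [], [start]
        while stack:
            node = stack.pop()
            for subject, _, obj in self.edges:
                if subject == node and obj not in seen:
                    seen.append(obj)
                    stack.append(obj)
        return seen


# Illustrative names only; none of these come from a real report.
finding = Node("finding", "Finding F1")
dataset = Node("dataset", "Dataset D1")
activity = Node("activity", "Model run M1")
person = Node("person", "Researcher R1")
organization = Node("organization", "Institute I1")

graph = ProvenanceGraph()
graph.link(finding, "wasDerivedFrom", dataset)
graph.link(dataset, "wasGeneratedBy", activity)
graph.link(activity, "wasAssociatedWith", person)
graph.link(person, "actedOnBehalfOf", organization)

# Walk from the finding back to everything that supports it.
trail = graph.trace(finding)
for node in trail:
    print(f"{node.kind}: {node.name}")
```

Starting from a finding, `trace` walks back to the supporting dataset, the activity that generated it, and the people and organizations involved, which is the kind of linking the quotation describes.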
• IPython Notebook: a web-based interactive computational environment
– Code, APIs, datasets, text…
– PDF document
• We extended the IPython Notebook environment to enable automatic
provenance capture during a scientific workflow
Di Stefano et al., ESIP 2014 Summer Meeting Presentation (2014) 
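A rough sketch of what automatic provenance capture might look like in plain Python follows. It is not the notebook extension presented by Di Stefano et al.; the `capture_provenance` decorator, the `PROVENANCE_LOG` list, and the two example workflow steps are hypothetical and only show the general idea of recording what each step used and generated.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

PROVENANCE_LOG = []  # hypothetical in-memory store for captured records


def _fingerprint(value):
    """Short, stable hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def capture_provenance(func):
    """Record the inputs and output of each call (positional arguments only, for brevity)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "activity": func.__name__,
            "started": started,
            "used": [_fingerprint(arg) for arg in args],
            "generated": _fingerprint(result),
        })
        return result
    return wrapper


@capture_provenance
def to_celsius(kelvin_values):
    return [round(k - 273.15, 2) for k in kelvin_values]


@capture_provenance
def mean(values):
    return sum(values) / len(values)


# A two-step workflow: each call is logged without extra effort from the user.
celsius = to_celsius([280.0, 285.5, 290.25])
average = mean(celsius)

for record in PROVENANCE_LOG:
    print(record["activity"], record["used"], "->", record["generated"])
```

Because the fingerprint of the first step's output matches the fingerprint of the second step's input, the two log records can be chained into a lineage after the fact.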
Semantic eScience
• Artificial Intelligence accelerates scientific discovery 
– Data search, synthesis and hypothesis representation 
– Data analysis: reasoning with models of the data 
Gil et al., Science (2014) 
Image from science.com 
A state-of-the-art example: 
Hanalyzer (high-throughput analyzer) 
• Uses natural language processing to 
automatically extract a semantic network from 
all PubMed papers relevant to a scientist 
• Uses Semantic Web technology to integrate 
assertions from other biomedical sources 
• Reasons about the network to find new 
correlations that suggest new genes to 
investigate 
Leach et al., PLoS Comput Bio (2009) 
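The three steps listed for the Hanalyzer can be mimicked, very loosely, with a toy example. This is not the Hanalyzer's algorithm: the gene and concept names are placeholders, the two assertion lists stand in for text-mined and database-derived knowledge, and the scoring rule (count shared concepts) is an assumption chosen only to show how integrating sources into one network can surface candidates.

```python
from collections import defaultdict

# Hypothetical (gene, concept) assertions, as if extracted from papers.
literature_assertions = [
    ("geneA", "processX"),
    ("geneB", "processX"),
    ("geneC", "processY"),
]
# Hypothetical assertions, as if imported from another biomedical source.
database_assertions = [
    ("geneA", "pathwayZ"),
    ("geneC", "pathwayZ"),
    ("geneD", "processY"),
]


def build_network(*sources):
    """Merge all sources into one concept -> set-of-genes mapping."""
    network = defaultdict(set)
    for source in sources:
        for gene, concept in source:
            network[concept].add(gene)
    return network


def suggest_genes(network, seed):
    """Rank other genes by how many concepts they share with `seed`."""
    scores = defaultdict(int)
    for genes in network.values():
        if seed in genes:
            for other in genes - {seed}:
                scores[other] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


network = build_network(literature_assertions, database_assertions)
suggestions = suggest_genes(network, "geneA")
print(suggestions)
```

Neither source alone links `geneA` to both `geneB` and `geneC`; the combined network does, which is the point of integrating assertions before reasoning over them.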
Deep Carbon Virtual Observatory
A cyber-enabled platform for linked science
Fox, RDA Fourth Plenary Meeting Presentation (2014)
http://deepcarbon.net
Summary
• Data as first class products of research 
• eScience: the digital or electronic facilitation of science 
• Semantic eScience 
– A virtuous circle between science and semantic technologies 
– Data driven + Knowledge driven? 
Image courtesy @WileyExchanges 
More information: 
Marshall X Ma 
max7@rpi.edu 
Thank you!

More Related Content

What's hot

Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policiesNikesh Narayanan
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...Natalie Stanford
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016Susanna-Assunta Sansone
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan LibraryC. Tobin Magle
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13DataDryad
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementCunera Buys
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar SeriesKatina Toufexis
 
The Disappearing Data Scientist
The Disappearing Data ScientistThe Disappearing Data Scientist
The Disappearing Data ScientistKurt Cagle
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset useHeather Piwowar
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSFC. Tobin Magle
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
 

What's hot (20)

Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policies
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan Library
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
EDI Training Module 2: EDI Project
EDI Training Module 2:  EDI ProjectEDI Training Module 2:  EDI Project
EDI Training Module 2: EDI Project
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
 
The Disappearing Data Scientist
The Disappearing Data ScientistThe Disappearing Data Scientist
The Disappearing Data Scientist
 
Open Research 2017
Open Research 2017Open Research 2017
Open Research 2017
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset use
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
 

Viewers also liked

Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesXiaogang (Marshall) Ma
 
Ontology spectrum for geological data interoperability (PhD defense nov 2011)
Ontology spectrum for geological data interoperability (PhD defense nov 2011)Ontology spectrum for geological data interoperability (PhD defense nov 2011)
Ontology spectrum for geological data interoperability (PhD defense nov 2011)Xiaogang (Marshall) Ma
 
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...Xiaogang (Marshall) Ma
 
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...Xiaogang (Marshall) Ma
 
A short story of geologic time ontologies and vocabularies
A short story of geologic time ontologies and vocabulariesA short story of geologic time ontologies and vocabularies
A short story of geologic time ontologies and vocabulariesXiaogang (Marshall) Ma
 
Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal
Adoption of RDA DTR and PID in Deep Carbon Observatory Data PortalAdoption of RDA DTR and PID in Deep Carbon Observatory Data Portal
Adoption of RDA DTR and PID in Deep Carbon Observatory Data PortalXiaogang (Marshall) Ma
 
Exploratory visualization of earth science data in a Semantic Web context
Exploratory visualization of earth science data in a Semantic Web contextExploratory visualization of earth science data in a Semantic Web context
Exploratory visualization of earth science data in a Semantic Web contextXiaogang (Marshall) Ma
 
Ontology Development for Provenance Tracing in National Climate Assessment o...
Ontology Development for Provenance Tracing in National Climate Assessment o...Ontology Development for Provenance Tracing in National Climate Assessment o...
Ontology Development for Provenance Tracing in National Climate Assessment o...Xiaogang (Marshall) Ma
 
A short review of Connected China: A visualization of elite social networks i...
A short review of Connected China: A visualization of elite social networks i...A short review of Connected China: A visualization of elite social networks i...
A short review of Connected China: A visualization of elite social networks i...Xiaogang (Marshall) Ma
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...Xiaogang (Marshall) Ma
 
Why data science matters and what we can do with it
Why data science matters and what we can do with itWhy data science matters and what we can do with it
Why data science matters and what we can do with itXiaogang (Marshall) Ma
 
From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...Xiaogang (Marshall) Ma
 

Viewers also liked (13)

Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental Sciences
 
Ontology spectrum for geological data interoperability (PhD defense nov 2011)
Ontology spectrum for geological data interoperability (PhD defense nov 2011)Ontology spectrum for geological data interoperability (PhD defense nov 2011)
Ontology spectrum for geological data interoperability (PhD defense nov 2011)
 
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...
Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Obser...
 
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant...
 
A short story of geologic time ontologies and vocabularies
A short story of geologic time ontologies and vocabulariesA short story of geologic time ontologies and vocabularies
A short story of geologic time ontologies and vocabularies
 
Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal
Adoption of RDA DTR and PID in Deep Carbon Observatory Data PortalAdoption of RDA DTR and PID in Deep Carbon Observatory Data Portal
Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal
 
Exploratory visualization of earth science data in a Semantic Web context
Exploratory visualization of earth science data in a Semantic Web contextExploratory visualization of earth science data in a Semantic Web context
Exploratory visualization of earth science data in a Semantic Web context
 
Ontology Development for Provenance Tracing in National Climate Assessment o...
Ontology Development for Provenance Tracing in National Climate Assessment o...Ontology Development for Provenance Tracing in National Climate Assessment o...
Ontology Development for Provenance Tracing in National Climate Assessment o...
 
A short review of Connected China: A visualization of elite social networks i...
A short review of Connected China: A visualization of elite social networks i...A short review of Connected China: A visualization of elite social networks i...
A short review of Connected China: A visualization of elite social networks i...
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...
 
Why data science matters and what we can do with it
Why data science matters and what we can do with itWhy data science matters and what we can do with it
Why data science matters and what we can do with it
 
From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...
 
A short introduction to GIS
A short introduction to GISA short introduction to GIS
A short introduction to GIS
 

Similar to Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture

Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d studentsDebs Martindale
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...EDINA, University of Edinburgh
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ LibraryARDC
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Open science curriculum for students, June 2019
Open science curriculum for students, June 2019Open science curriculum for students, June 2019
Open science curriculum for students, June 2019Dag Endresen
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...hsuleslie
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenHeinz Pampel
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Susanna-Assunta Sansone
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016Jisc
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfreypvhead123
 
Data sharing as part of the research ecosystem
Data sharing as part of the research ecosystemData sharing as part of the research ecosystem
Data sharing as part of the research ecosystemVarsha Khodiyar
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Robin Rice
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017ARDC
 

Similar to Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture (20)

Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d students
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Open science curriculum for students, June 2019
Open science curriculum for students, June 2019Open science curriculum for students, June 2019
Open science curriculum for students, June 2019
 
Researh data management
Researh data managementResearh data management
Researh data management
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von Forschungsdaten
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Preparing Your Research Material for the Future - 2015-02-23 - Humanities Div...
Preparing Your Research Material for the Future - 2015-02-23 - Humanities Div...Preparing Your Research Material for the Future - 2015-02-23 - Humanities Div...
Preparing Your Research Material for the Future - 2015-02-23 - Humanities Div...
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfrey
 
Open Science and Open Data for Librarians
Open Science and Open Data for LibrariansOpen Science and Open Data for Librarians
Open Science and Open Data for Librarians
 
Data sharing as part of the research ecosystem
Data sharing as part of the research ecosystemData sharing as part of the research ecosystem
Data sharing as part of the research ecosystem
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
Preparing Your Research Material for the Future - 2014-06-09 - Humanities Div...
Preparing Your Research Material for the Future - 2014-06-09 - Humanities Div...Preparing Your Research Material for the Future - 2014-06-09 - Humanities Div...
Preparing Your Research Material for the Future - 2014-06-09 - Humanities Div...
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
 

Recently uploaded

GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture

  • 1. TWC Why Data Science Matters Xiaogang (Marshall) Ma Tetherless World Constellation Rensselaer Polytechnic Institute Email: max7@rpi.edu; Twitter: @MarshallXMa ICSU-WDS Data Stewardship Award Lecture SciDataCon 2014, New Delhi, India, Nov. 02-05
  • 2. Acknowledgements • Dr. Mustapha Mokrane and Dr. Simon Hodson • Colleagues at TWC/RPI, CODATA-ECDP, ESIP, CGI-IUGS, AGU/ESSI, ICSU-WDS, RDA, ITC, and more • My mentor Prof. Peter Fox • My family • All of you
  • 3. Outline • Technical trends – Data management, publication & citation • Methodology – Interoperability & Provenance • Data management is just a start – Data analysis – Semantic eScience
  • 4. Data Management [Image: "data work" cartoon, courtesy Randy Glasbergen]
  • 5. Data Management Plan • Data Management Plan – A formal document that outlines what you will do with your data during and after you complete your research • Resources and tools that help create DMPs: – NSF Data Management Plan Requirements: http://www.nsf.gov/eng/general/dmp.jsp – DCC Data Management Plans: http://www.dcc.ac.uk/resources/data-management-plans – DMPTool: https://dmptool.org – DCC DMPOnline: https://dmponline.dcc.ac.uk
  • 6. Data Publication • Data as first-class products of research – e.g., NSF bio-sketches can include data publications. See: http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp Image from j4h.net
  • 7. “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.” “…authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.” “…authors must make materials, data, and associated protocols available to readers.” “…it is a condition of publication that authors make available the data and research materials supporting the results in the article.” “…require authors to make all data underlying the findings described in their manuscript fully available without restriction…” “Earth and space science data should be widely accessible in multiple formats and long-term preservation of data is an integral responsibility of scientists and sponsoring institutions.” “…support the principle that research data should be made freely available to all researchers…” “…recommends depositing data that correspond to journal articles in reliable data repositories…”
  • 8. Ways of data publication – Data as supplemental material of a paper – Standalone data – Data paper: data in a repository + descriptive ‘data paper’ Examples: • Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives, Data in Brief … • Journals that publish data papers: Earth and Space Science, GigaScience, F1000 Research, Internet Archaeology … Strasser, GeoData 2014 Workshop Presentation (2014)
  • 9. An isolated data island?! Image from nature.com
  • 10. Data Citation • Data Citation Index – Indexes the world's leading data repositories – Connects datasets to related refereed literature indexed in the Web of Science™ – Efficient access to data across subjects and regions Image courtesy http://wokinfo.com
  • 11. Data interoperability Interoperability: “Data should be discoverable, accessible, decodable, understandable and usable, and data sharing should be legal and ethical for all participants.” Ma et al., Nature Geoscience (2011) Original image from: http://ehna.org
  • 12. Provenance of research Provenance documentation: “Linking a range of observations and model outputs, research activities, people and organizations involved in the production of scientific findings with the supporting data sets and methods used to generate them” Image from nature.com Ma et al., Nature Climate Change (2014) http://data.globalchange.gov
  • 13. IPython Notebook: A web-based interactive computational environment Codes, APIs, datasets, text… PDF document • We extended the IPython Notebook environment to enable automatic provenance capture during a scientific workflow Di Stefano et al., ESIP 2014 Summer Meeting Presentation (2014)
  • 15. Semantic eScience • Artificial Intelligence accelerates scientific discovery – Data search, synthesis and hypothesis representation – Data analysis: reasoning with models of the data Gil et al., Science (2014) Image from science.com A state-of-the-art example: Hanalyzer (high-throughput analyzer) • Uses natural language processing to automatically extract a semantic network from all PubMed papers relevant to a scientist • Uses Semantic Web technology to integrate assertions from other biomedical sources • Reasons about the network to find new correlations that suggest new genes to investigate Leach et al., PLoS Comput Bio (2009)
  • 16. Deep Carbon Virtual Observatory Fox, RDA Fourth Plenary Meeting Presentation (2014) A cyber-enabled platform for linked science http://deepcarbon.net
  • 17. Summary • Data as first-class products of research • eScience: the digital or electronic facilitation of science • Semantic eScience – A virtuous circle between science and semantic technologies – Data driven + Knowledge driven? Image courtesy @WileyExchanges
  • 18. More information: Marshall X Ma max7@rpi.edu Thank you!
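
Slide 13 mentions automatic provenance capture during a scientific workflow. The slides do not show how the IPython Notebook extension works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical decorator `capture_provenance` and an in-memory list `PROVENANCE_LOG`; the record fields (`activity`, `used`, `generated`, `startedAt`) are illustrative names chosen here.

```python
import functools
import hashlib
from datetime import datetime, timezone

# Hypothetical in-memory store; a real system would persist these records.
PROVENANCE_LOG = []


def _digest(value):
    """Return a short fingerprint of a value so records can be linked."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()[:12]


def capture_provenance(func):
    """Record which inputs a step used and which output it generated."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "activity": func.__name__,
            "used": [_digest(a) for a in args]
                    + [_digest(v) for v in kwargs.values()],
            "generated": _digest(result),
            "startedAt": started,
        })
        return result
    return wrapper


@capture_provenance
def clean(values):
    """Drop missing observations."""
    return [v for v in values if v is not None]


@capture_provenance
def mean(values):
    """Average the remaining observations."""
    return sum(values) / len(values)


raw = [1.0, None, 3.0]
average = mean(clean(raw))

for record in PROVENANCE_LOG:
    print(record["activity"], record["used"], "->", record["generated"])
```

Because the fingerprint of the output of `clean` matches the fingerprint of the input to `mean`, the two records can be chained into a simple lineage from the raw data to the final number, which is the kind of link between findings, data sets and methods that slide 12 describes.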