SlideShare a Scribd company logo
1 of 25
Download to read offline
Introducing the Whole Tale Project:
Merging Science and Cyberinfrastructure Pathways
Bertram Ludäscher Victoria Stodden Matt Turk
Kyle Chard (U Chicago), Niall Gaffney (TACC), Matt Jones (UCSB),
Jarek Nabrzyski (Notre Dame), Kandace Turner (NCSA)
CIRSS Seminar
September 2, 2016
Problems Facing Data Researchers
Workflow for data research is fragmented
● Data comes from many sources and is “integrated
the old fashioned way” e.g. via chains of email
● Use a collection of cloud services copying data
from Dropbox and Box to local storage with a
distributed directory structures to organize (and
provide discovery) to data
● Actions taken on data are not recorded (custom
scripts, some version of a community developed
and supported codebase)
● Publication of final data as prescribed by a Data
Management Plan (hopefully with a DOI) with link in
publications gives no reproducibility
Whole Tale ~ Whole Story (research à publication)
~ Long Tail of Science (Lil’Data/MPC) + Big Data/HPC
Whole Tale (WT) Big Picture
• WT will leverage & contribute
to existing CI and tools to
support the whole science
story (= run-to-pub-cycle), and
providing access to big CI &
HPC for long tail researchers.
➡ Integrated tools to simplify
usage and promote best
practices.
• NSF CC*DNI DIBBS:
– 5 Institutions, 5 Years ($5M total)
– Cooperative Agreement
WT Project Org & Working Groups
Whole	Tale:	Merging	Science	&	CI	Pathways	
…	through	Working	Groups!
Working	Groups	Driving	Use	
Cases	and		Adoption
Working	Groups	to	Provide	Key	
Components
GSLIS Research Showcase
K.	Bocinsky,	T.	Kohler,	A	2000-year	reconstruction	of	the	rain-fed	
maize	agricultural	niche	in	the	US	Southwest.	Nature
Communications.	doi:10.1038/ncomms6618
Map	showing	the	"selected"	trees	for	reconstructing	
precipitation	at	four	sites	in	theCAR regression	
approach	(Correlation-Adjusted	corRelation).
Reconstructions	
for	AD	1247	
Science Pathways: Archaeology
Joining data from different environments to enable new research:
• Streamline gathering, integrating, and analyzing environmental data
needed to build up a fuller picture of the paleoclimate.
• Enables access and interrogation of data from DataONE, iPlant, and
the Long Term Ecological Research Network (LTER), leveraging
Globus On-line, Brown Dog’s data tilling services, RStudio, and
XSEDE resources.
• Enables access to the Digital Archeological Record (tDAR), both
through a native API and via a tDAR member node in DataONE.
The CI Side of this Science
Pathways (Archaeology)
Reproducible Science Example:
Paleoclimate Reconstruction
Science paper (OA) uses:
• open source code:
– R, PaleoCAR, …
• multiple tree-ring databases
• HPC resources
• Example WG Goal:
– Reproducibility study using
• YesWorkflow toolkit:
Workflow & provenance from
code
• Jupyter notebook
Science Pathways: Astronomy
• Researcher A uses university credentials to access large cosmological
simulation outputs from Blue Waters published into WT, does analysis
using Whole Tale services in a Jupyter Notebook, and creates a new
result. With the publication, user creates a DOI linking data and source
code used to generate data tied to original input data and references
this in his reviewed and published research paper.
• Another researcher finds the DOI, and is able to access data and
analysis to then compare model output with new observations from the
Hobby Eberly Telescope Dark Energy experiment on TACC systems.
Results are shared with the original author and a new DOI is created
for these results.
• Enabling direct analysis and
collaborative research on simulation
outputs stored in Whole Tale
enabled repositories via user-
supplied Python scripts.
• YT (yt-project.org), will provide
advanced, customizable analysis
and visualization, leveraging Jupyter
for provide the scripting support.
• Federation will allow jobs to move to
data or visa versa where appropriate
Science Pathways: Astronomy
The Whole Tale’s Approach
WT will integrate well established CI components creating a simple and
unified environment to use, share, and publish data and workflows
1. Unified Authentication via Globus Auth
2. Abstracted Storage Layer with a unified namespace
3. Integrated Python and R APIs integrated with Jupyter Notebook Environments
4. Ingest and publication service linking data, computations, and scholarly articles
5. OwnCloud desktop integration for “Dropbox like interface”
6. Event System to react to changes (e.g. new data published)
7. Data Dashboard to ease data management and service interactions
• Capture full workflow via Notebooks, scripts, and applications to be
published along with Data and Research publications
WT Dashboard
• Web-based interface to enable ‘live
articles’ and research repeatability by
enabling the execution of research
methods on data using NDS labs
Docker containers and notebooks
(Jupyter).
• Research methods: provided support
for running python scripts
• Research Data: interfaced with NDS
Labs Python API - connecting the
desktop to the NDS Labs data
storage mechanisms (iRODS,
Dropbox, Google Drive, SciDrive and
local file integration)
• Provide a Docker “diff” tarball for
downloading research run results
Demonstrated	@	SC	2014
http://ndspilot.com/nds/ndspilot1080p.mp4
Base Share Integrate Reproduce Operation
• Ingest	data	from	
HTTP,	Globus,	and	
DataONE
• Store	data	in	a	
private	cloud	based	
home	directory
• Move	and	manage	
data	in	iRODS
• Interact	with	data	
using	Jupyter
• Manage	data	
across	ownClound
&	iRODS
• Authenticate	
using	ORCID
• Interact	with	data	
through	a	suite	of	
frontends
• Automatically	
extract	key	
metadata
• Search	and	manage	
distributed	data	
from	within	
frontends
• Operate	on	remote	
data	as	if	it	were	
local	(including	
using	OAI-ORE)
• Utilize	a	single	
identity	across	
services	
• Discover	and	
share	frontends	
through	global	
repository
• Integrate	data	and	
workflows	with	
publications
• Issue,	resolve,	and	
track	identifiers	
for	distributed	
data
• Discover	data	using	
federated	and	
distributed	queries
• Track	provenance	
across	services
• Organize	data	
collections	via	user-
defined		
namespaces
High-level Milestones, Phases
Working with the NDS and Others
WT is an NDS Pilot Project
– Will develop and integrate system as
components in NDS Labs
– Work with communities to create
powerful environments tailored for the
communities needs
WT will work with NDS to provide status and
train other users
– NDS Meeting Hackathons
– Cooperation and collaboration with
other projects in NDS
WT will collaborate with other projects (DIBBs,
BDHub, RDA…)
è Get involved! (e.g., Working Groups,
Hackathons, Summer Internship) wholetale.org
Part 2
Victoria Stodden
WT “Philosophy”
• Why a platform?
• Computation is near-ubiquitious in
research, yet we have few best practices
or dissemination standards
• And it’s complex! Small changes in a
computational implementation or in the
data can have a dramatic impact on the
result.
Some Bold Assertions
• Software is used as a tool of discovery in nearly all
research today.
• When software is a key part of the discovery
process, it should be subject to the same
philosophy of transparency as any method.
• Software is an integral and inseparable
component of the computational infrastructure in
which most research takes place.
• Computational research is embedded in a social
structure which includes many stakeholders.
Example: Facebook Study
Dissemination is Incomplete
• Publications are missing details that are
necessary to understand and verify
published findings…
• => credibility crisis in computational
science
• How to reproduce the result? What’s
needed?
Computational Reproducibility
Traditionally two branches to the scientific method:
• Branch 1 (deductive): mathematics, formal logic,
• Branch 2 (empirical): statistical analysis of
controlled experiments.
Now, new branches due to technological changes?
• Branch 3,4? (computational): large scale
simulations / data driven computational science.
The Ubiquity of Error
The central motivation for the scientific method is to
root out error:
• Deductive branch: the well-defined concept of the
proof,
• Empirical branch: the machinery of hypothesis
testing, appropriate statistical methods, structured
communication of methods and protocols.
Claim: Computation presents only a potential
third/fourth branch of the scientific method, until
the development of comparable standards.
Whole Tale?
Proposed Solution:
• Capture computational steps / provide
compute environment
• Provide unique identifiers to
data/code/workflows associated with results
• Provide links to embed in the publication for
discoverability
• Preserve digital scholarly objects
So it looks pretty simple..
• What about big data?
• Complex codes?
• Reuse and bug fixes?
• Meta-analysis?
• Working with external groups, such as publishers?
• Incentives? What if they don’t come?
• Allocating resources? Sustainability models?
• What does citation mean and how are
contributions to be rewarded?
Incompleteness
• “I ran all the stuff and it’s still the wrong
answer!”
• “I got a different result, using your code
and data!”
• “Your code doesn’t work!”
• “Where’s all the documentation? I can’t
figure this thing out.”
Part 3 (Demo)
Matt Turk

More Related Content

What's hot

The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference LinkingHerbert Van de Sompel
 
Sciunits: Reusable Research Objects
Sciunits: Reusable Research Objects Sciunits: Reusable Research Objects
Sciunits: Reusable Research Objects Globus
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsAnubhav Jain
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the ContinuumIan Foster
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingGigaScience, BGI Hong Kong
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015Tanu Malik
 
Using Neo4j for exploring the research graph connections made by RD-Switchboard
Using Neo4j for exploring the research graph connections made by RD-SwitchboardUsing Neo4j for exploring the research graph connections made by RD-Switchboard
Using Neo4j for exploring the research graph connections made by RD-Switchboardamiraryani
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsAnubhav Jain
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationIan Foster
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven DiscoveryGlobus
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Understanding WeboNaver
Understanding WeboNaverUnderstanding WeboNaver
Understanding WeboNaverHan Woo PARK
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)Robert Grossman
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 

What's hot (20)

The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference Linking
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 
Sciunits: Reusable Research Objects
Sciunits: Reusable Research Objects Sciunits: Reusable Research Objects
Sciunits: Reusable Research Objects
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
 
UPennONS
UPennONSUPennONS
UPennONS
 
Braintalk cuso nm
Braintalk cuso nmBraintalk cuso nm
Braintalk cuso nm
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
 
Using Neo4j for exploring the research graph connections made by RD-Switchboard
Using Neo4j for exploring the research graph connections made by RD-SwitchboardUsing Neo4j for exploring the research graph connections made by RD-Switchboard
Using Neo4j for exploring the research graph connections made by RD-Switchboard
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data sets
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Understanding WeboNaver
Understanding WeboNaverUnderstanding WeboNaver
Understanding WeboNaver
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
MUDROD - Ranking
MUDROD - RankingMUDROD - Ranking
MUDROD - Ranking
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 

Similar to Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure Pathways

Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management Oscar Corcho
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureXiaogang (Marshall) Ma
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research PaperAnita de Waard
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of PublishingAnita de Waard
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13DataDryad
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...OpenAIRE
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Paolo Manghi
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...datacite
 

Similar to Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure Pathways (20)

A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Data management
Data management Data management
Data management
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
 

More from Bertram Ludäscher

Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionBertram Ludäscher
 
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Bertram Ludäscher
 
[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database RulesBertram Ludäscher
 
[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database RulesBertram Ludäscher
 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsBertram Ludäscher
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
 
Which Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueWhich Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueBertram Ludäscher
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesBertram Ludäscher
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
 
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseDeduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseBertram Ludäscher
 
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...Bertram Ludäscher
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...Bertram Ludäscher
 
Incremental Recomputation: Those who cannot remember the past are condemned ...
Incremental Recomputation:  Those who cannot remember the past are condemned ...Incremental Recomputation:  Those who cannot remember the past are condemned ...
Incremental Recomputation: Those who cannot remember the past are condemned ...Bertram Ludäscher
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
 
An ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsAn ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsBertram Ludäscher
 
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachKnowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachBertram Ludäscher
 
Whole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchWhole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchBertram Ludäscher
 
ETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatBertram Ludäscher
 
From Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceFrom Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceBertram Ludäscher
 

More from Bertram Ludäscher (20)

Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
 
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
 
[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules
 
[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules
 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query Patterns
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
 
Which Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueWhich Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A Dialogue
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science Tales
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
 
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseDeduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
 
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...
 
Incremental Recomputation: Those who cannot remember the past are condemned ...
Incremental Recomputation:  Those who cannot remember the past are condemned ...Incremental Recomputation:  Those who cannot remember the past are condemned ...
Incremental Recomputation: Those who cannot remember the past are condemned ...
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
 
An ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsAn ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflows
 
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachKnowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
 
Whole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchWhole-Tale: The Experience of Research
Whole-Tale: The Experience of Research
 
ETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatETC & Authors in the Driver's Seat
ETC & Authors in the Driver's Seat
 
From Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceFrom Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable Provenance
 

Recently uploaded

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Recently uploaded (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure Pathways

  • 1. Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure Pathways Bertram Ludäscher Victoria Stodden Matt Turk Kyle Chard (U Chicago), Niall Gaffney (TACC), Matt Jones (UCSB), Jarek Nabrzyski (Notre Dame), Kandace Turner (NCSA) CIRSS Seminar September 2, 2016
  • 2. Problems Facing Data Researchers Workflow for data research is fragmented ● Data comes from many sources and is “integrated the old fashioned way” e.g. via chains of email ● Use a collection of cloud services copying data from Dropbox and Box to local storage with a distributed directory structures to organize (and provide discovery) to data ● Actions taken on data are not recorded (custom scripts, some version of a community developed and supported codebase) ● Publication of final data as prescribed by a Data Management Plan (hopefully with a DOI) with link in publications gives no reproducibility
  • 3. Whole Tale ~ Whole Story (research à publication) ~ Long Tail of Science (Lil’Data/MPC) + Big Data/HPC
  • 4. Whole Tale (WT) Big Picture • WT will leverage & contribute to existing CI and tools to support the whole science story (= run-to-pub-cycle), and providing access to big CI & HPC for long tail researchers. ➡ Integrated tools to simplify usage and promote best practices. • NSF CC*DNI DIBBS: – 5 Institutions, 5 Years ($5M total) – Cooperative Agreement
  • 5. WT Project Org & Working Groups Whole Tale: Merging Science & CI Pathways … through Working Groups! Working Groups Driving Use Cases and Adoption Working Groups to Provide Key Components
  • 7. Joining data from different environments to enable new research: • Streamline gathering, integrating, and analyzing environmental data needed to build up a fuller picture of the paleoclimate. • Enables access and interrogation of data from DataONE, iPlant, and the Long Term Ecological Research Network (LTER), leveraging Globus On-line, Brown Dog’s data tilling services, RStudio, and XSEDE resources. • Enables access to the Digital Archeological Record (tDAR), both through a native API and via a tDAR member node in DataONE. The CI Side of this Science Pathways (Archaeology)
  • 8. Reproducible Science Example: Paleoclimate Reconstruction Science paper (OA) uses: • open source code: – R, PaleoCAR, … • multiple tree-ring databases • HPC resources • Example WG Goal: – Reproducibility study using • YesWorkflow toolkit: Workflow & provenance from code • Jupyter notebook
  • 9. Science Pathways: Astronomy • Researcher A uses university credentials to access large cosmological simulation outputs from Blue Waters published into WT, does analysis using Whole Tale services in a Jupyter Notebook, and creates a new result. With the publication, user creates a DOI linking data and source code used to generate data tied to original input data and references this in his reviewed and published research paper. • Another researcher finds the DOI, and is able to access data and analysis to then compare model output with new observations from the Hobby Eberly Telescope Dark Energy experiment on TACC systems. Results are shared with the original author and a new DOI is created for these results.
  • 10. • Enabling direct analysis and collaborative research on simulation outputs stored in Whole Tale enabled repositories via user- supplied Python scripts. • YT (yt-project.org), will provide advanced, customizable analysis and visualization, leveraging Jupyter for provide the scripting support. • Federation will allow jobs to move to data or visa versa where appropriate Science Pathways: Astronomy
  • 11. The Whole Tale’s Approach WT will integrate well established CI components creating a simple and unified environment to use, share, and publish data and workflows 1. Unified Authentication via Globus Auth 2. Abstracted Storage Layer with a unified namespace 3. Integrated Python and R APIs integrated with Jupyter Notebook Environments 4. Ingest and publication service linking data, computations, and scholarly articles 5. OwnCloud desktop integration for “Dropbox like interface” 6. Event System to react to changes (e.g. new data published) 7. Data Dashboard to ease data management and service interactions • Capture full workflow via Notebooks, scripts, and applications to be published along with Data and Research publications
  • 12. WT Dashboard • Web-based interface to enable ‘live articles’ and research repeatability by enabling the execution of research methods on data using NDS labs Docker containers and notebooks (Jupyter). • Research methods: provided support for running python scripts • Research Data: interfaced with NDS Labs Python API - connecting the desktop to the NDS Labs data storage mechanisms (iRODS, Dropbox, Google Drive, SciDrive and local file integration) • Provide a Docker “diff” tarball for downloading research run results Demonstrated @ SC 2014 http://ndspilot.com/nds/ndspilot1080p.mp4
  • 13. Base Share Integrate Reproduce Operation • Ingest data from HTTP, Globus, and DataONE • Store data in a private cloud based home directory • Move and manage data in iRODS • Interact with data using Jupyter • Manage data across ownClound & iRODS • Authenticate using ORCID • Interact with data through a suite of frontends • Automatically extract key metadata • Search and manage distributed data from within frontends • Operate on remote data as if it were local (including using OAI-ORE) • Utilize a single identity across services • Discover and share frontends through global repository • Integrate data and workflows with publications • Issue, resolve, and track identifiers for distributed data • Discover data using federated and distributed queries • Track provenance across services • Organize data collections via user- defined namespaces High-level Milestones, Phases
  • 14. Working with the NDS and Others WT is an NDS Pilot Project – Will develop and integrate system as components in NDS Labs – Work with communities to create powerful environments tailored for the communities needs WT will work with NDS to provide status and train other users – NDS Meeting Hackathons – Cooperation and collaboration with other projects in NDS WT will collaborate with other projects (DIBBs, BDHub, RDA…) è Get involved! (e.g., Working Groups, Hackathons, Summer Internship) wholetale.org
  • 16. WT “Philosophy” • Why a platform? • Computation is near-ubiquitious in research, yet we have few best practices or dissemination standards • And it’s complex! Small changes in a computational implementation or in the data can have a dramatic impact on the result.
  • 17. Some Bold Assertions • Software is used as a tool of discovery in nearly all research today. • When software is a key part of the discovery process, it should be subject to the same philosophy of transparency as any method. • Software is an integral and inseparable component of the computational infrastructure in which most research takes place. • Computational research is embedded in a social structure which includes many stakeholders.
  • 19. Dissemination is Incomplete • Publications are missing details that are necessary to understand and verify published findings… • => credibility crisis in computational science • How to reproduce the result? What’s needed?
  • 20. Computational Reproducibility Traditionally two branches to the scientific method: • Branch 1 (deductive): mathematics, formal logic, • Branch 2 (empirical): statistical analysis of controlled experiments. Now, new branches due to technological changes? • Branch 3,4? (computational): large scale simulations / data driven computational science.
  • 21. The Ubiquity of Error The central motivation for the scientific method is to root out error: • Deductive branch: the well-defined concept of the proof, • Empirical branch: the machinery of hypothesis testing, appropriate statistical methods, structured communication of methods and protocols. Claim: Computation presents only a potential third/fourth branch of the scientific method, until the development of comparable standards.
  • 22. Whole Tale? Proposed Solution: • Capture computational steps / provide compute environment • Provide unique identifiers to data/code/workflows associated with results • Provide links to embed in the publication for discoverability • Preserve digital scholarly objects
  • 23. So it looks pretty simple.. • What about big data? • Complex codes? • Reuse and bug fixes? • Meta-analysis? • Working with external groups, such as publishers? • Incentives? What if they don’t come? • Allocating resources? Sustainability models? • What does citation mean and how are contributions to be rewarded?
  • 24. Incompleteness • “I ran all the stuff and it’s still the wrong answer!” • “I got a different result, using your code and data!” • “Your code doesn’t work!” • “Where’s all the documentation? I can’t figure this thing out.”