SlideShare a Scribd company logo
1 of 7
Download to read offline
National Center for Supercomputing Applications
University of Illinois at Urbana–Champaign
Expressing and sharing workflows
Daniel S. Katz
Assistant Director for Scientific Software & Applications, NCSA
Research Associate Professor, CS
Research Associate Professor, ECE
Research Associate Professor, iSchool
dskatz@illinois.edu, d.katz@ieee.org
@danielskatz
What’s a workflow?
• A set of tasks and dependencies between them
• Perhaps expressed as data structure, e.g. graph (DAG or cyclic)
• How is this different than a computer program?
• The tasks as more well-defined (inputs, outputs)
• The tasks are longer (running time O(sec) – O(hr))
• Why express it differently?
• Program (script) is a natural way of expressing a workflow
• Examples: shell scripts, programs in Swift/Parsl
• YesWorkflow annotations to help in understanding scripts
• Swift/Parsl: functions used to identify components
• Expressing it as data corresponds to the compiled (assembly)
version of the workflow
• Useful for a lot of things, but not understanding
http://swift-lang.org, http://parsl-project.org
YesWorkflow (YW)
• Name: “Yes, scripts are (can be) workflows, too!”
• But, workflow (dataflow) usually hidden in the script
• Idea: let the script author reveal the structure by
declaring tasks (steps) and dataflow between tasks.
• This is a modeling step
• very coarse (workflow: one big black box w/ inputs & outputs)
• or rather fine (workflow has many steps, linked by dataflow)
• => language to explain (graphically) what the concepts
(relevant steps, relevant data) you want to share
• => this conceptual YW model can itself be queried; linked
with runtime observables, provenance
Credit: Bertram Ludäscher
Parsl
• A python-based parallel scripting library (http://parsl-project.org),
based on ideas in Swift (http://swift-lang.org)
• Tasks exposed as functions (python or bash)
@App('bash', data_flow_kernel)
def echo(message, outputs=[]):
return 'echo {0} &> {outputs[0]}’
@App('python', data_flow_kernel)
def cat(inputs=[]):
with open(inputs[0]) as f:
return f.readlines()
• Return values are futures
• Other tasks can be called that depend on these futures
• Will not run until futures are satisfied/filled
• Main code used to glue functions together
hello = echo("Hello World!", outputs=['hello1.txt'])
message = cat(inputs=[hello.outputs[0]])
• Fairly easy to understand
How to promote/share workflows
• How do we share general software?
• Libraries (units of execution with well-defined APIs)
• Source code (fork model)
• Source code repositories (GitHub), packaging
systems/repositories (PyPI, CRAN)
• How do we share data?
• Repositories (Dryad)
• For workflows
• Libraries -> sub-workflows, defined to provide well-specified
functionality
• Source code -> source code (scripts), may still be hard to
understand
• Data -> data repository for workflows (MyExperiment)
www.myexperiment.org
De	Roure,	D.,	Goble,	C.	Stevens,	R.	(2009)	The	Design	
and	Realisation of	the	myExperiment Virtual	Research	
Environment	for	Social	Sharing	of	Workflows.	Future	
Generation	Computer	Systems	25,	pp.	561-7.
• A	workflow	commons	for	workflow	sharing,	
designed	using	Web	2.0	principles
• Launched	open	beta	in	November	2007,	still	
actively	used
• Largest	public	collection	of	workflows,	for	
multiple	workflow	systems
• 2400+	entries	in	Google	Scholar	refer	to	
myExperiment
• Open	source,	REST	API,	part	of	Open	Linked	
Data	cloud	(66k	triples)	- lod-cloud.net
• Introduced	“packs”	which	led	to	Research	
Objects	– www.researchobject.org
• Workflow	collection	studied	in	scientific	
workflow	and	e-Science	communities
• Service	maintained	by	Manchester	and	Oxford	
universities.	Informs	design	of	other	workflow	
sharing	systems.	
• Content	stats:	10591	members,	393	groups,	
3876	workflows,	1233	files,	477	packs
Credit: Carole Goble
GitHub
• Widely used for sharing software, and socially working
on/with software (and many other types of documents)
• GitHub is used for sharing workflows today
• Both scripts and data
• Borrowing from “Software vs. data in the context of
citation”
• A workflow as a program or a script is code, a creative work
• Appropriate license: OSI-approved open source (e.g., BSD)
• A workflow as a DAG is data?
• Appropriate license: Creative Commons (e.g., CC-BY)?
• So, let’s keep workflows as programs/scripts
• Use YesWorkflow with scripts
• Use GitHub to share
Katz DS, Niemeyer KE, et al. (2016) Software vs. data in the context of citation.
PeerJ Preprints 4:e2630v1 doi: 10.7287/peerj.preprints.2630v1

More Related Content

What's hot

Rust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSRust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSAndy Grove
 
WebTech Tutorial Querying DBPedia
WebTech Tutorial Querying DBPediaWebTech Tutorial Querying DBPedia
WebTech Tutorial Querying DBPediaKatrien Verbert
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsDatabricks
 
An Introduction to NLP4L
An Introduction to NLP4LAn Introduction to NLP4L
An Introduction to NLP4LKoji Sekiguchi
 
Alexey Golub - Writing parsers in c# | 3Shape Meetup
Alexey Golub - Writing parsers in c# | 3Shape MeetupAlexey Golub - Writing parsers in c# | 3Shape Meetup
Alexey Golub - Writing parsers in c# | 3Shape MeetupOleksii Holub
 
Python first day
Python first dayPython first day
Python first dayfarkhand
 
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)Koji Sekiguchi
 
Understanding Hadoop through examples
Understanding Hadoop through examplesUnderstanding Hadoop through examples
Understanding Hadoop through examplesYoshitomo Matsubara
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalystdatamantra
 

What's hot (14)

Rust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSRust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMS
 
Azure search
Azure searchAzure search
Azure search
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
R training
R trainingR training
R training
 
WebTech Tutorial Querying DBPedia
WebTech Tutorial Querying DBPediaWebTech Tutorial Querying DBPedia
WebTech Tutorial Querying DBPedia
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data Applications
 
An Introduction to NLP4L
An Introduction to NLP4LAn Introduction to NLP4L
An Introduction to NLP4L
 
Alexey Golub - Writing parsers in c# | 3Shape Meetup
Alexey Golub - Writing parsers in c# | 3Shape MeetupAlexey Golub - Writing parsers in c# | 3Shape Meetup
Alexey Golub - Writing parsers in c# | 3Shape Meetup
 
Python first day
Python first dayPython first day
Python first day
 
Python first day
Python first dayPython first day
Python first day
 
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
 
Perl tutorial final
Perl tutorial finalPerl tutorial final
Perl tutorial final
 
Understanding Hadoop through examples
Understanding Hadoop through examplesUnderstanding Hadoop through examples
Understanding Hadoop through examples
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 

Similar to Expressing and sharing workflows

Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Parsl: Pervasive Parallel Programming in Python
Parsl: Pervasive Parallel Programming in PythonParsl: Pervasive Parallel Programming in Python
Parsl: Pervasive Parallel Programming in PythonDaniel S. Katz
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdf
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdfconceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdf
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdfSahajShrimal1
 
Script of Scripts Polyglot Notebook and Workflow System
Script of ScriptsPolyglot Notebook and Workflow SystemScript of ScriptsPolyglot Notebook and Workflow System
Script of Scripts Polyglot Notebook and Workflow SystemBo Peng
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAdam Getchell
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with PythonDonald Miner
 
AestasIT - Internal DSLs in Scala
AestasIT - Internal DSLs in ScalaAestasIT - Internal DSLs in Scala
AestasIT - Internal DSLs in ScalaDmitry Buzdin
 
Building Applications using Apache Hadoop
Building Applications using Apache HadoopBuilding Applications using Apache Hadoop
Building Applications using Apache HadoopC4Media
 
Alternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIAlternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIJosh Elser
 
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]Accumulo Summit
 
Another Intro To Hadoop
Another Intro To HadoopAnother Intro To Hadoop
Another Intro To HadoopAdeel Ahmad
 
Programming Under Linux In Python
Programming Under Linux In PythonProgramming Under Linux In Python
Programming Under Linux In PythonMarwan Osman
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentationJoseph Adler
 

Similar to Expressing and sharing workflows (20)

Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
Parsl: Pervasive Parallel Programming in Python
Parsl: Pervasive Parallel Programming in PythonParsl: Pervasive Parallel Programming in Python
Parsl: Pervasive Parallel Programming in Python
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdf
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdfconceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdf
conceptsinobjectorientedprogramminglanguages-12659959597745-phpapp02.pdf
 
Script of Scripts Polyglot Notebook and Workflow System
Script of ScriptsPolyglot Notebook and Workflow SystemScript of ScriptsPolyglot Notebook and Workflow System
Script of Scripts Polyglot Notebook and Workflow System
 
Php
PhpPhp
Php
 
Php
PhpPhp
Php
 
Php
PhpPhp
Php
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional Programming
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with Python
 
AestasIT - Internal DSLs in Scala
AestasIT - Internal DSLs in ScalaAestasIT - Internal DSLs in Scala
AestasIT - Internal DSLs in Scala
 
Building Applications using Apache Hadoop
Building Applications using Apache HadoopBuilding Applications using Apache Hadoop
Building Applications using Apache Hadoop
 
Alternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIAlternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java API
 
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
 
Another Intro To Hadoop
Another Intro To HadoopAnother Intro To Hadoop
Another Intro To Hadoop
 
Programming Under Linux In Python
Programming Under Linux In PythonProgramming Under Linux In Python
Programming Under Linux In Python
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentation
 
Shivam PPT.pptx
Shivam PPT.pptxShivam PPT.pptx
Shivam PPT.pptx
 

More from Daniel S. Katz

Research software susainability
Research software susainabilityResearch software susainability
Research software susainabilityDaniel S. Katz
 
Software Professionals (RSEs) at NCSA
Software Professionals (RSEs) at NCSASoftware Professionals (RSEs) at NCSA
Software Professionals (RSEs) at NCSADaniel S. Katz
 
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...Daniel S. Katz
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 
Citation and Research Objects: Toward Active Research Objects
Citation and Research Objects: Toward Active Research ObjectsCitation and Research Objects: Toward Active Research Objects
Citation and Research Objects: Toward Active Research ObjectsDaniel S. Katz
 
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...Daniel S. Katz
 
Fundamentals of software sustainability
Fundamentals of software sustainabilityFundamentals of software sustainability
Fundamentals of software sustainabilityDaniel S. Katz
 
Software Citation in Theory and Practice
Software Citation in Theory and PracticeSoftware Citation in Theory and Practice
Software Citation in Theory and PracticeDaniel S. Katz
 
Research Software Sustainability: WSSSPE & URSSI
Research Software Sustainability: WSSSPE & URSSIResearch Software Sustainability: WSSSPE & URSSI
Research Software Sustainability: WSSSPE & URSSIDaniel S. Katz
 
Citation and reproducibility in software
Citation and reproducibility in softwareCitation and reproducibility in software
Citation and reproducibility in softwareDaniel S. Katz
 
Software Citation: Principles, Implementation, and Impact
Software Citation:  Principles, Implementation, and ImpactSoftware Citation:  Principles, Implementation, and Impact
Software Citation: Principles, Implementation, and ImpactDaniel S. Katz
 
Summary of WSSSPE and its working groups
Summary of WSSSPE and its working groupsSummary of WSSSPE and its working groups
Summary of WSSSPE and its working groupsDaniel S. Katz
 
Working towards Sustainable Software for Science: Practice and Experience (WS...
Working towards Sustainable Software for Science: Practice and Experience (WS...Working towards Sustainable Software for Science: Practice and Experience (WS...
Working towards Sustainable Software for Science: Practice and Experience (WS...Daniel S. Katz
 
20160607 citation4software panel
20160607 citation4software panel20160607 citation4software panel
20160607 citation4software panelDaniel S. Katz
 
20160607 citation4software opening
20160607 citation4software opening20160607 citation4software opening
20160607 citation4software openingDaniel S. Katz
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesDaniel S. Katz
 
What do we need beyond a DOI?
What do we need beyond a DOI?What do we need beyond a DOI?
What do we need beyond a DOI?Daniel S. Katz
 
Looking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSFLooking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSFDaniel S. Katz
 

More from Daniel S. Katz (20)

Research software susainability
Research software susainabilityResearch software susainability
Research software susainability
 
Software Professionals (RSEs) at NCSA
Software Professionals (RSEs) at NCSASoftware Professionals (RSEs) at NCSA
Software Professionals (RSEs) at NCSA
 
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 
Citation and Research Objects: Toward Active Research Objects
Citation and Research Objects: Toward Active Research ObjectsCitation and Research Objects: Toward Active Research Objects
Citation and Research Objects: Toward Active Research Objects
 
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...
 
Fundamentals of software sustainability
Fundamentals of software sustainabilityFundamentals of software sustainability
Fundamentals of software sustainability
 
Software Citation in Theory and Practice
Software Citation in Theory and PracticeSoftware Citation in Theory and Practice
Software Citation in Theory and Practice
 
URSSI
URSSIURSSI
URSSI
 
Research Software Sustainability: WSSSPE & URSSI
Research Software Sustainability: WSSSPE & URSSIResearch Software Sustainability: WSSSPE & URSSI
Research Software Sustainability: WSSSPE & URSSI
 
Software citation
Software citationSoftware citation
Software citation
 
Citation and reproducibility in software
Citation and reproducibility in softwareCitation and reproducibility in software
Citation and reproducibility in software
 
Software Citation: Principles, Implementation, and Impact
Software Citation:  Principles, Implementation, and ImpactSoftware Citation:  Principles, Implementation, and Impact
Software Citation: Principles, Implementation, and Impact
 
Summary of WSSSPE and its working groups
Summary of WSSSPE and its working groupsSummary of WSSSPE and its working groups
Summary of WSSSPE and its working groups
 
Working towards Sustainable Software for Science: Practice and Experience (WS...
Working towards Sustainable Software for Science: Practice and Experience (WS...Working towards Sustainable Software for Science: Practice and Experience (WS...
Working towards Sustainable Software for Science: Practice and Experience (WS...
 
20160607 citation4software panel
20160607 citation4software panel20160607 citation4software panel
20160607 citation4software panel
 
20160607 citation4software opening
20160607 citation4software opening20160607 citation4software opening
20160607 citation4software opening
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community Responses
 
What do we need beyond a DOI?
What do we need beyond a DOI?What do we need beyond a DOI?
What do we need beyond a DOI?
 
Looking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSFLooking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSF
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Expressing and sharing workflows

  • 1. National Center for Supercomputing Applications University of Illinois at Urbana–Champaign Expressing and sharing workflows Daniel S. Katz Assistant Director for Scientific Software & Applications, NCSA Research Associate Professor, CS Research Associate Professor, ECE Research Associate Professor, iSchool dskatz@illinois.edu, d.katz@ieee.org @danielskatz
  • 2. What’s a workflow? • A set of tasks and dependencies between them • Perhaps expressed as data structure, e.g. graph (DAG or cyclic) • How is this different than a computer program? • The tasks as more well-defined (inputs, outputs) • The tasks are longer (running time O(sec) – O(hr)) • Why express it differently? • Program (script) is a natural way of expressing a workflow • Examples: shell scripts, programs in Swift/Parsl • YesWorkflow annotations to help in understanding scripts • Swift/Parsl: functions used to identify components • Expressing it as data corresponds to the compiled (assembly) version of the workflow • Useful for a lot of things, but not understanding http://swift-lang.org, http://parsl-project.org
  • 3. YesWorkflow (YW) • Name: “Yes, scripts are (can be) workflows, too!” • But, workflow (dataflow) usually hidden in the script • Idea: let the script author reveal the structure by declaring tasks (steps) and dataflow between tasks. • This is a modeling step • very coarse (workflow: one big black box w/ inputs & outputs) • or rather fine (workflow has many steps, linked by dataflow) • => language to explain (graphically) what the concepts (relevant steps, relevant data) you want to share • => this conceptual YW model can itself be queried; linked with runtime observables, provenance Credit: Bertram Ludäscher
  • 4. Parsl • A python-based parallel scripting library (http://parsl-project.org), based on ideas in Swift (http://swift-lang.org) • Tasks exposed as functions (python or bash) @App('bash', data_flow_kernel) def echo(message, outputs=[]): return 'echo {0} &> {outputs[0]}’ @App('python', data_flow_kernel) def cat(inputs=[]): with open(inputs[0]) as f: return f.readlines() • Return values are futures • Other tasks can be called that depend on these futures • Will not run until futures are satisfied/filled • Main code used to glue functions together hello = echo("Hello World!", outputs=['hello1.txt']) message = cat(inputs=[hello.outputs[0]]) • Fairly easy to understand
  • 5. How to promote/share workflows • How do we share general software? • Libraries (units of execution with well-defined APIs) • Source code (fork model) • Source code repositories (GitHub), packaging systems/repositories (PyPI, CRAN) • How do we share data? • Repositories (Dryad) • For workflows • Libraries -> sub-workflows, defined to provide well-specified functionality • Source code -> source code (scripts), may still be hard to understand • Data -> data repository for workflows (MyExperiment)
  • 6. www.myexperiment.org De Roure, D., Goble, C. Stevens, R. (2009) The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp. 561-7. • A workflow commons for workflow sharing, designed using Web 2.0 principles • Launched open beta in November 2007, still actively used • Largest public collection of workflows, for multiple workflow systems • 2400+ entries in Google Scholar refer to myExperiment • Open source, REST API, part of Open Linked Data cloud (66k triples) - lod-cloud.net • Introduced “packs” which led to Research Objects – www.researchobject.org • Workflow collection studied in scientific workflow and e-Science communities • Service maintained by Manchester and Oxford universities. Informs design of other workflow sharing systems. • Content stats: 10591 members, 393 groups, 3876 workflows, 1233 files, 477 packs Credit: Carole Goble
  • 7. GitHub • Widely used for sharing software, and socially working on/with software (and many other types of documents) • GitHub is used for sharing workflows today • Both scripts and data • Borrowing from “Software vs. data in the context of citation” • A workflow as a program or a script is code, a creative work • Appropriate license: OSI-approved open source (e.g., BSD) • A workflow as a DAG is data? • Appropriate license: Creative Commons (e.g., CC-BY)? • So, let’s keep workflows as programs/scripts • Use YesWorkflow with scripts • Use GitHub to share Katz DS, Niemeyer KE, et al. (2016) Software vs. data in the context of citation. PeerJ Preprints 4:e2630v1 doi: 10.7287/peerj.preprints.2630v1