Taverna, Biocatalogue, myExperiment, and the provenance of it all: forward-looking while looking back. Scientific Workflow Management System.
Outline (ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier)
What is the myGrid Project?
E. Science laboris
Workflows: E. Science laboris
Separate the “workflow” from the application. Example workflow: retrieves a protein sequence in FASTA format from GenBank and then performs a BLAST search on that sequence.
Workflow as data integrator: QTL genomic regions; genes in QTL; metabolic pathways (KEGG)
Data-driven computation in Taverna
List-structured KEGG gene ids (geneIDs): [ [ mmu:26416 ], [ mmu:328788 ] ]
pathways: [ [ path:mmu04210 Apoptosis, path:mmu04010 MAPK signaling, ...], [ path:mmu04010 MAPK signaling, path:mmu04620 Toll-like receptor, ...] ]
pathways: [ path:mmu04010 MAPK signaling, path:mmu04370 VEGF signaling ]
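A minimal Python sketch of the list handling shown on this slide, not Taverna code: when a service that expects a flat list receives a list of lists, the call is repeated once per sub-list and the results keep the nesting. The lookup table is toy data holding only the pathway ids visible on the slide (the slide's lists are truncated with "..."), the pairing of each gene with a pathway list follows slide order and is an assumption, and the helper names `depth`, `invoke` and `common` are hypothetical.

```python
from typing import Callable, List

# Toy stand-in for the KEGG lookup. It holds only the pathway ids visible on
# the slide; the pairing of gene to list follows slide order (an assumption).
TOY_PATHWAYS = {
    "mmu:26416": ["path:mmu04210", "path:mmu04010"],
    "mmu:328788": ["path:mmu04010", "path:mmu04620"],
}


def get_pathway_by_genes(gene_ids: List[str]) -> List[str]:
    """Stand-in service: takes a flat list of gene ids, returns pathway ids."""
    found: List[str] = []
    for gene in gene_ids:
        for pathway in TOY_PATHWAYS.get(gene, []):
            if pathway not in found:
                found.append(pathway)
    return found


def depth(value) -> int:
    """Nesting depth: 0 for a single item, 1 for a flat list, 2 for a list of lists."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)


def invoke(service: Callable, value, expected_depth: int):
    """Call the service directly if depths match, else iterate over the extra level."""
    if depth(value) > expected_depth:
        return [invoke(service, item, expected_depth) for item in value]
    return service(value)


def common(lists: List[List[str]]) -> List[str]:
    """Items present in every sub-list, in the order of the first sub-list."""
    if not lists:
        return []
    return [item for item in lists[0] if all(item in other for other in lists[1:])]


gene_ids = [["mmu:26416"], ["mmu:328788"]]
paths_per_gene = invoke(get_pathway_by_genes, gene_ids, expected_depth=1)
common_pathways = common(paths_per_gene)
print(paths_per_gene)
print(common_pathways)
```

With this toy table only the MAPK pathway id is common to both genes; the slide's second common entry (VEGF) sits in the elided part of the lists and is deliberately not filled in here.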
Integration platform: Just in Time and Just in Case Interoperability
Integration platform: Just in Time Interoperability
What do Scientists use Taverna for?
Application areas: Systems biology model building; Proteomics; Sequence analysis; Protein structure prediction; Gene/protein annotation; Microarray data analysis; QTL studies; QSAR studies; Medical image analysis; Public Health care epidemiology; Heart model simulations; High throughput screening; Phenotypical studies; Phylogeny; Statistical analysis; Text mining; Astronomy, Music, Meteorology
Users and projects: Netherlands Bioinformatics Centre; Genome Canada Bioinformatics Platform; BioMOBY; US FLOSS social science program; RENCI; SysMO Consortium; French SIGENAE farm animals project; ThaiGrid; CARMEN Neuroscience project; SPINE consortium; EU ENFIN, EMBRACE, BioSapiens, CASIMIR; EU SysMO Consortium; NERC Centre for Ecology and Hydrology; Bergen Centre for Computational Biology; Max-Planck Institute for Plant Breeding Research; Genoa Cancer Research Centre; AstroGrid; 30 USA academic and research institutions
200 Genotype Phenotype Metabolic pathways Literature [Paul Fisher]
Hypothesis Construction and Explanation from the Literature: my BioAID
Genome-wide SNP data sets analysis
WaaS: Workflows as a Service [Pettifer, Kell, University of Manchester]
Workflows operating over Grid Infrastructure
KnowARC (http://www.knowarc.eu): "KnowARC integrated with Taverna" application prototype to use Taverna as a direct interface to Grid resources running ARC.
caGrid (http://cagrid.org/): open source grid software infrastructure aimed at enabling multi-institutional data sharing and analysis. Underpins caBIG. Taverna links together caGrid resources.
EGEE (http://www.eu-egee.org/): Europe’s leading grid computing project. Piloted Taverna over EGEE gLite services.
[Figure: Users, Workflows, Application Services; Composition, Incorporation, Invocation, Provisioning] [Foster 2005]
caBIG cancer cyberinfrastructure uses Taverna to link services. A sample caGrid workflow for microarray analysis, using caArray, GenePattern and geWorkbench [Ravi Madduri]. Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster, Orchestrating caGrid Services in Taverna, Proc. IEEE Intl Conf on Web Services (ICWS 2008)
Who else is in this space? Kepler, Triana, BPEL, Ptolemy II, Taverna, Trident, BioExtract
An Ecosystem of Workflow Management Systems (WFMS)
Workflow-based experimentation lifecycle: Develop, Run, Analyse Results, Publish; collect and query provenance metadata
Taverna: Graphical Workbench For Professionals; Plug-in architecture; Nested Workflows; Drag and Drop; Wiring together; Rapidly incorporate new services without coding; Not restricted to predetermined services; Access to local and remote resources and analysis tools; 3500+ service operations available at start-up
Services Mutability: implications for sustainability, accountability and reproducibility
BioCatalogue. Professor Carole Goble, University of Manchester, UK, Director myGrid Consortium; 28 April 2009, Boston MA
The short story
 
Content
A virtuous circle: curation, usage, trust, annotators, content
Curation Model Versioning Quantitative Content Tags Service Model Semantic Content Ontologies Functional Capabilities Provenance Operational Capabilities Operational Metrics Usage Policy Community Standing Ratings Usage Statistics Attribution Free text Searching Statistics Usable and Useful Understandable Controlled vocabs Interfaces
Curation. Today: 14902 annotations (provider, user, registries); KEGG: 1433 annotations
Service monitoring
Who?
Scaling up along the social dimension: What? Where? Why? Who? How? Crossing the boundaries of individual investigation (Develop, Run, Analyze, Publish)
Scientific collaboration. Source: Andrea Wiggins, talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
The Selfish (or Self-interested) Scientist. “Data mining: my data’s mine and your data’s mine” (Mike Ashburner and others; Professor of Genetics, University of Cambridge, UK)
The potential for collaboration
Asymmetric vs symmetric sharing. Traditional sharing is asymmetric; open science is symmetric; myGrid combines both paradigms.
Publishing for collaboration (Develop, Run, Analyse Results, Publish; collect and query provenance metadata). Design-time reuse: composition from existing workflows. Runtime reuse: workflows as services. Compare results across versions; foster virtual scientific communities; provenance exchange and interoperability; the OPM experiment.
Collaboration in the workflow space: What? Where? Why? Who? How? (Develop, Run, Analyze, Publish)
Rewards: Competitive advantage. Academic vanity. Adoption. Reputation. Fears: Scrutiny. Being scooped. Misinterpretation. Reputation.
Incentive and reputation
Distinct myExperiment communities
Reuse, Recycling, Repurposing, Cross-fertilization
Socially share, discover and reuse workflows and other methods. Cooperative bazaar. www.myexperiment.org
 
Just Enough Sharing Credit and Attribution
Authoring workflows for reuse
Workflows and Services: Curation by Experts; Social Curation by the Crowd; Self-Curation by Contributors; Automated Curation (each with seed, refine, validate)
Packs
 
Three chief user groups: Contributors, Members, Anon. Users
New Instances
Collaboration in the workflow space: What? Where? Why? Who? How? (Develop, Run, Analyze, Publish)
Technical implications of open science
Provenance of data. Luc Moreau, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer Rana, Andreas Schreiber, Victor Tan, Laszlo Varga, The provenance of electronic data, Communications of the ACM, Vol. 51 No. 4, Pages 52-58
Analysis of process results
Example: inverse data associations
List-structured KEGG gene ids (geneIDs): [ [ mmu:26416 ], [ mmu:328788 ] ]
pathways: [ [ path:mmu04210 Apoptosis, path:mmu04010 MAPK signaling, ...], [ path:mmu04010 MAPK signaling, path:mmu04620 Toll-like receptor, ...] ]
pathways: [ path:mmu04010 MAPK signaling, path:mmu04370 VEGF signaling ]
Taverna + provenance
Forms of provenance ... Focus is on the data: the observable outcomes of a process (raw provenance metadata; provenance metadata + interpretation framework; design; execution)
...and their uses and associated challenges (raw provenance metadata; provenance metadata + interpretation framework; design; execution). Status labels: fully implemented in Taverna 2; to be released in Sept. 2009
Querying provenance traces: [ p1, ....] [ g1, ....]
Naive provenance trace queries. Z. Bao, S. Cohen-Boulakia, S. Davidson, A. Eyal and S. Khanna, Differencing Provenance in Scientific Workflows, Procs. ICDE, 2009
Querying provenance graphs in Taverna. This results in a more efficient lineage query algorithm that scales to large provenance graphs. Example: BACKTRACE (paths_per_gene[3,4], paths_per_gene[1,2]) AT get_pathway_by_genes AND commonPathways[1] AT TOP
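To make the idea of a lineage (backtrace) query concrete, here is a naive Python sketch that walks recorded element-level dependencies backwards. It is not the Taverna algorithm or its query language: the `Trace` class, its methods, the 0-based index paths and the recorded dependencies are all assumptions made for illustration, reusing the port names from the example above.

```python
from collections import defaultdict

# An element is addressed as (port name, index path),
# e.g. ("paths_per_gene", (0, 1)) for the second item of the first sub-list.


class Trace:
    """Hypothetical in-memory provenance trace of element-level dependencies."""

    def __init__(self):
        self.derived_from = defaultdict(set)

    def record(self, output, inputs):
        """Note that an output element was computed from the given input elements."""
        self.derived_from[output].update(inputs)

    def backtrace(self, element, target_port):
        """Return every element on target_port that the given element depends on."""
        found, visited, stack = set(), set(), [element]
        while stack:
            current = stack.pop()
            if current in visited:
                continue
            visited.add(current)
            for parent in self.derived_from.get(current, ()):
                if parent[0] == target_port:
                    found.add(parent)
                stack.append(parent)
        return found


trace = Trace()
# Iteration over the nested input: paths_per_gene[i][j] came from geneIDs[i].
for i in range(2):
    for j in range(2):
        trace.record(("paths_per_gene", (i, j)), {("geneIDs", (i,))})
# Assumed for illustration: the first common pathway was seen in both sub-lists.
trace.record(
    ("commonPathways", (0,)),
    {("paths_per_gene", (0, 1)), ("paths_per_gene", (1, 0))},
)

print(trace.backtrace(("paths_per_gene", (1, 0)), "geneIDs"))
print(trace.backtrace(("commonPathways", (0,)), "geneIDs"))
```

A traversal like this touches every recorded dependency on the path, which is the cost the slide's more efficient lineage query algorithm is meant to avoid on large provenance graphs.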
Provenance management architecture: Taverna runtime; process design; provenance capture component; relational data model; Provenance DB; lineage query processor; workflow inputs; workflow results; results browser; results analysis; provenance browser
Provenance interoperability for open science. OPM: the Open Provenance Model (Develop, Run, Analyze, Publish)
The Science Lifecycle: scientists; Graduate Students; Undergraduate Students; experimentation; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; Digital Libraries; Next Generation Researchers; Local Web Repositories; Virtual Learning Environment; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses. Adapted from David De Roure’s slides.
Finding the Provenance of research outputs across all the systems data transited through: scientists; Local Web Repositories; Graduate Students; Undergraduate Students; Virtual Learning Environment; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; experimentation; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; Digital Libraries; Next Generation Researchers
Provenance Across Applications: local provenance stores; Applications; Provenance Inter-Operability Layer; import from OPM; export to OPM. Adapted from Luc Moreau’s slides: “The Open Provenance Model” (Univ. of Southampton, UK), 2009
Illustration: process P (type=division); used(divisor), used(dividend); wasGeneratedBy(rest), wasGeneratedBy(quotient); artifacts A1, A2, A3, A4. From Luc Moreau’s slide set: “The Open Provenance Model” (Univ. of Southampton, UK), 2009
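The division illustration can be written down as a small graph of role-labelled edges. The Python sketch below is one possible encoding under stated assumptions, not an OPM serialization: the `Edge` class and helper functions are hypothetical, and the slide does not say which of A1..A4 plays which role, so the assignment here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str    # "used" or "wasGeneratedBy"
    effect: str  # node the edge starts from
    cause: str   # node the edge points to
    role: str    # role label carried by the edge


# Assumed role assignment (the slide only lists A1..A4 and the four roles).
edges = [
    Edge("used", "P", "A1", "dividend"),
    Edge("used", "P", "A2", "divisor"),
    Edge("wasGeneratedBy", "A3", "P", "quotient"),
    Edge("wasGeneratedBy", "A4", "P", "rest"),
]


def inputs_of(process, graph):
    """Artifacts the process used, keyed by role."""
    return {e.role: e.cause for e in graph if e.kind == "used" and e.effect == process}


def outputs_of(process, graph):
    """Artifacts generated by the process, keyed by role."""
    return {e.role: e.effect for e in graph if e.kind == "wasGeneratedBy" and e.cause == process}


def candidate_derivations(process, graph):
    """Pairs (output, input) that may be related; the edges alone do not prove derivation."""
    ins = sorted(inputs_of(process, graph).values())
    outs = sorted(outputs_of(process, graph).values())
    return [(out, inp) for out in outs for inp in ins]


print(inputs_of("P", edges))
print(outputs_of("P", edges))
print(candidate_derivations("P", edges))
```

Keeping the role on the edge rather than on the artifact lets the same artifact play different roles in different processes.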
Integrated OPM generation in Taverna
More information on OPM
From knowledge capture to exploitation
The Provenir ontology. Satya S. Sahoo, Roger S. Barga, Jonathan Goldstein, Amit P. Sheth, “Where did you come from...Where did you go?” An Algebra and RDF Query Engine for Provenance, TR-2009-03, Kno.e.sis Center, CSE Dept., Wright State University, Dayton, OH, March 2009
Upcoming events
SWPM 2009: The First International Workshop on the Role of Semantic Web in Provenance Management, http://wiki.knoesis.org/index.php/SWPM-2009. Co-located with ISWC'09, October 25/26 2009, Washington D.C., USA. Submission deadline: Friday, July 31, 2009.
Special issue of Future Generation Computer Systems Journal (FGCS) on the third provenance challenge (to be announced). Expected deadline: Dec. 2009.
Summary: Support for collaborative science
Selected literature on provenance

More Related Content

What's hot

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
Rothamsted Research, UK
 
High-performance web services for gene and variant annotations
High-performance web services for gene and variant annotationsHigh-performance web services for gene and variant annotations
High-performance web services for gene and variant annotations
Chunlei Wu
 
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Paolo Missier
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
petermurrayrust
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
myGrid team
 
Biocatalogue Talk Slides
Biocatalogue Talk SlidesBiocatalogue Talk Slides
Biocatalogue Talk Slides
BioCatalogue
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
petermurrayrust
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
European Bioinformatics Institute
 
How many citations are there in the Data Citation Index
How many citations are there in the Data Citation IndexHow many citations are there in the Data Citation Index
How many citations are there in the Data Citation Index
EC3metrics Spin-Off
 
Whitney Symposium Lecture June 2008
Whitney Symposium Lecture June 2008Whitney Symposium Lecture June 2008
Annotation of SBML Models Through Rule-Based Semantic Integration
Annotation of SBML Models Through Rule-Based Semantic IntegrationAnnotation of SBML Models Through Rule-Based Semantic Integration
Annotation of SBML Models Through Rule-Based Semantic Integration
Allyson Lister
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
Duncan Hull
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
petermurrayrust
 

What's hot (13)

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
High-performance web services for gene and variant annotations
High-performance web services for gene and variant annotationsHigh-performance web services for gene and variant annotations
High-performance web services for gene and variant annotations
 
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
Biocatalogue Talk Slides
Biocatalogue Talk SlidesBiocatalogue Talk Slides
Biocatalogue Talk Slides
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
How many citations are there in the Data Citation Index
How many citations are there in the Data Citation IndexHow many citations are there in the Data Citation Index
How many citations are there in the Data Citation Index
 
Whitney Symposium Lecture June 2008
Whitney Symposium Lecture June 2008Whitney Symposium Lecture June 2008
Whitney Symposium Lecture June 2008
 
Annotation of SBML Models Through Rule-Based Semantic Integration
Annotation of SBML Models Through Rule-Based Semantic IntegrationAnnotation of SBML Models Through Rule-Based Semantic Integration
Annotation of SBML Models Through Rule-Based Semantic Integration
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 

Similar to Invited talk @ ESIP summer meeting, 2009

DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
Carole Goble
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
Ian Foster
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeWorkflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Carole Goble
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astro
webuploader
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
LizLyon
 
CV-KS-Jun2015
CV-KS-Jun2015CV-KS-Jun2015
CV-KS-Jun2015
Kamran Sartipi
 
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
Syed Ahmad Chan Bukhari, PhD
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open DataArabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data
Matthew Vaughn
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
OVium Solutions
 
2014 Taverna Tutorial Introduction to eScience and workflows
2014 Taverna Tutorial Introduction to eScience and workflows2014 Taverna Tutorial Introduction to eScience and workflows
2014 Taverna Tutorial Introduction to eScience and workflows
myGrid team
 
OREChem Services and Workflows
OREChem Services and WorkflowsOREChem Services and Workflows
OREChem Services and Workflows
marpierc
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Cv long
Cv longCv long
CV_10/17
CV_10/17CV_10/17
Lei_Resume-it.doc
Lei_Resume-it.docLei_Resume-it.doc
Lei_Resume-it.doc
butest
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble
 
Acting as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decadeActing as Advocate? Seven steps for libraries in the data decade
Acting as Advocate? Seven steps for libraries in the data decade
LizLyon
 
ISA - a short overview - Dec 2013
ISA - a short overview - Dec 2013ISA - a short overview - Dec 2013
ISA - a short overview - Dec 2013
Susanna-Assunta Sansone
 
Adithya Rajan_Jan_2016
Adithya Rajan_Jan_2016Adithya Rajan_Jan_2016
Adithya Rajan_Jan_2016
Adithya Rajan
 

Similar to Invited talk @ ESIP summer meeting, 2009 (20)

DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeWorkflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astro
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
Invited talk @ ESIP summer meeting, 2009

  • 8. Workflow as data integrator. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. QTL genomic regions → genes in QTL → metabolic pathways (KEGG)
  • 9. Data-driven computation in Taverna. List-structured KEGG gene ids (geneIDs): [ [ mmu:26416 ], [ mmu:328788 ] ]. Pathways: [ path:mmu04010 MAPK signaling, path:mmu04370 VEGF signaling ] and [ [ path:mmu04210 Apoptosis, path:mmu04010 MAPK signaling, ... ], [ path:mmu04010 MAPK signaling, path:mmu04620 Toll-like receptor, ... ] ]
  • 12. What do scientists use Taverna for? Systems biology model building; proteomics; sequence analysis; protein structure prediction; gene/protein annotation; microarray data analysis; QTL studies; QSAR studies; medical image analysis; public health care epidemiology; heart model simulations; high-throughput screening; phenotypical studies; phylogeny; statistical analysis; text mining; astronomy, music, meteorology. Netherlands Bioinformatics Centre; Genome Canada Bioinformatics Platform; BioMOBY; US FLOSS social science program; RENCI; SysMO Consortium; French SIGENAE farm animals project; ThaiGrid; CARMEN Neuroscience project; SPINE consortium; EU Enfin, EMBRACE, BioSapiens, Casimir; EU SysMO Consortium; NERC Centre for Ecology and Hydrology; Bergen Centre for Computational Biology; Max-Planck Institute for Plant Breeding Research; Genoa Cancer Research Centre; AstroGrid; 30 USA academic and research institutions
  • 13. Figure labels: 200; Genotype; Phenotype; Metabolic pathways; Literature [Paul Fisher]
  • 16. WaaS: Workflows as a Service [Pettifer, Kell, University of Manchester] inside
  • 17. Workflows operating over Grid infrastructure. KnowARC (http://www.knowarc.eu): "KnowARC integrated with Taverna" application prototype to use Taverna as a direct interface to Grid resources running ARC. caGrid (http://cagrid.org/): open source grid software infrastructure aimed at enabling multi-institutional data sharing and analysis; underpins caBIG; Taverna links together caGrid resources. EGEE (http://www.eu-egee.org/): Europe’s leading grid computing project; piloted Taverna over EGEE gLite services
  • 19. caBIG cancer cyberinfrastructure uses Taverna to link services. A sample caGrid workflow for microarray analysis, using caArray, GenePattern and geWorkbench [Ravi Madduri]. Reference: Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster, "Orchestrating caGrid Services in Taverna", Proc. IEEE Intl Conf on Web Services (ICWS 2008)
  • 20. Who else is in this space? Kepler; Triana; BPEL; Ptolemy II; Taverna; Trident; BioExtract
  • 22. Workflow-based experimentation lifecycle: Run; Analyse Results; Publish; Develop; Collect and query provenance metadata
  • 23. Taverna: Graphical Workbench for Professionals. Plug-in architecture. Nested workflows. Drag and drop. Wiring together. Rapidly incorporate new services without coding. Not restricted to predetermined services. Access to local and remote resources and analysis tools. 3500+ service operations available at start-up
  • 30. Diagram labels: Curation Model Versioning Quantitative Content Tags Service Model Semantic Content Ontologies Functional Capabilities Provenance Operational Capabilities Operational Metrics Usage Policy Community Standing Ratings Usage Statistics Attribution Free text Searching Statistics Usable and Useful Understandable Controlled vocabs Interfaces
  • 34. Scaling up along the social dimension. What? Where? Why? Who? How? Crossing the boundaries of individual investigation. Develop, Run, Analyze, Publish; Develop, Run, Analyze, Publish
  • 35. Scientific collaboration. Source: Andrea Wiggins, talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
  • 36. Source: Andrea Wiggins, talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
  • 37. Source: Andrea Wiggins, talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
  • 41. Publishing for collaboration. Run; Analyse Results; Publish; Develop; Collect and query provenance metadata. Design-time reuse: composition from existing workflows. Runtime reuse: workflows as services. Compare results across versions. Foster virtual scientific communities. Provenance exchange and interoperability: the OPM experiment
  • 42. Collaboration in the workflow space. What? Where? Why? Who? How? Develop, Run, Analyze, Publish; Develop, Run, Analyze, Publish
  • 44. Competitive advantage. Academic vanity. Adoption. Reputation. Scrutiny. Being scooped. Misinterpretation. Reputation. Rewards Fears
  • 52. Just Enough Sharing Credit and Attribution
  • 55. Workflows and Services: Curation by Experts; Social Curation by the Crowd; Self-Curation by Contributors; Automated Curation. Diagram edge labels: seed, refine, validate
  • 56. Packs
  • 60. Collaboration in the workflow space. What? Where? Why? Who? How? Develop, Run, Analyze, Publish; Develop, Run, Analyze, Publish
  • 64. Example: inverse data associations. List-structured KEGG gene ids (geneIDs): [ [ mmu:26416 ], [ mmu:328788 ] ]. Pathways: [ path:mmu04010 MAPK signaling, path:mmu04370 VEGF signaling ] and [ [ path:mmu04210 Apoptosis, path:mmu04010 MAPK signaling, ... ], [ path:mmu04010 MAPK signaling, path:mmu04620 Toll-like receptor, ... ] ]
  • 72. Provenance interoperability for open science. OPM: the Open Provenance Model. Develop, Run, Analyze, Publish; Develop, Run, Analyze, Publish
  • 73. The Science Lifecycle (adapted from David De Roure’s slides). Diagram labels: scientists; graduate students; undergraduate students; experimentation; data, metadata, provenance, scripts, workflows, services, ontologies, blogs, ...; digital libraries; next generation researchers; local web repositories; virtual learning environment; technical reports; reprints; peer-reviewed journal & conference papers; preprints & metadata; certified experimental results & analyses
  • 74. Finding the provenance of research outputs across all the systems data transited through. Diagram labels: scientists; local web repositories; graduate students; undergraduate students; virtual learning environment; technical reports; reprints; peer-reviewed journal & conference papers; preprints & metadata; certified experimental results & analyses; experimentation; data, metadata, provenance, scripts, workflows, services, ontologies, blogs, ...; digital libraries; next generation researchers
  • 75. Provenance across applications (adapted from Luc Moreau’s slides: “The Open Provenance Model”, Univ. of Southampton, UK, 2009). Diagram labels: local provenance stores; applications; provenance inter-operability layer; import from OPM; export to OPM
  • 77. Integrated OPM generation in Taverna
  • 81. Upcoming events. SWPM 2009: The First International Workshop on the Role of Semantic Web in Provenance Management (http://wiki.knoesis.org/index.php/SWPM-2009), co-located with ISWC'09, October 25/26 2009, Washington D.C., USA. Submission deadline: Friday, July 31, 2009. Special issue of Future Generation Computer Systems Journal (FGCS) on the third provenance challenge (to be announced); expected deadline: Dec. 2009
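
The list-structured gene ids and pathways on slides 9 and 64 can be illustrated with a small sketch. This is only an approximation of the idea, not Taverna's implementation: the lookup table, the function names (`pathways_for_genes`, `iterate`, `invert`) and the gene-to-pathway assignments are assumptions made for illustration, loosely based on the identifiers shown on the slides. The sketch applies a service that expects a flat list to a nested list, one sub-list at a time, and then inverts the resulting association so that each pathway points back to the genes that produced it.

```python
from collections import defaultdict

# Illustrative gene -> pathway table, loosely based on the ids on the slides.
# It stands in for a real KEGG lookup and is NOT authoritative data.
PATHWAYS_BY_GENE = {
    "mmu:26416": ["path:mmu04210", "path:mmu04010"],
    "mmu:328788": ["path:mmu04010", "path:mmu04620"],
}


def pathways_for_genes(gene_ids):
    """Hypothetical service: flat list of gene ids in, flat list of pathway ids out."""
    found = []
    for gene in gene_ids:
        for pathway in PATHWAYS_BY_GENE.get(gene, []):
            if pathway not in found:
                found.append(pathway)
    return found


def iterate(service, value, extra_depth):
    """Apply `service` to each sub-list when the input is nested `extra_depth` levels deeper than it expects."""
    if extra_depth == 0:
        return service(value)
    return [iterate(service, item, extra_depth - 1) for item in value]


def invert(gene_lists, pathway_lists):
    """Inverse association: for each pathway, the genes whose sub-list produced it."""
    inverse = defaultdict(list)
    for genes, pathways in zip(gene_lists, pathway_lists):
        for pathway in pathways:
            for gene in genes:
                if gene not in inverse[pathway]:
                    inverse[pathway].append(gene)
    return dict(inverse)


gene_ids = [["mmu:26416"], ["mmu:328788"]]           # list of lists, as on the slide
pathways = iterate(pathways_for_genes, gene_ids, 1)  # one pathway list per sub-list
genes_by_pathway = invert(gene_ids, pathways)
print(pathways)
print(genes_by_pathway)
```

Calling `iterate(pathways_for_genes, ["mmu:26416", "mmu:328788"], 0)` instead passes the whole flat list to the service in one call, which yields a single merged pathway list rather than one list per gene.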

Editor's Notes

  1. Repetitive, mundane, boring tasks made easier, more reliable and adaptable. Big science and collaborative science.
  2. Interoperability, integration and collaboration; automated processing; interactive; repetitive and accurate compound processes (protocols); transparent processes; data flow; trackable results; agile software development.
  3. This workflow searches for genes which reside in a QTL (Quantitative Trait Loci) region in the mouse, Mus musculus. The workflow requires three inputs: a chromosome name or number, a QTL start base pair position, and a QTL end base pair position. Data is then extracted from BioMart to annotate each of the genes found in this region. The Entrez and UniProt identifiers are then sent to KEGG to obtain KEGG gene identifiers. The KEGG gene identifiers are then used to search for pathways in the KEGG pathway database. This is pathways_and_gene_annotations_for_qtl_phenotype_28303, executed with chromosome = 17, start_position = 28500000, end_position = 32500000.
  4. Mention scalability: we support large datasets as well as these small lists, and lists can grow very large.
  5. The case study of Paul’s work. Different datatypes, held in geographically different places, from different subdisciplines. Phenotypic response investigated using microarray, in the form of expressed genes, or evidence provided through QTL mapping. Genes captured in the microarray experiment and present in the QTL (Quantitative Trait Loci) region. Microarray + QTL.
  6. The workflow produces OMIM-tagged diseases, which can be used to enrich the proto-ontology automatically in RDF.
  7. - caGrid (or the Grid) is the underlying network architecture and platform that provides the basis for connectivity of caBIG® tools. - ARC (Advanced Resource Connector) plugin for Taverna, used for medical imaging. Taverna is a workflow management system well known in bioinformatics. To show the ease of transitioning from local to grid, we execute MUSTANG first on the command line, then on the grid using Taverna. We then present two examples from the field of medical imaging, where the first one has to deal with huge temporary datasets. It thus greatly benefits from ARC's storage management and grid URL handling capabilities. The last example shows how one can achieve rapid testing iterations by separating the program binary from its use case description and the workflow. In the end, myExperiment is presented, which is a free web community for sharing Taverna workflows. By preparing a use case and an example workflow, one can make one's own program easily usable by everyone, since the ARC middleware equalizes different grid configurations. video
  8. Aimed at different layers of the software stack. “The Many Faces of IT as Service”, Foster, Tuecke, 2005. “Provisioning” – reservation to configuration to … … make sure the resource will do what I want it to do, with the right qualities of service. Virtualization = separation of concerns between provider & consumer of “content”: client and service; service provider and resource provider. Provisioning = assemble & configure resources to meet user needs. Management = sustain desired qualities of service despite a dynamic environment.
  9. The GT4 plug-in supports semantic-based service queries. Users can enter multiple service query criteria (up to three; three is enough in the current scenario, and more can be added to the program upon request) with a corresponding value for each, and the criteria can be combined. The initial GUI shows only one query criterion, but more can be added by clicking the “Add Service Query” button. For example, we can query for caGrid services whose “Research Center” name is “Ohio State University”, whose Service Name is “DICOMDataService”, and which have the operation “PullOp”.
  10. we are now taking a broader view... what we have shown so far is just the “run” part of a more comprehensive lifecycle
  11. Beanshell scripting and XML processing support inside the workflows. Taverna 2: long-running workflows, data reference handling, data streaming and staging, multiple extensibility points. Complete the Taverna 2 properties: new data reference handling, security management, provenance management, asynchronous processors and data streaming, explicit monitoring and steering support, a new, better dispatch layer that supports dynamic service binding and service invocation through a resource broker, improved concurrency handling at the workflow level.
  12. The clustalw program from Emboss is called ‘emma’ Services are not deposited and preserved in software libraries. Rapid metadata heart-beat, especially on operational metadata. BioNanny – using Grid tools Use myExperiment to notify scientists with potential problems Use myExperiment to be smart about which services should be monitored. Workflows are deposited but…. Not self-contained. Linking to external services in flux. Or depend on software Incorporating services unavailable to others. Workflow fragility and hence decay. Workflows become plans and provenance rather than working scientific objects unless tended and updated.
  13. for DEMO: search adaptivedisclosure key point: large-scale curation of rich service descriptions through community engagement, sustained over time, to ensure the quality of annotations what makes biocat stand apart from other approaches to service repositories? biocatalogue is a "super-registry" that is able to accommodate service descriptions from multiple different source registries thanks to a flexible annotation model context of application: distributed, P2P biocatalogue
  14. SeekDA is DERI. search engine for WS http://www.ebi.ac.uk/uniprot-das/
  15. Curation results in trust, and therefore usage, which encourages / justifies sustained further curation efforts. Curation really happens in all the associated tools that reference and use BioCatalogue, e.g. curation of Taverna services, of myExperiment workflows, etc. Accreditation of curators: an idea borrowed from the Wikipedia style of community contribution. Not all contributors are equal, but all are entitled to contribute.
  16. How are the types of annotations chosen? Contributors are free to add pretty much any types of annotations at the moment, using a simple tagging mechanism, i.e., annotations are not necessarily locked into controlled vocabularies / ontologies. Think of them as name-value pairs for the time being, but already exposing metadata as RDF through the API. Users can rate services. The main rating criteria are expected to be ease of use, availability and quality of documentation. Separate subjective from objective. Additionally, there is room for automated curation, using Quasar, for example. Service monitoring and associated automated annotations. Functional testing of services: let users / providers upload complex test scripts, including full-fledged workflows, for example. BioCatalogue can run these periodically and annotate the services with the outcome (think "junit test reports" as automated annotations). Service Profile Wheel: Availability, Freshness
  17. Automated monitoring & testing Test scripts, endpoint availability, meantime failure Partner feeds myExperiment.org Workflow profile Update feeds to users Develop incentives Expert for oversight How do we rank? How do we compare non-alike?
  18. Cite FLOSS
  19. myExperiment is as much an engineering project as it is a social experiment
  20. e-Science is me-Science Aligning community with individual. But we have to be aware of the drivers for collaboration. Competitive advantage. Be the first with the Nature paper. Academic vanity Credit, credibility, fame, acclaim, recognition, peer respect, reputation. Adoption Get my stuff adopted / recognised More funding Being found out Open to rigorous inspection. Being scooped Beaten by lab X Protecting my turf. Releasing results too early. Getting left behind. Being out of fashion. Looking stupid Being misinterpreted or misrepresented. Looking stupid. Losing control. Taking a risk
  21. attribution -> provenance
  22. Transferring data, methods and know-how from one discipline to another (e.g. astronomy image analysis applied to cancer tissue microarrays) How do you find relevant material that uses a different jargon in a different discipline organised to only suit its experts? Validation? How do I know it does what it says it does? Reproducibility? When the services are volatile. Reusability? When it contains an in-house code or application. Longevity? Will this workflow still run in 6 months time? Palpability? Why does this workflow fail? why does it work? How does it work?
  23. Validation? How do I know it does what it says it does? Reproducibility? When the services are volatile. Reusability? When it contains an in-house code or application. Longevity? Will this workflow still run in 6 months time? Palpability? Why does this workflow fail? why does it work? How does it work?
  24. attribution -> provenance; curation -> provenance
  25. make ref to second talk -- on ROs
  26. myExperiment is as much an engineering project as it is a social experiment
  27. Step back: what do we mean by provenance, in general? Data and process provenance. What do we do with it? We collect a great deal of metadata, at a very fine level of granularity: what can we expect to get out of it?
  28. see dedicated in-depth talk for details on our approach
  29. A lineage query lets users identify the variables that carry interesting values for which provenance is sought, and the nodes in the graph where provenance information should be reported
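
The lineage query idea in note 29 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Taverna's provenance API: the graph layout, the "processor:port" variable names, and the `lineage` function are all hypothetical, loosely modelled on the QTL-to-pathways workflow from note 3. The user names a target variable whose provenance is sought and a set of focus nodes where provenance should be reported; the query walks the dependency graph upstream and returns only the focus nodes it reaches.

```python
from collections import deque

# Hypothetical provenance graph (an assumption, not Taverna's data model):
# each variable, written "processor:port", maps to the variables it was
# directly derived from.
DERIVED_FROM = {
    "workflow:pathways": ["get_pathways:out"],
    "get_pathways:out": ["get_pathways:in"],
    "get_pathways:in": ["to_kegg_ids:out"],
    "to_kegg_ids:out": ["to_kegg_ids:in"],
    "to_kegg_ids:in": ["genes_in_qtl:out"],
    "genes_in_qtl:out": [
        "workflow:chromosome",
        "workflow:start_position",
        "workflow:end_position",
    ],
}


def lineage(target, focus, derived_from=DERIVED_FROM):
    """Return the focus variables that `target` depends on, nearest first.

    `target` is the variable whose provenance is sought; `focus` is the set
    of nodes where provenance should be reported. All other nodes are
    traversed but not reported.
    """
    seen = {target}
    queue = deque([target])
    reported = []
    while queue:
        node = queue.popleft()
        for parent in derived_from.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in focus:
                reported.append(parent)
            queue.append(parent)
    return reported


if __name__ == "__main__":
    focus_nodes = {"to_kegg_ids:out", "workflow:chromosome"}
    print(lineage("workflow:pathways", focus_nodes))
```

Restricting the report to focus nodes keeps the answer small even when provenance is collected at a very fine level of granularity, which is the concern raised in note 27.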