Automated Hypothesis Testing with Large Scale Scientific Workflows
Yolanda Gil
Daniel Garijo
Rajiv Mayani
Varun Ratnakar
Information Sciences Institute
& Department of Computer Science
University of Southern California
http://www.isi.edu
Parag Mallick
Ravali Adusumilli
Hunter Boyce
Stanford School of Medicine
Canary Center for Early Cancer Detection
Stanford University
http://mallicklab.stanford.edu
http://www.disk-project.org
Talk Outline
๏ Motivation
๏ Research Challenges
1. Representing Hypotheses
2. Representing Lines of Inquiry
3. Meta-analysis to review workflow results
๏ DISK Scenario walkthrough
๏ Results in cancer multi-omics
๏ Related work
๏ Contributions and Future Work
Scientific Data Analysis Today: Inefficient, Incomplete, Irreproducible
๏ Data analysis is time-consuming
๏ Not systematic
๏ Not updated when new data/methods become available
๏ Hard/impractical to reproduce prior work
๏ Overall process is done manually: inefficient and error-prone
๏ Analytic knowledge is compartmentalised
[Figure: the discovery cycle: new hypothesis → formulate line of inquiry (data + method) → retrieve data → run workflows (methods) → meta-analysis of results]
Our Focus: Cancer Multi-Omics
๏ Data Availability and Complexity:
• The multi-omic domain comprises multiple levels of heterogeneous data that keep expanding in volume and complexity through projects like The Cancer Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis Consortium (CPTAC)
๏ Analytic Complexity:
• Multi-omic analysis requires dozens of interconnected tools, each of which may require substantial domain knowledge.
[Figure: example tool chain: read aligners MAQ, BWA, BWA-SW (SE only), PERM, SOAPv2, MOSAIK, and NOVOALIGN, followed by SAMTOOLS, PICARD, GATK, and IGVtools. Caption: Domain knowledge is isolated]
๏ Multiple types and complexities of hypotheses:
• Hypotheses range from single-gene/single-dataset to multi-gene/multi-ome/multi-dataset
• Is this protein found in this sample?
• Is this gene found in this sample?
• Is this protein associated with a certain cancer?
• Which proteins are associated with a certain cancer?
• …
Our Approach: Hypothesis-Driven Discovery
๏ Represent scientist hypotheses
๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued by data analysis workflows
๏ Design a meta-analysis that examines the results of lines of inquiry and either validates or revises the original hypotheses
๏ Develop an intelligent agent that can report and explain new findings to the scientist
[Figure: DISK approach. A hypothesis is pursued through lines of inquiry, which specify the relevant analytic methods (workflows), the type of data needed, and how to combine results. Components: query to retrieve data, data analysis workflows, workflow bindings, and meta-workflows (confidence estimation, benchmarking), yielding a revised hypothesis & interesting findings]
Representing Hypotheses: Requirements from Omics
๏ Graph-based hypothesis representation
• Entities are nodes
• Relationships are links
๏ Annotations on graphs
• Represent qualifications of hypotheses: confidence and evidence
๏ Representing hypothesis evolution
• Graph versioning
Graph representation in RDF
๏ Standard semantic web language
๏ Scalable reasoners available
๏ Qualifications and provenance
through triple reification
๏ Versioning through multiple
named graphs
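The RDF design above (one named graph per hypothesis version, reification for qualifications) can be illustrated without an RDF library. This is a minimal in-memory sketch under stated assumptions, not DISK's implementation: the `HypothesisStore` class and its methods are hypothetical, prefixed names such as `bio:PRKCDBP` are plain strings, and the confidence value 0 for Hy1' is the one given in the hypothesis lifecycle slides.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); prefixed names are plain strings here.
Triple = tuple[str, str, str]


@dataclass
class HypothesisStore:
    """Hypothetical toy quad store: one named graph per hypothesis version."""

    graphs: dict[str, set[Triple]] = field(default_factory=dict)
    # Reified statements: statement id -> the graph/triple it describes,
    # plus qualifications such as confidence or provenance.
    statements: dict[str, dict] = field(default_factory=dict)

    def add(self, graph: str, triple: Triple) -> None:
        """Assert a triple inside a named graph, creating the graph if needed."""
        self.graphs.setdefault(graph, set()).add(triple)

    def reify(self, stmt_id: str, graph: str, triple: Triple, **qualifications) -> None:
        """Turn an asserted triple into a resource that can carry annotations."""
        if triple not in self.graphs.get(graph, set()):
            raise KeyError(f"{triple} is not asserted in graph {graph}")
        self.statements[stmt_id] = {"graph": graph, "triple": triple, **qualifications}

    def quads(self) -> list[tuple[str, str, str, str]]:
        """Flatten all named graphs into (s, p, o, graph) quads."""
        return [(s, p, o, g) for g, ts in self.graphs.items() for (s, p, o) in sorted(ts)]


store = HypothesisStore()
expressed = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
store.add("Hy1", expressed)
store.add("Hy2", ("bio:PRKCDBP", "hyp:associatedWith", "bio:ColonCancer"))
# A new version is a new named graph; graph Hy1 itself is left untouched.
store.add("Hy1'", expressed)
store.reify("Hy1'-S1", "Hy1'", expressed, hasConfidenceValue=0)
```

Keeping each version in its own graph lets Hy1 and Hy1' coexist; in a real triple store the same layout would be queried with SPARQL `GRAPH` patterns.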
Representing Hypotheses
[Figure: hypotheses as RDF named graphs that reuse terms from a biology ontology, a hypothesis ontology, and user data definitions. Graph Hy1: bio:PRKCDBP hyp:expressedIn user:TCGA-AA-3561-01A-22. Graph Hy2: bio:PRKCDBP hyp:associatedWith bio:ColonCancer]
Lifecycle of a hypothesis
1. Initial Hypothesis, Data & Workflows
[Figure: hypothesis statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Data available: XX_3561Proteome_VU.zip (MassSpecData), linked by producedData to experiment AA_3561_EX2, which is linked by experimentedOn to sample TCGA-AA-3561-01A-22, collectedFrom patient TCGA-AA-3561. Workflows available: Proteomics, Proteogenomics]
2. Running workflows on Data
[Figure: the same data and workflows as in step 1, plus workflow execution W1, linked by hasWorkflowTemplate to one of the available workflows and by used to the available data. Hypothesis statement Hy1 (PRKCDBP expressedIn TCGA-AA-3561-01A-22) is accompanied by qualifications of Hy1' and provenance of Hy1']
3. Meta reasoning about workflow results
[Figure: meta-workflow execution MW1, which used the outputs of workflow execution W1, produced revised hypothesis statement Hy1' (revisionOf Hy1): PRKCDBP expressedIn TCGA-AA-3561-01A-22. Statement Hy1'-S1 hasConfidenceValue 0 and hasProvenance linking it to the executions]
4. New Data becomes available
[Figure: hypothesis statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Data available now includes a second experiment on the same sample: AA_3561_EX1, which produced XX_3561_DD.zip (RNASeqData), alongside AA_3561_EX2, which produced XX_3561Proteome_VU.zip (MassSpecData); sample TCGA-AA-3561-01A-22 is collectedFrom patient TCGA-AA-3561. Workflows available: Proteomics, Proteogenomics]
5. New Multi-Workflows are also run
[Figure: workflow executions W1 and W2 run the Proteomics and Proteogenomics workflows on the available data (XX_3561Proteome_VU.zip, MassSpecData, and XX_3561_DD.zip, RNASeqData, from experiments AA_3561_EX2 and AA_3561_EX1 on sample TCGA-AA-3561-01A-22 of patient TCGA-AA-3561). Hypothesis statement Ha1 (PRKCDBP expressedIn TCGA-AA-3561-01A-22) is accompanied by qualifications of Ha1' and, via hasProvenance, provenance of Ha1']
6. Hypothesis Revision
[Figure: meta-workflow execution MW2, which used the outputs of workflow executions W1 and W2, produced revised hypothesis statement Ha1' (revisionOf Ha1): mutated PRKCDBP expressedIn TCGA-AA-3561-01A-22. Statement Ha1'-S1 hasConfidenceValue 0.98]
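The lifecycle above never overwrites a hypothesis: each revision is a new statement linked to its predecessor by revisionOf and carrying its own confidence and provenance. A minimal sketch of that bookkeeping, assuming immutable records; the `HypothesisVersion` class and the `revise` and `lineage` helpers are hypothetical names, and the values follow step 6 (Ha1' revises Ha1 with confidence 0.98, with provenance pointing to W1, W2, and MW2).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HypothesisVersion:
    """Hypothetical immutable record of one hypothesis statement."""

    hid: str
    statement: tuple[str, str, str]
    confidence: Optional[float] = None            # None until a meta-workflow scores it
    revision_of: Optional[HypothesisVersion] = None
    provenance: tuple[str, ...] = ()              # ids of supporting executions


def revise(old: HypothesisVersion, new_id: str, statement: tuple[str, str, str],
           confidence: float, provenance: list[str]) -> HypothesisVersion:
    """Create a new version linked to `old` instead of mutating it."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return HypothesisVersion(new_id, statement, confidence, old, tuple(provenance))


def lineage(h: Optional[HypothesisVersion]) -> list[str]:
    """Follow revisionOf links back to the original hypothesis."""
    ids = []
    while h is not None:
        ids.append(h.hid)
        h = h.revision_of
    return ids


ha1 = HypothesisVersion("Ha1", ("PRKCDBP", "expressedIn", "TCGA-AA-3561-01A-22"))
ha1_rev = revise(ha1, "Ha1'", ("PRKCDBP (mutated)", "expressedIn", "TCGA-AA-3561-01A-22"),
                 0.98, ["W1", "W2", "MW2"])
print(lineage(ha1_rev))  # prints ["Ha1'", 'Ha1']
```

Because records are frozen, the original Ha1 stays available for comparison, which is what makes the revision history auditable.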
Representing Lines of Inquiry & Data analysis workflows
Lines of Inquiry
๏ Capture how to set up potential analyses that can be pursued to test a certain type of hypothesis
[Figure: components of a line of inquiry: hypothesis pattern (bio:Protein ?p hyp:expressedIn bio:Sample ?s); data query pattern (Patient, Sample ?s, Experiment ?e, and DataFile ?d, linked by collectedFrom, experimentedOn, and producedData); data analytic workflows (Proteomics, Proteogenomics) applied to DataFile ?d; meta-workflows (comparison, confidence estimation, benchmarking)]
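A line of inquiry pairs a hypothesis pattern with a data query pattern: matching the hypothesis binds ?p and ?s, and those bindings then drive the data query. The sketch below is a naive backtracking matcher over plain triples, not the query engine DISK uses. Assumptions: prefixes and the type constraints (bio:Protein, bio:Sample) are dropped, the edge directions (sample collectedFrom patient, experiment experimentedOn sample, experiment producedData file) are read off the slide, and the data triples mirror the walkthrough.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in the slide's ?p, ?s, ?e, ?d."""
    return term.startswith("?")


def match(patterns: list[Triple], triples: list[Triple],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every binding under which all patterns hold in `triples`."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        for pat, term in zip(first, triple):
            if is_var(pat):
                # A variable must agree with any value it is already bound to.
                if extended.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            # All three positions matched: solve the remaining patterns.
            yield from match(rest, triples, extended)


hypothesis = [("PRKCDBP", "expressedIn", "TCGA-AA-3561-01A-22")]
data = [
    ("TCGA-AA-3561-01A-22", "collectedFrom", "TCGA-AA-3561"),
    ("AA_3561_EX1", "experimentedOn", "TCGA-AA-3561-01A-22"),
    ("AA_3561_EX2", "experimentedOn", "TCGA-AA-3561-01A-22"),
    ("AA_3561_EX1", "producedData", "XX_3561_DD.zip"),
    ("AA_3561_EX2", "producedData", "XX_3561Proteome_VU.zip"),
]
hypothesis_pattern = [("?p", "expressedIn", "?s")]
data_query_pattern = [
    ("?s", "collectedFrom", "?patient"),
    ("?e", "experimentedOn", "?s"),
    ("?e", "producedData", "?d"),
]

for h in match(hypothesis_pattern, hypothesis):
    files = {b["?d"] for b in match(data_query_pattern, data, h)}
    print(h["?p"], sorted(files))
```

Each data file found this way would then be bound to the workflows listed in the line of inquiry.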
Example Multi-omics Workflow (Zhang et al. replication)
Automated Workflow Generation in WINGS by Reasoning about Semantic Constraints
Example: all input data must be from the human species, i.e., must have HS in its metadata.
The workflow system uses this constraint to select only datasets that have HS in their metadata, so that the selected inputs are valid.
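The species constraint can be illustrated with a small filter. This is a hedged sketch of the idea only: WINGS expresses such constraints as semantic rules over its own metadata, whereas here a constraint is simply a Python predicate, and the `species` key, the dataset names, and the mouse ("MM") record are hypothetical.

```python
from typing import Callable

Metadata = dict[str, str]
Constraint = Callable[[Metadata], bool]


def species_is(code: str) -> Constraint:
    """Constraint factory: the dataset's metadata must declare this species."""
    return lambda meta: meta.get("species") == code


def valid_inputs(datasets: dict[str, Metadata], constraints: list[Constraint]) -> list[str]:
    """Names of datasets that satisfy every constraint, in catalogue order."""
    return [name for name, meta in datasets.items() if all(c(meta) for c in constraints)]


def binding_is_valid(binding: list[str], datasets: dict[str, Metadata],
                     constraints: list[Constraint]) -> bool:
    """A candidate workflow binding is valid only if all of its inputs pass."""
    return all(all(c(datasets[name]) for c in constraints) for name in binding)


# Hypothetical catalogue: two human datasets and one mouse dataset.
catalogue = {
    "run_a.zip": {"species": "HS", "type": "MassSpecData"},
    "run_b.zip": {"species": "MM", "type": "MassSpecData"},
    "run_c.zip": {"species": "HS", "type": "RNASeqData"},
}
human_only = [species_is("HS")]
print(valid_inputs(catalogue, human_only))  # ['run_a.zip', 'run_c.zip']
```

Filtering candidates before binding, rather than rejecting a finished run, is what lets the system propose only valid datasets.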
Meta-analysis to review workflow results
Meta-workflows: 1) Comparison Meta-Workflows
[Figure: comparison meta-workflow. Variant detection builds a custom protein DB; protein identification is run against the custom DB and against a reference DB; the two sets of protein IDs are compared with a similarity score, which is data dependent (peptide level, protein level, scan level)]
๏ Goals:
• Compare results amongst multiple workflows
• Measure the global similarity amongst multiple workflows
• Provide users with explanations of workflow-dependent differences in results
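The slide does not say which similarity score is used. As one plausible choice, labeled here as an assumption, the sketch below uses the Jaccard index between sets of identifiers (protein IDs, peptides, or scans, depending on the comparison level) and averages it over all workflow pairs as a global similarity. The workflow names and protein IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty result sets are treated as identical."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def pairwise_similarity(results: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of workflows."""
    return {(x, y): jaccard(results[x], results[y])
            for x, y in combinations(sorted(results), 2)}


def global_similarity(results: dict[str, set[str]]) -> float:
    """Mean pairwise similarity across all workflows (needs at least two)."""
    scores = pairwise_similarity(results)
    if not scores:
        raise ValueError("need results from at least two workflows")
    return sum(scores.values()) / len(scores)


# Hypothetical protein IDs from identification against two databases.
results = {
    "custom_db": {"P1", "P2", "P3"},
    "reference_db": {"P2", "P3", "P4"},
}
print(pairwise_similarity(results))  # {('custom_db', 'reference_db'): 0.5}
```

Running the same functions on peptide-, protein-, and scan-level identifiers gives one score per level, so the levels at which workflows disagree can be reported to the user.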
Meta-workflows: 2) Benchmark Meta-Workflows
๏ Goals:
• Evaluation of workflow performance
• Training of confidence estimation models (probabilistic)
[Figure: benchmark meta-workflow producing ROC curves and true/false positive rates, which feed the probabilistic models]
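Benchmarking needs ground truth, for example identifications known to be correct or incorrect. Given such labels, the true and false positive rates at each score threshold trace the ROC curve mentioned on the slide. A minimal sketch follows; the scores and labels are illustrative, not benchmark results, and the trapezoidal AUC is a standard summary added here rather than something the slide specifies.

```python
def roc_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct threshold, from strictest to loosest."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels differ in length")
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(roc_points(scores, labels))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(roc_points(scores, labels)))  # 0.75
```

The per-workflow rates at a chosen operating threshold are exactly the inputs a probabilistic confidence model needs.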
Meta-workflows: 3) Confidence Estimation Meta-Workflows
๏ Goals:
• Combine results from multiple workflows
• Estimate confidence in the workflow result
• Use the estimated confidence to update the hypothesis
[Figure: protein identification against a custom DB and a reference DB yields two sets of protein IDs; a probabilistic model, trained by the benchmark meta-workflow, estimates confidence and updates the hypothesis]
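The slides leave the probabilistic model unspecified. One simple model that fits the picture, offered as an assumption rather than as DISK's method, is naive Bayes: each workflow's detection (or non-detection) is treated as independent evidence whose likelihood ratio comes from the true and false positive rates measured by the benchmark meta-workflow. The prior and rates below are made-up illustrative numbers.

```python
def combine_confidence(prior: float, evidence: list[tuple[bool, float, float]]) -> float:
    """Posterior P(hypothesis) from a prior and per-workflow evidence.

    Each evidence item is (detected, tpr, fpr), where tpr/fpr are the
    workflow's benchmarked true and false positive rates. Workflows are
    assumed conditionally independent given the truth (naive Bayes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for detected, tpr, fpr in evidence:
        if not (0.0 < tpr < 1.0 and 0.0 < fpr < 1.0):
            raise ValueError("rates must lie strictly between 0 and 1")
        # Likelihood ratio of the observation under "present" vs "absent".
        odds *= (tpr / fpr) if detected else ((1.0 - tpr) / (1.0 - fpr))
    return odds / (1.0 + odds)


# Illustrative only: one workflow reports the protein, a second does not.
conflicting = [(True, 0.9, 0.1), (False, 0.8, 0.2)]
print(round(combine_confidence(0.5, conflicting), 4))  # 0.6923
```

Agreement between workflows pushes the posterior towards 0 or 1, while conflicting evidence leaves it closer to the prior, which is the behaviour a hypothesis update should reflect.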
DISK Walkthrough: Initial Hypothesis
๏ Initial hypothesis is provided by the user
• PRKCDBP protein is expressed in a patient sample
DISK Walkthrough: Lines of Inquiry
๏ The line of inquiry specifies finding data from different experiments performed on the patient’s sample, running multi-omic workflows on that data, and combining the evidence into a confidence score
• General hypothesis pattern
• Data query pattern: search for different experiments that produced omics data (e.g., of type RNASeq and MassSpecData)
• Data analysis workflows to run on genomics and proteomics data (more omics in the future)
• Meta-workflows to assess confidence in the hypothesis based on workflow results
DISK Walkthrough: Data & Workflows
To test a hypothesis that a protein is present in a patient’s sample:
๏ Retrieve mass spec and RNASeq data
๏ Use workflows
• Wf1: Proteome only
• Wf2: ProteoGenomic
DISK Walkthrough: Meta-Workflows
๏ After the workflows run, meta-workflows analyse the results and generate a confidence value
DISK Walkthrough: Revised Hypothesis
๏ The hypothesis is revised and given a confidence value:
• A mutated form of the protein PRKCDBP is expressed in the patient’s sample TCGA-AA-3561-01A-22, with confidence 0.9887
DISK Walkthrough: Provenance Details
๏ Hypothesis provenance stores information about workflows run and the data used
• Workflow execution provenance is published by WINGS using the W3C PROV standard.
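To show the kind of statements involved, the sketch below builds provenance with real PROV-O terms (prov:Activity, prov:used, prov:wasGeneratedBy, prov:wasRevisionOf) and serializes them as N-Triples. It is a hand-rolled illustration, not the output WINGS actually publishes: the http://example.org/disk/ namespace and every IRI built from it are hypothetical, and the facts (W1 used the mass spec file; MW2 generated Ha1', a revision of Ha1) follow the hypothesis lifecycle slides.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/disk/"  # hypothetical namespace for this sketch


def execution_provenance(run: str, used=(), generated=()) -> list[tuple[str, str, str]]:
    """PROV-O triples for one workflow or meta-workflow execution."""
    triples = [(EX + run, RDF_TYPE, PROV + "Activity")]
    triples += [(EX + run, PROV + "used", EX + u) for u in used]
    triples += [(EX + g, PROV + "wasGeneratedBy", EX + run) for g in generated]
    return triples


def revision(new: str, old: str) -> tuple[str, str, str]:
    """Link a revised hypothesis to the one it revises."""
    return (EX + new, PROV + "wasRevisionOf", EX + old)


def to_ntriples(triples) -> str:
    """Serialize IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = execution_provenance("W1", used=["XX_3561Proteome_VU.zip"])
triples += execution_provenance("MW2", generated=["Ha1'"])
triples.append(revision("Ha1'", "Ha1"))
print(to_ntriples(triples))
```

Because PROV is a shared vocabulary, a hypothesis revision and the workflow runs behind it can be queried together once both are published as linked data.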
DISK: Automated DIscovery of Scientific Knowledge
[Figure: DISK architecture. Hypotheses enter hypothesis evaluation, which formulates lines of inquiry (data retrieval from a data repository, workflow binding, analytic workflows) and performs a meta-analysis of results (meta-workflows for confidence estimation and benchmarking), yielding revised hypotheses & interesting findings for an interactive discovery agent. The WINGS intelligent workflow system contributes workflow constraints, workflow reasoning, and workflow provenance; results are openly published as linked data]
Our Initial Focus: Reproduce a Seminal Omics Analysis [Zhang et al 2014]
๏ Replicated the proteogenomic analysis of colorectal cancer from [Zhang et al 2014]
๏ Successfully reproduced the paper's findings, comparing results at multiple levels (final figure, supplementary tables, etc.)
๏ Replicating the paper's figures and supplementary figures took months and direct conversations with the authors
๏ Applying the analysis approach to a new cancer type now takes minutes
• Useful when TCGA is integrated
๏ Expanded the analysis to compare how sensitive the findings were to workflow details
[Figure: density plots of Spearman correlation. Left: correlation between mRNA-protein abundance (within samples). Right: correlation between mRNA-protein variation (across samples)]
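The plots summarize Spearman rank correlations between mRNA and protein measurements. Presumably, as is usual in such analyses, the within-sample correlation is computed across genes for one sample and the across-sample correlation across samples for one gene. A self-contained sketch with ties given average ranks; the matrix layout (rows are genes, columns are samples) and the helper names are assumptions, and the numbers are toy values, not data from the study.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def within_sample(mrna, protein, sample: int) -> float:
    """Correlation across genes for one sample (one column of each matrix)."""
    return spearman([row[sample] for row in mrna], [row[sample] for row in protein])


def across_samples(mrna, protein, gene: int) -> float:
    """Correlation across samples for one gene (one row of each matrix)."""
    return spearman(mrna[gene], protein[gene])


print(round(spearman([1, 2, 2, 3], [1, 2, 3, 4]), 4))  # 0.9487
```

Repeating either helper over all samples or all genes gives the distributions whose densities the two panels plot.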
Impact on Cancer Multi-Omics
Related Work: 1) Discovery Systems
๏ [Lenat 1976]
๏ [Lindsay et al 1980]
๏ [Langley 1981]
๏ [Falkenhainer 1985]
๏ [Kulkarni and Simon 1988]
๏ [Cheeseman et al 1989]
๏ [Zytkow et al 1990]
๏ [Simon 1996]
๏ [Valdes-Perez 1997]
๏ [Todorovski et al 2000]
๏ [Schmidt and Lipson 2009]
Related Work: 2) Hypothesis Representation as Graphs
๏ Existing vocabularies are related but need to be extended to represent hypotheses in DISK
• SWAN [Gao et al 2006]
• EXPO [Soldatova and King 2006]
• Nanopublications [Groth et al 2010]
• Ovopublications [Callahan and Dumontier 2013]
• Micropublications [Clark et al 2014]
• LSC
• BEL
Contributions
๏ Represent scientist hypotheses
• Hypothesis ontology includes revisions & provenance
๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued with a data analysis workflow
• Lines of inquiry outline what type of data and workflows to use, and customize them to the hypotheses at hand
๏ Design a meta-analysis to assess the results of lines of inquiry and revise the original hypotheses
• Meta-analysis workflows assess diverse evidence
Ongoing & Future Work
๏ Ongoing work:
• Interactive Discovery Agent that explains interesting findings
• Continuous analysis of data (TCGA/CPTAC) as it grows
• Extending and generalizing meta-workflows
• Using DISK in geosciences: Subsurface water resource modeling
๏ Future challenges:
• More complex hypotheses about several entities
• Incorporate evidence over time
• Designing domain-independent meta-workflows
• Resource-bound hypothesis exploration
Thank you

More Related Content

What's hot

Data Visualization in Exploratory Data Analysis
Eva Durall

Kenett On Information NYU-Poly 2013
The Hebrew University of Jerusalem

Exploratory data analysis with Python
Davis David

Breakdown of Regression Models for Dissertations
Statistics Solutions

Data analysis
HarisRiaz25

Es credit scoring_2020
Eero Siljander

Applying ‘best fit’ frameworks to systematic review data extraction
Andrea Miller-Nesbitt

Chapter 15 Social Research
arpsychology

Exploratory Factor Analysis; Concepts and Theory
Hamed Taherdoost

TECHNOLOGY ACCEPTANCE MODELS & FRAMEWORKS
Hamed Taherdoost

Data analysis aug-11
DrVinodhiniYallagand

An Introduction to Text Analytics: 2013 Workshop presentation
Seth Grimes

Qualitative data analysis pdf
Ayuni Abdullah

Introduction to regression
Dr. C.V. Suresh Babu

Text Analytics Market Insights: What's Working and What's Next
Seth Grimes

Invited Lecture on Interactive Information Retrieval
DavidMaxwell77

Dealing with incomplete data for mapping and spatial analysis
Aileen Buckley

Nuts and bolts
NBER

Exploratory data analysis
Vishwas N

Similar to Automated Hypothesis Testing with Large Scale Scientific Workflows

GBS MSCBDA - Dissertation Guidelines.pdf
StanleyChivandire1

Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
alessio_ferrari

Statistics for Librarians: How to Use and Evaluate Statistical Evidence
John McDonald

Case Study Research in Software Engineering
alessio_ferrari

Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Sri Ambati

Data analytics in computer networking
Stenio Fernandes

Review of "Survey Research Methods & Design in Psychology"
James Neill

11 - qualitative research data analysis ( Dr. Abdullah Al-Beraidi - Dr. Ibrah...
Rasha

Research methodology (2)
Sandeep Soni Kanpur

Quantitative Research: Surveys and Experiments
Martin Kretzer

ATPI Dissertation Proposal Rubric 2013
Laura Pasquini

The Simulacrum, a Synthetic Cancer Dataset
CongChen35

grizzly - informal overview - pydata boston 2013
adrianheilbut

Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Paolo Missier

A step-by-step guide for conducting statistical data analysis
Phd Assistance

Panda Provenance
Vlad Vega

Rearch methodology
Yedu Dharan

Chapter 1: Introduction to Data Mining
Izwan Nizal Mohd Shaharanee

Quantitative Method
zahraa Aamir

Quantitative and Qualitative research-100120032723-phpapp01.pptx
KainatJameel

More from dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
dgarijo

FAIR Workflows: A step closer to the Scientific Paper of the Future
dgarijo

Towards Reusable Research Software
dgarijo

SOMEF: a metadata extraction framework from software documentation
dgarijo

A Template-Based Approach for Annotating Long-Tailed Datasets
dgarijo

OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
dgarijo

Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo

Scientific Software Registry Collaboration Workshop: From Software Metadata r...
dgarijo

WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
dgarijo

OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
dgarijo

Towards Human-Guided Machine Learning - IUI 2019
dgarijo

Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo

A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo

WIDOCO: A Wizard for Documenting Ontologies
dgarijo

Towards Automating Data Narratives
dgarijo

OntoSoft: A Distributed Semantic Registry for Scientific Software
dgarijo

OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
dgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
dgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
dgarijo
 

More from dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 

Recently uploaded

Odoo 17 Events - Attendees List Scanning
Odoo 17 Events - Attendees List ScanningOdoo 17 Events - Attendees List Scanning
Odoo 17 Events - Attendees List Scanning
Celine George
 
New Features in Odoo 17 Sign - Odoo 17 Slides
New Features in Odoo 17 Sign - Odoo 17 SlidesNew Features in Odoo 17 Sign - Odoo 17 Slides
New Features in Odoo 17 Sign - Odoo 17 Slides
Celine George
 
What is Rescue Session in Odoo 17 POS - Odoo 17 Slides
What is Rescue Session in Odoo 17 POS - Odoo 17 SlidesWhat is Rescue Session in Odoo 17 POS - Odoo 17 Slides
What is Rescue Session in Odoo 17 POS - Odoo 17 Slides
Celine George
 
1. Importance_of_reducing_postharvest_loss.pptx
1. Importance_of_reducing_postharvest_loss.pptx1. Importance_of_reducing_postharvest_loss.pptx
1. Importance_of_reducing_postharvest_loss.pptx
UmeshTimilsina1
 
Node JS Interview Question PDF By ScholarHat
Node JS Interview Question PDF By ScholarHatNode JS Interview Question PDF By ScholarHat
Node JS Interview Question PDF By ScholarHat
Scholarhat
 
A beginner’s guide to project reviews - everything you wanted to know but wer...
A beginner’s guide to project reviews - everything you wanted to know but wer...A beginner’s guide to project reviews - everything you wanted to know but wer...
A beginner’s guide to project reviews - everything you wanted to know but wer...
Association for Project Management
 
Imagination in Computer Science Research
Imagination in Computer Science ResearchImagination in Computer Science Research
Imagination in Computer Science Research
Abhik Roychoudhury
 
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
C Interview Questions PDF By Scholarhat.pdf
C Interview Questions PDF By Scholarhat.pdfC Interview Questions PDF By Scholarhat.pdf
C Interview Questions PDF By Scholarhat.pdf
Scholarhat
 
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
Codeavour International
 
View Inheritance in Odoo 17 - Odoo 17 Slides
View Inheritance in Odoo 17 - Odoo 17  SlidesView Inheritance in Odoo 17 - Odoo 17  Slides
View Inheritance in Odoo 17 - Odoo 17 Slides
Celine George
 
How to Manage Early Receipt Printing in Odoo 17 POS
How to Manage Early Receipt Printing in Odoo 17 POSHow to Manage Early Receipt Printing in Odoo 17 POS
How to Manage Early Receipt Printing in Odoo 17 POS
Celine George
 
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
Nguyen Thanh Tu Collection
 
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
mansk2
 
Introduction to Banking System in India.ppt
Introduction to Banking System in India.pptIntroduction to Banking System in India.ppt
Introduction to Banking System in India.ppt
Dr. S. Bulomine Regi
 
RDBMS Lecture Notes Unit4 chapter12 VIEW
RDBMS Lecture Notes Unit4 chapter12 VIEWRDBMS Lecture Notes Unit4 chapter12 VIEW
RDBMS Lecture Notes Unit4 chapter12 VIEW
Murugan Solaiyappan
 
Introduction to Google Productivity Tools for Office and Personal Use
Introduction to Google Productivity Tools for Office and Personal UseIntroduction to Google Productivity Tools for Office and Personal Use
Introduction to Google Productivity Tools for Office and Personal Use
Excellence Foundation for South Sudan
 
QCE – Unpacking the syllabus Implications for Senior School practices and ass...
QCE – Unpacking the syllabus Implications for Senior School practices and ass...QCE – Unpacking the syllabus Implications for Senior School practices and ass...
QCE – Unpacking the syllabus Implications for Senior School practices and ass...
mansk2
 
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
Cátedra Banco Santander
 
Allopathic M1 Srudent Orientation Powerpoint
Allopathic M1 Srudent Orientation PowerpointAllopathic M1 Srudent Orientation Powerpoint
Allopathic M1 Srudent Orientation Powerpoint
Julie Sarpy
 

Recently uploaded (20)

Odoo 17 Events - Attendees List Scanning
Odoo 17 Events - Attendees List ScanningOdoo 17 Events - Attendees List Scanning
Odoo 17 Events - Attendees List Scanning
 
New Features in Odoo 17 Sign - Odoo 17 Slides
New Features in Odoo 17 Sign - Odoo 17 SlidesNew Features in Odoo 17 Sign - Odoo 17 Slides
New Features in Odoo 17 Sign - Odoo 17 Slides
 
What is Rescue Session in Odoo 17 POS - Odoo 17 Slides
What is Rescue Session in Odoo 17 POS - Odoo 17 SlidesWhat is Rescue Session in Odoo 17 POS - Odoo 17 Slides
What is Rescue Session in Odoo 17 POS - Odoo 17 Slides
 
1. Importance_of_reducing_postharvest_loss.pptx
1. Importance_of_reducing_postharvest_loss.pptx1. Importance_of_reducing_postharvest_loss.pptx
1. Importance_of_reducing_postharvest_loss.pptx
 
Node JS Interview Question PDF By ScholarHat
Node JS Interview Question PDF By ScholarHatNode JS Interview Question PDF By ScholarHat
Node JS Interview Question PDF By ScholarHat
 
A beginner’s guide to project reviews - everything you wanted to know but wer...
A beginner’s guide to project reviews - everything you wanted to know but wer...A beginner’s guide to project reviews - everything you wanted to know but wer...
A beginner’s guide to project reviews - everything you wanted to know but wer...
 
Imagination in Computer Science Research
Imagination in Computer Science ResearchImagination in Computer Science Research
Imagination in Computer Science Research
 
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
BỘ ĐỀ THI HỌC SINH GIỎI CÁC TỈNH MÔN TIẾNG ANH LỚP 9 NĂM HỌC 2023-2024 (CÓ FI...
 
C Interview Questions PDF By Scholarhat.pdf
C Interview Questions PDF By Scholarhat.pdfC Interview Questions PDF By Scholarhat.pdf
C Interview Questions PDF By Scholarhat.pdf
 
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
Codeavour 5.0 International Impact Report - The Biggest International AI, Cod...
 
View Inheritance in Odoo 17 - Odoo 17 Slides
View Inheritance in Odoo 17 - Odoo 17  SlidesView Inheritance in Odoo 17 - Odoo 17  Slides
View Inheritance in Odoo 17 - Odoo 17 Slides
 
How to Manage Early Receipt Printing in Odoo 17 POS
How to Manage Early Receipt Printing in Odoo 17 POSHow to Manage Early Receipt Printing in Odoo 17 POS
How to Manage Early Receipt Printing in Odoo 17 POS
 
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH LỚP 12 - GLOBAL SUCCESS - FORM MỚI 2025 - ...
 
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
11EHS Term 3 Week 1 Unit 1 Review: Feedback and improvementpptx
 
Introduction to Banking System in India.ppt
Introduction to Banking System in India.pptIntroduction to Banking System in India.ppt
Introduction to Banking System in India.ppt
 
RDBMS Lecture Notes Unit4 chapter12 VIEW
RDBMS Lecture Notes Unit4 chapter12 VIEWRDBMS Lecture Notes Unit4 chapter12 VIEW
RDBMS Lecture Notes Unit4 chapter12 VIEW
 
Introduction to Google Productivity Tools for Office and Personal Use
Introduction to Google Productivity Tools for Office and Personal UseIntroduction to Google Productivity Tools for Office and Personal Use
Introduction to Google Productivity Tools for Office and Personal Use
 
QCE – Unpacking the syllabus Implications for Senior School practices and ass...
QCE – Unpacking the syllabus Implications for Senior School practices and ass...QCE – Unpacking the syllabus Implications for Senior School practices and ass...
QCE – Unpacking the syllabus Implications for Senior School practices and ass...
 
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
Cómo crear video-tutoriales con ScreenPal (2 de julio de 2024)
 
Allopathic M1 Srudent Orientation Powerpoint
Allopathic M1 Srudent Orientation PowerpointAllopathic M1 Srudent Orientation Powerpoint
Allopathic M1 Srudent Orientation Powerpoint
 

Automated Hypothesis Testing with Large Scale Scientific Workflows

  • 1. Automated Hypothesis Testing with Large Scale Scientific Workflows. Yolanda Gil, Daniel Garijo, Rajiv Mayani, Varun Ratnakar (Information Sciences Institute & Department of Computer Science, University of Southern California, http://www.isi.edu); Parag Mallick, Ravali Adusumilli, Hunter Boyce (Stanford School of Medicine, Canary Center for Early Cancer Detection, Stanford University, http://mallicklab.stanford.edu). http://www.disk-project.org
  • 2. Talk Outline ๏ Motivation ๏ Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 3. Scientific Data Analysis Today: Inefficient, Incomplete, Irreproducible ๏ Data analysis is time-consuming ๏ Not systematic ๏ Not updated when new data or methods become available ๏ Hard or impractical to reproduce prior work ๏ The overall process is done manually: inefficient and error-prone ๏ Analytic knowledge is compartmentalised [Figure: New hypothesis → Formulate line of inquiry (data + method) → Retrieve data → Run workflows (methods) → Meta-analysis of results]
  • 4. Our Focus: Cancer Multi-Omics ๏ Data Availability and Complexity: • The multi-omic domain is filled with multiple levels of heterogeneous data that are regularly expanding in volume and complexity through projects like The Cancer Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  • 5. Our Focus: Cancer Multi-Omics ๏ Analytic Complexity: • Multi-omic analysis requires the use of dozens of interconnected tools, each of which may require substantial domain knowledge. [Figure: example tools, including MAQ, BWA, BWA-SW (SE only), PERM, SOAPv2, MOSAIK, NOVOALIGN, SAMTOOLS, PICARD, GATK, IGVtools] Domain knowledge is isolated
  • 6. Our Focus: Cancer Multi-Omics ๏ Multiple types and complexities of hypotheses: • Hypotheses span the range from single-gene/single-dataset to multi-gene/multi-ome/multi-dataset • Is this protein found in this sample? • Is this gene found in this sample? • Is this protein associated with a certain cancer? • Which proteins are associated with a certain cancer? • ...
  • 7. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 8. Our Approach: Hypothesis-Driven Discovery ๏ Represent scientists' hypotheses ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued by data analysis workflows ๏ Design a meta-analysis that examines the results of lines of inquiry and either validates or revises the original hypotheses ๏ Develop an intelligent agent that can report and explain new findings to the scientist [Figure: components Hypothesis; Lines of Inquiry (specify relevant analytic methods (workflows), type of data needed, and how to combine results); Query to retrieve data; Data analysis workflows; Workflow bindings; Meta-workflows (confidence estimation, benchmarking); Revised hypothesis & interesting findings]
  • 9. Representing Hypotheses [Figure: the approach diagram from slide 8]
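The approach on slide 8 can be read as a loop over lines of inquiry: match the hypothesis, retrieve data, run workflows, then let a meta-workflow turn the results into a confidence for a revised hypothesis. The sketch below is a minimal, hypothetical Python model of that loop, not DISK's implementation; the classes `Hypothesis` and `LineOfInquiry`, the function `run_lines_of_inquiry`, the file names, and the toy workflows are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


# Hypothetical data model; names are illustrative, not DISK's actual API.
@dataclass
class Hypothesis:
    subject: str
    predicate: str
    obj: str
    confidence: Optional[float] = None
    revision_of: Optional["Hypothesis"] = None


@dataclass
class LineOfInquiry:
    matches: Callable[[Hypothesis], bool]              # hypothesis pattern
    query_data: Callable[[Hypothesis], List[str]]      # data query pattern
    workflows: List[Callable[[List[str]], Set[str]]]   # analytic workflows
    meta_workflow: Callable[[Hypothesis, List[Set[str]]], float]  # returns a confidence


def run_lines_of_inquiry(h: Hypothesis, lois: List[LineOfInquiry]) -> List[Hypothesis]:
    """Run every matching line of inquiry; return one revised hypothesis per line."""
    revised = []
    for loi in lois:
        if not loi.matches(h):
            continue                                   # this line of inquiry does not apply
        data = loi.query_data(h)                       # retrieve data
        results = [wf(data) for wf in loi.workflows]   # run analytic workflows
        confidence = loi.meta_workflow(h, results)     # meta-analysis of results
        revised.append(Hypothesis(h.subject, h.predicate, h.obj, confidence, revision_of=h))
    return revised


# Toy usage: two stand-in "workflows" return sets of detected proteins.
h = Hypothesis("PRKCDBP", "expressedIn", "TCGA-AA-3561-01A-22")
loi = LineOfInquiry(
    matches=lambda hyp: hyp.predicate == "expressedIn",
    query_data=lambda hyp: ["mass_spec_file", "rnaseq_file"],  # hypothetical file names
    workflows=[lambda d: {"PRKCDBP"}, lambda d: {"PRKCDBP", "TP53"}],
    # Placeholder meta-workflow: fraction of workflows that detected the protein.
    meta_workflow=lambda hyp, rs: sum(hyp.subject in r for r in rs) / len(rs),
)
revised = run_lines_of_inquiry(h, [loi])
```

Keeping each stage as a pluggable callable mirrors the separation on the slide: the same loop can run different workflows or meta-workflows without changing the hypothesis representation.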
  • 10. Representing Hypotheses: Requirements from Omics ๏ Graph-based hypothesis representation • Entities are nodes • Relationships are links ๏ Annotations on graphs • Represent qualifications of hypotheses: confidence and evidence ๏ Representing hypothesis evolution • Graph versioning. Graph representation in RDF: ๏ Standard Semantic Web language ๏ Scalable reasoners available ๏ Qualifications and provenance through triple reification ๏ Versioning through multiple named graphs
  • 12. Lifecycle of a hypothesis [Figure: two hypothesis graphs built from terms in a biology ontology (bio:), a hypothesis ontology (hyp:), and user data definitions (user:). Graph Hy1: bio:PRKCDBP hyp:expressedIn user:TCGA-AA-3561-01A-22. Graph Hy2: bio:PRKCDBP hyp:associatedWith bio:ColonCancer]
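The two RDF choices on slide 10, named graphs for hypothesis versions and reification for qualifications, can be illustrated without an RDF library. The sketch below is a toy, standard-library stand-in: `QuadStore`, the graph names, and the `hyp:`/`user:` prefixes are assumptions, while `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` are the standard RDF reification vocabulary. The statement naming (`Hy1'-S1`) follows the later walkthrough slides; a real system would use an RDF store queried with SPARQL.

```python
from collections import defaultdict
from itertools import count
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, object]


class QuadStore:
    """Toy stand-in for an RDF dataset: a set of triples per named graph."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = defaultdict(set)
        self._counters = defaultdict(lambda: count(1))  # per-graph statement numbering

    def add(self, graph: str, s: str, p: str, o: object) -> None:
        self.graphs[graph].add((s, p, o))

    def reify(self, graph: str, triple: Triple, target: str, **qualifiers: object) -> str:
        """Describe `triple` (from `graph`) as a statement resource in `target`, then qualify it."""
        s, p, o = triple
        stmt = f"{graph}-S{next(self._counters[graph])}"
        self.add(target, stmt, "rdf:type", "rdf:Statement")
        self.add(target, stmt, "rdf:subject", s)
        self.add(target, stmt, "rdf:predicate", p)
        self.add(target, stmt, "rdf:object", o)
        for name, value in qualifiers.items():
            self.add(target, stmt, f"hyp:{name}", value)  # e.g. hyp:hasConfidenceValue
        return stmt


store = QuadStore()
claim = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
store.add("Hy1", *claim)    # original hypothesis graph
store.add("Hy1'", *claim)   # revised version lives in its own named graph
store.add("provenance", "Hy1'", "hyp:revisionOf", "Hy1")
stmt = store.reify("Hy1'", claim, "qualifications", hasConfidenceValue=0.0)
```

Because each version is a separate named graph, earlier versions stay untouched, and the qualifications attach to the reified statement rather than to the bare triple.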
  • 13. 1. Initial Hypothesis, Data & Workflows [Figure: Hypothesis statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Data available: patient TCGA-AA-3561; sample TCGA-AA-3561-01A-22 (collectedFrom the patient); experiment AA_3561_EX2 (experimentedOn the sample), which producedData XX_3561Proteome_VU.zip (MassSpecData). Workflows available: Proteomics, Proteogenomics]
  • 14. 2. Running Workflows on Data [Figure: same hypothesis Hy1, data, and workflows as in step 1, plus workflow execution W1, linked to its workflow template (hasWorkflowTemplate) and to the data it used (used)]
  • 15. 3. Meta-Reasoning about Workflow Results [Figure: meta-workflow execution MW1 uses the results of workflow execution W1 and produces the revised hypothesis statement Hy1' (revisionOf Hy1): PRKCDBP expressedIn TCGA-AA-3561-01A-22. Qualifications of Hy1': statement Hy1'-S1 hasConfidenceValue 0. Provenance of Hy1' is attached through hasProvenance]
  • 16. 4. New Data Becomes Available [Figure: in addition to experiment AA_3561_EX2 and its mass spec data XX_3561Proteome_VU.zip, a new experiment AA_3561_EX1 on sample TCGA-AA-3561-01A-22 producedData XX_3561_DD.zip (RNASeqData). Hypothesis statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22. Workflows available: Proteomics, Proteogenomics]
  • 17. 5. New Multi-Workflows are also run [Figure: workflow executions W1 and W2 use the available mass spec and RNA-Seq data; hypothesis statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22]
  • 18. 6. Hypothesis Revision [Figure: meta-workflow execution MW2 uses the results of workflow executions W1 and W2 and produces the revised hypothesis statement Ha1' (revisionOf Ha1): PRKCDBP Mutated expressedIn TCGA-AA-3561-01A-22. Qualifications of Ha1': statement Ha1'-S1 hasConfidenceValue 0.98. Provenance of Ha1' is attached through hasProvenance]
  • 19. Representing Lines of Inquiry & Data Analysis Workflows [Figure: the approach diagram from slide 8]
  • 20. Lines of Inquiry ๏ Capture how to set up potential analyses that can be pursued to test a certain type of hypothesis [Figure: a line of inquiry combines • a hypothesis pattern: bio:Protein ?p hyp:expressedIn bio:Sample ?s • a data query pattern: an Experiment ?e experimentedOn Sample ?s (collected from a Patient) and producedData DataFile ?d • data analytic workflows (Proteomics, Proteogenomics) that take DataFile ?d as input • meta-workflows: Comparison, Confidence estimation, Benchmarking]
  • 21. Example Multi-omics Workflow (Zhang et al. replication)
  • 22. Automated Workflow Generation in WINGS by Reasoning about Semantic Constraints ๏ Example constraint: all input data must be from the human species, i.e., must have HS in their metadata ๏ The workflow system uses this constraint to select only datasets that have HS in their metadata, so that the inputs are valid
  • 23. Representing Hypotheses [Figure: the approach diagram from slide 8]
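A line of inquiry binds the variables of its hypothesis pattern (`?p`, `?s`) and reuses them in its data query pattern to find `?d`. The sketch below shows that binding step as naive basic-graph-pattern matching over tuples. The triple directions for `experimentedOn` and `producedData` and the toy repository contents are assumptions modeled on the walkthrough slides; a real system would issue a SPARQL query against its data repository instead.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Extend `bindings` so that `pattern` matches `triple`; return None on conflict."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None          # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None              # constant term does not match
    return result


def query(patterns: Sequence[Triple], triples: Sequence[Triple], bindings: Bindings) -> List[Bindings]:
    """All solutions satisfying every pattern (a naive join, like a SPARQL basic graph pattern)."""
    solutions = [dict(bindings)]
    for pattern in patterns:
        extended = []
        for b in solutions:
            for t in triples:
                m = match_triple(pattern, t, b)
                if m is not None:
                    extended.append(m)
        solutions = extended
    return solutions


# Hypothesis pattern from the line of inquiry, matched against a concrete hypothesis.
hypothesis = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
bindings = match_triple(("?p", "hyp:expressedIn", "?s"), hypothesis, {})

# Data query pattern reusing ?s; toy repository modeled on the walkthrough slides.
data_pattern = [("?e", "experimentedOn", "?s"), ("?e", "producedData", "?d")]
repository = [
    ("AA_3561_EX1", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
    ("AA_3561_EX1", "producedData", "XX_3561_DD.zip"),
    ("AA_3561_EX2", "experimentedOn", "user:TCGA-AA-3561-01A-22"),
    ("AA_3561_EX2", "producedData", "XX_3561Proteome_VU.zip"),
]
files = [b["?d"] for b in query(data_pattern, repository, bindings)]
```

The files found this way (one RNA-Seq file and one mass spec file here) would then be bound as inputs to the line of inquiry's workflows.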
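WINGS reasons over semantic constraints propagated through a whole workflow; the fragment below shows only the simplest visible effect of the slide's example constraint, which is filtering candidate datasets by a metadata property. The `DataFile` class, the `species` key, the `HS`/`MM` value encoding, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def satisfies(data: DataFile, constraints: Dict[str, str]) -> bool:
    """True if every constrained metadata property has the required value."""
    return all(data.metadata.get(key) == value for key, value in constraints.items())


def select_valid_inputs(candidates: List[DataFile], constraints: Dict[str, str]) -> List[DataFile]:
    """Keep only candidates that satisfy the workflow's input constraints."""
    return [d for d in candidates if satisfies(d, constraints)]


human_only = {"species": "HS"}    # "all input data must be from human species"
candidates = [
    DataFile("sample_a", {"species": "HS"}),
    DataFile("sample_b", {"species": "MM"}),   # another species: rejected
    DataFile("sample_c"),                      # no species metadata: rejected
]
valid = select_valid_inputs(candidates, human_only)
```

Treating missing metadata as a violation is a conservative choice; a system could instead flag such datasets for the user to review.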
  • 24. Meta-workflows: 1) Comparison Meta-Workflows [Figure: variant detection produces a custom protein DB; protein identification is run against the custom DB and against a reference DB, each producing protein IDs; a comparison meta-workflow turns the two sets of protein IDs into a similarity score, which is data dependent: • peptide level • protein level • scan level] ๏ Goals: • Compare results amongst multiple workflows • Measure the global similarity amongst multiple workflows • Provide users with an explanation of workflow-dependent differences in results
  • 25. Meta-workflows: 2) Benchmark Meta-Workflows ๏ Goals: • Evaluation of workflow performance • Training of confidence estimation models (probabilistic) [Figure: a benchmark meta-workflow reports ROC and true/false positive rates, which feed probabilistic models]
  • 26. Meta-workflows: 3) Confidence Estimation Meta-Workflows ๏ Goals: • Combine results from multiple workflows • Estimate the confidence of the workflow result • Use the estimated confidence to update the hypothesis [Figure: protein IDs from protein identification against a custom DB and a reference DB are fed to a probabilistic model, provided by the benchmark meta-workflow, to estimate confidence and update the hypothesis]
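The slide does not say how the similarity score is computed. As one plausible, assumed choice, the sketch below uses Jaccard similarity between the sets of protein IDs reported by each workflow, averages pairwise scores into a global similarity, and lists the IDs unique to each workflow as a starting point for explaining differences. The workflow names and protein IDs are hypothetical; the same code would apply at the peptide or scan level by comparing sets of peptide or scan identifiers.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of intersection over size of union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_workflows(results: Dict[str, Set[str]]):
    """Pairwise and global (mean pairwise) similarity, plus IDs unique to each workflow."""
    pairwise: Dict[Tuple[str, str], float] = {
        (x, y): jaccard(results[x], results[y]) for x, y in combinations(sorted(results), 2)
    }
    global_similarity = sum(pairwise.values()) / len(pairwise) if pairwise else 1.0
    unique = {
        name: ids - set().union(*(other for n, other in results.items() if n != name))
        for name, ids in results.items()
    }
    return pairwise, global_similarity, unique


results = {
    "custom_db": {"P1", "P2", "P3"},      # protein IDs found with the custom protein DB
    "reference_db": {"P1", "P2", "P4"},   # protein IDs found with the reference DB
}
pairwise, global_similarity, unique = compare_workflows(results)
```

Here the two workflows share two of four distinct IDs, and the unique sets point at the proteins whose identification depends on the choice of database.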
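Benchmarking needs a ground truth. Assuming a benchmark set where the truly present proteins are known and a workflow assigns a score to every candidate, the sketch below computes true and false positive rates, sweeps a threshold to obtain ROC points, and integrates them with the trapezoid rule. The scores and protein IDs are made up for illustration, and the functions are hypothetical helpers rather than part of DISK or WINGS.

```python
from typing import Dict, List, Set, Tuple


def rates(predicted: Set[str], truth: Set[str], universe: Set[str]) -> Tuple[float, float]:
    """(true positive rate, false positive rate) of `predicted` against `truth`."""
    negatives = universe - truth
    tpr = len(predicted & truth) / len(truth) if truth else 0.0
    fpr = len(predicted & negatives) / len(negatives) if negatives else 0.0
    return tpr, fpr


def roc_points(scores: Dict[str, float], truth: Set[str]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the score threshold step by step."""
    universe = set(scores)
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores.values()), reverse=True):
        predicted = {k for k, s in scores.items() if s >= threshold}
        tpr, fpr = rates(predicted, truth, universe)
        points.append((fpr, tpr))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.2}   # hypothetical workflow scores
truth = {"P1", "P3"}                                    # proteins known to be present
points = roc_points(scores, truth)
```

The per-workflow true and false positive rates measured this way are the kind of input a probabilistic confidence model can be trained on.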
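The slides do not specify the probabilistic model. One simple, assumed option is a naive Bayes combination: treat each workflow as an independent detector whose true and false positive rates come from benchmarking, and update the prior odds with each workflow's likelihood ratio. The workflow names and rates below are invented, and the independence assumption is strong, since workflows that share input data are likely correlated.

```python
import math
from typing import Dict, Tuple


def combine_confidence(
    detections: Dict[str, bool],
    performance: Dict[str, Tuple[float, float]],
    prior: float = 0.5,
) -> float:
    """Posterior P(hypothesis true | workflow outcomes), assuming independent workflows.

    performance maps workflow name -> (TPR, FPR); both must lie strictly in (0, 1).
    """
    log_odds = math.log(prior / (1 - prior))
    for workflow, detected in detections.items():
        tpr, fpr = performance[workflow]
        if not (0 < tpr < 1 and 0 < fpr < 1):
            raise ValueError(f"rates for {workflow} must be strictly between 0 and 1")
        if detected:
            log_odds += math.log(tpr / fpr)                 # evidence for the hypothesis
        else:
            log_odds += math.log((1 - tpr) / (1 - fpr))     # evidence against it
    return 1 / (1 + math.exp(-log_odds))


performance = {"proteome_only": (0.8, 0.1), "proteogenomic": (0.9, 0.05)}  # hypothetical
confidence = combine_confidence({"proteome_only": True, "proteogenomic": True}, performance)
```

With these made-up rates, both detections multiply the even prior odds by 8 and 18, giving odds of 144 and a confidence of 144/145. Working in log-odds keeps the update additive and avoids underflow when many workflows are combined.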
  • 27. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 28. DISK Walkthrough: Initial Hypothesis ๏ The initial hypothesis is provided by the user: • PRKCDBP protein is expressed in a patient sample
  • 29. DISK Walkthrough: Lines of Inquiry ๏ The line of inquiry suggests finding data from different experiments done with the patient’s sample, then running multi-omic workflows, and then combining the evidence into a confidence score • General hypothesis pattern • Data query pattern: search for different experiments that produced omics data (e.g., of type RNASeq and MassSpecData) • Data analysis workflows to run on genomics and proteomics data (more omics in the future) • Meta-workflows to assess confidence in the hypothesis based on workflow results
  • 30. DISK Walkthrough: Data & Workflows To test a hypothesis that a protein is present in a patient’s sample: ๏ Retrieve mass spec and RNASeq data ๏ Use workflows • Wf1: Proteome only • Wf2: ProteoGenomic
  • 31. DISK Walkthrough: Meta-Workflows ๏ After the workflows run, meta-workflows analyse the results and generate a confidence value
  • 32. DISK Walkthrough: Revised Hypothesis ๏ The hypothesis is revised and given a confidence value: • A mutated form of the protein PRKCDBP is expressed in the patient’s sample TCGA-AA-3561-01A-22, with confidence 0.9887
  • 33. DISK Walkthrough: Provenance Details ๏ Hypothesis provenance stores information about the workflows run and the data used • Workflow execution provenance is published by WINGS using the W3C PROV standard
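Because workflow provenance is expressed with W3C PROV, the data behind a revised hypothesis can be found by following `prov:used`, `prov:wasGeneratedBy`, and `prov:wasDerivedFrom` links upstream. The records below are a hypothetical toy: the output-entity names (`W1_output`, `W2_output`) and which data each workflow used are assumptions modeled on the walkthrough (Wf1 proteome only, Wf2 proteogenomic), and `lineage` is an illustrative helper, not a WINGS function.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

UPSTREAM = {"prov:used", "prov:wasGeneratedBy", "prov:wasDerivedFrom"}


def lineage(entity: str, records: List[Tuple[str, str, str]]) -> Set[str]:
    """All activities and entities reachable upstream from `entity` (breadth-first)."""
    parents: Dict[str, List[str]] = {}
    for s, p, o in records:
        if p in UPSTREAM:
            parents.setdefault(s, []).append(o)
    seen: Set[str] = set()
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


records = [
    ("Ha1'", "prov:wasGeneratedBy", "MW2"),
    ("MW2", "prov:used", "W1_output"),
    ("MW2", "prov:used", "W2_output"),
    ("W1_output", "prov:wasGeneratedBy", "W1"),
    ("W2_output", "prov:wasGeneratedBy", "W2"),
    ("W1", "prov:used", "XX_3561Proteome_VU.zip"),
    ("W2", "prov:used", "XX_3561Proteome_VU.zip"),
    ("W2", "prov:used", "XX_3561_DD.zip"),
]
upstream = lineage("Ha1'", records)
```

The `seen` set makes the traversal safe when several activities share an input, as the mass spec file does here.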
  • 34. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 35. DISK: Automated DIscovery of Scientific Knowledge [Figure: DISK architecture with components Hypotheses; Formulate Lines of Inquiry; Lines of Inquiry; Data Repository; Data Retrieval; Workflow Binding; Analytic Workflows; WINGS Intelligent Workflow System (Workflow Constraints, Workflow Reasoning, Workflow Provenance); Meta-Workflows (Confidence Estimation, Benchmarking); Meta-Analysis of Results; Hypothesis Evaluation; Interactive Discovery Agent; Open Publication of Results as Linked Data; Revised hypotheses & interesting findings]
  • 36. Our Initial Focus: Reproduce Seminal Omics Analysis [Zhang et al 2014]
  • 37. Impact on Cancer Multi-Omics ๏ Replicated the [Zhang et al 2014] proteogenomic analysis of colorectal cancer ๏ Successfully reproduced the paper's findings, comparing results at multiple levels (final figure, supplementary tables, etc.) ๏ Replication took months and required direct conversations with the authors to reproduce the paper's figures and supplementary figures ๏ Applying the analysis approach to a new cancer type now takes minutes • Useful when TCGA is integrated ๏ Expanded the analysis to • compare how sensitive findings were to workflow details [Figure: density plots of Spearman correlation: correlation between mRNA and protein abundance (within samples); correlation between mRNA and protein variation (across samples)]
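The plots on slide 37 summarize Spearman correlations between mRNA and protein measurements. As a reminder of what that statistic computes, here is a dependency-free sketch of Spearman's rho (the Pearson correlation of tie-averaged ranks); the values are hypothetical and this is not the replication pipeline itself. Applied per sample over genes it yields a within-sample correlation; applied per gene over samples, an across-sample one.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)   # undefined (division by zero) if either input is constant


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical abundances for five genes in one sample (a within-sample correlation).
mrna = [5.1, 2.3, 8.7, 2.3, 6.0]
protein = [4.0, 1.5, 9.2, 2.0, 3.9]
rho = spearman(mrna, protein)
```

Rank-based correlation is less sensitive than Pearson correlation to the very different dynamic ranges of transcript and protein measurements, since only the ordering of values matters.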
  • 38. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 39. Related Work: 1) Discovery Systems ๏ [Lenat 1976] ๏ [Lindsay et al 1980] ๏ [Langley 1981] ๏ [Falkenhainer 1985] ๏ [Kulkarni and Simon 1988] ๏ [Cheeseman et al 1989] ๏ [Zytkow et al 1990] ๏ [Simon 1996] ๏ [Valdes-Perez 1997] ๏ [Todorovski et al 2000] ๏ [Schmidt and Lipson 2009]
  • 40. Related Work: 2) Hypothesis Representation as Graphs ๏ Existing vocabularies are related but need to be extended to represent hypotheses in DISK • SWAN [Gao et al 2006] • EXPO [Soldatova and King 2006] • Nanopublications [Groth et al 2010] • Ovopublications [Callahan and Dumontier 2013] • Micropublications [Clark et al 2014] • LSC • BEL
  • 41. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 42. Contributions ๏ Represent scientists' hypotheses • Hypothesis ontology includes revisions & provenance ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued with a data analysis workflow • Lines of inquiry outline what type of data and workflows to use, and customize them to the hypotheses at hand ๏ Design a meta-analysis to assess the results of lines of inquiry and revise the original hypotheses • Meta-analysis workflows assess diverse evidence
  • 43. Ongoing & Future Work ๏ Ongoing work: • Interactive Discovery Agent that explains interesting findings • Continuous analysis of data (TCGA/CPTAC) as it grows • Extending and generalizing meta-workflows • Using DISK in geosciences: Subsurface water resource modeling ๏ Future challenges: • More complex hypotheses about several entities • Incorporate evidence over time • Designing domain-independent meta-workflows • Resource-bound hypothesis exploration