Towards Human-Guided Machine Learning - IUI 2019

Towards Human-Guided Machine Learning
Yolanda Gil¹, James Honaker², Shikhar Gupta¹, Yibo Ma¹, Vito D’Orazio³,
Daniel Garijo¹, Shruti Gadewar¹, Qifan Yang¹ and Neda Jahanshad¹
¹University of Southern California
²University of Texas at Dallas
³Harvard University
https://w3id.org/people/dgarijo
@dgarijov
dgarijo@isi.edu
Intelligent User Interfaces (IUI19), March 18th, 2019
Information Sciences Institute
Rising Popularity of AutoML Systems
[Logos: auto-sklearn, Auto-WEKA, AlphaZero]
Anatomy of an AutoML System
[Diagram: Training data -> AutoML -> Trained Model; Test data -> Trained Model -> Predictions]
An AutoML system trains an ML algorithm and automates one or more of the following steps:
• Feature extraction from data
• Data preparation (imputation, encoding, etc.)
• Feature selection
• Hyperparameter optimization
• Ensembling of solutions
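A minimal sketch of the search loop described above, assuming a toy search space (the preparation steps, model names and hyperparameter values are hypothetical) and a user-supplied `evaluate` function standing in for cross-validated scoring. It enumerates candidate pipelines, scores each one and returns the top-ranked ones; it is not the implementation of any particular AutoML system.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: preparation steps x model types x hyperparameter settings.
PREPARATION = ["mean_imputation", "median_imputation"]
MODELS: Dict[str, List[Dict[str, float]]] = {
    "LogisticRegression": [{"C": 0.1}, {"C": 1.0}],
    "DecisionTree": [{"max_depth": 3}, {"max_depth": 5}],
}

Pipeline = Tuple[str, str, Dict[str, float]]


def enumerate_pipelines() -> List[Pipeline]:
    """Every combination of preparation step, model type and hyperparameter setting."""
    return [
        (prep, model, params)
        for prep, (model, grid) in product(PREPARATION, MODELS.items())
        for params in grid
    ]


def automl_search(
    evaluate: Callable[[Pipeline], float], top_k: int = 3
) -> List[Tuple[float, Pipeline]]:
    """Score every candidate pipeline and return the top_k best (highest score first)."""
    scored = [(evaluate(p), p) for p in enumerate_pipelines()]
    # Sort on the score only, so the pipelines themselves never need to be comparable.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def toy_evaluate(pipeline: Pipeline) -> float:
    """Deterministic stand-in for cross-validated accuracy (hypothetical numbers)."""
    prep, model, params = pipeline
    score = 0.80 if model == "LogisticRegression" else 0.75
    score += 0.02 if prep == "median_imputation" else 0.0
    score += 0.01 * params.get("C", 0.0) + 0.001 * params.get("max_depth", 0.0)
    return round(score, 4)


if __name__ == "__main__":
    for score, pipeline in automl_search(toy_evaluate):
        print(score, pipeline)
```

Sorting on the score alone keeps the loop independent of how a pipeline is represented; a real system would replace `toy_evaluate` with actual training and validation.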
Limitations of AutoML systems
• The training process is not transparent
• Trained models are difficult to customize
[Diagram: Training data -> AutoML -> Trained Model; Test data -> Trained Model -> Predictions]
Human-Guided Machine Learning (HGML)
[Diagram: a domain expert guides the AutoML system through an interface; Training data -> AutoML -> Trained Model; Test data -> Trained Model -> Predictions]
• Domain users distrust black boxes
• They need to understand the training process and modify it using their expertise, for example to:
  • Modify features (e.g., remove known biases)
  • Guide the hyperparameter search
  • …
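Guiding the hyperparameter search and modifying features can be pictured as constraints an expert places on the search space before it is expanded. The sketch below assumes a hypothetical grid for a linear model and a hypothetical proxy variable (`zip_code`) that the expert knows to be biased; it only illustrates how such guidance shrinks the search, not how any specific system applies it.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Set

Space = Dict[str, List[Any]]


def expand_grid(space: Space) -> List[Dict[str, Any]]:
    """Expand a hyperparameter space into every combination (plain grid search)."""
    names = sorted(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


def apply_expert_guidance(space: Space, constraints: Dict[str, Callable[[Any], bool]]) -> Space:
    """Keep only the values the expert allows; unconstrained parameters are left as-is."""
    guided: Space = {}
    for name, values in space.items():
        allowed = constraints.get(name)
        guided[name] = [v for v in values if allowed is None or allowed(v)]
    return guided


def drop_features(features: List[str], known_biased: Set[str]) -> List[str]:
    """Remove features the expert flags as carrying a known bias."""
    return [f for f in features if f not in known_biased]


# Hypothetical search space and expert constraints.
SPACE: Space = {"C": [0.01, 0.1, 1.0, 10.0, 100.0], "penalty": ["l1", "l2"]}
EXPERT = {"C": lambda c: 0.1 <= c <= 10.0, "penalty": lambda p: p == "l2"}

if __name__ == "__main__":
    print(len(expand_grid(SPACE)), "candidates without guidance")
    print(len(expand_grid(apply_expert_guidance(SPACE, EXPERT))), "candidates with guidance")
    print(drop_features(["age", "zip_code", "income"], {"zip_code"}))
```

Here the expert's two constraints cut the grid from 10 to 3 combinations, which is the kind of saving that makes expert input attractive when each evaluation means training a model.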
Contributions of our work
• An AutoML system and a user interface that support basic HGML interactions
• A task analysis of HGML that enumerates discrete user tasks to guide AutoML systems
• Characterizations of two significant studies, one in neuroscience and one in political science
• Requirements that HGML places on the AutoML system and the user interface
• An assessment of how those requirements could be accommodated by AutoML systems
AutoML System: P4ML
• Extracts features of interest from data (text, video, audio, …)
• Builds a solution specifying the type of model and the other steps to include (e.g., imputation, encoding)
• Performs a hyperparameter search to improve the results
• Generates ensembles from the top-ranked models
[Diagram: Training data, problem description and evaluation metric -> P4ML (Phased Performance-Based Pipeline Planner) -> top-ranked solutions -> predictions on test data]
Example top-ranked solutions (pipeline and score):
HashingVectorizer -> LabelEncoder -> LogisticRegressionCV (0.9489)
CountVectorizer -> LabelEncoder -> BernoulliNB (0.9486)
TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier (0.9460)
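The last step, generating ensembles from the top-ranked models, can be sketched as a majority vote over the k best-scoring models. This is an assumption for illustration: the slide does not say how P4ML combines models, and the threshold classifiers and scores below are hypothetical stand-ins for trained pipelines.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], str]


def top_k_models(ranked: List[Tuple[float, Model]], k: int) -> List[Model]:
    """Return the k models with the highest validation scores."""
    best = sorted(ranked, key=lambda item: item[0], reverse=True)[:k]
    return [model for _, model in best]


def majority_vote(models: List[Model], x: Sequence[float]) -> str:
    """Predict the label chosen by most models (use an odd k to avoid ties)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained pipelines, each paired with a validation score.
def model_a(x: Sequence[float]) -> str:
    return "spam" if x[0] > 0.5 else "ham"


def model_b(x: Sequence[float]) -> str:
    return "spam" if x[1] > 0.5 else "ham"


def model_c(x: Sequence[float]) -> str:
    return "spam" if x[0] + x[1] > 1.2 else "ham"


def model_d(x: Sequence[float]) -> str:
    return "ham"


RANKED = [(0.93, model_c), (0.95, model_a), (0.80, model_d), (0.94, model_b)]

if __name__ == "__main__":
    ensemble = top_k_models(RANKED, k=3)
    print(majority_vote(ensemble, (0.9, 0.6)))
    print(majority_vote(ensemble, (0.9, 0.2)))
```

Majority voting is only one option; weighting each vote by the model's validation score, or stacking, are common alternatives.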
UI for AutoML System Interaction: TwoRavens
• Statistical summaries and exploration of variables
• Integration with the AutoML system (P4ML):
  • Specify the ML problem of interest
  • Explore the solutions returned by the AutoML system
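The kind of per-variable statistical summary such a UI shows can be approximated with Python's standard `statistics` module. The sketch assumes each variable is a list of numbers with `None` marking missing values; the variable names and values are hypothetical, and TwoRavens' actual summaries may differ.

```python
import statistics
from typing import Dict, List, Optional


def summarize(values: List[Optional[float]]) -> Dict[str, float]:
    """Count, missing count and basic descriptive statistics for one numeric variable."""
    observed = [v for v in values if v is not None]
    summary: Dict[str, float] = {"n": len(values), "missing": len(values) - len(observed)}
    if observed:  # statistics.mean/median raise StatisticsError on empty data
        summary.update(
            mean=statistics.mean(observed),
            median=statistics.median(observed),
            stdev=statistics.stdev(observed) if len(observed) > 1 else 0.0,
            min=min(observed),
            max=max(observed),
        )
    return summary


# Hypothetical dataset: variable name -> column of values (None = missing).
DATA = {
    "age": [34, 51, None, 47, 62],
    "years_of_education": [12, 16, 16, None, None],
}

if __name__ == "__main__":
    for name, column in DATA.items():
        print(name, summarize(column))
```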
HGML Task Analysis
• Top-down analysis:
  • Data use: selection of variables (features) and instances
  • Model development: model selection and tuning
  • Model interpretation: result comparison
• Bottom-up analysis:
  • Neuroscience: the ENIGMA neuroscience consortium
  • Political science: a seminal paper on civil war onset
Overview of task analysis (top-down)
Overview of task analysis (bottom-up)
[Figure: bottom-up task analyses for the neuroscience and political science studies]
Main tasks identified:
• Feature selection and generation
• Model type selection
• Model configuration
• Quantities of interest and metrics
UI and AutoML Requirements
We combined the top-down and bottom-up analyses to identify requirements for both the AutoML system and the user interface.
Accommodating HGML requirements – AutoML system
[Diagram: Training data, problem description, evaluation metric and user requirements -> P4ML (Phased Performance-Based Pipeline Planner) -> top-ranked solutions; Test data -> predictions]
Requirements (example specification):
{
  "include_model":["LinearSVC","LogisticRegression","DecisionTreeClassifier"],
  "exclude_model":[],
  "include_feature_generation":["TfidfVectorizer"],
  "use_imputation_method":"median",
  "include_variables":[],
  "exclude_variables":[],
  "include_instances":[],
  "exclude_instances":[],
  "define_variable_weight":[{"variable":"","weight":null}],
  "select_training_and_test_data":{"training_data":[],"testing_data":[],"cross_validation":"k-fold"},
  …
}
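A minimal sketch of how an AutoML system might honor a few of these requirements: restricting the model types it considers, dropping excluded variables and imputing missing values with the requested method. It parses a hypothetical subset of the specification above (the `respondent_id` variable is invented for illustration, and treating an empty `include_model` list as "no restriction" is an assumption); the remaining keys are not handled.

```python
import json
import statistics
from typing import Any, Dict, List, Optional

# Hypothetical subset of the requirements specification.
REQUIREMENTS: Dict[str, Any] = json.loads("""
{
  "include_model": ["LinearSVC", "LogisticRegression", "DecisionTreeClassifier"],
  "exclude_model": [],
  "use_imputation_method": "median",
  "exclude_variables": ["respondent_id"]
}
""")

CANDIDATE_MODELS = [
    "LinearSVC", "LogisticRegression", "BernoulliNB",
    "AdaBoostClassifier", "DecisionTreeClassifier",
]

IMPUTERS = {"median": statistics.median, "mean": statistics.mean}


def allowed_models(candidates: List[str], req: Dict[str, Any]) -> List[str]:
    """Apply include/exclude lists; an empty include list is treated as 'no restriction'."""
    include = set(req.get("include_model") or candidates)
    exclude = set(req.get("exclude_model", []))
    return [m for m in candidates if m in include and m not in exclude]


def prepare_table(
    table: Dict[str, List[Optional[float]]], req: Dict[str, Any]
) -> Dict[str, List[float]]:
    """Drop excluded variables, then fill missing values with the requested statistic."""
    impute = IMPUTERS[req.get("use_imputation_method", "median")]
    excluded = set(req.get("exclude_variables", []))
    prepared: Dict[str, List[float]] = {}
    for name, values in table.items():
        if name in excluded:
            continue
        # Raises statistics.StatisticsError if every value in the column is missing.
        fill = impute([v for v in values if v is not None])
        prepared[name] = [fill if v is None else v for v in values]
    return prepared


if __name__ == "__main__":
    print(allowed_models(CANDIDATE_MODELS, REQUIREMENTS))
    print(prepare_table({"respondent_id": [1, 2, 3], "age": [30, None, 50]}, REQUIREMENTS))
```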
Accommodating HGML requirements - UI
• Extensions are needed for:
  • Filtering variables and instances (subpopulations)
  • Comparing and exploring solutions
  • Creating new variables from existing ones
[Figure: UI extensions to compare, filter, explore and transform]
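Two of the requested UI extensions, filtering instances into subpopulations and creating variables from existing ones, reduce to simple table operations. The sketch assumes rows are dictionaries; the column names (`age`, `left_volume`, `right_volume`) and the asymmetry index are hypothetical examples chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def filter_instances(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Select a subpopulation: the rows for which the predicate holds."""
    return [row for row in rows if keep(row)]


def derive_variable(rows: List[Row], name: str, formula: Callable[[Row], float]) -> List[Row]:
    """Return new rows with an extra variable computed from existing ones (inputs are not mutated)."""
    return [{**row, name: formula(row)} for row in rows]


def asymmetry(row: Row) -> float:
    """Hypothetical left/right asymmetry index: (L - R) / (L + R)."""
    left, right = row["left_volume"], row["right_volume"]
    return (left - right) / (left + right)


ROWS: List[Row] = [
    {"subject": 1, "age": 45, "left_volume": 4.0, "right_volume": 3.0},
    {"subject": 2, "age": 67, "left_volume": 3.5, "right_volume": 3.5},
    {"subject": 3, "age": 72, "left_volume": 3.0, "right_volume": 4.0},
]

if __name__ == "__main__":
    older = filter_instances(ROWS, lambda row: row["age"] >= 60)
    for row in derive_variable(older, "asymmetry", asymmetry):
        print(row["subject"], round(row["asymmetry"], 3))
```

Returning new rows instead of mutating the originals lets the UI keep the unfiltered dataset available for comparison across subpopulations.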
Conclusions and Future Work
• Proliferation of AutoML systems
• AutoML solutions may not take domain expertise into account
• Interaction is needed: Human-Guided Machine Learning (HGML)
• Our contributions:
  • A baseline integration of an HGML user interface with an AutoML system
  • A task analysis of HGML
  • Characterizations of two significant studies, one in neuroscience and one in political science
  • Requirements for HGML based on the task analysis
  • An assessment of how those requirements could be accommodated by AutoML systems
• Future work:
  • Extend our baseline system with the requirements identified in this paper
Towards Human-Guided Machine Learning
Yolanda Gil1, James Honaker2, Shikhar Gupta1, Yibo Ma1, Vito D’Orazio3,
Daniel Garijo1, Shruti Gadewar1, Qifan Yang1 and Neda Jahanshad1
1University of Southern California
2University of Texas at Dallas
3Harvard University
https://w3id.org/people/dgarijo
@dgarijov
dgarijo@isi.edu
Intelligent User Interfaces (IUI19), March 18th, 2019
Information
Sciences
Institute
1 of 16

Recommended

Towards Knowledge Graphs of Reusable Research Software Metadata by
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
624 views22 slides
A Template-Based Approach for Annotating Long-Tailed Datasets by
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
144 views12 slides
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles by
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
520 views8 slides
Scientific Software Registry Collaboration Workshop: From Software Metadata r... by
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
460 views12 slides
Towards Reusable Research Software by
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
171 views9 slides
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs by
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
424 views21 slides

More Related Content

What's hot

A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met... by
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
583 views14 slides
Coming to terms to FAIR semantics by
Coming to terms to FAIR semanticsComing to terms to FAIR semantics
Coming to terms to FAIR semanticsMaría Poveda Villalón
284 views17 slides
FAIRer Research by
FAIRer ResearchFAIRer Research
FAIRer ResearchCarole Goble
1K views40 slides
Let’s go on a FAIR safari! by
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Carole Goble
1.4K views58 slides
The Research Object Initiative: Frameworks and Use Cases by
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
1.7K views61 slides
Making data typing efforts or automatically detecting data types for automat... by
Making data typing efforts or automatically detecting data types  for automat...Making data typing efforts or automatically detecting data types  for automat...
Making data typing efforts or automatically detecting data types for automat...National Institute of Informatics
134 views1 slide

What's hot(20)

A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met... by dgarijo
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo583 views
Let’s go on a FAIR safari! by Carole Goble
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
Carole Goble1.4K views
The Research Object Initiative: Frameworks and Use Cases by Carole Goble
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble1.7K views
Research Objects, SEEK and FAIRDOM by Carole Goble
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble1.7K views
Publishing your research: Research Data Management (Introduction) by Jamie Bisset
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction)
Jamie Bisset1.5K views
Reinventing Laboratory Data To Be Bigger, Smarter & Faster by OSTHUS
Reinventing Laboratory Data To Be Bigger, Smarter & FasterReinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & Faster
OSTHUS230 views
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources by Pistoia Alliance
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Pistoia Alliance3.3K views
FAIR Computational Workflows by Carole Goble
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble493 views
The swings and roundabouts of a decade of fun and games with Research Objects by Carole Goble
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble168 views
Fairification experience clarifying the semantics of data matrices by Pistoia Alliance
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
Pistoia Alliance465 views
Introduction to FAIRDOM by Carole Goble
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
Carole Goble1.3K views
How are we Faring with FAIR? (and what FAIR is not) by Carole Goble
How are we Faring with FAIR? (and what FAIR is not)How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)
Carole Goble814 views
Being FAIR: FAIR data and model management SSBSS 2017 Summer School by Carole Goble
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble978 views
FAIR Data and Model Management for Systems Biology (and SOPs too!) by Carole Goble
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
Carole Goble1.1K views
Being FAIR: Enabling Reproducible Data Science by Carole Goble
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble1.2K views
FAIRy stories: tales from building the FAIR Research Commons by Carole Goble
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research Commons
Carole Goble1.4K views

Similar to Towards Human-Guided Machine Learning - IUI 2019

Intelligent Career Guidance System.pptx by
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxAnonymous366406
226 views28 slides
Artificial Intelligence for Automating Data Analysis by
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisManuel Martín
2.5K views41 slides
Aws autopilot by
Aws autopilotAws autopilot
Aws autopilotVivek Raja P S
117 views26 slides
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONIJDKP
3 views19 slides
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONIJDKP
4 views19 slides
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONIJDKP
13 views19 slides

Similar to Towards Human-Guided Machine Learning - IUI 2019(20)

Intelligent Career Guidance System.pptx by Anonymous366406
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptx
Anonymous366406226 views
Artificial Intelligence for Automating Data Analysis by Manuel Martín
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data Analysis
Manuel Martín2.5K views
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by IJDKP
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
IJDKP3 views
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by IJDKP
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
IJDKP4 views
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION by IJDKP
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONCATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION
IJDKP13 views
Machine learning for sensor Data Analytics by MATLABISRAEL
Machine learning for sensor Data AnalyticsMachine learning for sensor Data Analytics
Machine learning for sensor Data Analytics
MATLABISRAEL835 views
IRJET- Comparison of Classification Algorithms using Machine Learning by IRJET Journal
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET Journal107 views
Getting Started with Azure AutoML by Vivek Raja P S
Getting Started with Azure AutoMLGetting Started with Azure AutoML
Getting Started with Azure AutoML
Vivek Raja P S164 views
Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ... by Eswar Publications
Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ...Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ...
Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ...
MLSEV Virtual. ML Platformization and AutoML in the Enterprise by BigML, Inc
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
BigML, Inc390 views
Decision Making Framework in e-Business Cloud Environment Using Software Metr... by ijitjournal
Decision Making Framework in e-Business Cloud Environment Using Software Metr...Decision Making Framework in e-Business Cloud Environment Using Software Metr...
Decision Making Framework in e-Business Cloud Environment Using Software Metr...
ijitjournal122 views
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis by IRJET Journal
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET Journal54 views
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp... by Ed Fernandez
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Ed Fernandez4.8K views
Innovation at the Edge_Final by Chris Waller
Innovation at the Edge_FinalInnovation at the Edge_Final
Innovation at the Edge_Final
Chris Waller502 views
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller by Pistoia Alliance
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris WallerPistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
Pistoia Alliance712 views
IRJET- Instant Exam Paper Generator by IRJET Journal
IRJET- Instant Exam Paper GeneratorIRJET- Instant Exam Paper Generator
IRJET- Instant Exam Paper Generator
IRJET Journal17 views

More from dgarijo

SOMEF: a metadata extraction framework from software documentation by
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
121 views7 slides
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data by
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
584 views13 slides
Capturing Context in Scientific Experiments: Towards Computer-Driven Science by
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
551 views54 slides
WIDOCO: A Wizard for Documenting Ontologies by
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
1.2K views12 slides
Towards Automating Data Narratives by
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
920 views23 slides
Automated Hypothesis Testing with Large Scale Scientific Workflows by
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
586 views44 slides

More from dgarijo(20)

SOMEF: a metadata extraction framework from software documentation by dgarijo
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
dgarijo121 views
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data by dgarijo
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
dgarijo584 views
Capturing Context in Scientific Experiments: Towards Computer-Driven Science by dgarijo
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo551 views
WIDOCO: A Wizard for Documenting Ontologies by dgarijo
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
dgarijo1.2K views
Towards Automating Data Narratives by dgarijo
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
dgarijo920 views
Automated Hypothesis Testing with Large Scale Scientific Workflows by dgarijo
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo586 views
OntoSoft: A Distributed Semantic Registry for Scientific Software by dgarijo
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
dgarijo919 views
OEG tools for supporting Ontology Engineering by dgarijo
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
dgarijo289 views
Software Metadata: Describing "dark software" in GeoSciences by dgarijo
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
dgarijo901 views
Reproducibility Using Semantics: An Overview by dgarijo
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo890 views
PhD Thesis: Mining abstractions in scientific workflows by dgarijo
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
dgarijo1.8K views
Publicación de datos y métodos científicos en investigación by dgarijo
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigación
dgarijo820 views
EDBT 2015: Summer School Overview by dgarijo
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
dgarijo608 views
Similarity in Wikipedia Articles (EDBT Summer School) by dgarijo
Similarity in Wikipedia Articles (EDBT Summer School)Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)
dgarijo790 views
Semantic web 101: Benefits for geologists by dgarijo
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
dgarijo551 views
Is preserving data enough? Towards the preservation of scientific methods by dgarijo
Is preserving data enough? Towards the preservation of scientific methods Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods
dgarijo899 views
Creating abstractions from scientific workflows: PhD symposium 2015 by dgarijo
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo526 views
Towards Workflow Ecosystems Through Semantic and Standard Representations by dgarijo
Towards Workflow Ecosystems Through Semantic and Standard RepresentationsTowards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representations
dgarijo877 views
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users by dgarijo
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline UsersWorkflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
dgarijo934 views
Frag Flow: Automated Fragment Detection in Scientific Workflows by dgarijo
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflows
dgarijo928 views

Recently uploaded

OOPs - JAVA Quick Reference.pdf by
OOPs - JAVA Quick Reference.pdfOOPs - JAVA Quick Reference.pdf
OOPs - JAVA Quick Reference.pdfArthyR3
80 views66 slides
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating... by
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...Taste
39 views34 slides
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab... by
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...RAHUL PAL
53 views26 slides
Introduction to Physiotherapy and Electrotherapy by
Introduction to Physiotherapy and ElectrotherapyIntroduction to Physiotherapy and Electrotherapy
Introduction to Physiotherapy and ElectrotherapySreeraj S R
82 views10 slides
NodeJS and ExpressJS.pdf by
NodeJS and ExpressJS.pdfNodeJS and ExpressJS.pdf
NodeJS and ExpressJS.pdfArthyR3
60 views17 slides
DISTILLATION.pptx by
DISTILLATION.pptxDISTILLATION.pptx
DISTILLATION.pptxAnupkumar Sharma
87 views47 slides

Recently uploaded(20)

OOPs - JAVA Quick Reference.pdf by ArthyR3
OOPs - JAVA Quick Reference.pdfOOPs - JAVA Quick Reference.pdf
OOPs - JAVA Quick Reference.pdf
ArthyR380 views
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating... by Taste
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...
Creative Restart 2023: Christophe Wechsler - From the Inside Out: Cultivating...
Taste39 views
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab... by RAHUL PAL
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...
Research Methodology (M. Pharm, IIIrd Sem.)_UNIT_IV_CPCSEA Guidelines for Lab...
RAHUL PAL53 views
Introduction to Physiotherapy and Electrotherapy by Sreeraj S R
Introduction to Physiotherapy and ElectrotherapyIntroduction to Physiotherapy and Electrotherapy
Introduction to Physiotherapy and Electrotherapy
Sreeraj S R82 views
NodeJS and ExpressJS.pdf by ArthyR3
NodeJS and ExpressJS.pdfNodeJS and ExpressJS.pdf
NodeJS and ExpressJS.pdf
ArthyR360 views
Interaction of microorganisms with vascular plants.pptx by MicrobiologyMicro
Interaction of microorganisms with vascular plants.pptxInteraction of microorganisms with vascular plants.pptx
Interaction of microorganisms with vascular plants.pptx
Artificial Intelligence and The Sustainable Development Goals (SDGs) Adoption... by BC Chew
Artificial Intelligence and The Sustainable Development Goals (SDGs) Adoption...Artificial Intelligence and The Sustainable Development Goals (SDGs) Adoption...
Artificial Intelligence and The Sustainable Development Goals (SDGs) Adoption...
BC Chew55 views
Peripheral artery diseases by Dr. Garvit.pptx by garvitnanecha
Peripheral artery diseases by Dr. Garvit.pptxPeripheral artery diseases by Dr. Garvit.pptx
Peripheral artery diseases by Dr. Garvit.pptx
garvitnanecha147 views
GSoC 2024 .pdf by ShabNaz2
GSoC 2024 .pdfGSoC 2024 .pdf
GSoC 2024 .pdf
ShabNaz250 views
ANGULARJS.pdf by ArthyR3
ANGULARJS.pdfANGULARJS.pdf
ANGULARJS.pdf
ArthyR354 views
Women From 1850 To 1950 Essay by Amy Williams
Women From 1850 To 1950 EssayWomen From 1850 To 1950 Essay
Women From 1850 To 1950 Essay
Amy Williams41 views
The Future of Micro-credentials: Is Small Really Beautiful? by Mark Brown
The Future of Micro-credentials:  Is Small Really Beautiful?The Future of Micro-credentials:  Is Small Really Beautiful?
The Future of Micro-credentials: Is Small Really Beautiful?
Mark Brown131 views
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx by Niranjan Chavan
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptxGuidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx
Niranjan Chavan48 views
Education of marginalized and socially disadvantages segments.pptx by GarimaBhati5
Education of marginalized and socially disadvantages segments.pptxEducation of marginalized and socially disadvantages segments.pptx
Education of marginalized and socially disadvantages segments.pptx
GarimaBhati559 views
JRN 362 - Lecture Twenty-Three (Epilogue) by Rich Hanley
JRN 362 - Lecture Twenty-Three (Epilogue)JRN 362 - Lecture Twenty-Three (Epilogue)
JRN 362 - Lecture Twenty-Three (Epilogue)
Rich Hanley46 views

Towards Human-Guided Machine Learning - IUI 2019

  • 1. Towards Human-Guided Machine Learning Yolanda Gil1, James Honaker2, Shikhar Gupta1, Yibo Ma1, Vito D’Orazio3, Daniel Garijo1, Shruti Gadewar1, Qifan Yang1 and Neda Jahanshad1 1University of Southern California 2University of Texas at Dallas 3Harvard University https://w3id.org/people/dgarijo @dgarijov dgarijo@isi.edu Intelligent User Interfaces (IUI19), March 18th, 2019 Information Sciences Institute
  • 2. Rising Popularity of AutoML Systems Intelligent User Interfaces, March 18th, 2019 2 auto-sklearn Auto-WEKA AlphaZero
  • 3. Anatomy of an AutoML System Intelligent User Interfaces, March 18th, 2019 3 Auto ML Predictions Training data Features: Train ML algorithm and one or more of the following: • Extract features from data • Data preparation (imputation, encoding, etc.) • Feature selection • Hyperparameter optimization • Ensembling of solutions Trained Model Test data
  • 4. Limitations of AutoML systems Intelligent User Interfaces, March 18th, 2019 4 Training process is not transparent Trained models are difficult to customize Auto ML Predictions Training data Trained Model Test data
  • 5. Human-Guided Machine Learning (HGML)
    [Diagram: a domain expert interacts, through an interface, with the AutoML pipeline (training data, trained model, test data, predictions)]
    • Domain users don’t like black boxes
    • They need to understand and modify the training process to apply their expertise, e.g.:
      • Modify features (remove known biases)
      • Guide the hyperparameter search
      • …
  • 6. Contributions of our work
    • An AutoML system and user interface that support basic HGML interactions
    • A task analysis of HGML that enumerates discrete user tasks to guide AutoML systems
    • Characterizations of two significant studies in neuroscience and political science
    • Requirements that HGML places on the AutoML system and the user interface
    • An assessment of how those requirements could be accommodated by AutoML systems
  • 7. AutoML System: P4ML
    • Extracts features of interest from data (text, video, audio…)
    • Builds a solution from the model types and other steps to include (e.g. imputation, encoding, etc.)
    • Performs a hyperparameter search to improve the results
    • Generates ensembles from the top-ranked models
    [Diagram: the Phased Performance-Based Pipeline Planner takes training data, a problem description and an evaluation metric, and returns top-ranked solutions that produce predictions on test data]
    Example top-ranked solutions (evaluation score):
    • HashingVectorizer -> LabelEncoder -> LogisticRegressionCV (0.9489)
    • CountVectorizer -> LabelEncoder -> BernoulliNB (0.9486)
    • TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier (0.9460)
  • 8. UI for AutoML System Interaction: TwoRavens
    • Statistical summaries of variables and variable exploration
    • Integration with the AutoML system (P4ML)
    • Specification of the ML problem of interest
    • Exploration of solution results returned by the AutoML system
  • 9. HGML Task Analysis
    • Top-down analysis:
      • Data use: selection of variables (features) and instances
      • Model development: model selection and tuning
      • Model interpretation: result comparison
    • Bottom-up analysis:
      • Neuroscience: the ENIGMA neuroscience consortium
      • Political science: a seminal paper on civil war onset
  • 10. Overview of task analysis (top-down)
  • 11. Overview of task analysis (bottom-up)
    [Figure: task analyses of the neuroscience and political science studies]
    Main task results:
    • Feature selection and generation
    • Model type selection
    • Model configuration
    • Quantities of interest and metrics
  • 12. UI and AutoML Requirements
    The top-down and bottom-up analyses were combined to identify requirements for both the AutoML system and the user interface
  • 13. Accommodating HGML requirements – AutoML system
    [Diagram: the Phased Performance-Based Pipeline Planner of slide 7, now also taking a Requirements input alongside the training data, problem description and evaluation metric]
    Example requirements specification:
    {
      "include_model":["LinearSVC","LogisticRegression","DecisionTreeClassifier"],
      "exclude_model":[],
      "include_feature_generation":["tfidfVectorizer"],
      "use_imputation_method":"median",
      "include_variables":[],
      "exclude_variables":[],
      "include_instances":[],
      "exclude_instances":[],
      "define_variable_weight":[{"variable":"","weight":},{}],
      "select_training_and_test_data":{"training_data": [],"testing_data": [],"cross_validation": "k-fold"},
      …
    }
  • 14. Accommodating HGML requirements – UI
    • Extensions are needed for:
      • Filtering variables and instances (subpopulations)
      • Comparing and exploring solutions
      • Creating variables from existing ones
    [Diagram labels: Compare, filter, explore, transform]
  • 15. Conclusions and Future Work
    • AutoML systems are proliferating
    • AutoML solutions may not take domain expertise into consideration
    • Interaction is needed: Human-Guided Machine Learning
    • Our contributions:
      • Baseline integration of an HGML UI with an AutoML system
      • A task analysis of HGML
      • Characterizations of two significant studies in neuroscience and political science
      • Requirements for HGML based on the task analysis
      • An assessment of how those requirements could be accommodated by AutoML systems
    • Future work:
      • Extend our baseline system with the requirements identified in this paper
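Slide 3 lists what an AutoML system automates beyond training a single model. The following is a minimal, stdlib-only Python sketch of that loop under deliberately simple assumptions: one numeric feature, a hypothetical threshold classifier standing in for the ML algorithm, mean/median imputation as the data-preparation step, and an exhaustive grid over the threshold as the hyperparameter search. The data and every function name are illustrative; none of this is taken from auto-sklearn, Auto-WEKA or P4ML.

```python
import statistics
from itertools import product

# Hypothetical toy data: one numeric feature (None marks a missing value) and binary labels.
X = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def impute(values, strategy):
    """Data preparation: replace missing values with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) if strategy == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in values]


def threshold_model(values, threshold):
    """Hypothetical stand-in for an ML algorithm: predict 1 when the feature exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


def accuracy(y_true, y_pred):
    """Evaluation metric: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auto_ml(X, y, strategies=("mean", "median"), thresholds=(2.5, 4.5, 6.5)):
    """Try every (imputation strategy, threshold) combination; return (score, strategy, threshold), best first."""
    results = []
    for strategy, threshold in product(strategies, thresholds):
        predictions = threshold_model(impute(X, strategy), threshold)
        results.append((accuracy(y, predictions), strategy, threshold))
    return sorted(results, reverse=True)


ranked = auto_ml(X, y)
best_score, best_strategy, best_threshold = ranked[0]
```

Real AutoML systems replace the exhaustive grid with smarter search and score candidates on held-out data rather than on the training set, as done here for brevity.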
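Slide 5 says domain experts want to modify features (for example, to remove known biases) and guide the hyperparameter search. One way to picture that guidance is as constraints that narrow the search space before the AutoML system explores it. The sketch below assumes a hypothetical search space: the feature names, the model list and the regularization grid `C` are illustrative, and `apply_guidance` is not an interface of P4ML or TwoRavens.

```python
# Hypothetical default search space an AutoML system might explore.
DEFAULT_SPACE = {
    "features": ["age", "income", "zip_code", "education"],
    "models": ["LogisticRegression", "DecisionTreeClassifier", "LinearSVC"],
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
}


def apply_guidance(space, exclude_features=(), allowed_models=None, c_range=None):
    """Return a narrowed copy of the search space; the original dict is left untouched."""
    guided = {
        # Drop features the expert flags, e.g. proxies for a known bias.
        "features": [f for f in space["features"] if f not in exclude_features],
        # None means "no restriction on model types".
        "models": [m for m in space["models"] if allowed_models is None or m in allowed_models],
        "C": list(space["C"]),
    }
    if c_range is not None:
        low, high = c_range
        # Keep only hyperparameter values inside the expert's plausible range.
        guided["C"] = [c for c in guided["C"] if low <= c <= high]
    return guided


# An expert removes a feature suspected of encoding bias and bounds the regularization strength.
guided = apply_guidance(DEFAULT_SPACE, exclude_features={"zip_code"}, c_range=(0.1, 10.0))
```

Expressing guidance as a filter over the search space keeps the AutoML search itself unchanged, which is one simple way an interface could pass expert input to an existing system.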
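Slide 7 says P4ML ranks candidate pipelines and generates ensembles from the top-ranked ones. The sketch below ranks the three pipelines using the scores shown on that slide and combines their predictions by majority vote. Majority voting is an assumption chosen for illustration (the slide does not say how P4ML builds ensembles), and the per-instance predictions are hypothetical.

```python
from collections import Counter

# Pipelines and evaluation scores as listed on slide 7.
SCORES = {
    "HashingVectorizer -> LabelEncoder -> LogisticRegressionCV": 0.9489,
    "CountVectorizer -> LabelEncoder -> BernoulliNB": 0.9486,
    "TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier": 0.9460,
}

# Hypothetical predictions of each pipeline on four test instances.
PREDICTIONS = {
    "HashingVectorizer -> LabelEncoder -> LogisticRegressionCV": [1, 0, 0, 1],
    "CountVectorizer -> LabelEncoder -> BernoulliNB": [1, 1, 0, 0],
    "TfidfVectorizer -> LabelEncoder -> AdaBoostClassifier": [0, 1, 0, 1],
}


def top_k(scores, k):
    """Return the names of the k highest-scoring pipelines, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def majority_vote(prediction_lists):
    """Combine per-pipeline predictions by taking the most common label for each instance."""
    # zip(*...) regroups the lists so each tuple holds every pipeline's label for one instance.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*prediction_lists)]


ensemble = top_k(SCORES, 3)
ensemble_predictions = majority_vote([PREDICTIONS[name] for name in ensemble])
```

With three voters on a binary task there are no ties; with an even number of voters, `Counter.most_common` orders equal counts by first occurrence, so a tie goes to the label of the higher-ranked pipeline.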
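Slide 8 mentions statistical summaries of variables in TwoRavens. The slide does not say which statistics TwoRavens reports; the sketch below assumes a typical set (count, missing, mean, median, sample standard deviation, minimum and maximum) for one numeric variable, using only Python's `statistics` module. The function name and the example values are illustrative.

```python
import statistics


def summarize(values):
    """Summarize one numeric variable, treating None as a missing entry."""
    observed = [v for v in values if v is not None]
    if not observed:
        return {"count": 0, "missing": len(values)}
    return {
        "count": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        # Sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(observed) if len(observed) > 1 else 0.0,
        "min": min(observed),
        "max": max(observed),
    }


summary = summarize([2, 4, None, 4, 6])
```

Reporting the missing count next to the central statistics matters for HGML because it tells the user whether an imputation choice is worth guiding at all.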
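The first data-use task on slide 9 is the selection of variables (features) and instances. A minimal sketch, assuming records are held as a list of dicts; the variable names (`age`, `site`, `volume`, `label`) and values are hypothetical and do not come from the ENIGMA or civil war studies.

```python
def select(records, variables, where=lambda record: True):
    """Keep only the chosen variables, for the instances that satisfy the predicate `where`."""
    return [{name: record[name] for name in variables} for record in records if where(record)]


# Hypothetical records: each dict is one instance, each key one variable.
RECORDS = [
    {"age": 34, "site": "A", "volume": 4.1, "label": 1},
    {"age": 51, "site": "B", "volume": 3.7, "label": 0},
    {"age": 27, "site": "A", "volume": 4.4, "label": 0},
]

# Restrict the analysis to one subpopulation (site A) and three variables.
subset = select(RECORDS, ["age", "volume", "label"], where=lambda r: r["site"] == "A")
```

Passing the instance filter as a predicate keeps variable selection and subpopulation selection independent, so a user can change one without re-specifying the other.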
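Slide 11 lists quantities of interest and metrics among the main tasks. As one illustration, assuming a binary classification problem, the metrics a user might choose between (accuracy, precision, recall, F1) can be computed from predictions as follows. The function name and labels are hypothetical; quantities of interest specific to the neuroscience or political science studies are not covered here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Here two of three predicted positives are correct and two of three actual positives are found, so precision, recall and F1 all equal 2/3, as does accuracy (4 of 6).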
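Slide 13 shows a requirements specification passed to the planner. The sketch below parses a valid subset of those keys and applies three of them: model inclusion/exclusion, variable exclusion and a k-fold split. Several choices are assumptions, not P4ML behavior: an empty or absent include list is read as "no restriction", `zip_code` and the candidate model list are illustrative values, and folds are contiguous with no shuffling.

```python
import json

# A valid JSON subset of the keys on slide 13; the values are illustrative.
REQUIREMENTS = json.loads("""
{
  "include_model": ["LinearSVC", "LogisticRegression", "DecisionTreeClassifier"],
  "exclude_model": [],
  "use_imputation_method": "median",
  "exclude_variables": ["zip_code"]
}
""")

# Hypothetical models the planner could otherwise consider.
CANDIDATE_MODELS = ["LogisticRegression", "RandomForestClassifier", "LinearSVC",
                    "BernoulliNB", "DecisionTreeClassifier"]


def allowed(candidates, include, exclude):
    """Filter candidates; an empty or missing include list is assumed to mean no restriction."""
    include = include or candidates
    return [c for c in candidates if c in include and c not in set(exclude or [])]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    start = 0
    for fold in range(k):
        # The first n % k folds get one extra instance so that all n indices are used.
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


models = allowed(CANDIDATE_MODELS, REQUIREMENTS.get("include_model"), REQUIREMENTS.get("exclude_model"))
variables = allowed(["age", "income", "zip_code"],
                    REQUIREMENTS.get("include_variables"), REQUIREMENTS.get("exclude_variables"))
imputation = REQUIREMENTS["use_imputation_method"]
folds = list(k_fold_indices(5, 2))
```

Keeping the candidate order (rather than the order in the requirements) means the user's constraints only filter the planner's search and never reorder it.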
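Slide 14 calls for creating variables from existing ones. A minimal sketch, assuming dict-based records; the variables `gdp` and `population`, the derived `gdp_per_capita` (a ratio) and `log_population` (a log transform), and all values are hypothetical examples.

```python
import math


def derive(records, name, fn):
    """Return copies of the records with a new variable `name` computed by `fn`; inputs stay unchanged."""
    return [{**record, name: fn(record)} for record in records]


# Hypothetical records with two existing variables.
RECORDS = [
    {"country": "X", "gdp": 1000.0, "population": 10.0},
    {"country": "Y", "gdp": 5000.0, "population": 100.0},
]

with_ratio = derive(RECORDS, "gdp_per_capita", lambda r: r["gdp"] / r["population"])
with_log = derive(with_ratio, "log_population", lambda r: math.log(r["population"]))
```

Returning new records instead of mutating the originals lets the interface compare solutions trained with and without a derived variable.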

Editor's Notes

  1. We view human-guided machine learning (HGML) as a new area of research focused on how to help users apply domain knowledge to guide an AutoML system in selecting machine learning algorithms and finding multi-step solutions.