Machine Learning with Incomplete Information
A Ph.D. Dissertation Defense

Department of Computer Science
University of Nebraska-Lincoln
Nov 2011
Machine learning: programming the computer to learn from examples.
Many applications have incomplete data.
Outline
• Machine Learning
• Machine Learning with Incomplete Information
  – Active Learning
  – Budgeted Learning
• Our work on active learning
• Our work on budgeted learning
Machine Learning
• Design of computer algorithms that derive general patterns, regularities, and rules from training data

[Figure: labeled training data (cancer patient descriptions with examination results) feeds a machine learning algorithm, which produces a patient cancer subtype classifier; given a new cancer patient description, the classifier outputs a predicted result.]
Applications of Machine Learning Algorithms
• Medical Diagnosis
• News Article Topic Spotting
• Email Spam Filtering
• Face Detection
• Credit Card Fraud Detection
• Weather Prediction
Machine Learning with Incomplete Information
• Basic machine learning algorithms:
  decision trees, Bayesian networks, support vector machines, logistic regression, …
• What if we must pay for labels and/or features of the training data?
  – Active Learning: learn a good classifier/hypothesis at as little cost as possible
  – Budgeted Learning: learn as good a classifier/hypothesis as possible within a given budget
Active Learning Example

[Figure: a scene-labeling task from Amazon Mechanical Turk; paid workers label scenes as man-made or natural, and the learner must choose which scenes to pay for.]
Budgeted Learning Example
($2 million was allocated to develop a diagnostic classifier for cancer subtypes)

Diagnostic Test 1   Diagnostic Test 2   Diagnostic Test 3   …   Cancer subtype
(per $2000)         (per $1000)         (per $500)
?                   ?                   ?                   …   1
?                   ?                   ?                   …   2
?                   ?                   ?                   …   1
?                   ?                   ?                   …   4
?                   ?                   ?                   …   1
?                   ?                   ?                   …   …
?                   ?                   ?                   …   …
Our Work in Active Learning
• Many AL algorithms assume a single, perfect labeler; thus they focus on instance selection
  – Human labelers make mistakes
  – Multiple labelers can exist (e.g., Amazon Mechanical Turk)
  – Labelers can charge different costs
• Our algorithms
  – Shift focus from instance selection to labeler selection and ground truth estimation
Our Work in Budgeted Learning
• Many BL results assume an overly simplistic probabilistic data model, such as naïve Bayes
• In reality, the features of the training data may be correlated with each other
• What if we know the dependencies among the features?
  – Can we learn a more accurate classifier?
  – Can we exploit the Bayesian network and the known labels?
Active Learning from Multiple Noisy Labelers with Varied Costs (ALML)
Active Learning from Multiple Noisy Labelers with Varied Costs
• A training set { (x1, y1), …, (xn, yn) }
• X = {x1, …, xn} is the set of instances
• Y = {y1, …, yn} is the set of unknown labels of the instances in X
• O = {o1, …, oN} is a set of labelers
• ci is the cost of paying oi to label an instance
• ai is the unknown accuracy of oi
• Goal: learn a hypothesis that generalizes well while spending as little as possible on queries to labelers
Contributions for ALML
• Proposed two algorithms, IEAdjCost [Zheng et al. 2010] and wIEAdjCost, which
  – Rank labelers by their adjusted costs
    (adjusted cost = cost × a multiplier determined by the labeler's estimated accuracy)
  – Give each chosen labeler a voting weight according to its estimated accuracy
• Both algorithms significantly outperform the existing algorithms IEThresh [Donmez et al. 2009] and Repeated
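Since the deck gives only the high-level idea, the sketch below illustrates the two mechanisms in Python. The multiplier formula and the names (`adjusted_cost`, `weighted_vote`) are illustrative assumptions, not the actual IEAdjCost definitions.

```python
from collections import defaultdict

def adjusted_cost(cost, est_accuracy):
    # Hypothetical multiplier: grows as the estimated accuracy drops
    # toward chance (0.5), so a cheap but noisy labeler ranks poorly.
    # The actual multiplier in IEAdjCost is defined differently.
    multiplier = 1.0 / max(est_accuracy - 0.5, 1e-6)
    return cost * multiplier

def weighted_vote(answers):
    # answers: list of (label, est_accuracy) pairs from the chosen
    # labelers; each vote is weighted by the labeler's estimated accuracy.
    totals = defaultdict(float)
    for label, acc in answers:
        totals[label] += acc
    return max(totals, key=totals.get)

# Rank labelers (name, cost, estimated accuracy) by adjusted cost.
labelers = [("o1", 2.00, 0.95), ("o2", 0.50, 0.70), ("o3", 0.20, 0.55)]
ranked = sorted(labelers, key=lambda o: adjusted_cost(o[1], o[2]))
print([name for name, _, _ in ranked])                          # ['o2', 'o3', 'o1']
print(weighted_vote([("+", 0.95), ("-", 0.70), ("+", 0.55)]))   # '+'
```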
Budgeted Learning (BL) of Naïve Bayes

Budgeted Learning
  – Labels of the training data are given
  – Features of the training data are available at a cost
• Contributions:
  – Exp3CR, Exp3C, FEL (adapted from algorithms for the multi-armed bandit problem)
  – ABR2, RBR2, WBR2 (variations of Biased Robin based on second-order statistics)
  – Row (instance) selectors:
    • Entropy (EN)
    • Error-Correction (EC)
Conclusions for Our BL Algorithms
• Compared to BR [Lizotte et al. 2003], Random, and RSFL [Kapoor and Greiner 2005]:
  ABR2 with all row selectors, WBR2 and Exp3C with the EC row selector, and FEL with the UR (uniform random) row selector perform well
• The EC row selector stands out for Random, BR, ABR2, and WBR2
• The EN row selector stands out for RBR2 and Exp3C
Budgeted Learning of Bayesian Networks (BLBN)
Bayesian Networks

[Figure: the "Asia"-style example network. Visit to Asia → Tuberculosis; Smoking → Lung Cancer and Smoking → Bronchitis; Tuberculosis and Lung Cancer → Tuberculosis or Bronchitis; Tuberculosis or Bronchitis → X-Ray Result and Dyspnea.]

Example CPTs:

P(S=1) = 0.7

S   P(B=1|S)
0   0.3
1   0.7

T   L   P(ToB=1|T,L)
0   0   0.1
0   1   0.3
1   0   0.5
1   1   0.8
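To make the tables concrete, here is a minimal Python sketch (with hypothetical variable names) of the CPTs above, and of how the Markov condition lets a joint probability factor into a product of CPT entries.

```python
# CPT entries taken from the slide.
p_smoking_1 = 0.7                                  # P(S=1)
p_bronchitis_1 = {0: 0.3, 1: 0.7}                  # P(B=1 | S)
p_tob_1 = {(0, 0): 0.1, (0, 1): 0.3,               # P(ToB=1 | T, L)
           (1, 0): 0.5, (1, 1): 0.8}

# By the Markov condition, each node contributes one CPT factor given its
# parents; e.g., the factors for S=1, B=1, ToB=1 with T=0, L=1 are:
factor = p_smoking_1 * p_bronchitis_1[1] * p_tob_1[(0, 1)]
print(factor)  # 0.7 * 0.7 * 0.3 = 0.147
```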
Naïve Bayes
An NB classifier can be viewed as a Bayesian network with a simple structure that has the label node as the parent of all other feature nodes.

[Figure: label node "Tuberculosis or Cancer" with arrows to feature nodes Visit to Asia, Tuberculosis, …, Dyspnea.]
Our Work
• Budgeted learning of Bayesian networks under the setting that
  – the labels of the training data are known, while the features of the training data have to be purchased, subject to an overall budget
  – the structure of the Bayesian network is given, but the parameters of the Bayesian network are unknown
• The goal is to learn as good a BN as possible under the given budget
Related Work
• Active learning of Bayesian networks [Tong and Koller 2001]
  – the features of the training data are given, while the labels are available at a cost
• Budgeted distribution learning of Bayesian networks [Li et al. 2010]
  – both the labels and the features of the training data have to be purchased
• In both, the choice of (instance, feature) pairs is based on the learned parameters of the BN
Our Work
• Budgeted learning of Bayesian networks
  – the labels of the training data are known, while the features have to be purchased, subject to an overall budget
• The choice of (instance, feature) pairs is based on the learned BN, the known labels, and the structure of the BN
Adapting Existing Algorithms
• Use BNs to provide improved probabilistic models to existing BL algorithms that typically use naïve Bayes
  – The adapted algorithms might learn a more accurate classifier
  – An improved probabilistic model might improve the accuracy of the objective functions that guide the choice of (instance, feature) pairs
New Algorithms
• Take advantage of the BN, the known class labels, and the structure of the BN
• Basis for the new algorithms
  – An (instance, feature) pair that leads to a higher expected relative probability gain (ERPG) leads to a better BN
  – An (instance, feature) pair that leads to more d-separations from the label node has a bigger influence on other nodes
  – Instantiating the nodes in the Markov blanket of the label node makes the label node independent of the remaining nodes
BLBN: Adapting Biased Robin

[Figure: pseudocode of Biased Robin adapted to a Bayesian network; the objective is log loss, the negative sum over instances of the log probability of predicting the correct label.]
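The slide's figure did not survive extraction. For context, Biased Robin [Lizotte et al. 2003] keeps purchasing the same feature while purchases "succeed" and advances round-robin when one fails; the sketch below is a schematic Python rendering under that description, with `purchase` and `improved_log_loss` as caller-supplied stand-ins (the adapted version evaluates log loss under the Bayesian network rather than naïve Bayes).

```python
def biased_robin(features, purchase, improved_log_loss, budget):
    # Buy values of feature i while each purchase improves the objective
    # (here: log loss, the negative sum of log probabilities of the
    # correct labels); on a failure, move round-robin to the next feature.
    i, spent = 0, 0
    while spent < budget:
        purchase(features[i])        # buy one (instance, feature) value
        spent += 1
        if not improved_log_loss():  # did the model's log loss improve?
            i = (i + 1) % len(features)
```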
New BLBN Algorithm: MERPG

      A  B  C  D  E  F      Label   P(+ | xi)   E(P(+ | xi ∪ ?))
X1    0  3  0  1  T  ?      +       0.7         0.75
X2    ?  6  1  0  ?  true   +       0.9         0.95
X3    ?  ?  3  1  ?  ?      -

Choose the (instance, feature) pair that maximizes the expected relative probability gain.

Assume F can be true or false:
E(P(+ | x1 ∪ ?)) = P(F=true | x1) · P(+ | x1, F=true) + P(F=false | x1) · P(+ | x1, F=false)
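A small Python sketch of the computation above (hypothetical function names). For a binary feature, the expected posterior marginalizes over the unpurchased value, and MERPG compares purchases by relative rather than absolute gain, which is why (X1, F) beats (X2, E) here despite the equal 0.05 absolute improvements.

```python
def expected_posterior(p_true, p_pos_given_true, p_pos_given_false):
    # E(P(+ | x ∪ ?)) for a binary feature: average the posterior of the
    # known label over the model's belief about the unpurchased value.
    return p_true * p_pos_given_true + (1 - p_true) * p_pos_given_false

def relative_gain(p_current, p_expected):
    # Expected *relative* probability gain, the quantity MERPG maximizes.
    return (p_expected - p_current) / p_current

# Rows from the table: both purchases add 0.05 absolute probability,
# but (X1, F) has the larger relative gain, so MERPG chooses it.
print(relative_gain(0.7, 0.75))  # ~0.0714 for (X1, F)
print(relative_gain(0.9, 0.95))  # ~0.0556 for (X2, E)
```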
MERPGDSEP
• MERPGDSEP breaks ties in MERPG by choosing the purchase that leads to the maximum increase in the number of d-separations from the label node (NumIncreaseDseps)
• Why? Instantiating those d-separated nodes in the future will not affect the label node; therefore, the bigger NumIncreaseDseps is, the bigger the influence of this purchase on the other nodes
Example of d-Separation (NumIncreaseDseps)

[Figure: the Asia-style network again (Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Bronchitis, X-Ray Result, Dyspnea). Here Tuberculosis or Bronchitis is the label node; instantiating Lung Cancer changes how many nodes are d-separated from it.]
Making NumIncreaseDseps a Weighting Factor Combined with MERPG (MERPGDSEPW1 and MERPGDSEPW2)
• Define the weighting factor LOGFACT as follows:
  If NumIncreaseDseps >= 0:
    LOGFACT = ln(e + NumIncreaseDseps)
  Else:
    LOGFACT = 1 / ln(e + NumIncreaseDseps)
• Choose the (instance, feature) pair that maximizes LOGFACT × ERPG
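A direct Python transcription of the weighting rule. The else branch is exactly as the slide states it; if the intent is to penalize purchases that decrease d-separations, the denominator presumably uses the magnitude |NumIncreaseDseps|, but that is an assumption.

```python
import math

def logfact(num_increase_dseps):
    # Weighting factor from the slide: rewards purchases that create
    # more d-separations from the label node.
    if num_increase_dseps >= 0:
        return math.log(math.e + num_increase_dseps)
    # As written on the slide; only well-defined while
    # e + num_increase_dseps > 1 (see the caveat in the lead-in).
    return 1.0 / math.log(math.e + num_increase_dseps)

def weighted_score(erpg, num_increase_dseps):
    # MERPGDSEPW2-style score: pick the pair maximizing this product.
    return logfact(num_increase_dseps) * erpg
```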
Markov Blanket Filter
• Choose the (instance, feature) pair to purchase from the subset of (instance, feature) pairs whose features fall in the Markov blanket of the label node
• Why? Instantiating all of the Markov blanket nodes makes the other nodes d-separated from the label node
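A minimal sketch of the filter's key subroutine: computing the Markov blanket (parents, children, and children's other parents) of the label node from the given BN structure. Node names follow the earlier Asia-style slide; the dictionary encoding is an assumption.

```python
def markov_blanket(node, parents):
    # parents maps each node to the set of its parents in the given BN.
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= parents[child] - {node}   # children's other parents
    return blanket

# Structure from the Bayesian-network slide.
parents = {
    "VisitToAsia": set(), "Smoking": set(),
    "Tuberculosis": {"VisitToAsia"},
    "LungCancer": {"Smoking"}, "Bronchitis": {"Smoking"},
    "ToB": {"Tuberculosis", "LungCancer"},   # Tuberculosis or Bronchitis
    "XRay": {"ToB"}, "Dyspnea": {"ToB"},
}
print(markov_blanket("ToB", parents))
# {'Tuberculosis', 'LungCancer', 'XRay', 'Dyspnea'}
```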
Example of Markov Blanket

[Figure: a network highlighting the Markov blanket of the label node: its parents, its children, and its children's other parents.]
Experimental Setup
• Chose 5 learning problems whose structures are provided by the Norsys Net Library
• Generated instances for these 5 learning problems from the distributions of the Bayesian networks
• Set the initial distributions (CPTs) of these learning problems to be uniform
Experimental Setup
• Tested Random, Round Robin, Biased Robin, Single Feature Look-ahead, MERPG, MERPGDSEP, MERPGDSEPW1, and MERPGDSEPW2 on naïve Bayes, on a Bayesian network, and on a Bayesian network with the Markov blanket filter
• Used 10-fold cross-validation
• Set the budget for each learning problem to the number of purchases at which the best algorithm approximately reaches its baseline
Research Question 1: How much improvement can we get by changing the base learner from naïve Bayes to a Bayesian network?

Answer: In many cases there is a significant improvement.
After 100 purchases, MERPG (BN) significantly outperforms MERPG (NB)

[Figure: results plot not recoverable from the extraction.]
Research Question 2: Does the Markov blanket filter help?

Answer: Yes, in many cases there is a significant improvement.
Research Question 3: Which method performs best?

Answer: MERPGDSEP with a Markov blanket filter on a Bayesian network (a.k.a. MBdsep) performs best, judged by counting the number of significant wins and losses of each algorithm against every other algorithm on each learning problem via the Wilcoxon signed rank test.
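For reference, a sketch of this kind of significance check using SciPy's built-in test; the fold accuracies below are made-up illustrative numbers, not results from the dissertation.

```python
from scipy.stats import wilcoxon

# Paired accuracies of two algorithms over 10 cross-validation folds
# (illustrative numbers only).
algo_a = [0.91, 0.88, 0.90, 0.93, 0.89, 0.92, 0.90, 0.91, 0.87, 0.92]
algo_b = [0.85, 0.86, 0.88, 0.90, 0.84, 0.89, 0.87, 0.88, 0.85, 0.90]

stat, p_value = wilcoxon(algo_a, algo_b)
if p_value < 0.05:
    # Count this comparison as a significant win for whichever algorithm
    # has the higher accuracies.
    winner = "algo_a" if sum(algo_a) > sum(algo_b) else "algo_b"
    print(f"significant win for {winner} (p={p_value:.4f})")
```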
Overall, MBdsep performs the best

[Figure: win/loss summary not recoverable from the extraction.]
Main Conclusions of Our BLBN Algorithms
• We ran experiments on 5 learning problems from the Norsys Net Library
1) Learning a BN outperforms learning an NB
2) The Markov blanket filter does help
3) MERPGDSEP on a Bayesian network with the Markov blanket filter performs best
Publications
• “Active Learning from Multiple Noisy Labelers with Varied Costs,” in Proceedings of the Tenth IEEE International Conference on Data Mining, pages 639-648, December 2010
• “Bandit-Based Algorithms for Budgeted Learning,” in Proceedings of the Seventh IEEE International Conference on Data Mining, October 2007
Accepted and Submitted Papers
• “New Algorithms for Budgeted Learning” was accepted with revisions by Machine Learning
• “Active Learning from Multiple Noisy Labelers with Varied Costs” was submitted to Machine Learning in July 2011 and is now under review
Examples of d-separations in BNs

[Figure: two small networks. In each, once V is instantiated, A becomes d-separated from the label node. A is d-separated from the label => purchasing A in the future would not add information about the label.]
Examples of d-separations in BNs

[Figure: a network in which V lies between the label node and A, with descendants C, D, E, F, and H. When neither V nor any of V's descendants is instantiated, A is d-separated from the label node.]


Editor's Notes

  1. Thank you for being here. My name is Yaling Zheng. Today I am going to present my dissertation, “Machine Learning with Incomplete Information.”
  2. Machine learning is programming the computer to learn from examples. Machine learning programs detect patterns in data and adjust program actions accordingly. For example, Facebook's News Feed changes according to the user's personal interactions with other users: if a user frequently tags a friend in photos, writes on his wall, or "likes" his links, the News Feed will show more of that friend's activity due to presumed closeness. Here are some machine learning applications: face detection, email spam filtering, credit card fraud detection, recommendation of books, advertising, and medical diagnosis. Many applications have incomplete data. For example, in email spam filtering, the labels of the emails have to be purchased at a cost; in medical diagnosis, a patient's diagnostic results have to be purchased at a cost. We study machine learning in this setting, that is, machine learning with incomplete information.
  3. In this presentation, I will first talk about machine learning and its applications. Then I will talk about machine learning with incomplete information, including active learning and budgeted learning; I will give examples of each and discuss the motivations of our work. After that, I will present our work on active learning and budgeted learning.
  4. Machine learning studies the design of computer algorithms that derive general patterns, regularities, and rules from training data. Given labeled training data, for example, cancer patient descriptions with examination results, a machine learning algorithm generates a patient cancer subtype classifier. The advantage of the built classifier is that, for any new cancer patient description, it can predict a cancer subtype for the patient.
  5. Machine learning algorithms have a wide range of applications. Besides medical diagnosis, they are also applicable to news article topic spotting, which is to categorize news articles into different subjects; email spam filtering, which is to identify whether a given email is spam or not; face detection, which is to identify the presence of a face in an image; and lots more.
  6. We can use various machine learning algorithms to learn a classifier, for example, decision trees, Bayesian networks, support vector machines, logistic regression, and so on. These machine learning algorithms work well when the training data is complete. If the labels and/or features of the training data are not given, these algorithms will learn nothing. What if we must pay for labels and/or features of the training data? We call it active learning if the goal is to learn a good classifier/hypothesis at as little cost as possible. We call it budgeted learning if the goal is to learn as good a classifier/hypothesis as possible within the given budget.
  7. Here is an active learning example from the Amazon Mechanical Turk website. For this scene labeling problem, the labels of the scenes are purchased from paid workers. The scenes can be categorized as man-made or natural. An active learning algorithm needs to choose which scenes to label so that it can learn a classifier at as little cost as possible. For this example, it may choose the scenes that are hard to categorize as natural or man-made, for example, the five-fingers scene.
  8. Next I give an example of budgeted learning. A project was allocated $2 million to develop a diagnostic classifier for patient cancer subtypes. In this study, a pool of patients with known cancer subtypes was available, as were various diagnostic tests that could be performed, each with a diagnostic cost. Each test is expensive and there is an overall budget of $2 million, so a budgeted learning algorithm needs to carefully choose which patients undergo which diagnostic tests so that it can learn as good a cancer subtype classifier as it can under the given $2 million budget. It may, for example, choose the diagnostic tests that are more related to the cancer subtype more frequently than other tests.
  9. So far I have given examples of active learning and budgeted learning. Now I am going to talk about the motivation of our work in active learning. Many active learning algorithms assume a single, perfect labeler, and therefore concentrate only on instance selection. However, human labelers and experts make mistakes, and multiple labelers exist. For example, the scene labeling on the Amazon Mechanical Turk website allows multiple paid workers to label the given pictures; because anyone can be a paid worker, their accuracy cannot be guaranteed. With multiple human labelers, some may ask for higher pay and provide higher-accuracy answers, while others ask for lower pay and provide lower-accuracy answers. Which labelers should we choose? Also, after we choose labelers to label one instance, we will have multiple answers; how do we determine the ground truth for the instance? To solve these problems, we proposed algorithms that shift focus from instance selection in classical active learning to labeler selection and ground truth estimation.
  10. So far I have explained the motivation of our work in active learning. Next, I present the motivation of our work in budgeted learning under the setting that feature values have to be purchased at a cost. Many BL results assume an overly simplistic probabilistic data model, such as a naïve Bayes net, which assumes that the features of the training data are independent of each other. In reality, it is highly likely that the features are correlated with each other. What if we know the dependencies among the features? Can we learn a more accurate classifier? Can we devise algorithms that exploit the known dependencies? To answer these questions, we adapted existing algorithms to Bayesian networks and proposed new algorithms for budgeted learning of Bayesian networks. Note that Li et al. recently studied learning a Bayesian network under the setting that both the labels and the features need to be purchased (“Budgeted Distribution Learning of Belief Net Parameters,” ICML 2010); see also Dan Lizotte, “Budgeted Learning of Naive Bayes Classifiers,” MSc thesis, University of Alberta, September 2003.
  11. In the following, I will briefly talk about our work on active learning from multiple noisy labelers with varied costs.
  12. In this setting, as in the classical active learning setting, we are given a training set: a set of instances whose labels are available at a cost. Unlike classical active learning, we have a set of labelers, each with a known cost and an unknown accuracy. The goal is the same as in active learning: to learn a hypothesis that generalizes well while spending as little as possible on queries to labelers.
  13. For ALML, we proposed two algorithms, IEAdjCost and wIEAdjCost, which rank labelers based on their adjusted costs and give each chosen labeler a voting weight according to its estimated accuracy. [Answer for how much time is saved: see the paper; for the kr-vs-kp data set, up to 92% of the time.]
  14. In the following, I will briefly talk about our budgeted learning algorithms on naïve Bayes.
  15. For budgeted learning under the setting that the labels of the training data are given but the features are available at a cost, our contributions are as follows. First, we proposed … Second, we proposed … Third, we proposed …
  16. In the following, I will present our work on budgeted learning of Bayesian networks, BLBN.
  17. First, I introduce Bayesian networkWhat is a Bayesian network?It is directed acyclic graph with a joint probability distributionthat satisfies the Markov condition.That is, every variable in this graph is conditionally independent of the set of all its non-descendants given the set of all its parents. For example, node ``Tuberculosis or Bronchitis’’, given its parent Tuberculosis and Lung Cancer, it is independent from ``Visit to Aisa’’, ``Smoking’’, and ``Bronchitis’’In a BN, an arrow from node A to node B indicates A causes B, A partially causes B, B is an imperfect observation of A, A and B are functionally related, or A and B are statistically correlated.Each node has an associated conditional probability table in which every line indicates the probability of this node to be a value given its parent nodes’ values. [ignore …. Each node is either a feature node or a label node. ]
  18. A naïve Bayesian classifier can be viewed as a Bayesian network with a simple structure that has the label node as the parent node of all the feature nodes. When the label node is instantiated, all the features nodes are independent from each other.
  19. We study budgeted learning of Bayesian network under the setting thatThe labels of the training data are known while the features of the training data have to be purchased, subject to an overall budgetThe structure of the Bayesian network is given however the parameters of the Bayesian networks are unknownAnd the goal of BLBN is to learn a good Bayesian network as it can using given budget.
  20. There are two related works that have been done on learning the parameters of the Bayesian networkOne is Active Learning of Bayesian network under the setting that the features of the training data are given while the labels of the training data are available at a cost by Tong and Koller 2011The other is budgeted learning of Bayesian networks under the setting both the labels and features of the training data have to be purchased by Li et al. 2010. In both of these works, the choice of the (instance, feature) pairs is based on learned parameters of the BN. ***********************************************************************************So far we only have 1 paper by Li et al. published in International Conference of machine learning last year. The authors studied budgeted learning of Bayesian network in the setting that both the labels and the features has unknown values and need to be purchased.
  21. Compared to these two related works,Our algorithms of budgeted learning on Bayesian network studies budgeted learning of Bayesian network under the setting The labels of the training data are known while the features of the training data have to be purchased The choice of (instance, feature) pairs of our algorithms is based on the learned BN, the known labels, and the structure of the BNs.
22. Our first result is to use BNs to provide improved probabilistic models to existing BL algorithms, which typically use naïve Bayes, so that the adapted algorithms might learn a more accurate classifier, and the improved probabilistic model might improve the accuracy of the objective functions that guide the choice of (instance, feature) pairs.
23. Besides adapting existing algorithms to Bayesian networks, we proposed new algorithms that take advantage of the BN, the known class labels, and the structure of the BN to choose (instance, feature) pairs. The basis for our new algorithms is as follows. First, xxx. Second, xxx. Third, xxx.
24. Our first result is to adapt existing BL algorithms from naïve Bayes to Bayesian networks. In the following, I show how we adapt Biased Robin to Bayesian networks. Here the log loss is minus the sum, over all instances, of the log-probability of predicting the correct label:

L(w) = -\sum_k \log P[y_k \mid f(x_k; w)] = -\sum_k \log f_{y_k}(x_k; w)

where f(x_k; w) is a parametric function; this loss is known as the cross-entropy between f(x_k; w) and y_k. Since log loss is based on probabilistic estimates, would a Bayesian network improve performance? I will answer this question later. We also adapted Single Feature Look-Ahead and other similar algorithms, whose objective functions likewise depend on probabilistic estimates. (BR and GRSFL are general budgeted learning algorithms from related work.)
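[Aside: a small Python sketch of the log loss just defined, assuming a model object with a hypothetical predict_proba method that returns a mapping from labels to probabilities.]

import math

def log_loss(model, instances, labels):
    # Minus the sum over instances of the log-probability that the
    # model assigns to the correct label (the cross-entropy above).
    return -sum(math.log(model.predict_proba(x)[y])
                for x, y in zip(instances, labels))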
25. So far I have talked about adapting existing BL algorithms to Bayesian networks. In the following, I am going to talk about our BLBN algorithm MERPG. MERPG chooses the (instance, feature) pair that maximizes the expected relative probability gain (ERPG). For example, compare purchasing (x1, F) with purchasing (x2, E): the expected probability of predicting the correct label improves by .05 in both cases, but relative to the original probability, the first is the bigger relative improvement, so (x1, F) is the pair we choose. Here we show how we compute the expected probability gain of purchasing (x1, F). Assume the node F can be true or false. Consider the probability of predicting the correct label for x1 when F is true, and likewise when F is false, and sum the two results, each weighted by the probability of that outcome of F.
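[Aside: a sketch of the ERPG computation described above, assuming a hypothetical bn inference object: bn.prob gives the probability of the correct label given evidence, bn.prob_feature gives the predictive probability of a feature value, and bn.domain lists a feature's possible values.]

def expected_relative_probability_gain(bn, evidence, feature, label):
    # Probability of the correct label before the purchase.
    p_before = bn.prob(label, evidence)
    # Expectation over the possible outcomes of the purchased feature.
    p_after = 0.0
    for value in bn.domain(feature):          # e.g. {True, False}
        p_value = bn.prob_feature(feature, value, evidence)
        p_after += p_value * bn.prob(label, {**evidence, feature: value})
    return (p_after - p_before) / p_before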
26. We also proposed MERPGDSEP, which breaks ties of MERPG by choosing the purchase that leads to the maximum increase in the number of d-separations from the label node. In the following, I am going to explain d-separation.
27. Here I show an example of computing the number of increased d-separations. Generally speaking, a node is d-separated from another node if they are independent of each other. In this example, Tuberculosis or Bronchitis is the label node. Before the instantiation of Lung Cancer, how many nodes are d-separated from the label node? None. After the instantiation of Lung Cancer, how many nodes are d-separated from the label node?
28. Besides making NumIncreaseDseps a tie breaker for MERPG, we also proposed using NumIncreaseDseps as a weighting factor combined with MERPG in a linear and a logarithmic way; we call these MERPGDSEPW1 and MERPGDSEPW2. For example, for MERPGDSEPW2, the weighting factor is defined as follows, and the algorithm chooses the (instance, feature) pair that maximizes the weighting factor times its ERPG value.
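[Aside: the exact weighting factors appear on the slide, not in this transcript, so the linear and logarithmic shapes below are only hypothetical forms consistent with the names MERPGDSEPW1 and MERPGDSEPW2; the final rule, score = weight × ERPG, is the one stated above.]

import math

def weight_linear(num_increased_dseps):
    return 1 + num_increased_dseps                 # ASSUMED linear form

def weight_log(num_increased_dseps):
    return 1 + math.log(1 + num_increased_dseps)   # ASSUMED logarithmic form

def weighted_score(erpg, num_increased_dseps, weight=weight_log):
    # Choose the purchase maximizing this product (stated on the slide).
    return weight(num_increased_dseps) * erpg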
29. We also ran our algorithms with a Markov blanket filter, by choosing only (instance, feature) pairs whose feature node lies in the Markov blanket of the label node.
30. Here is an example of a Markov blanket. For label node A, its Markov blanket includes its parents, its children, and its children’s other parents. The instantiation of A’s Markov blanket nodes makes the remaining nodes d-separated from node A.
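[Aside: a minimal Python sketch of the Markov blanket definition on this slide, for a DAG stored as a mapping from each node to the list of its parents.]

def markov_blanket(dag, node):
    # Parents, children, and the children's other parents ("spouses").
    parents = set(dag.get(node, []))
    children = {n for n, ps in dag.items() if node in ps}
    spouses = {p for c in children for p in dag[c]} - {node}
    return parents | children | spouses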
31. [So far I have introduced adapting existing algorithms, such as Biased Robin and Single Feature Look-Ahead, to Bayesian networks, and I have explained our algorithms MERPG, MERPGDSEP, MERPGDSEPW1, and MERPGDSEPW2. In the following, I will show our experimental setup and results.] To compare our algorithms with existing algorithms, we set up the experiments as follows.
32. So far I have introduced our experimental setup. In the following, I am going to answer several research questions.
33. Research Question 1: Does using a Bayesian network instead of naïve Bayes help? The answer is yes. For example, for the learning problem Car Diagnosis 2, we plotted the mean classification error of MERPG (BN) and MERPG (NB) against the number of purchases. After one hundred purchases, MERPG (BN) (the green curve) significantly outperforms MERPG (NB) (the blue curve).
34. Research Question 2: Does the Markov blanket filter help? The answer is yes; in many cases there is a significant improvement. For example, for the learning problem ALARM, we plotted the mean classification error of MBmerpg (BN) and MERPG (BN) against the number of purchases. After one hundred purchases, MBmerpg (BN) (the red curve) significantly outperforms MERPG (BN) (the green curve).
35. Research Question 3: Which method performs the best? The answer: MERPGDSEP with a Markov blanket filter on a Bayesian network (also called MBdsep) performs the best, as determined by counting the number of significant wins and losses of each algorithm against every other algorithm on each learning problem via the Wilcoxon signed-rank test.
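[Aside: an illustration of the counting protocol just described, using scipy.stats.wilcoxon on paired error rates; how runs are paired in the thesis’ experiments may differ from this sketch.]

from scipy.stats import wilcoxon

def significant_win(errors_a, errors_b, alpha=0.05):
    # +1 if algorithm A has significantly lower error than B,
    # -1 if significantly higher, 0 if the difference is not significant.
    stat, p = wilcoxon(errors_a, errors_b)
    if p >= alpha:
        return 0
    return 1 if sum(errors_a) < sum(errors_b) else -1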
36. For example, comparing MBdsep with the other Markov-blanket-filtered algorithms on the learning problem ALARM, we plotted the mean classification error of these algorithms against the number of purchases. After one hundred purchases, MBdsep (BN) (the pink curve) performs the best.
37. In summary, we ran experiments on 5 learning problems from the Norsys Net Library. The main conclusions of our BLBN algorithms are as follows:
38. So far I have finished talking about our work on BLBN. In the following, I am going to show our publications. We published a paper on active learning from multiple noisy labelers in ICDM 2010, and a paper on budgeted learning in ICDM 2007. We added new algorithms to both topics and submitted the extended versions to Machine Learning.
39. Our paper on budgeted learning was accepted with minor revisions, and the other paper, on active learning, is under review.
40. Thank you for your time and attention. I welcome your questions.
41. Generally speaking, two nodes are d-separated if they are independent of each other. Here are some examples where the instantiation of a node changes the relationship between other nodes. In the first and second examples, if V is instantiated, then A becomes d-separated from the label node. And if A is independent of the label, purchasing A in the future would not affect the label.
42. In the third example, if V and all its descendants C, D, E, F, H are NOT instantiated, A is d-separated from the label node; otherwise, A is connected to the label node. Therefore, the instantiation of a node can also decrease the number of d-separations from the label node.
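[Aside: d-separation can be tested mechanically with the standard moralized-ancestral-graph criterion; the sketch below is a generic Python implementation of that criterion, not the thesis’ code. The DAG is a mapping from each node to the list of its parents.]

from itertools import combinations

def ancestors(dag, nodes):
    # The given nodes together with all of their ancestors.
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(dag.get(n, []))
    return result

def d_separated(dag, x, y, given):
    # 1. Restrict to the ancestral subgraph of {x, y} and the evidence.
    keep = ancestors(dag, {x, y} | set(given))
    # 2. Moralize: connect each node to its parents and marry co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        parents = [p for p in dag.get(child, []) if p in keep]
        for p in parents:
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(parents, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Delete the evidence nodes and test whether x can still reach y.
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        if n == y:
            return False            # connected, hence NOT d-separated
        for m in adj[n]:
            if m not in seen and m not in set(given):
                seen.add(m); stack.append(m)
    return True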