Zheng defense1129

Machine Learning with Incomplete Information


Published in: Technology, Education

Speaker notes
  • Thank you for being here. My name is Yaling Zheng. Today I am going to present my dissertation, "Machine Learning with Incomplete Information."
  • Machine learning means programming the computer to learn from examples. Machine learning programs detect patterns in data and adjust program actions accordingly. For example, Facebook's News Feed changes according to the user's personal interactions with other users: if a user frequently tags a friend in photos, writes on his wall, or "likes" his links, the News Feed will show more of that friend's activity due to presumed closeness. Here are some machine learning applications: face detection, email spam filtering, credit card fraud detection, recommendation of books, advertising, and medical diagnosis. Many applications have incomplete data. For example, in email spam filtering, the labels of emails have to be purchased at a cost. In medical diagnosis, a patient's diagnostic results have to be purchased at a cost. We study this setting: machine learning with incomplete information.
  • In this presentation, I will first talk about machine learning and its applications. Then I will talk about machine learning with incomplete information, including active learning and budgeted learning; I will give examples of each and discuss the motivations for our work. After that, I will present our work on active learning and budgeted learning.
  • Machine learning studies the design of computer algorithms that derive general patterns, regularities, and rules from training data. Given labeled training data, for example cancer patient descriptions with examination results, a machine learning algorithm generates a patient cancer subtype classifier. The advantage of the built classifier is that, for any new cancer patient description, it can predict a cancer subtype for that patient.
  • Machine learning algorithms have a wide range of applications. Besides medical diagnosis, they are also applicable to news article topic spotting, which is to categorize news articles into different subjects; email spam filtering, which is to identify whether a given email is spam or not; and face detection, which is to identify the presence of a face in an image. And lots more.
  • We can use various machine learning algorithms to learn a classifier, for example decision trees, Bayesian networks, support vector machines, logistic regression, and so on. These machine learning algorithms work well when the training data is complete. If the labels and/or features of the training data are not given, they learn nothing. What if we must pay for labels and/or features of the training data? We call it active learning if the goal is to learn a good classifier/hypothesis using as little cost as possible. We call it budgeted learning if the goal is to learn as good a classifier/hypothesis as possible using a given budget.
  • Here is an active learning example from the Amazon Mechanical Turk website. For this scene labeling problem, the labels of the scenes are purchased from paid workers. The scenes can be categorized as man-made or natural. An active learning algorithm needs to choose which scenes to label so that it can learn a classifier using as little cost as possible. For this example, an active learning algorithm may choose those scenes that are hard to categorize as natural or man-made, for example the five-fingers scene.
  • Next I give an example of budgeted learning. A project was allocated $2 million to develop a diagnostic classifier for patient cancer subtypes. In this study, a pool of patients with known cancer subtypes was available, as were various diagnostic tests that could be performed, each with a diagnostic cost. Each test is expensive and there is an overall budget of $2 million, so a budgeted learning algorithm needs to carefully choose which diagnostic tests to perform on which patients so that it can learn as good a cancer subtype classifier as possible under the given $2 million budget. A budgeted learning algorithm may choose those diagnostic tests that are more related to the cancer subtype more frequently than other tests.
  • So far I have given examples of active learning and budgeted learning. Now I am going to talk about the motivation of our work in active learning. Many active learning algorithms assume a single, perfect labeler to label an instance; therefore, these algorithms concentrate only on instance selection. However, human labelers and experts make mistakes, and multiple labelers may exist. For example, the scene labeling task on Amazon Mechanical Turk allows multiple paid workers to label the given pictures. Because anyone can be a paid worker, their accuracy cannot be guaranteed. With multiple human labelers, some labelers may ask for higher pay and provide higher-accuracy answers, while others ask for lower pay and provide lower-accuracy answers. Which labelers should we choose? Also, after we choose labelers to label one instance, we will have multiple answers; how do we determine the ground truth for the instance? To solve these problems, we proposed algorithms that shift focus from instance selection in classical active learning to labeler selection and ground truth estimation. [Dataset details, if asked: labeler accuracies range from 0.50 to 0.95 for RTE (40 labelers who labeled all 800 instances) and from 0.44 to 0.98 for TEMP (31 labelers who labeled all 462 instances). Each instance in RTE is a sentence pair and the annotators decide whether the second sentence can be inferred from the first, answering "true" or "false"; the original RTE data set has 800 instances and 165 annotators. Each instance in TEMP is a short article describing two events, and the annotators judge which of the two events happens first.]
  • So far I have explained the motivation of our work in active learning. In the following, I present the motivation of our work in budgeted learning, under the setting where feature values have to be purchased at a cost. Many budgeted learning results assume an overly simplistic probabilistic data model, such as a naïve Bayes net, which assumes that the features of the training data are conditionally independent of each other given the label. In reality, it is highly possible that the features are correlated with each other. What if we know the dependencies among the features? Can we learn a more accurate classifier? Can we design algorithms that exploit the known dependencies of the features? To answer these questions, we adapted existing algorithms to Bayesian networks and also proposed new algorithms for budgeted learning on Bayesian networks. [Related work, if asked: Li et al. recently studied learning a Bayesian network under the setting that both labels and features are unknown: L. Li, B. Poczos, C. Szepesvari, and R. Greiner, "Budgeted Distribution Learning of Belief Net Parameters," ICML 2010. Also: Dan Lizotte, "Budgeted Learning of Naive Bayes Classifiers," MSc thesis, University of Alberta, September 2003.]
  • In the following, I will briefly talk about our work on active learning from multiple noisy labelers with varied costs.
  • In this setting, as in classical active learning, we are given a training set: a set of instances whose labels are available at a cost. Different from classical active learning, we have a set of labelers, each with a known cost and an unknown accuracy. The goal is the same as in active learning: to learn a hypothesis that generalizes well while spending as little cost as possible on queries to labelers.
  • For ALML, we proposed two algorithms, IEAdjCost and wIEAdjCost, which rank labelers based on their adjusted costs and give each chosen labeler a weight for voting according to its estimated accuracy, as sketched below. [If asked how much time is saved: see the paper; for the data set kr-vs-kp, up to 92% of the time.]
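The exact adjusted-cost multiplier and vote-weighting scheme are defined in the papers; the sketch below only illustrates the general idea, with a hypothetical multiplier (1 / estimated accuracy) and accuracy-weighted voting standing in for the actual IEAdjCost/wIEAdjCost formulas.

```python
from collections import defaultdict

def adjusted_cost(cost, est_accuracy):
    # Hypothetical multiplier: labelers estimated to be more accurate get a
    # smaller effective cost (the real multiplier is defined in Zheng et al. 2010).
    return cost / max(est_accuracy, 1e-6)

def rank_labelers(labelers):
    # labelers: list of dicts with keys 'name', 'cost', 'est_accuracy'.
    return sorted(labelers, key=lambda o: adjusted_cost(o["cost"], o["est_accuracy"]))

def weighted_vote(answers):
    # answers: (label, est_accuracy) pairs from the chosen labelers; each vote
    # is weighted by the labeler's estimated accuracy.
    scores = defaultdict(float)
    for label, acc in answers:
        scores[label] += acc
    return max(scores, key=scores.get)

labelers = [
    {"name": "o1", "cost": 0.10, "est_accuracy": 0.95},
    {"name": "o2", "cost": 0.02, "est_accuracy": 0.60},
    {"name": "o3", "cost": 0.05, "est_accuracy": 0.85},
]
print([o["name"] for o in rank_labelers(labelers)])   # cheapest adjusted cost first
print(weighted_vote([("true", 0.95), ("false", 0.60), ("true", 0.85)]))  # "true"
```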
  • In the following, I will briefly talk about our work on budgeted learning algorithms for naïve Bayes.
  • This is budgeted learning under the setting where the labels of the training data are given but the features of the training data are available only at a cost. Our contributions are as follows. First, we proposed Exp3CR, Exp3C, and FEL, adapted from algorithms for the multi-armed bandit problem. Second, we proposed ABR2, RBR2, and WBR2, variations of Biased Robin based on second-order statistics. Third, we proposed row (instance) selectors: Entropy (EN) and Error-Correction (EC).
  • In the following, I will present our work on budgeted learning of Bayesian networks (BLBN).
  • First, I introduce Bayesian networks. What is a Bayesian network? It is a directed acyclic graph with a joint probability distribution that satisfies the Markov condition: every variable in the graph is conditionally independent of the set of all its non-descendants given the set of all its parents. For example, the node "Tuberculosis or Bronchitis", given its parents Tuberculosis and Lung Cancer, is independent of "Visit to Asia", "Smoking", and "Bronchitis". In a BN, an arrow from node A to node B indicates that A causes B, A partially causes B, B is an imperfect observation of A, A and B are functionally related, or A and B are statistically correlated. Each node has an associated conditional probability table in which every line gives the probability of this node taking a value given its parent nodes' values, as in the sketch below.
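To make the conditional probability tables concrete, here is a minimal plain-Python sketch using the CPT values shown on slide 18 (P(S=1) = 0.7, P(B=1|S), and P(ToB=1|T,L)); the dictionary encoding is only for illustration, not the representation used in the dissertation.

```python
# CPT values from the example network on slide 18.
p_smoking = 0.7                          # P(Smoking = 1)
p_bronchitis_given_s = {0: 0.3, 1: 0.7}  # P(Bronchitis = 1 | Smoking)
p_tob_given_t_l = {                      # P(TubOrBronchitis = 1 | Tuberculosis, LungCancer)
    (0, 0): 0.1,
    (0, 1): 0.3,
    (1, 0): 0.5,
    (1, 1): 0.8,
}

def prob_tob(tob, tuberculosis, lung_cancer):
    """Look up P(ToB = tob | Tuberculosis, LungCancer) in the CPT."""
    p_one = p_tob_given_t_l[(tuberculosis, lung_cancer)]
    return p_one if tob == 1 else 1.0 - p_one

# Markov condition: given its parents T and L, ToB does not depend on
# Visit to Asia, Smoking, or Bronchitis, so only (T, L) index its CPT.
print(prob_tob(1, tuberculosis=1, lung_cancer=0))  # 0.5
```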
  • A naïve Bayes classifier can be viewed as a Bayesian network with a simple structure that has the label node as the parent of all the feature nodes. When the label node is instantiated, all the feature nodes are conditionally independent of each other.
  • We study budgeted learning of Bayesian networks under the setting that the labels of the training data are known while the features of the training data have to be purchased, subject to an overall budget, and the structure of the Bayesian network is given while the parameters of the Bayesian network are unknown. The goal of BLBN is to learn as good a Bayesian network as possible using the given budget.
  • There are two related works on learning the parameters of a Bayesian network. One is active learning of Bayesian networks under the setting that the features of the training data are given while the labels are available at a cost, by Tong and Koller 2001. The other is budgeted learning of Bayesian networks under the setting that both the labels and the features of the training data have to be purchased, by Li et al. 2010. In both of these works, the choice of (instance, feature) pairs is based on the learned parameters of the BN.
  • Compared to these two related works, our algorithms study budgeted learning of Bayesian networks under the setting that the labels of the training data are known while the features have to be purchased, and the choice of (instance, feature) pairs is based on the learned BN, the known labels, and the structure of the BN.
  • Our first result is to use BNs to provide improved probabilistic models to existing BL algorithms that typically use naïve Bayes, so that the adapted algorithms might learn a more accurate classifier, and the improved probabilistic model might improve the accuracy of the objective functions that guide the choice of (instance, feature) pairs.
  • Besides adapting existing algorithms to Bayesian networks, we proposed new algorithms which take advantage of the BN, the known class labels, and the structure of the BN to choose (instance, feature) pairs. The basis for our new algorithms is as follows. First, an (instance, feature) pair that leads to a higher expected relative probability gain (ERPG) leads to a better BN. Second, an (instance, feature) pair that leads to more d-separations from the label node has a bigger influence on other nodes. Third, instantiating the nodes in the Markov blanket of the label node makes the label node independent of the remaining nodes.
  • Our first result is to adapt existing BL algorithms from naïve Bayes to Bayesian networks. In the following, I show how we adapt Biased Robin to a Bayesian network. Here the log loss is the negative of the sum over instances of the log probability of predicting the correct label: L(w) = -∑_k log P[y_k | f(x_k; w)] = -∑_k log f_{y_k}(x_k; w), where f(x_k; w) is a parametric model; this is the cross-entropy between f(x_k; w) and y_k. Log loss is based on probabilistic estimates, so would a Bayesian network improve performance? I will answer this question later. We also adapted Single Feature Look-Ahead and other similar algorithms, whose objective functions also depend on probabilistic estimates. (BR and GRSFL are general budgeted learning algorithms.) A short log-loss sketch follows.
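As a quick illustration of this log loss (not the dissertation's code), the sketch below computes L(w) = -∑_k log P[y_k | x_k] from a classifier's predicted class probabilities; the toy probabilities are made up.

```python
import math

def log_loss(predicted_probs, true_labels):
    """Negative sum of the log probabilities assigned to the correct labels.

    predicted_probs: list of dicts mapping each class label to P(label | x_k).
    true_labels:     list of the correct labels y_k.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(max(probs[y], eps))
                for probs, y in zip(predicted_probs, true_labels))

# Toy example: two instances with binary labels {'+', '-'}.
preds = [{"+": 0.7, "-": 0.3}, {"+": 0.2, "-": 0.8}]
labels = ["+", "-"]
print(log_loss(preds, labels))  # -(ln 0.7 + ln 0.8) ≈ 0.58
```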
  • So far I have talked about adapting existing BL algorithms to Bayesian networks. Next I will talk about our BLBN algorithm MERPG. MERPG chooses the (instance, feature) pair that maximizes the expected relative probability gain. For example, comparing the purchases (x1, F) and (x2, E), the expected probability of predicting the correct label improves by 0.05 for each, but the first starts from a smaller original probability, so it has the bigger relative improvement, and that is the (instance, feature) pair we choose. Here is how we compute the expected probability gain of purchasing (x1, F). Assume node F can be true or false. Consider the case where F is true: what is the probability of predicting the correct label for x1? And the case where F is false: what is the probability of predicting the correct label for x1? Weight each case by its probability and sum the results, as in the sketch below.
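A minimal sketch of this computation, assuming the relative gain is measured as (expected probability − current probability) / current probability; the current and expected probabilities are the toy numbers from the slide, while the conditional probabilities for the two outcomes of F are illustrative placeholders.

```python
def expected_prob(p_f_true, p_correct_if_true, p_correct_if_false):
    # E[P(+ | x1 ∪ F)] = P(F=true|x1) * P(+|x1,F=true) + P(F=false|x1) * P(+|x1,F=false)
    return p_f_true * p_correct_if_true + (1.0 - p_f_true) * p_correct_if_false

def relative_probability_gain(current_p, expected_p):
    # Assumed reading of "relative" gain: improvement divided by the current probability.
    return (expected_p - current_p) / current_p

# (x1, F): current P(+|x1) = 0.7; placeholder outcome probabilities give an expected 0.75.
# (x2, E): current P(+|x2) = 0.9; expected 0.95 after the purchase.
print(relative_probability_gain(0.7, expected_prob(0.5, 0.80, 0.70)))  # ≈ 0.071
print(relative_probability_gain(0.9, 0.95))                            # ≈ 0.056, so (x1, F) is chosen
```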
  • We also proposed MERPGDSEP, which breaks ties in MERPG by choosing the purchase that leads to the maximum increase in the number of d-separations from the label node. In the following, I explain d-separation.
  • Here I show an example of computing the number of increased d-separations. Roughly speaking, a node is d-separated from another node if the graph implies they are independent given the instantiated nodes. In this example, "Tuberculosis or Bronchitis" is the label node. Before the instantiation of Lung Cancer, how many nodes are d-separated from the label node? None. After the instantiation of Lung Cancer, how many nodes are d-separated from the label node? A small sketch of this counting follows.
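Here is a small sketch of counting the nodes d-separated from the label node given an instantiated set, assuming a networkx version that provides nx.d_separated (renamed is_d_separator in newer releases); the edge list approximates the example network and is not the dissertation's implementation.

```python
import networkx as nx

# Approximate structure of the example (Asia-style) network.
G = nx.DiGraph([
    ("VisitToAsia", "Tuberculosis"),
    ("Smoking", "LungCancer"),
    ("Smoking", "Bronchitis"),
    ("Tuberculosis", "TubOrBronchitis"),
    ("LungCancer", "TubOrBronchitis"),
    ("TubOrBronchitis", "XRay"),
    ("TubOrBronchitis", "Dyspnea"),
    ("Bronchitis", "Dyspnea"),
])

def num_dseparated_from_label(graph, label, instantiated):
    """Count non-label, non-instantiated nodes that are d-separated from the label."""
    others = set(graph.nodes) - {label} - set(instantiated)
    return sum(nx.d_separated(graph, {label}, {node}, set(instantiated))
               for node in others)

label = "TubOrBronchitis"
print(num_dseparated_from_label(G, label, instantiated=set()))           # before instantiating Lung Cancer
print(num_dseparated_from_label(G, label, instantiated={"LungCancer"}))  # after instantiating Lung Cancer
```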
  • Besides using NumIncreaseDseps as a tie breaker for MERPG, we also proposed using NumIncreaseDseps as a weighting factor combined with MERPG, in a linear and in a logarithmic way; we call these MERPGDSEPW1 and MERPGDSEPW2. For example, for MERPGDSEPW2 the weighting factor LOGFACT is defined as on the slide, and the algorithm chooses the (instance, feature) pair that maximizes the weighting factor times its ERPG value; a small sketch follows.
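A minimal sketch of the MERPGDSEPW2 score, implementing LOGFACT exactly as defined on slide 29; how negative values of NumIncreaseDseps should be guarded is not spelled out there, so treat this purely as an illustration.

```python
import math

def logfact(num_increase_dseps):
    # LOGFACT as stated on the slide:
    #   ln(e + NumIncreaseDseps)      if NumIncreaseDseps >= 0
    #   1 / ln(e + NumIncreaseDseps)  otherwise
    if num_increase_dseps >= 0:
        return math.log(math.e + num_increase_dseps)
    return 1.0 / math.log(math.e + num_increase_dseps)

def merpgdsepw2_score(erpg_value, num_increase_dseps):
    # MERPGDSEPW2 ranks purchases by LOGFACT * ERPG-value.
    return logfact(num_increase_dseps) * erpg_value

print(merpgdsepw2_score(0.07, 2))  # a purchase that adds two d-separations
print(merpgdsepw2_score(0.07, 0))  # a purchase that adds none
```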
  • We also ran our algorithms with a Markov blanket filter, choosing (instance, feature) pairs whose features fall in the Markov blanket of the label node.
  • Here is an example of a Markov blanket. For a label node A, its Markov blanket includes its parents, its children, and its children's other parents. Instantiating A's Markov blanket nodes makes the remaining nodes d-separated from node A; a small sketch of computing the blanket follows.
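A minimal sketch of extracting a node's Markov blanket (its parents, children, and the children's other parents) from a directed graph given as an edge list; the example DAG is hypothetical and only for illustration.

```python
def markov_blanket(edges, node):
    """Markov blanket of `node` in a DAG given as (parent, child) edges."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    spouses = {p for p, c in edges if c in children and p != node}  # children's other parents
    return parents | children | spouses

# Hypothetical DAG: A has parent U, children C1 and C2; C2 has another parent V.
edges = [("U", "A"), ("A", "C1"), ("A", "C2"), ("V", "C2"), ("V", "W")]
print(markov_blanket(edges, "A"))  # {'U', 'C1', 'C2', 'V'}
```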
  • [So far I have introduced adapting existing algorithms, such as Biased Robin and Single Feature Look-ahead, to Bayesian networks, and I have explained our algorithms MERPG, MERPGDSEP, MERPGDSEPW1, and MERPGDSEPW2. In the following, I will show our experimental setup and results.] To compare our algorithms with existing algorithms, we set up the experiments as follows.
  • So far I have introduced our experimental setup. In the following, I am going to answer several research questions.
  • For example, for the learning problem Car Diagnosis 2, we plotted the mean classification error of MERPG (BN) and MERPG (NB) against the number of purchases. After one hundred purchases, MERPG (BN) (the green curve) significantly outperforms MERPG (NB) (the blue curve).
  • Research Question 2: does the Markov blanket filter help? The answer is yes; in many cases there is a significant improvement. For example, for the learning problem ALARM, we plotted the mean classification error of MBmerpg (BN) and MERPG (BN) against the number of purchases. After one hundred purchases, MBmerpg (BN) (the red curve) significantly outperforms MERPG (BN) (the green curve).
  • Research Question 3: which method performs the best? The answer is that MERPGDSEP with a Markov blanket filter on a Bayesian network (also called MBdsep) performs the best, judged by counting the number of significant wins and losses of each algorithm against every other algorithm on each learning problem via the Wilcoxon signed rank test.
  • For example, comparing MBdsep with the other algorithms with a Markov blanket filter on the learning problem ALARM, we plotted the mean classification error of these algorithms against the number of purchases. After one hundred purchases, MBdsep (BN) (the pink curve) performs the best.
  • In summary, we have done experiments on 5 learning problems from the Norsys Net Library. The main conclusions of our BLBN algorithms are as follows: learning a BN outperforms learning a naïve Bayes classifier, the Markov blanket filter does help, and MERPGDSEP on a Bayesian network with a Markov blanket filter performs the best.
  • So far I have finished talking about our work on BLBN. In the following, I am going to show our publications. We published a paper on active learning from multiple noisy labelers in ICDM 2010, and we also published a paper on budgeted learning in ICDM 2007. We added new algorithms to these two topics and submitted the extended papers to Machine Learning.
  • Our paper about budgeted learning was accepted with minor revisions, and the other paper about active learning is under review.
  • Thank you for your time and attention. I welcome your questions.
  • Generally speaking, two nodes are d-separated if they are independent of each other given the instantiated nodes. Here are some examples of how the instantiation of a node changes the relationships among other nodes. In the first and second examples, if V is instantiated, then A is d-separated from the label node. And if A is independent of the label, that means purchasing A in the future would not affect the label.
  • In the third example, if V and all its descendants C, D, E, F, H are NOT instantiated, A is d-separated from the label node; otherwise, A is connected to the label node. Therefore, instantiation of a node can also decrease the number of d-separations from the label node.
  • Transcript

    • 1. Machine Learning with Incomplete Information. A Ph.D. Dissertation Defense. Department of Computer Science, University of Nebraska-Lincoln, Nov 2011
    • 2. Machine Learning: program the computer to be able to learn from examples. Many applications have incomplete data.
    • 3. Outline • Machine Learning • Machine Learning with Incomplete Information – Active Learning – Budgeted Learning • Our work of active learning • Our work of budgeted learning
    • 4. Machine Learning • Design of computer algorithms that derive general patterns, regularities, and rules from training data [Diagram: Labeled Training Data (cancer patient descriptions with examination results) → Machine learning algorithm → Patient cancer subtype classifier; New cancer patient description → classifier → Predicted result]
    • 5. Applications of Machine Learning Algorithms • Medical Diagnosis • News Articles Topic Spotting • Email Spam Filtering • Face Detection • Credit Card Fraudulence Detection • Weather Prediction
    • 6. Machine Learning with Incomplete Information • Basic machine learning algorithms: Decision tree, Bayesian network, Support vector machine, Logistic regression, … • What if we must pay for labels and/or features of training data? – Active Learning (learn a good classifier/hypothesis using as little cost as possible) – Budgeted Learning (learn as good classifier/hypothesis as possible using the given budget)
    • 7. Active Learning Example
    • 8. Budgeted Learning Example ($2 million was allocated to develop a diagnostic classifier for cancer subtypes) [Table: columns are Diagnostic Test 1 (per $2000), Diagnostic Test 2 (per $1000), Diagnostic Test 3 (per $500), …, Cancer subtype; each row is a patient with all test results unknown (?) and a known cancer subtype (1, 2, 1, 4, 1, …)]
    • 9. Our Work In Active Learning • Many AL algorithms assume a single, perfect labeler; thus they focus on instance selection – Human labelers make mistakes – multiple labelers can exist (Amazon Mechanical Turk) – labelers can ask for different costs • Our algorithms – Shift focus from instance selection to labeler selection and ground truth estimation
    • 10. Our Work In Budgeted Learning • Many BL results assume overly simplistic probabilistic data model, such as naïve Bayes • In reality, it is possible that the features of the training data are correlated with each other • What if we know the dependencies of the features? – Can we learn a more accurate classifier? – Can we exploit the Bayesian network and the known labels?
    • 11. Active Learning from Multiple Noisy Labelers with Varied Costs (ALML)
    • 12. Active Learning from Multiple Noisy Labelers with Varied Costs • A training set { (x1, y1), …, (xn, yn) } • {x1, …, xn} = X is the set of instances • {y1, …, yn} = Y is the set of unknown labels of instances X • O = {o1, …, oN} is a set of labelers • ci is the cost of paying oi to label an instance • ai is the unknown accuracy of oi • Goal: Learn a hypothesis that generalizes well while spending as little cost as possible on queries to labelers
    • 13. Contributions for ALML • Proposed two algorithms IEAdjCost [Zheng et al. 2010] and wIEAdjCost – Rank labelers based on their adjusted costs = cost * multiplier (a value decided by labeler's accuracy) – Give each chosen labeler a weight for voting according to their estimated accuracies • These two algorithms significantly outperform existing algorithms IEThresh [Donmez et al. 2009] and Repeated
    • 14. Budgeted Learning (BL) Of Naïve Bayes
    • 15. Budgeted Learning – Labels of the training data are given – Features of the training data are available at a cost • Contributions: – Exp3CR, Exp3C, FEL (adapted from algorithms for multi-arm bandit problem) – ABR2, RBR2, WBR2 (variations of Biased Robin based on second-order statistics) – Row (Instance) selectors: Entropy (EN), Error-Correction (EC)
    • 16. Conclusions for Our BL algorithms • Compared to BR [Lizotte et al. 2003], Random, and RSFL [Kapoor and Greiner 2005]: ABR2 with all row selectors, WBR2 and Exp3C with EC row selector, and FEL with UR (uniform random) row selector perform well • EC row selector stands out for Random, BR, ABR2, and WBR2 • EN row selector stands out for RBR2 and Exp3C
    • 17. Budgeted Learning of Bayesian Networks (BLBN)
    • 18. Bayesian Networks [Diagram of an Asia-style network with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Bronchitis, X-Ray Result, and Dyspnea. CPTs shown: P(S=1) = 0.7; P(B=1|S): 0.3 for S=0, 0.7 for S=1; P(ToB=1|T,L): 0.1, 0.3, 0.5, 0.8 for (T,L) = (0,0), (0,1), (1,0), (1,1)]
    • 19. Naïve Bayes • A NB classifier can be viewed as a Bayesian network with a simple structure that has the label node as the parent node of all other feature nodes [Diagram: label node "Tuberculosis or Cancer" with children Visit to Asia, Tuberculosis, …, Dyspnea]
    • 20. Our Work • Budgeted Learning of Bayesian networks under the setting that – the labels of the training data are known while the features of the training data have to be purchased, subject to an overall budget – the structure of the Bayesian network is given however the parameters of the Bayesian networks are unknown • The goal is to learn as good BN as possible under the given budget
    • 21. Related Work • Active Learning of Bayesian networks [Tong and Koller 2001] – the features of the training data are given while the labels of the training data are available at a cost • Budgeted distribution learning of Bayesian networks [Li et al. 2010] – both the labels and features of the training data have to be purchased • The choice of (instance, feature) pairs is based on learned parameters of the BN
    • 22. Our Work • Budgeted Learning on Bayesian networks – the labels of the training data are known while the features of the training data have to be purchased, subject to an overall budget • The choice of (instance, feature) pairs is based on learned BN, the known labels, and the structure of the BN
    • 23. Adapting Existing Algorithms • Use BNs to provide improved probabilistic models to existing BL algorithms that typically use naïve Bayes – The adapted algorithms might learn a more accurate classifier – Improved probabilistic model might improve accuracies of objective functions to help the choice of (instance, feature) pairs
    • 24. New Algorithms • Take advantage of the BN, the known class label, and the structure of the BN • Basis for new algorithms – (instance, feature) pair that leads to higher expected relative probability gain (ERPG) leads to a better BN – (instance, feature) pair that leads to more d-separations from the label node has a bigger influence on other nodes – Instantiation of nodes in the Markov blanket of the label node makes the label node independent from the remaining nodes
    • 25. BLBN: Adapting Biased Robin
    • 26. New BLBN Algorithm: MERPG
      [Table of instances with features A–F, the label, P(+ | xi), and E(P(+ | xi ∪ ?)):
        x1: A=0, B=3, C=0, D=1, E=T, F=?,    label +, P(+|x1) = 0.7, E(P(+|x1 ∪ ?)) = 0.75
        x2: A=?, B=6, C=1, D=0, E=?, F=true, label +, P(+|x2) = 0.9, E(P(+|x2 ∪ ?)) = 0.95
        x3: A=?, B=?, C=3, D=1, E=?, F=?,    label -]
      Choose the (instance, feature) pair that maximizes expected relative probability gain.
      Assume F can be true or false:
      E(P(+ | x1 ∪ ?)) = P(F=true | x1) * P(+ | x1, F=true) + P(F=false | x1) * P(+ | x1, F=false)
    • 27. MERPGDSEP • MERPGDSEP breaks ties of MERPG by choosing the purchase that leads to maximum increase of d-separations from the label node (NumIncreaseDseps) • Why? • Because instantiation of those d-separated nodes in the future will not affect the label node; therefore, the bigger NumIncreaseDseps is, the bigger the influence of this purchase on other nodes
    • 28. Example of d-Separation (NumIncreaseDseps) [Diagram of the example network: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Bronchitis (label node), X-Ray Result, Dyspnea]
    • 29. Making NumIncreaseDseps a weighting factor combined with MERPG (MERPGDSEPW1 and MERPGDSEPW2) • Define weighting factor LOGFACT as follows: if number of increased d-separations >= 0, LOGFACT = ln(e + NumIncreaseDseps); else LOGFACT = 1/ln(e + NumIncreaseDseps) • Choose the (instance, feature) pair that maximizes LOGFACT * ERPG-value
    • 30. Markov Blanket Filter • Choosing the (instance, feature) pair to purchase from the subset of (instance, feature) pairs whose features fall in the Markov blanket of the label node • Why? • Because the instantiation of the entire Markov blanket nodes makes the other nodes d-separated from the label node
    • 31. Example of Markov Blanket
    • 32. Experimental Setup • Chose 5 learning problems whose structures are provided by Norsys Net Library • Generated instances for these 5 learning problems based on the distributions of the Bayesian networks • Set the initial distributions (CPTs) of these learning problems to be uniform
    • 33. Experimental Setup • Test Random, Round Robin, Biased Robin, Single Feature Look-ahead, MERPG, MERPGDSEP, MERPGDSEPW1, MERPGDSEPW2 on naïve Bayes, Bayesian network, and Bayesian network with Markov blanket filter • Did 10-fold cross validation • Set the budget for each learning problem as the number of purchases at which the best algorithm approximately reaches its baseline
    • 34. Research Question 1: How much improvement can we get by changing the base learner from Naïve Bayes to Bayesian network? Answer: In many cases there is a significant improvement.
    • 35. After 100 purchases, MERPG (BN) significantly outperforms MERPG (NB)
    • 36. Research Question 2: Does Markov blanket filter help? Answer: Yes, in many cases there is a significant improvement.
    • 37. Research Question 3: Which method performs the best? Answer: MERPGDSEP with a Markov blanket filter on Bayesian network (a.k.a. MBdsep) performs the best (by counting the number of significant wins and losses of the algorithm to any other algorithm on each learning problem via Wilcoxon signed rank test)
    • 38. Overall, MBdsep performs the best
    • 39. Main Conclusions of Our BLBN Algorithms • Done experiments on 5 learning problems from Norsys Net Library 1) Learning a BN outperforms learning a NB 2) Markov blanket filter does help 3) MERPGDSEP on Bayesian network with Markov blanket filter performs the best
    • 40. Publications • "Active Learning from Multiple Noisy Labelers with Varied Costs" in Proceedings of the Tenth IEEE International Conference on Data Mining, pages 639-648, Dec 2010 • "Bandit-Based Algorithms for Budgeted Learning" in Proceedings of the Seventh IEEE International Conference on Data Mining, Oct 2007
    • 41. Accepted and Submitted Papers • "New Algorithms for Budgeted Learning" was accepted with revisions by Machine Learning • "Active Learning from Multiple Noisy Labelers with Varied Costs" was submitted to Machine Learning in July 2011 and now it is under review
    • 43. Examples of d-separations in BNs [Two diagrams, each with a label node, node A, and node V: when V is instantiated, A is d-separated from the label node. A being d-separated from the label means purchasing A in the future would not add information about the label]
    • 44. Examples of d-separations in BNs [Diagram with label node, node A, node V, and V's descendants C, D, E, F, H: when neither V nor any of V's descendants C, D, E, F, H is instantiated, A is d-separated from the label node]