The document provides an overview of Bayesian decision theory and naive Bayes classification. It notes that Bayesian decision theory predates other machine learning techniques and forms the basis of classifiers such as naive Bayes. It then explains Bayes' theorem and how it is used for probability inference and decision making, illustrating Bayesian reasoning with an example of predicting whether a customer will buy a computer. Finally, the document describes how the naive Bayes classifier makes a strong independence assumption between features to simplify computation, gives the mathematical formulas, and works through an example of classifying a customer profile.
Uncertainty & Probability
Bayes' rule
Choosing Hypotheses: Maximum a Posteriori
Maximum Likelihood: Bayes Concept Learning
Maximum Likelihood of Real-Valued Functions
Bayes Optimal Classifier
Joint distributions
Naive Bayes Classifier
2. Bayesian Decision Theory came long before Version Spaces, Decision Tree Learning, and Neural Networks. It was studied in the field of Statistical Theory and, more specifically, in the field of Pattern Recognition.

Bayesian Decision Theory is at the basis of important learning schemes such as the Naïve Bayes Classifier, learning of Bayesian Belief Networks, and the EM Algorithm.

Bayesian Decision Theory is also useful because it provides a framework within which many non-Bayesian classifiers can be studied.
3. Bayesian reasoning is applied to decision making and to inferential statistics that deal with probability inference. It uses knowledge of prior events to predict future events.

Example: predicting the color of marbles in a basket.
6. The Bayes Theorem:

P(h|D) = P(D|h) * P(h) / P(D)

P(h): prior probability of hypothesis h
P(D): prior probability of training data D
P(h|D): probability of h given D
P(D|h): probability of D given h
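As a minimal sketch, the theorem translates directly into a small Python function (the names are illustrative, not from the slides):

def posterior(p_d_given_h, p_h, p_d):
    # Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)
    return p_d_given_h * p_h / p_d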
7. D: a 35-year-old customer with an income of $50,000 p.a.
h: the hypothesis that our customer will buy our computer.

P(h|D): probability that customer D will buy our computer, given that we know his age and income (the posterior probability).
P(h): probability that any customer will buy our computer, regardless of age (the prior probability).
P(D|h): probability that the customer is 35 yrs old and earns $50,000, given that he has bought our computer (the likelihood).
P(D): probability that a person from our set of customers is 35 yrs old and earns $50,000.
8. Example:
h1: customer buys a computer = yes
h2: customer buys a computer = no
where h1 and h2 are members of our hypothesis space H.

Final outcome: choose the hypothesis maximizing P(h|D), i.e. arg max { P(D|h1) P(h1), P(D|h2) P(h2) }.
P(D) can be ignored, as it is the same for both terms.
9. Theory:
Generally we want the most probable hypothesis given the training data:
hMAP = arg max P(h|D), where h belongs to the hypothesis space H.
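A minimal sketch of this argmax in Python, assuming the priors P(h) and likelihoods P(D|h) are stored in dictionaries keyed by hypothesis (the names are illustrative; the numbers anticipate those worked out on slide 12):

def h_map(hypotheses, likelihood, prior):
    # Pick the hypothesis h maximizing P(D|h) * P(h); P(D) is constant.
    return max(hypotheses, key=lambda h: likelihood[h] * prior[h])

prior = {"buys=yes": 0.5, "buys=no": 0.5}
likelihood = {"buys=yes": 0.6, "buys=no": 0.2}
print(h_map(prior, likelihood, prior))  # -> buys=yes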
11. If we assume equal priors, P(hi) = P(hj) for all i and j, the prior term contributes the same amount to every hypothesis. Further simplification then leads to the maximum likelihood hypothesis:
hML = arg max P(D|hi), where hi belongs to H.
12. P(buys computer = yes) = 5/10 = 0.5
P(buys computer = no) = 5/10 = 0.5
P(customer is 35 yrs & earns $50,000) = 4/10 = 0.4
P(customer is 35 yrs & earns $50,000 | buys computer = yes) = 3/5 = 0.6
P(customer is 35 yrs & earns $50,000 | buys computer = no) = 1/5 = 0.2
13. Customer buys a computer: P(h1|D) = P(h1) * P(D|h1) / P(D) = 0.5 * 0.6 / 0.4 = 0.75
Customer does not buy a computer: P(h2|D) = P(h2) * P(D|h2) / P(D) = 0.5 * 0.2 / 0.4 = 0.25

Final outcome = arg max { P(h1|D), P(h2|D) } = max(0.75, 0.25)
=> The customer buys a computer.
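The same arithmetic can be checked with a few lines of Python, using the probabilities from slide 12:

p_h1, p_h2 = 0.5, 0.5          # P(buys = yes), P(buys = no)
p_d = 0.4                      # P(35 yrs & $50,000)
p_d_h1, p_d_h2 = 0.6, 0.2      # P(D|h1), P(D|h2)

post_h1 = p_h1 * p_d_h1 / p_d  # 0.75
post_h2 = p_h2 * p_d_h2 / p_d  # 0.25
print("buys" if post_h1 > post_h2 else "does not buy")  # -> buys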
15. The Naïve Bayes classifier is based on the Bayes theorem. It is particularly suited when the dimensionality of the inputs is high. Parameter estimation for naive Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations.

Advantage: it requires only a small amount of training data to estimate its parameters.
17. X = (age = youth, income = medium, student = yes, credit_rating = fair)
Will a person described by tuple X buy a computer?
18. Derivation:
D: set of training tuples. Each tuple is an n-dimensional attribute vector X = (x1, x2, x3, ..., xn).
Let there be m classes: C1, C2, C3, ..., Cm.

The Naïve Bayes classifier predicts that X belongs to class Ci iff
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i
(the maximum posteriori hypothesis).

By the Bayes theorem, P(Ci|X) = P(X|Ci) P(Ci) / P(X).
It suffices to maximize P(X|Ci) P(Ci), as P(X) is constant across classes.
19. With many attributes, it is computationally expensive to evaluate P(X|Ci) directly. The naïve assumption of "class conditional independence" factors it into per-attribute terms:
P(X|Ci) = P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci)
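A minimal sketch of this factorization in Python; the dictionary layout for the conditional probabilities is an assumption for illustration:

from math import prod

def class_conditional_likelihood(x, c, cond_probs):
    # P(X|c) = product over attributes k of P(x_k|c),
    # under the class conditional independence assumption.
    return prod(cond_probs[(attr, val, c)] for attr, val in x.items())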
22. Find the class Ci that maximizes P(X|Ci) * P(Ci):
P(X | buys_computer = yes) * P(buys_computer = yes) = 0.028
P(X | buys_computer = no) * P(buys_computer = no) = 0.007

Prediction: tuple X buys a computer.
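The training table behind these totals is not reproduced in this transcript (the deck jumps from slide 19 to slide 22), so the sketch below assumes per-attribute conditional probabilities chosen to be consistent with the slide's totals of 0.028 and 0.007:

from math import prod

X = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}

priors = {"yes": 9 / 14, "no": 5 / 14}   # assumed P(buys_computer = c)
cond = {                                  # assumed P(attribute = value | c)
    "yes": {"age": 2 / 9, "income": 4 / 9, "student": 6 / 9, "credit_rating": 6 / 9},
    "no": {"age": 3 / 5, "income": 2 / 5, "student": 1 / 5, "credit_rating": 2 / 5},
}

# Score each class by P(X|Ci) * P(Ci) and take the argmax.
scores = {c: priors[c] * prod(cond[c][a] for a in X) for c in priors}
print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # -> 'yes': tuple X buys a computer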