This document discusses probabilistic models for classification tasks. It describes generative models that model the joint probability of classes and features to calculate class posterior probabilities using Bayes' rule. Generative models make assumptions like conditional independence of features. Discriminative models directly model the posterior probability of classes given features and make fewer assumptions. Maximum likelihood estimation techniques are discussed for estimating the parameters of generative models like naive Bayes and logistic regression. The document also covers decision theory and different loss functions used for classification.
Pre-requisites
Probability
Linear algebra, optimization, high school calculus, …
Recommended Books
“Machine Learning: A Probabilistic Perspective”, Kevin Murphy
“Pattern Recognition and Machine Learning”, Christopher Bishop
“An Introduction to Probabilistic Graphical Models”, Michael Jordan
Evaluation
Assignments
Exam
With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".
In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R ,Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.
We will address:
• How do you differentiate Clustering, Classification and Prediction algorithms?
• What are the key steps in running a machine learning algorithm?
• How do you choose an algorithm for a specific goal?
• Where does exploratory data analysis and feature engineering fit into the picture?
• Once you run an algorithm, how do you evaluate the performance of an algorithm?
Learning a nonlinear embedding by preserving class neibourhood structure 최종WooSung Choi
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
Pre-requisites
Probability
Linear algebra, optimization, high school calculus, …
Recommended Books
“Machine Learning: A Probabilistic Perspective”, Kevin Murphy
“Pattern Recognition and Machine Learning”, Christopher Bishop
“An Introduction to Probabilistic Graphical Models”, Michael Jordan
Evaluation
Assignments
Exam
With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".
In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R ,Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.
We will address:
• How do you differentiate Clustering, Classification and Prediction algorithms?
• What are the key steps in running a machine learning algorithm?
• How do you choose an algorithm for a specific goal?
• Where does exploratory data analysis and feature engineering fit into the picture?
• Once you run an algorithm, how do you evaluate the performance of an algorithm?
Learning a nonlinear embedding by preserving class neibourhood structure 최종WooSung Choi
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
SANN: Programming Code Representation Using Attention Neural Network with Opt...Peter Brusilovsky
Slides of CIKM 2023 paper by Muntasir Hoq, Sushanth Reddy Chilla, Melika Ahmadi Ranjbar, Peter Brusilovsky and Bita Akram
https://dl.acm.org/doi/10.1145/3583780.3615047
An Introduction to Reinforcement Learning - The Doors to AGIAnirban Santara
Reinforcement Learning (RL) is a genre of Machine Learning in which an agent learns to choose optimal actions in different states in order to reach its specified goal, solely by interacting with the environment through trial and error. Unlike supervised learning, the agent does not get examples of "correct" actions in given states as ground truth. Instead, it has to use feedback from the environment (which can be sparse and delayed) to improve its policy over time. The formulation of the RL problem closely resembles the way in which human beings learn to act in different situations. Hence it is often considered the gateway to achieving the goal of Artificial General Intelligence.
The motivation of this talk is to introduce the audience to key theoretical concepts like formulation of the RL problem using Markov Decision Process (MDP) and solution of MDP using dynamic programming and policy gradient based algorithms. State-of-the-art deep reinforcement learning algorithms will also be covered. A case study of the application of reinforcement learning in robotics will also be presented.
Paper Study: Melding the data decision pipelineChenYiHuang5
Melding the data decision pipeline: Decision-Focused Learning for Combinatorial Optimization from AAAI2019.
Derive the math equation from myself and match the same result as two mentioned CMU papers [Donti et. al. 2017, Amos et. al. 2017] while applying the same derivation procedure.
These slides are for the tutorial on how to use R language for data analysis and Machine Learning tasks.
The workshop was given at OSCON (Austin, TX), 2017
Explainable algorithm evaluation from lessons in educationCSIRO
How can we evaluate a portfolio of algorithms to extract meaningful interpretations about them? Suppose we have a set of algorithms. These can be classification, regression, clustering or any other type of algorithm. And suppose we have a set of problems that these algorithms can work on. We can evaluate these algorithms on the problems and get the results. From these results, can we explain the algorithms in a meaningful way? The easy option is to find which algorithm performs best for each problem and find the algorithm that performs best on the greatest number of problems. But, there is a limitation with this approach. We are only looking at the overall best! Suppose a certain algorithm gives the best performance on hard problems, but not on easy problems. We would miss this algorithm by using the “overall best” approach. How do we obtain a salient set of algorithm features?
We present Graph Convolutional Networks that, unlike classic DL models, allow supervised learning by exploiting both the single node features and its relationships with the others within the network.
Preliminaries of Analytic Geometry and Linear Algebra 3D modellingPirouz Nourian
from my lecture notes for the course Geo1004 (2015), 3D modelling of the built environment, at TU Delft, faculty of Architecture and the Built Environment
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Influence of Design Parameters on the Singularities and Workspace of a 3-RPS ...Dr. Ranjan Jha
CCToMM 2017 Symposium on Mechanisms, Machines, and Mechatronics (2015 CCToMM M3 Symposium)
Venue: Montreal, Canada
Abstract: This paper presents the variations in the workspace, singularities and joint space with respect to the design parameter $k$ of the 3-R\underline{P}S parallel manipulator. Also, the influence on the parasitic motions due to the design parameters is studied, which plays an important role in the selection of the manipulator for the desired task. The cylindrical algebraic decomposition method and Gr\"{o}bner based computations are used to model the workspace and joint space with the parallel singularities in 2R1T and 3T projection spaces, where the orientation of the mobile platform is represented by using quaternions. These computations are useful to select the optimum value for the design parameter such that the parasitic motions can be limited to specific values. Depending on the design parameter $k$, three different configurations of the 3-R\underline{P}S parallel robot are analyzed.
SANN: Programming Code Representation Using Attention Neural Network with Opt...Peter Brusilovsky
Slides of CIKM 2023 paper by Muntasir Hoq, Sushanth Reddy Chilla, Melika Ahmadi Ranjbar, Peter Brusilovsky and Bita Akram
https://dl.acm.org/doi/10.1145/3583780.3615047
An Introduction to Reinforcement Learning - The Doors to AGIAnirban Santara
Reinforcement Learning (RL) is a genre of Machine Learning in which an agent learns to choose optimal actions in different states in order to reach its specified goal, solely by interacting with the environment through trial and error. Unlike supervised learning, the agent does not get examples of "correct" actions in given states as ground truth. Instead, it has to use feedback from the environment (which can be sparse and delayed) to improve its policy over time. The formulation of the RL problem closely resembles the way in which human beings learn to act in different situations. Hence it is often considered the gateway to achieving the goal of Artificial General Intelligence.
The motivation of this talk is to introduce the audience to key theoretical concepts like formulation of the RL problem using Markov Decision Process (MDP) and solution of MDP using dynamic programming and policy gradient based algorithms. State-of-the-art deep reinforcement learning algorithms will also be covered. A case study of the application of reinforcement learning in robotics will also be presented.
Paper Study: Melding the data decision pipelineChenYiHuang5
Melding the data decision pipeline: Decision-Focused Learning for Combinatorial Optimization from AAAI2019.
Derive the math equation from myself and match the same result as two mentioned CMU papers [Donti et. al. 2017, Amos et. al. 2017] while applying the same derivation procedure.
These slides are for the tutorial on how to use R language for data analysis and Machine Learning tasks.
The workshop was given at OSCON (Austin, TX), 2017
Explainable algorithm evaluation from lessons in educationCSIRO
How can we evaluate a portfolio of algorithms to extract meaningful interpretations about them? Suppose we have a set of algorithms. These can be classification, regression, clustering or any other type of algorithm. And suppose we have a set of problems that these algorithms can work on. We can evaluate these algorithms on the problems and get the results. From these results, can we explain the algorithms in a meaningful way? The easy option is to find which algorithm performs best for each problem and find the algorithm that performs best on the greatest number of problems. But, there is a limitation with this approach. We are only looking at the overall best! Suppose a certain algorithm gives the best performance on hard problems, but not on easy problems. We would miss this algorithm by using the “overall best” approach. How do we obtain a salient set of algorithm features?
We present Graph Convolutional Networks that, unlike classic DL models, allow supervised learning by exploiting both the single node features and its relationships with the others within the network.
Preliminaries of Analytic Geometry and Linear Algebra 3D modellingPirouz Nourian
from my lecture notes for the course Geo1004 (2015), 3D modelling of the built environment, at TU Delft, faculty of Architecture and the Built Environment
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Influence of Design Parameters on the Singularities and Workspace of a 3-RPS ...Dr. Ranjan Jha
CCToMM 2017 Symposium on Mechanisms, Machines, and Mechatronics (2015 CCToMM M3 Symposium)
Venue: Montreal, Canada
Abstract: This paper presents the variations in the workspace, singularities and joint space with respect to the design parameter $k$ of the 3-R\underline{P}S parallel manipulator. Also, the influence on the parasitic motions due to the design parameters is studied, which plays an important role in the selection of the manipulator for the desired task. The cylindrical algebraic decomposition method and Gr\"{o}bner based computations are used to model the workspace and joint space with the parallel singularities in 2R1T and 3T projection spaces, where the orientation of the mobile platform is represented by using quaternions. These computations are useful to select the optimum value for the design parameter such that the parasitic motions can be limited to specific values. Depending on the design parameter $k$, three different configurations of the 3-R\underline{P}S parallel robot are analyzed.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
Safalta Digital marketing institute in Noida, provide complete applications that encompass a huge range of virtual advertising and marketing additives, which includes search engine optimization, virtual communication advertising, pay-per-click on marketing, content material advertising, internet analytics, and greater. These university courses are designed for students who possess a comprehensive understanding of virtual marketing strategies and attributes.Safalta Digital Marketing Institute in Noida is a first choice for young individuals or students who are looking to start their careers in the field of digital advertising. The institute gives specialized courses designed and certification.
for beginners, providing thorough training in areas such as SEO, digital communication marketing, and PPC training in Noida. After finishing the program, students receive the certifications recognised by top different universitie, setting a strong foundation for a successful career in digital marketing.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
2. Binary Classification Problem
• N iid training samples: {𝑥𝑛, 𝑐𝑛}
• Class label: 𝑐𝑛 ∈ {0,1}
• Feature vector: 𝑋 ∈ 𝑅𝑑
• Focus on modeling conditional probabilities 𝑃(𝐶|𝑋)
• Needs to be followed by a decision step
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 2
3. Generative models for classification
• Model joint probability
• 𝑃 𝐶, 𝑋 = 𝑃 𝐶 𝑃(𝑋|𝐶)
• Class posterior probabilities via Bayes rule
• 𝑃 𝐶 𝑋 ∝ 𝑃(𝐶, 𝑋)
• Prior probability of a class: 𝑃(𝐶 = 𝑘)
• Class conditional probabilities: 𝑃(𝑋 = 𝑥|𝐶 = 𝑘)
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 3
4. Generative Process for Data
• Enables generation of new data points
• Repeat N times
• Sample class 𝑐𝑖 ∼ 𝑝 𝑐
• Sample feature value 𝑥𝑖 ∼ 𝑝(𝑥|𝑐𝑖)
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 4
5. Conditional Probability in a Generative Model
𝑃 𝐶 = 1 𝑥
=
𝑃 𝐶 = 1 𝑃(𝑥|𝐶 = 1)
𝑃 𝐶 = 1 𝑃(𝑥|𝐶 = 1) + 𝑃 𝐶 = 0 𝑃(𝑥|𝐶 = 0)
=
1
1 + exp{−𝑎}
≝ 𝜎(𝑎)
where 𝑎 = ln(
𝑃 𝐶=1 𝑃(𝑥|𝐶=1)
𝑃 𝐶=0 𝑃(𝑥|𝐶=0)
)
• Logistic function 𝜎()
• Independent of specific form of class
conditional probabilities
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 5
7. Case: Binary classification with Gaussians
𝑃 𝐶 = 1 𝑥 = 𝜎 𝑤𝑇
𝑥 + 𝑤0
Where
𝑤 = Σ−1 𝜇1 − 𝜇0
𝑤0 = −
1
2
𝜇11
𝑇Σ−1𝜇1 +
1
2
𝜇0
𝑇
Σ−1𝜇0 + log
𝜋
1 − 𝜋
• Quadratic term cancels out
• Linear classification model
• Class boundary 𝑤𝑇𝑥 + 𝑤0 = 0
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 7
8. Special Cases
• Σ = 𝐼; 𝜋 = 1 − 𝜋 = 0.5
• Class boundary: 𝑥 =
1
2
(𝜇0 + 𝜇1)
• Σ = 𝐼; 𝜋 ≠ 1 − 𝜋
• Class boundary shifts by log
𝜋
1−𝜋
• Arbitrary Σ
• Decision boundary still linear but
not orthogonal to the hyper-plane
joining the two means
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 8
Image from Michael Jordan’s book
10. Case: Gaussian Multi-class Classification
• 𝐶 ∈ 1,2, … , 𝐾
• Prior 𝑃 𝐶 = 𝑘 = 𝜋𝑘; 𝜋𝑘 ≥ 0, 𝑘 𝜋𝑘 = 0
• Class conditional densities 𝑃 𝑥 𝐶 = 𝑘 = 𝑁(𝜇𝑘, Σ)
𝑃 𝐶 = 𝑘 𝑥 =
exp{𝑎𝑘}
𝑙 exp{𝑎𝑙}
where 𝑎𝑘 = log 𝑝 𝐶 = 𝑘 𝑝(𝑥|𝐶 = 𝑘)
• Soft-max / normalized exponential function
• For Gaussian class conditionals
• 𝑎𝑘 = 𝜇𝑘
𝑇
Σ−1
𝑥 −
1
2
𝜇𝑘
𝑇
Σ−1
𝜇𝑘
𝑇
+ log 𝜋𝑘
• The decision boundaries are still lines in the feature space
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 10
11. MLE for Gaussian Multi-class
• Similar to the Binary case
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 11
12. Case: Naïve Bayes
• Similar to Gaussian setting, only features are
discrete (binary, for simplicity)
• “Naïve” Assumption: Feature dimensions 𝑋𝑗
conditionally independent given class label
• Very different from independence assumption
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 12
14. MLE for Naïve Bayes
• Formulate loglikelihood in terms of parameters
Θ = 𝜋, 𝜂
𝑙 Θ =
𝑛 𝑗 𝑘
𝑐𝑛𝑘[𝑥𝑛𝑗 log 𝜂𝑘𝑗 + (1 − 𝑥𝑛𝑗 log(1 − 𝜂𝑘𝑗))] +
𝑛 𝑘
𝑐𝑛𝑘 log 𝜋𝑘
• Maximize likelihood wrt parameters
Θ𝑀𝐿 = arg max l Θ 𝑠. 𝑡.
𝑘
𝜋𝑘 = 1
𝜂𝑘𝑗𝑀𝐿
=
𝑛 𝑥𝑛𝑗𝑐𝑛𝑘
𝑛 𝑐𝑛𝑘
𝜋𝑘𝑀𝐿 =
𝑛 𝑐𝑛𝑘
𝑁
• MLE overfits
• Susceptible to 0 frequencies in training data
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 14
15. Bayesian Estimation for Naïve Bayes
• Model the parameters as random variables and
analyze posterior distributions
• Take point estimates if necessary
𝜋 ∼ 𝐵𝑒𝑡𝑎 𝛼, 𝛽
𝜂𝑘𝑗 ∼ 𝑖𝑖𝑑 𝐵𝑒𝑡𝑎 𝛼𝑘, 𝛽𝑘
𝜋𝑘𝑀𝐿 =
𝑛 𝑐𝑛𝑘 + 𝛼 − 1
𝑁 + 𝛼 + 𝛽 − 2
𝜂𝑘𝑗𝑀𝐴𝑃
=
𝑛 𝑥𝑛𝑗𝑐𝑛𝑘 + 𝛼𝑘 − 1
𝑛 𝑐𝑛𝑘 + 𝛼𝑘 + 𝛽𝑘 − 2
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 15
16. Discriminative Models for Classification
• Familiar form for posterior class
distribution
𝑃 𝐶 = 𝑘 𝑥 =
exp{𝑤𝑘
𝑇
𝑥 + 𝑤0}
𝑙 exp 𝑤𝑙
𝑇
𝑥 + 𝑤0
• Model posterior distribution directly
• Advantages as classification model
• Fewer assumptions, fewer parameters
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 16
Image from Michael Jordan’s book
17. Logistic Regression for Binary Classification
• Apply model for binary setting
𝜇 𝑥 ≡ 𝑃 𝐶 = 1 𝑥 =
1
1 + exp −𝑤𝑇𝑥
• Formulate likelihood with weights as parameters
𝐿 𝑤 =
𝑛
𝜇 𝑥𝑛
𝑐𝑛 1 − 𝜇 𝑥𝑛
1−𝑐𝑛
𝑙 𝑤 = 𝑛 𝑐𝑛 log 𝜇 + 1 − 𝑐𝑛 log(1 − 𝜇)
where 𝜇 =
1
1+exp{−𝑤𝑇𝑥𝑛}
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 17
18. MLE for Binary Logistic Regression
• Maximize likelihood wrt weights
𝜕𝑙(𝑤)
𝜕𝑤
= 𝑋𝑇
(𝑐 − 𝜇)
• No closed form solution
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 18
19. MLE for Binary Logistic Regression
• Not quadratic but still convex
• Iterative optimization using gradient descent (LMS
algorithm)
• Batch gradient update
• 𝑤(𝑡+1) = 𝑤𝑡 + 𝜌 𝑛 𝑥𝑛(𝑐𝑛 − 𝜇(𝑥𝑛))
• Stochastic gradient descent update
• 𝑤(𝑡+1) = 𝑤𝑡 + 𝜌𝑥𝑛(𝑐𝑛 − 𝜇(𝑥𝑛))
• Faster algorithm – Newton’s Method
• Iterative Re-weighted least squares (IRLS)
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 19
20. Bayesian Binary Logistic Regression
• Bayesian model exists, but intractable
• Conjugacy breaks down because of the sigmoid function
• Laplace approximation for the posterior
• Major challenge for Bayesian framework
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 20
21. Soft-max regression for Multi-class Classification
• Left as exercise
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 21
22. Choices for the activation function
• Probit function: CDF of the Gaussian
• Complementary log-log model: CDF of exponential
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 22
23. Generative vs Discriminative: Summary
• Generative models
• Easy parameter estimation
• Require more parameters OR simplifying assumptions
• Models and “understands” each class
• Easy to accommodate unlabeled data
• Poorly calibrated probabilities
• Discriminative models
• Complicated estimation problem
• Fewer parameters and fewer assumptions
• No understanding of individual classes
• Difficult to accommodate unlabeled data
• Better calibrated probabilities
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 23
24. Decision Theory
• From posterior distributions to actions
• Loss functions measure extent of error
• Optimal action depends on loss function
• Reject option for classification problems
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 24
25. Loss functions
• 0-1 loss
• 𝐿 𝑦, 𝑎 = 𝐼 𝑦 ≠ 𝑎 =
0 𝑖𝑓 𝑎 = 𝑦
1 𝑖𝑓 𝑎 ≠ 𝑦
• Minimized by MAP estimate (posterior mode)
• 𝑙2 loss
• 𝐿 𝑦, 𝑎 = 𝑦 − 𝑎 2
• Expected loss: 𝐸[ 𝑦 − 𝑎 2|𝑥] (Min mean squared error)
• Minimized by Bayes estimate (posterior mean)
• 𝑙1loss
• 𝐿 𝑦, 𝑎 = 𝑦 − 𝑎
Minimized by posterior median
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 25
28. ROC curves
• Plot TPR and FPR for
different values of
decision threshold
• Quality of classifier
measured by area under
the curve (AUC)
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 28
Image from wikipedia
29. Precision-recall curves
• In settings such as
information retrieval, 𝑁− ≫
𝑁+
• Precision =
𝑇𝑃
𝑁+
• Recall =
𝑇𝑃
𝑁+
• Plot precision vs recall for
varying values of threshold
• Quality of classifier
measured by area under the
curve (AUC) or by specific
values e.g. P@k
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 29
Image from scikit-learn
30. F1-scores
• To evaluate at a single threshold, need to combine
precision and recall
• 𝐹1 =
2𝑃𝑅
𝑃+𝑅
• 𝐹𝛽 =
1+𝛽2 𝑃.𝑅
𝛽2𝑃+𝑅
when P and R and not equally important
• Harmonic mean
• Why?
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 30
31. Estimating generalization error
• Training set performance is not a good indicator of
generalization error
• A more complex model overfits, a less complex one
underfits
• Which model do I select?
• Validation set
• Typically 80%, 20%
• Wastes valuable labeled data
• Cross validation
• Split training data into K folds
• For ith iteration, train on K/i folds, test on ith fold
• Average generalization error over all folds
• Leave one out cross validation: K=N
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 31
32. Summary
• Generative models
• Gaussian Discriminant Analysis
• Naïve Bayes
• Discriminative models
• Logistics regression
• Iterative algorithms for training
• Binary vs Multiclass
• Evaluation of classification models
• Generalization performance
• Cross validation
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 32