The document provides an overview of the course "Probabilistic Machine Learning". Key points:
- The course will meet twice a week and cover foundations of probabilistic machine learning, including setting up probabilistic models, quantifying uncertainty, and estimation algorithms.
- Assessment includes four homework assignments, two quizzes, a midterm exam, and an optional research project.
- Recommended textbooks will be made available but no single textbook is required.
- The goals of the course are to introduce principles of probabilistic ML, how to model uncertainty, and contemporary advances in the field.
- Probabilistic ML is important as it allows modeling of uncertainty in parameters, predictions, and other unknown
ReduCE is a machine learning method called a reduced Coulomb energy network for approximate classification of individuals using ontologies. It learns a model called an RCE network from a training set of pre-classified individuals and background knowledge base. The model can then be used to approximately classify other individuals by outputting a predicted classification and likelihood. Experiments showed the method had good performance with high accuracy and low error rates on several ontologies compared to a standard reasoner.
1) Machine learning is a field of artificial intelligence that allows computers to learn without being explicitly programmed by finding patterns in data.
2) There are three main types of machine learning problems: supervised learning which uses labeled training data, unsupervised learning which finds hidden patterns in unlabeled data, and reinforcement learning where a system learns from feedback of rewards and punishments.
3) Key machine learning concepts include linear regression, which finds a linear relationship between variables, and gradient descent, an algorithm for minimizing cost functions to optimize model parameters like slope and intercept of a linear regression line.
Chap 8. Optimization for training deep modelsYoung-Geun Choi
연구실 내부 세미나 자료. Goodfellow et al. (2016), Deep Learning, MIT Press의 Chapter 8을 요약/발췌하였습니다. 깊은 신경망(deep neural network) 모형 훈련시 목적함수 최적화 방법으로 흔히 사용되는 방법들을 소개합니다.
This document provides an introduction to statistical model selection. It discusses various approaches to model selection including predictive risk, Bayesian methods, information theoretic measures like AIC and MDL, and adaptive methods. The key goals of model selection are to understand the bias-variance tradeoff and select models that offer the best guaranteed predictive performance on new data. Model selection aims to find the right level of complexity to explain patterns in available data while avoiding overfitting.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
- Kernel methods allow linear learning algorithms to be applied to non-linear problems by mapping data into high-dimensional feature spaces. This overcomes limitations of insufficient capacity in linear models.
- Both frequentist and Bayesian approaches to kernel methods have been developed. Frequentist approaches analyze generalization error, while Bayesian approaches use probabilistic modeling and inference.
- Support vector machines optimize the margin between classes in feature space, improving generalization. The dual formulation allows computation using kernel functions without explicitly working in feature space.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
My name is Rose Tom. I am associated with statisticsassignmenthelp.com for the past 8 years and have been helping statistics students with their MyStataLab assignments.
I have a master's in Professional Statistics from Princeton University, USA.
ReduCE is a machine learning method called a reduced Coulomb energy network for approximate classification of individuals using ontologies. It learns a model called an RCE network from a training set of pre-classified individuals and background knowledge base. The model can then be used to approximately classify other individuals by outputting a predicted classification and likelihood. Experiments showed the method had good performance with high accuracy and low error rates on several ontologies compared to a standard reasoner.
1) Machine learning is a field of artificial intelligence that allows computers to learn without being explicitly programmed by finding patterns in data.
2) There are three main types of machine learning problems: supervised learning which uses labeled training data, unsupervised learning which finds hidden patterns in unlabeled data, and reinforcement learning where a system learns from feedback of rewards and punishments.
3) Key machine learning concepts include linear regression, which finds a linear relationship between variables, and gradient descent, an algorithm for minimizing cost functions to optimize model parameters like slope and intercept of a linear regression line.
Chap 8. Optimization for training deep modelsYoung-Geun Choi
연구실 내부 세미나 자료. Goodfellow et al. (2016), Deep Learning, MIT Press의 Chapter 8을 요약/발췌하였습니다. 깊은 신경망(deep neural network) 모형 훈련시 목적함수 최적화 방법으로 흔히 사용되는 방법들을 소개합니다.
This document provides an introduction to statistical model selection. It discusses various approaches to model selection including predictive risk, Bayesian methods, information theoretic measures like AIC and MDL, and adaptive methods. The key goals of model selection are to understand the bias-variance tradeoff and select models that offer the best guaranteed predictive performance on new data. Model selection aims to find the right level of complexity to explain patterns in available data while avoiding overfitting.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
- Kernel methods allow linear learning algorithms to be applied to non-linear problems by mapping data into high-dimensional feature spaces. This overcomes limitations of insufficient capacity in linear models.
- Both frequentist and Bayesian approaches to kernel methods have been developed. Frequentist approaches analyze generalization error, while Bayesian approaches use probabilistic modeling and inference.
- Support vector machines optimize the margin between classes in feature space, improving generalization. The dual formulation allows computation using kernel functions without explicitly working in feature space.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
My name is Rose Tom. I am associated with statisticsassignmenthelp.com for the past 8 years and have been helping statistics students with their MyStataLab assignments.
I have a master's in Professional Statistics from Princeton University, USA.
This document outlines the lecture notes for CS 502 Deep Learning course. It covers machine learning basics including supervised and unsupervised learning techniques. It then introduces deep learning and the key elements of neural networks such as activation functions, training using gradient descent, and different network architectures. The document discusses linear and non-linear classification methods and concludes with an overview of neural network components like layers, neurons, weights, and biases.
Presented at: All Things Open 2019
Presented by: Samuel Taylor, Indeed
Find the transcript: https://www.samueltaylor.org/articles/open-source-machine-learning.html
This document provides an overview of machine learning concepts including supervised learning, unsupervised learning, and reinforcement learning. It discusses common machine learning applications and challenges. Key topics covered include linear regression, classification, clustering, neural networks, bias-variance tradeoff, and model selection. Evaluation techniques like training error, validation error, and test error are also summarized.
Machine learning algorithms can adapt and learn from experience. The three main machine learning methods are supervised learning (using labeled training data), unsupervised learning (using unlabeled data), and semi-supervised learning (using some labeled and some unlabeled data). Supervised learning includes classification and regression tasks, while unsupervised learning includes cluster analysis.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Appropriate model complexity and regularization are important.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Regularization and more data help address overfitting.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Regularization and more data help address overfitting.
This document provides information about the Machine Learning course EC-452 offered in Fall 2023. It is a 3 credit elective course that can be taken by DE-42 (Electrical) students in their 7th semester. Assessment will include a midterm exam, final exam, quizzes and assignments. Topics that will be covered include introduction to machine learning, applications of machine learning, common understanding of machine learning concepts and jargon, supervised learning workflow and notation, data representation, hypothesis space, classes of machine learning algorithms, and algorithm categorization schemes.
This document provides an introduction to artificial neural networks. It discusses how neural networks can mimic the brain's ability to learn from large amounts of data. The document outlines the basic components of a neural network including neurons, layers, and weights. It also reviews the history of neural networks and some common modern applications. Examples are provided to demonstrate how neural networks can learn basic logic functions through adjusting weights. The concepts of forward and backward propagation are introduced for training neural networks on classification problems. Optimization techniques like gradient descent are discussed for updating weights to minimize error. Exercises are included to help understand implementing neural networks for regression and classification tasks.
Linear regression models a quantity as a linear combination of features to minimize residuals. Issues include unstable results from collinear features and overfitting. Regularization like ridge regression addresses this by minimizing residuals and coefficient size, improving variance. Cross-validation chooses the best model by evaluating prediction error on validation data rather than training error. Lasso induces sparsity by minimizing residuals and the L1 norm of coefficients. Scikit Learn implements these methods with options like normalization, intercept fitting, and solver choices.
This document provides an introduction to machine learning and neural networks. It discusses key concepts like supervised vs unsupervised learning, classification vs regression problems, and performance evaluation metrics. It also covers foundational machine learning techniques like k-nearest neighbors for classification and regression. Descriptive statistics concepts like mean, variance, correlation and covariance are introduced. Finally, it discusses visualizing data through scatter plots and histograms.
This document discusses machine learning techniques for classification and clustering. It introduces case-based reasoning and k-nearest neighbors classification. It discusses how kernels can allow non-linear classification by mapping data to higher dimensions. It describes k-means clustering, which groups data by minimizing distances to cluster means. It also introduces agglomerative clustering, which successively merges the closest clusters.
The document discusses model overfitting in machine learning. It explains that overfitting occurs when a model learns the details and noise in the training data too well, reducing its ability to fit new data. Decision trees can overfit by becoming too complex, and increasing the size of the training data or using regularization techniques like pruning can help address overfitting. The document also covers techniques for evaluating models like cross-validation to estimate generalization performance on new examples.
The document provides biographical and professional details about Engr. Dr. Sohaib Manzoor. It lists his educational qualifications including a BS in electrical engineering, an MS in electrical and electronics engineering, and a PhD in information and communication engineering. It also outlines his work experience as a lecturer at Mirpur University of Science and Technology, Pakistan. Additionally, it lists his skills, contact information, hobbies and some academic and non-academic achievements.
The document discusses the differences and similarities between classification and prediction, providing examples of how classification predicts categorical class labels by constructing a model based on training data, while prediction models continuous values to predict unknown values, though the process is similar between the two. It also covers clustering analysis, explaining that it is an unsupervised technique that groups similar data objects into clusters to discover hidden patterns in datasets.
Machine learning in science and industry — day 1arogozhnikov
A course of machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the opera
Supervised machine learning algorithms are categorized as either supervised or unsupervised. Supervised algorithms learn from labeled examples to predict future labels, while unsupervised algorithms find hidden patterns in unlabeled data. Specifically, supervised algorithms are presented with labeled training data and learn a model to predict the class labels of new test data. Common supervised algorithms include neural networks, decision trees, k-nearest neighbors, and Naive Bayes classifiers. Naive Bayes is an easy to implement algorithm that assumes independence between features. It has been successfully applied to problems like spam filtering.
This document outlines the agenda for a machine learning programming course lecture on logistic regression and multiclass classification. The lecture will cover model validation using training and test sets, recognizing single digits from the MNIST dataset, preprocessing and encoding image data, and extending linear regression to logistic regression for multiclass classification problems. It provides examples of assessing model performance on training versus test data to avoid overfitting. The goal of the lab session is to build a program classifying MNIST digit images into 10 classes using logistic regression.
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
This document outlines the lecture notes for CS 502 Deep Learning course. It covers machine learning basics including supervised and unsupervised learning techniques. It then introduces deep learning and the key elements of neural networks such as activation functions, training using gradient descent, and different network architectures. The document discusses linear and non-linear classification methods and concludes with an overview of neural network components like layers, neurons, weights, and biases.
Presented at: All Things Open 2019
Presented by: Samuel Taylor, Indeed
Find the transcript: https://www.samueltaylor.org/articles/open-source-machine-learning.html
This document provides an overview of machine learning concepts including supervised learning, unsupervised learning, and reinforcement learning. It discusses common machine learning applications and challenges. Key topics covered include linear regression, classification, clustering, neural networks, bias-variance tradeoff, and model selection. Evaluation techniques like training error, validation error, and test error are also summarized.
Machine learning algorithms can adapt and learn from experience. The three main machine learning methods are supervised learning (using labeled training data), unsupervised learning (using unlabeled data), and semi-supervised learning (using some labeled and some unlabeled data). Supervised learning includes classification and regression tasks, while unsupervised learning includes cluster analysis.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Appropriate model complexity and regularization are important.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Regularization and more data help address overfitting.
This document provides an overview of machine learning concepts including:
- Machine learning uses data and past experiences to improve future performance on tasks. Learning is guided by minimizing loss or maximizing gain.
- The machine learning process involves data collection, representation, modeling, estimation, and model selection. Representation of input data is important for solving problems.
- Types of learning problems include supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
- Generalization to new data is important but challenging due to the bias-variance tradeoff. Models can underfit or overfit training data. Regularization and more data help address overfitting.
This document provides information about the Machine Learning course EC-452 offered in Fall 2023. It is a 3 credit elective course that can be taken by DE-42 (Electrical) students in their 7th semester. Assessment will include a midterm exam, final exam, quizzes and assignments. Topics that will be covered include introduction to machine learning, applications of machine learning, common understanding of machine learning concepts and jargon, supervised learning workflow and notation, data representation, hypothesis space, classes of machine learning algorithms, and algorithm categorization schemes.
This document provides an introduction to artificial neural networks. It discusses how neural networks can mimic the brain's ability to learn from large amounts of data. The document outlines the basic components of a neural network including neurons, layers, and weights. It also reviews the history of neural networks and some common modern applications. Examples are provided to demonstrate how neural networks can learn basic logic functions through adjusting weights. The concepts of forward and backward propagation are introduced for training neural networks on classification problems. Optimization techniques like gradient descent are discussed for updating weights to minimize error. Exercises are included to help understand implementing neural networks for regression and classification tasks.
Linear regression models a quantity as a linear combination of features to minimize residuals. Issues include unstable results from collinear features and overfitting. Regularization like ridge regression addresses this by minimizing residuals and coefficient size, improving variance. Cross-validation chooses the best model by evaluating prediction error on validation data rather than training error. Lasso induces sparsity by minimizing residuals and the L1 norm of coefficients. Scikit Learn implements these methods with options like normalization, intercept fitting, and solver choices.
This document provides an introduction to machine learning and neural networks. It discusses key concepts like supervised vs unsupervised learning, classification vs regression problems, and performance evaluation metrics. It also covers foundational machine learning techniques like k-nearest neighbors for classification and regression. Descriptive statistics concepts like mean, variance, correlation and covariance are introduced. Finally, it discusses visualizing data through scatter plots and histograms.
This document discusses machine learning techniques for classification and clustering. It introduces case-based reasoning and k-nearest neighbors classification. It discusses how kernels can allow non-linear classification by mapping data to higher dimensions. It describes k-means clustering, which groups data by minimizing distances to cluster means. It also introduces agglomerative clustering, which successively merges the closest clusters.
The document discusses model overfitting in machine learning. It explains that overfitting occurs when a model learns the details and noise in the training data too well, reducing its ability to fit new data. Decision trees can overfit by becoming too complex, and increasing the size of the training data or using regularization techniques like pruning can help address overfitting. The document also covers techniques for evaluating models like cross-validation to estimate generalization performance on new examples.
The document provides biographical and professional details about Engr. Dr. Sohaib Manzoor. It lists his educational qualifications including a BS in electrical engineering, an MS in electrical and electronics engineering, and a PhD in information and communication engineering. It also outlines his work experience as a lecturer at Mirpur University of Science and Technology, Pakistan. Additionally, it lists his skills, contact information, hobbies and some academic and non-academic achievements.
The document discusses the differences and similarities between classification and prediction, providing examples of how classification predicts categorical class labels by constructing a model based on training data, while prediction models continuous values to predict unknown values, though the process is similar between the two. It also covers clustering analysis, explaining that it is an unsupervised technique that groups similar data objects into clusters to discover hidden patterns in datasets.
Machine learning in science and industry — day 1arogozhnikov
A course of machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the opera
Supervised machine learning algorithms are categorized as either supervised or unsupervised. Supervised algorithms learn from labeled examples to predict future labels, while unsupervised algorithms find hidden patterns in unlabeled data. Specifically, supervised algorithms are presented with labeled training data and learn a model to predict the class labels of new test data. Common supervised algorithms include neural networks, decision trees, k-nearest neighbors, and Naive Bayes classifiers. Naive Bayes is an easy to implement algorithm that assumes independence between features. It has been successfully applied to problems like spam filtering.
This document outlines the agenda for a machine learning programming course lecture on logistic regression and multiclass classification. The lecture will cover model validation using training and test sets, recognizing single digits from the MNIST dataset, preprocessing and encoding image data, and extending linear regression to logistic regression for multiclass classification problems. It provides examples of assessing model performance on training versus test data to avoid overfitting. The goal of the lab session is to build a program classifying MNIST digit images into 10 classes using logistic regression.
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
06-20-2024-AI Camp Meetup-Unstructured Data and Vector DatabasesTimothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss the unstructured data and the world of vector databases, we will see how they different from traditional databases. In which cases you need one and in which you probably don’t. I will also go over Similarity Search, where do you get vectors from and an example of a Vector Database Architecture. Wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr... (Marlon Dumas)
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be combined to discover high-fidelity digital twins of end-to-end processes from event data.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
Do People Really Know Their Fertility Intentions? Correspondence between Sel... (Xiao Xu)
Fertility intention data from surveys often serve as a crucial component in modeling fertility behaviors. Yet, the persistent gap between stated intentions and actual fertility decisions, coupled with the prevalence of uncertain responses, has cast doubt on the overall utility of intentions and sparked controversies about their nature. In this study, we use survey data from a representative sample of Dutch women. With the help of open-ended questions (OEQs) on fertility and Natural Language Processing (NLP) methods, we are able to conduct an in-depth analysis of fertility narratives. Specifically, we annotate the (expert) perceived fertility intentions of respondents and compare them to their self-reported intentions from the survey. Through this analysis, we aim to reveal the disparities between self-reported intentions and the narratives. Furthermore, by applying neural topic modeling methods, we could uncover which topics and characteristics are more prevalent among respondents who exhibit a significant discrepancy between their stated intentions and their probable future behavior, as reflected in their narratives.
CS772-Lec1.pptx
1. Course Logistics and Introduction to Probabilistic Machine Learning
CS772A: Probabilistic Machine Learning
Piyush Rai
2. CS772A: PML
Course Logistics
Course Name: Probabilistic Machine Learning – CS772A
2 classes each week
Mon/Thur 18:00-19:30
Venue: KD-101
All material (readings etc.) will be posted on the course webpage (internal access)
URL: https://web.cse.iitk.ac.in/users/piyush/courses/pml_autumn22/pml.html
Q/A and announcements on Piazza. Please sign up
URL: https://piazza.com/iitk.ac.in/firstsemester2022/cs772a
If you need to contact me by email (piyush@cse.iitk.ac.in), prefix the subject line with “CS772”
Auditors are welcome
3. Workload and Grading Policy
4 homeworks (theory + programming): 40%
Must be typeset in LaTeX. To be uploaded on Gradescope (login details will be shared)
2 quizzes: 10% (tentative dates: Aug 27, Oct 22)
Mid-sem exam: 20% (date as per DOAA schedule)
End-sem exam: 30% (date as per DOAA schedule)
(Optional) Research project (to be done in groups of 2): 20%
If opted, will be exempted from appearing in the mid-sem exam
We may suggest some topics but the project has to be driven by you
We will not be able to provide compute resources
Advisable only if you have strong prior experience of working on ML research
Tentative dates for release of homeworks: Aug 11, Aug 29, Sept 26, Oct 20. Each homework is due roughly in 2 weeks (excluding holidays, exam weeks).
4. Textbooks and Readings
Textbook: No official textbook required
Some books that you may use as references (freely available PDFs):
Kevin P. Murphy, Probabilistic Machine Learning: An Introduction (PML-1), The MIT Press, 2022.
Kevin P. Murphy, Probabilistic Machine Learning: Advanced Topics (PML-2), The MIT Press, 2022.
Christopher M. Bishop, Pattern Recognition and Machine Learning (PRML), Springer, 2007.
All these books have accompanying Python code. You are encouraged to explore it (we will also go through some).
5. Course Policies
Policy on homeworks
Homework solutions must be in your own words
Must cite sources you have referred to or used in preparing your HW solutions
No requests for deadline extension will be entertained. Plan ahead of time
Late submissions allowed up to 72 hours with 10% penalty per 24 hour delay
Every student is entitled to ONE late homework submission without penalty (use it wisely)
Policy on collaboration/cheating
Punishable as per institute's/department’s rules
Plagiarism from other sources will also lead to strict punishment
Both copying as well as helping someone copy will be equally punishable
6. Course Team
Piyush Rai (Instructor): piyush@cse.iitk.ac.in
Soumya Banerjee (TA): soumyab@cse.iitk.ac.in
Avideep Mukherjee (TA): avideep@cse.iitk.ac.in
Abhishek Jaiswal (TA): abhijais@cse.iitk.ac.in
Abhinav Joshi (TA): ajoshi@cse.iitk.ac.in
7. Course Goals
Introduce you to the foundations of probabilistic machine learning (PML)
Focus will be on learning core, general principles of PML:
How to set up a probabilistic model for a given ML problem
How to quantify uncertainty in parameter estimates and predictions
Estimation/inference algorithms to learn various unknowns in a model
How to think about trade-offs (computational efficiency vs accuracy)
Will also look at the above using specific examples of ML models
Throughout the course, focus will also be on contemporary/latest advances
Image source: Probabilistic Machine Learning – Advanced Topics (Murphy, 2022)
9. Probabilistic ML because… Uncertainty Matters!
ML models ingest data and give us predictions, trends in the data, etc.
The standard approach: minimize a loss function to find optimal parameters, then use them to predict.
In many applications, we also care about the parameter/predictive uncertainty.
𝜃 = argmin_𝜃 ℓ(𝒟, 𝜃): a single answer! No estimate of uncertainty about the true 𝜃; may be unreliable and can overfit, especially with limited training data.
Regression: desirable is an approach that reports large predictive uncertainty (variance) in regions where there is little/no training data.
Binary classification: desirable is an approach that reports close to uniform class probability (0.5 in this case) for inputs far away from the training data or for out-of-distribution inputs.
Such uncertainty estimates matter in healthcare applications, autonomous driving, etc.
Image source: Probabilistic Machine Learning – Advanced Topics (Murphy, 2022)
10. Types of Uncertainty: Model Uncertainty
Model uncertainty is due to not having enough training data
Also called epistemic uncertainty. Usually reducible
Vanishes with “sufficient” training data
Model uncertainty may be about the weights alone (same model class, e.g., linear models, but uncertainty about the weights), or not just about the weights but also about the model class (e.g., 3 different model classes considered, with linear, polynomial, and circular decision boundaries; each model class will itself have weight uncertainty, like the left figure, since there isn't enough training data).
𝑝(𝜃|𝒟), the distribution of model parameters conditioned on the training data, also known as the “posterior distribution”, is a probabilistic way to express model uncertainty, i.e., uncertainty in the model weights (the figure shows an example posterior of a two-dim parameter vector 𝜃). It is hard to compute in general; we will study several methods to do it.
Image credit: Balaji L, Dustin T, Jasper N. (NeurIPS 2020 tutorial)
11. Model Uncertainty via Posterior: An Illustration
Consider a linear classification model for 2-dim inputs.
The classifier weight will be a 2-dim vector 𝜃 = [𝜃1, 𝜃2].
Its posterior will be some 2-dim distribution 𝑝(𝜃|𝒟).
Sampling from this distribution will generate 2-dim vectors.
Each vector will correspond to a linear separator (right fig).
Thus the posterior in this case is equivalent to a distribution over linear separators.
Not every separator is equally important: the importance of separator 𝜃(𝑖) is 𝑝(𝜃(𝑖)|𝒟).
12. Types of Uncertainty: Data Uncertainty
Data uncertainty can be due to various reasons, e.g.,
Intrinsic hardness of labeling, class overlap
Labeling errors/disagreements (for difficult training inputs)
Noisy or missing features
Also called aleatoric uncertainty. Usually irreducible
Won’t vanish even with infinite training data
Note: Can sometimes vanish by adding more features (figure on the right) or switching to a more complex model
𝑝(𝑦|𝜃, 𝒙): data uncertainty is usually specified by the distribution of the data being modeled, conditioned on the model parameters and other inputs. This is a probabilistic way to express data uncertainty.
Image source: “Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods” (H&W 2021)
Image source: “Improving machine classification using human uncertainty measurements” (Battleday et al, 2021)
Image credit: Eric Nalisnick
13. A Key Principle of PML: Marginalization
Consider training data 𝒟 = (𝐗, 𝐲)
Consider a model 𝑚 (e.g., a deep neural net) of the form 𝑝(𝑦|𝒙, 𝜃, 𝑚)
Let 𝜃 denote the optimal weights of model 𝑚, obtained by minimizing a loss fn. on 𝒟
Standard prediction for a new test input 𝒙∗: 𝑝(𝑦∗|𝒙∗, 𝜃, 𝑚)
A more robust prediction can be obtained via marginalization
𝑝(𝑦∗|𝒙∗, 𝒟, 𝑚) = ∫ 𝑝(𝑦∗|𝒙∗, 𝜃, 𝑚) 𝑝(𝜃|𝒟, 𝑚) 𝑑𝜃
For multi-class classification, 𝑝(𝑦∗|𝒙∗, 𝜃, 𝑚) (e.g., a deep net with softmax outputs) will be a vector of predicted class probabilities, but the uncertainty about 𝜃 is ignored.
The marginalized quantity 𝑝(𝑦∗|𝒙∗, 𝒟, 𝑚) will also be a vector of predicted class probabilities, but its computation incorporates the uncertainty in 𝜃.
In the integrand, 𝑝(𝑦∗|𝒙∗, 𝜃, 𝑚) denotes data/aleatoric uncertainty, and the posterior distribution of the weights 𝑝(𝜃|𝒟, 𝑚) denotes model-weight/epistemic uncertainty.
This is prediction via marginalizing the predictions made by each value of 𝜃 over its posterior distribution: predictions by different 𝜃's are not weighted equally, but based on how likely each 𝜃 is given the data 𝒟, i.e., 𝑝(𝜃|𝒟, 𝑚).
Note: 𝑚 is just a model identifier; it can be omitted when writing. We will derive the expression shortly; it is just a simple application of the product and sum rules of probability.
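The plug-in vs. marginalized distinction above can be made concrete in a toy conjugate model. This is a sketch, not from the slides: a Beta-Bernoulli coin model (an assumed example) where the posterior and the marginalized prediction are available in closed form.

```python
import random

# Toy coin-flip model: likelihood p(y|theta) = Bernoulli(theta),
# prior theta ~ Beta(a0, b0). With h heads and t tails observed,
# the posterior is Beta(a0 + h, b0 + t) (conjugacy).
a0, b0 = 1.0, 1.0          # uniform prior over theta
data = [1, 1, 1, 0]        # 3 heads, 1 tail
h, t = sum(data), len(data) - sum(data)
a, b = a0 + h, b0 + t      # posterior parameters

# Plug-in prediction: use a single point estimate (here the posterior mode).
theta_map = (a - 1) / (a + b - 2)
p_plugin = theta_map                     # p(y*=1 | theta_map)

# Marginalized prediction: integrate over the posterior.
# For a Beta posterior this integral has the closed form E[theta] = a/(a+b).
p_marginal = a / (a + b)

# Monte Carlo approximation of the same integral: draw theta^(i) ~ p(theta|D)
# and average p(y*=1 | theta^(i)) = theta^(i).
rng = random.Random(0)
S = 200_000
p_mc = sum(rng.betavariate(a, b) for _ in range(S)) / S

print(p_plugin, p_marginal, round(p_mc, 3))
```

Note how the plug-in answer (0.75) differs from the marginalized one (2/3): the marginalized prediction is pulled toward the prior because only 4 data points were seen.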
14. More on Marginalization
Marginalization (integral) is usually intractable and needs to be approximated.
Marginalization can be done even over several choices of models.
Marginalization over all weights of a single model 𝑚, with the integral replaced by a “Monte Carlo average” where each 𝜃(𝑖) is drawn i.i.d. from the distribution 𝑝(𝜃|𝒟, 𝑚):
𝑝(𝑦∗|𝒙∗, 𝒟, 𝑚) = ∫ 𝑝(𝑦∗|𝒙∗, 𝜃, 𝑚) 𝑝(𝜃|𝒟, 𝑚) 𝑑𝜃 ≈ (1/𝑆) ∑_{𝑖=1}^{𝑆} 𝑝(𝑦∗|𝒙∗, 𝜃(𝑖), 𝑚)
Marginalization over all finite choices 𝑚 = 1, 2, …, 𝑀 of the model (for example, deep nets with different architectures):
𝑝(𝑦∗|𝒙∗, 𝒟) = ∑_{𝑚=1}^{𝑀} 𝑝(𝑦∗|𝒙∗, 𝒟, 𝑚) 𝑝(𝑚|𝒟)
This is like a double averaging (over all model choices, and over all weights of each model choice). We haven't yet seen how to compute 𝑝(𝜃|𝒟, 𝑚), but will shortly, and will look at these ideas in more depth later.
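The second formula (averaging over a finite set of models) can be sketched in a few lines. The posterior model probabilities and per-model predictions below are made-up numbers for illustration, not from the slides.

```python
# Bayesian model averaging over M = 3 candidate models:
# p(y*|x*, D) = sum_m p(y*|x*, D, m) p(m|D).
p_model = [0.5, 0.3, 0.2]            # p(m|D); must sum to 1
p_pred_given_m = [0.9, 0.6, 0.2]     # p(y*=1|x*, D, m) for each model

# The averaged prediction weights each model's prediction by how
# probable that model is given the data.
p_bma = sum(pm * py for pm, py in zip(p_model, p_pred_given_m))
print(p_bma)   # 0.5*0.9 + 0.3*0.6 + 0.2*0.2 = 0.67
```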
15. Marginalization is like Ensemble-based Averaging
Recall the approximation of the marginalization:
𝑝(𝑦∗|𝒙∗, 𝒟, 𝑚) ≈ (1/𝑆) ∑_{𝑖=1}^{𝑆} 𝑝(𝑦∗|𝒙∗, 𝜃(𝑖), 𝑚)
This is akin to computing an averaged prediction using an ensemble of weights {𝜃(1), 𝜃(2), …, 𝜃(𝑆)}.
Ensembling is a well-known classical technique for strong predictive performance. One way to get such an ensemble is to train the same model with 𝑆 different initializations. This may not be as good as doing proper marginalization over the posterior, but is usually better than using a single estimate of the weights.
Tip: If you can't design/use a full-fledged, rigorous probabilistic model, consider using an ensemble* instead. We will also study ensembles later during this course.
*Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles (Lakshminarayanan et al, 2017)
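A minimal sketch of this ensembling recipe (assumed toy setup, not from the slides): train the same tiny model 𝑆 times from different random initializations, then average the members' predictive probabilities.

```python
import math
import random

# Deep-ensemble-style sketch on a 1-D logistic regression model
# p(y=1|x) = sigmoid(w*x + b), trained by gradient descent on log-loss.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(seed, steps=500, lr=0.1):
    rng = random.Random(seed)
    w, b = rng.gauss(0, 1), rng.gauss(0, 1)   # a different random init per member
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):              # gradient of the log-loss
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw
        b -= lr * gb
    return w, b

S = 5
ensemble = [train(seed) for seed in range(S)]

def predict(x):
    # Averaged prediction over the ensemble, mimicking (1/S) sum_i p(y|x, theta_i)
    return sum(sigmoid(w * x + b) for w, b in ensemble) / S

print(round(predict(1.5), 3), round(predict(-1.5), 3))
```

Each member ends up at a slightly different weight vector, and the averaged probability is typically better calibrated than any single member's.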
17. Modeling Data Probabilistically: A Simplistic View
Assume data 𝐗 = {𝒙1, 𝒙2, …, 𝒙𝑁} is generated from a prob. model with params 𝜃:
𝒙1, 𝒙2, …, 𝒙𝑁 ∼ 𝑝(𝒙|𝜃)
Note: in the plate diagram of this simplistic model, shaded nodes = observed; unshaded nodes = unknown/unobserved.
Goal: to estimate the unknowns (𝜃 in this case), given the observed data 𝐗.
There are many ways to do this (a point estimate or the posterior distribution of 𝜃).
There are no outputs/labels in this problem; we are just modeling the inputs (an unsupervised learning task).
18. Basic Ingredients in Probabilistic ML
Specification of prob. models requires two key ingredients: likelihood and prior.
Likelihood 𝑝(𝒙|𝜃), or the “observation model”, specifies how the data is generated. It measures data fit (or “loss”) w.r.t. the given 𝜃 (will see the reason formally later).
Prior distribution 𝑝(𝜃) specifies how likely different parameter values are a priori.
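These two ingredients can be written down directly for a toy model. A sketch under assumed choices (a Bernoulli likelihood for coin flips and a Beta prior; neither is prescribed by the slides):

```python
import math

# Ingredient 1 -- likelihood p(x|theta): observation model for one coin flip.
def likelihood(x, theta):
    return theta if x == 1 else 1.0 - theta

# Ingredient 2 -- prior p(theta): a Beta(a, b) density on [0, 1].
def beta_pdf(theta, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * theta ** (a - 1) * (1.0 - theta) ** (b - 1)

# For i.i.d. data, the likelihood of the dataset is a product of
# per-point likelihoods.
data = [1, 0, 1, 1]
theta = 0.7
lik = math.prod(likelihood(x, theta) for x in data)
print(round(lik, 4), round(beta_pdf(0.5, 2, 2), 4))
```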
19. Parameter Estimation in Probabilistic Models
A simple way: find the 𝜃 for which the observed data is most likely (most probable), 𝜃 = argmax_{𝜃} log 𝑝(𝐗|𝜃), i.e., maximize the log-likelihood, just like the loss function minimization approach. This “point estimate”, called the maximum likelihood estimate (MLE), however, does not provide us any information about the uncertainty in 𝜃.
More desirable: estimate the full posterior distribution over 𝜃 to get the uncertainty. This is fully Bayesian inference, in general an intractable problem (mainly because computing the marginal 𝑝(𝐗) is hard), except for some simple cases (we will study how to solve such problems).
Can also find a single best estimate of 𝜃 as the mode of the posterior, i.e., 𝜃 = argmax_{𝜃} log 𝑝(𝜃|𝐗), called the maximum-a-posteriori (MAP) estimate, but again we won't get an uncertainty estimate.
To make predictions, we can use the full posterior rather than a point estimate.
In this course, the word “inference” will mean Bayesian inference, i.e., computing the (usually approximate) posterior distribution of the model parameters. In some other places, especially the deep learning community, “inference” usually refers to making predictions on test inputs.
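All three estimates (MLE, MAP, full posterior) can be compared on the conjugate coin model, where each has a closed form. A sketch with assumed data and prior:

```python
# Coin model: Bernoulli likelihood, Beta(a0, b0) prior.
data = [1, 1, 1, 0, 1]
h, t = sum(data), len(data) - sum(data)
a0, b0 = 2.0, 2.0

# MLE: maximize log p(X|theta) -> the sample frequency.
# A point estimate; no uncertainty information.
theta_mle = h / (h + t)

# Full posterior: p(theta|X) = Beta(a0 + h, b0 + t); carries uncertainty
# (e.g., a mean and a variance, not just a single number).
a, b = a0 + h, b0 + t
posterior_mean = a / (a + b)
posterior_var = a * b / ((a + b) ** 2 * (a + b + 1))

# MAP: mode of the posterior -- again a single point, no uncertainty.
theta_map = (a - 1) / (a + b - 2)

print(theta_mle, round(theta_map, 3), round(posterior_mean, 3), round(posterior_var, 4))
```

Note how the prior pulls the MAP and posterior mean away from the raw frequency 0.8, and how only the posterior reports a variance.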
20. Posterior Averaging
Can now use the posterior over params to compute an “averaged prediction”, e.g.,
𝑝(𝒙∗|𝐗) = ∫ 𝑝(𝒙∗|𝜃) 𝑝(𝜃|𝐗) 𝑑𝜃
This is the posterior predictive distribution (PPD), obtained by an importance-weighted averaging over the posterior: 𝑝(𝜃|𝐗) tells us how important each value of 𝜃 is. Here 𝐗 is the past (training) data and 𝒙∗ is future (test) data.
Proof of the PPD expression:
𝑝(𝒙∗|𝐗) = ∫ 𝑝(𝒙∗, 𝜃|𝐗) 𝑑𝜃 = ∫ 𝑝(𝒙∗|𝜃, 𝐗) 𝑝(𝜃|𝐗) 𝑑𝜃 = ∫ 𝑝(𝒙∗|𝜃) 𝑝(𝜃|𝐗) 𝑑𝜃 (assuming observations are i.i.d. given 𝜃)
An approximation of the PPD: 𝑝(𝒙∗|𝐗) ≈ (1/𝑆) ∑_{𝑖=1}^{𝑆} 𝑝(𝒙∗|𝜃(𝑖))
In contrast, the plug-in predictive distribution 𝑝(𝒙∗|𝜃) is called “plug-in” because we plugged in a single value of 𝜃 to compute it.
Note: the PPD will take different forms depending on the problem/model; e.g., for a discriminative supervised learning model, the PPD will be of the form 𝑝(𝑦|𝒙, 𝒟) where 𝒟 is the labeled training data.
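The Monte Carlo approximation of the PPD above can be checked against a case where the integral is exact. A sketch, assuming the posterior is a Beta distribution (the conjugate coin model), where the exact PPD is 𝑎/(𝑎+𝑏):

```python
import random

# Monte Carlo approximation of the PPD for the coin model:
# p(x*=1|X) = integral of theta * p(theta|X) dtheta.
# With posterior Beta(a, b), the exact value is a / (a + b).
a, b = 5.0, 3.0                   # illustrative posterior parameters
exact = a / (a + b)

rng = random.Random(42)

def ppd_mc(S):
    # Draw theta^(i) i.i.d. from p(theta|X), average p(x*=1|theta^(i)) = theta^(i).
    return sum(rng.betavariate(a, b) for _ in range(S)) / S

# The approximation tightens as the number of samples S grows.
print(exact, round(ppd_mc(100), 3), round(ppd_mc(100_000), 3))
```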
21. Bayesian Inference
Bayesian inference can be done in a sequential fashion: our belief about 𝜃 (the posterior) keeps getting updated as more and more data is observed.
Note: the updates may not be straightforward, and approximations may be needed.
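For a conjugate model the sequential update is exact and trivially cheap: yesterday's posterior becomes today's prior. A sketch with the assumed Beta-Bernoulli coin model, verifying that the sequential result matches a single batch update:

```python
# Sequential Bayesian updating for the coin model with a Beta(1, 1) prior.
a, b = 1.0, 1.0
stream = [1, 0, 1, 1, 0, 1]

for x in stream:                     # observe one data point at a time
    a, b = a + x, b + (1 - x)        # posterior after this observation
                                     # (becomes the prior for the next one)

# Batch posterior computed once on the full dataset.
h = sum(stream)
t = len(stream) - h
batch = (1.0 + h, 1.0 + t)

print((a, b), batch)                 # the two are identical
```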
23. Modeling Data Probabilistically: With Latent Vars
Can endow generative models of data with latent variables. For example, each data point 𝒙𝑛 is associated with a latent variable 𝒛𝑛, which can be used to encode some property of 𝒙𝑛 (e.g., its cluster membership, its low-dim representation, or missing parts of 𝒙𝑛).
Such models are used in many problems, especially unsupervised learning: Gaussian mixture models, probabilistic PCA, topic models, deep generative models, etc.
We will look at several of these in this course, and at ways to learn such models.
24. Modeling Data Probabilistically: Other Examples
Can construct various other types of models of data for different problems.
Any node (even if observed) that we are uncertain about is modeled by a prob. distribution; these nodes become the random variables of the model.
The full model is specified via a joint prob. distribution over all the random variables.
Examples: a simple supervised learning model; a latent variable model for unsupervised learning; a latent variable model for supervised learning; a latent variable model for sequential data.
26. Helps in Sequential Decision-Making Problems
Sequential decision-making: information about uncertainty can “guide” us. E.g., given our current estimate of the regression function, which training input(s) should we add next to improve the estimate the most? Uncertainty can help here: acquire training inputs from regions where the function is most uncertain about its current predictions (in the figure, the blue curve is the mean of the function and the shaded region denotes the predictive uncertainty).
Applications in active learning, reinforcement learning, Bayesian optimization, etc.
27. Can Learn Data Distribution and Generate Data
Often we wish to learn the underlying probability distribution 𝑝(𝒙) of the data.
Useful for many tasks, e.g.:
Outlier/novelty detection: outliers will have low probability under 𝑝(𝒙)
Generation: can sample from this distribution to generate new “artificial” but realistic-looking data
Several models, such as generative adversarial networks (GAN), variational auto-encoders (VAE), denoising diffusion models, etc., can do this.
Pic credit: https://medium.com/analytics-vidhya/an-introduction-to-generative-deep-learning-792e93d1c6d4
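The outlier-detection idea can be sketched in a few lines (an assumed toy example, not from the slides): fit a 1-D Gaussian 𝑝(𝒙) to the data by MLE, then score points by their log-density under it.

```python
import math
import statistics

# Density-based outlier detection: fit a Gaussian p(x) by MLE,
# then flag points with low density under p(x).
data = [4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)       # MLE uses the population std dev

def log_density(x):
    # Log of the Gaussian pdf N(x | mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2 * sigma ** 2)

# An in-distribution point gets a far higher log-density than an outlier.
print(round(log_density(5.0), 2), round(log_density(9.0), 2))
```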
28. Can Better Handle OOD Data
Many modern deep neural networks (DNNs) tend to be overconfident, especially if the test data is “out-of-distribution” (OOD). PML models are more robust and give better uncertainty estimates that can flag OOD data.
Figure: a desirable confidence map vs. the confidence map of a DNN, on a two-class problem with some OOD test data. The DNN is an example of an overconfident model: it has high confidence in its predictions even for inputs that are far away from the training data, where low confidence would be desirable.
Image source: “Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness” (Liu et al, 2020)
29. Can Construct Complex Models in Modular Ways
Can combine multiple simple probabilistic models to learn complex patterns. Example: a combination of a mixture model for clustering and a probabilistic linear regression model; the result is a probabilistic nonlinear regression model, essentially a “mixture of experts” model. Can design a latent variable model to do this.
Can design models that jointly learn from multiple datasets and share information across them using shared parameters with a prior distribution. Example: estimating the means of 𝑚 datasets, assuming the means are somewhat related; this can be done jointly rather than estimating each independently, an example of transfer learning or multitask learning using a probabilistic approach. Easy to do with shared parameters (will see details later).
30. Hyperparameter Estimation
ML models invariably have hyperparameters, e.g., the regularization/kernel hyperparameters in linear/kernel regression, the hyperparameters of a deep neural network, etc.
Can specify the hyperparams as additional unknowns of the probabilistic model, and can then estimate them, e.g., using a point estimate or a posterior distribution.
To find a point estimate of the hyperparameters, we can write the probability of the data as a function of the hyperparameters alone by marginalizing out the parameters, and maximize this marginal likelihood of the data (more on this later):
𝛼 = argmax_{𝛼} log 𝑝(𝐗|𝛼) = argmax_{𝛼} log ∫ 𝑝(𝐗|𝜃) 𝑝(𝜃|𝛼) 𝑑𝜃
The approach of using the marginal likelihood for this has some issues; other quantities can be used, such as the “conditional” marginal likelihood* (more on this later).
*Bayesian Model Selection, the Marginal Likelihood, and Generalization (Lotfi et al, 2022)
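This empirical-Bayes recipe can be run end-to-end on the conjugate coin model, where the integral over 𝜃 is closed-form. A sketch under assumed choices (symmetric Beta(𝛼, 𝛼) prior, grid search over 𝛼):

```python
import math

# Marginal likelihood of a specific Bernoulli sequence with h heads, t tails
# under a Beta(alpha, alpha) prior:
# p(X|alpha) = B(alpha + h, alpha + t) / B(alpha, alpha).
data = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
h = sum(data)
t = len(data) - h

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_likelihood(alpha):
    return log_beta(alpha + h, alpha + t) - log_beta(alpha, alpha)

# Point estimate of the hyperparameter: alpha-hat = argmax_alpha log p(X|alpha),
# here by a simple grid search over alpha in (0, 10].
grid = [0.1 * k for k in range(1, 101)]
alpha_hat = max(grid, key=log_marginal_likelihood)
print(alpha_hat)
```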
31. Model Comparison
Suppose we have a number of models 𝑚 = 1, 2, …, 𝑀 to choose from.
The standard way to choose the best model is cross-validation.
Can also compute the posterior probability of each candidate model, using Bayes' rule:
𝑝(𝑚|𝐗) = 𝑝(𝑚) 𝑝(𝐗|𝑚) / 𝑝(𝐗)
where 𝑝(𝐗|𝑚) is the marginal likelihood of model 𝑚. This may not be easy to compute exactly, but it can be computed approximately.
If all models are equally likely a priori (𝑝(𝑚) is uniform), then the best model can be selected as the one with the largest marginal likelihood.
This doesn't require a separate validation set, unlike cross-validation, and is therefore also useful for doing model selection/comparison for unsupervised learning problems.
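The Bayes-rule computation above is just a normalization once the marginal likelihoods are in hand. A sketch with illustrative (made-up) marginal likelihoods for three candidate models:

```python
# Posterior probability of each candidate model: p(m|X) ∝ p(m) p(X|m).
marg_lik = [1e-4, 5e-4, 2e-4]        # p(X|m) for m = 1, 2, 3 (illustrative)
prior = [1 / 3, 1 / 3, 1 / 3]        # uniform p(m)

unnorm = [p * l for p, l in zip(prior, marg_lik)]
z = sum(unnorm)                      # p(X), the normalizer
posterior = [u / z for u in unnorm]

# With a uniform prior, the best model is simply the one with the
# largest marginal likelihood.
best = max(range(3), key=lambda m: posterior[m])
print([round(p, 3) for p in posterior], "best model:", best + 1)
```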
32. Non-probabilistic ML Methods?
Some non-probabilistic ML methods can give probabilistic answers via heuristics.
This doesn't mean these methods are not useful/used, but they don't follow the PML paradigm, so we won't study them in this course.
Some examples which you may have seen:
Converting distances from the hyperplane (in hyperplane classifiers) to class probabilities, or methods like Platt scaling used to get class probabilities for SVMs
Using class frequencies in nearest neighbors to compute class probabilities
Using class frequencies at the leaves of a decision tree to compute class probabilities
Soft k-means clustering to compute probabilistic cluster memberships
33. Tentative Outline
Basics of probabilistic modeling and inference
Common probability distributions
Basic point estimation (MLE and MAP)
Bayesian inference (simple and not-so-simple cases)
Probabilistic models for regression, classification, clustering, dimensionality reduction
Gaussian Processes (probabilistic modeling meets kernels)
Latent Variable Models (for i.i.d., sequential, and relational data)
Approximate Bayesian inference (EM, variational inference, sampling, etc.)
Bayesian Deep Learning
Misc topics, e.g., deep generative models, black-box inference, sequential decision-making, etc.