1) Machine learning draws on areas of mathematics including probability, statistical inference, linear algebra, and optimization theory.
2) While there are easy-to-use machine learning packages, understanding the underlying mathematics is important for choosing the right algorithms, making good parameter and validation choices, and interpreting results.
3) Key concepts in probability and statistics that are important for machine learning include random variables, probability distributions, expected value, variance, covariance, and conditional probability. These concepts allow quantification of relationships and uncertainties in data.
2. Areas of math essential to machine learning
Machine learning is part of both statistics and computer science
– Probability
– Statistical inference
– Validation
– Estimates of error, confidence intervals
Linear algebra
– Hugely useful for compact representation of linear transformations on data
– Dimensionality reduction techniques
Optimization theory
3. Why worry about the math?
There are lots of easy-to-use machine learning packages out there.
After this course, you will know how to apply several of the most general-purpose algorithms.
HOWEVER
To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.
4. Why worry about the math?
These intuitions will allow you to:
– Choose the right algorithm(s) for the problem
– Make good choices on parameter settings, validation strategies
– Recognize over- or underfitting
– Troubleshoot poor / ambiguous results
– Put appropriate bounds of confidence / uncertainty on results
– Do a better job of coding algorithms or incorporating them into more complex analysis pipelines
5. Notation
a ∈ A      set membership: a is a member of set A
| B |      cardinality: number of items in set B
|| v ||    norm: length of vector v
∑          summation
∫          integral
ℜ          the set of real numbers
ℜn         real number space of dimension n
           n = 2 : plane or 2-space
           n = 3 : 3-(dimensional) space
           n > 3 : n-space or hyperspace
6. Notation
x, y, z, u, v    vector (bold, lower case)
A, B, X          matrix (bold, upper case)
y = f( x )       function (map): assigns unique value in range of y to each value in domain of x
dy / dx          derivative of y with respect to single variable x
y = f( x )       function on multiple variables, i.e. a vector of variables; function in n-space
∂y / ∂xi         partial derivative of y with respect to element i of vector x
7. The concept of probability
Intuition: In some process, several outcomes are possible. When the process is repeated a large number of times, each outcome occurs with a characteristic relative frequency, or probability. If a particular outcome happens more often than another outcome, we say it is more probable.
8. The concept of probability
Arises in two contexts:
In actual repeated experiments.
– Example: You record the color of 1000 cars driving by. 57 of them are green. You estimate the probability of a car being green as 57 / 1000 = 0.057.
In idealized conceptions of a repeated process.
– Example: You consider the behavior of an unbiased six-sided die. The expected probability of rolling a 5 is 1 / 6 = 0.1667.
– Example: You need a model for how people’s heights are distributed. You choose a normal distribution (bell-shaped curve) to represent the expected relative probabilities.
9. Probability spaces
A probability space is a random process or experiment with three components:
– Ω, the set of possible outcomes O
  number of possible outcomes = | Ω | = N
– F, the set of possible events E
  an event comprises 0 to N outcomes
  number of possible events = | F | = 2^N
– P, the probability distribution
  function mapping each outcome and event to a real number between 0 and 1 (the probability of O or E)
  the probability of an event is the sum of the probabilities of the possible outcomes in the event
10. Axioms of probability
1. Non-negativity: for any event E ∈ F, p( E ) ≥ 0
2. All possible outcomes: p( Ω ) = 1
3. Additivity of disjoint events: for all events E, E′ ∈ F where E ∩ E′ = ∅,
   p( E ∪ E′ ) = p( E ) + p( E′ )
11. Types of probability spaces
Define | Ω | = number of possible outcomes
Discrete space: | Ω | is finite
– Analysis involves summations ( ∑ )
Continuous space: | Ω | is infinite
– Analysis involves integrals ( ∫ )
12. Example of discrete probability space
Single roll of a six-sided die
– 6 possible outcomes: O = 1, 2, 3, 4, 5, or 6
– 2^6 = 64 possible events
  example: E = ( O ∈ { 1, 3, 5 } ), i.e. outcome is odd
– If die is fair, then probabilities of outcomes are equal:
  p( 1 ) = p( 2 ) = p( 3 ) = p( 4 ) = p( 5 ) = p( 6 ) = 1 / 6
  example: probability of event E = ( outcome is odd ) is p( 1 ) + p( 3 ) + p( 5 ) = 1 / 2
13. Example of discrete probability space
Three consecutive flips of a coin
– 8 possible outcomes: O = HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
– 2^8 = 256 possible events
  example: E = ( O ∈ { HHT, HTH, THH } ), i.e. exactly two flips are heads
  example: E = ( O ∈ { THT, TTT } ), i.e. the first and third flips are tails
– If coin is fair, then probabilities of outcomes are equal:
  p( HHH ) = p( HHT ) = p( HTH ) = p( HTT ) = p( THH ) = p( THT ) = p( TTH ) = p( TTT ) = 1 / 8
  example: probability of event E = ( exactly two heads ) is p( HHT ) + p( HTH ) + p( THH ) = 3 / 8
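To make the event arithmetic concrete, here is a minimal Python sketch (mine, not part of the original slides) that enumerates the eight equally likely outcomes and evaluates the two example events:

```python
from itertools import product

# Sample space: all 2^3 = 8 outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))
p_outcome = 1 / len(outcomes)          # each outcome has probability 1/8

# Event: exactly two of the three flips are heads.
two_heads = [o for o in outcomes if o.count("H") == 2]
print(len(two_heads) * p_outcome)      # 3/8 = 0.375

# Event: the first and third flips are tails.
tails_1_3 = [o for o in outcomes if o[0] == "T" and o[2] == "T"]
print(len(tails_1_3) * p_outcome)      # 2/8 = 0.25
```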
14. Example of continuous probability space
Height of a randomly chosen American male
– Infinite number of possible outcomes: O has some single value in range 2 feet to 8 feet
– Infinite number of possible events
  example: E = ( O | O < 5.5 feet ), i.e. individual chosen is less than 5.5 feet tall
– Probabilities of outcomes are not equal, and are described by a continuous function, p( O )
15. Example of continuous probability space
Height of a randomly chosen American male
– Probabilities of outcomes O are not equal, and are described by a continuous function, p( O )
– p( O ) is a relative, not an absolute probability
  p( O ) for any particular O is zero
  ∫ p( O ) dO from O = -∞ to ∞ (i.e. area under curve) is 1
  example: p( O = 5'8" ) > p( O = 6'2" )
  example: p( O < 5'6" ) = ( ∫ p( O ) dO from O = -∞ to 5'6" ) ≈ 0.25
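As an illustration only (not from the lecture), the same quantities can be computed from a normal model of height; the mean of 69 inches and standard deviation of 3 inches below are assumed values, so the numbers differ somewhat from the slide's ≈ 0.25:

```python
from scipy.stats import norm

# Hypothetical normal model of adult male height in inches
# (mean = 69, sd = 3 are assumed illustration values).
height = norm(loc=69, scale=3)

# pdf values are relative probabilities: 5'8" (68") vs. 6'2" (74").
print(height.pdf(68) > height.pdf(74))  # True

# p( O < 5'6" ) is the area under the pdf left of 66", i.e. the cdf.
print(height.cdf(66))                   # ~0.16 under these assumed parameters
```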
16. Probability distributions
Discrete: probability mass function (pmf)
– example: sum of two fair dice
Continuous: probability density function (pdf)
– example: waiting time between eruptions of Old Faithful (minutes)
[plots: pmf of the dice sum; pdf of the waiting time; probability on the vertical axis]
17. Random variables
A random variable X is a function that associates a number x with each outcome O of a process
– Common notation: X( O ) = x, or just X = x
Basically a way to redefine (usually simplify) a probability space to a new probability space
– X must obey axioms of probability (over the possible values of x)
– X can be discrete or continuous
Example: X = number of heads in three flips of a coin
– Possible values of X are 0, 1, 2, 3
– p( X = 0 ) = p( X = 3 ) = 1 / 8; p( X = 1 ) = p( X = 2 ) = 3 / 8
– Size of space (number of “outcomes”) reduced from 8 to 4
Example: X = average height of five randomly chosen American men
– Size of space unchanged (X can range from 2 feet to 8 feet), but pdf of X different than for single man
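A small sketch (mine, not the slides’) showing how the random variable X = number of heads collapses the eight-outcome space to four values:

```python
from collections import Counter
from itertools import product

# X( O ) = number of heads in outcome O; all 8 outcomes equally likely.
outcomes = list(product("HT", repeat=3))
counts = Counter(o.count("H") for o in outcomes)

# pmf of X: p( X = x ) for each possible value x.
pmf = {x: n / len(outcomes) for x, n in sorted(counts.items())}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```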
18. Multivariate probability distributions
Scenario
– Several random processes occur (doesn’t matter whether in parallel or in sequence)
– Want to know probabilities for each possible combination of outcomes
Can describe as joint probability of several random variables
– Example: two processes whose outcomes are represented by random variables X and Y. Probability that process X has outcome x and process Y has outcome y is denoted as:
  p( X = x, Y = y )
19. Example of multivariate distribution
[table of joint probabilities by X = model type and Y = manufacturer]
joint probability: p( X = minivan, Y = European ) = 0.1481
20. Multivariate probability distributions
Marginal probability
– Probability distribution of a single variable in a joint distribution
– Example: two random variables X and Y:
  p( X = x ) = ∑b=all values of Y p( X = x, Y = b )
Conditional probability
– Probability distribution of one variable given that another variable takes a certain value
– Example: two random variables X and Y:
  p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
21. Example of marginal probability
marginal probability: p( X = minivan ) = 0.0741 + 0.1111 + 0.1481 = 0.3333
22. Example of conditional probability
conditional probability: p( Y = European | X = minivan ) =
0.1481 / ( 0.0741 + 0.1111 + 0.1481 ) ≈ 0.4444
[bar chart: probability (0 to 0.2) by X = model type (sedan, minivan, SUV, sport) and Y = manufacturer (American, Asian, European)]
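The three minivan entries quoted above are enough to reproduce both calculations in code; this sketch just restates the slides' arithmetic using the marginal and conditional formulas from slide 20:

```python
# Joint probabilities p( X = minivan, Y = y ) from the slides.
joint_minivan = {"American": 0.0741, "Asian": 0.1111, "European": 0.1481}

# Marginal: p( X = minivan ) = sum over all values of Y.
p_minivan = sum(joint_minivan.values())
print(p_minivan)                              # 0.3333

# Conditional: p( Y = European | X = minivan ) = joint / marginal.
print(joint_minivan["European"] / p_minivan)  # ~0.4444
```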
23. Continuous multivariate distribution
Same concepts of joint, marginal, and conditional probabilities apply (except use integrals)
Example: three-component Gaussian mixture in two dimensions
24. Expected value
Given:
– A discrete random variable X, with possible values x = x1, x2, … xn
– Probabilities p( X = xi ) that X takes on the various values of xi
– A function yi = f( xi ) defined on X
The expected value of f is the probability-weighted “average” value of f( xi ):
E( f ) = ∑i p( xi ) ⋅ f( xi )
25. Example of expected value
Process: game where one card is drawn from the deck
– If face card, dealer pays you $10
– If not a face card, you pay dealer $4
Random variable X = { face card, not face card }
– p( face card ) = 3/13
– p( not face card ) = 10/13
Function f( X ) is payout to you
– f( face card ) = 10
– f( not face card ) = -4
Expected value of payout is:
E( f ) = ∑i p( xi ) ⋅ f( xi ) = 3/13 ⋅ 10 + 10/13 ⋅ ( -4 ) = -0.77
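The expected payout maps directly to a one-line weighted sum; a minimal sketch:

```python
# E( f ) = sum_i p( x_i ) * f( x_i ) for the card game above.
p = {"face card": 3 / 13, "not face card": 10 / 13}
f = {"face card": 10, "not face card": -4}

expected_payout = sum(p[x] * f[x] for x in p)
print(expected_payout)  # ~ -0.77: you lose about 77 cents per game on average
```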
26. Expected value in continuous spaces
E( f ) = ∫x=a→b p( x ) ⋅ f( x ) dx
27. Common forms of expected value (1)
Mean (μ)
f( xi ) = xi ⇒ μ = E( f ) = ∑i p( xi ) ⋅ xi
– Average value of X = xi, taking into account probability of the various xi
– Most common measure of “center” of a distribution
Compare to formula for mean of an actual sample:
μ = ( 1 / N ) ∑i=1..N xi
28. Common forms of expected value (2)
Variance (σ²)
f( xi ) = ( xi - μ )² ⇒ σ² = E( f ) = ∑i p( xi ) ⋅ ( xi - μ )²
– Average value of squared deviation of X = xi from mean μ, taking into account probability of the various xi
– Most common measure of “spread” of a distribution
– σ is the standard deviation
Compare to formula for variance of an actual sample:
σ² = ( 1 / ( N - 1 ) ) ∑i=1..N ( xi - μ )²
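A short numpy sketch contrasting the two views: the distribution formulas applied to a fair die's pmf, and the N - 1 sample formulas applied to simulated rolls (the die example is my illustration, not from the slides):

```python
import numpy as np

# Distribution view: mean and variance of a fair six-sided die from its pmf.
xs = np.arange(1, 7)
p = np.full(6, 1 / 6)
mu = np.sum(p * xs)                  # E( X ) = 3.5
var = np.sum(p * (xs - mu) ** 2)     # E( ( X - mu )^2 ) ~ 2.917

# Sample view: the same quantities estimated from 1000 simulated rolls.
rolls = np.random.default_rng(0).integers(1, 7, size=1000)
print(mu, var)
print(rolls.mean(), rolls.var(ddof=1))  # ddof=1 gives the N - 1 denominator
```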
29. Common forms of expected value (3)
Covariance
f( xi ) = ( xi - μx ), g( yi ) = ( yi - μy ) ⇒ cov( x, y ) = ∑i p( xi, yi ) ⋅ ( xi - μx ) ⋅ ( yi - μy )
– Measures tendency for x and y to deviate from their means in same (or opposite) directions at same time
[scatter plots: high (positive) covariance vs. no covariance]
Compare to formula for covariance of actual samples:
cov( x, y ) = ( 1 / ( N - 1 ) ) ∑i=1..N ( xi - μx ) ( yi - μy )
30. Correlation
Pearson’s correlation coefficient is covariance normalized by the standard deviations of the two variables:
corr( x, y ) = cov( x, y ) / ( σx σy )
– Always lies in range -1 to 1
– Only reflects linear dependence between variables
[scatter plots: linear dependence with noise; linear dependence without noise; various nonlinear dependencies]
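A small numpy sketch (assumed data, not from the slides) illustrating both points: correlation near ±1 for linear dependence, and near 0 for a purely nonlinear dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2 * x + rng.normal(scale=0.5, size=500)  # linear dependence with noise
z = x ** 2                                   # nonlinear dependence on x

print(np.cov(x, y)[0, 1])       # sample covariance (N - 1 denominator)
print(np.corrcoef(x, y)[0, 1])  # close to +1
print(np.corrcoef(x, z)[0, 1])  # near 0: correlation misses nonlinear dependence
```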
31. Complement rule
Given: event A, which can occur or not
p( not A ) = 1 - p( A )
[Venn diagram: Ω split into A and not A; areas represent relative probabilities]
32. Product rule
Given: events A and B, which can co-occur (or not)
p( A, B ) = p( A | B ) ⋅ p( B )
(same expression given previously to define conditional probability)
[Venn diagram: Ω split into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
33. Example of product rule
Probability that a man has white hair (event A) and is over 65 (event B)
– p( B ) = 0.18
– p( A | B ) = 0.78
– p( A, B ) = p( A | B ) ⋅ p( B ) = 0.78 ⋅ 0.18 = 0.14
34. Rule of total probability
Given: events A and B, which can co-occur (or not)
p( A ) = p( A, B ) + p( A, not B )
(same expression given previously to define marginal probability)
[Venn diagram: Ω split into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
35. Independence
Given: events A and B, which can co-occur (or not)
p( A | B ) = p( A )
or
p( A, B ) = p( A ) ⋅ p( B )
[Venn diagram: Ω split into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
36. Examples of independence / dependence
Independence:
– Outcomes on multiple rolls of a die
– Outcomes on multiple flips of a coin
– Height of two unrelated individuals
– Probability of getting a king on successive draws from a deck, if card from each draw is replaced
Dependence:
– Height of two related individuals
– Duration of successive eruptions of Old Faithful
– Probability of getting a king on successive draws from a deck, if card from each draw is not replaced
37. Example of independence vs. dependence
Independence: All manufacturers have identical product mix. p( X = x | Y = y ) = p( X = x ).
Dependence: American manufacturers love SUVs, European manufacturers don’t.
38. Bayes rule
A way to find conditional probabilities for one variable when conditional probabilities for another variable are known.
p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
where p( A ) = p( A, B ) + p( A, not B )
[Venn diagram: Ω split into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B )]
39. Bayes rule
posterior probability ∝ likelihood × prior probability
p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
[Venn diagram: Ω split into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B )]
40. Example of Bayes rule
Marie is getting married tomorrow at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman is forecasting rain for tomorrow. When it actually rains, the weatherman has forecast rain 90% of the time. When it doesn't rain, he has forecast rain 10% of the time. What is the probability it will rain on the day of Marie's wedding?
Event A: The weatherman has forecast rain.
Event B: It rains.
We know:
– p( B ) = 5 / 365 = 0.0137 [ It rains 5 days out of the year. ]
– p( not B ) = 360 / 365 = 0.9863
– p( A | B ) = 0.9 [ When it rains, the weatherman has forecast rain 90% of the time. ]
– p( A | not B ) = 0.1 [ When it does not rain, the weatherman has forecast rain 10% of the time. ]
41. Example of Bayes rule, cont’d.
We want to know p( B | A ), the probability it will rain on the day of Marie's wedding, given a forecast for rain by the weatherman. The answer can be determined from Bayes rule:
1. p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
2. p( A ) = p( A | B ) ⋅ p( B ) + p( A | not B ) ⋅ p( not B ) = ( 0.9 )( 0.0137 ) + ( 0.1 )( 0.9863 ) = 0.111
3. p( B | A ) = ( 0.9 )( 0.0137 ) / 0.111 = 0.111
The result seems unintuitive but is correct. Even when the weatherman predicts rain, it rains only about 11% of the time. Despite the weatherman's gloomy prediction, it is unlikely Marie will get rained on at her wedding.
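The whole calculation fits in a few lines; a sketch restating the slides' numbers:

```python
# Bayes rule for the wedding example.
p_rain = 5 / 365                # p( B )  ~ 0.0137
p_dry = 1 - p_rain              # p( not B )
p_fc_given_rain = 0.9           # p( A | B )
p_fc_given_dry = 0.1            # p( A | not B )

# Rule of total probability gives the denominator p( A ).
p_fc = p_fc_given_rain * p_rain + p_fc_given_dry * p_dry

print(p_fc_given_rain * p_rain / p_fc)  # p( B | A ) ~ 0.111
```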
42. Probabilities: when to add, when to multiply
ADD: When you want to allow for occurrence of any of several possible outcomes of a single process. Comparable to logical OR.
MULTIPLY: When you want to allow for simultaneous occurrence of particular outcomes from more than one process. Comparable to logical AND.
– But only if the processes are independent.
43. Linear algebra applications
1) Operations on or between vectors and matrices
2) Coordinate transformations
3) Dimensionality reduction
4) Linear regression
5) Solution of linear systems of equations
6) Many others
Applications 1) – 4) are directly relevant to this course. Today we’ll start with 1).
44. Why vectors and matrices?
Most common form of data organization for machine learning is a 2D array, where
– rows represent samples (records, items, datapoints)
– columns represent attributes (features, variables)
Natural to think of each sample as a vector of attributes, and whole array as a matrix
Example (each row is a vector; the whole array is a matrix):
Refund   Marital Status   Taxable Income   Cheat
Yes      Single           125K             No
No       Married          100K             No
No       Single           70K              No
Yes      Married          120K             No
No       Divorced         95K              Yes
No       Married          60K              No
Yes      Divorced         220K             No
No       Single           85K              Yes
No       Married          75K              No
No       Single           90K              Yes
45. Vectors
Definition: an n-tuple of values (usually real numbers).
– n referred to as the dimension of the vector
– n can be any positive integer, from 1 to infinity
Can be written in column form or row form
– Column form is conventional
– Vector elements referenced by subscript
x = ( x1 … xn )T is the column form; xT = ( x1 … xn ) is the row form ( T means “transpose” )
46. Vectors
Can think of a vector as:
– a point in space, or
– a directed line segment with a magnitude and direction
47. Vector arithmetic
Addition of two vectors
– add corresponding elements: z = x + y = ( x1 + y1 … xn + yn )T
– result is a vector
Scalar multiplication of a vector
– multiply each element by the scalar: y = a x = ( a x1 … a xn )T
– result is a vector
48. Vector arithmetic
Dot product of two vectors
– multiply corresponding elements, then add the products: a = x ⋅ y = ∑i=1..n xi yi
– result is a scalar
Dot product alternative form: a = x ⋅ y = || x || || y || cos( θ ), where θ is the angle between x and y
[diagram: vectors x and y separated by angle θ]
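A numpy sketch of the vector operations from the last two slides, including the angle recovered from the alternative form of the dot product:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

print(x + y)         # vector addition, elementwise
print(3 * x)         # scalar multiplication
print(np.dot(x, y))  # dot product: 1*3 + 2*0 + 2*4 = 11.0

# Alternative form: x . y = ||x|| ||y|| cos( theta ).
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))  # angle between x and y, ~42.8 degrees
```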
49. Matrices
Definition: an m x n two-dimensional array of values (usually real numbers).
– m rows
– n columns
Matrix referenced by two-element subscript
– first element in subscript is row
– second element in subscript is column
– example: A24 or a24 is element in second row, fourth column of A
A = [ a11 … a1n
      ⋮   ⋱  ⋮
      am1 … amn ]
50. Matrices
A vector can be regarded as special case of a matrix, where one of matrix dimensions = 1.
Matrix transpose (denoted T)
– swap columns and rows: row 1 becomes column 1, etc.
– m x n matrix becomes n x m matrix
– example:
A = [  2  7 -1  0  3
       4  6 -3  1  8 ]
AT = [  2  4
        7  6
       -1 -3
        0  1
        3  8 ]
51. Matrix arithmetic
Addition of two matrices
– matrices must be same size
– add corresponding elements: cij = aij + bij
– result, C = A + B, is a matrix of the same size
Scalar multiplication of a matrix
– multiply each element by the scalar: bij = d ⋅ aij
– result, B = d ⋅ A, is a matrix of the same size
52. Matrix arithmetic
Matrix-matrix multiplication
– vector-matrix multiplication is just a special case
– TO THE BOARD!!
Multiplication is associative: A ⋅ ( B ⋅ C ) = ( A ⋅ B ) ⋅ C
Multiplication is not commutative: A ⋅ B ≠ B ⋅ A (generally)
Transposition rule: ( A ⋅ B )T = BT ⋅ AT
53. Matrix arithmetic
RULE: In any chain of matrix multiplications, the column dimension of one matrix in the chain must match the row dimension of the following matrix in the chain.
Examples, with A3x5, B5x5, C3x1:
Right: A ⋅ B ⋅ AT;  CT ⋅ A ⋅ B;  AT ⋅ A ⋅ B;  C ⋅ CT ⋅ A
Wrong: A ⋅ B ⋅ A;  C ⋅ A ⋅ B;  A ⋅ AT ⋅ B;  CT ⋅ C ⋅ A
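A numpy sketch checking the dimension rule and the transposition rule with the shapes used above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 5))
C = rng.normal(size=(3, 1))

# "Right" chains: inner dimensions match at every step.
print((A @ B @ A.T).shape)   # (3, 3)
print((C.T @ A @ B).shape)   # (1, 5)

# Transposition rule: ( A B )^T equals B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))  # True

# "Wrong" chain: the 3x5 result of A B cannot multiply the 3x5 matrix A.
try:
    A @ B @ A
except ValueError as err:
    print("shape mismatch:", err)
```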
54. Vector projection
Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is x / || x ||
– Length of projection of y in direction of x is || y || ⋅ cos( θ )
– Orthogonal projection of y onto x is the vector
  projx( y ) = x ⋅ || y || ⋅ cos( θ ) / || x || = [ ( x ⋅ y ) / || x ||² ] x  (using dot product alternative form)
[diagram: y, x, the angle θ between them, and projx( y ) along x]
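A minimal numpy sketch of the projection formula; the residual check confirms orthogonality:

```python
import numpy as np

def project(y, x):
    """Orthogonal projection of y onto x: [ ( x . y ) / ||x||^2 ] x."""
    return (np.dot(x, y) / np.dot(x, x)) * x

y = np.array([2.0, 3.0])
x = np.array([4.0, 0.0])
p = project(y, x)
print(p)                 # [2. 0.]: the component of y along x
print(np.dot(y - p, x))  # 0.0: the residual y - proj is orthogonal to x
```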
55. Optimization theory topics
Maximum likelihood
Expectation maximization
Gradient descent