
- 1. Machine Learning Preliminaries and Math Refresher. M. Lüthi, T. Vetter. February 18, 2008.
- 2. Outline: 1. General remarks about learning; 2. Probability Theory and Statistics; 3. Linear spaces.
- 3. Outline (current section: 1. General remarks about learning).
- 4. "The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial." T. Poggio and C.R. Shelton
- 5. Model building in natural sciences. Model building: given a phenomenon, construct a model for it. Example (Heat Conduction). Phenomenon: the spontaneous transfer of thermal energy through matter, from a region of higher temperature to a region of lower temperature. Model: ∂Q/∂t = −k ∮_S ∇T · dS.
- 6. Learning as Model Building. Example (Learning). Phenomenon: learning (inferring general rules from examples). Model: f* = argmax_{f∈H} P(D|f) P(f) / P(D).
- 7. Learning as Model Building. Example (Learning). Phenomenon: learning (inferring general rules from examples). Model: f* = argmax_{f∈H} P(D|f) P(f) / P(D). Neural networks, Decision Trees, Naive Bayes, Support Vector Machines, etc. Models for learning: the models for learning are the learning algorithms.
- 8. Goals of the first block. Life is short ... we want to cover the essentials of learning. General Setting: a mathematically precise setting of the learning problem; valid for any kind of learning algorithm. Statistical Learning Theory: theory of learning; when does learning work; conditions any algorithm has to satisfy; performance bounds. Kernel Methods: kernels; make linear algorithms non-linear; learning from non-vectorial data.
- 9. Mathematics needed in the first block. The need for mathematics: as we treat the learning problem in a formal setting, the results and methods are necessarily formulated in mathematical terms. General Setting: probability theory, statistics, basic optimization theory. Statistical Learning Theory: more probability theory, more statistics. Kernel Methods: linear spaces, linear algebra, basic optimization theory.
- 10. Mathematics needed in the first block. The need for mathematics: as we treat the learning problem in a formal setting, the results and methods are necessarily formulated in mathematical terms. General Setting: probability theory, statistics, basic optimization theory. Statistical Learning Theory: more probability theory, more statistics. Kernel Methods: linear spaces, linear algebra, basic optimization theory. A bit of mathematical maturity and an open mind are required. The rest will be explained.
- 11. "Nothing is more practical than a good theory." Vladimir N. Vapnik
- 12. "Nothing is more practical than a good theory." Vladimir N. Vapnik. "Nothing (in computer science) is more beautiful than learning theory?" M. Lüthi
- 13. Outline (current section: 2. Probability Theory and Statistics).
- 14. (Figure slide; no recoverable text content.)
- 15. Probability theory vs. Statistics. Definition (Probability Theory): a branch of mathematics concerned with the analysis of random phenomena; general ⇒ specific. Definition (Statistics): the science of collecting, analyzing, presenting, and interpreting data; specific ⇒ general. Statistical machine learning is closely related to (inferential) statistics. Many state-of-the-art learning algorithms are based on concepts from probability theory.
- 16. Probabilities. Definition (Probability Space): a probability space is the triple (Ω, F, P), where Ω is the set of elementary events ω, F is a collection of events (e.g. the power set P(Ω)), and P is a measure that satisfies the probability axioms.
- 17. Axioms of Probability. 1. For any A ∈ F, there exists a number P(A), the probability of A, satisfying P(A) ≥ 0. 2. P(Ω) = 1. 3. Let {A_n, n ≥ 1} be a collection of pairwise disjoint events, and let A be their union. Then P(A) = Σ_{n=1}^∞ P(A_n).
- 18. Independence. Definition (Independence): two events A and B are independent iff the probability of their intersection equals the product of the individual probabilities, i.e. P(A ∩ B) = P(A) · P(B). Definition (Conditional probability): given two events A and B with P(B) > 0, we define the conditional probability of A given B, P(A|B), by the relation P(A|B) = P(A ∩ B) / P(B).
- 19. Random Variables. A single event is not that interesting. Definition (Random Variable): a random variable X is a function from the probability space to a vector of real numbers, X: Ω → R^n. Random variables are characterized by their distribution function F. Definition (Probability Distribution Function): let X: Ω → R be a random variable. We define F_X(x) = P(X ≤ x), −∞ < x < ∞.
- 20. Probability density function. Definition (Probability density function): the density function is the function f_X with the property F_X(x) = ∫_{−∞}^{x} f_X(y) dy, −∞ < x < ∞.
- 21. Convergence. Definition (Convergence in Probability): let X₁, X₂, ... be random variables. We say that X_n converges in probability to the random variable X as n → ∞ iff, for all ε > 0, P(|X_n − X| > ε) → 0 as n → ∞. We write X_n → X in probability as n → ∞.
- 22. Weak law of large numbers. Theorem (Bernoulli's Theorem (Weak law of large numbers)): let X₁, ..., X_n be a sequence of independent and identically distributed (i.i.d.) random variables, each having mean µ and standard deviation σ. Then, for every ε > 0, P[|(X₁ + ... + X_n)/n − µ| > ε] → 0 as n → ∞. Thus, given enough observations x_i ∼ F_X, the sample mean x̄ = (1/n) Σ_{i=1}^{n} x_i will approach the true mean µ.
- 23. Expectation. Definition (Expectation): let X be a random variable with probability density function f_X, and g: R → R a function. We define the expectation E[g(X)] := ∫_{−∞}^{∞} g(x) f_X(x) dx. Definition (Sample mean): let a sample x = {x₁, x₂, ..., x_n} be given. We define the (sample) mean to be x̄ = (1/n) Σ_{i=1}^{n} x_i.
- 24. Variance. Definition (Variance): let X be a random variable with density function f_X. The variance is given by Var[X] = E[(X − E[X])²] = E[X²] − (E[X])². The square root √Var[X] of the variance is referred to as the standard deviation. Definition (Sample Variance): let the sample x = {x₁, x₂, ..., x_n} with sample mean x̄ be given. We define the sample variance to be s² = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)².
- 25. Notation. Assume F has a probability density function: f(x) = dF(x)/dx. Formally, we write f(x) dx = dF(x). Example (Expectation): E[g(X)] := ∫_{−∞}^{∞} g(x) f(x) dx = ∫_{−∞}^{∞} g(x) dF(x).
- 26. Outline (current section: 3. Linear spaces).
- 27. Vector Space. A set V together with two binary operations, 1. vector addition +: V × V → V and 2. scalar multiplication ·: R × V → V, is called a vector space over R if it satisfies the following axioms: 1. ∀x, y ∈ V: x + y = y + x (commutativity); 2. ∀x, y, z ∈ V: x + (y + z) = (x + y) + z (associativity); 3. ∃0 ∈ V, ∀x ∈ V: 0 + x = x (identity of vector addition); 4. ∀x ∈ V: 1 · x = x (identity of scalar multiplication); 5. ∀x ∈ V, ∃(−x) ∈ V: x + (−x) = 0 (additive inverse element); 6. ∀α ∈ R, ∀x, y ∈ V: α · (x + y) = α · x + α · y (distributivity); 7. ∀α, β ∈ R, ∀x ∈ V: (α + β) · x = α · x + β · x (distributivity); 8. ∀α, β ∈ R, ∀x ∈ V: α · (β · x) = (αβ) · x.
- 28. Vector Space. More importantly for us, the definition implies: x + y ∈ V for all x, y ∈ V, and αx ∈ V for all α ∈ R and all x ∈ V. Subspace criterion: let V be a vector space over R, and let W be a subset of V. Then W is a subspace if and only if it satisfies the following three conditions: 1. 0 ∈ W; 2. if x, y ∈ W, then x + y ∈ W; 3. if x ∈ W and α ∈ R, then αx ∈ W.
- 29. Normed spaces. Definition (Normed vector space): a normed vector space is a pair (V, ‖·‖), where V is a vector space and ‖·‖ the associated norm, satisfying the following properties for all u, v ∈ V and α ∈ R: 1. ‖v‖ ≥ 0 (positivity); 2. ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality); 3. ‖αv‖ = |α| ‖v‖ (positive scalability); 4. ‖v‖ = 0 ⇔ v = 0 (positive definiteness).
- 30. Definition (Inner product space): a real inner product space is a pair (V, ⟨·,·⟩), where V is a real vector space and ⟨·,·⟩ the associated inner product, satisfying the following properties for all u, v, w ∈ V and α ∈ R: 1. ⟨u, v⟩ = ⟨v, u⟩ (symmetry); 2. ⟨αu, v⟩ = α⟨u, v⟩, ⟨u, αv⟩ = α⟨u, v⟩, ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩, and ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ (bilinearity); 3. ⟨u, u⟩ ≥ 0 (positive definiteness). Definition (Strict inner product space): an inner product space is called strict if ⟨u, u⟩ = 0 ⇔ u = 0.
- 31. Inner product space. A strict inner product induces a norm: ‖f‖² = ⟨f, f⟩. It is used to define distances and angles between elements. Theorem (Cauchy-Schwarz inequality): for all vectors u and v of a real inner product space (V, ⟨·,·⟩), the following inequality holds: |⟨u, v⟩| ≤ ‖u‖ ‖v‖.
- 32. If you're not comfortable with any of the presented material, you should take your favourite textbook and review the relevant chapters within the next two weeks.
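The MAP rule on slides 6-7 picks the hypothesis f maximizing P(D|f) P(f) / P(D). A minimal numerical sketch over a finite hypothesis set; the coin-bias hypotheses, priors, and data below are my own invented illustration, not from the slides:

```python
# MAP selection over a finite hypothesis set H, as on slides 6-7:
# f* = argmax_{f in H} P(D|f) P(f) / P(D).
# Hypothetical example: three coin-bias hypotheses, data D = 8 heads in 10 flips.
from math import comb

H = {"fair": 0.5, "biased": 0.8, "very_biased": 0.95}     # hypothesis -> P(heads)
prior = {"fair": 0.6, "biased": 0.3, "very_biased": 0.1}  # P(f)
heads, flips = 8, 10

def likelihood(p):
    # P(D | f): binomial probability of observing `heads` heads in `flips` flips
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# P(D) is the same for every f, so it does not affect the argmax.
posterior_unnorm = {f: likelihood(p) * prior[f] for f, p in H.items()}
f_star = max(posterior_unnorm, key=posterior_unnorm.get)
print(f_star)  # "biased"
```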
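The slide-18 definitions of conditional probability and independence can be checked concretely. A small sketch on a fair die; the choice of events A and B is my own example:

```python
# Checking the slide-18 definitions on a fair die.
from fractions import Fraction

omega = set(range(1, 7))                 # sample space of a fair die
def P(event):                            # uniform probability measure
    return Fraction(len(event & omega), 6)

A = {2, 4, 6}        # "roll is even"
B = {1, 2, 3, 4}     # "roll is at most 4"

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
cond = P(A & B) / P(B)
print(cond)                              # 1/2

# Independence: does P(A ∩ B) equal P(A) * P(B)?
print(P(A & B) == P(A) * P(B))           # True: A and B happen to be independent
```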
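Slide 20's relation F_X(x) = ∫_{−∞}^{x} f_X(y) dy can be verified numerically. A sketch using the exponential distribution as the example (my choice; any density with a known closed-form CDF would do):

```python
# Numerical check of the slide-20 pdf/cdf relation for the exponential
# distribution: f_X(y) = exp(-y) for y >= 0, F_X(x) = 1 - exp(-x).
from math import exp

def f_X(y):
    return exp(-y) if y >= 0 else 0.0

def F_numeric(x, steps=100_000):
    # Midpoint-rule approximation of the integral; the pdf vanishes below 0,
    # so integrating from 0 to x suffices.
    h = x / steps
    return sum(f_X((i + 0.5) * h) for i in range(steps)) * h

x = 1.5
print(abs(F_numeric(x) - (1 - exp(-x))) < 1e-6)   # True
```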
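The weak law of large numbers on slide 22 is easy to watch in simulation. A sketch with a fair die, whose true mean is µ = 3.5:

```python
# Slide 22: the sample mean of i.i.d. draws approaches the true mean µ.
# Illustration only; the seed and sample sizes are arbitrary choices.
import random

random.seed(0)
mu = 3.5
for n in (10, 1_000, 100_000):
    xs = [random.randint(1, 6) for _ in range(n)]
    sample_mean = sum(xs) / n
    print(n, sample_mean)   # the deviation from 3.5 shrinks as n grows
```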
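Slides 23-24 define the sample mean and the (n−1)-normalized sample variance. Computed directly on a small made-up sample:

```python
# Slides 23-24: sample mean x̄ = (1/n) Σ x_i and
# sample variance s² = (1/(n-1)) Σ (x_i - x̄)².
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example data
n = len(xs)

x_bar = sum(xs) / n                                   # sample mean
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)      # sample variance

print(x_bar, s2)   # 5.0 and 32/7 ≈ 4.571
```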
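Slides 29-31: the inner product induces the norm ‖u‖ = √⟨u, u⟩, and Cauchy-Schwarz bounds |⟨u, v⟩| by ‖u‖ ‖v‖, which in turn yields the triangle inequality of slide 29. A sketch checking both on random vectors with the plain dot product on R⁵:

```python
# Numerical check of Cauchy-Schwarz and the triangle inequality
# for the standard dot product (illustration, not a proof).
import random
from math import sqrt

random.seed(1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))   # norm induced by the inner product

for _ in range(100):
    u = [random.uniform(-1, 1) for _ in range(5)]
    v = [random.uniform(-1, 1) for _ in range(5)]
    assert abs(dot(u, v)) <= norm(u) * norm(v) + 1e-12                       # Cauchy-Schwarz
    assert norm([a + b for a, b in zip(u, v)]) <= norm(u) + norm(v) + 1e-12  # triangle
print("ok")
```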
