# Basic Elements of Probability Theory

Published in: Education, Technology
### Basic Elements of Probability Theory

This document is a condensed version of three Wikipedia articles on basic probability theory, namely Probability, Mutually exclusive events and Independence. It aims to give a brief introduction to the topic. For links to the original articles, see the reference at the end of each section.

Contents

- 1 Probability
  - 1.1 Interpretations
  - 1.2 Theory
  - 1.3 Applications
  - 1.4 Mathematical treatment
    - 1.4.1 Independent probability
    - 1.4.2 Conditional probability
    - 1.4.3 Summary of probabilities
  - 1.5 Reference
- 2 Mutually exclusive events
  - 2.1 Logic
  - 2.2 Probability
  - 2.3 Reference
- 3 Independence
  - 3.1 Definition for two events
  - 3.2 Definition for more than two events
  - 3.3 Conditional independence
  - 3.4 Reference
#### 1 Probability

Probability is a measure or estimate of how likely it is that something will happen or that a statement is true. Probabilities take values between 0 (0% chance, or will not happen) and 1 (100% chance, or will certainly happen). The higher the probability, the more likely the event is to happen or, over a longer series of samples, the more often the event is expected to happen.

These concepts have been given an axiomatic mathematical derivation in probability theory, which is used widely in such areas of study as mathematics, statistics, finance, gambling, science, artificial intelligence/machine learning and philosophy to, for example, draw inferences about the expected frequency of events. Probability theory is also used to describe the underlying mechanics and regularities of complex systems.

##### 1.1 Interpretations

When dealing with experiments that are random and well-defined in a purely theoretical setting (like tossing a fair coin), a probability is the number of outcomes considered divided by the number of all equally likely outcomes. For example, tossing a fair coin twice yields head-head with probability $\frac{1}{4}$, because the four outcomes head-head, head-tails, tails-head and tails-tails are equally likely to occur. When it comes to practical application, however, the word probability does not have a single direct definition. In fact, there are two major categories of probability interpretations, whose adherents hold conflicting views about the fundamental nature of probability: objectivists and subjectivists.

**Objectivists.** Objectivists assign numbers to describe some objective or physical state of affairs. The most popular version of objective probability is frequentist probability, which claims that the probability of a random event denotes the relative frequency of occurrence of an experiment's outcome when the experiment is repeated.
This interpretation considers probability to be the relative frequency "in the long run" of outcomes. A modification of this is propensity probability, which interprets probability as the tendency of some experiment to yield a certain outcome, even if it is performed only once.

**Subjectivists.** Subjectivists assign numbers per subjective probability, i.e., as a degree of belief. The degree of belief has been interpreted as "the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E." The most popular version of subjective probability is Bayesian probability, which combines expert knowledge with experimental data to produce probabilities. The expert knowledge is represented by some (subjective) prior probability distribution. The data is incorporated in a likelihood function. The product of the prior and the likelihood, normalized, results in a posterior probability distribution that incorporates all the information known to date. Starting from arbitrary, subjective probabilities for a group of agents, some Bayesians claim that all agents will eventually have sufficiently similar assessments of probabilities, given enough evidence.

##### 1.2 Theory

Like other theories, the theory of probability is a representation of probabilistic concepts in formal terms, that is, in terms that can be considered separately from their meaning. These formal terms are manipulated by the rules of mathematics and logic, and any results are interpreted or translated back into the problem domain.

There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation (see probability space), sets are interpreted as events and probability itself as a measure on a class of sets. In Cox's theorem, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions. In both cases, the laws of probability are the same, except for technical details.

There are other methods for quantifying uncertainty, such as the Dempster-Shafer theory or possibility theory, but those are essentially different and not compatible with the laws of probability as usually understood.

##### 1.3 Applications

Probability theory is applied in everyday life in risk assessment and in trade on financial markets. Governments apply probabilistic methods in environmental regulation, where it is called pathway analysis. A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices, which has ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely or less likely sends prices up or down, and signals that opinion to other traders. Accordingly, the probabilities are neither assessed independently nor necessarily very rationally. The theory of behavioral finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace and conflict.
The discovery of rigorous methods to assess and combine probability assessments has changed society. It is important for most citizens to understand how probability assessments are made, and how they contribute to decisions.

Another significant application of probability theory in everyday life is reliability. Many consumer products, such as automobiles and consumer electronics, use reliability theory in product design to reduce the probability of failure. Failure probability may influence a manufacturer's decisions on a product's warranty.

The cache language model and other statistical language models that are used in natural language processing are also examples of applications of probability theory.
##### 1.4 Mathematical treatment

Consider an experiment that can produce a number of results. The collection of all results is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling a die can produce six possible results. One collection of possible results gives an odd number on the die. Thus, the subset {1,3,5} is an element of the power set of the sample space of die rolls. These collections are called "events." In this case, {1,3,5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, the event is said to have occurred.

A probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results (in our example, the event {1,2,3,4,5,6}) is assigned a value of one. To qualify as a probability, the assignment of values must satisfy the requirement that, for any collection of mutually exclusive events (events with no common results, e.g., the events {1,6}, {3}, and {2,4} are all mutually exclusive), the probability that at least one of the events will occur is given by the sum of the probabilities of all the individual events.

The probability of an event A is written as P(A), p(A) or Pr(A). This mathematical definition of probability can extend to infinite sample spaces, and even uncountable sample spaces, using the concept of a measure.

The opposite or complement of an event A is the event [not A] (that is, the event of A not occurring); its probability is given by $P(\text{not } A) = 1 - P(A)$. As an example, the chance of not rolling a six on a six-sided die is

$$1 - (\text{chance of rolling a six}) = 1 - \frac{1}{6} = \frac{5}{6}.$$

If two events A and B both occur on a single performance of an experiment, this is called the intersection of A and B, and its probability, the joint probability of A and B, is denoted $P(A \cap B)$.
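The die example above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `prob` and the constant `SAMPLE_SPACE` are illustrative choices, not part of the original text, and the code assumes all six results are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed equally likely).
SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """Probability of an event, i.e. a subset of SAMPLE_SPACE."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))


# The event "the die falls on an odd number".
print(prob({1, 3, 5}))       # 1/2

# The event made up of all possible results has probability one.
print(prob(SAMPLE_SPACE))    # 1

# Additivity for the mutually exclusive events {1,6}, {3} and {2,4}.
parts = [{1, 6}, {3}, {2, 4}]
union = set().union(*parts)
assert prob(union) == sum(prob(p) for p in parts)

# Complement rule: P(not six) = 1 - P(six).
not_six = SAMPLE_SPACE - {6}
assert prob(not_six) == 1 - prob({6})
print(prob(not_six))         # 5/6
```

Using `Fraction` rather than floating-point numbers keeps the results exact, so they can be compared directly with the fractions in the text.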
###### 1.4.1 Independent probability

If two events A and B are independent, then the joint probability is

$$P(A \text{ and } B) = P(A \cap B) = P(A)P(B).$$

For example, if two coins are flipped, the chance of both being heads is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

**Mutually exclusive.** If either event A or event B or both events occur on a single performance of an experiment, this is called the union of the events A and B, and its probability is denoted $P(A \cup B)$. If two events are mutually exclusive, then the probability of either occurring is

$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B).$$

For example, the chance of rolling a 1 or 2 on a six-sided die is

$$P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$$

**Not mutually exclusive.** If the events are not mutually exclusive, then

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B).$$

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is

$$\frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{11}{26},$$

because of the 52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both. The "3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.

###### 1.4.2 Conditional probability

Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written $P(A \mid B)$, and is read "the probability of A, given B". It is defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

If $P(B) = 0$, then $P(A \mid B)$ is formally undefined by this expression. However, it is possible to define a conditional probability for some zero-probability events using a σ-algebra of such events (such as those arising from a continuous random variable).

For example, in a bag of 2 red balls and 2 blue balls (4 balls in total), the probability of taking a red ball is $\frac{1}{2}$. When taking a second ball, however, the probability of it being a red ball or a blue ball depends on the ball previously taken: if a red ball was taken, the probability of picking a red ball again is $\frac{1}{3}$, since only 1 red and 2 blue balls remain.

###### 1.4.3 Summary of probabilities

| Event | Probability |
|---|---|
| A | $P(A) \in [0, 1]$ |
| not A | $P(A^c) = 1 - P(A)$ |
| A or B | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ |
| | $P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive |
| A and B | $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$ |
| | $P(A \cap B) = P(A)P(B)$ if A and B are independent |
| A given B | $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)P(A)}{P(B)}$ |
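The addition rule for events that are not mutually exclusive and the definition of conditional probability can both be verified by brute-force enumeration. The sketch below assumes equally likely outcomes. The helper names (`prob`, `cond_prob`) and the ball labels are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event, space):
    """P(event) when every outcome in `space` is equally likely.

    `event` is a predicate that returns True for outcomes in the event.
    """
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def cond_prob(a, b, space):
    """P(A | B) = P(A and B) / P(B); undefined by this formula if P(B) = 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ZeroDivisionError("P(B) = 0, so P(A | B) is undefined here")
    return prob(lambda o: a(o) and b(o), space) / p_b


def is_heart(card):
    return card[1] == "hearts"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


# P(heart or face) = P(heart) + P(face) - P(heart and face)
p_union = (
    prob(is_heart, DECK)
    + prob(is_face, DECK)
    - prob(lambda c: is_heart(c) and is_face(c), DECK)
)
assert p_union == prob(lambda c: is_heart(c) or is_face(c), DECK)
print(p_union)  # 11/26

# Bag with 2 red and 2 blue balls: all ordered draws of two balls
# without replacement are equally likely.
BALLS = ["R1", "R2", "B1", "B2"]
DRAWS = list(permutations(BALLS, 2))


def first_red(draw):
    return draw[0].startswith("R")


def second_red(draw):
    return draw[1].startswith("R")


print(prob(first_red, DRAWS))                   # 1/2
print(cond_prob(second_red, first_red, DRAWS))  # 1/3
```

Representing events as predicates over an explicit list of outcomes keeps the code close to the set-based definitions, at the cost of only being practical for small sample spaces.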
##### 1.5 Reference

This section is based on http://en.wikipedia.org/wiki/Probability.

#### 2 Mutually exclusive events

Two events are mutually exclusive if they cannot occur at the same time. An example is tossing a coin once, which can result in either heads or tails, but not both.

In the coin-tossing example, both outcomes are collectively exhaustive, which means that at least one of the outcomes must happen, so these two possibilities together exhaust all the possibilities. However, not all mutually exclusive events are collectively exhaustive. For example, the outcomes 1 and 4 of a single roll of a six-sided die are mutually exclusive (cannot both happen) but not collectively exhaustive (there are other possible outcomes: 2, 3, 5, 6).

##### 2.1 Logic

In logic, two mutually exclusive propositions are propositions that logically cannot be true at the same time. Another term for mutually exclusive is "disjoint". To say that more than two propositions are mutually exclusive, depending on context, means that one cannot be true if the other one is true, or at least one of them cannot be true. The term pairwise mutually exclusive always means that no two of them can be true simultaneously.

##### 2.2 Probability

In probability theory, events $E_1, E_2, \ldots, E_n$ are said to be mutually exclusive if the occurrence of any one of them implies the non-occurrence of the remaining $n - 1$ events. Therefore, two mutually exclusive events cannot both occur. Formally, the intersection of any two of them is empty (the null event): $A \cap B = \emptyset$. In consequence, mutually exclusive events have the property $P(A \cap B) = 0$.

For example, one cannot draw a card that is both red and a club because clubs are always black. If one draws just one card from the deck, either a red card (heart or diamond) or a black card (club or spade) can be drawn. When A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
One might ask, "What is the probability of drawing a red card or a club?" This problem would be solved by adding together the probability of drawing a red card and the probability of drawing a club. In a standard 52-card deck, there are twenty-six red cards and thirteen clubs: $\frac{26}{52} + \frac{13}{52} = \frac{39}{52}$, or $\frac{3}{4}$.

One would have to draw at least two cards in order to draw both a red card and a club. The probability of doing so in two draws would depend on whether the first card drawn were replaced before the second drawing, since without replacement there would be one fewer card after the first card was drawn. The probabilities of the individual events (red, and club) would be multiplied rather than added. The probability of drawing a red card and then a club, in that order, in two drawings without replacement would be $\frac{26}{52} \times \frac{13}{51} = \frac{338}{2652}$, or $\frac{13}{102}$. With replacement, the probability would be $\frac{26}{52} \times \frac{13}{52} = \frac{338}{2704}$, or $\frac{13}{104} = \frac{1}{8}$.

In probability theory the word "or" allows for the possibility of both events happening. The probability of one or both events occurring is denoted $P(A \cup B)$ and in general it equals $P(A) + P(B) - P(A \cap B)$. Therefore, if one asks, "What is the probability of drawing a red card or a king?", drawing any of a red king, a red non-king, or a black king is considered a success. In a standard 52-card deck, there are twenty-six red cards and four kings, two of which are red, so the probability of drawing a red card or a king is $\frac{26}{52} + \frac{4}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$. However, with mutually exclusive events the last term in the formula, $-P(A \cap B)$, is zero, so the formula simplifies to $P(A \cup B) = P(A) + P(B)$.

Events are collectively exhaustive if all the possibilities for outcomes are exhausted by those possible events, so at least one of those outcomes must occur. The probability that at least one of the events will occur is equal to 1. For example, there are theoretically only two possibilities for flipping a coin. Flipping a head and flipping a tail are collectively exhaustive events, and there is a probability of 1 of flipping either a head or a tail.

Events can be both mutually exclusive and collectively exhaustive. In the case of flipping a coin, flipping a head and flipping a tail are also mutually exclusive events. Both outcomes cannot occur in a single trial (i.e., when a coin is flipped only once). The probability of flipping a head and the probability of flipping a tail can be added to yield a probability of 1: $\frac{1}{2} + \frac{1}{2} = 1$.

##### 2.3 Reference

This section is based on http://en.wikipedia.org/wiki/Mutually_exclusive_events.
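The card calculations in Section 2.2 can be reproduced by enumerating a deck, as in the sketch below. It assumes every card (and every ordered pair of cards) is equally likely. The names `prob`, `is_red`, `is_club`, `is_king` and `red_then_club` are illustrative, and the two-draw event is the ordered one: a red card first, then a club.

```python
from fractions import Fraction
from itertools import permutations, product

SUIT_COLOR = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
DECK = [(rank, suit) for rank in RANKS for suit in SUIT_COLOR]


def prob(event, space):
    """P(event) over a list of equally likely outcomes; `event` is a predicate."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def is_red(card):
    return SUIT_COLOR[card[1]] == "red"


def is_club(card):
    return card[1] == "clubs"


def is_king(card):
    return card[0] == "K"


# Red and club are mutually exclusive, so the probabilities simply add.
assert prob(lambda c: is_red(c) and is_club(c), DECK) == 0
p_red_or_club = prob(lambda c: is_red(c) or is_club(c), DECK)
assert p_red_or_club == prob(is_red, DECK) + prob(is_club, DECK)
print(p_red_or_club)  # 3/4

# Red and king overlap (two red kings), so the overlap is subtracted once.
p_red_or_king = prob(lambda c: is_red(c) or is_king(c), DECK)
assert p_red_or_king == (
    prob(is_red, DECK)
    + prob(is_king, DECK)
    - prob(lambda c: is_red(c) and is_king(c), DECK)
)
print(p_red_or_king)  # 7/13, i.e. 28/52


def red_then_club(draw):
    """Ordered two-card event: first card red, second card a club."""
    return is_red(draw[0]) and is_club(draw[1])


without_replacement = list(permutations(DECK, 2))  # 52 * 51 ordered pairs
with_replacement = list(product(DECK, repeat=2))   # 52 * 52 ordered pairs

print(prob(red_then_club, without_replacement))  # 13/102
print(prob(red_then_club, with_replacement))     # 1/8, i.e. 13/104
```

Note that `Fraction` always prints values in lowest terms, which is why 28/52 appears as 7/13 and 13/104 as 1/8.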
#### 3 Independence

In probability theory, to say that two events are independent (alternatively called statistically independent or stochastically independent) means that the occurrence of one does not affect the probability of the other. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.

The concept of independence extends to dealing with collections of more than two events or random variables.
##### 3.1 Definition for two events

Two events A and B are independent if and only if their joint probability equals the product of their probabilities:

$$P(A \cap B) = P(A)P(B).$$

Why this defines independence is made clear by rewriting with conditional probabilities:

$$P(A \cap B) = P(A)P(B) \iff P(A) = \frac{P(A \cap B)}{P(B)} \iff P(A) = P(A \mid B)$$

and similarly

$$P(A \cap B) = P(A)P(B) \iff P(B) = P(B \mid A).$$

Thus, the occurrence of B does not affect the probability of A, and vice versa. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if P(A) or P(B) is 0. Furthermore, the preferred definition makes clear by symmetry that when A is independent of B, B is also independent of A.

##### 3.2 Definition for more than two events

A finite set of events $\{A_i\}$ is pairwise independent if and only if every pair of events is independent. That is, if and only if for all distinct pairs of indices m, n

$$P(A_m \cap A_n) = P(A_m)P(A_n).$$

A finite set of events is mutually independent if and only if every event is independent of any intersection of the other events. That is, if and only if for every subset $\{A_1, \ldots, A_n\}$ of the events

$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i).$$

This is called the multiplication rule for independent events.

For more than two events, a mutually independent set of events is (by definition) pairwise independent, but the converse is not necessarily true.

##### 3.3 Conditional independence

Intuitively, two random variables X and Y are conditionally independent given Z if, once Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z (unless the errors in the two measurements are somehow connected).

The formal definition of conditional independence is based on the idea of conditional distributions. If X, Y, and Z are discrete random variables, then we define X and Y to be conditionally independent given Z if

$$P(X \le x, Y \le y \mid Z = z) = P(X \le x \mid Z = z) \cdot P(Y \le y \mid Z = z)$$

for all x, y and z such that $P(Z = z) > 0$. On the other hand, if the random variables are continuous and have a joint probability density function p, then X and Y are conditionally independent given Z if

$$p_{XY \mid Z}(x, y \mid z) = p_{X \mid Z}(x \mid z) \cdot p_{Y \mid Z}(y \mid z)$$

for all real numbers x, y and z such that $p_Z(z) > 0$.

If X and Y are conditionally independent given Z, then

$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$$

for any x, y and z with $P(Y = y, Z = z) > 0$. That is, the conditional distribution for X given Y and Z is the same as that given Z alone. A similar equation holds for the conditional probability density functions in the continuous case.

Independence can be seen as a special kind of conditional independence, since probability can be seen as a kind of conditional probability given no events.

##### 3.4 Reference

This section is based on http://en.wikipedia.org/wiki/Independence_(probability_theory).
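The remark in Section 3.2 that pairwise independence does not imply mutual independence can be illustrated with two fair coin flips. The events chosen here (A: first flip is heads, B: second flip is heads, C: the two flips differ) are a standard textbook example and do not come from the original text. The function names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Four equally likely outcomes of two fair coin flips.
SPACE = set(product("HT", repeat=2))

A = {o for o in SPACE if o[0] == "H"}   # first flip is heads
B = {o for o in SPACE if o[1] == "H"}   # second flip is heads
C = {o for o in SPACE if o[0] != o[1]}  # the two flips differ


def prob(event):
    """P(event) for an event given as a subset of SPACE."""
    return Fraction(len(event), len(SPACE))


def satisfies_product_rule(events):
    """Check P(intersection of events) == product of the P(event)."""
    intersection = SPACE.intersection(*events)
    return prob(intersection) == prod(prob(e) for e in events)


def pairwise_independent(events):
    """Every pair of events satisfies the product rule."""
    return all(satisfies_product_rule(pair) for pair in combinations(events, 2))


def mutually_independent(events):
    """Every subset of two or more events satisfies the product rule."""
    return all(
        satisfies_product_rule(subset)
        for size in range(2, len(events) + 1)
        for subset in combinations(events, size)
    )


print(pairwise_independent([A, B, C]))  # True
print(mutually_independent([A, B, C]))  # False
print(prob(A & B & C))                  # 0, whereas P(A)P(B)P(C) = 1/8
```

Each pair has joint probability 1/4, matching the product of the individual probabilities, but all three events cannot occur together: if both flips are heads, they do not differ.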