2. http://publicationslist.org/juniohttp://publicationslist.org/junio
What is it about?
Some systems are too difficult, if not impossible, to model
In these circumstances, a better approach is to match the
system with a known probability model
Instead of modeling how a system works, we model how its
outcomes behave
There are many such models, but a few of them are observed far more
often than the rest and can be treated with moderately advanced
algebraic tools
Binomial distribution (Bernoulli trials)
Bernoulli trials refer to random events with only two possible
outcomes, usually named success and failure
The two outcomes are mutually exclusive and the trials are independent;
the outcomes happen with probabilities p and 1-p
Examples:
Coin: success and failure with probability ½
Die: success for one chosen face has probability 1/6, against 5/6 for failure
A basket with r red and b black balls: success for red is r/(r+b)
Two coins: success for two heads is ¼
The nature of the Bernoulli trial lends itself to a simple model
Binomial distribution (Bernoulli trials)
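For N trials, each succeeding with probability p, the probability of
exactly k successes (in any order) is given by the Binomial distribution:
$P(k; N, p) = \binom{N}{k} p^k (1-p)^{N-k}$
where $\binom{N}{k} = \frac{N!}{k!(N-k)!}$ is the number of distinct
arrangements of k successes and N-k failures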
The Binomial distribution tells two things:
Too many successes is not probable; and
Too few successes is not probable either
That is, if you take a risk in a binary event, the
more you try, the more you fail and also the more
you succeed
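The probability of exactly k successes in N trials can be computed directly. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_pmf` and the ten-coin example are illustrative choices, not part of the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Ten fair coin tosses: probabilities of observing 0..10 heads.
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]

# The most probable count sits in the middle; both extremes are unlikely.
most_likely = max(range(11), key=lambda k: probs[k])
```

Here `probs[0]` and `probs[10]` are both much smaller than `probs[most_likely]`, which is the "neither too many nor too few successes" statement in numbers.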
Binomial distribution (Bernoulli trials)
Example: suppose we try to develop a model to predict the staffing
required for a call center. We know that about one in every thousand
orders will lead to a complaint (hence p = 1/1000) and, as 1 million
orders are shipped every day (N = 1,000,000), we expect about
Np = 1000 complaints a day.
The standard deviation in this example comes out to be
$\sqrt{Np(1-p)} \approx \sqrt{1000} \approx 32$, as 1 − p is very close to 1 for the current
value of p. This deviation, about 3% of the expected value of 1000, is
quite acceptable.
For this simple example, the required staff is determined by
the number of complaints an employee can handle per day, given
Np complaints a day.
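The arithmetic of this example can be sketched as follows. The capacity of 40 complaints per employee per day and the two-standard-deviation safety margin are hypothetical values chosen for illustration; the example itself does not give them.

```python
from math import ceil, sqrt

orders_per_day = 1_000_000   # N
p_complaint = 1 / 1000       # p

expected = orders_per_day * p_complaint                           # Np
std_dev = sqrt(orders_per_day * p_complaint * (1 - p_complaint))  # sqrt(Np(1-p))

# ASSUMPTION: hypothetical capacity, not stated in the example.
complaints_per_employee = 40

# Staff needed for the expected load, and with a margin of two
# standard deviations (the margin is also an illustrative choice).
staff_for_mean = ceil(expected / complaints_per_employee)
staff_with_margin = ceil((expected + 2 * std_dev) / complaints_per_employee)
```

Because the standard deviation is small relative to the mean, the margin adds only a couple of employees on top of the baseline.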
Gaussian distribution
Gaussians are so useful because:
A large share of the events (about 68%) occur in the range [mean - sd, mean + sd], which
simplifies probability expectations: outliers are not expected
Identifying a Gaussian distribution leads to a simpler, though rigorous,
understanding of the phenomenon without putting the system under deep
investigation
For Gaussian distributions, the basic statistical summaries, mean and standard
deviation, are applicable
It is simpler to perform calculations over the Gaussian distribution, especially
integrals; for this reason, it is often used as a kernel
Gaussians are not always useful because:
They predict the absence of outliers, which is not the case in many real situations
There are many phenomena that are not Gaussian, in which case mean and
standard deviation are misleading
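To make the "most events fall near the mean" claim concrete, the fraction of a Gaussian population lying within k standard deviations of the mean is erf(k/√2). A minimal sketch, with the function name `fraction_within` chosen here for illustration:

```python
from math import erf, sqrt

def fraction_within(k: float) -> float:
    """Fraction of a Gaussian population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

# Fractions for one, two, and three standard deviations.
within = {k: fraction_within(k) for k in (1, 2, 3)}

for k, frac in within.items():
    print(f"within {k} sd: {frac:.4f}")
```

The rapid approach to 1 is why a Gaussian model leaves essentially no room for outliers, which is both its convenience and its weakness.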
Power-law distribution
In the plot two facts stand out:
the huge number of people who made a handful of visits (fewer than 5 or 6)
at the other extreme, the huge number of visits that a few people made
This kind of distribution is mostly composed of outliers. Its mean is 26
visits per person, which makes no sense for the observed data; the
standard deviation, 437, makes even less sense, as the range of one
standard deviation around the mean includes negative numbers of visits
In contrast to Gaussian distributions, whose short tails fall off
quickly, power-law distributions are characterized by “heavy (fat, long)
tails”
Such distributions can be identified on a log-log plot, where they form a
straight line whose slope is the power (exponent) of the distribution function
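The log-log check can be sketched with an ordinary least-squares fit on the logarithms of both axes. The data below are synthetic, generated from an exact x^(-2) law, and are not the visit data from the slides; `loglog_slope` is an illustrative helper name.

```python
from math import log

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

# ASSUMPTION: synthetic counts that fall off exactly as x^(-2).
xs = [1, 2, 4, 8, 16, 32, 64]
ys = [1000 * x ** -2 for x in xs]

slope = loglog_slope(xs, ys)  # recovers the exponent of the power law
```

On real data the points only approximately follow a line, and a simple fit like this one is a rough diagnostic rather than a rigorous estimate of the exponent.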
Power-law distribution
Well-known power-law distributions:
the frequency with which words are used in texts
the magnitude of earthquakes
the size of files
the copies of books sold
the intensity of wars
the sizes of sand particles and solar flares
the population of cities
and the distribution of wealth
Challenges imposed by the distribution:
Observations span a wide range of values, often many orders of magnitude
There is no typical scale or value that could be used for summarization
The distribution is extremely skewed, with many data points at the low end and
few (but not negligibly few) data points at very high values
Expectation values often depend on the sample size and, in contrast to other
distributions, do not settle down as more values are considered
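The dependence of the average on the sample size can be observed in a small simulation. This sketch assumes Pareto-distributed samples with shape 1.0 (drawn with Python's `random.paretovariate`) as a stand-in for heavy-tailed data; the shape, the seed, and the sample sizes are illustrative choices.

```python
import random
from statistics import median

rng = random.Random(0)  # fixed seed so the run is repeatable

# ASSUMPTION: Pareto samples with shape 1.0 stand in for heavy-tailed data.
samples = [rng.paretovariate(1.0) for _ in range(100_000)]

sizes = [100, 1_000, 10_000, 100_000]
means = [sum(samples[:n]) / n for n in sizes]
medians = [median(samples[:n]) for n in sizes]

# Compare how the mean and the median behave as the sample grows.
for n, m, md in zip(sizes, means, medians):
    print(f"n={n:>7}  mean={m:10.2f}  median={md:6.2f}")
```

The median stays close to a fixed value while the mean is dominated by the few largest samples, which is one reason robust summaries are preferred for this kind of data.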
Power-law distribution
How to work with power-law distributions?
Do not use classical methods, especially mean and standard deviation
Segment the data into three parts:
The majority of data points at small values
The set of points in the tail
The intermediate points
For each segment, try to use classical methods
Use knowledge of the problem domain to explain the behavior of each segment
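The segmentation step might look like the sketch below. The visit counts are made up for illustration, and the cut points (6 and 100) are hypothetical; in practice they would come from inspecting the data and from the problem domain.

```python
from statistics import mean, stdev

def segment(values, low_cut, high_cut):
    """Split values into head (< low_cut), middle, and tail (>= high_cut)."""
    head = [v for v in values if v < low_cut]
    middle = [v for v in values if low_cut <= v < high_cut]
    tail = [v for v in values if v >= high_cut]
    return head, middle, tail

# ASSUMPTION: made-up visit counts, for illustration only.
visits = [1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 20, 35, 400, 2500]

head, middle, tail = segment(visits, low_cut=6, high_cut=100)

# Classical summaries (count, mean, standard deviation) per segment.
summary = {
    name: (len(part), mean(part), stdev(part))
    for name, part in (("head", head), ("middle", middle), ("tail", tail))
}
```

Within each segment the values span a much narrower range, so mean and standard deviation become meaningful again in a way they are not for the data as a whole.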