The document describes probability distributions that can be built up from Bernoulli random variables. It shows how the binomial distribution emerges from summing independent Bernoulli random variables, and how the Poisson, normal, chi-squared, exponential, gamma, and inverse gamma distributions arise from the binomial, either as approximations when the number of Bernoulli trials grows large or as distributions derived from those approximations. R code examples simulate sampling from these distributions and compare the simulated results with the theoretical probability mass or density functions.
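The first step described here, a binomial count obtained by summing Bernoulli trials and then checked against theory, can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code from the document itself; the function names, the parameters n = 10 and p = 0.3, and the number of draws are all assumptions chosen for the example.

```python
import math
import random


def bernoulli(p, rng):
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def binomial_sample(n, p, rng):
    """Draw one binomial(n, p) value as the sum of n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))


def binomial_pmf(k, n, p):
    """Theoretical probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def empirical_pmf(n, p, draws, seed=0):
    """Estimate the distribution of the Bernoulli sum from repeated simulation."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = [0] * (n + 1)
    for _ in range(draws):
        counts[binomial_sample(n, p, rng)] += 1
    return [c / draws for c in counts]


if __name__ == "__main__":
    n, p = 10, 0.3  # illustrative values, not taken from the document
    simulated = empirical_pmf(n, p, draws=20000)
    for k in range(n + 1):
        print(f"k={k:2d}  simulated={simulated[k]:.4f}  theoretical={binomial_pmf(k, n, p):.4f}")
```

With enough draws the simulated frequencies should sit close to the theoretical probabilities, which is the kind of comparison the summary attributes to the document.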
The document discusses distances between data and similarity measures in data analysis. It introduces the distance between data as a quantitative measure of how different two data points are, with smaller distances indicating greater similarity. Distances are useful for tasks like clustering data, detecting anomalies, data recognition, and measuring approximation errors. The most common distance measure, Euclidean distance, is explained for vectors of any dimension using the concept of a norm from geometry. Caution is advised when calculating distances between data whose components are measured on different scales, because the component with the largest scale dominates the result.
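The Euclidean distance and the caution about differing scales can be illustrated with a short sketch. It assumes Python with only the standard library; the function names and the sample rows (a height-like column next to an income-like column) are hypothetical and do not come from the document. Standardizing each column to mean 0 and standard deviation 1 is one common remedy, used here as an assumption rather than as the document's own recommendation.

```python
import math


def euclidean_distance(x, y):
    """Norm of the difference x - y, for vectors of any (equal) dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(columns, means)
    ]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))  # prints 5.0

    # Hypothetical rows: [height-like value, income-like value]
    rows = [[170.0, 3_000_000.0], [180.0, 3_010_000.0], [171.0, 5_000_000.0]]
    scaled = standardize(rows)
    print("raw distance, rows 0 and 1:   ", euclidean_distance(rows[0], rows[1]))
    print("scaled distance, rows 0 and 1:", euclidean_distance(scaled[0], scaled[1]))
```

On the raw rows the distance is driven almost entirely by the large-valued second column; after standardization both columns contribute on a comparable footing.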