This document provides an overview of key concepts in probability and probability distributions. It introduces random variables and their probability distributions, and covers discrete and continuous random variables. Specific probability distributions discussed include the binomial, Poisson, and normal distributions. Expected value and variance are defined as measures of the central tendency and variability of random variables. Examples are provided to illustrate calculating probabilities and parameters for different probability distributions.
2. • Introduction to Random Variables (Ch 10.1)
• Discrete (Ch 10.2)
• Continuous
• Expected Value and Variance (Ch 10.3)
• Binomial (Ch 10.5)
• Poisson (Ch 10.6)
• Normal (Ch 11.2)
• Exponential Distributions (Ch 11.4)
3. • A random variable is a numerical description of the outcome of an
experiment.
• The probability distribution for a random variable describes how
probabilities are distributed over the values of the random variable.
• It can be given as a table, graph, or formula.
The required conditions for a discrete probability
function f(x) are:
f(x) ≥ 0
∑ f(x) = 1
4. Discrete random variable
Experiment                       Random variable (x)                    Possible values for random variable
Contact five customers           No. of customers who place an order    0, 1, 2, 3, 4, 5
Inspect shipment of 50 radios    No. of defective radios                0, 1, 2, 3, ..., 49, 50
Operate restaurant for one day   No. of customers                       0, 1, 2, 3, ...
Sell an automobile               Gender of customer                     0 for male, 1 for female
5. Probability distribution for the number obtained on a roll of a die
Number obtained (x)   Probability of x = f(x)
1 1/6 = 0.1667
2 1/6 = 0.1667
3 1/6 = 0.1667
4 1/6 = 0.1667
5 1/6 = 0.1667
6 1/6 = 0.1667
∑ f(x) = 1
6. Continuous random variable
Experiment                                   Random variable (x)                                     Possible values for random variable
Operate a bank                               Time between customer arrivals in minutes               x ≥ 0
Fill a soft drink can (max = 12.1 ounces)    No. of ounces                                           0 ≤ x ≤ 12.1
Construct a new library                      % of project complete after 6 months                    0 ≤ x ≤ 100
Test a new chemical process                  Temp observed when reaction takes place (150-212 F)     150 ≤ x ≤ 212
7. Discrete Probability Distributions
A tabular representation of the probability distribution for automobile
sales was developed.
Units sold   Number of days
0            80
1            50
2            40
3            10
4            20
Total        200
x       f(x)
0       .40
1       .25
2       .20
3       .05
4       .10
Total   1.00
8. Discrete Probability Distributions
[Figure: bar chart of probability (vertical axis, .10 to .50) against values of random variable x (TV sales), x = 0 to 4]
9. Problem 7 (Pg 326)
• The probability distribution for the random variable x follows.
x    f
20   100
25   75
30   125
35   200
a) Is this probability distribution valid? Explain.
b) What is the probability that x = 30?
c) What is the probability that x ≤ 25?
d) What is the probability that x > 30?
10. Solution
x    f     f(x)
20   100   0.20
25   75    0.15
30   125   0.25
35   200   0.40
a. f(x) ≥ 0 for all values of x, and ∑ f(x) = 1. Therefore, it is a proper probability distribution.
b. Probability x = 30 is f(30) = .25
c. Probability x ≤ 25 is f(20) + f(25) = .20 + .15 = .35
d. Probability x > 30 is f(35) = .40
Practice Q 8-14 pg 326-328
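The Problem 7 solution can be checked with a short sketch. It assumes the f column holds observed frequencies, so each f(x) is the frequency divided by the total; the variable names are illustrative only.

```python
# Frequencies from the Problem 7 table: value of x -> observed count.
freq = {20: 100, 25: 75, 30: 125, 35: 200}

total = sum(freq.values())
# Relative frequencies serve as the probability function f(x).
f = {x: count / total for x, count in freq.items()}

# Validity check: every f(x) >= 0 and the probabilities sum to 1.
is_valid = all(p >= 0 for p in f.values()) and abs(sum(f.values()) - 1.0) < 1e-9

p_eq_30 = f[30]                                     # part b
p_le_25 = sum(p for x, p in f.items() if x <= 25)   # part c
p_gt_30 = sum(p for x, p in f.items() if x > 30)    # part d

print(is_valid, round(p_eq_30, 2), round(p_le_25, 2), round(p_gt_30, 2))
```

The three probabilities should agree with the slide's answers of .25, .35, and .40.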
11. Expected value and variance
• The expected value is a weighted
average of the values the random
variable may assume. The weights are
the probabilities.
• The expected value does not have to be
a value the random variable can assume.
• The expected value, or mean, of a
random variable is a measure of its
central location.
• E(x) = μ = ∑ x f(x)
• The variance is a weighted average of
the squared deviations of a random
variable from its mean. The weights
are the probabilities.
• The variance summarizes the
variability in the values of a random
variable.
• Var(x) = σ² = ∑ (x − μ)² f(x)
• σ = √Var(x)
12. No. of cakes sold in a bakery (μ = 1.5)
x   Frequency   f(x)   E(x) = x·f(x)   (x − μ)²   σ² = (x − μ)² f(x)
0   54
1   117
2   72
3   42
4   12
5   3
∑f(x)
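The blank columns of the bakery table can be filled in programmatically. The following is a minimal sketch, assuming f(x) is the relative frequency of each sales count; it applies E(x) = ∑ x f(x) and Var(x) = ∑ (x − μ)² f(x) directly.

```python
import math

# Frequency of each "cakes sold" value, taken from the bakery table.
freq = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}

n_obs = sum(freq.values())
f = {x: count / n_obs for x, count in freq.items()}  # relative frequencies

# Expected value: probability-weighted average of x.
mean = sum(x * p for x, p in f.items())
# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in f.items())
std_dev = math.sqrt(variance)

print(f"mean={mean:.2f} variance={variance:.2f} sd={std_dev:.4f}")
```

The mean comes out at 1.5, in line with the value quoted on the slide.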
18. Binomial Distribution
• Applied to single-variable discrete data where the result is the number of
“successful outcomes” in a given scenario.
• Trials are statistically independent: the outcome of one does not affect another.
• Each outcome is known to be either a success or a failure.
• p + q = 1, where p is the probability of success and q the probability of failure
• e.g.:
• no. of times the lights are red in 20 sets of traffic lights,
• no. of students with green eyes in a class of 40,
• no. of plants with diseased leaves from a sample of 50 plants
19. Properties of a Binomial Experiment
• The experiment consists of a sequence of n identical trials.
• Two outcomes, success and failure, are possible on each trial.
• The probability of a success, denoted by p, does not change from trial to
trial.
• The trials are independent
20. Binomial Probability Distribution
• We let x denote the number of successes occurring in the n trials.
$$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$$
where:
x = the number of successes
p = the probability of a success on one trial
q = 1 − p = the probability of a failure on one trial
n = the number of trials
f(x) = the probability of x successes in n trials
n! = n(n – 1)(n – 2) ….. (2)(1)
Note: ${}^{n}C_{0} = 1$; ${}^{n}C_{1} = n$; ${}^{n}C_{n} = 1$, e.g. ${}^{9}C_{9} = 1$; $p^{0} = 1$
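As a rough illustration of the binomial probability function, the sketch below uses Python's `math.comb` for the combination count. The function name `binomial_pmf` and the values n = 5, p = 0.3 are assumptions chosen for the example, not taken from the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure on one trial
    return comb(n, x) * p**x * q ** (n - x)


# Illustrative values (assumed): 5 trials, success probability 0.3.
n, p = 5, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# A valid probability function sums to 1 over x = 0..n.
total = sum(probs)
print([round(pr, 4) for pr in probs], round(total, 4))
```

Summing over all possible x is a quick sanity check that the function satisfies ∑ f(x) = 1.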
24. Sample problem - Binomial
• Evans Electronics is concerned about a low retention
rate for its employees. In recent years, management has
seen a turnover of 10% of the hourly employees
annually. Thus, for any hourly employee chosen at
random, management estimates a probability of 0.1 that
the person will not be with the company next year.
Choosing 3 hourly employees at random, what is the
probability that 1 of them will leave the company this
year?
Let: p = .10, n = 3, x = 1
25. Solution
Let: p = .10, n = 3, x = 1
$$f(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
$$f(1) = \frac{3!}{1!\,(3-1)!}\,(0.1)^{1}(0.9)^{2} = 3(.1)(.81) = .243$$
27. Experimental Outcome   Probability of Experimental Outcome
(S, F, F)                  p(1 – p)(1 – p) = (.1)(.9)(.9) = .081
(F, S, F)                  (1 – p)p(1 – p) = (.9)(.1)(.9) = .081
(F, F, S)                  (1 – p)(1 – p)p = (.9)(.9)(.1) = .081
                           Total = .243
With a .10 probability of an employee leaving on any
one trial, the probability of an employee leaving on
the first trial and not on the second and third trials is
given by
(.10)(.90)(.90) = (.10)(.90)² = .081
28. Binomial Probability Distribution – tree diagram
1st Worker    2nd Worker    3rd Worker   x   Prob.
Leaves (.1)   Leaves (.1)   L (.1)       3   .0010
Leaves (.1)   Leaves (.1)   S (.9)       2   .0090
Leaves (.1)   Stays (.9)    L (.1)       2   .0090
Leaves (.1)   Stays (.9)    S (.9)       1   .0810
Stays (.9)    Leaves (.1)   L (.1)       2   .0090
Stays (.9)    Leaves (.1)   S (.9)       1   .0810
Stays (.9)    Stays (.9)    L (.1)       1   .0810
Stays (.9)    Stays (.9)    S (.9)       0   .7290
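The tree diagram can be reproduced by enumerating every leave/stay path for the three workers. This is a sketch that assumes independent workers with a 0.1 probability of leaving, as in the Evans Electronics example; the labels "L" and "S" follow the tree's abbreviations.

```python
from collections import defaultdict
from itertools import product

p_leave = 0.1
branch = {"L": p_leave, "S": 1 - p_leave}  # L = leaves, S = stays

dist = defaultdict(float)  # x (number who leave) -> total probability
for path in product("LS", repeat=3):
    prob = 1.0
    for step in path:
        prob *= branch[step]  # multiply along the branches of the tree
    x = path.count("L")
    dist[x] += prob
    print("".join(path), x, f"{prob:.4f}")

print({x: round(pr, 4) for x, pr in sorted(dist.items())})
```

Adding the probabilities of the three paths with exactly one "L" gives the same .243 obtained from the binomial formula.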
30. Practice Problem 31 pg 350
• Consider a binomial experiment with two trials and
p=0.4
a) Draw a tree diagram for this experiment
b) Compute the probability of one success
c) Compute f(0)
d) Compute f(2)
e) Compute probability of at least one success
f) Compute Expected value, Variance, SD
31. Practice Problem 31 Solution
$$f(1) = \binom{2}{1}(.4)^{1}(.6)^{1} = \frac{2!}{1!\,1!}(.4)(.6) = .48$$
$$f(0) = \binom{2}{0}(.4)^{0}(.6)^{2} = \frac{2!}{0!\,2!}(1)(.36) = .36$$
$$f(2) = \binom{2}{2}(.4)^{2}(.6)^{0} = \frac{2!}{2!\,0!}(.16)(1) = .16$$
P(x ≥ 1) = f(1) + f(2) = .48 + .16 = .64
Or
P(x ≥ 1) = 1 − f(0) = 1 − 0.36 = 0.64
E(x) = np = 2(.4) = .8
Var(x) = np(1 − p) = 2(.4)(.6) = .48
σ = √.48 = 0.6928
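The shortcut formulas E(x) = np and Var(x) = np(1 − p) can be compared against the general definitions of expected value and variance. A minimal sketch for the n = 2, p = 0.4 experiment, with illustrative variable names:

```python
from math import comb, sqrt

n, p = 2, 0.4

# Binomial probabilities f(0), f(1), f(2).
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# General definitions: weighted averages over the distribution.
mean_def = sum(x * fx for x, fx in pmf.items())
var_def = sum((x - mean_def) ** 2 * fx for x, fx in pmf.items())

# Binomial shortcut formulas.
mean_short = n * p
var_short = n * p * (1 - p)

p_at_least_one = 1 - pmf[0]
print(round(mean_def, 4), round(var_def, 4), round(sqrt(var_def), 4))
print(round(mean_short, 4), round(var_short, 4), round(p_at_least_one, 2))
```

Both routes should give a mean of .8 and a variance of .48, matching the solution above.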
32. Problem 34 pg 351
• A Harris Interactive survey for InterContinental Hotels and Resorts asked
respondents, “When travelling internationally, do you generally venture
out on your own to experience culture, or stick with your tour group and
itineraries?” The survey found that 23% of the respondents stick with
their tour group.
a) In a sample of six international travelers, what is the probability that
two will stick with their tour group?
b) If n = 6, find the probability that at least two will stick with the tour group.
c) If n = 10, find the probability that none will stick with the group.
35. Poisson
• A Poisson distributed random variable is often useful in
estimating the number of occurrences over a specified
interval of time or space.
• It is a discrete random variable that may assume an
infinite sequence of values (x = 0, 1, 2, . . . ).
• E.g.: the number of vehicles arriving at a toll booth in
one hour
• E.g.: the arrival of phone calls.
36. • Properties of a Poisson Experiment
• The probability of an occurrence is the same for any two
intervals of equal length.
• The occurrence or nonoccurrence in any interval is independent
of the occurrence or nonoccurrence in any other interval
• A property of the Poisson distribution is that the mean and
variance are equal.
• Since there is no stated upper limit for the number of
occurrences, the probability function f(x) is applicable for
values x = 0, 1, 2, … without limit.
• In practical applications, x will eventually become large
enough so that f(x) is approximately zero and the probability
of any larger values of x becomes negligible.
37. Poisson Probability Function
$$f(x) = \frac{\mu^{x} e^{-\mu}}{x!}$$
where:
x = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
μ = mean number of occurrences in an interval
e = 2.71828
x! = x(x – 1)(x – 2) . . . (2)(1)
A property of the Poisson distribution is that
the mean and variance are equal.
μ = σ²
Note: e is Euler's number, a non-terminating decimal. It arises as the
compounding factor when interest is compounded over periods shorter than a year.
38. Sample problem – Poisson
• Patients arrive at the emergency room of Mercy Hospital at the average
rate of 6 per hour on weekend evenings. What is the probability of 4
arrivals in 30 minutes on a weekend evening?
40. Sample problem – Poisson (solution)
μ = 6/hour = 3/half-hour, x = 4
$$f(4) = \frac{3^{4}(2.71828)^{-3}}{4!} = .1680$$
Variance for Number of Arrivals during 30-Minute Periods
μ = σ² = 3
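The Mercy Hospital calculation can be sketched in code. The key step is rescaling the hourly rate to the 30-minute interval before applying the Poisson probability function; `poisson_pmf` is a hypothetical helper name used only for this example.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences when the interval mean is mu."""
    return mu**x * exp(-mu) / factorial(x)


rate_per_hour = 6      # average arrivals per hour
interval_hours = 0.5   # the question asks about a 30-minute window
mu = rate_per_hour * interval_hours  # mean for the interval: 3

p_four = poisson_pmf(4, mu)
print(f"{p_four:.4f}")
```

Because the mean and variance of a Poisson distribution are equal, the variance for the 30-minute interval is also `mu`.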
41. Problem 44 – pg 355
• Consider a Poisson distribution with μ = 3.
• Write the appropriate Poisson probability function
• Compute f(2)
• Compute f(1)
• Compute P(x ≥ 2)
43. Poisson – problem 46 pg 355
• Phone calls arrive at the rate of 48 per hour at the reservation desk for
regional airways
a. Compute the probability of receiving three calls in a 5 minute interval
b. Compute the probability of receiving exactly 10 calls in 15 minutes
c. Suppose no calls are currently on hold. If the agent takes 5 minutes to
complete the current call, how many callers do you expect to be
waiting by that time? What is the probability that none will be waiting?
d. If no calls are currently being processed, what is the probability that
the agent can take 3 minutes for personal time without being
interrupted by a call?
45. Hypergeometric Probability Distribution
• The hypergeometric distribution is closely related to
the binomial distribution.
• However, for the hypergeometric distribution:
• the trials are not independent
• the probability of success changes from trial to trial.
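To make the contrast with the binomial concrete, the sketch below uses the standard hypergeometric probability function, f(x) = C(r, x) C(N − r, n − x) / C(N, n), which the slide itself does not state. The population of N = 10 items with r = 4 successes and a sample of n = 3 are assumed values chosen only for illustration.

```python
from math import comb


def hypergeometric_pmf(x: int, n: int, r: int, N: int) -> float:
    """P(x successes) when drawing n items without replacement
    from a population of N items that contains r successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x successes) in n independent trials with constant p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Assumed illustration: 10 items, 4 of them successes, draw 3.
N, r, n = 10, 4, 3
without_replacement = hypergeometric_pmf(1, n, r, N)
with_replacement = binomial_pmf(1, n, r / N)

print(round(without_replacement, 4), round(with_replacement, 4))
```

The two results differ because, without replacement, each draw changes the proportion of successes left in the population.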
47. Continuous Probability Distributions
• Uniform Probability Distribution
• Normal Probability Distribution
• Exponential Probability Distribution
• Normal Approximation of Binomial Probabilities
[Figure: three panels plotting f(x) against x, titled Uniform, Normal, and Exponential]
48. Normal Distribution
• Applied to single variable continuous data e.g.
• heights of plants, weights of lambs, lengths of time
• Used to calculate the probability of occurrences less than, more than,
between given values e.g.
• “the probability that the plants will be less than 70mm”,
• “the probability that the lambs will be heavier than 70kg”,
• “the probability that the time taken will be between 10 and 12 minutes”
49. Normal distribution illustrating compatibility
of z values and standard deviation
• Approximately 68% of all values in a normally
distributed population lie within ±1 SD
of the mean
• Approximately 95% of all values in a normally
distributed population lie within ±2 SD
of the mean
• Approximately 99.7% of all values in a normally
distributed population lie within ±3 SD
of the mean
50. Area under normal curve and z values
[Figure: normal curve with the area P(X ≤ 40) shaded; x axis from −25 to 125, aligned with a z axis from −3 to 3]
Example: P(X ≤ 40), with x = 40, mean μ = 50, SD σ = 25
z = (x − μ) / σ
z = number of standard deviations from x to the mean of this distribution
x = value of the random variable which is being tested
μ = mean of the distribution of the random variable
σ = SD of the given distribution
https://www.youtube.com/watch?v=mtbJbDwqWLE
z = (40 − 50) / 25 = −0.4
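The z conversion for this example can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), used here in place of a printed normal table. The sketch computes P(X ≤ 40) both directly and through the z value.

```python
from statistics import NormalDist

mu, sigma, x = 50, 25, 40

# Number of standard deviations between x and the mean.
z = (x - mu) / sigma

# Area to the left of x under the N(mu, sigma) curve ...
p_direct = NormalDist(mu, sigma).cdf(x)
# ... equals the area to the left of z under the standard normal curve.
p_via_z = NormalDist().cdf(z)

print(z, round(p_direct, 4), round(p_via_z, 4))
```

Both probabilities agree, which is why a single standard normal table is enough for any normal distribution.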
51. Standard Normal Probability Distribution
$$z = \frac{x - \mu}{\sigma}$$
We can think of z as a measure of the number of standard deviations x is from μ.
To find the associated probability, look up the value of z in the normal table.
To compute the x value from a given probability, e.g. 20%:
1. Convert the % to a proportion: 0.20
2. Find the z value corresponding to 0.20 in the normal
table: approximately −0.84
3. Use the formula x = μ + zσ to find x
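The three steps for going from a probability back to an x value can be sketched as follows. `NormalDist().inv_cdf` stands in for the reverse table lookup; the helper name `x_for_lower_tail` and the values μ = 50, σ = 25 are assumptions for illustration.

```python
from statistics import NormalDist


def x_for_lower_tail(prob: float, mu: float, sigma: float) -> float:
    """Return x such that P(X <= x) = prob for a normal(mu, sigma) variable."""
    z = NormalDist().inv_cdf(prob)  # step 2: z value for the given proportion
    return mu + z * sigma           # step 3: x = mu + z * sigma


prob = 20 / 100                     # step 1: convert the % to a proportion
z = NormalDist().inv_cdf(prob)
x = x_for_lower_tail(prob, mu=50, sigma=25)  # assumed mean and SD

print(round(z, 2), round(x, 2))
```

Feeding the resulting x back into the cumulative probability should return the original 0.20, which is a convenient check.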
52. Normal distribution calculation
x = 6: z = 1; the table value for z = 1 in the normal distribution table is 0.8413
P(X ≤ 6) = 0.8413
P(X > 6) = 1 − 0.8413 = 0.1587
x = 2: z = −2/2 = −1; P(X ≤ 2) = 0.1587
P(2 ≤ X ≤ 6) = |0.1587 − 0.8413| = 0.6826
53. Question 1
• Wool fibre breaking strengths are normally
distributed with mean μ = 23.56 newtons and
standard deviation σ = 4.55. What proportion of
fibres would have a breaking strength of 14.45 or
less?
• x = 14.45; μ = 23.56; σ = 4.55
54.
55. • In a new batch, 30.15% of wool fibres are found to break. What could
the breaking strength be, in newtons?
• x = μ + zσ ---------------------------------------------- (1)
• where z is the table value for 30.15%, or 0.3015: z = −0.52
• Substituting in (1):
• x = 23.56 + (−0.52)(4.55) = 21.194 newtons is the breaking strength
in the new batch
56. Q12 Pg 384
• Given that z is a standard normal random variable, compute the
following probabilities
a) P(0 ≤ z ≤ 0.83)
b) P(−1.57 ≤ z ≤ 0)
c) P(z > 0.44)
d) P(z ≥ −0.23)
e) P(z < 1.20)
f) P(z ≤ −0.71)
57. Q 12 solution
a. P(0 ≤ z ≤ .83) = .7967 - .5000 = .2967
b. P(-1.57 ≤ z ≤ 0) = .5000 - .0582 = .4418
c. P(z > .44) = 1 - .6700 = .3300
d. P(z ≥ -.23) = 1 - .4090 = .5910
e. P(z < 1.20) = .8849
f. P(z ≤ -.71) = .2389
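These table lookups can be cross-checked numerically. The sketch below uses the standard normal cumulative function from `statistics.NormalDist` in place of the printed table; the name `phi` is just a local shorthand.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative area to the left of z

answers = {
    "a": phi(0.83) - phi(0),    # area between 0 and 0.83
    "b": phi(0) - phi(-1.57),   # area between -1.57 and 0
    "c": 1 - phi(0.44),         # upper tail beyond 0.44
    "d": 1 - phi(-0.23),        # area to the right of -0.23
    "e": phi(1.20),             # area to the left of 1.20
    "f": phi(-0.71),            # area to the left of -0.71
}

for label, value in answers.items():
    print(label, f"{value:.4f}")
```

Small differences in the last decimal place can appear because table values are rounded to four places before subtracting.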
58. Q19 Pg 385
• In an article about the cost of healthcare, Money magazine reported that a
visit to a hospital emergency room for something as simple as a sore throat
has a mean cost of $328. Assume that the cost for this type of hospital
emergency room visit is normally distributed with a standard deviation of
$92. Answer the following questions about the cost of a hospital emergency
room visit for this medical service.
a. What is the probability that the cost will be more than $500?
b. What is the probability that the cost will be less than $250?
c. What is the probability that the cost will be between $300 and $400?
d. If the cost to the patient is in the lower 8% of charges for this medical service,
what was the cost of this patient's emergency room visit?