Lab 4 The Central Limit Theorem and A Monte Carlo Simulation
Experiment 1. The Central Limit Theorem
The Central Limit Theorem says that the sampling distribution of means, of samples of size n
from a population with a mean of $\mu$ and a standard deviation of $\sigma$, is approximately a normal
distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n}$,
if the sample size satisfies $n \ge 30$.
Please start R, then open a new script file File → New script and save it as Lab4_tutorial by
going to File → Save As and saving it to your M or One Drive.
Note: You will need all the graphs from this tutorial for the Lab Assignment at the end. Please
make sure you save them as you go.
We consider a population that has an exponential distribution with parameter $\lambda = 0.1$
(therefore, $\mu = 10$ and $\sigma = 10$). This distribution is very much skewed to the right. In this
experiment, we will demonstrate the Central Limit Theorem by showing that the sampling
distribution of sample means, of samples from this exponential population, approaches a
normal distribution with mean $\mu_{\bar{X}} = 10$ and standard deviation
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 10 / \sqrt{n}$, as the
sample size n gets sufficiently large.
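As a quick check of the formula above, the following sketch computes the theoretical standard deviation of the sample mean for the three sample sizes used later in this lab. The variable names (rate, mu, sigma, n, sd_xbar) are illustrative choices and are not part of the lab script.

```r
# Population parameters for Exp(rate = 0.1): mean and SD both equal 1 / rate
rate  <- 0.1
mu    <- 1 / rate
sigma <- 1 / rate

# Sample sizes examined in Experiment 1
n <- c(5, 20, 60)

# Theoretical SD of the sample mean: sigma / sqrt(n)
sd_xbar <- sigma / sqrt(n)
round(sd_xbar, 6)
# [1] 4.472136 2.236068 1.290994
```

These are the StDev values that the normal curves in steps 3 to 5 use.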
1. Generate samples from the population with an Exponential Distribution ($\lambda = 0.1$)
Simulate 100 random values from the Exponential Distribution ($\lambda = 0.1$) for each of 60
columns as follows:
#Start by defining a matrix of all zeros and specify the number of rows
#with nrow and number of columns with ncol.
#Label the columns using the dimnames function which takes a list
#list(rownames, columnnames)
samples<-matrix(0,nrow=100,ncol=60,
dimnames=list(NULL, paste("Sample", 1:60, sep=" ")))
#use a for loop to fill each of the columns of the matrix with a random
#sample of 100 values from a Exp(0.1) distn
for (i in 1:60){
samples[,i]<-rexp(100,0.1)
}
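The same kind of matrix can also be built without a loop. This is only a sketch: the name samples_alt and the seed value are assumptions made for illustration, and calling set.seed() is optional (it just makes the random draws repeatable from run to run).

```r
# Optional: fix the random seed so the draws are reproducible (arbitrary value)
set.seed(2024)

# Draw all 100 * 60 values at once; matrix() fills column by column,
# so each column still holds 100 draws from Exp(rate = 0.1)
samples_alt <- matrix(rexp(100 * 60, rate = 0.1), nrow = 100, ncol = 60,
                      dimnames = list(NULL, paste("Sample", 1:60, sep = " ")))

# Confirm the shape: 100 rows and 60 columns
dim(samples_alt)
```

The rest of this tutorial uses the loop-built matrix called samples, so the alternative is kept under a different name.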
2. Explore the population distribution by examining the distribution of a random sample:
We can examine the distribution of a data set by displaying its mean and standard deviation.
The following are the mean and standard deviation of a random sample of size 100 (Sample 1)
from the population with Exponential distribution ($\lambda = 0.1$).
#Subset the first column from the matrix and call it sample1
sample1<-samples[,1]
#Find the mean and std dev of Sample 1 (column 1)
mean(sample1)
[1] 9.452421
sd(sample1)
[1] 9.369801
Notice that the sample mean and standard deviation are 9.452421 and 9.369801, while the
population mean and SD are 10.
To examine the shape of the distribution of a sample, please make a histogram of sample1
(or any one of the 60 samples of size 100). Change the title to “Histogram of Exp(0.1)
Sample 1”, and add a footnote “By Your Name”.
This histogram is very much right skewed. It resembles the exponential population distribution.
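One possible way to produce the histogram described above is sketched here. It assumes sample1 exists from the previous step and regenerates a stand-in if it does not; the xlab text and the object name h are illustrative choices.

```r
# Use sample1 from the step above; create a stand-in only if it is missing
if (!exists("sample1")) sample1 <- rexp(100, rate = 0.1)

# 'main' sets the title and 'sub' adds the footnote under the x axis
h <- hist(sample1,
          main = "Histogram of Exp(0.1) Sample 1",
          sub  = "By Your Name",
          xlab = "sample1")
```

Replace "By Your Name" with your own name before saving the graph.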
3. Examine the sampling distribution of sample means of samples of size 5:
The five numbers in the first row of Sample 1 to Sample 5 form
a sample of size 5 from the exponential distribution ($\lambda = 0.1$).
The five numbers in the second row of Sample 1 to
Sample 5 form another sample of size 5 from this exponential
distribution. Because there
are 100 rows in each column, there are 100 random samples of
size 5; and therefore, there
are 100 corresponding sample means. We calculate the 100
means (to be stored in the
vector mean5) and then generate a histogram and descriptive
statistics as follows:
##First subset the first five columns
samples1to5<-samples[,1:5]
samples1to5
##Calculate the row means by using the rowMeans() function.
##Remember, R is case sensitive and the 'M' in means must be
##capitalized for it to run.
mean5<-rowMeans(samples1to5)
Make a histogram for mean5 with a footnote of “By your name”.
Add a normal curve N($\mu_{\bar{X}}$ = 10, StDev = 4.472136),
where 10 and 4.472136 are the theoretical values of the mean
and standard deviation
of all possible sample means of samples of size 5 from the
exponential distribution with $\lambda = 0.1$.
##The graph editor draws the histogram first (which determines the axes)
##so you may need to play around with the xlim and ylim (the x and y axis
##limits respectively) to see the normal curve fully on your graph.
hist(mean5,sub="By Your Name",xlim=c(-5,30),ylim=c(0,0.1),prob=TRUE)
#the argument to dnorm in curve must always be called 'x' so rename it
x<-mean5
curve(dnorm(x, mean=10, sd=4.472136), add=TRUE)
Produce the mean and standard deviation of mean5.
mean(mean5)
[1] 10.70335
sd(mean5)
[1] 5.145444
We observe that the histogram of mean5 does not fit the normal
curve, but it is much less
skewed than Sample 1 which represents the parent population
distribution.
Compare the observed mean and StDev of these 100 means of
samples of size 5 with their
theoretical values of 10 and 4.472136, respectively. How would
you make the observed mean
and StDev closer to their theoretical values?
4. Examine the sampling distribution of sample means of
samples of size 20:
The 20 numbers in the first row of Sample 1 to Sample 20 form
a sample of size 20 from the exponential distribution ($\lambda = 0.1$).
The 20 numbers in the
second row of Sample 1 to
Sample 20 form another sample of size 20 from this exponential
distribution. There are 100
random samples and therefore 100 sample means. We calculate
these 100 means (to be
stored in the vector mean20) and then produce a histogram and
some descriptive statistics
as follows:
##First subset the first twenty columns
samples1to20<-samples[,1:20]
##Calculate the row means by using the rowMeans() function. Remember,
##R is case sensitive and the 'M' in means must be capitalized for it
##to run.
mean20<-rowMeans(samples1to20)
Make a histogram for mean20 with a footnote of “By your
name”. Add a normal curve
N($\mu_{\bar{X}}$ = 10, StDev = 2.236068), where 10 and 2.236068 are
the theoretical values of the mean and StDev of all sample
means of samples of size 20,
respectively.
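The histogram and normal curve for mean20 can be drawn in the same way as for mean5. The sketch below assumes mean20 already exists and builds a stand-in if it does not; the xlim and ylim values are guesses that you may need to adjust for your own data.

```r
# Use mean20 from the step above; create a stand-in only if it is missing
if (!exists("mean20")) {
  mean20 <- rowMeans(matrix(rexp(100 * 20, rate = 0.1), nrow = 100, ncol = 20))
}

# prob=TRUE puts the histogram on the density scale so the curve is comparable
hist(mean20, sub = "By Your Name", xlim = c(0, 20), ylim = c(0, 0.25), prob = TRUE)

# Overlay the theoretical normal curve N(mean = 10, StDev = 2.236068)
curve(dnorm(x, mean = 10, sd = 2.236068), add = TRUE)
```

For mean60, the same two calls should work with sd = 1.290994 and narrower axis limits.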
Produce the mean and standard deviation of mean20.
mean(mean20)
[1] 10.27018
sd(mean20)
[1] 2.157487
The histogram of mean20 is nearly symmetric, and does fit the
theoretical normal curve well
(Does yours?). Compare the observed mean and StDev of
mean20 with their theoretical values
10 and 2.236068, respectively. How would you make the
observed mean and StDev closer to
their theoretical values?
5. Examine the sampling distribution of sample means of
samples of size 60:
Each row (e.g., row 1) in Sample 1 – Sample 60 forms a sample
of size 60. We calculate the
100 sample means (to be stored in mean60) and then make a
histogram and generate some
descriptive statistics as follows:
##This is the complete matrix that we created called "samples"
##Calculate the row means by using the rowMeans() function.
##Remember, R is case sensitive and the 'M' in means must be
##capitalized for it to run.
mean60<-rowMeans(samples)
Make a histogram for mean60 with a footnote of “By your
name”. Add a normal curve
N($\mu_{\bar{X}}$ = 10, StDev = 1.290994), where 10 and 1.290994 are
the theoretical values of the mean and SD of all sample means
of samples of size 60,
respectively.
Produce the mean and standard deviation of mean60.
mean(mean60)
[1] 10.01476
sd(mean60)
[1] 1.301703
Compare this sampling distribution with the theoretical
distribution of the sample means of
samples of size 60, N(10, 1.290994). How would you make the
observed mean and StDev closer
to their theoretical values?
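To line up the observed and theoretical values for all three sample sizes at once, something like the following sketch could be used. It simulates its own matrix (called sim here, with an arbitrary seed) so that it does not overwrite your samples matrix; the names sizes, observed and comparison are also illustrative.

```r
# Simulate a fresh 100 x 60 matrix of Exp(rate = 0.1) values (arbitrary seed)
set.seed(1)
sim <- matrix(rexp(100 * 60, rate = 0.1), nrow = 100, ncol = 60)

sizes <- c(5, 20, 60)

# For each n, average the first n columns row by row, then summarise the 100 means
observed <- sapply(sizes, function(n) {
  xbar <- rowMeans(sim[, 1:n, drop = FALSE])
  c(mean = mean(xbar), sd = sd(xbar))
})

# Put observed and theoretical values side by side
comparison <- data.frame(
  n                = sizes,
  observed_mean    = observed["mean", ],
  observed_sd      = observed["sd", ],
  theoretical_mean = 10,
  theoretical_sd   = 10 / sqrt(sizes)
)
print(comparison)
```

To summarise your own results instead, replace sim with your samples matrix.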
Experiment 2. A Monte Carlo Simulation
The technique of simulating a process that contains random
elements and repeating the
process over and over to see how it behaves is called a Monte
Carlo simulation. In this
experiment we will use this technique to explore the answer for
the following question:
What is the probability that two or more people in a group of 35
share the same birthday
(day and month only)?
Assume that the same number of people have a birthday
on each of the 365 days in a
year. That is, assume people’s birthdays are uniformly
distributed throughout the year. Simulating
35 people’s birthdays is then equivalent to randomly sampling
35 numbers, with replacement, from the discrete
uniform distribution over the integers 1 to 365.
To explore the answer for the question posted above, we do the
following:
Generate 10 groups of 35 random birthdays as follows:
##Create a matrix of random values from the integers 1 to 365 where each
##number represents a day of the year. We create these random values using
##the 'sample' function with the matrix function that assigns them to 35 rows
##and 10 columns.
birthdays<-matrix(sample(1:365,350,replace=TRUE),nrow=35,ncol=10,
dimnames=list(NULL, paste("Birthday", 1:10, sep=" ")))
Look for birthday matches in Birthday 1 using the built-in table
function as follows:
table(birthdays[,1])
  9  11  28  29  31  48  58  67  68  70  91  95 104 105 110 119 126 128 137 145
  1   1   1   1   1   1   2   1   1   1   1   1   1   1   1   1   2   1   1   1
146 164 177 187 211 221 227 232 233 258 264 299
  1   1   1   1   1   1   1   1   1   1   1   2
The top number in the output is the day of the year and the
number below gives the frequency
of that day in our sample of 35. So days 58, 126 and 299 all
had two individuals with the same
birthday. We can see how many unique birthdays there are in
our first group of 35 individuals
by using the length function with the unique function as
follows:
length(unique(birthdays[,1]))
[1] 32
So there are 32 unique birthdays in our first sample of birthdays
which agrees with what we
observed in our table of values above. Now we could do this
for each of our 10 samples
individually but instead let’s do it all together in a for loop as
follows:
for (i in 1:10){
print(length(unique(birthdays[,i])))
print(table(birthdays[,i]))
}
This will print to the console the number of unique birthdays
and table of individual values for
each of our 10 samples at once.
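As a cross-check on reading the tables by eye, the sketch below flags which of the 10 groups contain at least one shared birthday. It assumes the birthdays matrix from above and builds a stand-in if it is missing; has_match is an illustrative name.

```r
# Use birthdays from the step above; create a stand-in only if it is missing
if (!exists("birthdays")) {
  birthdays <- matrix(sample(1:365, 350, replace = TRUE), nrow = 35, ncol = 10)
}

# anyDuplicated() returns 0 when all values are distinct, otherwise the index
# of the first repeated value, so "> 0" means the group has a shared birthday
has_match <- apply(birthdays, 2, function(days) anyDuplicated(days) > 0)
has_match

# Proportion of the 10 groups with at least one shared birthday
mean(has_match)
```

This only reports whether a match occurred in each group; the table output is still needed to see which days were shared and by how many people.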
Enlarge the R Console window to have a better view of the
frequencies of the different
birthdays. A birthday with a frequency of 2 or higher is a
birthday that is shared by 2 or more
people.
Scroll the R Console window up and down to search for
birthdays with a frequency of 2 or
higher for each trial, and then record the matched birthdays and
the number of people sharing
them in the table below. For example, if the number 37 has a
frequency of 2 in your first
sample, which means 2 people in group #1 share the 37th day of
the year as their birthdays,
then record 2(37) into the space for Trial #1 in the table. Leave
the space blank if there are no
matches in a trial.
Trial # | Number of people (day), e.g. 2(150) means 2 people sharing the 150th day of the year as a birthday.
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
Lab 4 Assignment
Please follow the lab guidelines given in Lab 1. For full marks,
your answer for each problem
must contain relevant R outputs and an answer in plain English.
1. Arrange the histograms of Sample 1, mean5, mean20, and
mean60 from the most skewed
to the least skewed. Do these histograms demonstrate any
relationship between sample
size and the degree of skewness of the sampling distribution of
sample means? If so, what is
the relationship?
2. Fill out the table below based on your Experiment 1 results
(not the results given in the lab
manual, please). Compare the three pairs of observed mean and
SD of sample means, ($\hat{\mu}_{\bar{X}}$, $\hat{\sigma}_{\bar{X}}$),
with their corresponding theoretical values.
Sample Size | Sample Means | Observed mean and SD of sample means, ($\hat{\mu}_{\bar{X}}$, $\hat{\sigma}_{\bar{X}}$) | Theoretical values of the mean and SD of sample means, ($\mu_{\bar{X}}$, $\sigma_{\bar{X}}$)
n = 5 | Mean5 | |
n = 20 | Mean20 | |
n = 60 | Mean60 | |
Which pair of the observed mean and SD is the closest to its
theoretical values?
Do your simulation results agree with the statement “the larger
the sample size, the more
accurate the estimate”?
3. a) Summarize the objective of Experiment 2 in your own
words in one sentence.
b) Submit the frequency counts of Birthday 1.
c) Submit the table showing birthday matches for the 10 trials.
(The table can be created in
the numbers.)
d) Based on the table made in part c), your estimate for the
probability of at least 2 people
in a group of 35 sharing the same birthday is
___________________. (Hint: The relative
frequency of trials with one or more birthday match is an
estimate of the probability.)
e) Compute the exact probability of at least 2 people in a group
of 35 sharing the same
birthday. Note that there is an R function pbirthday(n,…) that
computes this exact
probability. Then compare the estimate that you found in part d)
with this exact
probability. Explain how you could improve your estimate. Be
sure to differentiate
between “number of trials” and “sample size” in your answer.
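The sketch below shows one way to obtain the exact probability and, for comparison, a simulated estimate based on many more trials. The seed, the number of trials, and the variable names are illustrative assumptions; pbirthday() is called with its default arguments (365 classes, 2 coincident).

```r
# Exact probability from the complement rule: 1 - P(all 35 birthdays differ)
p_exact <- 1 - prod((365 - 0:34) / 365)

# Built-in function from the stats package with its default arguments
p_builtin <- pbirthday(35)

# Monte Carlo estimate: many trials, each one a group of 35 birthdays
set.seed(35)      # arbitrary seed, only for repeatability
trials <- 10000   # number of trials (the sample size per trial stays at 35)
hits <- replicate(trials, anyDuplicated(sample(1:365, 35, replace = TRUE)) > 0)
p_hat <- mean(hits)

c(exact = p_exact, pbirthday = p_builtin, simulated = p_hat)
```

The exact value is approximately 0.814. Note that trials controls how many groups are simulated, while 35 is the size of each group.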
Here are 4 discussion posts made by students needing responses.
Responses must be in APA format, 12 pt font, with in-text citations and 1 legitimate,
verifiable source per response, and must be 150+ words,
answered thoroughly. I need at least one done every night by
10 pm, the first one due by tomorrow, Thursday, November 14, 2019
@ 10 pm, and so on until the last one on Sunday @ 10 pm. This is
the discussion about Pacific Palm Oil.
Once you have completed your initial post, read those of your
classmates and give them constructive feedback as to whether
they have clearly defined the decision statement as delineated in
the class materials. Be sure that the feedback is constructive and
that you are actually brainstorming with them to come up with
valid arguments on why the final decision statement (the one
that was provided to you) is the best one for PPO. Your job is to
argue the logic presented in the content and support (facts from
case study or citations) provided in the posts.
In responding to CAPO’s attacks, how will PPO keep their
customers and continue to make money?
#1
Christopher Hughes
Some triggers would be that if they expand, customers will
possibly boycott them, and protests could hurt PPO even
more financially. The opportunity for PPO to push for a decision
would be that they would be able to expand quickly and gain
more capital. He wants to achieve a goal that makes the activists
happy while he would still be able to expand without any problems.
Also, this way he could possibly manage to get that government
certification.
Statement 1: We will have to expand into the virgin rainforest,
because our other plantations will not be able to keep up with
the huge demand that we have. We're going to try to avoid
killing or harming any Orangutans in the process and will try to
work with some of the activist groups to relocate them. Many
still will not be happy with our decision, but at the end of the day
it is a business decision that affects all our customers as well.
Statement 2: We will replant some of our plantations, so there is
no further need for us to expand into the virgin rainforest. This
will hurt us in the short term, but will pay off in the long run,
because that way we can rotate the plantations' growth of palm oil.
This will cut us some slack on the protest groups and avoid
possible boycotts we might run into. The loss of some of our
contracted customers could threaten us, but with the loss of
production it would help reach our demand goals. And within
just a few years we would be able to have three plantations up
and running at full production, which at that time we could
consider expanding with the extra capital we would make. That
extra capital might help us to buy up some of the smaller
plantations, so we would not have to expand into the rainforest.
#2
Ronday Wilson
The major trigger that is pushing Pacific Palm Oil (PPO) to
make a decision is time. They understand that the oldest factory
is coming to the end of its production cycle and will need to
pass through a phase of replanting trees. Moreover, the second
factory is following closely behind it in its need to end its
production cycle. The problem that this causes is that all three
plants contribute to PPO’s level of production and even having
one factory absent will bring hardship to the company.
The goal that the production plantation manager wants the
decision to achieve would be to find a way to somehow keep
their customer contracts whilst addressing their declining
yields. This decision will focus on the future. This decision is
about making a strategic decision. “Making strategic decision is
choosing the right road to run on” (Judd, 2013). The manager
has to choose a direction to go in that will address this issue.
Additionally, PPO needs to address the threat of Consumers
Against Palm Oil (CAPO). If their decision does not address
both the customer contracts and CAPO, then they may have to
address a separate issue for CAPO as this is a present threat.
This present problem refers to managing operations. “Managing
operations is running well on the chosen road” (Judd, 2013).
The protests have begun, so PPO will have to figure out how to
appease them while addressing the future problem too.
Possible decision statement #1: The company will expand into
another country. It will take time to choose another country,
plant the trees and get a yield. Currently all three factories are
producing. This is the perfect time to build a new factory in
another country as the oldest factory will be out of commission
in a few years.
Possible decision statement #2: Palm oil will be imported from
other countries. This would give the plantations time to replant
a new crop, bring the destruction of the current rain forest to a
halt, which would make both the orangutans and CAPO happy.
“But to produce palm oil in large enough quantities to meet
growing demand, farmers across Southeast Asia have been
clearing huge swaths of biodiversity-rich tropical rain forest to
make room for massive palm plantations” (Miniscalco, 2008,
para. 2). Lastly, it would give the new outsourced location time
to start producing. Importing may be expensive but the cost may
be offset by the fact that PPO won’t have to pay for labor or
taxes for the shutdown factory.
The decision statement “In responding to CAPO’s attacks, how
will PPO keep their customers and continue to make money?”
focuses on the two main concerns of the company which are
retaining customers and continuing to make money. This
decision statement is valid as it lets you know that the company
is being proactive and not reactive. Reactive decisions are made
after an event has taken place. “Decisions are declared by
human beings. Sometimes they arise when we have what
philosophers call a break in our existence—some change in our
circumstances— that impels us to declare a decision. We can
consider these decisions as reactive to the change” (Howard &
Abbas, 2015, p. 4, para. 6). This decision statement keeps the
primary goals (customers and money) while ignoring the others,
such as the orangutans, deforestation and CAPO. My decisions
are incorrect as they did the opposite.
Reference
Howard, R. A., & Abbas, A. E. (2015, January 21). Foundations
of Decision Analysis. Retrieved from
http://create.usc.edu/sites/default/files/publications/m01howa624601sec01.pdf
Judd, B. [Strategic Decisions Group]. (2013, November 30).
Fundamentals of Decision Quality [Video file]. Retrieved from
https://www.youtube.com/watch?v=dFV-lzIqfRA
Miniscalco, E. (2008, December 11). Is Harvesting Palm Oil
Destroying the Rainforests? Retrieved from
https://www.scientificamerican.com/article/harvesting-palm-oil-and-rainforests/
#3
Rachylle McGee
Pacific Palm Oil (PPO) is consumed by time in making a
decision on what to do next and needs to work on making the
customers happy. Pacific Palm Oil (PPO) is at the end of their
production cycle for the trees and will need to replant to start
the cycle over again. Pacific Palm Oil (PPO) is having issues
with unhappy customers, who are protesting and boycotting
over the damage that palm oil companies do to the ecosystem
and wildlife. Pacific Palm Oil (PPO) wants to expand their
business and gain capital as well. The production plant manager
wants to increase sales while also
increasing the yields of palm oil. The plant manager will create
strategies that will focus on the future of the palm oil company
to improve the amount of palm oil and to increase ways to
improve the way they retrieve palm oil to protect the
environment. By improving the ways company gets palm oil, the
company will also make the environment better and make the
consumers happier by making big changes. The Pacific Palm Oil
company needs to correct the issue of the threats of consumers
against the way palm oil is manufactured, gathered and made in
the production factory. The company does not want a decrease in
sales due to customers being unhappy with the company. The
(PPO) company will have to deal with the aftermath of more
threats and boycotting rallies against the company, which will
make it harder for the company to make sales. The consumers
are already starting to hold protests against the company, and this
creates issues for the company and the management team, which is trying
to handle the issue at hand.
Statement 1: The possible decision for the (PPO) company will
be to move to a new location, which will create better yields for
the company and protect the wildlife, such as the orangutans.
The factory location will move to an uninhabited location, such
as a large island in the Philippines with no wild animals.
The carbon from the oil-gathering process will be easily released
into the air, without doing much damage to humans and animals.
By moving the company to an island, it will make it harder for
consumers to protest. The move of the company will increase
the amount of funds for the company and decrease
the number of unhappy consumers.
Statement 2: The next possible decision that the (PPO) company
can make is to stop production for a few years, so the company can
rotate its fields and the soil can regain nutrients. By doing this the
company will be able to get a better crop the next time around,
and the wild animals can increase in number. The (PPO)
can gather resources from other countries for a cheaper wholesale
price to keep their consumer base clients. The (PPO) company
can gather palm oil in China, other Asian countries and South
Africa. This will help with the ecosystem. Gathering palm
oil from other countries will increase the locations of sales in
foreign countries and decrease the number of people against
the (PPO) company. There will be some cheap resources
depending on what countries the (PPO) decides to go to. The
wages of workers will also decrease depending on where the
new factory will be located. Such as Philippines, the wages will
be low and there is not much wild life. When a few years have
passed, the company will be able to reopen their old locations to
make palm oil, when the environment is better and the soil is
ready to plant.
Last June, Bill Kayong was killed as an activist going against
a Malaysian palm-oil company (Zuckerman, C., 2019). The
company does not need to go this far to save their company. It
is the management's job to create new ways of production.
Bill Kayong was helping locals to reclaim land that the
government took and was using to make more money
from palm oil (Zuckerman, C., 2019).
References:
Zuckerman, J. C. (2017, June 19). The Violent Costs of the
Global Palm-Oil Boom. Retrieved from
https://www.newyorker.com/news/news-desk/the-violent-costs-of-the-global-palm-oil-boom
#4
Donovan Sanchez
The main trigger that PPO is facing is that they have become a
target of the Consumer Against Palm Oil (CAPO). CAPO has a
mission to publicly protest and picket the business at its sites.
The group also plans to boycott the brands that use palm oil in
their production. This issue has prompted some of PPO’s big
American customers to inquire about the problem. They have
also hinted to PPO that they may have to seek out a sustainable
palm oil producer if the problem is not addressed.
The production manager's decision should identify the specific
goals for PPO’s business. This goal is to continue to be
profitable while addressing the concerns of their American
customers, and the Consumers Against Palm Oil attacks.
Possible Decision Statement #1. Pacific Palm Oil will continue
to prosper as a palm oil producing company by addressing the
issue of sustainability. The company’s oldest plantation is
coming to an end and can be relocated to larger fertile land
when replanting becomes a factor. Relocating one of our sites
can become a win in the issue of the problems we face from
Consumers Against Palm Oil. This process may take some time
and revenue, but will ultimately please the company’s bottom
line. It will also satisfy our American customers and the
activist groups targeting PPO, while growing the business's
future production.
Possible Decision Statement #2. Pacific Palm Oil will be
addressing the issues they face from Consumers Against Palm
Oil by addressing the main problems that cause their crops to
underproduce. PPO will enlist a third-party organization to
process the quality of our young plants. This will help the
company raise revenue while currently holding our lands and
producing more oil without expanding its plantations. This
action will take the pressure off of deforestation and at the same
time please our American customers and the CAPO
organization.
Examine the following decision statement, which is the one you
will be using for the MDQ model and case study:
In responding to CAPO’s attacks, how will PPO keep their
customers and continue to make money?
This statement provides a basic premise of the issue that the
company faces. This decision statement is put in the form of a
question. It does not provide an explanation nor a solution of
the decision that is to be made.