Probability Homework Help
For any Homework related queries, Call us at : - +1 678 648 4277
You can mail us at : - info@statisticshomeworkhelper.com or
reach us at : - https://www.statisticshomeworkhelper.com/
Using probability to check the possibility of an event happening
Here we use probability to examine the reported connection between sleep duration and
stroke. We also use probability to evaluate two tumor markers for lung cancer diagnosis.
PROBLEM 1:
a) A number of studies have been conducted to investigate the association between sleep
duration and risk of stroke in middle-aged and older adults. One such study was
conducted in a sample of 31,000 participants with an average age of 62 years.
Information on sociodemographic characteristics and sleep duration was obtained by a
self-administered questionnaire. The study took place over 6 years, during which 1,557
cases of stroke were documented. The risk of stroke was higher in participants who
reported sleeping at least 9 hours per night, in comparison to the group who reported
sleeping 7 to 8 hours per night; the relative risk was 1.23.
A news article reporting the results stated: “Sleeping a lot may increase the risk for
stroke. A new study has found that compared with sleeping seven to eight hours a night,
sleeping nine or more hours increased the relative risk for stroke by 23 percent.
Maintaining appropriate sleep duration might hold great promise as primary prevention
of stroke.”
Write a short response to the newspaper editor explaining clearly why the article is
potentially misleading. Be sure to use language accessible to a general audience
without a statistics background. Limit your answer to at most six sentences.
b) Suppose that you have purchased a 40-piece box of salt water taffy; taffy is a type of
soft, chewy candy that became popular in the United States in the 1800s. The box is
advertised as containing a random assortment of 8 flavors. Calculate the probability
that at least one flavor is missing from the box. If using an algebraic approach, explain
your reasoning and any necessary assumptions. If using a simulation approach, be sure
to clearly comment any code and describe your logic.
c) Accurately distinguishing lung cancer from benign lung disease remains challenging,
even with the use of imaging scans; computed tomography (CT) scans are known to
have high sensitivity but poor specificity for lung cancer diagnosis. Tumor markers,
molecules produced by a tumor associated with a cancer or by the body in response to a
cancer, may be useful for clinical diagnosis. Consider two tumor markers for lung
cancer, CYFRA 21-1 and CEA, which tend to be elevated in patients with lung cancer
relative to those with benign lung disease. A study was conducted on patients with
known lung cancer status to assess how these tumor markers could be used for clinical
diagnosis.
The study team observed that in patients with lung cancer, CYFRA 21-1 is normally
distributed with mean 4.7 ng/mL and standard deviation 9.2 ng/mL while CEA is
normally distributed with mean 5.9 ng/mL and standard deviation 19.8 ng/mL. In
patients with benign lung disease, CYFRA 21-1 is normally distributed with mean 1.6
ng/mL and standard deviation 4.3 ng/mL while CEA is normally distributed with
mean 2.2 ng/mL and standard deviation 5.3 ng/mL.
Use the data from this study to answer the following questions.
i. Compute the sensitivity and specificity of a diagnosis test based on classifying
patients with CYFRA 21-1 level greater than 3.3 ng/mL as having lung cancer.
ii. Compute the sensitivity and specificity of a diagnosis test based on classifying
patients with CEA level greater than 5.0 ng/mL as having lung cancer.
iii. Explain the reasoning behind why a diagnostic test with low sensitivity may not be
recommended for use in the general population but appropriate for use in high-risk
groups, such as patients presenting with several risk factors or symptoms strongly
predictive of lung cancer. Use language accessible to someone who has not taken a
statistics course. Limit your answer to no more than six sentences.
iv. Suppose a high-risk patient is tested for elevated CYFRA 21-1 level and found to
have a CYFRA 21-1 level below the cutoff in part i. Explain whether it seems reasonable
to rule out lung cancer for this patient based on this test result and the test features
computed in part i. Limit your answer to no more than six sentences, referencing
numerical results as necessary.
The study team is interested in whether a diagnostic test based on both CYFRA 21-1 and
CEA is an improvement over tests based solely on one of the markers. Suppose that a
patient is classified as having lung cancer if at least one of the markers is above the
cutoffs used in parts i. and ii.; i.e., a patient tests positive for lung cancer if
CYFRA 21-1 level is greater than 3.3 ng/mL, CEA level is greater than 5.0 ng/mL, or both
are elevated.
v. Compute the sensitivity and specificity of this diagnostic test. State any
assumptions necessary to make the calculation and comment on whether those
assumptions seem reasonable.
vi. Does the diagnostic test based on both markers represent an improvement over
the tests in parts i. and ii. for use in high-risk patients? Explain your answer,
referencing numerical results to support your reasoning.
Solution
(a) “Dear Sir or Madam, I am writing about your article on long sleep and the risk of
stroke, which I believe could mislead readers. The study observed that people who
reported sleeping nine or more hours had strokes more often than those who reported
sleeping seven to eight hours; it did not show that the extra sleep caused the strokes.
The "23 percent" figure is a relative risk: the risk in the long-sleep group was 1.23
times the risk in the comparison group, which says nothing about how large either risk
actually is. Sleep duration was self-reported on a questionnaire, so it may have been
measured inaccurately, and with an average age of 62 the participants may not be
representative of the general population. The groups may also differ in other ways; for
example, if the long sleepers were older than the others, age rather than sleep could
explain the difference. I would therefore suggest taking extra care when drawing
conclusions about prevention from studies of this kind.”
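(b) We need the probability that at least one of the 8 flavors is missing from the
40-piece box. We assume that each piece is, independently of the others, equally likely
to be any of the 8 flavors. We first calculate the probability that the 1st flavor is
missing, that is, the probability that none of the 40 pieces is of the 1st flavor.
The probability of an intersection of 40 independent events is the product of their
probabilities. The probability that any single piece is not of the 1st flavor is
\[ P(\text{piece is not flavor 1}) = \frac{7}{8}, \]
and hence the probability that the 1st flavor is missing is the product over all 40
pieces:
\[ P(\text{flavor 1 missing}) = \left(\frac{7}{8}\right)^{40} \approx 0.0048. \]
By symmetry, the same probability applies to each of the other 7 flavors:
\[ P(\text{flavor } i \text{ missing}) = \left(\frac{7}{8}\right)^{40}, \quad i = 2, \dots, 8. \]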
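The per-flavor probabilities cannot simply be added, because two or more flavors can be
missing at the same time. One way to finish the calculation, offered here as a sketch
rather than as the original author's method, is inclusion-exclusion over the number of
missing flavors, with a simulation as a cross-check. The variable names are illustrative,
and the code assumes that each piece is independently and uniformly one of the 8 flavors.

```r
# Assumption: each of the 40 pieces is independently and uniformly
# one of the 8 flavors (the problem only says "random assortment").
n_pieces  <- 40
n_flavors <- 8

# Inclusion-exclusion over k, the number of flavors required to be absent:
# P(at least one missing) = sum_{k=1}^{7} (-1)^(k+1) * C(8,k) * ((8-k)/8)^40
k <- 1:(n_flavors - 1)
p_exact <- sum((-1)^(k + 1) * choose(n_flavors, k) *
                 ((n_flavors - k) / n_flavors)^n_pieces)

# Simulation check: draw many boxes and count those with fewer than 8 flavors.
set.seed(1)
n_sim <- 20000
missing_any <- replicate(n_sim, {
  box <- sample.int(n_flavors, n_pieces, replace = TRUE)
  length(unique(box)) < n_flavors
})
p_sim <- mean(missing_any)

cat("Exact (inclusion-exclusion):", round(p_exact, 4), "\n")
cat("Simulated:", round(p_sim, 4), "\n")
```

The first term of the sum, 8 x (7/8)^40, is the simple upper bound obtained by adding the
eight per-flavor probabilities; the remaining terms correct for boxes that miss several
flavors at once.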
(c) i. Computing sensitivity and specificity for the test that classifies patients with
a CYFRA 21-1 level above the cut-off of 3.3 ng/mL as having lung cancer.
We generate data from a large sample with equal numbers of lung cancer patients and
benign lung disease patients.
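library(descr)  # CrossTable(); assumed from the cell.layout argument and table layout
library(caret)  # sensitivity() and specificity(); assumed source of these functions
## total sample size, split equally between the two groups
N=10^5
dat=data.frame(cancer=rep(c("YES","NO"),each=N/2))
## marker levels drawn from the group-specific normal distributions
dat[dat$cancer=="YES","CYFRA"]=rnorm(N/2,mean=4.7,sd=9.2)
dat[dat$cancer=="NO","CYFRA"]=rnorm(N/2,mean=1.6,sd=4.3)
dat[dat$cancer=="YES","CEA"]=rnorm(N/2,mean=5.9,sd=19.8)
dat[dat$cancer=="NO","CEA"]=rnorm(N/2,mean=2.2,sd=5.3)
## classify as lung cancer when CYFRA 21-1 reaches the 3.3 ng/mL cut-off
dat$CYFRA_Pred=ifelse(dat$CYFRA>=3.3,"YES","NO")
## cross-tabulate true status (rows) against predicted status, with row proportions
CrossTable(dat$cancer,dat$CYFRA_Pred, prop.c = F,prop.r = T,prop.t = F,
           expected = F, cell.layout = F, format="SAS", prop.chisq = F)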
##
## ======================================
##               dat$CYFRA_Pred
## dat$cancer       NO     YES    Total
## --------------------------------------
## NO            32591   17409    50000
## row prop.     0.652   0.348    0.500
## --------------------------------------
## YES           21983   28017    50000
## row prop.     0.440   0.560    0.500
## --------------------------------------
## Total         54574   45426   100000
## ======================================
cat("\nSensitivity based on diag test CYFRA 21-1=",
    sensitivity(factor(dat$CYFRA_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Sensitivity based on diag test CYFRA 21-1= 0.56034
cat("\nSpecificity based on diag test CYFRA 21-1=",
    specificity(factor(dat$CYFRA_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Specificity based on diag test CYFRA 21-1= 0.65182
ii. Test based on CEA with a cut-off of 5.0 ng/mL.
## classify as lung cancer when CEA reaches the 5.0 ng/mL cut-off
dat$CEA_Pred=ifelse(dat$CEA>=5,"YES","NO")
CrossTable(dat$cancer,dat$CEA_Pred)
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
## ==========================================
##               dat$CEA_Pred
## dat$cancer          NO        YES    Total
## ------------------------------------------
## NO               35014      14986    50000
##               1034.948   1488.023
##                  0.700      0.300    0.500
##                  0.594      0.365
##                   0.35       0.15
## ------------------------------------------
## YES              23965      26035    50000
##               1034.948   1488.023
##                  0.479      0.521    0.500
##                  0.406      0.635
##                   0.24       0.26
## ------------------------------------------
## Total            58979      41021   100000
##                   0.59       0.41
## ==========================================
cat("\nSensitivity based on diag test CEA 5.0 ng =",
    sensitivity(factor(dat$CEA_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Sensitivity based on diag test CEA 5.0 ng = 0.5207
cat("\nSpecificity based on diag test CEA 5.0 ng =",
    specificity(factor(dat$CEA_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Specificity based on diag test CEA 5.0 ng = 0.70028
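Because the marker levels are stated to be normally distributed, the same quantities can
also be computed without simulation. The following is a minimal sketch, not the original
author's approach: sensitivity is the probability that a lung cancer patient exceeds the
cut-off and specificity is the probability that a benign-disease patient does not, both
read off the normal CDF with pnorm(). The helper test_features() and the list layout are
hypothetical names introduced for illustration.

```r
# Sensitivity = P(marker > cutoff | lung cancer)
# Specificity = P(marker <= cutoff | benign lung disease)

# Distribution parameters and cut-offs as given in the problem statement
cyfra <- list(cutoff = 3.3, mean_cancer = 4.7, sd_cancer = 9.2,
              mean_benign = 1.6, sd_benign = 4.3)
cea   <- list(cutoff = 5.0, mean_cancer = 5.9, sd_cancer = 19.8,
              mean_benign = 2.2, sd_benign = 5.3)

# Hypothetical helper (not part of the original solution)
test_features <- function(m) {
  c(sensitivity = 1 - pnorm(m$cutoff, mean = m$mean_cancer, sd = m$sd_cancer),
    specificity = pnorm(m$cutoff, mean = m$mean_benign, sd = m$sd_benign))
}

exact_cyfra <- test_features(cyfra)
exact_cea   <- test_features(cea)
print(round(rbind(CYFRA = exact_cyfra, CEA = exact_cea), 4))
```

The exact values should agree with the simulated proportions above to within simulation
error, which is small for a sample of 100,000.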
(iii) A test with low sensitivity fails to detect a large share of the people who
actually have the disease. Used on the general population, it would miss many cases and
falsely reassure those patients, which is why high sensitivity is crucial for
large-scale screening. In a high-risk group, on the other hand, such a test can still be
useful: these patients are typically examined further, so a missed case is less likely
to go unnoticed, and a positive result is more likely to be correct because the disease
is more common in the group.
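(iv) A CYFRA 21-1 level below the 3.3 ng/mL cut-off is a negative test result. The
specificity of about 0.65 is the proportion of patients without cancer who fall below
the cut-off; it is not the probability that a patient below the cut-off is free of
cancer. The relevant feature here is the sensitivity of about 0.56, which means that
roughly 44% of patients with lung cancer also fall below the cut-off. For a high-risk
patient it therefore does not seem reasonable to rule out lung cancer on the basis of
this result alone.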
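(v) Here a patient is classified as having lung cancer if at least one of the cut-off
rules in (i) and (ii) applies. The simulation draws the two markers independently within
each group, so it assumes that CYFRA 21-1 and CEA levels are independent given disease
status; this may not hold in practice, since both markers respond to the same tumor.
## positive if either marker reaches its cut-off
dat$Joint_Pred=factor(ifelse(dat$CEA>=5|dat$CYFRA>=3.3,"YES","NO"))
CrossTable(dat$cancer,dat$Joint_Pred)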
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
## ==========================================
##               dat$Joint_Pred
## dat$cancer          NO        YES    Total
## ------------------------------------------
## NO               22828      27172    50000
##               2240.838   1124.693
##                  0.457      0.543    0.500
##                  0.683      0.408
##                  0.228      0.272
## ------------------------------------------
## YES              10590      39410    50000
##               2240.838   1124.693
##                  0.212      0.788    0.500
##                  0.317      0.592
##                  0.106      0.394
## ------------------------------------------
## Total            33418      66582   100000
##                  0.334      0.666
## ==========================================
cat("\nSensitivity based on the joint diag test =",
    sensitivity(factor(dat$Joint_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Sensitivity based on the joint diag test = 0.7882
cat("\nSpecificity based on the joint diag test =",
    specificity(factor(dat$Joint_Pred),factor(dat$cancer), negative="NO",
                positive="YES"))
##
## Specificity based on the joint diag test = 0.45656
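The joint test can also be evaluated in closed form. The sketch below is illustrative
rather than the original author's calculation. It assumes, as the simulation implicitly
does, that the two markers are independent given disease status; without that assumption
the products below would not be valid. The variable names are hypothetical.

```r
# Assumption: CYFRA 21-1 and CEA are independent given disease status.

# Probability that each marker exceeds its cut-off, by group
p_cyfra_cancer <- 1 - pnorm(3.3, mean = 4.7, sd = 9.2)
p_cea_cancer   <- 1 - pnorm(5.0, mean = 5.9, sd = 19.8)
p_cyfra_benign <- 1 - pnorm(3.3, mean = 1.6, sd = 4.3)
p_cea_benign   <- 1 - pnorm(5.0, mean = 2.2, sd = 5.3)

# The joint test is positive if at least one marker is elevated, so
# sensitivity = 1 - P(both markers below cut-off | cancer)
joint_sens <- 1 - (1 - p_cyfra_cancer) * (1 - p_cea_cancer)
# specificity = P(both markers below cut-off | benign)
joint_spec <- (1 - p_cyfra_benign) * (1 - p_cea_benign)

cat("Joint sensitivity:", round(joint_sens, 4), "\n")
cat("Joint specificity:", round(joint_spec, 4), "\n")
```

The formulas make the trade-off explicit: requiring both markers to be low for a negative
result can only raise sensitivity and lower specificity relative to either single-marker
test.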
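(vi) Compared with the single-marker tests, the combined test has higher sensitivity
(0.79 versus 0.56 and 0.52) but lower specificity (0.46 versus 0.65 and 0.70). It is
therefore better at identifying people who have cancer, but worse at correctly
identifying people who do not. For high-risk patients, where a missed cancer is the more
serious error, the gain in sensitivity arguably makes it an improvement, at the cost of
more false positives.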
PROBLEM 2: ICE CREAM SALES
Congratulations! You have inherited an ice cream truck for the summer of 2022. You
will be selling ice cream from the truck every day that summer: 101 days in total, from
Memorial Day weekend until Labor Day weekend. This problem will step you through
projecting how much ice cream you expect to sell (as well as how much revenue you
expect to make) during the summer.
a) Ice cream sales are known to be very dependent on weather. Suppose that for any
particular day, the weather is either rainy or sunny; on average, one-third of the days are
rainy and two-thirds of the days are sunny. Compute the expected number of sunny days
in summer 2022 and the probability of there being more rainy days than sunny days in
summer 2022. You may assume that the weather is independent between days.
b) It is more realistic to think that tomorrow’s weather depends on today’s weather (for
all days throughout the summer). Suppose that if it is sunny today, there is an 80%
chance that tomorrow will also be sunny; however, if it is rainy today, there is only a
30% chance that tomorrow will be sunny. Additionally, suppose that the weather on
‘Day 0’ (i.e., the day before your ice cream truck opens) is known to be sunny.
Based on a simulation with 10,000 replicates, estimate the probability of there being more
rainy days than sunny days in summer 2022. Write a brief paragraph outlining the logic of
the simulation and clearly comment your code.
The number of ice cream cones sold on a typical day depends on the weather. When it is
sunny, the number of ice cream cones sold is approximately normally distributed with
mean 200 and standard deviation 40. When it is rainy, the number of ice cream cones sold
is approximately normally distributed with mean 120 and standard deviation 30. Use the
round() function to round to the nearest whole number. You make $2.00 in profit for each
ice cream cone sold and your fixed daily operating cost is $300 per day.
c) Based on a simulation with 10,000 replicates, estimate the probability that you lose
money on any one randomly selected day in the summer; write a brief paragraph
outlining the logic of the simulation and clearly comment your code. Assume the weather
follows the pattern described in part a).
d) Based on a simulation with 10,000 replicates, estimate the probability that you lose
money with your ice cream truck during the summer of 2022; write a brief paragraph
outlining the logic of the simulation and clearly comment your code. Assume the weather
follows the pattern described in part b).
Solution
(a) The weather on a single day follows a Bernoulli distribution, and the number of
sunny days \(X\) in 101 days follows a binomial distribution with success probability
\(p = 2/3\) (the probability that a single day is sunny) and 101 trials:
\[ X \sim \mathrm{Binomial}(101,\ 2/3). \]
Next we find the expected value of \(X\):
\[ E[X] = np = 101 \cdot \frac{2}{3} \approx 67.3. \]
The event that there are more sunny days than rainy days is the same as the event that
there are at least 51 sunny days:
\[ P(X \ge 51) = 1 - P(X \le 50). \]
We use R to calculate this probability (although expressions in terms of the incomplete
beta function exist).
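1-pbinom(50,101,2/3)
## [1] 0.9997276
## equivalently
sum(dbinom(51:101,101,2/3))
## [1] 0.9997276
The problem asks for the complementary event, more rainy days than sunny days, whose
probability is therefore 1 - 0.9997276 = 0.0002724.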
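As a cross-check, the requested probability can be computed directly and compared with a
simulation of independent summers. This is an illustrative sketch with hypothetical
variable names; a large number of replicates is used because the event is rare.

```r
n_days  <- 101
p_sunny <- 2/3

# Expected number of sunny days under the binomial model
expected_sunny <- n_days * p_sunny

# More rainy than sunny days means at most 50 sunny days
p_more_rainy <- pbinom(50, size = n_days, prob = p_sunny)

# Simulation check: each draw is the number of sunny days in one summer
set.seed(1)
sunny_counts <- rbinom(10^6, size = n_days, prob = p_sunny)
p_more_rainy_sim <- mean(sunny_counts <= 50)

cat("Expected sunny days:", round(expected_sunny, 2), "\n")
cat("P(more rainy than sunny), exact:", signif(p_more_rainy, 4), "\n")
cat("P(more rainy than sunny), simulated:", signif(p_more_rainy_sim, 4), "\n")
```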
(b) Considering the dependent model.
We create an empty vector of 102 days (day 0 first) and at each step draw a Bernoulli
sample with success probability 0.8 if the previous day is sunny and 0.3 if the previous
day is rainy.
We collect these samples over all replicates and count the number of simulated summers
with at least 51 sunny days.
set.seed(2344)
## number of simulations
N=10^4
n_sun_sim=rep(NA,N)
for(j in 1:N)
{
  ## create the empty vector of 102 days (day 0 first)
  days=rep(NA,102)
  ## day 0 is known to be sunny (1 = sunny, 0 = rainy)
  days[1]=1
  for(i in 2:102)
  {
    ## sunny with probability 0.8 after a sunny day, 0.3 after a rainy day
    days[i]=ifelse(days[i-1]==1,rbinom(1,1,0.8),rbinom(1,1,0.3))
  }
  ## number of sunny days in the summer, excluding day 0
  n_sun_sim[j]=sum(days[-1])
}
cat("Probability of more sunny days=",sum(n_sun_sim>=51)/N)
## Probability of more sunny days= 0.886
The estimated probability of more rainy days than sunny days is therefore
1 - 0.886 = 0.114.
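The simulated value can be checked against an exact calculation. The sketch below is
offered as one possible approach, not the original author's. It tracks the joint
distribution of the number of sunny days so far and the current weather, and updates it
one day at a time with the transition probabilities from the problem. All variable names
are hypothetical.

```r
n_days <- 101
p_sun_after_sun  <- 0.8
p_sun_after_rain <- 0.3

# f_sun[k + 1]  = P(k sunny days so far and the current day is sunny)
# f_rain[k + 1] = P(k sunny days so far and the current day is rainy)
f_sun  <- numeric(n_days + 1)
f_rain <- numeric(n_days + 1)

# Day 1, given that day 0 is sunny
f_sun[2]  <- p_sun_after_sun        # one sunny day so far
f_rain[1] <- 1 - p_sun_after_sun    # zero sunny days so far

for (day in 2:n_days) {
  # Mass moving to a sunny day, indexed by the count before the move
  to_sun  <- f_sun * p_sun_after_sun + f_rain * p_sun_after_rain
  # Mass moving to a rainy day; the sunny-day count does not change
  to_rain <- f_sun * (1 - p_sun_after_sun) + f_rain * (1 - p_sun_after_rain)
  # A sunny day increases the count by one, so shift the vector right
  f_sun  <- c(0, to_sun[-(n_days + 1)])
  f_rain <- to_rain
}

count_dist   <- f_sun + f_rain              # P(k sunny days), k = 0, ..., 101
p_more_sunny <- sum(count_dist[52:102])     # k >= 51
p_more_rainy <- 1 - p_more_sunny

cat("P(more sunny days):", round(p_more_sunny, 4), "\n")
cat("P(more rainy days):", round(p_more_rainy, 4), "\n")
```

With 10,000 replicates the simulation estimate carries a standard error of roughly
0.003, so small differences between the two results are expected.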
(c) We are looking for the probability that money is lost on a single day, which could
be rainy or sunny. Given the profit of $2 per cone and the fixed daily cost of $300,
this is the event of selling fewer than 150 cones:
\[ 2X - 300 < 0 \iff X < 150. \]
The number of cones sold, \(X\), is normally distributed conditional on a Bernoulli
variable that takes the value 1 if the day is sunny: mean 200 and standard deviation 40
on a sunny day, mean 120 and standard deviation 30 on a rainy day.
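## N=10^4 as above
n_lose_sim=rep(NA,N)
for(j in 1:N)
{
  ## weather for the day: 1 = sunny with probability 2/3, independent of other days
  x_s=rbinom(1,1,2/3)
  ## number of cones sold given the weather (stored unrounded)
  n_lose_sim[j]=ifelse(x_s==1,rnorm(1,mean=200,sd=40),rnorm(1,mean=120,sd=30))
}
## a day counts as a loss when at most 149 cones are sold
cat("Proportion of financially unsuccessful days=",sum(n_lose_sim<=149)/N)
## Proportion of financially unsuccessful days= 0.3479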
We can check the accuracy of this estimate using the law of total probability over the
two weather states:
pnorm(149,mean=200,sd=40)*2/3+pnorm(149,mean=120,sd=30)*1/3
## [1] 0.3451513
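The problem statement asks for sales to be rounded to whole cones, which the code above
does not do. A vectorized variant that applies round() is sketched below as an
illustration, with hypothetical variable names. After rounding, a loss means at most 149
cones, which for the underlying normal draw corresponds to a value below 149.5.

```r
set.seed(1)
n_sim <- 10^4

# Weather under part a): sunny with probability 2/3, independently each day
sunny <- rbinom(n_sim, 1, 2/3) == 1

# Cones sold, rounded to whole cones as the problem statement requests
cones <- round(ifelse(sunny,
                      rnorm(n_sim, mean = 200, sd = 40),
                      rnorm(n_sim, mean = 120, sd = 30)))

# Daily balance: $2 per cone minus the $300 fixed cost
profit <- 2 * cones - 300
p_loss_sim <- mean(profit < 0)

# Exact counterpart by the law of total probability:
# round(X) <= 149 corresponds to X < 149.5 (ties have probability zero)
p_loss_exact <- (2/3) * pnorm(149.5, mean = 200, sd = 40) +
                (1/3) * pnorm(149.5, mean = 120, sd = 30)

cat("Simulated P(loss on a day):", round(p_loss_sim, 4), "\n")
cat("Exact P(loss on a day):", round(p_loss_exact, 4), "\n")
```

The rounding shifts the threshold by half a cone, so the result differs only slightly
from the unrounded calculation.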
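(d) To estimate the probability of losing money over the whole summer of 2022, we extend
the simulation: each replicate generates the weather as in part (b), draws the daily
sales given the weather, and sums the daily balances over the 101 days.
n_balance_sim=rep(NA,N)
for(j in 1:N)
{
  ## create the empty vectors of 102 days (day 0 first)
  days=rep(NA,102)
  balance=rep(NA,102)
  ## no sales on day 0, which is known to be sunny
  balance[1]=0
  days[1]=1
  for(i in 2:102)
  {
    ## weather depends on the previous day
    days[i]=ifelse(days[i-1]==1,rbinom(1,1,0.8),rbinom(1,1,0.3))
    ## cones sold given the weather
    n_c=ifelse(days[i]==1,rnorm(1,mean=200,sd=40),rnorm(1,mean=120,sd=30))
    ## daily balance: $2 per whole cone sold minus the $300 fixed cost
    balance[i]=round(n_c)*2-300
  }
  ## total balance over the summer
  n_balance_sim[j]=sum(balance)
}
cat("Proportion of financially unsuccessful summers=",sum(n_balance_sim<=0)/N)
## Proportion of financially unsuccessful summers= 0.0093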
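A quick way to see why a losing summer is so rare is to compute the expected total
balance. The sketch below is an illustration with hypothetical variable names, and it
ignores the rounding of cone counts. It obtains the probability that each day is sunny
from the one-step recursion implied by the transition probabilities, then sums the
expected daily balances.

```r
n_days <- 101

# p_sunny[t] = P(day t is sunny | day 0 is sunny), from the recursion
# p_t = 0.8 * p_{t-1} + 0.3 * (1 - p_{t-1}), starting at p_0 = 1
p_sunny <- numeric(n_days)
prev <- 1
for (day in 1:n_days) {
  prev <- 0.8 * prev + 0.3 * (1 - prev)
  p_sunny[day] <- prev
}

# Expected daily balance given the weather: 2 * mean cones - 300
balance_sunny <- 2 * 200 - 300
balance_rainy <- 2 * 120 - 300

expected_sunny_days <- sum(p_sunny)
expected_total <- sum(p_sunny * balance_sunny + (1 - p_sunny) * balance_rainy)

cat("Expected sunny days:", round(expected_sunny_days, 2), "\n")
cat("Expected total balance ($):", round(expected_total, 2), "\n")
```

Under these assumptions the expected balance is about $3,700 for the summer, so a loss
requires an unusually long run of rainy or slow days, consistent with the small simulated
probability.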
statisticshomeworkhelper.com

More Related Content

Similar to Probability Homework Help

This quiz consists of 20 questions most appear to be similar but now.docx
This quiz consists of 20 questions most appear to be similar but now.docxThis quiz consists of 20 questions most appear to be similar but now.docx
This quiz consists of 20 questions most appear to be similar but now.docx
amit657720
 
coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learning
Ford Sleeman
 
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docx
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docxNumber of Pages 4 (Double Spaced)Number of sources 8Writi.docx
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docx
cherishwinsland
 
1. In a study, 28 adults with mild periodontal disease are assesse
1. In a study, 28 adults with mild periodontal disease are assesse1. In a study, 28 adults with mild periodontal disease are assesse
1. In a study, 28 adults with mild periodontal disease are assesse
TatianaMajor22
 
Case control studies
Case control studiesCase control studies
Case control studies
Bruno Mmassy
 
Session 3 2012 fmhs_paraphrasing and summarising
Session 3 2012 fmhs_paraphrasing and summarisingSession 3 2012 fmhs_paraphrasing and summarising
Session 3 2012 fmhs_paraphrasing and summarising
mtfinn
 
2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project
mdragoescu
 
White Paper- A non-invasive blood test for diagnosing lung cancer
White Paper- A non-invasive blood test for diagnosing lung cancerWhite Paper- A non-invasive blood test for diagnosing lung cancer
White Paper- A non-invasive blood test for diagnosing lung cancer
Dusty Majumdar, PhD
 

Similar to Probability Homework Help (18)

Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
1.9. study designs
1.9. study designs1.9. study designs
1.9. study designs
 
Research Methodology 1
 Research Methodology 1 Research Methodology 1
Research Methodology 1
 
This quiz consists of 20 questions most appear to be similar but now.docx
This quiz consists of 20 questions most appear to be similar but now.docxThis quiz consists of 20 questions most appear to be similar but now.docx
This quiz consists of 20 questions most appear to be similar but now.docx
 
Evaluating the Medical Literature
Evaluating the Medical LiteratureEvaluating the Medical Literature
Evaluating the Medical Literature
 
coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learning
 
EarlyCDT-Lung
EarlyCDT-LungEarlyCDT-Lung
EarlyCDT-Lung
 
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docx
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docxNumber of Pages 4 (Double Spaced)Number of sources 8Writi.docx
Number of Pages 4 (Double Spaced)Number of sources 8Writi.docx
 
1. In a study, 28 adults with mild periodontal disease are assesse
1. In a study, 28 adults with mild periodontal disease are assesse1. In a study, 28 adults with mild periodontal disease are assesse
1. In a study, 28 adults with mild periodontal disease are assesse
 
Experimental design part 2 measurements
Experimental design part 2 measurements Experimental design part 2 measurements
Experimental design part 2 measurements
 
Case control studies
Case control studiesCase control studies
Case control studies
 
London 21.11.2008
London 21.11.2008London 21.11.2008
London 21.11.2008
 
Artificial Intelligence in Medicine.pdf
Artificial Intelligence in Medicine.pdfArtificial Intelligence in Medicine.pdf
Artificial Intelligence in Medicine.pdf
 
Ch 2 basic concepts in pharmacoepidemiology (10 hrs)
Ch 2 basic concepts in pharmacoepidemiology (10 hrs)Ch 2 basic concepts in pharmacoepidemiology (10 hrs)
Ch 2 basic concepts in pharmacoepidemiology (10 hrs)
 
Session 3 2012 fmhs_paraphrasing and summarising
Session 3 2012 fmhs_paraphrasing and summarisingSession 3 2012 fmhs_paraphrasing and summarising
Session 3 2012 fmhs_paraphrasing and summarising
 
2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project2010StanfordE25 Michele dragoescu e25 project
2010StanfordE25 Michele dragoescu e25 project
 
Expanded PICO V1
Expanded PICO V1Expanded PICO V1
Expanded PICO V1
 
White Paper- A non-invasive blood test for diagnosing lung cancer
White Paper- A non-invasive blood test for diagnosing lung cancerWhite Paper- A non-invasive blood test for diagnosing lung cancer
White Paper- A non-invasive blood test for diagnosing lung cancer
 

More from Statistics Homework Helper

Your Statistics Homework Solver is Here! 📊📚
Your Statistics Homework Solver is Here! 📊📚Your Statistics Homework Solver is Here! 📊📚
Your Statistics Homework Solver is Here! 📊📚
Statistics Homework Helper
 

More from Statistics Homework Helper (20)

📊 Conquer Your Stats Homework with These Top 10 Tips! 🚀
📊 Conquer Your Stats Homework with These Top 10 Tips! 🚀📊 Conquer Your Stats Homework with These Top 10 Tips! 🚀
📊 Conquer Your Stats Homework with These Top 10 Tips! 🚀
 
Your Statistics Homework Solver is Here! 📊📚
Your Statistics Homework Solver is Here! 📊📚Your Statistics Homework Solver is Here! 📊📚
Your Statistics Homework Solver is Here! 📊📚
 
Probability Homework Help
Probability Homework HelpProbability Homework Help
Probability Homework Help
 
Multiple Linear Regression Homework Help
Multiple Linear Regression Homework HelpMultiple Linear Regression Homework Help
Multiple Linear Regression Homework Help
 
Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
SAS Homework Help
SAS Homework HelpSAS Homework Help
SAS Homework Help
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
Statistics Homework Helper
Statistics Homework HelperStatistics Homework Helper
Statistics Homework Helper
 
Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
Do My Statistics Homework
Do My Statistics HomeworkDo My Statistics Homework
Do My Statistics Homework
 
Write My Statistics Homework
Write My Statistics HomeworkWrite My Statistics Homework
Write My Statistics Homework
 
Quantitative Research Homework Help
Quantitative Research Homework HelpQuantitative Research Homework Help
Quantitative Research Homework Help
 
Probability Homework Help
Probability Homework HelpProbability Homework Help
Probability Homework Help
 
Top Rated Service Provided By Statistics Homework Help
Top Rated Service Provided By Statistics Homework HelpTop Rated Service Provided By Statistics Homework Help
Top Rated Service Provided By Statistics Homework Help
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
Multivariate and Monova Assignment Help
Multivariate and Monova Assignment HelpMultivariate and Monova Assignment Help
Multivariate and Monova Assignment Help
 
Statistics Multiple Choice Questions and Answers
Statistics Multiple Choice Questions and AnswersStatistics Multiple Choice Questions and Answers
Statistics Multiple Choice Questions and Answers
 
Advanced Statistics Homework Help
Advanced Statistics Homework HelpAdvanced Statistics Homework Help
Advanced Statistics Homework Help
 
Quantitative Methods Assignment Help
Quantitative Methods Assignment HelpQuantitative Methods Assignment Help
Quantitative Methods Assignment Help
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 

Recently uploaded (20)

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Probability Homework Help

  • 1. Probability Homework Help For any Homework related queries, Call us at : - +1 678 648 4277 You can mail us at : - info@statisticshomeworkhelper.com or reach us at : - https://www.statisticshomeworkhelper.com/
  • 2. Using probability to check the possibility of an event happening Here, we use probability to see the connection between sleep and stroke. Also we will use probability to see the connection between two types of cancers. PROBLEM 1: a) A number of studies have been conducted to investigate the association between sleep duration and risk of stroke in middle-aged and older adults. One such study was conducted in a sample of 31,000 participants with an average age of 62 years. Information on sociodemographic characteristics and sleep duration was obtained by a self-administered questionnaire. The study took place over 6 years, during which 1,557 cases of stroke were documented. The risk of stroke was higher in participants who reported sleeping at least 9 hours per night, in comparison to the group who reported sleeping 7 to 8 hours per night; the relative risk was 1.23. A news article reporting the results stated: “Sleeping a lot may increase the risk for stroke. A new study has found that compared with sleeping seven to eight hours a night, sleeping nine or more hours increased the relative risk for stroke by 23 percent. Maintaining appropriate sleep duration might hold great promise as primary prevention of stroke.” statisticshomeworkhelper.com
  • 3. Write a short response to the newspaper editor explaining clearly why the article is potentially misleading. Be sure to use language accessible to a general audience without a statistics background. Limit your answer to at most six sentences.
    b) Suppose that you have purchased a 40-piece box of salt water taffy; taffy is a type of soft, chewy candy that became popular in the United States in the 1800s. The box is advertised as containing a random assortment of 8 flavors. Calculate the probability that at least one flavor is missing from the box. If using an algebraic approach, explain your reasoning and any necessary assumptions. If using a simulation approach, be sure to clearly comment any code and describe your logic.
    c) Accurately distinguishing lung cancer from benign lung disease remains challenging, even with the use of imaging scans; computed tomography (CT) scans are known to have high sensitivity but poor specificity for lung cancer diagnosis. Tumor markers, molecules produced by a tumor associated with a cancer or by the body in response to a cancer, may be useful for clinical diagnosis. Consider two tumor markers for lung cancer, CYFRA 21-1 and CEA, which tend to be elevated in patients with lung cancer relative to those with benign lung disease. A study was conducted on patients with known lung cancer status to assess how these tumor markers could be used for clinical diagnosis.
  • 4. The study team observed that in patients with lung cancer, CYFRA 21-1 is normally distributed with mean 4.7 ng/mL and standard deviation 9.2 ng/mL while CEA is normally distributed with mean 5.9 ng/mL and standard deviation 19.8 ng/mL. In patients with benign lung disease, CYFRA 21-1 is normally distributed with mean 1.6 ng/mL and standard deviation 4.3 ng/mL while CEA is normally distributed with mean 2.2 ng/mL and standard deviation 5.3 ng/mL. Use the data from this study to answer the following questions.
    i. Compute the sensitivity and specificity of a diagnostic test based on classifying patients with CYFRA 21-1 level greater than 3.3 ng/mL as having lung cancer.
    ii. Compute the sensitivity and specificity of a diagnostic test based on classifying patients with CEA level greater than 5.0 ng/mL as having lung cancer.
    iii. Explain the reasoning behind why a diagnostic test with low sensitivity may not be recommended for use in the general population but appropriate for use in high-risk groups, such as patients presenting with several risk factors or symptoms strongly predictive of lung cancer. Use language accessible to someone who has not taken a statistics course. Limit your answer to no more than six sentences.
  • 5. iv. Suppose a high-risk patient is tested for elevated CYFRA 21-1 level and found to have CYFRA 21-1 level below the cutoff in part i. Explain whether it seems reasonable to rule out lung cancer for this patient based on this test result and the test features computed in part i. Limit your answer to no more than six sentences, referencing numerical results as necessary.
    The study team is interested in whether a diagnostic test based on both CYFRA 21-1 and CEA is an improvement over tests based solely on one of the markers. Suppose that a patient is classified as having lung cancer if at least one of the markers is above the cutoffs used in parts i. and ii.; i.e., a patient tests positive for lung cancer if CYFRA 21-1 level is greater than 3.3 ng/mL, CEA level is greater than 5.0 ng/mL, or both are elevated.
    v. Compute the sensitivity and specificity of this diagnostic test. State any assumptions necessary to make the calculation and comment on whether those assumptions seem reasonable.
    vi. Does the diagnostic test based on both markers represent an improvement over the tests in parts i. and ii. for use in high-risk patients? Explain your answer, referencing numerical results to support your reasoning.
  • 6. Solution (a) “Dear Sir or Madam, I am writing about your article on longer sleeping hours and the risk of stroke to clarify a few points that I do not agree with. As a statistics graduate, I would like to point out that relative risk is a more subtle concept than a simple percentage increase suggests. It is clearer to say that the risk of stroke among people with longer sleeping hours is higher by a factor of 1.23 than the risk among those with shorter hours. Another reason the conclusion could be misleading is that the information came from a self-administered questionnaire. That means there could be measurement error, and, given the older age of the participants, the sample may not be representative. For example, if the people in the longer-sleep category are older than those in the shorter-sleep category, then the conclusion could be completely wrong. I would therefore suggest taking extra care when interpreting studies that are not scientific or not peer reviewed.”
    (b) We need to find the probability that at least one of the 8 flavors is not included in the 40-piece box. We first calculate the probability that the 1st flavor is missing, that is, the probability that none of the 40 pieces is of the 1st flavor.
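Part (b) also allows a simulation approach, so the algebraic argument can be cross-checked numerically. The sketch below is not taken from the original slides. It assumes that each of the 40 pieces is drawn independently and is equally likely to be any of the 8 flavors; the seed, the number of replicates, and all variable names are illustrative choices. It computes the exact value by inclusion-exclusion over the set of missing flavors, a simple upper bound obtained by summing the single-flavor probabilities, and a simulated estimate.

```r
# Assumption: every piece is drawn independently and uniformly from the 8 flavors.
n_pieces  <- 40
n_flavors <- 8

# Exact value by inclusion-exclusion:
# P(at least one flavor missing)
#   = sum_{k=1}^{7} (-1)^(k+1) * choose(8, k) * ((8 - k) / 8)^40
k <- 1:(n_flavors - 1)
p_exact <- sum((-1)^(k + 1) * choose(n_flavors, k) *
               ((n_flavors - k) / n_flavors)^n_pieces)

# Upper bound: adding the 8 single-flavor probabilities counts boxes that
# miss two or more flavors more than once, so it slightly overstates the answer.
p_bound <- n_flavors * ((n_flavors - 1) / n_flavors)^n_pieces

# Simulation: fill many boxes and record whether fewer than 8 flavors appear.
set.seed(1)
n_sim <- 20000
missing_any <- replicate(n_sim, {
  box <- sample(n_flavors, n_pieces, replace = TRUE)  # flavor of each piece
  length(unique(box)) < n_flavors                     # TRUE if a flavor is absent
})
p_sim <- mean(missing_any)

cat("Exact:", p_exact, " Upper bound:", p_bound, " Simulated:", p_sim, "\n")
```

Under these assumptions the exact value is roughly 0.038, and the simulated proportion should agree with it up to Monte Carlo error.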
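  • 7. This is the probability of the intersection of 40 independent events, each being the event that a given piece is not of the 1st flavor. Assuming every piece is equally likely to be any of the 8 flavors, each such event has probability \(1 - 1/8 = 7/8\), and hence the probability that the 1st flavor is missing is the product over the 40 pieces: \((7/8)^{40} \approx 0.0048\). Next we calculate this probability for the other 7 flavors; by symmetry, each is also \((7/8)^{40}\).
    (c) i. Computing sensitivity and specificity with a cutoff of 3.3 ng/mL for classifying a patient as having lung cancer. We generate data from a large sample with equal numbers of patients with lung cancer and patients with benign lung disease.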
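The transcript does not include the slide that builds the data frame `dat` used in the code that follows. A minimal sketch of how such a sample might be generated is given here, assuming 50,000 patients per group (matching the row totals in the tables) and independent normal draws for the two markers. The seed and the independence between CYFRA 21-1 and CEA are assumptions, not details from the source, and base R is used for sensitivity and specificity so that the sketch needs no extra packages.

```r
# Assumed reconstruction of the simulated sample; not the original code.
set.seed(123)
n <- 50000  # patients per group, matching the row totals in the tables

dat <- data.frame(
  cancer = rep(c("YES", "NO"), each = n),
  # Marker levels drawn from the normal distributions stated in the problem;
  # CYFRA 21-1 and CEA are drawn independently (an assumption).
  CYFRA  = c(rnorm(n, mean = 4.7, sd = 9.2),  rnorm(n, mean = 1.6, sd = 4.3)),
  CEA    = c(rnorm(n, mean = 5.9, sd = 19.8), rnorm(n, mean = 2.2, sd = 5.3))
)

# Classify as lung cancer when CYFRA 21-1 is at or above the 3.3 ng/mL cutoff.
dat$CYFRA_Pred <- ifelse(dat$CYFRA >= 3.3, "YES", "NO")

# Sensitivity: share of true cancer patients that test positive.
sens_sim <- mean(dat$CYFRA_Pred[dat$cancer == "YES"] == "YES")
# Specificity: share of benign patients that test negative.
spec_sim <- mean(dat$CYFRA_Pred[dat$cancer == "NO"] == "NO")

# Closed-form values from the normal distribution, for comparison.
sens_exact <- 1 - pnorm(3.3, mean = 4.7, sd = 9.2)
spec_exact <- pnorm(3.3, mean = 1.6, sd = 4.3)

cat("Simulated:", sens_sim, spec_sim, " Exact:", sens_exact, spec_exact, "\n")
```

Because the markers are modelled as normal, the simulation is only a convenience: the same quantities follow directly from `pnorm`, and the two should agree to about two decimal places.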
  • 9.
      ## --------------------------------------
      ## NO           32591    17409     50000
      ## row prop.    0.652    0.348     0.500
      ## --------------------------------------
      ## YES          21983    28017     50000
      ## row prop.    0.440    0.560     0.500
      ## --------------------------------------
      ## Total        54574    45426    100000
      ## ======================================
      cat("\nSensitivity based on diag test CYFRA 21.1=", sensitivity(factor(dat$CYFRA_Pred),factor(dat$cancer), negative="NO", positive="YES"))
      ##
      ## Sensitivity based on diag test CYFRA 21.1= 0.56034
      cat("\nSpecificity based on diag test CYFRA 21.1=", specificity(factor(dat$CYFRA_Pred),factor(dat$cancer), negative="NO", positive="YES"))
  • 10.
      ##
      ## Specificity based on diag test CYFRA 21.1= 0.65182
    ii. Test based on CEA with a cutoff of 5.0 ng/mL.
      dat$CEA_Pred=ifelse(dat$CEA>=5,"YES","NO")
      CrossTable(dat$cancer,dat$CEA_Pred)
      ## Cell Contents
      ## |-------------------------|
      ## |                       N |
      ## | Chi-square contribution |
      ## |           N / Row Total |
      ## |           N / Col Total |
      ## |         N / Table Total |
      ## |-------------------------|
      ##
      ## ==========================================
      ##               dat$CEA_Pred
      ## dat$cancer        NO       YES     Total
      ## ------------------------------------------
  • 11.
      ## NO            35014     14986     50000
      ##            1034.948  1488.023
      ##               0.700     0.300     0.500
      ##               0.594     0.365
      ##                0.35      0.15
      ## ------------------------------------------
      ## YES           23965     26035     50000
      ##            1034.948  1488.023
      ##               0.479     0.521     0.500
      ##               0.406     0.635
      ##                0.24      0.26
      ## ------------------------------------------
      ## Total         58979     41021    100000
      ##                0.59      0.41
      ## ==========================================
      cat("\nSensitivity based on diag test CEA 5.0 ng =", sensitivity(factor(dat$CEA_Pred),factor(dat$cancer), negative="NO", positive="YES"))
  • 12.
      ##
      ## Sensitivity based on diag test CEA 5.0 ng = 0.5207
      cat("\nSpecificity based on diag test CEA 5.0 ng =", specificity(factor(dat$CEA_Pred),factor(dat$cancer), negative="NO", positive="YES"))
      ##
      ## Specificity based on diag test CEA 5.0 ng = 0.70028
    (iii) A diagnostic test with low sensitivity does not correctly identify enough of the patients in the general population who actually have the disease, which is why sensitivity is a crucial criterion for large-scale diagnostics. On the other hand, when only a high-risk population is considered, such a test might still be effective because this type of misclassification could be much less common within such groups.
    (iv) If a patient has a CYFRA 21-1 level below the cutoff of 3.3 ng/mL, the specificity of this test tells us that about 65% of patients without cancer fall below the cutoff; specificity is the proportion of patients without cancer who are correctly identified as such. This is not by itself the chance that the patient is free of cancer. With a sensitivity of only about 0.56, roughly 44% of patients who do have cancer also fall below the cutoff, so for a high-risk patient it does not seem reasonable to rule out lung cancer on the basis of this result alone.
    (v) Here we set the rule that a patient is classified as having lung cancer if at least one of the cutoffs in (i) and (ii) is exceeded.
  • 13.
      dat$Joint_Pred=factor(ifelse(dat$CEA>=5|dat$CYFRA>=3.3,"YES","NO"))
      CrossTable(dat$cancer,dat$Joint_Pred)
      ## Cell Contents
      ## |-------------------------|
      ## |                       N |
      ## | Chi-square contribution |
      ## |           N / Row Total |
      ## |           N / Col Total |
      ## |         N / Table Total |
      ## |-------------------------|
      ##
      ## ==========================================
      ##               dat$Joint_Pred
      ## dat$cancer        NO       YES     Total
      ## ------------------------------------------
      ## NO            22828     27172     50000
      ##            2240.838  1124.693
      ##               0.457     0.543     0.500
      ##               0.683     0.408
      ##               0.228     0.272
  • 14.
      ## ------------------------------------------
      ## YES           10590     39410     50000
      ##            2240.838  1124.693
      ##               0.212     0.788     0.500
      ##               0.317     0.592
      ##               0.106     0.394
      ## ------------------------------------------
      ## Total         33418     66582    100000
      ##               0.334     0.666
      ## ==========================================
      cat("\nSensitivity based on the joint diag test =", sensitivity(factor(dat$Joint_Pred),factor(dat$cancer), negative="NO", positive="YES"))
      ##
      ## Sensitivity based on the joint diag test = 0.7882
      cat("\nSpecificity based on the joint diag test =", specificity(factor(dat$Joint_Pred),factor(dat$cancer), negative="NO", positive="YES"))
      ##
      ## Specificity based on the joint diag test = 0.45656
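The simulated figures for the joint test can be checked in closed form. The sketch below is an illustration rather than part of the original solution. It assumes that CYFRA 21-1 and CEA are independent within each patient group, which is the assumption that part v. asks to be stated and which may not hold in practice since both markers respond to the same disease. It also computes the negative predictive value, the chance that a patient with a negative result is cancer-free, at a prevalence of 0.5; that prevalence is a hypothetical value chosen only because it matches the equal group sizes in the simulated sample.

```r
# Single-marker sensitivity and specificity from the stated normal models.
sens_cyfra <- 1 - pnorm(3.3, mean = 4.7, sd = 9.2)
spec_cyfra <- pnorm(3.3, mean = 1.6, sd = 4.3)
sens_cea   <- 1 - pnorm(5.0, mean = 5.9, sd = 19.8)
spec_cea   <- pnorm(5.0, mean = 2.2, sd = 5.3)

# Joint test: positive if at least one marker exceeds its cutoff.
# Assumption: the two markers are independent within each patient group.
sens_joint <- 1 - (1 - sens_cyfra) * (1 - sens_cea)  # misses only if both miss
spec_joint <- spec_cyfra * spec_cea                  # negative only if both negative

# Negative predictive value by Bayes' rule for a given prevalence.
npv <- function(prev, sens, spec) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

prev <- 0.5  # hypothetical prevalence in a high-risk group
npv_cyfra <- npv(prev, sens_cyfra, spec_cyfra)
npv_joint <- npv(prev, sens_joint, spec_joint)

cat("Joint sensitivity:", sens_joint, " Joint specificity:", spec_joint, "\n")
cat("NPV, CYFRA only:", npv_cyfra, " NPV, joint test:", npv_joint, "\n")
```

Under the independence assumption the closed-form values land close to the simulated 0.7882 and 0.45656, and the negative predictive value shows how much weight a negative result deserves at the assumed prevalence.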
  • 15. (vi) Compared with the single-marker tests, we observe an increase in sensitivity (0.79 versus 0.56 and 0.52) but a lower specificity (0.46 versus 0.65 and 0.70). This implies that the joint test is better at identifying people who have cancer, but at the same time performs worse at identifying people who do not have cancer.
    PROBLEM 2: ICE CREAM SALES
    Congratulations! You have inherited an ice cream truck for the summer of 2022. You will be selling ice cream from the truck every day that summer: 101 days in total, from Memorial Day weekend until Labor Day weekend. This problem will step you through projecting how much ice cream you expect to sell (as well as how much revenue you expect to make) during the summer.
    a) Ice cream sales are known to be very dependent on weather. Suppose that for any particular day, the weather is either rainy or sunny; on average, one-third of the days are rainy and two-thirds of the days are sunny. Compute the expected number of sunny days in summer 2022 and the probability of there being more rainy days than sunny days in summer 2022. You may assume that the weather is independent between days.
    b) It is more realistic to think that tomorrow’s weather depends on today’s weather (for all days throughout the summer). Suppose that if it is sunny today, there is an 80% chance that tomorrow will also be sunny; however, if it is rainy today, there is only a 30% chance
  • 16. that tomorrow will be sunny. Additionally, suppose that the weather on ‘Day 0’ (i.e., the day before your ice cream truck opens) is known to be sunny. Based on a simulation with 10,000 replicates, estimate the probability of there being more rainy days than sunny days in summer 2022. Write a brief paragraph outlining the logic of the simulation and clearly comment your code.
    The number of ice cream cones sold on a typical day depends on the weather. When it is sunny, the number of ice cream cones sold is approximately normally distributed with mean 200 and standard deviation 40. When it is rainy, the number of ice cream cones sold is approximately normally distributed with mean 120 and standard deviation 30. Use the round() function to round to the nearest whole number. You make $2.00 in profit for each ice cream cone sold and your fixed daily operating cost is $300 per day.
    c) Based on a simulation with 10,000 replicates, estimate the probability that you lose money on any one randomly selected day in the summer; write a brief paragraph outlining the logic of the simulation and clearly comment your code. Assume the weather follows the pattern described in part a).
    d) Based on a simulation with 10,000 replicates, estimate the probability that you lose money with your ice cream truck during the summer of 2022; write a brief paragraph outlining the logic of the simulation and clearly comment your code. Assume the weather follows the pattern described in part b).
  • 17. Solution (a) Whether a single day is sunny follows a Bernoulli distribution, and the number of sunny days in 101 days, \(X\), follows a Binomial distribution with success probability \(p = 2/3\) (the probability that a single day is sunny) and 101 trials: \(X \sim \mathrm{Binomial}(101, 2/3)\). Next we find the expected value of \(X\): \(E[X] = 101 \times 2/3 \approx 67.3\). The event that there are more sunny days than rainy days is the same as the event that there are at least 51 sunny days, so \(P(X \ge 51) = 1 - P(X \le 50)\); the probability of more rainy days than sunny days, which the problem asks for, is the complement \(P(X \le 50)\). We use the R statistical package to calculate this probability (it can also be expressed through the incomplete Beta function).
  • 18.
      1-pbinom(50,101,2/3)
      ## [1] 0.9997276
      ## equivalently
      sum(dbinom(51:101,101,2/3))
      ## [1] 0.9997276
    The probability of more rainy days than sunny days is therefore 1 - 0.9997276 = 0.0002724.
    (b) Considering the dependent model: we create an empty vector of 102 days (Day 0 first) and then at each step draw a Bernoulli sample with success probability 0.8 if the previous day is sunny and 0.3 if the previous day is rainy. We collect these samples and count the number of replicates in which there are at least 51 sunny days.
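The dependent-weather model just described can also be evaluated exactly, which gives a value to compare a simulated estimate against. The sketch below is an alternative to simulation and is not part of the original solution. It tracks, day by day, the probability of having seen `k` sunny days with the current day rainy or sunny, starting from a sunny Day 0. The transition probabilities come from the problem statement; the variable names and the matrix layout are illustrative choices.

```r
n_days <- 101
p_ss   <- 0.8  # P(sunny tomorrow | sunny today)
p_rs   <- 0.3  # P(sunny tomorrow | rainy today)

# prob[k + 1, w]: probability of k sunny days so far with current weather w
# (column 1 = rainy, column 2 = sunny).
prob <- matrix(0, nrow = n_days + 1, ncol = 2)
prob[1, 2] <- 1  # Day 0 is sunny and is not counted among the 101 days

for (d in seq_len(n_days)) {
  nxt <- matrix(0, nrow = n_days + 1, ncol = 2)
  for (k in 0:(d - 1)) {
    from_rainy <- prob[k + 1, 1]
    from_sunny <- prob[k + 1, 2]
    # Next day sunny: the sunny-day count increases by one.
    nxt[k + 2, 2] <- from_sunny * p_ss + from_rainy * p_rs
    # Next day rainy: the count stays the same.
    nxt[k + 1, 1] <- from_sunny * (1 - p_ss) + from_rainy * (1 - p_rs)
  }
  prob <- nxt
}

sunny_dist   <- rowSums(prob)          # P(k sunny days), k = 0, ..., 101
p_more_sunny <- sum(sunny_dist[52:102])  # at least 51 sunny days
p_more_rainy <- sum(sunny_dist[1:51])    # at most 50 sunny days

cat("P(more sunny days):", p_more_sunny,
    " P(more rainy days):", p_more_rainy, "\n")
```

With 101 days there can be no tie, so the two probabilities sum to one, and `p_more_rainy` is the quantity the problem asks for.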
  • 19.
      set.seed(2344)
      ## number of simulations
      N=10^4
      n_sun_sim=rep(NA,N)
      for(j in 1:N) {
        ## we create the empty vector of 102 days (0 day as first)
        days=rep(NA,102)
        days[1]=1
        for(i in 2:102) {
          days[i]=ifelse(days[i-1]==1,rbinom(1,1,0.8),rbinom(1,1,0.3))
        }
        n_sun_sim[j]=sum(days[-1])
      }
      cat("Probability of more sunny days=",sum(n_sun_sim>=51)/N)
  • 20.
      ## Probability of more sunny days= 0.886
    The estimated probability of more rainy days than sunny days is therefore 1 - 0.886 = 0.114.
    (c) We are looking for the probability that money is lost on a single day, which could be rainy or sunny. Given the profit of $2 per cone and the fixed daily cost of $300, this is the event of selling fewer than 150 cones: \(2Y - 300 < 0\), i.e. \(Y < 150\). The number of cones sold, \(Y\), is conditional on a Bernoulli variable that takes the value 1 if it is sunny, and follows a normal distribution given the weather.
      #N=10^4
      n_lose_sim=rep(NA,N)
      for(j in 1:N) {
        x_s=rbinom(1,1,2/3)
        n_lose_sim[j]=ifelse(x_s==1,rnorm(1,mean=200,sd=40),rnorm(1,mean=120,sd=30))
  • 21.
      }
      cat("Proportion of financially unsuccessful days=",sum(n_lose_sim<=149)/N)
      ## Proportion of financially unsuccessful days= 0.3479
    We can check the accuracy of this estimate using the law of total probability and R:
      pnorm(149,mean=200,sd=40)*2/3+pnorm(149,mean=120,sd=30)*1/3
      ## [1] 0.3451513
    (d) In order to calculate the probability of losing money over the whole summer of 2022, we extend the simulation.
      n_balance_sim=rep(NA,N)
      for(j in 1:N) {
        ## we create the empty vector of 102 days (0 day as first)