Name: Date: Block:
Lab 2.7: The Titanic Shuffle
Notes:
In the last lab, we learned that by using a do-loop and the resample function, we could simulate
shuffling our data many times. This helps us determine how likely it is that a difference between groups
is due to chance.
Data scientists usually consider any z-score larger than 3 or smaller than -3 to be extreme.
By extreme, we mean that such z-scores occur so rarely by chance alone that we start to believe that
something besides chance alone is causing the z-scores to be so large.
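For reference, a z-score measures how many standard deviations a value lies from the mean of a set of values. A sketch of the usual formula as it applies to this lab follows. The symbols are labels chosen here for illustration, not notation from the lab: d_obs is the actual difference in the data, d-bar is the mean of the shuffled differences, and s_d is their standard deviation.

\[
z = \frac{d_{\text{obs}} - \bar{d}}{s_d}
\]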
1. Look at the boxplot of fares paid separated by whether the passenger survived or not. Do you
believe richer passengers were more likely to survive? Why? Refer to the quartiles to support your
claim.
2. Calculate how much more the typical survivor paid versus the typical non-survivor in our data. Copy
the result here:
3. What was the typical fare paid by survivors? Non-survivors? How much more did the typical survivor
pay? Write the code you used to find these answers.
4. Based on the boxplots, why would using the median be a better measure of typical than the mean?
5. Use a do-loop and the resample function to shuffle the passenger's survival status 300 times and
compute each group's median fare paid. Save your shuffled data as a new object called shfl_med. Paste
your code here:
(hint: You can look in lab 2.6 for help with constructing the code.)
6. Type in the command head(shfl_med) and paste the result here:
7. Use the transform function to add a variable called Diff to the shuffled medians you just calculated.
Save these values as a new object called shfl_diff. Paste your code here:
(hint: You can look in lab 2.6 for help with constructing the code.)
8. Type in the command head(shfl_diff) and paste your result here:
9. Create a plot of the median differences in the fare paid for your randomized survivors and
non-survivors. Copy and paste your plot here:
10. What was the actual difference in the median fare paid by survivors and non-survivors in the data?
Based on your plot, do you think this difference is big? Why?
11. Compute the mean and standard deviation of your 300 randomized differences.
What code did you use to find the mean?
Paste the result here:
What code did you use to find the standard deviation?
Paste the result here:
12. Convert the actual difference you computed to a z-score using the mean and standard deviation
you just computed. Write down the formula you used and the z-score you got.
13. How many standard deviations away from the mean is the actual median difference in fares paid by
survivors and non-survivors?
14. Use the subset and nrow functions to compute the estimated probabilities of a z-score being larger
than 3, smaller than -3, and larger than 3 OR smaller than -3. Assign this subset the name
zscore_probability.
What code did you use?
Paste your result here:
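The shuffle-and-compare procedure in questions 5 through 14 can be sketched as follows. This is only an illustration under stated assumptions, not the lab's expected answer: it uses base R's replicate and sample in place of the lab's do-loop and resample function, and the passengers data frame, its column names, and the median_diff helper are all hypothetical stand-ins rather than the real Titanic data.

```r
# Hypothetical stand-in data: NOT the real Titanic data set.
set.seed(1)
passengers <- data.frame(
  survived = rep(c("yes", "no"), times = c(40, 60)),
  fare     = c(rexp(40, rate = 1 / 50), rexp(60, rate = 1 / 25))
)

# Hypothetical helper: median fare of survivors minus median fare of non-survivors.
median_diff <- function(status, fare) {
  median(fare[status == "yes"]) - median(fare[status == "no"])
}

# Difference observed in the (made-up) data.
actual_diff <- median_diff(passengers$survived, passengers$fare)

# Shuffle the survival labels 300 times. sample() with no size argument
# permutes the labels, which breaks any link between survival and fare.
shuffled_diffs <- replicate(
  300,
  median_diff(sample(passengers$survived), passengers$fare)
)

# z-score of the observed difference relative to the shuffled differences.
z <- (actual_diff - mean(shuffled_diffs)) / sd(shuffled_diffs)

# Estimated probability that a shuffled difference has a z-score beyond +/- 3.
shuffled_z <- (shuffled_diffs - mean(shuffled_diffs)) / sd(shuffled_diffs)
prob_extreme <- mean(abs(shuffled_z) > 3)
```

Printing z and prob_extreme shows how far the observed difference sits from what shuffling alone produces. Swapping median for mean inside median_diff gives the mean-based version of the same analysis.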
Redo your simulation and the analysis, BUT this time use the mean fare paid instead of the median
fare paid.
15. Use a do-loop and the resample function to shuffle the passenger's survival status 300 times and
compute each group's mean fare paid. Save your shuffled data as a new object called shfl_mean. Paste
your code here:
16. Type in the command head(shfl_mean) and paste the result here:
17. Use the transform function to add a variable called Diff to the shuffled means you just calculated.
Save these values as a new object called shfl_diff2. Paste your code here:
18. Type in the command head(shfl_diff2) and paste your result here:
19. Create a plot of the mean differences in the fare paid for your randomized survivors and
non-survivors. Copy and paste your plot here:
20. What was the actual difference in the mean fare paid by survivors and non-survivors in the data?
Based on your plot, do you think this difference is big? Why?
21. Compute the standard deviation of your 300 randomized differences.
What code did you use to find the standard deviation?
Paste the result here:
22. Convert the actual difference you computed to a z-score using the real mean and standard
deviation. Write down the formula you used and the z-score you got.
23. How many standard deviations away from the mean is the actual mean difference in fares paid by
survivors and non-survivors?
24. Does your conclusion change depending on the method you use to describe the typical fare paid by
survivors and non-survivors? Explain.
25. If a journalist walked up to you right now and asked if the amount of fare paid for a Titanic ticket
had an effect on a person's probability of surviving, what would you say? How would you justify your
answer?
