Quantitative Methods for Lawyers - Class #15 - R Boot Camp - Part 2 - Professor Daniel Martin Katz

Published in: Law, Technology, Education

  1. 1. Quantitative Methods for Lawyers: Statistical Tests Using R. R Boot Camp - Part 2, Class #15. Professor Daniel Martin Katz. @computational, computationallegalstudies.com, danielmartinkatz.com, lexpredict.com, slideshare.net/DanielKatz
  2. 2. My Challenge to You Use R to Download and Clean this Simple DataSet
  3. 3. https://s3.amazonaws.com/KatzCloud/Bloom.csv Load into R and Do Some Basic Calculations
  4. 4. Load into R and Do Some Basic Calculations # Just Remember that Older Versions of R Do Not Handle https://
  5. 5. Load into R and Do Some Basic Calculations # Just Remember that Older Versions of R Do Not Handle https:// # Okay We Are Now Loaded
  6. 6. # Here is the Data - It Looks Okay
  7. 7. # Here is the Data - It Looks Okay # Here is the Problem - Our Data are Factors not Numeric
  8. 8. # Here is the Data - It Looks Okay # Here is the Problem - Our Data are Factors not Numeric # Thus we get this when trying to calculate a mean
  9. 9. # Here is the Data - It Looks Okay # Here is the Problem - Our Data are Factors not Numeric # Thus we get this when trying to calculate a mean We have two problems: (1) our data are non-numeric and (2) the values contain commas
  10. 10. # Here is the Data - It Looks Okay # Here is the Problem - Our Data are Factors not Numeric # Thus we get this when trying to calculate a mean We have two problems: (1) our data are non-numeric and (2) the values contain commas # Okay This Is What We Need
  11. 11. Okay, Let's Do Some Basic Calculations
  12. 12. Okay, Let's Do Some Basic Calculations # Okay We Are All Set
  13. 13. Okay, Let's See How to Use R to Run Some of the Calculations That We Have Already Seen in This Class
  14. 14. The Binomial Distribution
  15. 15. Binomial Distribution “A binomial experiment (also known as a Bernoulli trial) is a statistical experiment that has the following properties: The experiment consists of n repeated trials. Each trial can result in just two possible outcomes. The probability of success, denoted by P, is the same on every trial. The trials are independent”
  16. 16. Example: Coin Flip Nostradamus. Predicting Coin Flips - Does Your Friend Have the General Ability to Actually Predict Coin Flips? How Would You Evaluate This Proposition? How Many Predictions Would Your Friend Have to Get Right For You To Believe They Actually Have Real Ability?
  17. 17. Example: Coin Flip Nostradamus. H0: Cannot Actually Predict Coin Flips. H1: Can Actually Predict Coin Flips (i.e., do so at a rate greater than chance). H0 is the Null Hypothesis. H1 is the Alternative Hypothesis.
  18. 18. Reject the Null versus Failing to Reject the Null. If We Fail to Reject the Null, we are left with the assumption of no relationship. In the Coin Flip Example, We might have enough evidence to reject the null. Remember the default (null) is that there is no relationship, Although a Relationship might actually exist.
  19. 19. Example: Coin Flip Nostradamus. If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses? x (number of successes) = 3 or 4; n (number of trials) = 4; p (probability of success) = 1/2
  20. 20. Example: Coin Flip Nostradamus. If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses? x (number of successes) = 3 or 4; n (number of trials) = 4; p (probability of success) = 1/2
  21. 21. Example: Coin Flip Nostradamus If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses ? #Here We Get Only For X=3
  22. 22. Example: Coin Flip Nostradamus If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses ? #Here We Get Only For X=3 #Now We Get a Vector if X=3, X=4
  23. 23. Example: Coin Flip Nostradamus If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses ? #Here We Get Only For X=3 #Now We Get The Sum of X=3, X=4 #Now We Get a Vector if X=3, X=4
  24. 24. Does 30 heads in 50 flips imply an unfair coin?
  25. 25. Does 30 heads in 50 flips imply an unfair coin?
  26. 26. Does 30 heads in 50 flips imply an unfair coin? Assuming a Fair Coin - what is the 95% Conf. Interval for 50 flips?
  27. 27. Does 30 heads in 50 flips imply an unfair coin? Assuming a Fair Coin - what is the 95% Conf. Interval for 50 flips?
  28. 28. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question.
  29. 29. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question. Using random guessing, what is the probability of getting exactly 7 questions correct?
  30. 30. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question. Using random guessing, what is the probability of getting exactly 7 questions correct? x (number of successes) = 7; n (number of trials) = 15; p (probability of success) = 1/5
  31. 31. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question. Using random guessing, what is the probability of getting exactly 7 questions correct? x (number of successes) = 7; n (number of trials) = 15; p (probability of success) = 1/5
  32. 32. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question. Using random guessing, what is the probability of getting exactly 7 questions correct? x (number of successes) = 7; n (number of trials) = 15; p (probability of success) = 1/5. This is the exact probability for 7. But What About 7 or Greater?
  33. 33. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question. Using random guessing, what is the probability of getting 7 or more questions correct? This is our prior answer. Here we are summing 7:15
  34. 34. Normal Distribution
  35. 35. Imagine that a population of students take a test with an average score of 78 and a standard deviation of 9. Assuming the test scores are normally distributed, how many students received a 90 or higher?
  36. 36. Imagine that a population of 100 students take a test with an average score of 78 and a standard deviation of 9. Assuming the test scores are normally distributed, how many students received a 90 or higher?
  37. 37. Imagine that a population of 100 students take a test with an average score of 78 and a standard deviation of 9. Assuming the test scores are normally distributed, how many students received a 90 or higher? This is the Syntax: pnorm(q, mean = , sd = , lower.tail = TRUE)
  38. 38. Imagine that a population of 100 students take a test with an average score of 78 and a standard deviation of 9. Assuming the test scores are normally distributed, how many students received a 90 or higher? This is the Syntax: pnorm(q, mean = , sd = , lower.tail = TRUE) What We Want: pnorm(90, mean = 78, sd = 9, lower.tail = FALSE) because we want the upper tail
  39. 39. Imagine that a population of 100 students take a test with an average score of 78 and a standard deviation of 9. Assuming the test scores are normally distributed, how many students received a 90 or higher? This is the Syntax: pnorm(q, mean = , sd = , lower.tail = TRUE) What We Want: pnorm(90, mean = 78, sd = 9, lower.tail = FALSE) because we want the upper tail
  40. 40. In the 2011-2012 year the national average on the LSAT was 150.66 with a Standard Deviation of 10.19. Assuming those scores are normally distributed, what percentage of test takers scored 160 or above? Source: http://www.lsac.org/docs/default-source/research-%28lsac-resources%29/tr-12-03.pdf (Table 1 on Page 9)
  41. 41. In the 2011-2012 year the national average on the LSAT was 150.66 with a Standard Deviation of 10.19. Assuming those scores are normally distributed, what percentage of test takers scored 160 or above? Source: http://www.lsac.org/docs/default-source/research-%28lsac-resources%29/tr-12-03.pdf (Table 1 on Page 9)
  42. 42. Hypothesis Testing
  43. 43. Are Damage Awards in Bloom County Excessive? Awards in Rest of State (N = 21): 235,000; 175,000; 750,000; 230,000; 450,000; 150,000; 1,000,060; 910,000; 150,000; 220,000; 130,000; 170,000; 234,000; 450,000; 890,000; 101,000; 120,000; 560,000; 321,000; 456,000; 102,000. Awards in Bloom County (N = 25): 30,000; 793,000; 250,900; 862,000; 673,000; 463,000; 54,000; 39,000; 687,000; 260,800; 682,000; 3,514,000; 67,000; 356,000; 13,000; 42,000; 4,000; 402,000; 943,000; 961,600; 630,000; 398,800; 52,000; 976,500; 540,000. H0: There is No Difference Between the Mean Damage Award in Bloom County and the Mean Damage Award in the Rest of the State. This is a Two Sample Problem.
  44. 44. H0: There is No Difference Between the Mean Damage Award in Bloom County and the Mean Damage Award in the Rest of the State. GROUP 1, Rest of State: Num of Obs. = 21, Mean = $371,621, Std. Dev. = $289,823. GROUP 2, Bloom County: Num of Obs. = 25, Mean = $547,784, Std. Dev. = $703,314.
  45. 45. Given the sample size we might consider a T-Test
  46. 46. But We Need To Check For Normality
  47. 47. Okay, this means we need to use a non-parametric test. Good News is that One is Available in R
  48. 48. This is tricky because we do not have equal-sized samples. Also the Sample is Pretty Small. As reported on Page 274 of Lawless et al.
  49. 49. Chi Squared Test
  50. 50. Not Research Asst: Male 319, Female 323, Total 642. Research Assistant: Male 60, Female 34, Total 94. Total: Male 379, Female 357, Total 736. RAs Hired at a School are mostly Men: 60 out of 94 RAs are Men (See Above). Could this just be chance or is it too large to be explained by chance? Chi Square (χ²) Statistic
  51. 51. Not Research Asst: Male 319, Female 323, Total 642. Research Assistant: Male 60, Female 34, Total 94. Total: Male 379, Female 357, Total 736.
  52. 52. F Statistic F Test
  53. 53. Attorneys Fees in Chapter 7 BK’s in Texas Districts (From Lawless et al.). Districts: Northern District, Western District, Southern District, Eastern District. Fees: 2500 1500 1500 1300 2000 1500 1500 2000 2000 1500 1400 1500 2000 800 3000 1500 2000 2000 1700 2500 2299 1900 2050 2101 1160 2101 1300 3500 900 995 1299 1900 995 771 1250 900 749 1200 950 1200 995 1300 1600 1601 1000 1371 2400 1500 1325 1500 1799 2780 1800 1399 2225 1700 3800 2299 1800 1450 1500 1000 1500 1799 1600 1600 2000 2500 1200 2500 2000 1500. Data: https://s3.amazonaws.com/KatzCloud/AttyFeesTexasBKDist.csv
  54. 54. Daniel Martin Katz, @computational, computationallegalstudies.com, lexpredict.com, danielmartinkatz.com, Illinois Tech - Chicago-Kent College of Law
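
Slides 3-12 load Bloom.csv and then turn comma-formatted factor values into numbers, but the code itself sat on slide images that the transcript does not capture. What follows is only a minimal sketch of that cleaning step, not the author's code. It assumes a single column named Award (a hypothetical name, since the real header is not shown) and uses three values from slide 43 as inline text instead of the download.

```r
# Hypothetical stand-in for the downloaded file. The column name "Award" is an
# assumption; the real header of Bloom.csv is not visible in the transcript.
# With a live file one would pass its URL to read.csv() instead of text = .
csv_text <- 'Award
"235,000"
"175,000"
"750,000"'

awards <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(awards$Award)   # a factor, so mean(awards$Award) would return NA with a warning

# Fix both problems: factor -> character, drop the commas, then -> numeric
clean_award <- as.numeric(gsub(",", "", as.character(awards$Award), fixed = TRUE))
mean(clean_award)
```

Converting through as.character() first matters: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values.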
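
Slides 19-23 describe three results for the Coin Flip Nostradamus example (the probability for X=3 only, a vector for X=3 and X=4, and their sum) without showing the calls in the transcript. A sketch using base R's dbinom() with x = 3 or 4, n = 4, p = 1/2 from slide 19 might look like this; the variable names are illustrative.

```r
n_trials  <- 4
p_success <- 0.5

# Probability of exactly 3 correct predictions
p_three <- dbinom(3, size = n_trials, prob = p_success)          # 0.25

# Vector of probabilities for X = 3 and X = 4
p_each <- dbinom(3:4, size = n_trials, prob = p_success)         # 0.25 0.0625

# Probability of at least 3 of 4: sum over X = 3, 4
p_at_least_three <- sum(p_each)                                  # 0.3125

# Same upper-tail probability from the cumulative distribution, P(X > 2)
p_upper <- pbinom(2, size = n_trials, prob = p_success, lower.tail = FALSE)
```

A guesser gets at least 3 of 4 right almost a third of the time, so 3 of 4 is weak evidence against the null hypothesis on slide 17.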
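
Slides 24-27 ask whether 30 heads in 50 flips implies an unfair coin and refer to a 95% interval under a fair coin. The transcript does not show which function the slides used, so the sketch below is one possible reading: qbinom() for the central 95% range of head counts, with binom.test() as a cross-check.

```r
flips <- 50
heads <- 30

# Central 95% range of head counts for a fair coin (an assumed reading of
# the slide's "95% Conf. Interval")
fair_range <- qbinom(c(0.025, 0.975), size = flips, prob = 0.5)
fair_range                                   # 18 32

# Is the observed count inside that range?
inside <- heads >= fair_range[1] && heads <= fair_range[2]

# Exact two-sided binomial test of H0: p = 0.5
p_value <- binom.test(heads, flips, p = 0.5)$p.value
```

Here 30 falls inside the range, so on this reading the result does not give grounds to reject the hypothesis of a fair coin.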
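
Slides 28-33 work the multiple choice example with x = 7, n = 15, p = 1/5 and mention "summing 7:15". As a hedged sketch of those two calculations (the exact calls on the slide images are not in the transcript):

```r
n_questions <- 15
p_guess     <- 1 / 5

# Probability of exactly 7 correct answers by random guessing
p_exactly_7 <- dbinom(7, size = n_questions, prob = p_guess)

# Probability of 7 or more correct: sum the probabilities for 7, 8, ..., 15
p_7_or_more <- sum(dbinom(7:15, size = n_questions, prob = p_guess))

# Equivalent upper-tail form, P(X > 6)
p_upper <- pbinom(6, size = n_questions, prob = p_guess, lower.tail = FALSE)
```

Note that pbinom(..., lower.tail = FALSE) gives P(X > q), so the cutoff for "7 or more" is 6, not 7.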
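
Slides 35-41 use pnorm() with lower.tail = FALSE for the test score question and pose the same kind of question for the LSAT. The sketch below repeats the slide's call and applies the same pattern to the LSAT figures from slide 40; the variable names are illustrative.

```r
# Share of test scores at 90 or higher (mean 78, sd 9), as on slide 38
share_90_plus <- pnorm(90, mean = 78, sd = 9, lower.tail = FALSE)

# Expected number of students out of 100
students_90_plus <- 100 * share_90_plus      # roughly 9 students

# Share of LSAT takers at 160 or above (mean 150.66, sd 10.19)
share_lsat_160 <- pnorm(160, mean = 150.66, sd = 10.19, lower.tail = FALSE)
percent_lsat_160 <- 100 * share_lsat_160     # roughly 18 percent
```

Both answers lean on the normality assumption stated in the slides.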
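
Slides 43-48 compare damage awards in Bloom County with the rest of the state, check normality, and then turn to a non-parametric test. The transcript does not name the tests on the slide images, so the Shapiro-Wilk and Wilcoxon rank-sum tests below are assumptions, offered as one common way to do it in base R. The values come from slide 43; treating the first 21 as Rest of State and the remaining 25 as Bloom County reproduces the means on slide 44.

```r
rest_of_state <- c(235000, 175000, 750000, 230000, 450000, 150000, 1000060,
                   910000, 150000, 220000, 130000, 170000, 234000, 450000,
                   890000, 101000, 120000, 560000, 321000, 456000, 102000)
bloom_county  <- c(30000, 793000, 250900, 862000, 673000, 463000, 54000,
                   39000, 687000, 260800, 682000, 3514000, 67000, 356000,
                   13000, 42000, 4000, 402000, 943000, 961600, 630000,
                   398800, 52000, 976500, 540000)

# Group summaries to compare with slide 44
c(mean = mean(rest_of_state), sd = sd(rest_of_state))
c(mean = mean(bloom_county),  sd = sd(bloom_county))

# Normality check (assumed choice: Shapiro-Wilk)
normality_rest  <- shapiro.test(rest_of_state)
normality_bloom <- shapiro.test(bloom_county)

# Non-parametric two-sample comparison (assumed choice: Wilcoxon rank-sum).
# exact = FALSE because the data contain ties.
rank_test <- wilcox.test(bloom_county, rest_of_state, exact = FALSE)
rank_test$p.value
```

The rank-sum test does not require equal group sizes, which suits the 21 versus 25 split.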
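
Slides 50-51 give a 2 x 2 table of research assistant hiring by sex and ask whether the imbalance could be chance. A hedged sketch with base R's chisq.test() follows. Whether the slides applied the continuity correction is not shown, so correct = FALSE here is an assumption.

```r
# Counts from slide 50 (margins are recomputed below rather than typed in)
hiring <- matrix(c(319, 323,
                   60,  34),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Role = c("Not Research Asst", "Research Assistant"),
                                 Sex  = c("Male", "Female")))

addmargins(hiring)          # row, column, and grand totals

# Pearson chi-squared test of independence, without continuity correction
chi_result <- chisq.test(hiring, correct = FALSE)
chi_result$statistic
chi_result$p.value
chi_result$expected         # counts expected if role and sex were independent
```

Comparing chi_result$expected with the observed counts shows where the table departs from independence.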
