Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Professor Daniel Martin Katz


  1. 1. Quantitative Methods for Lawyers, Class #14, R Boot Camp - Part 1: Loading Datasets, Exploring Data in R. Professor Daniel Martin Katz, @computational, computationallegalstudies.com, danielmartinkatz.com, lexpredict.com, slideshare.net/DanielKatz
  2. 2. A Place to Get Familiar with the Language
  3. 3. Can You Earn All 7 Badges? http://tryr.codeschool.com/
  4. 4. The Cheat Sheet
  5. 5. Download It, Print It and Keep It with You When You Are Working. This can be extremely helpful: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
  6. 6. Let Me Start By Flagging Some Additional Resources that are Available to Learn R
  7. 7. http://www.ats.ucla.edu/stat/r/
  8. 8. Sign Up Here: https://www.coursera.org/course/compdata Videos Are Here: http://www.youtube.com/watch?v=EiKxy5IecUw&list=PL7Tw2kQ2edvpNEGrU0cGKwmdDRKc5A6C4
  9. 9. http://www.r-bloggers.com/ That is Me :) Wearing Google Glasses
  10. 10. http://www.programmingr.com/
  11. 11. http://www.statmethods.net/
  12. 12. http://www.stat.yale.edu/~jay/JSM2012/PDFs/intro.pdf
  13. 13. http://cran.r-project.org/web/packages/IPSUR/vignettes/IPSUR.pdf A 412 Page Book on Probability and Statistics Using R
  14. 14. As You Learn More, Take a Look at the Style Guide Produced By Google: http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
  15. 15. Setting Your Working Directory in R
  16. 16. Initially we need to make sure we understand what directory / folder R is using
  17. 17. We use getwd() in order to determine the current working directory
  18. 18. We use getwd() in order to determine the current working directory. This is a Mac file path, and it shows that I am in my Users/katzd folder
  19. 19. Within this folder is my Desktop, so let's go there
  20. 20. Within this folder is my Desktop, so let's go there. I have now set my working directory to the Desktop
  21. 21. Actually, I want to point this to my R folder, which is located on my Desktop. If you retype the command, add a slash and hit tab, a menu will pop up and you can find the “R” folder. The tab key is very helpful, as it can be used to figure out how to complete lots of arguments in R
  22. 22. So now I have my directory set up properly
  23. 23. Loading a Dataset(s) into R
  24. 24. To Get Started You Need to Be Able to Load a Dataset(s) into R
  25. 25. In general, your dataset(s) is going to be located either on your computer or online
  26. 26. This is Calvin Johnson of the Detroit Lions. I have made available to you a simple dataset featuring game-by-game statistics for each game in Calvin's professional career. It is located on this website: http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv
  27. 27. How Do I Load Datasets from the Internet ?
  28. 28. There are various file formats in which your data may be located
  29. 29. Subject to limitations such as terms of service, it is quite possible to turn anything online into your dataset
  30. 30. http://computationallegalstudies.com/2009/07/01/how-python-can-turn-the-internet-into-your-dataset-part-1/
  31. 31. We will focus upon loading the most common dataset formats: .dta, .csv, .xls
  32. 32. This is Calvin Johnson of the Detroit Lions. The dataset is located on this website: http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv Note the file extension of .csv. As you learned while getting your 7 badges, we will need to assign the dataset a name once we load it into R
  33. 33. Here are all of the default settings including header=TRUE
  34. 34. Here we read in a .CSV file using the full URL. The <- operator is used to assign an object. Here I have given the dataset the name calvin_game_data
  35. 35. If you have downloaded the .CSV file locally to your machine, then make sure that the path is set to the location where the dataset currently resides
  36. 36. Type this and then hit tab when your cursor is between the quotes (this will bring up all files within the current working directory ... in this case all files on my desktop)
  37. 37. I then select the Calvin_Test_Data.csv
  38. 38. Getting Rid of the NA’s. If you want to view the data in spreadsheet form: View(calvin_game_data) Notice that we have an issue with extra rows full of NA's. We need to generate a clean version of the data without those rows of missing values. There are several ways to fix this, but here is one way: calvin_games_data <- calvin_game_data[complete.cases(calvin_game_data), ] (Note: I will explain this syntax on the next slide)
  39. 39. Getting Rid of the NA’s. Let's take this apart. The complete.cases command creates a logical vector specifying which observations/rows have no missing values in any column. To observe this, try running the following: complete.cases(calvin_game_data) You will see that each row gets a TRUE/FALSE value: TRUE when the row has no NA values and FALSE when it has at least one. In the full command we are creating a new dataset called "calvin_games_data". The syntax on the right, in plain language, takes calvin_game_data and keeps only the complete cases using the row, column logic of square brackets: dataset[rows, columns]. Notice that the rows position holds complete.cases(calvin_game_data) and that the columns position after the comma is left blank. A blank columns position keeps every column, so each complete row is kept whole. calvin_games_data <- calvin_game_data[complete.cases(calvin_game_data), ]
  40. 40. The head() command will give you the first few rows, but notice that the row numbering is still off
  41. 41. The NULL here will reset the row numbers
  42. 42. Learning Some of the Syntax
  43. 43. Some Basic Commands. What are the fewest yards Calvin has had in a game? What are the most touchdowns Calvin has had in a game?
  44. 44. Some Basic Commands. What are the fewest yards Calvin has had in a game? What are the most touchdowns Calvin has had in a game? min() selects the smallest value and max() selects the largest value. For both, the syntax is Dataset$Variable, where the dollar sign selects the column
  45. 45. Some Basic Commands How Many Touchdowns has Calvin had in his career? What are the respective quantiles of Calvin’s Yards Per Game?
  46. 46. Some Basic Commands How Many Touchdowns has Calvin had in his career? What are the respective quantiles of Calvin’s Yards Per Game?
  47. 47. Some Basic Commands Across his Career what are Calvin’s average yards per game? What is the Standard Deviation of those Yards?
  48. 48. Some Basic Commands Across his Career what are Calvin’s average yards per game? What is the Standard Deviation of those Yards?
  49. 49. Some Basic Commands What About the Skewness and Kurtosis of those Yards?
  50. 50. Some Basic Commands. If you want a high-level perspective on your variables, try the summary() command
  51. 51. Plotting Data: Let's Plot Calvin’s Yards Per Game
  52. 52. Plotting Data: Let's Plot Calvin’s Yards Per Game. Notice the Default Bin Widths, Labels & Style of the Histogram
  53. 53. Getting Help
  54. 54. Getting Help
  55. 55. Getting Help
  56. 56. Getting Help
  57. 57. Plotting Data: Let's Plot Calvin’s Yards Per Game
  58. 58. Box and Whisker Plot. Earlier in the Course We Saw This Data ...
      New York City: 31.5 33.6 42.4 52.5 62.7 71.6 76.8 75.5 68.2 57.5 47.6 36.6
      Houston: 50.4 53.9 60.6 68.3 74.5 80.4 82.6 82.3 78.2 69.6 61 53.5
      San Francisco: 48.7 52.2 53.3 55.6 58.1 61.5 62.7 63.7 64.5 61 54.8 49.4
      Load the Data from Here: http://s3.amazonaws.com/KatzCloud/AvgTemp.csv
  59. 59. Load the Data from My Cloud. Take a Peek at the Results
  60. 60. Okay so this is not exactly a great looking plot
  61. 61. Notice here how I passed the vector of names
  62. 62. In the RStudio Plots Window, Use the Copy to Clipboard Option, Then Scale the Plot So that the Y Axis is Larger
  63. 63. The Final Product
  64. 64. More To Come in Part 2 of BootCamp
  65. 65. Daniel Martin Katz, @computational, computationallegalstudies.com, lexpredict.com, danielmartinkatz.com, Illinois Tech - Chicago-Kent College of Law
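
Slides 16 to 22 walk through checking and changing the working directory. The steps can be sketched roughly as follows. The temporary folder is only a stand-in (an assumption, so the snippet runs on any machine) for the Desktop "R" folder shown on the slides, and the variable names are hypothetical.

```r
# Ask R which directory it is currently using.
original_dir <- getwd()
print(original_dir)

# Hypothetical stand-in for the "R" folder on the Desktop from the slides:
# a throwaway folder inside R's temporary directory.
r_folder <- file.path(tempdir(), "R")
dir.create(r_folder, showWarnings = FALSE)

# Point the working directory at that folder, then confirm the change.
setwd(r_folder)
print(getwd())

# List the files R can now see without typing a full path.
print(list.files())

# Return to where we started so later code is unaffected.
setwd(original_dir)
```

On your own machine you would pass your real folder path to setwd() instead of the temporary one.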
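
Slides 32 to 37 describe reading a .csv file and assigning it a name. Here is a minimal sketch of that pattern. So that it runs offline it first writes a tiny toy file. The column names (Game, Yards, Touchdowns) and every value are invented placeholders, not the contents of Calvin_Test_Data.csv, whose actual columns are not shown in this transcript.

```r
# Toy stand-in for the real file so the example runs without a connection.
# Column names and values are invented placeholders, NOT real statistics.
csv_path <- file.path(tempdir(), "toy_game_data.csv")
writeLines(c("Game,Yards,Touchdowns",
             "1,70,1",
             "2,45,0",
             "3,110,2"),
           csv_path)

# read.csv() treats the first line as column names by default (header = TRUE).
# The <- operator assigns the resulting data frame to a name.
toy_game_data <- read.csv(csv_path, header = TRUE)

# Inspect the structure and the first rows.
str(toy_game_data)
print(head(toy_game_data))

# Loading straight from the web is the same call with the URL from the slides
# (requires a connection and the file still being hosted there):
# calvin_game_data <- read.csv("http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv")
```

The same read.csv() call accepts either a local path or a URL, which is why the slides can switch between the two without changing anything else.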
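
Slides 38 to 41 remove rows of NA values and then reset the row numbers. The sketch below shows one way this could look on a toy data frame. The data and column names are invented for illustration, and resetting with rownames(...) <- NULL is an assumption about what the slide's screenshot shows.

```r
# Invented toy data with two all-NA rows, mimicking the problem on the slides.
toy_game_data <- data.frame(
  Yards      = c(70, NA, 45, 110, NA),
  Touchdowns = c(1,  NA, 0,  2,   NA)
)

# complete.cases() returns one TRUE/FALSE per row:
# TRUE when the row has no missing values in any column.
keep <- complete.cases(toy_game_data)
print(keep)

# Square brackets take [rows, columns]. The blank after the comma
# means "keep every column" for the rows we selected.
toy_games_clean <- toy_game_data[keep, ]
print(toy_games_clean)   # row names still reflect the original positions

# Assigning NULL to the row names renumbers the rows from 1.
rownames(toy_games_clean) <- NULL
print(toy_games_clean)
```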
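
Slides 43 to 50 ask a series of questions that map onto basic summary functions. The transcript does not capture the commands themselves, so this is only a sketch of how they could be answered with Dataset$Variable syntax. The data frame, its column names (Yards, Touchdowns) and its values are invented placeholders.

```r
# Invented toy data; not Calvin Johnson's actual statistics.
toy_game_data <- data.frame(
  Yards      = c(70, 45, 110, 0, 95),
  Touchdowns = c(1, 0, 2, 0, 1)
)

# Fewest yards in a game and most touchdowns in a game.
fewest_yards    <- min(toy_game_data$Yards)
most_touchdowns <- max(toy_game_data$Touchdowns)

# Career touchdowns: add up the column.
career_touchdowns <- sum(toy_game_data$Touchdowns)

# Quantiles (0%, 25%, 50%, 75%, 100% by default), mean and standard deviation.
yard_quantiles <- quantile(toy_game_data$Yards)
mean_yards     <- mean(toy_game_data$Yards)
sd_yards       <- sd(toy_game_data$Yards)

print(fewest_yards)
print(most_touchdowns)
print(career_touchdowns)
print(yard_quantiles)
print(mean_yards)
print(sd_yards)

# A high-level overview of every column at once.
print(summary(toy_game_data))
```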
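
Slide 49 asks about skewness and kurtosis. Base R has no built-in function for either, and the transcript does not say which package the slide used. As a sketch under that caveat, both can be computed from central moments. The helper names below are hypothetical, and add-on packages such as moments provide ready-made skewness() and kurtosis() functions.

```r
# k-th central moment: the average of (x - mean)^k.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness: third central moment scaled by variance^(3/2).
skewness_of <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)

# Moment-based kurtosis: fourth central moment scaled by variance^2.
# (A normal distribution scores 3 on this scale.)
kurtosis_of <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# Invented toy yards-per-game values, for illustration only.
toy_yards <- c(70, 45, 110, 0, 95)
print(skewness_of(toy_yards))
print(kurtosis_of(toy_yards))
```

Other definitions apply small-sample corrections, so results from a package may differ slightly from these formulas.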
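
Slides 51 to 57 plot a histogram, look at its defaults and then consult the help pages. A minimal sketch of that workflow follows. The yards vector is invented toy data, and the custom breaks, title and axis label are illustrative choices rather than the ones on the slides.

```r
# When run as a script rather than interactively, draw to a null device
# so that no plot file is written.
if (!interactive()) pdf(NULL)

# Invented toy yards-per-game values.
toy_yards <- c(70, 45, 110, 0, 95, 130, 60, 85)

# Default histogram: R picks the bin widths, title and axis labels.
h_default <- hist(toy_yards)

# The same data with explicit bins of width 25 and custom labels.
h_custom <- hist(toy_yards,
                 breaks = seq(0, 150, by = 25),
                 main   = "Yards Per Game (toy data)",
                 xlab   = "Yards")

# hist() also returns the bin edges and counts it used.
print(h_custom$breaks)
print(h_custom$counts)

# Getting help in an interactive session:
#   ?hist            # open the help page for hist
#   help("hist")     # same thing, written as a function call
#   example(hist)    # run the examples from the help page
```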
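
Slides 58 to 63 build a box and whisker plot from the temperature data and pass a vector of names for the labels. The sketch below types in the values from slide 58 directly instead of downloading AvgTemp.csv. The one-column-per-city layout, the object names and the axis label are assumptions, since the transcript does not show the file's structure.

```r
# When run as a script rather than interactively, draw to a null device.
if (!interactive()) pdf(NULL)

# Values copied from slide 58 (12 per city).
nyc     <- c(31.5, 33.6, 42.4, 52.5, 62.7, 71.6, 76.8, 75.5, 68.2, 57.5, 47.6, 36.6)
houston <- c(50.4, 53.9, 60.6, 68.3, 74.5, 80.4, 82.6, 82.3, 78.2, 69.6, 61, 53.5)
sf      <- c(48.7, 52.2, 53.3, 55.6, 58.1, 61.5, 62.7, 63.7, 64.5, 61, 54.8, 49.4)

# Assumed layout: one column per city.
avg_temp <- data.frame(nyc, houston, sf)

# Default plot: boxes are labelled with the raw column names.
boxplot(avg_temp)

# Passing a vector of names gives readable labels under each box.
bp <- boxplot(avg_temp,
              names = c("New York City", "Houston", "San Francisco"),
              ylab  = "Average temperature",
              main  = "Average Temperature by City")

# The five-number summary behind each box, one column per city.
print(bp$stats)
```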
