Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Professor Daniel Martin Katz
1. Quantitative Methods for Lawyers
Exploring Data in R
Loading Datasets
R Boot Camp - Part 1
Class #14
@computational
computationallegalstudies.com
Professor Daniel Martin Katz
danielmartinkatz.com
lexpredict.com
slideshare.net/DanielKatz
20. Within this folder is my Desktop, so let's go there.
I have now set my working directory to the Desktop.
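The command itself appears only in the slide's screenshot, so the following is a minimal sketch of how a working directory is typically set and checked in R. It assumes setwd() and getwd() are the functions in use, and tempdir() is a hypothetical stand-in for a real path such as the Desktop.

```r
# Hypothetical target: tempdir() stands in for a path such as "~/Desktop".
target_dir <- tempdir()

# setwd() changes the working directory and invisibly returns the previous one.
previous_dir <- setwd(target_dir)

# getwd() reports where R is now pointed.
current_dir <- getwd()
print(current_dir)

# Restore the original working directory when finished.
setwd(previous_dir)
```

Saving the value returned by setwd() makes it easy to switch back afterwards.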
21. Actually, I want to point this to my R folder, which is located on my Desktop.
If you retype the command, add a slash, and hit Tab, this menu will pop up and you can find the “R” folder.
Tab is very helpful because it can be used to figure out how to complete lots of arguments in R.
26. This is Calvin Johnson of the Detroit Lions.
I have made available to you a simple dataset featuring game-by-game statistics for each game in Calvin's professional career.
It is located on this website:
http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv
31. We will focus upon loading the most common dataset formats:
.dta, .csv, .xls
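As a rough orientation for these three formats, here is a sketch that round-trips a tiny table through .csv and, where available, .dta. The table and its column names are hypothetical, and the choice of the 'foreign' package for Stata files is an assumption; the slides may use a different reader.

```r
# Hypothetical three-row table used only to demonstrate the round trip.
games <- data.frame(game = 1:3, yards = c(70L, 115L, 42L))

# .csv: write.csv() and read.csv() ship with base R.
csv_path <- tempfile(fileext = ".csv")
write.csv(games, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# .dta (Stata): the 'foreign' package is one option, if it is installed.
from_dta <- NULL
if (requireNamespace("foreign", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")
  foreign::write.dta(games, dta_path)
  from_dta <- foreign::read.dta(dta_path)
}

# .xls: add-on packages such as 'readxl' provide read_excel(); not run here.
print(from_csv)
```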
32. This is Calvin Johnson of the Detroit Lions.
The dataset is located on this website:
http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv
Note the file extension of .csv.
As you learned while getting your 7 badges, we will need to assign the dataset a name once we load it into R.
33. Here are all of the default settings, including header=TRUE
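The defaults are shown in the slide's screenshot rather than in the text. Assuming the function in question is read.csv(), the same information can be pulled up from the console with a short sketch like this:

```r
# args() prints the function signature with its default values.
args(read.csv)

# formals() returns the defaults as a list, so one can be inspected directly.
header_default <- formals(read.csv)$header
print(header_default)
```

With header=TRUE, the first line of the file is treated as column names rather than data.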
34. Here we read in a .csv file using the full URL.
<-
This is used to assign an object.
Here I have given the dataset the name calvin_game_data.
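The exact call is in the screenshot, so this is a sketch of what reading and assigning could look like, assuming read.csv() is the reader. The URL line is commented out because it needs network access, and the inline text is a hypothetical stand-in whose column names are not taken from the real file.

```r
# With network access, the slide's approach would look like this:
# calvin_game_data <- read.csv("http://s3.amazonaws.com/KatzCloud/Calvin_Test_Data.csv")

# Offline stand-in: hypothetical columns, not the real file's layout.
csv_text <- "game,yards,touchdowns
1,70,1
2,115,2
3,42,0"
calvin_game_data <- read.csv(text = csv_text, header = TRUE)

# str() summarises what was assigned to the name on the left of <-.
str(calvin_game_data)
```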
35. If you have downloaded the .csv file locally to your machine, then make sure that the path is set to the location where the dataset currently resides.
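A small sketch of reading a local copy, with a check that the path actually points at a file. The temporary file and its two columns are hypothetical stand-ins for the downloaded dataset.

```r
# Hypothetical local copy: a temporary file stands in for the downloaded .csv.
local_path <- file.path(tempdir(), "Calvin_Test_Data.csv")
writeLines(c("game,yards", "1,70", "2,115"), local_path)

# Confirm the path points at a real file before trying to read it.
if (!file.exists(local_path)) {
  stop("No file found at: ", local_path)
}
calvin_game_data <- read.csv(local_path)
print(calvin_game_data)
```

If the file sits in the current working directory, the file name alone is enough; otherwise the full path is needed.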
36. Type this and then hit Tab when your cursor is between the quotes (this will bring up all files within the current working directory; in this case, all files on my Desktop).
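Tab completion is an interactive feature, but the same listing can be produced in code. This sketch assumes list.files() as a scripted equivalent; the scratch folder and file name are hypothetical.

```r
# Create one hypothetical file in a scratch folder so there is something to list.
scratch_dir <- file.path(tempdir(), "listing_demo")
dir.create(scratch_dir, showWarnings = FALSE)
file.create(file.path(scratch_dir, "example.csv"))

# list.files() with no path argument would list the current working directory.
all_files <- list.files(scratch_dir)

# pattern= takes a regular expression; this keeps names ending in .csv.
csv_files <- list.files(scratch_dir, pattern = "\\.csv$")
print(csv_files)
```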
38. Getting Rid of the NA’s
If you want to view the data in a spreadsheet form:
View(calvin_game_data)
Notice that we have an issue with extra rows full of NA's. We
need to generate a clean version of the data without those rows
of missing values.
There are several ways to fix this but here is one way:
calvin_games_data <- calvin_game_data[complete.cases(calvin_game_data), ]
(Note: I will explain this syntax on the next slide)
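Since there are several ways to drop the NA rows, here is a sketch comparing the complete.cases() approach with one alternative, na.omit(). The small data frame is a hypothetical stand-in for the real dataset.

```r
# Hypothetical stand-in data; the last row mimics an empty trailing row.
calvin_game_data <- data.frame(
  game = c(1, 2, NA),
  yards = c(70, 115, NA)
)

# The slide's approach: keep rows with no missing values.
by_complete_cases <- calvin_game_data[complete.cases(calvin_game_data), ]

# An alternative: na.omit() drops every row that contains an NA.
by_na_omit <- na.omit(calvin_game_data)

print(nrow(by_complete_cases))
print(nrow(by_na_omit))
```

Both keep the same rows; na.omit() additionally records which rows were dropped in an attribute of its result.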
39. Getting Rid of the NA’s
Let's take this apart. The complete.cases command creates a logical vector
specifying which observations/rows have no missing values across all of the
columns. To observe this, try running the following:
complete.cases(calvin_game_data)
You will see that each row gets a TRUE/FALSE value. A row is TRUE when it
contains no NA values and FALSE when it contains at least one.
In the full command we are creating a new dataset called "calvin_games_data".
The syntax on the right, in plain language, is to take calvin_game_data and then
keep only the complete cases using [row, column] indexing.
The bracket syntax is as follows: dataset[rows, columns]
Notice that in the following we use rows=complete.cases(calvin_game_data) and the
columns position is left blank after the comma. The default with the blank is to
keep every column, so each selected row is taken whole.
calvin_games_data <- calvin_game_data[complete.cases(calvin_game_data), ]
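To see the two pieces separately, this sketch builds the logical vector first and then uses it as the row index inside the brackets. The empty slot after the comma is the column position of the bracket index. The data frame is a hypothetical stand-in.

```r
# Hypothetical stand-in with missing values in rows 2 and 3.
calvin_game_data <- data.frame(
  game = c(1, 2, NA, 4),
  yards = c(70, NA, NA, 42)
)

# Step 1: one TRUE/FALSE per row; TRUE means the row has no NA anywhere.
keep_row <- complete.cases(calvin_game_data)
print(keep_row)

# Step 2: rows go before the comma, columns after it.
# Leaving the column slot blank keeps every column.
calvin_games_data <- calvin_game_data[keep_row, ]

# Filling the column slot selects specific columns for the same rows.
yards_only <- calvin_game_data[keep_row, "yards"]
```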
40. The head() command will give you the first few rows, but notice that the row numbering is still off.
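The row labels are off because subsetting keeps each row's original label. The slides' own fix is not in this transcript; one common option, sketched here on hypothetical data, is to reset the row names to NULL.

```r
# Hypothetical stand-in: row 2 is empty and gets filtered out.
calvin_games_data <- data.frame(
  game = c(1, NA, 3, 4),
  yards = c(70, NA, 88, 42)
)
calvin_games_data <- calvin_games_data[complete.cases(calvin_games_data), ]

# The surviving rows keep their pre-filter labels, so there is a gap.
labels_before <- rownames(calvin_games_data)
print(labels_before)

# Assigning NULL renumbers the rows consecutively starting from 1.
rownames(calvin_games_data) <- NULL
labels_after <- rownames(calvin_games_data)

# head() shows the first rows; n controls how many.
head(calvin_games_data, n = 2)
```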
44. Some Basic Commands
What is the fewest yards Calvin has had in a game?
min() selects the smallest value. Syntax is Dataset$Variable; the dollar sign selects the column.
What is the most touchdowns Calvin has had in a game?
max() selects the largest value. Syntax is Dataset$Variable; the dollar sign selects the column.
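A minimal sketch of the Dataset$Variable pattern with min() and max(). The data frame and its column names (yards, touchdowns) are hypothetical; the real file's column names may differ.

```r
# Hypothetical stand-in data with assumed column names.
calvin_games_data <- data.frame(
  yards = c(70, 115, 42, 0, 88),
  touchdowns = c(1, 2, 0, 0, 1)
)

# Dataset$Variable pulls one column out as a vector.
fewest_yards <- min(calvin_games_data$yards)
most_touchdowns <- max(calvin_games_data$touchdowns)

print(fewest_yards)
print(most_touchdowns)
```

If a column still contains NA values, both functions return NA unless na.rm = TRUE is supplied.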
46. Some Basic Commands
How many touchdowns has Calvin had in his career?
What are the respective quantiles of Calvin’s yards per game?
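The answers on this slide are in a screenshot, so the following is a sketch assuming sum() for the career total and quantile() for the quantiles. The data and column names are hypothetical.

```r
# Hypothetical stand-in data with assumed column names.
calvin_games_data <- data.frame(
  yards = c(10, 20, 30, 40, 50),
  touchdowns = c(1, 0, 2, 0, 1)
)

# sum() adds up every value in the selected column.
career_touchdowns <- sum(calvin_games_data$touchdowns)
print(career_touchdowns)

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default.
yard_quantiles <- quantile(calvin_games_data$yards)
print(yard_quantiles)
```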
48. Some Basic Commands
Across his career, what are Calvin’s average yards per game?
What is the standard deviation of those yards?
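As with the previous slides, the commands are in the screenshot. This sketch assumes mean() and sd() and uses hypothetical yardage values.

```r
# Hypothetical stand-in data with an assumed column name.
calvin_games_data <- data.frame(yards = c(10, 20, 30, 40, 50))

# mean() gives the arithmetic average of the column.
average_yards <- mean(calvin_games_data$yards)

# sd() gives the sample standard deviation (it divides by n - 1).
sd_yards <- sd(calvin_games_data$yards)

print(average_yards)
print(sd_yards)
```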
58. Box and Whisker Plot
Earlier in the course we saw this data ...
New York City: 31.5 33.6 42.4 52.5 62.7 71.6 76.8 75.5 68.2 57.5 47.6 36.6
Houston: 50.4 53.9 60.6 68.3 74.5 80.4 82.6 82.3 78.2 69.6 61 53.5
San Francisco: 48.7 52.2 53.3 55.6 58.1 61.5 62.7 63.7 64.5 61 54.8 49.4
Load the data from here:
http://s3.amazonaws.com/KatzCloud/AvgTemp.csv
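A sketch of a box and whisker plot built from the values printed on this slide, typed in directly because the column layout of AvgTemp.csv is not shown here. The plot title and axis label are assumptions.

```r
# Values copied from the slide; one vector per city.
avg_temp <- list(
  "New York City" = c(31.5, 33.6, 42.4, 52.5, 62.7, 71.6,
                      76.8, 75.5, 68.2, 57.5, 47.6, 36.6),
  "Houston" = c(50.4, 53.9, 60.6, 68.3, 74.5, 80.4,
                82.6, 82.3, 78.2, 69.6, 61, 53.5),
  "San Francisco" = c(48.7, 52.2, 53.3, 55.6, 58.1, 61.5,
                      62.7, 63.7, 64.5, 61, 54.8, 49.4)
)

# boxplot() draws one box per list element and invisibly returns the statistics.
box_stats <- boxplot(avg_temp,
                     main = "Average temperature by city",
                     ylab = "Average temperature")

# Row 3 of the stats matrix holds the median of each box.
city_medians <- box_stats$stats[3, ]
names(city_medians) <- box_stats$names
print(city_medians)
```

The height of each box reflects how widely that city's values are spread.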