- 1. R: THE TRUE BASICS. R, also called the Language for Statistical Computing, was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in the nineties. It is considered an open source implementation of the S language, which was developed by John Chambers and colleagues at Bell Laboratories, starting in the seventies. R provides a wide variety of statistical techniques and visualization capabilities. Another very important feature of R is that it is highly extensible. Because of this, and more importantly because R is open source, it became the vehicle that brought the power of S to a larger community. As with every programming language, there are pros and cons. ADVANTAGES: 1) It is open source and free. 2) Master at graphics. 3) Command-line interface. 4) Reproducibility through R scripts. 5) R packages: extensions of R. DISADVANTAGES: 1) Easy to learn, harder to master. 2) Poorly written code is hard to read and maintain. 3) The command-line interface is daunting at first. 4) Poorly written code is slow. One of the most important components of R, and where most of the action happens, is the R console. It's a place where you can execute R commands. You simply type something at the prompt in the console, hit Enter, and R interprets and executes your command.
- 2. Let's start our experiments by having R do some basic arithmetic; we'll calculate the sum of 1 and 2. We simply type `1 + 2` in the console and hit Enter. R interprets what you typed, calculates the result and prints that result as a numerical value. Now let's try to type some text in the console. We use double quotes for this. You can also simply type a number and hit Enter. R understands the character string and the numerical value, but simply prints each one back as output. This brings me to the first super important concept in R: the variable. A variable allows you to store a value or an object in R. You can then later use this variable's name to easily access the value or the object that is stored within this variable. You create a variable with the assignment operator `<-`, a less-than sign followed by a dash.
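The console steps described above can be sketched as follows. The string `"Hello, R"`, the number `42` and the variable name `x` are illustrative choices, not values from the original slides.

```r
# Basic arithmetic: R evaluates the expression and prints the result
1 + 2

# A character string (in double quotes) and a bare number are echoed back
"Hello, R"
42

# Assignment with <- stores the value under a name; nothing is printed
x <- 1 + 2

# Typing the name retrieves the stored value
x
```

Run line by line in the console, each expression prints its value, except the assignment, which is silent.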
- 3. Suppose the number 2 is the height of a rectangle. Let's assign this value 2 to a variable `height`. We type `height <- 2`. This time, R does not print anything, because it assumes that you will be using this variable in the future. If we now simply type and execute `height` in the console, R returns 2. We can do a similar thing for the width of our imaginary rectangle. We assign the value 4 to a variable `width`. If we type `width`, we see that indeed, it contains the value 4. As you're assigning variables in the R console, you're actually accumulating an R workspace. It's the place where variables and information are stored in R. You can access the objects in the workspace with the `ls()` function. Simply type `ls` followed by empty parentheses and hit Enter. This shows you a list of all the variables you have created in the R session. If you have followed all the examples up to now, you should see "height" and "width". This tells you that there are two objects in your workspace at the moment. When you type `height` in the console, R looks for the variable `height` in the workspace, finds it, and prints the corresponding value. If, however, we try to print a non-existing variable, `depth` for example, R throws an error, because `depth` is not defined in the workspace and thus not found. The principle of accumulating a workspace through variable assignment makes these variables available for further use. Suppose we want to find out the area of our imaginary rectangle, which is height multiplied by width. Let's go ahead and type `height * width`. The result is 8, as you'd expect. We can take it one step further and also assign the result of this calculation to a new variable, `area`. We again use the assignment operator. If you now type `area`, you'll see that it contains 8 as well.
- 4. Inspecting the workspace again with `ls()` shows that the workspace now contains three objects: `area`, `height` and `width`.
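The rectangle walkthrough above can be reproduced with a few lines. This is a minimal sketch that follows the names used in the text; the `tryCatch()` wrapper is an addition so that the lookup of the undefined `depth` does not stop a script.

```r
# Store the rectangle's dimensions; assignments print nothing
height <- 2
width <- 4

# Typing a variable name prints its value
height
width

# List the objects accumulated in the workspace so far
ls()

# Looking up an undefined variable raises an error; catch it here
# so the remaining lines still run when this is executed as a script
tryCatch(
  print(depth),
  error = function(e) message("Error caught: ", conditionMessage(e))
)

# Combine existing variables and store the result in a new one
area <- height * width
area

# The workspace now also contains area
ls()
```

In an interactive session you would simply type `depth` and read the error message; the wrapper only matters for non-interactive runs.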
- 5. Basic Data Types: this section covers R's fundamental data types, also called atomic vector types. Throughout our experiments, we will use the function `class()`. This is a useful way to see what type a variable is. Let's head over to the console and start with `TRUE`, in capital letters. `TRUE` is a logical. That's also what `class(TRUE)` tells us. Logicals are so-called Boolean values, and can be either `TRUE` or `FALSE`. Well, actually, `NA`, to denote missing values, is also a logical. Next come numbers such as 2 and 2.5, which R calls numerics. We can perform all sorts of operations on them such as addition, subtraction, multiplication, division and many more. A special type of numeric is the integer. It is a way to represent whole numbers like 1 and 2. To specify that a number is an integer, you can append a capital L to it, as in `2L`. We don't see the difference between the integer 2 and the numeric 2 from the output. However, the `class()` function reveals the difference. Instead of asking for the class of a variable, you can also use the `is.*()` functions to see whether variables are actually of a certain type. To see if a variable is a numeric, we can use the `is.numeric()` function. It appears that both the numeric 2 and the integer `2L` are numeric. To see if a variable is an integer, we can use `is.integer()`. This shows us that integers are numeric, but that not all numerics are integers, so there's some kind of type hierarchy going on here.
- 6. Last but not least, there's the character string. The class of this type of object is "character". It's important to note that there are other atomic types in R, such as double, which is the storage type behind ordinary numerics, complex for handling complex numbers, and raw to store raw bytes.
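The type checks described above can be tried directly in the console. This sketch uses only base R functions; the example values (such as the string `"I love R"`) are illustrative.

```r
# Logicals, including the missing value NA
class(TRUE)        # "logical"
class(NA)          # "logical"

# Numerics and integers print the same but have different classes
class(2)           # "numeric"
class(2.5)         # "numeric"
class(2L)          # "integer"

# is.*() functions test for a type and return a logical
is.numeric(2)      # TRUE
is.numeric(2L)     # TRUE: integers count as numeric
is.integer(2)      # FALSE: a plain 2 is not stored as an integer
is.integer(2L)     # TRUE

# Character strings
class("I love R")  # "character"

# typeof() shows the underlying storage type of an ordinary numeric
typeof(2)          # "double"
```

The last line illustrates why "double" is not a separate, more precise kind of number: a plain numeric is already stored as a double.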
- 7. Vectors 1) Create and name vectors: A vector is nothing more than a sequence of data elements of the _same_ basic data type. First things first: creating a vector in R! You use the `c()` function for this, which allows us to combine values into a vector. Suppose you're playing a basic card game, and record the suit of 5 cards you draw from a deck. A possible outcome can be stored as a character vector of five suit names. Of course, we could also assign this character vector to a new variable, `drawn_suits` for example. We now have a character vector, `drawn_suits`. We can check that it is a vector by typing `is.vector(drawn_suits)`. Likewise, we could create a vector of integers, for example to store how many cards of each suit remain after we drew the 5 cards. Let's call this vector `remain`. There are 11 spades, 12 hearts, 11 diamonds, and all 13 clubs remaining.
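A minimal sketch of the two vectors described above. The order of the drawn cards is an assumption; it is chosen only so that it matches the remaining counts given in the text (two spades, one heart and two diamonds drawn).

```r
# Suits of the five drawn cards (order is illustrative)
drawn_suits <- c("hearts", "spades", "diamonds", "diamonds", "spades")
drawn_suits

# Check that the object is a vector and inspect its type
is.vector(drawn_suits)   # TRUE
class(drawn_suits)       # "character"

# Cards remaining per suit: spades, hearts, diamonds, clubs
# The L suffix makes these integers rather than plain numerics
remain <- c(11L, 12L, 11L, 13L)
remain
length(remain)           # 4
```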
- 8. To attach a label to each element of a vector, we can use the `names()` function. Let's first create another character vector, `suits`, that contains the strings "spades", "hearts", "diamonds", and "clubs", the names we want to give the vector elements, and then assign it to `names(remain)`. 2) Vector Arithmetic: We learned that we can use variables to perform arithmetic. Remember how you summed apples and oranges? From the previous section, we also know that these variables, `my_apples` and `my_oranges`, are actually simply vectors. This means that we can perform arithmetic with vectors in R.
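Returning to the naming step, the following sketch labels the remaining card counts with their suits. It recreates `remain` so that the block runs on its own.

```r
# Cards remaining per suit, in the order spades, hearts, diamonds, clubs
remain <- c(11L, 12L, 11L, 13L)

# The labels we want to attach to the four elements
suits <- c("spades", "hearts", "diamonds", "clubs")

# Assigning to names() attaches one label per element
names(remain) <- suits
remain

# Named elements can be looked up by label
remain["hearts"]

# Names can also be given directly when the vector is created
remain_direct <- c(spades = 11L, hearts = 12L, diamonds = 11L, clubs = 13L)
identical(remain, remain_direct)   # TRUE
```

Naming after the fact is handy when the same set of labels is reused for several vectors; naming inside `c()` keeps values and labels side by side.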
- 9. The most important thing to remember about operations with vectors in R is that they are applied element by element. This means that standard mathematics is extended to vectors in an element-wise fashion. Imagine you have a vector containing your gambling earnings for the past 3 days. Not bad for a few days in the desert, is it? Imagine a well-dressed gentleman approaches you and offers to triple your earnings for the past three days, if you beat him in one round of poker. If you want to calculate the expected earnings for each of the past three days, you can easily do it in R. R multiplies each element in the `earnings` vector by 3, resulting in 150 dollars of promised earnings on the first day, 300 on the second day and 90 on the third day. Likewise, division, subtraction, addition and many more are all carried out element-wise, just as if you were carrying out the operation between two scalars three times. From these lines of code you don't see anything different from what we've done before, because of course, you were working with vectors all along. The mathematics naturally extends to vectors that contain more than one element. Let's go back to your Vegas adventures. To enjoy your earnings, you also decided to go shopping and spend some money every day on the Las Vegas Strip. You recorded a vector of expenses. Because you are a very conscientious programmer in training, you decide to compute whether your luck in the casino was sufficient to pay for your expenses.
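The element-wise behaviour described above can be sketched as follows. The `earnings` values follow from the tripled amounts quoted in the text (150, 300 and 90); the `expenses` values and the name `bank` are hypothetical, since the original figures are not given.

```r
# Earnings for the past three days, implied by the tripled amounts
earnings <- c(50, 100, 30)

# Multiplying by a single number applies to every element
earnings * 3             # 150 300 90

# Hypothetical daily expenses (not from the original slides)
expenses <- c(30, 40, 80)

# Subtraction pairs up elements by position
bank <- earnings - expenses
bank                     # 20 60 -50

# Overall balance across the three days
sum(bank)                # 30
sum(bank) > 0            # TRUE: earnings covered the expenses
```

With these assumed expenses, the third day runs a loss, but the total over the three days is still positive.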
- 10. MATRICES Creating and naming matrices: A matrix is kind of like the big brother of the vector. Where a vector is a one-dimensional sequence of data elements, a matrix is a similar collection of data elements, but this time arranged into a fixed number of rows and columns. Since we are only working with rows and columns, a matrix is called two-dimensional. A matrix can contain only one atomic vector type. This means that you can't have both logicals and numerics in a matrix, for example. There's really not much more theory about matrices than this: it's a natural extension of the vector, going from one to two dimensions. Of course, this has its implications for manipulating and subsetting matrices, but let's start with simply creating and naming them. To build a matrix, you use the `matrix()` function. Most importantly, it needs a vector, containing the values you want to place in the matrix, and at least one matrix dimension. You can choose to specify the number of rows or the number of columns. For example, `matrix(1:6, nrow = 2)` creates a 2-by-3 matrix containing the values 1 to 6, by specifying the vector and setting the `nrow` argument to 2. R sees that the input vector has length 6 and that there have to be two rows. It then infers that you'll probably want 3 columns, such that the number of matrix elements matches the number of input vector elements.
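A short sketch of the matrix construction described above, using only the base `matrix()` function. The variable names `m` and `m_by_col` are illustrative.

```r
# Build a matrix from the values 1 to 6 with two rows;
# R infers that three columns are needed
m <- matrix(1:6, nrow = 2)
m

# The dimensions are rows first, then columns
dim(m)                   # 2 3

# By default the values are filled in column by column,
# so the first row holds 1, 3 and 5
m[1, ]                   # 1 3 5

# Specifying the number of columns instead gives the same matrix
m_by_col <- matrix(1:6, ncol = 3)
identical(m, m_by_col)   # TRUE
```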
- 11. If you prefer to fill up the matrix in a row-wise fashion, such that the 1, 2 and 3 are in the first row, you can set the `byrow` argument of `matrix()` to `TRUE`. Can you spot the difference? Remember how R did recycling when you were subsetting vectors using logical vectors? The same thing happens when you pass the `matrix()` function a vector that is too short to fill up the entire matrix. Suppose you pass a vector containing the values 1 to 3 to the `matrix()` function, and explicitly say you want a matrix with 2 rows and 3 columns, as in `matrix(1:3, nrow = 2, ncol = 3)`. R fills up the matrix column by column and simply repeats the vector. If you try to fill up the matrix with a vector whose length does not divide evenly into the matrix size, for example when you want to put a 4-element vector in a 6-element matrix, R generates a warning message. Apart from the `matrix()` function, there's yet another easy way to create matrices that is more intuitive in some cases. You can paste vectors together using the `cbind()` and `rbind()` functions. `cbind()`, short for column bind, takes the vectors you pass it, and sticks them together as if they were columns of a matrix. The `rbind()` function, short for row bind, does the same thing but takes the input as rows and makes a matrix out of them. These functions can come in pretty handy, because they're often easier to use than the `matrix()` function.
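The following sketch walks through row-wise filling, recycling, the warning case and the two bind functions described above. The variable names are illustrative, and the warning is captured with `tryCatch()` only so that its message can be inspected; its exact wording depends on the R version.

```r
# Fill row by row: the first row now holds 1, 2 and 3
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row

# A vector that is too short is recycled, column by column
recycled <- matrix(1:3, nrow = 2, ncol = 3)
recycled                 # columns: (1, 2), (3, 1), (2, 3)

# A 4-element vector does not fit evenly into 6 cells: R warns
warn_msg <- tryCatch(
  matrix(1:4, nrow = 2, ncol = 3),
  warning = function(w) conditionMessage(w)
)
warn_msg

# cbind() treats each vector as a column, rbind() as a row
as_columns <- cbind(1:3, 1:3)
as_rows <- rbind(1:3, 1:3)
dim(as_columns)          # 3 2
dim(as_rows)             # 2 3
```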
- 12. The bind functions also work on existing matrices. If you want to add another row, containing the values 7, 8, 9, to an existing three-column matrix `m`, you could simply run `rbind(m, 7:9)`. You can do a similar thing with `cbind()` to add a column. Next up is naming the matrix. In the case of vectors, you simply used the `names()` function, but in the case of matrices, you can assign names to both columns and rows. That's why R provides the `rownames()` and `colnames()` functions. Their use is pretty straightforward. Retaking the matrix `m` from before, we can set the row names just the same way as we named vectors, but this time with the `rownames()` function.
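A minimal sketch of extending and naming a matrix. It assumes that `m` is the 2-by-3 matrix of the values 1 to 6, filled row by row; the added column values and the row and column labels are illustrative.

```r
# Assumed starting point: values 1 to 6 in two rows, filled row-wise
m <- matrix(1:6, byrow = TRUE, nrow = 2)

# Add a third row holding 7, 8 and 9
with_row <- rbind(m, 7:9)
with_row
dim(with_row)            # 3 3

# Add a fourth column (values chosen for illustration)
with_col <- cbind(m, c(10, 11))
dim(with_col)            # 2 4

# Name the rows and columns of m (labels are illustrative)
rownames(m) <- c("row1", "row2")
colnames(m) <- c("col1", "col2", "col3")
m

# Names can then be used to select elements
m["row1", "col2"]        # 2
```

Note that `rbind()` and `cbind()` return new matrices; `m` itself only changes when the result is assigned back to it.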