
RMySQL Tutorial For Beginners

In this tutorial, we learn to access a MySQL database from R using the RMySQL package. The tutorial covers everything from creating tables and appending data to removing tables from the database.

Published in: Data & Analytics

  1. R2 Academy RMySQL Tutorial: Access MySQL from R
  2. Course Material. All the material related to this course is available on our website, www.rsquaredacademy.com. Slides can be viewed on SlideShare, scripts can be downloaded from GitHub, and videos can be viewed on our YouTube channel.
  3. Table Of Contents → Objectives → Introduction → Installing RMySQL → RMySQL Commands → Connecting to MySQL → Database Info → Listing Tables → Creating Tables → Import data into R data frame → Export data from R
  4. Objectives → Install & load RMySQL package → Connect to a MySQL database from R → Display database information → List tables in the database → Create new table → Import data into R for analysis → Export data from R → Remove tables & disconnect
  5. Introduction. In the real world, data is often stored in relational databases such as MySQL, and an analyst has to extract the data before performing any analysis. If you use R for statistical analysis and a relational database for storage, you need to interact with the database to access the relevant data sets. One way to do this is to export the data from the database in some file format and import that file into R. Similarly, if you have a data frame in R and want to store it in a database, you would export the data from R and import it into the database. This approach is cumbersome and frustrating. The RMySQL package was created to help R users access a MySQL database directly from R. To take advantage of the package, you need the following:

      • Access to a MySQL database
      • Knowledge of basic SQL commands
      • A recent version of R (3.2.3 at the time of writing)
      • RStudio (version 0.99.491) (optional)
      • RMySQL package (version 0.10.8)
  6. RMySQL Package. The RMySQL package allows you to access MySQL from R. It was created by Jeffrey Horner and is now maintained by Jeroen Ooms and Hadley Wickham. The latest release of the package at the time of writing is version 0.10.8. You can install and load the package using the following commands:

          # install the package
          install.packages("RMySQL")

          # load the package
          library(RMySQL)
  7. Connect To Database. We can establish a connection to a MySQL database using the dbConnect() function. To connect to the database, we need to specify the following:

      • MySQL connection (driver)
      • Database name
      • Username
      • Password
      • Host details

      Below is an example:

          # create a MySQL connection object
          con <- dbConnect(MySQL(),
                           user = 'root',
                           password = 'password',
                           host = 'localhost',
                           dbname = 'world')
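The connection example hardcodes the username and password in the script. As a minimal sketch of one alternative, the credentials can be read from environment variables. The variable names `MYSQL_USER`, `MYSQL_PASSWORD` and `MYSQL_HOST` and the helper `connect_world()` are assumptions made for illustration; they are not part of RMySQL or of the original slides.

```r
# Hypothetical helper: open a connection to the 'world' database using
# credentials taken from environment variables (names are assumptions).
connect_world <- function() {
  DBI::dbConnect(
    RMySQL::MySQL(),
    user     = Sys.getenv("MYSQL_USER", unset = "root"),
    password = Sys.getenv("MYSQL_PASSWORD"),
    host     = Sys.getenv("MYSQL_HOST", unset = "localhost"),
    dbname   = "world"
  )
}

# Usage, assuming a MySQL server is reachable with these credentials:
# con <- connect_world()
```

Keeping the password out of the script makes it easier to share the code without sharing the credentials.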
  8. Connection Summary. We can get a summary, or metadata, of the connection using the summary() function. We need to specify the name of the MySQL connection object for which we want the metadata. Below is an example:

          # connect to MySQL
          con <- dbConnect(MySQL(),
                           user = 'root',
                           password = 'password',
                           host = 'localhost',
                           dbname = 'world')

          > summary(con)
          <MySQLConnection:0,0>
            User:   root
            Host:   localhost
            Dbname: world
            Connection type: localhost via TCP/IP
          Results:
  9. Database Info. The dbGetInfo() function can be used to access information about the database to which we have established a connection. Among other things, it returns the host, the user, the database name, the connection type and the server version:

          > dbGetInfo(con)
          $host
          [1] "localhost"
          $user
          [1] "root"
          $dbname
          [1] "world"
          $conType
          [1] "localhost via TCP/IP"
          $serverVersion
          [1] "5.7.9-log"
          $protocolVersion
          [1] 10
          $threadId
          [1] 7
          $rsId
          list()
  10. List Tables. Once we have established a connection to a MySQL database, we can use the dbListTables() function to list the tables present in that database. We need to specify the name of the MySQL connection object for which we want the list of tables. Below is an example:

          # list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"

      As you can see, there are four tables in the database to which we connected through the RMySQL package. Note that we pass the function the MySQL connection object we created when we connected, not the database name.
  11. List Fields. To get a list of the fields, or columns, in a particular table in the database, we can use the dbListFields() function. We need to specify the name of the MySQL connection object as well as the table name. If the table exists in the database, the names of its fields are returned. Below is an example:

          # list of fields in table city
          > dbListFields(con, "city")
          [1] "ID"          "Name"        "CountryCode" "District"
          [5] "Population"

      The name of the table must be enclosed in single or double quotes, and the names of the fields are returned as a character vector.
  12. Testing Data Types. To find the SQL data type that corresponds to an R object, we can use the dbDataType() function. We need to specify the driver as well as the object to test. Below is an example:

          > # data type
          > dbDataType(RMySQL::MySQL(), "a")
          [1] "text"
          > dbDataType(RMySQL::MySQL(), 1:5)
          [1] "bigint"
          > dbDataType(RMySQL::MySQL(), 1.5)
          [1] "double"
  13. Querying Data. There are three different methods of querying data from a database:

      • Import the complete table using dbReadTable()
      • Send a query and retrieve the results using dbGetQuery()
      • Submit a query using dbSendQuery() and fetch the results using dbFetch()

      Let us explore each of these methods one by one.
  14. Import Table. The dbReadTable() function can be used to extract an entire table from a MySQL database. This method is suitable only when the table is not very large, since the whole table is loaded into memory. We need to specify the name of the MySQL connection object and the table. The name of the table must be enclosed in single or double quotes. In the example below, we read the entire table named "trial" from the database.

          > dbReadTable(con, "trial")
              x y
          1   1 a
          2   2 b
          3   3 c
          4   4 d
          5   5 e
          6   6 f
          7   7 g
          8   8 h
          9   9 i
          10 10 j
  15. Import Rows. The dbGetQuery() function can be used to extract specific rows from a table. We can use this method when we want to import only the rows that meet certain conditions from a big table stored in the database. We need to specify the name of the MySQL connection object and the query. The query must be enclosed in single or double quotes. In the example below, we read the first 5 rows from the table named trial.

          > dbGetQuery(con, "SELECT * FROM trial LIMIT 5;")
            x y
          1 1 a
          2 2 b
          3 3 c
          4 4 d
          5 5 e

      The dbGetQuery() function sends the query and fetches the results from the table in the database in a single step.
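To illustrate importing rows that meet a condition, here is a small sketch that builds a filtered query against the city table, whose Name and Population fields appear in the List Fields slide. The helper `build_city_query()` and the threshold values are hypothetical; coercing the inputs to integers before placing them in the SQL string is one simple way to avoid injecting arbitrary text.

```r
# Hypothetical helper: build a SELECT that keeps only cities above a
# population threshold and returns the n largest ones.
build_city_query <- function(min_population, n_rows) {
  sprintf(
    "SELECT Name, Population FROM city WHERE Population > %d ORDER BY Population DESC LIMIT %d",
    as.integer(min_population),
    as.integer(n_rows)
  )
}

sql <- build_city_query(1000000, 5)

# With an open connection object `con`, the rows could then be imported with:
# big_cities <- dbGetQuery(con, sql)
```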
  16. Import Data in Batches. We can import data in batches as well. To achieve this, we use two different functions:

      • dbSendQuery()
      • dbFetch()

      dbSendQuery() submits the query but does not extract any data. To retrieve the data, we use dbFetch(), which fetches rows from the result of the query executed by dbSendQuery(). This method therefore works in two stages. Let us look at an example:

          > # pull data in batches
          > query <- dbSendQuery(con, "SELECT * FROM trial;")
          > data <- dbFetch(query, n = 5)

      We store the result of dbSendQuery() in an object named 'query'. The inputs to this function are the MySQL connection object and the SQL query. Next, we fetch the data using dbFetch(). Its inputs are the result of dbSendQuery() and the number of rows to be fetched. The fetched rows are stored in a new object named 'data'.
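The example above fetches only the first batch. A minimal sketch of pulling every batch in a loop is shown below; it relies on the DBI functions dbHasCompleted() and dbClearResult(), and the wrapper `fetch_in_batches()` and its `handle_batch` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: run a query and process the result batch by batch.
# Returns (invisibly) the total number of rows processed.
fetch_in_batches <- function(con, sql, batch_size = 5, handle_batch = print) {
  res <- DBI::dbSendQuery(con, sql)
  # Free the result set even if an error occurs while processing a batch.
  on.exit(DBI::dbClearResult(res), add = TRUE)

  total <- 0
  while (!DBI::dbHasCompleted(res)) {
    batch <- DBI::dbFetch(res, n = batch_size)
    if (nrow(batch) == 0) break   # nothing left to fetch
    handle_batch(batch)
    total <- total + nrow(batch)
  }
  invisible(total)
}

# Usage, assuming an open connection `con` and the table trial:
# fetch_in_batches(con, "SELECT * FROM trial;", batch_size = 5)
```

Processing each batch as it arrives keeps only `batch_size` rows in memory at a time, which is the main reason to prefer this method for large tables.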
  17. Query Information. The dbGetInfo() function also returns information about a query that has been submitted for execution using dbSendQuery(). Below is an example:

          > res <- dbSendQuery(con, "SELECT * FROM trial;")
          > dbGetInfo(res)
          $statement
          [1] "SELECT * FROM trial;"
          $isSelect
          [1] 1
          $rowsAffected
          [1] -1
          $rowCount
          [1] 0
          $completed
          [1] 0
          $fieldDescription
          $fieldDescription[[1]]
          NULL
  18. Query & Rows Info. The dbGetStatement() function returns the query that has been submitted for execution using dbSendQuery(). Below is an example:

          > res <- dbSendQuery(con, "SELECT * FROM trial;")
          > dbGetStatement(res)
          [1] "SELECT * FROM trial;"

      The dbGetRowCount() function returns the number of rows fetched so far from the database by the dbFetch() function. Below is an example:

          > res <- dbSendQuery(con, "SELECT * FROM trial;")
          > data <- dbFetch(res, n = 5)
          > dbGetRowCount(res)
          [1] 5

      The dbGetRowsAffected() function returns the number of rows affected by the query that has been submitted for execution using dbSendQuery(). Below is an example:

          > dbGetRowsAffected(res)
          [1] -1
  19. Column Info. The dbColumnInfo() function returns information about the columns of the result of a query submitted using dbSendQuery(). Below is an example:

          > res <- dbSendQuery(con, "SELECT * FROM trial;")
          > dbColumnInfo(res)
                 name    Sclass      type length
          1 row_names character BLOB/TEXT 196605
          2         x    double    BIGINT     20
          3         y character BLOB/TEXT 196605

      The dbClearResult() function frees all the resources associated with the result set of dbSendQuery(). Below is an example:

          > res <- dbSendQuery(con, "SELECT * FROM trial;")
          > dbClearResult(res)
          [1] TRUE
  20. Export/Write Table. The dbWriteTable() function is used to export data from R to a database. It can be used to:

      • Create a new table
      • Overwrite an existing table
      • Append data to a table

      In the first example, we will create a dummy data set and export it to the database. We will specify the following within the dbWriteTable() function:

      1. Name of the MySQL connection object
      2. Name of the table to be created in the database
      3. Name of the data frame to be exported
  21. Export/Write Table. We will create the table trial that we have used in the previous examples:

          # list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"

          # create dummy data set
          > x <- 1:10
          > y <- letters[1:10]
          > trial <- data.frame(x, y, stringsAsFactors = FALSE)

          # create table in the database
          > dbWriteTable(con, "trial", trial)
          [1] TRUE

          # updated list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"          "trial"
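The dbColumnInfo() output in the Column Info slide lists a row_names column next to x and y, which suggests the data frame's row names were written to the table as well. As a sketch, and assuming the RMySQL method for dbWriteTable() accepts a `row.names` argument, the export can skip them. The wrapper name `export_without_row_names()` is hypothetical.

```r
# Hypothetical wrapper: export a data frame without storing its row names
# as an extra column (assumes dbWriteTable() supports row.names = FALSE).
export_without_row_names <- function(con, table, df) {
  DBI::dbWriteTable(con, table, df, row.names = FALSE)
}

# Same dummy data set as in the slide
trial <- data.frame(x = 1:10, y = letters[1:10], stringsAsFactors = FALSE)

# Usage, assuming an open connection `con`:
# export_without_row_names(con, "trial", trial)
```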
  22. Overwrite Table. We can overwrite the data in a table by setting the overwrite option to TRUE. Let us overwrite the table we created in the previous example:

          # list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"          "trial"

          # create dummy data set
          > x <- sample(100, 10)
          > y <- letters[11:20]
          > trial2 <- data.frame(x, y, stringsAsFactors = FALSE)

          # overwrite table in the database
          > dbWriteTable(con, "trial", trial2, overwrite = TRUE)
          [1] TRUE
  23. Append Data. We can append data to a table by setting the append option to TRUE. Let us append data to the table we created in the previous example:

          # list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"          "trial"

          # create dummy data set
          > x <- sample(100, 10)
          > y <- letters[5:14]
          > trial3 <- data.frame(x, y, stringsAsFactors = FALSE)

          # append data to the table in the database
          > dbWriteTable(con, "trial", trial3, append = TRUE)
          [1] TRUE
  24. Remove Table. The dbRemoveTable() function can be used to remove tables from the database. We need to specify the name of the MySQL connection object and the table to be removed. The name of the table must be enclosed in single or double quotes. Below is an example:

          # list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"          "trial"

          # remove table trial
          > dbRemoveTable(con, "trial")
          [1] TRUE

          # updated list of tables in the database
          > dbListTables(con)
          [1] "city"            "country"         "countrylanguage"
          [4] "mtcars"
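Before removing a table, it can be worth checking that the table is actually there. The sketch below uses the DBI function dbExistsTable() for that check; the helper name `remove_table_if_exists()` is hypothetical.

```r
# Hypothetical helper: remove a table only when it is present.
# Returns FALSE when there was nothing to remove.
remove_table_if_exists <- function(con, table) {
  if (DBI::dbExistsTable(con, table)) {
    DBI::dbRemoveTable(con, table)
  } else {
    FALSE
  }
}

# Usage, assuming an open connection `con`:
# remove_table_if_exists(con, "trial")
```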
  25. Disconnect. It is very important to close the connection to the database once we are done with it. The dbDisconnect() function can be used to disconnect from the database. We need to specify the name of the MySQL connection object. Below is an example:

          # create a MySQL connection object
          con <- dbConnect(MySQL(),
                           user = 'root',
                           password = 'password',
                           host = 'localhost',
                           dbname = 'world')

          # disconnect from the database
          > dbDisconnect(con)
          [1] TRUE
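One way to make sure the connection is always closed, even when an error interrupts the work in between, is to wrap the work in a function and register the disconnect with on.exit(). This is a sketch under the same connection settings as the slides; `with_world_db()` and its `action` argument are hypothetical names.

```r
# Hypothetical helper: open a connection, run `action(con)`, and always
# disconnect afterwards, whether `action` succeeds or fails.
with_world_db <- function(action) {
  con <- DBI::dbConnect(
    RMySQL::MySQL(),
    user = "root",
    password = "password",
    host = "localhost",
    dbname = "world"
  )
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  action(con)
}

# Usage, assuming the MySQL server from the slides is running:
# tables <- with_world_db(DBI::dbListTables)
```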
  26. Visit Rsquared Academy for tutorials on: → R Programming → Business Analytics → Data Visualization → Web Applications → Package Development → Git & GitHub