2. openxlsx
➢ openxlsx: Simplifies the creation of Excel files by providing a high-level
interface to read, write and format Excel worksheets, with the added benefit
of removing the dependency on Java.
➢ Import functions include:
loadWorkbook()
readWorkbook()
The features of XLConnect::readWorksheet() and
XLConnect::readWorksheetFromFile() are merged in
openxlsx::readWorkbook().
➢ Export functions include:
createWorkbook()
addWorksheet() – alternative to XLConnect::createSheet()
writeData() – alternative to XLConnect::writeWorksheet()
saveWorkbook()
3. openxlsx::loadWorkbook()
loadWorkbook(): Loads and returns a workbook object, preserving the styles
and formatting of the original .xlsx file.
> loadWorkbook(file)
Where file = a path to an Excel workbook to be loaded
# install the openxlsx package
> install.packages("openxlsx")
# load the functions from the openxlsx package
> library(openxlsx)
# load the Excel workbook
> wb <- loadWorkbook("sample.xlsx")
> class(wb)
To know more about the features of loadWorkbook() use
>?openxlsx::loadWorkbook
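As a minimal sketch of what loadWorkbook() returns, the snippet below first writes a small throwaway workbook so that it runs without sample.xlsx. The temporary file and the sheet names "store" and "bikes" are assumptions made for illustration only.

```r
library(openxlsx)

# Build a small two-sheet workbook in a temporary file (illustrative data only)
path <- tempfile(fileext = ".xlsx")
write.xlsx(list(store = head(mtcars), bikes = head(iris)), path)

# Load it back as a workbook object rather than a data.frame
wb <- loadWorkbook(path)

class(wb)   # the workbook class used by openxlsx
names(wb)   # the worksheet names stored in the workbook
```

The object returned here is the same kind of object that addWorksheet(), writeData() and saveWorkbook() operate on.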
4. openxlsx::readWorkbook()
readWorkbook(): Reads data from an Excel file, or directly from a
workbook object returned by loadWorkbook(), into a data.frame.
> wb1 <- readWorkbook(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE,
rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE,
skipEmptyCols = TRUE, check.names = FALSE, na.strings = "NA", cols = 2:7,
rows = 5:10)
Where
xlsxFile = An Excel file, Workbook object or URL to an xlsx file.
sheet = The name or index of the sheet to read data from.
startRow = The index of the first row to read from. Empty rows at the top
of a file are always skipped, regardless of the value of startRow.
colNames = If TRUE, the first row of data will be used as column names.
rowNames = If TRUE, first column of data will be used as row names.
detectDates = If TRUE, attempt to recognise dates and perform conversion.
cols = A numeric vector specifying which columns in the Excel file to
read. If NULL, all columns are read. E.g. cols = c(1, 5, 7) reads columns 1, 5
and 7, and cols = 2:4 reads the range of columns from column 2 to 4.
The same applies to rows, e.g. rows = 5:10 or rows = c(3, 4, 9).
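The cols and rows arguments can be tried on a small file. This is a sketch under assumptions: the data frame, its column names and the temporary file are invented for the example. Note that the header sits in row 1 of the sheet, so it has to be included in rows if the original column names are wanted.

```r
library(openxlsx)

# Write a 10-row table to a temporary .xlsx file (illustrative data only)
path <- tempfile(fileext = ".xlsx")
df <- data.frame(id = 1:10, price = seq(10, 100, by = 10), qty = 10:1)
write.xlsx(df, path)

# Row 1 is the header, so rows = 1:4 means header plus the first three records.
# cols = 2:3 keeps only the second and third columns (price and qty).
subset_df <- readWorkbook(path, sheet = 1, cols = 2:3, rows = 1:4)

subset_df
```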
5. openxlsx::readWorkbook()
# Read the 1st Excel sheet from the wb R object, i.e. the sample.xlsx file.
> rwb_store <- readWorkbook(wb, sheet = 1)
> View(rwb_store)
# Read the 2nd Excel sheet directly from the sample.xlsx file.
> rwb_bike <- readWorkbook("sample.xlsx", sheet = "bike_sharing_program")
> View(rwb_bike)
# Optimized query for large datasets
> orwb_store <- readWorkbook("sample.xlsx", sheet = "store", colNames = TRUE,
cols = 2:4, rows = 14000:20000)
To know more about the features of readWorkbook() use
>?openxlsx::readWorkbook
Rupak Roy
6. openxlsx::addWorksheet()
createWorkbook(): Creates a new workbook object.
>cwb<-createWorkbook()
addWorksheet(): Adds a worksheet to a workbook object.
> addWorksheet(wb, sheetName, header = NULL, orientation =
getOption("openxlsx.orientation", default = "portrait"))
Where
wb = a workbook object to attach the new worksheet to
sheetName = a name for the new worksheet
orientation = one of "portrait" or "landscape"
7. openxlsx::addWorksheet()
#Create new empty excel/work sheets in a workbook object
>addWorksheet(cwb,"new_sheet1")
>addWorksheet(cwb,"new_sheet2")
>addWorksheet(cwb,"new_sheet3")
To know more about the features of addWorksheet() we can always use
>?openxlsx::addWorksheet
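A short sketch of createWorkbook() and addWorksheet() together, including the orientation argument. The sheet names "summary" and "wide_table" are made up for the example.

```r
library(openxlsx)

# Start from an empty workbook object held in memory
cwb <- createWorkbook()

# Add one sheet with the default orientation and one in landscape
addWorksheet(cwb, "summary")
addWorksheet(cwb, "wide_table", orientation = "landscape")

# Sheets are listed in the order they were added
names(cwb)
```

Nothing is written to disk at this point; the sheets exist only in the workbook object until saveWorkbook() is called.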
8. openxlsx::writeData()
writeData(): Writes an object to worksheet.
> writeData(wb, sheet, x, startCol = 1, startRow = 1, xy = NULL, colNames =
TRUE, rowNames = FALSE, headerStyle = NULL, keepNA = FALSE)
Where
wb = A Workbook object containing a worksheet.
sheet = The worksheet to write to. Can be the worksheet index or name.
x = Object to be written, e.g. a data.frame.
startCol = A vector specifying the starting column to write to.
startRow = A vector specifying the starting row to write to.
xy = An alternative to specifying startCol and startRow individually. A vector of
the form c(startCol, startRow).
rowNames/colNames = If TRUE, data.frame row/col names of x are written.
keepNA = If TRUE, NA values are converted to #N/A in Excel; otherwise NA cells
will be empty.
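The placement arguments can be illustrated with a small sketch. The data frame, the sheet name "demo" and the temporary file are assumptions for the example; the same table is written twice, once positioned with startCol/startRow and once with xy.

```r
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "demo")

# Illustrative data with one missing value
df <- data.frame(item = c("a", "b"), value = c(1.5, NA))

# Header lands in row 3, starting at column B (column 2)
writeData(wb, "demo", df, startCol = 2, startRow = 3)

# Same table again, positioned with xy = c(startCol, startRow): column E, row 3.
# keepNA = TRUE writes #N/A instead of leaving the cell empty.
writeData(wb, "demo", df, xy = c(5, 3), keepNA = TRUE)

path <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, path)

# Read back only the first copy (columns B and C)
back <- readWorkbook(path, sheet = "demo", cols = 2:3)
back
```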
9. openxlsx::writeData()
> wb <- loadWorkbook("sample.xlsx") # load a workbook
> names(wb) # list the available worksheets in the workbook
# Create an empty Excel worksheet in a workbook object
> addWorksheet(wb, "new_sheet")
> names(wb)
# read a worksheet from the workbook (wb) and save it in an R object
> rwb <- readWorkbook(wb, sheet = "store")
# write the R object rwb (worksheet) to the workbook wb
> writeData(wb, sheet = "new_sheet", rwb)
# save the workbook to disk
> saveWorkbook(wb, "my_first_workbook.xlsx")
To know more about the features of writeData() we can always use
>?openxlsx::writeData
10. openxlsx::saveWorkbook()
A few steps are involved before we can use openxlsx::saveWorkbook():
✓ Install Rtools, which is a collection of tools necessary for building R
packages on Windows.
Available for download at
https://cran.r-project.org/bin/windows/Rtools/
also included with this module.
✓ Follow the installation guide in the next slide.
✓ Set Sys.setenv("R_ZIPCMD" = "…………path/bin/zip.exe")
> Sys.setenv("R_ZIPCMD" = "C:/Rtools/bin/zip.exe")
13. openxlsx::saveWorkbook()
saveWorkbook(): Saves a workbook object to a file
> saveWorkbook(wb, "file.xlsx", overwrite = TRUE)
Where
wb = a workbook object to write to file
file = name of the file to save as
overwrite = If TRUE, overwrite any existing file
>saveWorkbook(wb,"my_first_workbook.xlsx")
To know more about the features of saveWorkbook() we can always use
>?openxlsx::saveWorkbook
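The effect of overwrite can be seen in a small sketch. The temporary file and sheet name are assumptions, and the tryCatch() wrapper is only there so the example keeps running; it relies on saveWorkbook() raising an error when the target file already exists and overwrite is left at its default.

```r
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "data")
writeData(wb, "data", head(mtcars))

path <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, path)                      # first save: file does not exist yet

# Second save to the same path without overwrite = TRUE is expected to fail
second_try <- tryCatch({
  saveWorkbook(wb, path)
  "saved"
}, error = function(e) "refused")

# Explicitly allowing the overwrite replaces the existing file
saveWorkbook(wb, path, overwrite = TRUE)

second_try
file.exists(path)
```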
14. readxl::read_excel()
Another common package for reading Excel .xlsx and .xls files, with the
added benefit of removing the dependency on Java.
read_excel(): Reads xls and xlsx files. read_excel() calls excel_format() to
determine if path is xls or xlsx, based on the file extension and the file itself,
in that order. Use read_xls() and read_xlsx() directly if you know better and
want to prevent such guessing.
> read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, na = "",
skip = 0, n_max = Inf)
Where
path = Path to the .xlsx or .xls file.
sheet = Sheet to read, either the name of a sheet or an integer (the position of
the sheet). If not specified, the first sheet is read.
range = A cell range to read from, such as a typical Excel range like "B3:D87".
n_max = Maximum number of data rows to read. Ignored if range is given.
na = Defines which values are treated as missing. By default, blank cells are
treated as missing data.
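A brief sketch of range and n_max, using the datasets.xlsx example file that ships with readxl (located with readxl_example()) so that it runs without sample.xlsx. The choice of the "mtcars" sheet and the range "A1:C5" is an assumption made for illustration.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
path <- readxl_example("datasets.xlsx")

# List the sheets available in the file
excel_sheets(path)

# Read a fixed cell range: row 1 is the header, rows 2 to 5 are data
cars_block <- read_excel(path, sheet = "mtcars", range = "A1:C5")

# Read all columns but stop after the first three data rows
cars_head <- read_excel(path, sheet = "mtcars", n_max = 3)

dim(cars_block)
nrow(cars_head)
```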
15. readxl::read_excel()
# install the readxl package
> install.packages("readxl")
# load the functions from the readxl package
> library(readxl)
# load the 2nd worksheet from the sample.xlsx workbook
> myworkbook <- read_excel("sample.xlsx", sheet = 2)
To know more about the features of read_excel() we can always use
>?readxl::read_excel
16. Next:
We will learn how to import data from popular
databases using RODBC package.
Import export Excel files
Rupak Roy