Loading and Handling
Data in R
Submitted To: Dr. Chetna Arora
Submitted By: Sagar Verma (15), Vipin (49)
UNIVERSITY SCHOOL OF MANAGEMENT
KURUKSHETRA UNIVERSITY
Introduction
Applications used by business organisations generate huge amounts of data. This data must be
analysed to draw useful insights, which help decision makers make better and faster
decisions.
Analytical data processing
Analytical data processing is a part of business intelligence that includes relational databases, data
warehousing, data mining and report mining.
It is a computer processing technique that handles different types of business processes,
such as sales, budgeting, financial reporting and management reporting.
Steps of analytical data processing
Data Input → Processing → Descriptive Statistics → Visualisation of Data → Report Generation → Output
• Data input is the process of entering or feeding information into a system, device or
software program.
• Processing is the series of steps taken to manipulate, transform or analyse data
once it has been entered into a system. It is the stage where raw data is converted into meaningful
information.
• Descriptive statistics are the methods used to summarise and present data in a meaningful
way.
• Visualisation of data helps you understand the underlying patterns, trends and relationships
in the data through graphical representation.
• Report generation is a powerful way to present analysis in a structured and visually appealing
manner.
• Output is the result of executing commands or code, which can be displayed in various forms
such as text, tables, graphs or files.
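The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the sales data frame and its figures are made up, and the data is typed in directly rather than loaded from a file.

```r
# Data input: hypothetical monthly sales figures (made-up data for illustration)
sales <- data.frame(
  month   = c("Jan", "Feb", "Mar", "Apr"),
  revenue = c(120, 150, NA, 180)
)

# Processing: drop the rows where revenue is missing
clean <- sales[!is.na(sales$revenue), ]

# Descriptive statistics
avg_revenue <- mean(clean$revenue)
max_revenue <- max(clean$revenue)

# Visualisation: a simple bar chart (uncomment in an interactive session)
# barplot(clean$revenue, names.arg = clean$month)

# Report generation and output
report <- sprintf("Average revenue: %.1f, highest: %.0f", avg_revenue, max_revenue)
print(report)
```

In practice the input step would usually read from a file or database, and the report step would write to a document rather than the console.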
Expressions, Variables and Functions
Expressions
An expression consists of terms joined together by operators.
Arithmetic operations
An arithmetic expression is formed from variables, constants and the operations of:
• Addition
• Subtraction
• Multiplication
• Division
Addition X+Y (y is added to x)
> 4+8
[1] 12
Subtraction X-Y (y is subtracted from x)
> 10-7
[1] 3
Multiplication X*Y (x is multiplied by y)
> 7*8
[1] 56
Division X/Y (x is divided by y)
> 10/2
[1] 5
Exponentiation X^Y (x is raised to the power y)
> 2^5
[1] 32
Square Root sqrt(X) (computes the square root of x)
> sqrt(25)
[1] 5
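When several operators appear in one expression, R applies them in order of precedence. The short sketch below illustrates this, along with the integer-division and remainder operators, which the slides do not cover; the variable names are chosen only for this example.

```r
# ^ binds tighter than * and /, which bind tighter than + and -
a <- 2 + 3 * 4       # multiplication first: 14
b <- (2 + 3) * 4     # parentheses override precedence: 20
neg_sq <- -2^2       # ^ is applied before the unary minus: -4

# Integer division and remainder (not covered in the slides)
quotient  <- 17 %/% 5   # 3
remainder <- 17 %% 5    # 2

print(c(a, b, neg_sq, quotient, remainder))
```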
Logical Values
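Logical values are TRUE and FALSE; T and F are shorthand for them. T and F are ordinary variables that can be reassigned, so TRUE and FALSE are safer in scripts. The equality operator is ==.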
> 8 < 4
[1] FALSE
> 3 * 2 == 5
[1] FALSE
> F == FALSE
[1] TRUE
> T == TRUE
[1] TRUE
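Logical values can also be combined and counted. The sketch below assumes a variable x chosen purely for illustration and uses the standard operators & (and), | (or) and ! (not).

```r
x <- 7

in_range <- x > 5 & x < 10    # both conditions hold: TRUE
either   <- x < 5 | x == 7    # the second condition holds: TRUE
negated  <- !(x == 7)         # negation of TRUE: FALSE

# Comparisons are vectorised: each element is tested separately
flags  <- c(3, 8, 12) > 5     # FALSE TRUE TRUE
n_true <- sum(flags)          # TRUE counts as 1, so this is 2

print(flags)
```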
Dates
The default date format is YYYY-MM-DD.
Print the system's date.
> Sys.Date()
[1] "2024-10-22"
Print the system's date and time (the printed value normally ends with the system's time-zone abbreviation, omitted here).
> Sys.time()
[1] "2024-10-22 13:30:54"
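Dates can also be created from text, reformatted and used in arithmetic. The following is a minimal sketch that uses a fixed date instead of Sys.Date() so that the results are reproducible; the variable names are illustrative.

```r
# Create a Date object from text in the default YYYY-MM-DD format
d <- as.Date("2024-10-22")

# Display the same date in day/month/year order
formatted <- format(d, "%d/%m/%Y")    # "22/10/2024"

# Adding a number to a Date adds that many days
later <- d + 30                        # 2024-11-21

# Subtracting two dates gives the difference in days
days_left <- as.numeric(as.Date("2024-12-31") - d)   # 70

print(formatted)
print(later)
print(days_left)
```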
Variables
A variable is a named block of memory reserved for storing data; the name is used to refer to that block.
Rules for naming variables in R:
•A variable name must be a combination of letters, digits, periods (.) and underscores (_).
•It must start with a letter or a period (.); if it starts with a period, the period must not be
followed by a digit.
•Reserved words in R cannot be used as variable names.
VALID VARIABLES     INVALID VARIABLES
myValue             .1nikku
.my.value.one       TRUE
My_value_one        vik@sh
Data4               _temp
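One way to check these names is with the base R function make.names(), which rewrites an invalid name into a valid one; a name that comes back unchanged was already valid. This is a sketch of that idea, not an official validity test, and the variable names candidates and is_valid are chosen for this example.

```r
candidates <- c("myValue", ".my.value.one", "My_value_one", "Data4",
                ".1nikku", "TRUE", "vik@sh", "_temp")

# make.names() leaves a syntactically valid name untouched and alters an invalid one
is_valid <- candidates == make.names(candidates)
names(is_valid) <- candidates

print(is_valid)
```

The first four names are reported as valid and the last four as invalid, matching the table.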
Functions
A function is a set of statements organised together to perform a specific task.
R has a large number of built-in functions, and users can also create their own.
In R, a function is an object: when it is called, the R interpreter passes control to the function, along with
any arguments it needs to carry out its actions.
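As an illustration of a user-created function, the sketch below defines a hypothetical helper, range_width(), that is not part of R. It returns the difference between the largest and smallest value and passes an na.rm argument on to the built-in max() and min().

```r
# range_width() is a hypothetical helper written for this example
range_width <- function(x, na.rm = FALSE) {
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

w1 <- range_width(c(4, 9, 1))                 # 9 - 1 = 8
w2 <- range_width(c(4, 9, NA))                # NA, because na.rm is FALSE
w3 <- range_width(c(4, 9, NA), na.rm = TRUE)  # NA is dropped: 9 - 4 = 5

print(c(w1, w2, w3))
```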
Some basic built-in functions are:
sum() function – returns the sum of all the values in its arguments
> sum(1, 2, 3)
[1] 6
> sum(1, 5, NA, na.rm=FALSE)
[1] NA
* If na.rm is FALSE (the default), an NA or NaN value in any of the arguments causes NA or NaN to be returned
min() function – returns the minimum of all the values in its arguments
> min(1, 2, 3)
[1] 1
> min(1, 5, 7, NA, na.rm=TRUE)
[1] 1
max() function – returns the maximum of all the values in its arguments
> max(1, 2, 3)
[1] 3
> max(1, 5, 7, NA, na.rm=FALSE)
[1] NA
seq() function – generates a regular sequence
Syntax
seq(from, to, by, length.out)
Give either by (the increment) or length.out (the number of values), not both together with from and to.
> seq(1, 10, 2)
[1] 1 3 5 7 9
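The named arguments of seq() make the intent clearer. The sketch below shows the same increment-based call, a call that fixes the number of values with length.out, and a descending sequence; the variable names are illustrative.

```r
# Increment of 2, written with named arguments
by_two <- seq(from = 1, to = 10, by = 2)            # 1 3 5 7 9

# Five evenly spaced values between 0 and 1
five_pts <- seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# A negative increment counts downwards
countdown <- seq(from = 10, to = 1, by = -3)        # 10 7 4 1

print(by_two)
print(five_pts)
print(countdown)
```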
Manipulating Text in Data
R has many built-in string functions for manipulating text.
Typical text-manipulation operations include extracting part of a string, searching for a string
within a text and concatenating strings.
String values must be enclosed in quotes; double quotes are the usual convention, and single quotes also work.
> "R is a statistical programming language"
[1] "R is a statistical programming language"
A few such functions are as follows:
rep() function – repeats a given argument a specified number of times.
> rep("statistics", 3)
[1] "statistics" "statistics" "statistics"
grep() function – finds the index positions of the elements that contain a given string.
> grep("statistic", c("R", "is", "a", "statistical", "language"), fixed = TRUE)
[1] 4
toupper() function – converts a given character vector into upper case.
> toupper("statistics")
[1] "STATISTICS"
tolower() function – converts a given character vector into lower case.
> tolower("STATISTICS")
[1] "statistics"
substr() function – extracts or replaces a substring in a character vector.
> substr("statistics", 7, 9)
[1] "tic"
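Concatenation, searching and replacement, mentioned earlier, can be sketched with a few more base R functions. The calls below (nchar(), paste(), paste0(), grepl(), sub() and the replacement form of substr()) are standard; the strings and variable names are made up for the example.

```r
s <- "statistics"

n_chars <- nchar(s)                     # number of characters: 10
joined  <- paste("R", "is", "fun")      # joined with spaces: "R is fun"
glued   <- paste0("stat", "istics")     # joined without a separator: "statistics"

# grepl() returns TRUE or FALSE for each element instead of index positions
has_stat <- grepl("stat", c("R", "statistical"), fixed = TRUE)   # FALSE TRUE

# sub() replaces the first match of a pattern
replaced <- sub("statistics", "analytics", "R for statistics", fixed = TRUE)

# substr() can also replace part of a string in place
substr(s, 1, 1) <- "S"                  # s is now "Statistics"

print(c(joined, glued, replaced, s))
```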
Vectors
A vector is a sequence of data elements of the same data type.
Members in a vector are officially called components.
Examples:
A vector containing three numeric values
> c(2, 3, 5)
[1] 2 3 5
A vector of logical values
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1]  TRUE FALSE  TRUE FALSE FALSE
A vector cannot hold values of different data types. Consider the following example:
Creating a vector with numeric, string and logical values together.
> c(4, 8, "R", FALSE)
[1] "4"     "8"     "R"     "FALSE"
** All the values are converted into the same data type, i.e. 'character'.
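The conversion can be inspected with class(). The sketch below shows that mixing in a string forces everything to character, while mixing numbers with logical values gives numeric; the variable names are illustrative.

```r
mixed <- c(4, 8, "R", FALSE)
mixed_class <- class(mixed)          # "character"

# Without a string, logical values are converted to numbers (TRUE becomes 1)
num_logical <- c(4, TRUE)            # 4 1
num_class   <- class(num_logical)    # "numeric"

# Converting back: text that is not a number becomes NA (R also gives a warning)
back <- suppressWarnings(as.numeric(mixed))   # 4 8 NA NA

print(mixed_class)
print(num_logical)
print(back)
```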
Sequence Vector
A sequence vector can be created with the start:end notation. Example: to create the sequence of
whole numbers from 1 to 5, use > 1:5 or > seq(1, 5).
To change the increment, use > seq(1, 10, 2) or > seq(from=1, to=10, by=2)
Vector Access
Assigning the variable name 'VariableSeq' to a vector of string values.
> VariableSeq <- c("R", "is", "a", "programming", "language")
To access a value in the vector, index it by position and observe the output.
Here i stands for a position from 1 to 5; indexing in R starts at 1.
> VariableSeq[i]
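A few concrete indexing calls on the same vector are sketched below. They show a single position, several positions at once, a negative index (which drops an element) and what happens beyond the end of the vector; the result variable names are illustrative.

```r
VariableSeq <- c("R", "is", "a", "programming", "language")

first         <- VariableSeq[1]          # "R"
some          <- VariableSeq[c(1, 4)]    # "R" "programming"
without_first <- VariableSeq[-1]         # every element except the first
n             <- length(VariableSeq)     # 5
out_of_range  <- VariableSeq[6]          # NA: there is no sixth element

print(some)
print(without_first)
```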
Vector Names
To assign names to vector elements, use the names() function.
> placeholder <- 1:5
> names(placeholder) <- c("I", "am", "an", "R", "programmer")
Retrieve the vector elements by index position, with i from 1 to 5.
> placeholder[i]
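Once a vector has names, an element can be retrieved by name as well as by position. The sketch below repeats the assignment above and compares the two; the result variable names are illustrative.

```r
placeholder <- 1:5
names(placeholder) <- c("I", "am", "an", "R", "programmer")

by_position <- placeholder[4]       # the fourth element, printed with its name "R"
by_name     <- placeholder["R"]     # the same element, selected by name

second_name <- names(placeholder)[2]   # "am"
plain_value <- unname(by_name)         # the value without its name: 4

print(by_position)
print(by_name)
```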