Analysis using R
Introduction:
Hi everyone, this session deals with data analysis using the R language. Many people find it
difficult to get started with data analysis, and with R as well. I hope this will be helpful for
beginners who are looking for a place to start.
So what is data analysis? By definition, it is the process of evaluating data using analytical
and logical reasoning to examine each component of the data provided. This form of analysis is
just one of the many steps that must be completed when conducting a research experiment. Data
from various sources is gathered, reviewed, and then analyzed to form some sort of finding or
conclusion. There are a variety of specific data analysis methods, some of which include data
mining, text analytics, business intelligence, and data visualization. Put very simply, data
analysis means FINDING PATTERNS OR INSIGHTS IN DATA that help to support concrete business
decisions or improve the customer experience.
R is a statistical tool used for data analysis and data science. It has built-in functions and
provides a wide variety of statistical techniques (linear and nonlinear modelling, classical
statistical tests, time-series analysis, classification, clustering, …) and graphical techniques,
and is highly extensible.
Basics:
To become a Data Analyst you should be strong in the following areas:
• Statistics
• Data Mining
• Python/R
• Distributed Computing
Let's start with statistics, which is further classified into descriptive statistics (measures of
central tendency, measures of dispersion, and the shape of the data), inferential statistics (inferring
from sample data what the population might be like), and exploratory statistics (analysing data sets to
summarize their main characteristics).
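As a rough illustration of the descriptive measures listed above, here is a minimal sketch in base R. The choice of the built-in iris dataset and the variable names are assumptions made for this example only, and the skewness is computed by hand as one simple way to describe the shape of the data.

```r
# Descriptive statistics on a built-in dataset (iris ships with base R).
# The dataset and variable names are illustrative assumptions.
x <- iris$Sepal.Length

# Measures of central tendency
center <- c(mean = mean(x), median = median(x))

# Measures of dispersion
spread <- c(variance = var(x), sd = sd(x), iqr = IQR(x), range = diff(range(x)))

# Shape of the data: sample skewness, computed by hand because base R
# has no built-in skewness function
skewness <- mean((x - mean(x))^3) / (mean((x - mean(x))^2))^1.5

print(center)
print(spread)
print(skewness)

# summary() gives a quick five-number summary plus the mean
print(summary(x))
```

A positive skewness value would suggest a longer right tail, a negative one a longer left tail.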
Next comes data mining, which includes data pre-processing (data cleaning, data transformation),
modelling, and so on.
R has already been introduced above. Python is another wonderful programming language for data
analysis, with many packages such as pandas, scikit-learn, and matplotlib (for visualization). R and
Python are the two stars preferred by data analysts. Each has its own strengths and weaknesses.
By distributed computing I mean Hadoop, which is used mainly for the storage and processing of big
data. As data volume and variety have kept increasing over time, distributed computing has come into
the limelight with the Hadoop ecosystem, which is often simply called big data technology.
Nowadays Hadoop has become almost a synonym for big data.
Steps involved:
The actual session starts here. Make sure that your environment is ready. I have explained the
steps to be followed in detail below.
Step-1: PROBLEM STATEMENT
You should be very clear about the given problem statement and what you are expected to do.
Ask yourself what problem you have, and whether the data given is sufficient to solve it.
Step-2: DATA PREPROCESSING
This is a very important process that a data analyst goes through. Initially you should
collect the required data. First set the working directory to the folder where the file is present
using setwd(). Then use one of the following functions to read the file, depending on its format.
• read.csv()
• read.table()
• read.xlsx() (not in base R; provided by add-on packages such as xlsx or openxlsx)
• for XML do the following
library(XML)
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
Convert the loaded data to a data frame using data.frame() to make manipulation easy. Next
comes data cleaning: to detect missing values you can use is.na(), and to remove rows with
missing values you can use na.omit() or na.exclude().
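The reading and cleaning steps above can be sketched as follows. This is only a minimal example: the file contents, column names, and the use of a temporary file are made up for illustration, so that the snippet runs without any external data.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The rows below are invented sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,city",
             "1,34,Chennai",
             "2,,Mumbai",
             "3,29,",
             "4,41,Delhi"), csv_path)

# Read the file; treat empty fields as missing, in addition to "NA".
raw <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# is.na() flags missing cells; colSums() counts them per column.
missing_per_column <- colSums(is.na(raw))
print(missing_per_column)

# na.omit() keeps only the rows that have no missing value.
clean <- na.omit(raw)
print(clean)
```

Dropping rows is the simplest option; whether it is appropriate depends on how much data is missing and why.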
Next is data transformation. This includes type conversion, which can be done with
as.numeric(), as.double(), as.factor(), and so on. Normalization and standardization also come
under data transformation.
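To make the transformation step more concrete, here is a small sketch in base R. The data frame and its column names are hypothetical; min-max scaling and z-scores are shown as two common ways to normalize and standardize, not as the only ones.

```r
# A small hypothetical data frame with numbers stored as text.
df <- data.frame(
  score = c("10", "20", "30", "40"),
  grade = c("low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Type conversion
df$score <- as.numeric(df$score)   # character -> numeric
df$grade <- as.factor(df$grade)    # character -> factor

# Normalization: rescale to the range [0, 1] (min-max scaling)
df$score_norm <- (df$score - min(df$score)) / (max(df$score) - min(df$score))

# Standardization: subtract the mean and divide by the standard deviation
df$score_std <- as.numeric(scale(df$score))

str(df)
```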
Once preprocessing is done, about 60% of the work is over.
Step-3: POPULATION AND SAMPLE
Before getting into this, load the necessary packages using library("package name"), e.g.
library("caret"), library("class"). And don't forget to initialize the seed value with
set.seed(), so that the random split is reproducible. Coming to the point, it is very important
to split the given dataset into training and testing data. The training data is the sample that
stands in for the population when the model is built. The testing data should be used only to
test the model; otherwise you should not touch it. The model is built using only the training
data.
This can be done by many methods; here I have used createDataPartition(), a function
available in the caret package.
index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE, ...)
where, y - outcome (response) variable
times - number of partitions to create
p - proportion of the data that goes into the training set
list - logical - should the results be in a list (TRUE) or a matrix with the number of rows
equal to floor(p * length(y)) and times columns (FALSE). FALSE is used here so that index can
be used directly to select rows.
training_data <- dataframe[index,]
testing_data <- dataframe[-index,]
Now the data is partitioned into training and testing sets, and the model is ready to be trained.
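The split described above might look like this on a real dataset. This is a sketch under a few assumptions: the caret package is installed, the built-in iris data stands in for your own data frame, and the seed value and the 70/30 proportion are arbitrary choices for the example.

```r
library(caret)

# Fix the seed so the random partition is reproducible (123 is arbitrary).
set.seed(123)

# Stratified split on the outcome column; list = FALSE returns a matrix
# of row numbers that can be used directly for indexing.
index <- createDataPartition(iris$Species, times = 1, p = 0.7, list = FALSE)

training_data <- iris[index, ]    # rows selected for training
testing_data  <- iris[-index, ]   # all remaining rows, held out for testing

cat("training rows:", nrow(training_data), "\n")
cat("testing rows: ", nrow(testing_data), "\n")
```

Because the partition is stratified on the outcome, each class should appear in roughly the same proportion in both sets.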
Step-4 : DATA MODELING
To train the model we can use the function train() available in the caret package.
model_trained <- train(x, y, method = "rf", preProcess = NULL, ...)
Here y is the outcome (dependent) variable and x (x1 to xn) holds the predictor (independent)
variables. Besides rf (random forest) there are many other methods, such as glm (generalized
linear model). Refer to http://caret.r-forge.r-project.org/bytag.html to learn more about the
models. Each model has its own restrictions.
Step-5 : PREDICTION
Once the model has been trained, we can make predictions with it using the predict()
function.
predicted_model <- predict(model_trained, testing_data)
You can also check how well your model has classified the data. Using
confusionMatrix() we can achieve this.
check <- confusionMatrix(predicted_model, testing_data$y)
Here testing_data$y stands for the outcome column of the testing data. In other words, the
confusion matrix compares the model's predictions with the actual values to show how well the
trained model works on the testing data.
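Putting the modelling and prediction steps together, a minimal end-to-end sketch could look like the following. It assumes the caret package is installed and uses the built-in iris data. Note that method = "knn" is swapped in here instead of "rf", purely so that the example does not depend on the separate randomForest package; the seed and split proportion are also arbitrary.

```r
library(caret)

set.seed(123)  # arbitrary seed, for reproducibility only

# Split the data (same idea as in Step-3)
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training_data <- iris[index, ]
testing_data  <- iris[-index, ]

# Train on the training data only, using caret's formula interface.
# "knn" is an illustrative choice; other method names work the same way
# as long as their supporting packages are installed.
model_trained <- train(Species ~ ., data = training_data, method = "knn")

# Predict the outcome for the held-out testing data
predicted <- predict(model_trained, newdata = testing_data)

# Compare predictions (data) with the true labels (reference)
check <- confusionMatrix(data = predicted, reference = testing_data$Species)
print(check$table)
print(check$overall["Accuracy"])
```

The table shows how many observations of each true class were assigned to each predicted class, and the accuracy summarizes the diagonal.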
Step-6 : PLOTS
The last analysis step involves plotting. You can use plot() and related functions to draw a
box plot, scatter plot, histogram, or whatever the requirement calls for. As the saying goes,
"a picture is worth a thousand words", so use plots to describe your results.
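The three plot types mentioned above can be produced with base R graphics roughly as follows. The dataset (iris), titles, and axis labels are illustrative choices, and the plots are written to a temporary PDF only so that the snippet also runs in a non-interactive session.

```r
# Send all plots to a temporary PDF file (one page per plot).
out <- tempfile(fileext = ".pdf")
pdf(out)

# Histogram: distribution of a single numeric variable
h <- hist(iris$Sepal.Length,
          main = "Sepal length", xlab = "Length (cm)")

# Box plot: one numeric variable compared across groups
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal length by species", ylab = "Length (cm)")

# Scatter plot: relationship between two numeric variables,
# coloured by the species code (1, 2, 3)
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species),
     main = "Petal width vs. length",
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")

dev.off()  # close the PDF device so the file is written
```

In an interactive session you can drop the pdf() and dev.off() calls and the plots will appear in the graphics window instead.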
Step-7 : REPORT
Finally, for report submission you can use R Markdown, where the file should be saved
with the extension .Rmd. To use R Markdown, check that the required packages (such as
rmarkdown and knitr) are installed.
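As a rough sketch of the reporting step, the snippet below writes a very small R Markdown file from R and renders it to HTML. The file name, title, and report contents are hypothetical, and rendering is attempted only if the rmarkdown package and pandoc are available on the machine.

```r
# Three backticks, built programmatically so they can be written into the file.
fence <- strrep("`", 3)

# Contents of a minimal, hypothetical report: a YAML header, one line of
# text, and one R code chunk.
rmd_lines <- c(
  "---",
  'title: "Analysis report"',
  "output: html_document",
  "---",
  "",
  "Summary of the iris data:",
  "",
  paste0(fence, "{r}"),
  "summary(iris)",
  fence
)

# Save the report source with the .Rmd extension in a temporary folder.
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only when the tooling is present.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

In practice you would usually write the .Rmd file in an editor and knit it from there; generating it from code is done here only to keep the example self-contained.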
--Thank You--

Analysis using r

  • 1. Intorduction: Hi everyone, this session will be dealing with Data Analysis using R Language. Many would have found difficult to get started with Data Analysis and R as well. I can assure this will be very helpful for the beginners who really seeks help. So what is Data Analysis? By definition, it is the process of evaluating data using analytical and logical reasoning to examine each component of the data provided. This form of analysis is just one of the many steps that must be completed when conducting a research experiment. Data from various sources is gathered, reviewed, and then analyzed to form some sort of finding or conclusion. There are a variety of specific data analysis method, some of which include data mining, text analytics, business intelligence, and data visualizations. But in a very simple way, we can say that FINDING PATTERNS OR DATA INSIGHTS which will help to get concentrate business decisions/exceed customer experience. And R is a statistical tool used for data analysis and data science as well. R has in-built functions and provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. Basics: To become a Data Analyst you should be strong in the following areas:  Statistics  Data Mining  Python/R  Distributed Computing Let's start with Statistics, which is further classified into descriptive statistics( measure of central tendency, measure of dispersion, shape of data ), inferential statistics( infer from the sample data what the population might think ), explorative statistics( analysing to summarize their main characteristics ). Next comes the Data Mining, which includes data pre-processing( data cleaning, data transformation ), modelling etc. 
When comes to R i have already given a introduction about it, and Python, its again a wonderful programming language for Data Analysis which has many packages namely pandas, scikit-learn, matplotlib for visualization. R and Python are the 2 stars preferred by data analysts. Both are having their own strength and weakness. For Distributed Computing, i mean HADOOP technology, which is used mainly for storage and processing time of big data. Since history, data volume and variety is getting increased distributed computing been the limelight with Hadoop eco-system, which is simply called big data technology. Nowadays Hadoop has become the synonym for big data.
  • 2. Steps involved:
The actual session starts here. Make sure that the environment is ready. I have explained the steps to be followed in detail.

Step-1: PROBLEM STATEMENT
You should be very clear about the given problem statement and what you are expected to do. Ask yourself what problem you have, and whether the data given is sufficient to solve it.

Step-2: DATA PREPROCESSING
This is a very important process that a data analyst goes through. First, collect the required data. Set the working directory where the file is present using setwd(). Then read the file with whichever of the following functions matches the file format:
- read.csv()
- read.table()
- read.xlsx() (provided by an add-on package such as xlsx or openxlsx)
- for XML, do the following:
    library(XML)
    doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
Convert the loaded data to a data frame using data.frame() to make manipulation easy.

Next comes data cleaning: to detect missing values you can use is.na(), and to remove them you can use na.omit() or na.exclude().

Next is data transformation. Type transformation can be done with as.numeric(), as.double(), as.factor(), etc. Normalization and standardization also come under data transformation. Once preprocessing is over, about 60% of the work is done.

Step-3: POPULATION AND SAMPLE
Before getting into this, load the necessary packages using library("package name"), e.g. library("caret"), library("class"). And don't forget to initialize the seed value with set.seed(), so that the random split is reproducible. Coming to the point, it is very important to split the given dataset into training and testing data. The training data is the sample that stands in for the population, and the model is built using the training data only. The testing data should be used only to test the model; otherwise you should not touch it. The split can be done in many ways; here I have used createDataPartition(), a function available in the caret package.
    index <- createDataPartition(y, times = 1, p = 0.5, list = TRUE, ...)
where,
    y - outcome variable (the variable to be predicted)
    times - number of partitions
    p - percentage of data that goes to training
    list - logical - should the results be in a list (TRUE) or a matrix with the number of rows equal to floor(p * length(y)) and times columns (FALSE)
  • 3. When the result is used directly as a row index, pass list = FALSE so that index is a matrix of row numbers rather than a list:
    index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
    training_data <- dataframe[index,]
    testing_data <- dataframe[-index,]
Now the training and testing data are partitioned and the model is ready to be trained.

Step-4: DATA MODELING
To train the model we can use the function train() available in the caret package.
    model_trained <- train(x, y, method = "rf", preProcess = NULL, ...)
Here y is the outcome variable and x (x1 to xn) holds the predictor (independent) variables; note that train() takes the predictors first and the outcome second. There are many other methods besides rf (random forest), such as glm (generalized linear model). Refer to http://caret.r-forge.r-project.org/bytag.html to know more about the models. Each model has its own restrictions.

Step-5: PREDICTION
Once the model has been trained, we can make predictions with it using the predict() function.
    predicted_model <- predict(model_trained, testing_data)
You can also check how well your model classifies using confusionMatrix(), which takes the predicted values first and the actual values second.
    check <- confusionMatrix(predicted_model, testing_data$y)
Here testing_data$y stands for the actual outcome values of the testing data (assuming the outcome column is named y). In other words, the confusion matrix compares the predictions with the actual values, showing how well the trained model works on the testing data.

Step-6: PLOTS
This step involves plotting. You can make use of plot(), along with functions such as boxplot() and hist(), to draw a box plot, scatter plot, or histogram as per the requirement. As the saying goes, "1 picture speaks more than 1000 words", so you can make use of plots to describe your results.

Step-7: REPORT
Finally, for report submission you can use R Markdown, where the file should be saved with the extension .Rmd. To use R Markdown, check that the required packages (such as rmarkdown) are installed.

--Thank You--
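Putting steps 2 to 5 together, here is a minimal sketch on the built-in iris data set. It assumes the caret package is installed. The data set, the seed value 123, and the choice of method = "knn" (swapped in for "rf" because it does not need an additional modelling package) are my assumptions for illustration and not part of the session itself.

```r
library(caret)  # assumed installed: createDataPartition(), train(), confusionMatrix()

# Step 2: preprocessing on a stand-in data set (iris ships with R)
dataframe <- na.omit(iris)                         # drop rows with missing values (iris has none)
dataframe$Species <- as.factor(dataframe$Species)  # the outcome must be a factor for classification

# Step 3: reproducible 50/50 split, stratified on the outcome
set.seed(123)  # arbitrary seed, chosen only for this example
index <- createDataPartition(dataframe$Species, times = 1, p = 0.5, list = FALSE)
training_data <- dataframe[index, ]
testing_data  <- dataframe[-index, ]

# Step 4: train on the training data only
# (formula interface; "knn" is used here instead of "rf", see the note above)
model_trained <- train(Species ~ ., data = training_data, method = "knn")

# Step 5: predict on the held-out data and compare with the actual values
predicted_model <- predict(model_trained, testing_data)
check <- confusionMatrix(data = predicted_model, reference = testing_data$Species)

print(check$table)                # predicted vs. actual classes
print(check$overall["Accuracy"])  # share of correct predictions
```

If the randomForest package is available, changing method to "rf" should give the random forest model used in Step-4.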