Introduction:
Hi everyone, this session deals with data analysis using the R language. Many people find it
difficult to get started with data analysis, and with R as well. I hope this will be helpful for
beginners who are looking for guidance.
So what is data analysis? By definition, it is the process of evaluating data using analytical
and logical reasoning to examine each component of the data provided. This form of analysis is
just one of the many steps that must be completed when conducting a research experiment. Data
from various sources is gathered, reviewed, and then analyzed to form some sort of finding or
conclusion. There are a variety of specific data analysis methods, some of which include data
mining, text analytics, business intelligence, and data visualization. Put very simply, data
analysis means FINDING PATTERNS OR INSIGHTS IN DATA that help to make focused business
decisions or improve the customer experience.
R is a statistical tool used for data analysis and data science. It has built-in functions and
provides a wide variety of statistical (linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering, …) and graphical techniques, and is
highly extensible.
Basics:
To become a data analyst you should be strong in the following areas:
• Statistics
• Data Mining
• Python/R
• Distributed Computing
Let's start with statistics, which is further classified into descriptive statistics (measures of
central tendency, measures of dispersion, shape of the data), inferential statistics (inferring from
sample data what the population looks like), and exploratory statistics (analysing data sets to
summarize their main characteristics).
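To make the descriptive measures mentioned above concrete, here is a small sketch in base R. The
scores vector is made-up data used only for illustration; the functions themselves (mean(),
median(), sd(), range(), IQR(), summary()) are standard base R.

```r
# Hypothetical sample: exam scores of ten students (made-up values).
scores <- c(52, 61, 61, 68, 70, 73, 75, 80, 84, 96)

# Measures of central tendency
score_mean   <- mean(scores)
score_median <- median(scores)

# Measures of dispersion
score_sd    <- sd(scores)      # sample standard deviation
score_range <- range(scores)   # smallest and largest value
score_iqr   <- IQR(scores)     # spread of the middle 50% of the values

# Minimum, quartiles, median, mean, and maximum in one call
print(summary(scores))
```

A histogram of the same vector, drawn with hist(scores), is a quick way to look at the shape of the
data.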
Next comes data mining, which includes data pre-processing (data cleaning, data transformation),
modelling, etc.
As for R, I have already introduced it above. Python is another wonderful programming language
for data analysis, with many packages such as pandas and scikit-learn, and matplotlib for
visualization. R and Python are the two stars preferred by data analysts. Each has its own
strengths and weaknesses.
By distributed computing, I mean HADOOP technology, which is used mainly for the storage and
processing of big data. As data volume and variety have kept increasing over time, distributed
computing has come into the limelight with the Hadoop ecosystem, which is often simply called big
data technology. Nowadays Hadoop has almost become a synonym for big data.
Steps involved:
The actual session starts here. Make sure that the environment is ready. I have explained the
steps to be followed in detail.
Step-1: PROBLEM STATEMENT
You should be very clear about the given problem statement and what you are expected to do.
Ask yourself what problem you have, and whether the data given is sufficient to solve it.
Step-2: DATA PREPROCESSING
This is a very important process that a data analyst goes through. Initially you should
collect the required data. First set the working directory to the folder where the file is located
using setwd(). Then use one of the following functions to read the file, depending on its format:
• read.csv()
• read.table()
• read.xlsx() (provided by an add-on package such as xlsx)
• for XML, do the following:
library(XML)
doc <- xmlTreeParse(fileUrl, useInternal = TRUE)
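As a small illustration of the reading functions listed above, the sketch below writes a tiny CSV
file to a temporary location and reads it back with read.csv() and read.table(). The file contents
and the variable names are made up for the example; with a real project you would point the
functions at your own file instead.

```r
# Write a tiny, made-up CSV file to a temporary path so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,city",
             "1,34,Chennai",
             "2,28,Pune",
             "3,45,Delhi"),
           csv_path)

# read.csv() assumes a header row and comma separators by default.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table() is the general form: header and separator are stated explicitly.
from_table <- read.table(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Inspect the column names and types of the loaded data frame.
str(from_csv)
```

Both calls return a data frame, so the result can go straight into the cleaning steps.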
Then convert the loaded data to a data frame using data.frame() to make manipulation easy. Next
comes data cleaning: to find missing values you can use is.na(), and to remove them you can use
na.omit() or na.exclude().
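Here is a minimal sketch of the cleaning step, using a made-up data frame named patients. Filling
the gaps with the column mean is only one possible alternative to dropping rows and is my own
addition for illustration; whether it is appropriate depends on your data.

```r
# Hypothetical data frame with a few missing values (NA).
patients <- data.frame(
  id     = 1:5,
  age    = c(34, NA, 45, 51, NA),
  weight = c(70.5, 82.0, NA, 64.3, 90.1)
)

# is.na() gives a logical matrix; colSums() counts the missing values per column.
na_per_column <- colSums(is.na(patients))

# na.omit() drops every row that contains at least one NA.
complete_rows <- na.omit(patients)

# Alternative to dropping rows: fill missing ages with the mean of the observed ages.
imputed <- patients
imputed$age[is.na(imputed$age)] <- mean(imputed$age, na.rm = TRUE)

print(na_per_column)
print(complete_rows)
```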
Next is data transformation. Here we have type conversion, which can be done with as.numeric(),
as.double(), as.factor(), etc. Normalization and standardization also come under data
transformation.
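The transformations above can be sketched as follows. The raw data frame is made up for the
example. I have taken normalization to mean min-max scaling to the range 0 to 1 and standardization
to mean z-scores, which are the usual meanings but still an assumption about what is intended here.

```r
# Hypothetical raw data where every column was read in as text.
raw <- data.frame(
  height = c("150", "160", "170", "180"),
  group  = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Type conversion: text to number, text to categorical factor.
raw$height <- as.numeric(raw$height)
raw$group  <- as.factor(raw$group)

# Normalization (min-max): rescale the values to the range 0 to 1.
height_norm <- (raw$height - min(raw$height)) / (max(raw$height) - min(raw$height))

# Standardization (z-score): mean 0 and standard deviation 1.
# scale() returns a one-column matrix, so as.numeric() flattens it to a vector.
height_std <- as.numeric(scale(raw$height))

print(height_norm)
print(height_std)
```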
Once preprocessing is done, roughly 60% of the work is over.
Step-3: POPULATION AND SAMPLE
Before getting into this, load the necessary packages using library("package name"), e.g.
library("caret"), library("class"). And don't forget to initialize the seed value with set.seed()
so that the random split is reproducible. Coming to the point, it is very important to split the
given dataset into training and testing data. The training data is the sample that stands in for
the population when the model is built. The testing data should be used only to test the model;
until then you should not touch it. The model is built using only the training data.
This can be done by many methods; here I have used createDataPartition(), a function available in
the caret package.
index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE, ...)
where, y - the outcome (response) variable, given as a vector
times - number of partitions to create
p - proportion of the data that goes into the training set
list - logical - should the results be in a list (TRUE) or a matrix with the number of rows
equal to floor(p * length(y)) and times columns. With list = FALSE the result can be used
directly as a row index.
training_data <- dataframe[index,]
testing_data <- dataframe[-index,]
Now the data is partitioned into training and testing sets, and the model is ready to be trained.
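As a minimal sketch of the split, the example below uses R's built-in iris data set with Species as
the outcome. The data set, the seed value, and p = 0.7 are my own choices for illustration. It
assumes the caret package is installed.

```r
library(caret)

# Fix the seed so the random split is the same on every run.
set.seed(42)

data(iris)

# Stratified split on the outcome: about 70% of each species goes to training.
# list = FALSE returns a one-column matrix of row numbers.
index <- createDataPartition(iris$Species, times = 1, p = 0.7, list = FALSE)

training_data <- iris[index, ]
testing_data  <- iris[-index, ]

# Check the sizes and the class balance of the two sets.
print(dim(training_data))
print(dim(testing_data))
print(table(training_data$Species))
```

Because the sampling is done within each class, the training set keeps the same class proportions
as the full data.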
Step-4: DATA MODELING
To train the model we can use the train() function available in the caret package.
model_trained <- train(x, y, method = "rf", preProcess = NULL, ...)
Here y is the outcome (dependent) variable and x (x1 to xn) holds the predictor (independent)
variables. Besides rf (random forest) there are many other methods, such as glm (generalized
linear model). Refer to http://caret.r-forge.r-project.org/bytag.html to learn more about the
models. Each model has its own restrictions.
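Here is a small sketch of the training call on the iris data. Note that I have swapped in
method = "knn" instead of "rf" so that the example needs no extra model package beyond caret, and
the 5-fold cross-validation and the centering/scaling are my own choices, not requirements. It
assumes caret is installed.

```r
library(caret)

set.seed(123)
data(iris)

# Build the training set as in Step-3.
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training_data <- iris[index, ]

# x holds the predictor columns, y the outcome.
x <- training_data[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y <- training_data$Species

# Train a k-nearest-neighbours classifier.
# caret tries a few values of k and keeps the one with the best cross-validated accuracy.
model_trained <- train(x, y,
                       method     = "knn",
                       preProcess = c("center", "scale"),
                       trControl  = trainControl(method = "cv", number = 5))

# Show the resampling results and the chosen value of k.
print(model_trained)
```

Replacing "knn" with "rf" should work the same way once the randomForest package is installed.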
Step-5: PREDICTION
Once the model has been trained, we can use it to make predictions on the testing data with the
predict() function.
predicted_model <- predict(model_trained, testing_data)
You can also check how well your model classifies. confusionMatrix() does this by comparing
the predicted classes with the actual classes.
check <- confusionMatrix(predicted_model, testing_data$y)
Here testing_data$y stands for the actual outcome column of the testing data. In other words, the
confusion matrix checks the trained model against the testing data, which shows how the model is
likely to work on data it has not seen.
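The prediction step can be sketched end to end as below, again on the iris data. To keep it quick I
fit a single k-nearest-neighbours model with k = 5 and no resampling; the data set, the model, and
the value of k are my own assumptions for illustration. It assumes caret is installed.

```r
library(caret)

set.seed(123)
data(iris)

# Split the data as in Step-3.
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training_data <- iris[index, ]
testing_data  <- iris[-index, ]
predictors    <- setdiff(names(iris), "Species")

# Fit one fixed model on the training data only (no tuning, no resampling).
model_trained <- train(training_data[, predictors], training_data$Species,
                       method    = "knn",
                       tuneGrid  = data.frame(k = 5),
                       trControl = trainControl(method = "none"))

# Predict the classes of the unseen testing rows.
predicted_model <- predict(model_trained, testing_data[, predictors])

# First argument: predicted classes. Second argument: the true classes.
check <- confusionMatrix(data = predicted_model, reference = testing_data$Species)

print(check$table)                # predicted vs. actual counts
print(check$overall["Accuracy"])  # share of correct predictions
```

The diagonal of the table holds the correctly classified rows; everything off the diagonal is a
misclassification.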
Step-6: PLOTS
The next step involves plotting. You can use plot(), along with boxplot() and hist(), to draw a
box plot, scatter plot, histogram, or whatever the requirement calls for. As the saying goes, "a
picture is worth a thousand words", so use plots to describe your results.
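A short sketch of the three plot types, using the built-in iris data (my choice for illustration).
The plots are written to a temporary PDF file so the code also runs without a display; in an
interactive session you can drop the pdf() and dev.off() lines.

```r
data(iris)

# Send the plots to a temporary PDF file, one plot per page.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatter plot: petal length against petal width, coloured by species.
plot(iris$Petal.Length, iris$Petal.Width,
     col  = as.integer(iris$Species),
     xlab = "Petal length", ylab = "Petal width",
     main = "Scatter plot")

# Box plot: distribution of sepal length within each species.
boxplot(Sepal.Length ~ Species, data = iris, main = "Box plot")

# Histogram: overall distribution of sepal length.
hist(iris$Sepal.Length, xlab = "Sepal length", main = "Histogram")

# Close the device so the file is written out completely.
invisible(dev.off())
```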
Step-7: REPORT
Finally, for report submission you can use R Markdown, where the file is saved with the
extension .Rmd. To use R Markdown, check that the required packages (for example rmarkdown) are
installed.
--Thank You--
