- Objectives
- Contents:
• Introduction to R
• Implementation of R integration with Hadoop
• When to use R in combination with Hadoop
• Examples using Hadoop
- Q&A
- References
Security Classification: Internal
Objectives
• Understand R
• Understand when to use R in combination with Hadoop
• Understand the implementation of integration
R integration with Hadoop
• Software for Statistical Data Analysis
• Based on S
• Programming Environment
• Interpreted Language
• Data Storage, Analysis, Graphing
• Free and Open Source Software
• Free and Open Source
• Strong User Community
• Highly extensible, flexible
• Implementation of high end statistical methods
• Flexible graphics and intelligent defaults
But ..
• Steep learning curve
• Slow for large datasets
• Use Hadoop to execute R code
• Use R to access data stored in Hadoop
No. | Factor | Mantra | Guideline
1 | R's natural strength | Use R for statistical computing | Consider integrating when your project can be solved using code available in R, or when it is not easily solved in other languages.
2 | Hadoop's natural strength | Use Hadoop for distributed storage and batch computing | Consider integrating when your problem requires lots of storage or when it could benefit from parallelization.
3 | Coding effort | Work smart, not hard | R and Hadoop are tools, not panaceas. Consider not integrating if it is easier to solve your problem with other tools.
4 | Processing time | Work smart, not hard | Although some problems can benefit from parallelization, consider not integrating if the gains are negligible, since this helps you reduce the complexity of your project.
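One way to judge whether the gains from parallelization are negligible is a rough upper-bound estimate. The sketch below uses Amdahl's law, which is an assumption brought in here for illustration and is not part of the original slides; the function name `amdahl_speedup` and the sample fractions are hypothetical. Python is used only as a neutral language for the arithmetic.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel.

    parallel_fraction: share of the single-machine run time that can be
                       spread across workers (0.0 to 1.0).
    workers:           number of parallel workers (for example, cluster nodes).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is never sped up; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fractions, chosen only to show how the bound behaves.
    for fraction in (0.5, 0.9, 0.99):
        for workers in (4, 16, 64):
            bound = amdahl_speedup(fraction, workers)
            print(f"parallel={fraction:.2f} workers={workers:>2} speedup<={bound:.1f}x")
```

The bound ignores job start-up and data-transfer costs, so real gains on a cluster are typically lower. If even this optimistic figure is small, the added complexity of integration is hard to justify.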
No. | Scenario | Use R/Hadoop? | Why? | Example
1 | Analyzing small data stored in Hadoop | Yes | R can quickly download the data and analyze it locally. | Analyzing summary datasets derived from MapReduce jobs run in Hadoop.
2 | Extracting complex features from large data stored in Hadoop | Yes | R has more built-in and contributed functions that analyze data than many standard programming languages. | R is a natural language in which to write an algorithm or classifier that extracts information about objects contained in images.
3 | Applying prediction and classification models to datasets | Yes | R is better at modeling than many standard programming languages. | Using a logistic regression model to generate predictions for a large dataset.
4 | Implementing an "iteration-based" machine learning algorithm | Maybe | 1) Other languages may be faster than R for your analysis. 2) Hadoop reads and writes a lot of data to disk; other "big data" tools, like Spark (and SparkR), are designed for speed in these scenarios by working in memory. | Training a k-means clustering algorithm or logistic regression on a large dataset.
5 | Simple preprocessing of large data stored in Hadoop | No | Standard programming languages are much faster than R at executing many basic text and image processing tasks. | Preprocessing tweets for use in a natural language processing project.
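The "Maybe" for iteration-based algorithms is easier to see with a small example. The sketch below is a local, single-process illustration in plain Python, not Hadoop or R code; the function names (`assign_map`, `mean_reduce`, `kmeans_pass`, `kmeans`) and the one-dimensional data are hypothetical. Each call to `kmeans_pass` stands in for one complete map and reduce job, which on Hadoop would read the whole input from disk again.

```python
from collections import defaultdict


def assign_map(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: abs(point - centers[i]))
    return nearest, point


def mean_reduce(points):
    """Reduce step: the new center is the mean of the points assigned to it."""
    return sum(points) / len(points)


def kmeans_pass(points, centers):
    """One full pass over the data; stands in for a single MapReduce job."""
    groups = defaultdict(list)
    for point in points:  # on Hadoop this would be a full scan of the input
        index, value = assign_map(point, centers)
        groups[index].append(value)
    # A center that attracted no points keeps its previous position.
    return [
        mean_reduce(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iterations=20, tolerance=1e-9):
    """Repeat passes until the centers stop moving; also count the passes."""
    passes = 0
    for _ in range(max_iterations):
        new_centers = kmeans_pass(points, centers)
        passes += 1
        shift = max(abs(new - old) for new, old in zip(new_centers, centers))
        centers = new_centers
        if shift <= tolerance:
            break
    return centers, passes


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    final_centers, passes = kmeans(data, [0.0, 5.0])
    print(final_centers, "after", passes, "passes over the data")
```

The number of passes is the point of interest: every pass is a separate job with its own start-up and disk I/O, which is why in-memory tools are often preferred for this kind of workload.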
rhdfs:
• Manipulate HDFS directly from R
• Mimic as much of the HDFS Java API as possible
• Examples:
– Read an HDFS text file into a data frame
– Serialize/Deserialize a model to HDFS
– Write an HDFS file to local storage
• rhdfs/pkg/inst/unitTests
• rhdfs/pkg/inst/examples
rhbase:
• Manipulate HBase tables and their content
• Uses the Thrift C++ API as the mechanism to communicate with HBase
• Examples:
– Create a data frame from a collection of rows and columns in an HBase table
– Update an HBase table with values from a data frame
rmr:
• Designed to be the simplest and most elegant way to write MapReduce programs
• Gives the R programmer the tools necessary to perform data analysis in an R-like way
• Provides an abstraction layer to hide the implementation details
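To make the idea of an abstraction layer over MapReduce concrete, here is a minimal local sketch of the programming model itself. It is plain Python running in one process, not rmr's actual API; `run_mapreduce`, `word_count_map` and `word_count_reduce` are hypothetical names. The user supplies only a map function and a reduce function, and the helper takes care of grouping values by key, which is the part a real framework would distribute across a cluster.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # the "shuffle": collect values per key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Map step: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    """Reduce step: add up the counts collected for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["R and Hadoop", "R on Hadoop"]
    print(run_mapreduce(lines, word_count_map, word_count_reduce))
```

Only the two small functions describe the analysis. Everything about how the data is moved and grouped stays inside the helper, which is the kind of detail an abstraction layer is meant to hide.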
References
- http://cran.r-project.org
- http://revolutionanalytics.com
- Hadoop For Dummies
- R – a brief introduction, Gilberto Câmara
R Hadoop integration

Editor's Notes

  1. R is software that provides a programming environment for statistical data analysis. It was written by Robert Gentleman and Ross Ihaka, and its name comes from the first letter of its creators' first names. It is a free implementation of S, another popular statistical language. R can be used effectively for data storage, data analysis, and a variety of graphing functions. It is distributed free of charge and is open source software.
  2. R is great software. It is freely distributed (free both in price and in freedom of use, with no restrictions). It has a very strong user community that is ready to help newcomers and share information, and it has extensive documentation. Best of all, it covers the full range of statistical methods, from very low end to very high end, and all of them can be implemented in R. The graphics of R are very flexible, with many intelligent defaults, meaning R can guess what you are trying to do and act accordingly. On the downside, it can be time-consuming to learn to use R effectively; the learning process is slow and sometimes frustrating, but in the end it is a rewarding experience. R can also be slow for very large datasets, although there are several ways to speed it up. Newer versions are often faster than older ones, so keeping the software up to date is one way to speed things up.