Big data analytics using R

•Download as PPTX, PDF•

3 likes•1,332 views

Karthik Padmanabhan ( MLE℠)

This presentation outlines the usage of the open source software (R) in Analytics using the Big Data.

Data & Analytics

Big Data Practice – Current Limitations
• There are hundreds of Vendors in the Big Data
Space with each having its own limitations/strengths.
So it becomes very hard to learn multiple software for
each of the tasks.
• Also connecting these individual systems using
customized connectors becomes a big challenge.
• The main deterrent is the steep learning curve behind
these technologies and hence no human resources
can be found for implementation projects.

Points in Favor of R
• R is open source and has a large community behind it
working on and coming up with lot of innovation.
• It has close to 5000 packages and the count is increasing
exponentially.
• There is a specific research community working on using
R for Big Data Analytics.
• Companies like Revolutionary Analytics are coming up
with innovative approaches in customizing and using R for
handling massive datasets.

Points Against R
• R is a single threaded programming language with
limited memory management capabilities as it uses
the system memory.
• In order to use R for handling massive datasets it has
to scale up both in processing and memory
management capabilities.
• We will look into these aspects and some specific
strategies/approaches to circumvent these issues.

Verdict – Provide a chance to R.
• We will suggest some new ways/techniques in R to
handle the Big data though lot of these are in
research stages and not in production yet.
• However companies and research communities are
in high hopes that the current limitations will be
addressed and R can be used as a single go to
software offering integrated End to End capabilities in
this space.

Approaches to address R current
limitations
• We can spread the work to be handled in parallel by
running in multiple CPU’s or running in clusters.
• There are some latest packages in R which would be
of interest to explore in this direction.
• We have described some of these in the next slides.

R Packages for Big Data
• Snow
The package snow (an acronym for Simple Network of Workstations)
provides a high-level interface for using a workstation cluster for
parallel computations in R.
• Multicore
This package provides functions for parallel execution of R code on
machines with multiple cores or CPUs. Unlike other parallel processing
methods all jobs share the full state of R when spawned, so no data or
code needs to be initialized. The actual spawning is very fast as well
since no new R instance needs to be started.
• Parallel
Includes a parallel random number generator (RNG); important for
simulations. Particularly suitable for ’single program, multiple data’
(SPMD) problems.

R Packages for Big Data
• scaleR Algorithms (Revolutionary Analytics)
ScaleR algorithms enable R developers to run R scripts on massive
data sets at high speeds. ScaleR enables R developers to easily
maximize compute capability without writing any distributed
applications themselves.
• Rhipe
Rhipe is a software package that allows the R user to create
MapReduce jobs that work entirely within the R environment using R
expressions.
• Seague
We can peform simple parallel processing with a fast & easy setup on
AWS Elastic Map Reduce. So its like performing the parallel computing
directly on top of cloud. Segue has a simple goal: Parallel functionality
in R; two lines of code; in under 15 minutes.

Sample Applications
• Social media mining with R (Ex: Mining Twitter streams)
• WebCrawling using R with the RCurl package. Also
Apache Nutch will be another option.
• Social Network Analysis (SNA) with graph data.

Your Views/Queries
• Please email it to karinfoin@yahoo.com
• Thank you

What's hot

R Programming Overview dlamb3244

High Performance Predictive Analytics in R and HadoopRevolution Analytics

Intro to R for SAS and SPSS User WebinarRevolution Analytics

Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics

Why R? A Brief Introduction to the Open Source Statistics PlatformSyracuse University

Are You Ready for Big Data Big Analytics? Revolution Analytics

Webinar : Introduction to R Programming and Machine LearningEdureka!

R programming Language , Rahul SinghRavi Basil

R and Big Data using Revolution R Enterprise with HadoopRevolution Analytics

R and-hadoopBryan Downing

High Performance Predictive Analytics in R and HadoopDataWorks Summit

R tutorialRichard Vidgen

In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics

R reproducibilityRevolution Analytics

The R EcosystemRevolution Analytics

Taking R Analytics to SQL and the CloudRevolution Analytics

Data Analytics DomainMultisoft Virtual Academy

Big data analytics on teradata with revolution r enterprise bill jacobsBill Jacobs

12Nov13 Webinar: Big Data Analysis with Teradata and Revolution AnalyticsRevolution Analytics

Hadoop - A Very Short Introductiondewang_mistry

What's hot (20)

R Programming Overview

High Performance Predictive Analytics in R and Hadoop

Intro to R for SAS and SPSS User Webinar

Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...

Why R? A Brief Introduction to the Open Source Statistics Platform

Are You Ready for Big Data Big Analytics?

Webinar : Introduction to R Programming and Machine Learning

R programming Language , Rahul Singh

R and Big Data using Revolution R Enterprise with Hadoop

R and-hadoop

High Performance Predictive Analytics in R and Hadoop

R tutorial

In-Database Analytics Deep Dive with Teradata and Revolution

R reproducibility

The R Ecosystem

Taking R Analytics to SQL and the Cloud

Data Analytics Domain

Big data analytics on teradata with revolution r enterprise bill jacobs

12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics

Hadoop - A Very Short Introduction

Viewers also liked

Data Analytics using Rrichards9696

R language tutorialDavid Chiu

Introduction to hadoopkarthika karthi

Jsm big-dataSean Taylor

Example R usage for oracle DBA UKOUG 2013BertrandDrouvot

Python for Data Anaysis第２回勉強会４，５章Makoto Kawano

Introduction To RMichael Driscoll

Intoroduction of Pandas with PythonAtsushi Hayakawa

Data Science and Big Data Analytics Book from EMC Education ServicesEMC

Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Jeffrey Breen

RHadoopPraveen Kumar Donta

A Workshop on RAjay Ohri

Presentation R basic teaching moduleSander Timmer

Big Data: tools and techniques for working with large data setsBoston Consulting Group

Tools and techniques for data scienceAjay Ohri

Data analysis with RShareThis

Language RGirish Khanzode

Big Data Tutorial - Marko Grobelnik - 25 May 2012Marko Grobelnik

Statistics for data scientistsAjay Ohri

Python for R UsersAjay Ohri

Viewers also liked (20)

Data Analytics using R

R language tutorial

Introduction to hadoop

Jsm big-data

Example R usage for oracle DBA UKOUG 2013

Python for Data Anaysis第２回勉強会４，５章

Introduction To R

Intoroduction of Pandas with Python

Data Science and Big Data Analytics Book from EMC Education Services

Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

RHadoop

A Workshop on R

Presentation R basic teaching module

Big Data: tools and techniques for working with large data sets

Tools and techniques for data science

Data analysis with R

Language R

Big Data Tutorial - Marko Grobelnik - 25 May 2012

Statistics for data scientists

Python for R Users

Similar to Big data analytics using R

Analytics Beyond RAM Capacity using RAlex Palamides

Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics

An introduction to R is a document usefulssuser3c3f88

Moving From SAS to R Webinar Presentation - 07Aug14Revolution Analytics

R programming presentationAkshat Sharma

Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance

Michal Marušan: Scalable RGapData Institute

Open source analyticsAjay Ohri

Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Data Con LA

Is Spark the right choice for data analysis ?Ahmed Kamal

Map reducecloudtechJakir Hossain

R_L1-Aug-2022.pptxShantilalBhayal1

Big data Big AnalyticsAjay Ohri

IJSRED-V2I3P43IJSRED

Introduction to Microsoft R ServicesGregg Barrett

Analyzing Big data in R and Scala using Apache Spark 17-7-19Ahmed Elsayed

R as supporting tool for analytics and simulationAlvaro Gil

Big data: Descoberta de conhecimento em ambientes de big data e computação na...Rio Info

Scalable Data Analysis in R -- Lee EdlefsenRevolution Analytics

Data Science - Part II - Working with R & R studioDerek Kane

Similar to Big data analytics using R (20)

Analytics Beyond RAM Capacity using R

Performance and Scale Options for R with Hadoop: A comparison of potential ar...

An introduction to R is a document useful

Moving From SAS to R Webinar Presentation - 07Aug14

R programming presentation

Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...

Michal Marušan: Scalable R

Open source analytics

Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...

Is Spark the right choice for data analysis ?

Map reducecloudtech

R_L1-Aug-2022.pptx

Big data Big Analytics

IJSRED-V2I3P43

Introduction to Microsoft R Services

Analyzing Big data in R and Scala using Apache Spark 17-7-19

R as supporting tool for analytics and simulation

Big data: Descoberta de conhecimento em ambientes de big data e computação na...

Scalable Data Analysis in R -- Lee Edlefsen

Data Science - Part II - Working with R & R studio

Recently uploaded

办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort

How we prevented account sharing with MFAAndrei Kaleshka

Industrialised data - the key to AI success.pdfLars Albertsson

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh

GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch

Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss

Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna

办理学位证纽约大学毕业证(NYU毕业证书）原版一比一fhwihughh

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa

办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408

04242024_CCC TUG_Joins and Relationshipsccctableauusergroup

Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda

Recently uploaded (20)

办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service

How we prevented account sharing with MFA

Industrialised data - the key to AI success.pdf

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...

GA4 Without Cookies [Measure Camp AMS]

Dubai Call Girls Wifey O52&786472 Call Girls Dubai

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...

办理学位证纽约大学毕业证(NYU毕业证书）原版一比一

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf

办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps

04242024_CCC TUG_Joins and Relationships

Customer Service Analytics - Make Sense of All Your Data.pptx

Big data analytics using R

1. USING “R” IN BIG DATA ANALYTICS Author – Karthik Padmanabhan Designation – Deputy Manager Ford Motor Company

2. Big Data Practice – Current Limitations • There are hundreds of Vendors in the Big Data Space with each having its own limitations/strengths. So it becomes very hard to learn multiple software for each of the tasks. • Also connecting these individual systems using customized connectors becomes a big challenge. • The main deterrent is the steep learning curve behind these technologies and hence no human resources can be found for implementation projects.

3. Points in Favor of R • R is open source and has a large community behind it working on and coming up with lot of innovation. • It has close to 5000 packages and the count is increasing exponentially. • There is a specific research community working on using R for Big Data Analytics. • Companies like Revolutionary Analytics are coming up with innovative approaches in customizing and using R for handling massive datasets.

4. Points Against R • R is a single threaded programming language with limited memory management capabilities as it uses the system memory. • In order to use R for handling massive datasets it has to scale up both in processing and memory management capabilities. • We will look into these aspects and some specific strategies/approaches to circumvent these issues.

5. Verdict – Provide a chance to R. • We will suggest some new ways/techniques in R to handle the Big data though lot of these are in research stages and not in production yet. • However companies and research communities are in high hopes that the current limitations will be addressed and R can be used as a single go to software offering integrated End to End capabilities in this space.

6. Approaches to address R current limitations • We can spread the work to be handled in parallel by running in multiple CPU’s or running in clusters. • There are some latest packages in R which would be of interest to explore in this direction. • We have described some of these in the next slides.

7. R Packages for Big Data • Snow The package snow (an acronym for Simple Network of Workstations) provides a high-level interface for using a workstation cluster for parallel computations in R. • Multicore This package provides functions for parallel execution of R code on machines with multiple cores or CPUs. Unlike other parallel processing methods all jobs share the full state of R when spawned, so no data or code needs to be initialized. The actual spawning is very fast as well since no new R instance needs to be started. • Parallel Includes a parallel random number generator (RNG); important for simulations. Particularly suitable for ’single program, multiple data’ (SPMD) problems.

8. R Packages for Big Data • scaleR Algorithms (Revolutionary Analytics) ScaleR algorithms enable R developers to run R scripts on massive data sets at high speeds. ScaleR enables R developers to easily maximize compute capability without writing any distributed applications themselves. • Rhipe Rhipe is a software package that allows the R user to create MapReduce jobs that work entirely within the R environment using R expressions. • Seague We can peform simple parallel processing with a fast & easy setup on AWS Elastic Map Reduce. So its like performing the parallel computing directly on top of cloud. Segue has a simple goal: Parallel functionality in R; two lines of code; in under 15 minutes.

9. Sample Applications • Social media mining with R (Ex: Mining Twitter streams) • WebCrawling using R with the RCurl package. Also Apache Nutch will be another option. • Social Network Analysis (SNA) with graph data.

10. Your Views/Queries • Please email it to karinfoin@yahoo.com • Thank you

Big data analytics using R

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Big data analytics using R

Similar to Big data analytics using R (20)

Recently uploaded

Recently uploaded (20)

Big data analytics using R