SlideShare a Scribd company logo
1 of 8
GANDHINAGAR INSTITUTE OF TECHNOLGY
Information Technology Department
RDD Transformations
Presented By:-Shaishav Shah
Student ID: GIT_IT_B_21
Guided By
Prof. Pooja Shah
BDA (2171607)
What is RDD?
• RDD means Resilient distributed dataset.
• Spark revolves around the concept of RDD which is a fault-
tolerant collection of elements that can be operated in parallel.
• There are two ways to create RDDs, it can be created by
parallelizing an existing collection in your driver program, or
referencing a dataset in an external storage system such as
(HDFS, Hbase, or any datasource offering Hadoop format)
RDDs & its Operations:-
• There are basically two types of RDDs operations in spark.
1. Transformations.
2. Actions.
Transformations
• The RDD transformations are some functions that takes one
RDD as input and form one or more than one RDD as an
output .
• As all RDDs are immutable then the main RDD will not be
changed.
• It is lazy operation though it creates some RDDs but they can
executes when an action is called.
Types of RDD Transformation:
• To improve the computation performance, we can set some
transformations as pipelined. It helps to optimize process.
• There are two kinds of transformations:
1. Narrow Transformation
2. Wide Transformation
Narrow Transformation
• Narrow transformations are
generated as a result of
Map, Filter or these kind of
operations
• It originates from a single
partition in a parent RDD .
Only some partitions are
used to find result.
Wide Transformation
• Wide Transformations are
generated as a result of
GroupBykey(),
ReduceBykey() or these kind
of operations.
• In these case to form a data
partition, it can take data from
more than one partitions.
• It is also known as shuffle
partition.
Thank You

More Related Content

Similar to Rdd transformations bda

WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? nakshatraL
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaAtif Akhtar
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxRahul Borate
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014cdmaxime
 
JDG 7 & Spark Integration
JDG 7 & Spark IntegrationJDG 7 & Spark Integration
JDG 7 & Spark IntegrationTed Won
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working setsJinxinTang
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Juan Pedro Moreno
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKTaposh Roy
 

Similar to Rdd transformations bda (20)

WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS?
 
Spark
SparkSpark
Spark
 
SPARK ARCHITECTURE
SPARK ARCHITECTURESPARK ARCHITECTURE
SPARK ARCHITECTURE
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and Scala
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
 
JDG 7 & Spark Integration
JDG 7 & Spark IntegrationJDG 7 & Spark Integration
JDG 7 & Spark Integration
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working sets
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARK
 

More from ShaishavShah8

Diffie hellman key algorithm
Diffie hellman key algorithmDiffie hellman key algorithm
Diffie hellman key algorithmShaishavShah8
 
Clipping computer graphics
Clipping  computer graphicsClipping  computer graphics
Clipping computer graphicsShaishavShah8
 
Classification of debuggers sp
Classification of debuggers spClassification of debuggers sp
Classification of debuggers spShaishavShah8
 
Parallel and perspective projection in 3 d cg
Parallel and perspective projection in 3 d cgParallel and perspective projection in 3 d cg
Parallel and perspective projection in 3 d cgShaishavShah8
 
Asymptotic notations ada
Asymptotic notations adaAsymptotic notations ada
Asymptotic notations adaShaishavShah8
 
Classical cyphers python programming
Classical cyphers python programmingClassical cyphers python programming
Classical cyphers python programmingShaishavShah8
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiShaishavShah8
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbiShaishavShah8
 
Introduction to xml, uses of xml wt
Introduction to xml, uses of xml wtIntroduction to xml, uses of xml wt
Introduction to xml, uses of xml wtShaishavShah8
 
Applications of huffman coding dcdr
Applications of huffman coding dcdrApplications of huffman coding dcdr
Applications of huffman coding dcdrShaishavShah8
 
Cookie management using jsp a java
Cookie management using jsp  a javaCookie management using jsp  a java
Cookie management using jsp a javaShaishavShah8
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouseShaishavShah8
 

More from ShaishavShah8 (18)

Diffie hellman key algorithm
Diffie hellman key algorithmDiffie hellman key algorithm
Diffie hellman key algorithm
 
Constructor oopj
Constructor oopjConstructor oopj
Constructor oopj
 
Clipping computer graphics
Clipping  computer graphicsClipping  computer graphics
Clipping computer graphics
 
Classification of debuggers sp
Classification of debuggers spClassification of debuggers sp
Classification of debuggers sp
 
Parallel and perspective projection in 3 d cg
Parallel and perspective projection in 3 d cgParallel and perspective projection in 3 d cg
Parallel and perspective projection in 3 d cg
 
Asymptotic notations ada
Asymptotic notations adaAsymptotic notations ada
Asymptotic notations ada
 
Arrays in java oopj
Arrays in java oopjArrays in java oopj
Arrays in java oopj
 
Classical cyphers python programming
Classical cyphers python programmingClassical cyphers python programming
Classical cyphers python programming
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-ai
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbi
 
Lan, wan, man mcwc
Lan, wan, man mcwcLan, wan, man mcwc
Lan, wan, man mcwc
 
Introduction to xml, uses of xml wt
Introduction to xml, uses of xml wtIntroduction to xml, uses of xml wt
Introduction to xml, uses of xml wt
 
Agile process se
Agile process seAgile process se
Agile process se
 
Applications of huffman coding dcdr
Applications of huffman coding dcdrApplications of huffman coding dcdr
Applications of huffman coding dcdr
 
Cookie management using jsp a java
Cookie management using jsp  a javaCookie management using jsp  a java
Cookie management using jsp a java
 
Login control .net
Login control .netLogin control .net
Login control .net
 
LAN, WAN, MAN
LAN, WAN, MANLAN, WAN, MAN
LAN, WAN, MAN
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Rdd transformations bda

  • 1. GANDHINAGAR INSTITUTE OF TECHNOLGY Information Technology Department RDD Transformations Presented By:-Shaishav Shah Student ID: GIT_IT_B_21 Guided By Prof. Pooja Shah BDA (2171607)
  • 2. What is RDD? • RDD means Resilient distributed dataset. • Spark revolves around the concept of RDD which is a fault- tolerant collection of elements that can be operated in parallel. • There are two ways to create RDDs, it can be created by parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system such as (HDFS, Hbase, or any datasource offering Hadoop format)
  • 3. RDDs & its Operations:- • There are basically two types of RDDs operations in spark. 1. Transformations. 2. Actions.
  • 4. Transformations • The RDD transformations are some functions that takes one RDD as input and form one or more than one RDD as an output . • As all RDDs are immutable then the main RDD will not be changed. • It is lazy operation though it creates some RDDs but they can executes when an action is called.
  • 5. Types of RDD Transformation: • To improve the computation performance, we can set some transformations as pipelined. It helps to optimize process. • There are two kinds of transformations: 1. Narrow Transformation 2. Wide Transformation
  • 6. Narrow Transformation • Narrow transformations are generated as a result of Map, Filter or these kind of operations • It originates from a single partition in a parent RDD . Only some partitions are used to find result.
  • 7. Wide Transformation • Wide Transformations are generated as a result of GroupBykey(), ReduceBykey() or these kind of operations. • In these case to form a data partition, it can take data from more than one partitions. • It is also known as shuffle partition.