RDD Transformations (BDA)


This document discusses RDD transformations in Spark. An RDD (resilient distributed dataset) is a fault-tolerant collection of elements that can be operated on in parallel. RDDs support two types of operations: transformations and actions. Transformations are lazy operations that take an RDD as input and return one or more new RDDs, leaving the original RDD unchanged. A transformation is either narrow, where each output partition depends on a single parent partition, or wide, where an output partition draws data from multiple parent partitions and the data must be shuffled.


What is RDD?
• RDD stands for Resilient Distributed Dataset.
• Spark revolves around the concept of the RDD, a fault-tolerant collection of elements that can be operated on in parallel.
• There are two ways to create an RDD: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system (HDFS, HBase, or any data source offering a Hadoop InputFormat).
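The idea of parallelizing a driver-side collection can be sketched in plain Python. This is a toy model, not Spark's API: the `parallelize` and `sum_partition` helpers below are hypothetical names, and a thread pool stands in for cluster executors. In PySpark itself the corresponding entry points would be `sc.parallelize(...)` for a local collection and `sc.textFile(...)` for external storage.

```python
from concurrent.futures import ThreadPoolExecutor


def parallelize(data, num_partitions):
    """Split a local collection into roughly equal partitions (toy stand-in for an RDD)."""
    data = list(data)
    n = len(data)
    # Slice boundaries are chosen so partition sizes differ by at most one element.
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def sum_partition(partition):
    """Work done independently on one partition."""
    return sum(partition)


partitions = parallelize(range(1, 11), 3)

# Each partition is processed independently; the thread pool plays the role of executors.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(sum_partition, partitions))

# The driver combines the per-partition results.
total = sum(partial_sums)
print(partitions, partial_sums, total)
```

Because every partition can be processed without looking at the others, the per-partition work can run in parallel and only the small partial results need to be combined.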

RDDs and Their Operations:
• Spark supports two types of RDD operations:
1. Transformations
2. Actions

Transformations
• An RDD transformation is a function that takes an RDD as input and produces one or more new RDDs as output.
• Because RDDs are immutable, the input RDD is never changed.
• Transformations are lazy: calling one only defines a new RDD, and nothing is computed until an action is called.
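Immutability and laziness can be illustrated with a small plain-Python model. This is only a sketch under stated assumptions: `LazyRDD` is a hypothetical class written for this example, not part of Spark, and it mimics the behaviour of `map`, `filter` and the `collect` action on a single in-memory sequence.

```python
class LazyRDD:
    """Toy immutable dataset that records transformations instead of running them."""

    def __init__(self, source, ops=()):
        self._source = tuple(source)  # the underlying data is never modified
        self._ops = tuple(ops)        # recorded chain of transformations

    def map(self, fn):
        # Transformation: returns a new object and leaves self untouched.
        return LazyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded, not executed.
        return LazyRDD(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: only here are the recorded transformations actually executed.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # side effect that reveals when the function really runs
    return x * 2


base = LazyRDD([1, 2, 3, 4])
doubled = base.map(double)
big = doubled.filter(lambda x: x > 4)

calls_before_action = len(calls)  # still 0: nothing has been computed yet
result = big.collect()            # the action triggers the computation
print(calls_before_action, result, base.collect())
```

The `double` function is not invoked while the transformations are being chained, and `base` still yields its original elements afterwards.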

Types of RDD Transformation:
• To improve performance, Spark can pipeline consecutive narrow transformations, applying them to each partition in a single pass.
• There are two kinds of transformations:
1. Narrow Transformation
2. Wide Transformation

Narrow Transformation
• Narrow transformations result from operations such as map() and filter().
• Each output partition is computed from a single partition of the parent RDD, so only a limited subset of the parent's partitions is needed to compute each result and no data moves between partitions.
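A narrow dependency can be sketched in plain Python as a function applied to each partition on its own. The list-of-lists representation and the name `narrow_pipeline` are assumptions made for illustration; the squaring and even-number filter are arbitrary stand-ins for a map() followed by a filter().

```python
def narrow_pipeline(partition):
    """Apply a map step and then a filter step to one partition in a single pass."""
    squared = (x * x for x in partition)        # map step, evaluated lazily
    return [y for y in squared if y % 2 == 0]   # filter step consumes it directly


# Hypothetical parent dataset with three partitions.
parent = [[1, 2, 3], [4, 5], [6, 7, 8]]

# Output partition i is built only from parent partition i.
child = [narrow_pipeline(p) for p in parent]
print(child)
```

The number of partitions is unchanged and no element crosses a partition boundary, which is why the two steps can be fused into one pass over each partition.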

Wide Transformation
• Wide transformations result from operations such as groupByKey() and reduceByKey().
• In this case, the data needed to build a single output partition may reside in more than one partition of the parent RDD.
• They are also known as shuffle transformations, because data must be shuffled across partitions.
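The shuffle behind a reduceByKey()-style operation can be modelled in plain Python. This is a simplified sketch, not Spark's implementation: `partition_for` is a hypothetical deterministic partitioner chosen so the result is reproducible, and `sources` is an extra bookkeeping structure added here only to show which parent partitions feed each output partition.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def partition_for(key, num_out):
    """Deterministic toy partitioner (hypothetical; Spark uses its own hash partitioner)."""
    return sum(key.encode("utf-8")) % num_out


def reduce_by_key(parent, fn, num_out):
    """Shuffle (key, value) records by key, then reduce the values of each key."""
    buckets = [defaultdict(list) for _ in range(num_out)]
    sources = [set() for _ in range(num_out)]
    for idx, partition in enumerate(parent):
        for key, value in partition:
            out = partition_for(key, num_out)
            buckets[out][key].append(value)  # the record moves to its target partition
            sources[out].add(idx)            # remember which parent partition it came from
    child = [sorted((k, reduce(fn, vs)) for k, vs in b.items()) for b in buckets]
    return child, sources


# Hypothetical parent dataset: three partitions of (key, value) pairs.
parent = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]

child, sources = reduce_by_key(parent, add, num_out=2)
print(child)
print(sources)
```

In this run every output partition receives records from more than one parent partition, which is what distinguishes a wide dependency from a narrow one.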


1. Gandhinagar Institute of Technology, Information Technology Department. RDD Transformations. Presented by: Shaishav Shah (Student ID: GIT_IT_B_21). Guided by: Prof. Pooja Shah. BDA (2171607)

