Dr. Sandeep G. Deshmukh
Introduction to Big Data and Hadoop
Contents
❑ Big Data
❑ Distributed Systems
❑ Hadoop
➢ Hadoop Distributed File System (HDFS)
➢ MapReduce
Show of Hands
Introduction to Big Data
Big data is data that exceeds the processing capacity of conventional database systems.
The data is too big, moves too fast, or doesn't fit the strictures of your database architectures.
To gain value from this data, you must choose an alternative way to process it.
https://www.oreilly.com/ideas/what-is-big-data
Definition
Volume
Quantity of data
Data sets too large to store and analyze using traditional databases
Velocity
Speed at which data is generated
Speed at which data moves around and is analyzed
Analyze data while it is being generated, without first putting it into a database
Variety
Different types of data that we can use
Veracity
Messiness or trustworthiness of the data
Sheer volume can make up for lower quality
E.g. tweets with spelling mistakes and short words (u -> you, thr -> there)
Value
Getting value out of Big Data!
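To make the Velocity and Veracity points concrete, here is a minimal Python sketch (not from the original slides) that processes tweets one at a time as they arrive, updating word counts without storing the stream, and expands the messy abbreviations mentioned above. The abbreviation table, the function names and the whitespace-only tokenization are illustrative assumptions; a real pipeline would use a streaming engine and a far richer normalization step.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical abbreviation table, built only from the slide's examples.
ABBREVIATIONS = {"u": "you", "thr": "there"}


def normalize(tweet: str) -> list[str]:
    """Lower-case a tweet, split on whitespace and expand known abbreviations (veracity clean-up)."""
    return [ABBREVIATIONS.get(word, word) for word in tweet.lower().split()]


def running_word_counts(stream: Iterable[str]) -> Iterator[Counter]:
    """Update word counts record by record without keeping the records (velocity).

    Yields a snapshot copy of the counts after each tweet.
    """
    counts: Counter = Counter()
    for tweet in stream:
        counts.update(normalize(tweet))
        yield Counter(counts)


if __name__ == "__main__":
    tweets = ["C u thr", "See you there"]
    for snapshot in running_word_counts(tweets):
        print(dict(snapshot))
```

Because `running_word_counts` is a generator, only the current counts live in memory; the tweets themselves are never stored, which mirrors the idea of analyzing data while it is being generated.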
Definition
“Big data” is
high-volume, -velocity and -variety information assets
that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making
By Gartner
Definition
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.
Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.
The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
Wikipedia
Use Case: Big Data in Oil & Gas Drilling
http://analytics-magazine.org/images/stories/novdec12/big-data.jpg
Use Case: Uber - Pay Surge Pricing if Battery is Low
Further Reading
● A Brief History of Big Data Everyone Should Read
● Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity
● What is big data? - OpenSource.com
● What is big data? - O’Reilly
● 5 Big Data Use Cases To Watch
● Best Big Data Analytics Use Cases
● The 5 game changing big data use cases
● Big Data - The 5 Vs Everyone Must Know
● Top SlideShare Presentations on Big Data
Distributed Systems
Definition
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten van Steen, 2006
http://www.mypearsonstore.com/bookstore/distributed-systems-principles-and-paradigms-9780132392273?xid=PSED
Forms of Transparency in Distributed Systems
Transparency  Description
Access        Hide differences in data representation and how a resource is accessed
Location      Hide where a resource is located
Migration     Hide that a resource may move to another location
Relocation    Hide that a resource may be moved to another location while in use
Replication   Hide that a resource is replicated
Concurrency   Hide that a resource may be shared by several competitive users
Failure       Hide the failure and recovery of a resource
Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten van Steen, 2006
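Several of these transparency forms can be illustrated with a small, hypothetical Python sketch: a client-facing store that hides where data lives, that it is replicated, and that some copies may be down (location, replication and failure transparency). The `Replica` and `TransparentStore` classes are invented for this example and do not correspond to any particular system's API.

```python
import random


class ReplicaDown(Exception):
    """Raised by a replica that is currently unreachable."""


class Replica:
    """One hypothetical storage node holding a full copy of the data."""

    def __init__(self, name: str, data: dict[str, str], up: bool = True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key: str) -> str:
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class TransparentStore:
    """Client-facing facade: callers never see which replica answered or failed."""

    def __init__(self, replicas: list[Replica]):
        self.replicas = replicas

    def get(self, key: str) -> str:
        # Location and replication transparency: the caller never picks a node;
        # replicas are tried in a random order.
        for replica in random.sample(self.replicas, len(self.replicas)):
            try:
                return replica.read(key)
            except ReplicaDown:
                continue  # Failure transparency: silently fall back to another copy.
        raise RuntimeError("all replicas are unavailable")


if __name__ == "__main__":
    store = TransparentStore([
        Replica("node-a", {"k": "v"}, up=False),
        Replica("node-b", {"k": "v"}),
    ])
    print(store.get("k"))  # prints "v" even though node-a is down
```

Callers only ever call `get()`; which replica answered, and whether another one failed first, stays hidden. Only when every copy is unavailable does the failure become visible.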
● A distributed system consists of components (i.e., computers) that are autonomous.
● Users (be they people or programs) think they are dealing with a single system. This means that, one way or another, the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems.
Definition
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.
Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.
Wikipedia
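Because there is no global clock, components that coordinate by passing messages often order events with logical clocks instead. The following Python sketch shows Lamport logical clocks, one standard technique for this; it is not part of the original slides, and the `Process` class and its method names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """A component that keeps a Lamport logical clock instead of relying on a global clock."""

    name: str
    clock: int = 0
    log: list[str] = field(default_factory=list)

    def local_event(self, what: str) -> int:
        # Every event, including sends and receives, advances the local clock by one.
        self.clock += 1
        self.log.append(f"{self.name}@{self.clock}: {what}")
        return self.clock

    def send(self, what: str) -> tuple[str, int]:
        # Sending is an event; the message carries the sender's timestamp.
        return what, self.local_event(f"send {what}")

    def receive(self, message: tuple[str, int]) -> int:
        what, timestamp = message
        # Lamport rule: catch up to the sender's timestamp, then count the receive event.
        self.clock = max(self.clock, timestamp)
        return self.local_event(f"receive {what}")


if __name__ == "__main__":
    p, q = Process("P"), Process("Q")
    p.local_event("compute")  # P's clock becomes 1
    msg = p.send("hello")     # P's clock becomes 2; message stamped 2
    q.receive(msg)            # Q's clock becomes max(0, 2) + 1 = 3
    print(p.log + q.log)
```

The guarantee is one-directional: if event a causally precedes event b (for example, a send and its matching receive), then a's timestamp is smaller than b's. The converse does not hold, so Lamport timestamps alone cannot detect concurrent events.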
Further Reading
● Distributed Computing - Wikipedia
● Distributed computing
● Characteristics of distributed system
Miscellaneous Concepts
Big Data Primers: Size does matter
Big Data Primers: Vertical vs. Horizontal Scaling
(Figure: vertical scaling vs. horizontal scaling)
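The scaling trade-off can be made concrete with back-of-the-envelope arithmetic. The Python sketch below assumes 1 TB of data, a 100 MB/s sustained read rate per disk, and perfectly even splitting with no coordination, network or failure overhead; these numbers are illustrative assumptions, not measurements. Under those assumptions, doubling disk speed on one machine (vertical scaling) halves the scan time, while spreading the data over 100 commodity machines (horizontal scaling) cuts it by a factor of 100.

```python
def read_time_seconds(data_bytes: float, throughput_bytes_per_s: float, machines: int = 1) -> float:
    """Ideal time to scan data split evenly across `machines` independent disks.

    Assumes perfect linear speedup: no network, coordination or failure overhead.
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    return data_bytes / (throughput_bytes_per_s * machines)


if __name__ == "__main__":
    TB = 10**12
    MB = 10**6
    # Assumed figures for illustration: 1 TB of data, 100 MB/s sustained disk read.
    one_box = read_time_seconds(1 * TB, 100 * MB)              # 10000.0 s, about 2.8 hours
    faster_box = read_time_seconds(1 * TB, 200 * MB)           # vertical: one disk twice as fast
    hundred_nodes = read_time_seconds(1 * TB, 100 * MB, 100)   # horizontal: 100 commodity nodes
    print(one_box, faster_box, hundred_nodes)                  # prints 10000.0 5000.0 100.0
```

Real clusters fall short of this ideal speedup, but the gap between the two approaches is why big data systems typically favor adding machines over buying a bigger one.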
Big Data Primers: The scale of infrastructure

Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Intro to Big Data Hadoop

  • 1. Dr. Sandeep G. Deshmukh Introduction to
  • 2. Contents
    ❑ Big Data
    ❑ Distributed Systems
    ❑ Hadoop
      ➢ Hadoop Distributed File System (HDFS)
      ➢ MapReduce
  • 5. Definition: Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it. (Source: https://www.oreilly.com/ideas/what-is-big-data)
  • 6. Volume: The quantity of data. Data sets are too large to store and analyze using traditional databases.
  • 7. Velocity: The speed at which data is generated, moved around, and analyzed. Data is analyzed while it is being generated, without even putting it into databases first.
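To illustrate analyzing data while it is being generated, here is a minimal Python sketch. It assumes readings arrive one at a time from a source. The `sensor_stream` generator is a hypothetical stand-in for a live feed. Only a running count and mean are kept (an incremental update), so past readings are never stored in a database.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each reading without storing past readings."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        mean += (value - mean) / count
        yield count, mean


def sensor_stream() -> Iterator[float]:
    """Hypothetical stand-in for a live data source."""
    for reading in [10.0, 12.0, 11.0, 15.0]:
        yield reading


if __name__ == "__main__":
    for n, avg in running_mean(sensor_stream()):
        print(f"reading {n}: running mean = {avg:.2f}")
```

Run as a script, this prints running means of 10.00, 11.00, 11.00 and 12.00. Memory use stays constant no matter how long the stream runs.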
  • 8. Variety: The different types of data that we can use.
  • 9. Veracity: The messiness or trustworthiness of the data; volume makes up for quality. E.g., tweets with spelling mistakes or abbreviated words (u -> you, thr -> there).
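One simple way to cope with messy text like the tweet example above is to expand known abbreviations before analysis. This is a minimal Python sketch, not a complete cleaning pipeline. It assumes a small hand-written lookup table. `ABBREVIATIONS` is hypothetical and would need to be far larger in practice.

```python
import re

# Hypothetical lookup table of common shorthand; a real system needs a much larger one.
ABBREVIATIONS = {
    "u": "you",
    "thr": "there",
}

# Matches whole words only, so "up" is not mistaken for "u".
WORD_RE = re.compile(r"\b\w+\b")


def normalize(text: str) -> str:
    """Replace known shorthand words with their full forms; leave other words unchanged."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)

    return WORD_RE.sub(expand, text)


if __name__ == "__main__":
    print(normalize("see u thr tomorrow"))  # prints "see you there tomorrow"
```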
  • 10. Value: Getting value out of Big Data!
  • 11. Definition (Gartner): “Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
  • 12. Definition (Wikipedia): Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
  • 13. Use Case: Big Data in Oil & Gas Drilling http://analytics-magazine.org/images/stories/novdec12/big-data.jpg
  • 14. Use Case: Uber - Pay Surge Pricing if Battery is Low
  • 15. Further Reading
    ● A Brief History of Big Data Everyone Should Read
    ● Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity
    ● What is big data? - OpenSource.com
    ● What is big data? - O’Reilly
    ● 5 Big Data Use Cases To Watch
    ● Best Big Data Analytics Use Cases
    ● The 5 game changing big data use cases
    ● Big Data - The 5 Vs Everyone Must Know
    ● Top SlideShare Presentations on Big Data
  • 17. Definition: A distributed system is a collection of independent computers that appears to its users as a single coherent system. (Source: Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten van Steen, 2006, http://www.mypearsonstore.com/bookstore/distributed-systems-principles-and-paradigms-9780132392273?xid=PSED)
  • 18. Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten van Steen, 2006
  • 19. Forms of Transparency in Distributed Systems
    Access: Hide differences in data representation and how a resource is accessed
    Location: Hide where a resource is located
    Migration: Hide that a resource may move to another location
    Relocation: Hide that a resource may be moved to another location while in use
    Replication: Hide that a resource is replicated
    Concurrency: Hide that a resource may be shared by several competitive users
    Failure: Hide the failure and recovery of a resource
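To make replication and failure transparency concrete, here is a minimal Python sketch. It assumes a resource is stored on several in-memory replicas and that an unreachable replica raises `ConnectionError`. The `Replica` and `TransparentClient` classes are hypothetical, for illustration only. The caller simply asks for a key and never learns which replica answered or that one of them was down.

```python
from typing import Dict, List


class Replica:
    """Hypothetical in-memory replica that can be marked as failed."""

    def __init__(self, name: str, data: Dict[str, str], up: bool = True) -> None:
        self.name = name
        self.data = data
        self.up = up

    def read(self, key: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[key]  # a missing key raises KeyError, not a failure


class TransparentClient:
    """Hides replication and failure: tries each replica until one answers."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def read(self, key: str) -> str:
        last_error: Exception = LookupError("no replicas configured")
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ConnectionError as err:
                last_error = err  # remember the failure and try the next replica
        raise last_error  # every replica failed


if __name__ == "__main__":
    copy = {"report.csv": "v1"}
    client = TransparentClient([
        Replica("node-a", copy, up=False),
        Replica("node-b", copy),
    ])
    print(client.read("report.csv"))  # prints "v1" even though node-a is down
```

Failure is only hidden while at least one replica is reachable. If all of them are down, the last error reaches the caller.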
  • 20. ● A distributed system consists of components (i.e., computers) that are autonomous.
      ● Users (be they people or programs) think they are dealing with a single system. This means that, one way or another, the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems.
  • 21. Definition (Wikipedia): A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are concurrency of components, lack of a global clock, and independent failure of components.
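The message-passing model in this definition can be sketched in Python. Here a thread stands in for a networked computer and `queue.Queue` objects stand in for the network. This is an illustrative assumption; real components would run on separate machines. The two components share no state. They coordinate only by sending messages toward a common goal, here a sum of squares.

```python
import queue
import threading
from typing import List, Optional


def worker(inbox: "queue.Queue[Optional[int]]", outbox: "queue.Queue[int]") -> None:
    """A component that squares each number it receives and sends the result back."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work
            break
        outbox.put(msg * msg)


def coordinator(numbers: List[int]) -> int:
    """Sends work to a concurrently running worker and combines its replies."""
    inbox: "queue.Queue[Optional[int]]" = queue.Queue()
    outbox: "queue.Queue[int]" = queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for n in numbers:
        inbox.put(n)   # send one request message per number
    inbox.put(None)    # tell the worker to stop
    total = sum(outbox.get() for _ in numbers)  # one reply per request
    t.join()
    return total


if __name__ == "__main__":
    print(coordinator([1, 2, 3]))  # prints 14
```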
  • 22. Further Reading
    ● Distributed Computing - Wikipedia
    ● Distributed computing
    ● Characteristics of distributed system
  • 24. Big Data Primers: Size does matter
  • 25. Big Data Primers: Vertical vs. Horizontal Scaling (figure panels: Vertical Scaling, Horizontal Scaling)
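Horizontal scaling means adding more machines rather than buying a bigger one. Data then has to be spread across those machines. This is a minimal Python sketch, assuming records are assigned to nodes by hashing their keys. Hash partitioning is one common choice, not the only one, and the node numbering is hypothetical. CRC32 from `zlib` is used instead of Python's built-in `hash()`, because `hash()` of a string is randomized per process and would place keys inconsistently across runs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key using a stable hash (CRC32) modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(keys: Iterable[str], num_nodes: int) -> Dict[int, List[str]]:
    """Group keys by the node that would store them."""
    placement: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        placement[node_for(key, num_nodes)].append(key)
    return dict(placement)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10)]
    for nodes in (2, 4):
        # Adding nodes spreads the same keys over more machines.
        print(nodes, "nodes:", partition(keys, nodes))
```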
  • 26. Big Data Primers: The scale of infrastructure