Real-Time Big Data
handarusakti@gmail.com
What is Big Data?
Business Data
• Structured data
• Unstructured data (no less important than
structured data)
Data Analysis:
Predictive Analysis
Objectives
• Depend on the context
• Objective first, plan later
These Three Trends
• A shift to scalable, elastic computing
infrastructure.
• An explosion in the complexity and variety of
data available.
• The power and value that come from
combining disparate data for comprehensive
analysis.
What is Hadoop?
• A file store, HDFS (Hadoop Distributed File
System)
• A distributed processing system:
– 1.0: MapReduce
– 2.0: YARN (a distributed operating system)
• Processing moves to the data (data locality)
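The MapReduce model named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps on one machine, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "real time big data"]))
```

On a real cluster the map calls would run on the nodes that already hold each input block, which is the point of the last bullet above.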
Hadoop 1.0 vs. Hadoop 2.0
HDFS
• Designed to store very large data sets reliably
across a cluster, and to stream those data sets at
high bandwidth to distributed computations
• HDFS Comics
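As a rough illustration of how a distributed file store can spread one large file over many machines, the sketch below cuts a file into fixed-size blocks and assigns each block to several nodes. The round-robin policy, the node names, and the sizes are assumptions chosen for the example; real HDFS uses a rack-aware placement policy and configurable block sizes.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(file_size=300, block_size=128)  # sizes in MB
    print(blocks)
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because every block lives on more than one node, losing a single machine does not lose data, and readers can pull different blocks from different nodes in parallel.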
YARN
• A cluster management technology
• YARN combines a central resource manager
that reconciles the way applications use
Hadoop system resources with node
manager agents that monitor the processing
operations of individual cluster nodes
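The split between a central resource manager and per-node agents can be pictured with a toy model. The classes below are hypothetical stand-ins written for this example, not YARN's API, and the first-fit policy is an assumption; YARN's actual schedulers are pluggable and more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent that tracks free memory and the containers it runs."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Central arbiter that decides which node serves each request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id, memory_mb):
        """Grant a container on the first node with enough free memory, else None."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                node.containers.append((app_id, memory_mb))
                return node.name
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("n1", 2048), NodeManager("n2", 4096)])
    for app, mb in [("app1", 1536), ("app2", 1024), ("app3", 4096)]:
        print(app, "->", rm.allocate(app, mb))
```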
YARN
Spark
• Large-scale stream processing
• Low latency
• Comparison:
– Spark Streaming: 670k records/second/node
– Storm: 115k records/second/node
– Apache S4: 7.5k records/second/node
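Spark Streaming treats a stream as a sequence of small batches. The plain-Python sketch below imitates that idea only; it does not use Spark, and the helper names and the one-second interval are assumptions made for the example.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into fixed-length batches, in time order.

    Intervals that received no events are skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_seconds)].append(value)
    return [batches[i] for i in sorted(batches)]


def count_per_batch(events, batch_seconds):
    """Run the same small computation (a value count) on every batch."""
    return [Counter(batch) for batch in micro_batches(events, batch_seconds)]


if __name__ == "__main__":
    events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (2.5, "a"), (2.7, "a")]
    for i, counts in enumerate(count_per_batch(events, batch_seconds=1)):
        print(i, dict(counts))
```

The batch interval is the main latency knob in this model: results for an event become available only after its batch closes and is processed.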
Spark
• Spark offers an integrated framework for
advanced analytics, including a machine
learning library (MLlib), a graph engine
(GraphX), a streaming analytics engine (Spark
Streaming) and a fast interactive query tool
(Shark)
Spark
Flume
• A distributed, reliable, and available service for
efficiently collecting, aggregating, and moving
large amounts of streaming data into the Hadoop
Distributed File System (HDFS)
• It has a simple and flexible architecture based on
streaming data flows; and is robust and fault
tolerant with tunable reliability mechanisms for
failover and recovery
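A Flume agent moves events from a source through a channel to a sink. The sketch below mimics that flow with a bounded in-memory buffer whose events are removed only after the sink accepts them. It is a simplified model with invented names (`MemoryChannel`, `drain`), not Flume code or configuration.

```python
from collections import deque


class MemoryChannel:
    """Bounded buffer that sits between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        """Accept an event from the source, or refuse it when the buffer is full."""
        if len(self.events) >= self.capacity:
            return False  # channel full; the source must retry later
        self.events.append(event)
        return True

    def take_batch(self, size):
        """Peek at up to `size` events without removing them."""
        return list(self.events)[:size]

    def commit(self, count):
        """Drop `count` events once the sink has confirmed delivery."""
        for _ in range(count):
            self.events.popleft()


def drain(channel, sink_write, batch_size):
    """Deliver events batch by batch; events stay queued if the sink fails."""
    delivered = 0
    while channel.events:
        batch = channel.take_batch(batch_size)
        try:
            sink_write(batch)
        except IOError:
            break  # leave the batch in the channel for a later retry
        channel.commit(len(batch))
        delivered += len(batch)
    return delivered


if __name__ == "__main__":
    channel = MemoryChannel(capacity=3)
    for line in ["a", "b", "c", "d"]:
        print(line, "accepted" if channel.put(line) else "rejected")
    stored = []
    print(drain(channel, stored.extend, batch_size=2), stored)
```

The capacity and batch size stand in for the kind of tunable reliability settings the slide mentions: a larger buffer absorbs longer sink outages at the cost of memory.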
Sqoop
• A tool designed for efficiently transferring bulk
data between Hadoop and structured
datastores such as relational databases
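Bulk transfer from a relational database is usually parallelised by splitting the table on a key column and reading each key range separately. The sketch below shows that idea with Python's built-in `sqlite3`; it is not Sqoop itself, and the table name, column names, and helper functions are assumptions for the example. It also assumes a non-empty table with an integer key.

```python
import sqlite3


def split_ranges(lo, hi, num_splits):
    """Divide the inclusive key range [lo, hi] into at most `num_splits` sub-ranges."""
    size = (hi - lo + 1 + num_splits - 1) // num_splits  # ceiling division
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]


def export_table(conn, table, key, num_splits):
    """Read each key range with its own query and return one output part per range.

    `table` and `key` are interpolated into SQL, so they must be trusted identifiers.
    """
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    parts = []
    for start, end in split_ranges(lo, hi, num_splits):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
            (start, end),
        ).fetchall()
        parts.append([",".join(map(str, row)) for row in rows])
    return parts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 6)])
    for part in export_table(conn, "users", "id", num_splits=2):
        print(part)
```

Each part could then be written as a separate file in the target store, so the ranges can be exported by independent workers.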
RT-BigData Proposal
[Architecture diagram. Labels: Log, Flume, RDBMS, Sqoop, HDFS, Spark Streaming, Shark, GraphX, MLlib, Spark SQL, Spark, Mesos, Dashboards]
Images taken from:
• http://www.datameer.com/images/product/big_data_hadoop/img_bigdata.png
• http://www.kdnuggets.com/websites/cartoons.html
• http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
• http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
• http://hortonworks.com/hadoop/yarn/
