This document discusses big data and Hadoop. It defines big data as large volumes of structured and unstructured data that organizations analyze for insights, often for predictive purposes. Hadoop is described as an open-source software framework for the distributed storage and processing of large datasets across clusters of commodity servers. Its key components are HDFS, the distributed file system that stores data across the cluster; MapReduce, the programming model for parallel batch processing; and YARN, the resource manager that lets multiple processing engines, such as Spark, share a Hadoop cluster. The document also briefly outlines other big data tools commonly used alongside Hadoop, including Flume for streaming data ingestion, Sqoop for transferring data to and from relational databases, and Spark for fast in-memory processing.
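The MapReduce model mentioned above can be sketched in plain Python. This is a toy illustration of the three phases the framework runs (map, shuffle, reduce), not the actual Hadoop Java API; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

# Classic word-count example on two toy "records".
docs = ["big data needs big tools", "hadoop processes big data"]
result = reduce_phase(shuffle_phase(map_phase(docs)))
print(result["big"])   # 3
print(result["data"])  # 2
```

In real Hadoop, the map and reduce functions run in parallel on many nodes, reading input splits from HDFS, while the shuffle moves intermediate pairs across the network; the programming model, however, is exactly this pipeline.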