Introducing Big Data

Pravin Singh
introducing
BIG DATA

WHAT THE HECK IS BIG DATA?
Any collection of data
sets so large and complex
that it becomes difficult to
process using current data
management tools or
traditional data processing
applications.
Volume
• Exceeds
physical limits
of vertical
scalability
Velocity
• Decision
window small
due to data
change rate
Variety
• Many different
formats make
integration
expensive

WHY SO FAST?
 Massive Parallel Processing
 Data Locality
 Optimized for write once – read many
 Sequential reads, not random access

Hello Hadoop!
You have an interesting name.

Hadoop Architecture
Source: Hortonworks

The Hadoop Zoo
HDFS
MapReduce
Pig Hive HCat Giraph Mahout
Zookeeper

The Real Simple Hadoop Architecture
MapReduce Engine
JobTracker
TaskTracker
1
TaskTracker
2
…
TaskTracker
N
HDFS Cluster
NameNode
DataNode
1
DataNode
2
…
DataNode
N

Hello HDFS!
Have we met before?

HDFS
My Data.txt
150 MB
64 MB
64 MB
22 MB
Name
Node
64 MB64 MB
64 MB64 MB
22 MB22 MB

Hello MapReduce!
Have you lost some weight?

MapReduce
Input
File
<Key, Value>
<Key, Value>
<Key, Value>
.
.
Shuffle
& Sort
<Key, Value>
<Key, Value>
<Key, Value>
.
.
Result

MapReduce
Big Data for
Dummies.txt
How many times the
words “Big data” and
“Hadoop” show up?

MapReduce
<Big data, 7>
<Hadoop, 4>
<Big data, 9>
<Hadoop, 6>
<Big data, 3>
<Hadoop, 8>
<Big data, 7>
<Big data, 9>
<Big data, 3>
<Hadoop, 4>
<Hadoop, 6>
<Hadoop, 8>
<Big data, 7, 9, 3>
<Hadoop, 4, 6, 8>
<Big data, 19>
<Hadoop, 18>

Let’s Play MapReduce!
’coz All Talk and No Play Makes Session a Dull Affair.

?
Questions. Comments. Feedback.

See you at the (Data) Lake Next Time.
THANK YOU!

Introducing Big Data

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (11)

Similar to Introducing Big Data

Similar to Introducing Big Data (20)

Recently uploaded

Recently uploaded (20)

Introducing Big Data

Editor's Notes