HADOOP: PAST, PRESENT AND FUTURE
BIG DATA INTELLIGENCE PRACTICE
© 2014 Trace3, All rights reserved.

Roadmap (~1 hour)
1. What Makes Up Hadoop 1.x?
2. What's New In Hadoop 2.x?
3. The Future Of Hadoop …

WHAT MAKES UP HADOOP 1.0?

What’s a “Node”?
Processes / Daemons / Services
Node (aka Server): Operating System, Compute, Storage, Memory

Hadoop 1.0: HDFS + MapReduce
[Diagram: a Client, the NameNode and JobTracker master daemons, and four worker nodes, each running a DataNode / TaskTracker; block label 1-1.]

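A minimal Python sketch of the bookkeeping behind this picture, under simplifying assumptions: a file is cut into fixed-size blocks (64 MB was the Hadoop 1.x default) and each block is copied to three DataNodes (the default replication factor). The helper names `split_into_blocks` and `place_replicas` are hypothetical, and the round-robin placement is only illustrative; the real NameNode uses a rack-aware policy that, by default, puts the first replica on the writer's node when possible and the other two on different nodes of a remote rack.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # Hadoop 1.x default HDFS block size (64 MB)
REPLICATION = 3                # HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file is cut into."""
    n_blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size  # the last block holds only the remainder
    return sizes


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[str, list[str]]:
    """Hypothetical round-robin placement: replica r of block b goes to node (b + r) mod N.

    Labels use the slide's "block-replica" style, e.g. "1-1" is replica 1 of block 1.
    Real HDFS uses a rack-aware policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement: dict[str, list[str]] = {dn: [] for dn in datanodes}
    for b in range(n_blocks):
        for r in range(replication):
            # Consecutive offsets keep the replicas of one block on distinct nodes.
            node = datanodes[(b + r) % len(datanodes)]
            placement[node].append(f"{b + 1}-{r + 1}")
    return placement


if __name__ == "__main__":
    mb = 1024 * 1024
    print([s // mb for s in split_into_blocks(200 * mb)])  # [64, 64, 64, 8]
    print(place_replicas(4, ["dn1", "dn2", "dn3", "dn4"]))
```

With four blocks and four DataNodes this yields twelve replicas, three per node, and no node ever holds two copies of the same block, so losing any one node still leaves two copies of every block.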
Hadoop 1.0: HDFS + MapReduce
[Diagram: a Client, the NameNode and JobTracker, and four worker nodes, each running a DataNode / TaskTracker. Twelve block labels (1-1 through 4-3, i.e. blocks 1 to 4 with three replicas each) are spread across the four DataNodes, and Map and Reduce tasks run on the TaskTrackers.]

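The Map and Reduce boxes in this diagram follow the classic map, shuffle, reduce flow. Below is a single-process Python sketch of that flow for word count; it only simulates what the JobTracker and TaskTrackers coordinate across nodes, and the function names are hypothetical, not Hadoop's Java `Mapper`/`Reducer` API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # reducers see keys in sorted order


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the 1s emitted for one word."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    """Each inner list stands in for one input split handled by one map task."""
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count([["the quick fox"], ["the lazy dog", "The end"]]))
    # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster each split would be read by a map task on (ideally) the DataNode that stores it, and the shuffle would move the grouped pairs over the network to the reduce tasks.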
MapReduce v1 Limitations
• Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000.
• Availability: JobTracker failure kills all queued and running jobs.
• Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization.
• No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.

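To see why hard slot partitioning wastes capacity, here is a toy calculation. It assumes every task needs exactly one uniform slot and uses a hypothetical cluster of 4 nodes with 8 map and 4 reduce slots each. During a map-heavy phase the reduce slots sit idle even though map tasks are queued; a pooled model (the idea behind YARN containers) lets any task use any free capacity.

```python
def slot_utilization(map_slots: int, reduce_slots: int,
                     map_demand: int, reduce_demand: int) -> float:
    """MRv1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    busy = min(map_slots, map_demand) + min(reduce_slots, reduce_demand)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(capacity: int, map_demand: int, reduce_demand: int) -> float:
    """Pooled (YARN-like) model: any task may use any free unit of capacity."""
    busy = min(capacity, map_demand + reduce_demand)
    return busy / capacity


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes x (8 map + 4 reduce slots) = 32 map + 16 reduce slots.
    # Map-heavy phase: 60 map tasks queued, no reduce tasks yet.
    print(slot_utilization(32, 16, 60, 0))  # 32 of 48 slots busy, about 0.67
    print(pooled_utilization(48, 60, 0))    # 1.0
```

Real YARN containers are sized in memory and vcores rather than uniform units, so actual gains depend on the workload mix; the sketch only isolates the effect of fixed map/reduce partitioning.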
Hadoop 1.0: Single Use System
Batch apps only. The stack, top to bottom:
• Pig, Hive
• MapReduce (cluster resource management and data processing)
• HDFS (redundant, reliable storage)

WHAT’S NEW IN HADOOP 2.0?

YARN
Yet Another Resource Negotiator
YARN replaces MapReduce v1 as Hadoop's cluster resource manager; MapReduce itself now runs as one application on top of YARN.
YARN will be the de facto distributed operating system for Big Data.

YARN = BIG DATA

YARN: No Longer Just Batch Apps
Store DATA in one place. Interact with that data in MULTIPLE WAYS with predictable performance and quality of service. Applications run natively IN Hadoop:
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (DataTorrent)
• GRAPH (Giraph)
All of these run on YARN (cluster resource management), on top of HDFS2 (redundant, reliable storage).

YARN: Applications Online
All of these run on the same Hadoop cluster, giving every application access to the same source data:
• MapReduce v2
• Real-Time Stream Processing
• Master-Worker
• In-Memory
• Apache Storm

YARN: Quickly Maturing
Timeline (2010 to 2014):
• Conceived at Yahoo!
• Alpha releases: 2.0
• Beta releases: 2.1
• GA release: 2.2
• Versions 2.3, 2.4 and 2.5 (today)
200,000+ nodes, 800,000+ jobs daily, 10 million+ hours of compute daily.

YARN: What Has Changed?
MRv1 → YARN
• JobTracker (JT) → ResourceManager (RM) + ApplicationMaster (AM)
• Scheduler → Scheduler
• TaskTracker (TT) → NodeManager (NM)
• Map & Reduce slot → Container
[Diagram: in MRv1, the JobTracker and its Scheduler drive TaskTrackers that run fixed Map and Reduce slots; in YARN, the ResourceManager and its Scheduler drive NodeManagers, one hosting the ApplicationMaster and others running generic Containers.]

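A deliberately simplified Python model of the YARN roles in this comparison: a ResourceManager hands out containers sized in memory and vcores from the NodeManagers' free capacity, and an application first receives a container for its ApplicationMaster, then containers for its tasks. All class and function names are hypothetical and do not mirror YARN's Java client API, and first-fit placement stands in for YARN's pluggable schedulers (such as the Capacity and Fair schedulers).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NodeManager:
    name: str
    free_mb: int      # memory this node can still hand out
    free_vcores: int  # virtual cores this node can still hand out


@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ResourceManager:
    nodes: List[NodeManager] = field(default_factory=list)

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        """First-fit: grant a container on the first node with enough memory and vcores."""
        for nm in self.nodes:
            if nm.free_mb >= memory_mb and nm.free_vcores >= vcores:
                nm.free_mb -= memory_mb
                nm.free_vcores -= vcores
                return Container(nm.name, memory_mb, vcores)
        return None  # no node has room; the request cannot be granted right now


def launch_application(rm: ResourceManager, am_mb: int, task_mb: int,
                       n_tasks: int) -> Tuple[Optional[Container], List[Container]]:
    """The ApplicationMaster gets a container first, then asks for one per task."""
    am = rm.allocate(am_mb, 1)
    if am is None:
        return None, []
    tasks = []
    for _ in range(n_tasks):
        container = rm.allocate(task_mb, 1)
        if container is None:
            break  # cluster full: remaining tasks stay pending
        tasks.append(container)
    return am, tasks


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096, 4), NodeManager("nm2", 4096, 4)])
    am, tasks = launch_application(rm, am_mb=1024, task_mb=1024, n_tasks=10)
    print(am.node, len(tasks))  # nm1 7
```

The key difference from MRv1 shows up in `Container`: it carries a resource size rather than a "map" or "reduce" type, so the same capacity can serve MapReduce, Tez, or any other framework.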
The 6 Benefits Of YARN
• Scale
• New programming models and services
• Improved cluster utilization
• Agility
• Backwards compatible with MapReduce v1
• Mixed workloads on the same source of data

THE FUTURE OF HADOOP

SQL on Hadoop
• Speed: Deliver interactive query performance.
• SQL: Support a broad array of SQL semantics for analytic applications running against Hadoop.
• Scale: A SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.

SQL on Hadoop
• Hive on Apache Tez: Hortonworks HDP2
• Hive on Apache Spark: Cloudera CDH5
• Apache Drill: MapR M7
• Cloudera Impala: Cloudera CDH5
• Pivotal HAWQ: Pivotal Big Data Suite

Apache Spark (Databricks)
Apache Spark runs on YARN (cluster resource management), on top of HDFS2 (redundant, reliable storage).
• Programming Languages: Java, Scala, Python, R*
• Interactive Shell: Write code and see the output immediately.
• Up to ~100x faster than MapReduce, due to how it handles data in memory.

Apache Spark – Wordcount

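As an illustration of the shape of a Spark word count, here is a pure-Python sketch built on a hypothetical `LocalDataset` class whose method names copy Spark's RDD operations (`flatMap`, `map`, `reduceByKey`, `collect`). In PySpark the same pipeline is commonly written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Unlike Spark, this stand-in is eager, keeps everything in one process, and does no partitioning.

```python
from typing import Any, Callable, Iterable, List


class LocalDataset:
    """Hypothetical, eager, single-machine stand-in named after Spark RDD operations.

    camelCase method names are kept on purpose so the chain reads like Spark code.
    """

    def __init__(self, items: Iterable[Any]):
        self._items: List[Any] = list(items)

    def flatMap(self, f: Callable[[Any], Iterable[Any]]) -> "LocalDataset":
        # Each input element may yield zero or more outputs (line -> words).
        return LocalDataset(x for item in self._items for x in f(item))

    def map(self, f: Callable[[Any], Any]) -> "LocalDataset":
        # Exactly one output per input element (word -> (word, 1)).
        return LocalDataset(f(item) for item in self._items)

    def reduceByKey(self, f: Callable[[Any, Any], Any]) -> "LocalDataset":
        # Merge the values of (key, value) pairs that share a key, pairwise with f.
        merged: dict = {}
        for key, value in self._items:
            merged[key] = f(merged[key], value) if key in merged else value
        return LocalDataset(merged.items())

    def collect(self) -> List[Any]:
        return list(self._items)


if __name__ == "__main__":
    lines = LocalDataset(["to be or not to be", "that is the question"])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b)
                   .collect())
    print(dict(counts))
```

Compared with the MapReduce version, the whole job is one chained expression; in real Spark, intermediate datasets can also be cached in memory and reused across steps, which is where much of the speedup comes from.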
HOYA: HBase (NoSQL) on YARN
• Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.
• Easier Deployment: APIs to create, start, stop and delete HBase clusters.
• Availability: Recover from Region Server loss with a new container.

Apache REEF (Retainable Evaluator Execution Framework)
• Machine Learning: A framework well suited for building machine learning jobs.
• Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
• Maintain State: Users can build jobs that use data where it is needed and also retain state after the jobs are done.

Real-Time Stream Processing
Apache Storm: Streaming

Heterogeneous Storage
[Diagram, THEN vs NOW: THEN, the NameNode sees a single generic "Storage"; NOW, the NameNode sees distinct SATA, SSD and Fusion IO storage.]

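With heterogeneous storage, each DataNode volume is tagged with a storage type, and a file's storage policy decides which type each replica lands on. The sketch below encodes the block placement of the built-in policies as described in the Apache Hadoop "Archival Storage, SSD & Memory" documentation (Hadoop 2.6 and later), ignoring fallback storage types. Mapping the slide's SATA disks to DISK and SSD or Fusion IO flash to SSD is an assumption, and `replica_storage_types` is a hypothetical helper, not an HDFS API.

```python
# Storage type of the first replica and of every remaining replica for each
# built-in HDFS storage policy; fallback storage types are ignored.
POLICY_PLACEMENT = {
    "HOT": ("DISK", "DISK"),
    "WARM": ("DISK", "ARCHIVE"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "ONE_SSD": ("SSD", "DISK"),
    "ALL_SSD": ("SSD", "SSD"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),  # memory as a storage tier
}


def replica_storage_types(policy: str, replication: int = 3) -> list[str]:
    """Return the storage type each of `replication` replicas is placed on."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    first, rest = POLICY_PLACEMENT[policy.upper()]
    return [first] + [rest] * (replication - 1)


if __name__ == "__main__":
    # Assumed mapping of the slide's hardware: SATA -> DISK, SSD / Fusion IO -> SSD.
    for name in ("HOT", "WARM", "ONE_SSD"):
        print(name, replica_storage_types(name))
```

ONE_SSD is a typical middle ground: one fast replica serves most reads while the other copies stay on cheaper SATA disks.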
Hadoop Roadmap
• Apache Hadoop 2.5 (Q3 2014)
  – NodeManager restart without disruption
• Apache Hadoop 2.6 (Q4 2014)
  – Memory as storage tier
  – Dynamic resource configuration
  – Support for Docker containers

I KNOW YOU HAVE QUESTIONS

THANK YOU!
http://bigdatajoe.io/
http://bigdatacentric.com/
@bigdatajoerossi
bigdatajoerossi@gmail.com

Hadoop: Past, Present and Future, v2.2 (SQLSaturday #326, Tampa BA Edition)

Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition

  • 1. HADOOP: PAST, PRESENT AND FUTURE BIG DATA INTELLIGENCE PRACTICE © 2014 Trace3, All rights reserved.
  • 2. Roadmap (~1 hour): 1. What Makes Up Hadoop 1.x? 2. What’s New In Hadoop 2.x? 3. The Future Of Hadoop …
  • 3. WHAT MAKES UP HADOOP 1.0?
  • 4. What’s a “Node”? A node (aka server) has an operating system, compute, storage and memory, and runs processes / daemons / services.
  • 5. Hadoop 1.0: HDFS + MapReduce. Diagram: a NameNode and a JobTracker; four worker nodes, each running a DataNode / TaskTracker; a Client writing block 1 (labels 1-1, 1-2, 1-3) to the DataNodes.
  • 6. Hadoop 1.0: HDFS + MapReduce. Diagram: the NameNode and JobTracker; four DataNode / TaskTracker nodes holding blocks 1-1 through 4-3; the Client; Map and Reduce tasks running on the TaskTrackers.
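Slides 5 and 6 show a file split into blocks, with each block stored on several DataNodes. The sketch below assumes the labels mean "block-replica", a replication factor of 3 (HDFS's default) and four DataNodes. It uses a simple round-robin placement, not HDFS's actual rack-aware policy; the node names and the `place_replicas` helper are hypothetical.

```python
from collections import defaultdict


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block's replicas to distinct DataNodes, round-robin.

    Simplified illustration only: real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = defaultdict(list)  # datanode -> list of "block-replica" labels
    for block in range(1, num_blocks + 1):
        for replica in range(1, replication + 1):
            # Start each block on a different node; its replicas then land on
            # `replication` consecutive (and therefore distinct) nodes.
            node = datanodes[(block - 1 + replica - 1) % len(datanodes)]
            placement[node].append(f"{block}-{replica}")
    return dict(placement)


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for node, labels in place_replicas(4, nodes).items():
        print(node, labels)
```

With four blocks on four nodes, each DataNode holds three replicas and no node holds two copies of the same block, so losing any single DataNode still leaves at least two copies of every block.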
  • 7. MapReduce v1 Limitations. Scalability: maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000. Availability: JobTracker failure kills all queued and running jobs. Resources partitioned into Map and Reduce: hard partitioning of Map and Reduce slots led to low resource utilization. No support for alternate paradigms / services: only MapReduce batch jobs, nothing else.
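To make the "hard partitioning" point concrete, here is a back-of-the-envelope sketch with made-up numbers. A TaskTracker with fixed map and reduce slots can run reduce tasks only in reduce slots, even while its map slots sit idle. The pooled case roughly approximates a generic container model such as YARN's (which actually allocates by memory and CPU rather than slots). All slot and task counts are hypothetical.

```python
def slot_utilization(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MRv1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    running = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return running / (map_slots + reduce_slots)


def pooled_utilization(total_slots, pending_maps, pending_reduces):
    """Pooled model: any free slot can run any pending task."""
    running = min(total_slots, pending_maps + pending_reduces)
    return running / total_slots


if __name__ == "__main__":
    # Hypothetical node: 8 map slots + 4 reduce slots during a reduce-heavy phase.
    print(slot_utilization(8, 4, pending_maps=2, pending_reduces=10))  # 0.5
    print(pooled_utilization(12, pending_maps=2, pending_reduces=10))  # 1.0
```

With 2 map tasks and 10 reduce tasks waiting, the partitioned node runs only 6 of its 12 slots, while the pooled node runs all 12.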
  • 8. Hadoop 1.0: Single Use System. Batch apps (Pig, Hive) run on MapReduce (cluster resource management and data processing), which runs on HDFS (redundant, reliable storage).
  • 9. WHAT’S NEW IN HADOOP 2.0?
  • 10. YARN (Yet Another Resource Negotiator) replaces MapReduce v1’s built-in cluster resource management. YARN will be the de-facto distributed operating system for Big Data.
  • 11. YARN = BIG DATA
  • 12. YARN: No Longer Just Batch Apps. Store DATA in one place; interact with that data in MULTIPLE WAYS with predictable performance and quality of service. Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph), all on YARN (cluster resource management) and HDFS2 (redundant, reliable storage).
  • 13. YARN: Applications Online. Running everything on the same Hadoop cluster gives every application access to the same source data! Examples: MapReduce v2, real-time stream processing (Apache Storm), master-worker, in-memory.
  • 14. YARN: Quickly Maturing. Timeline (2010 to 2014): conceived at Yahoo!; alpha releases (2.0); beta releases (2.1); GA release (2.2); followed by versions 2.3, 2.4 and 2.5. Scale: 200,000+ nodes, 800,000+ jobs daily, 10 million+ hours of compute daily.
  • 15. YARN: What Has Changed? MRv1 to YARN: the JobTracker (JT) is split into the ResourceManager (RM) and a per-application ApplicationMaster (AM); the Scheduler stays a Scheduler (now inside the ResourceManager); the TaskTracker (TT) becomes the NodeManager (NM); Map & Reduce slots become generic Containers. Diagram: in MRv1, a JobTracker with its Scheduler drives TaskTrackers with Map and Reduce slots; in YARN, a ResourceManager with its Scheduler drives NodeManagers that host an ApplicationMaster and Containers.
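The YARN division of labor can be sketched as a toy model: NodeManagers advertise capacity, the ResourceManager grants containers against that capacity, and each application's ApplicationMaster asks for the containers it needs. This is a hypothetical, memory-only, first-fit illustration. The class and method names are invented and do not correspond to YARN's real Java APIs, and YARN's actual schedulers (such as the Capacity Scheduler and Fair Scheduler) are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int  # memory this node still offers for containers


@dataclass
class ResourceManager:
    nodes: list
    granted: list = field(default_factory=list)  # (app, node, mb) tuples

    def allocate(self, app, mb):
        """First-fit: grant a container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                self.granted.append((app, node.name, mb))
                return node.name
        return None  # no capacity: the request would have to wait


class ApplicationMaster:
    """Per-application coordinator that requests containers for its tasks."""

    def __init__(self, app, rm):
        self.app, self.rm = app, rm

    def request(self, count, mb):
        return [self.rm.allocate(self.app, mb) for _ in range(count)]


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 4096)])
    # Two different kinds of application share one cluster.
    mr = ApplicationMaster("mapreduce-job", rm)
    storm = ApplicationMaster("storm-topology", rm)
    print(mr.request(3, 2048))     # ['nm1', 'nm1', 'nm2']
    print(storm.request(2, 2048))  # ['nm2', None]
```

Because containers are generic, a MapReduce job and a non-MapReduce application draw from the same pool. In real YARN the ApplicationMaster itself also runs inside a container, which this toy model skips.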
  • 16. The 6 Benefits Of YARN: scale; new programming models and services; improved cluster utilization; agility; backwards compatibility with MapReduce v1; mixed workloads on the same source of data.
  • 17. THE FUTURE OF HADOOP
  • 18. SQL on Hadoop. Speed: deliver interactive query performance. SQL: support an array of SQL semantics for analytic applications running against Hadoop. Scale: an SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.
  • 19. SQL on Hadoop: Hive on Apache Tez (Hortonworks HDP2); Hive on Apache Spark (Cloudera CDH5); Apache Drill (MapR M7); Cloudera Impala (Cloudera CDH5); Pivotal HAWQ (Pivotal Big Data Suite).
  • 20. Apache Spark (Databricks), running on YARN (cluster resource management) and HDFS2 (redundant, reliable storage). Programming languages: Java, Scala, Python, R*. Interactive shell: write code and see its output immediately. Up to ~100x faster than MapReduce for some workloads, due to how it handles data in memory.
  • 21. Apache Spark – Wordcount
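A word count pairs each word with 1 and then sums the 1s per word; in Spark's RDD API that pipeline is usually written with `flatMap`, `map` and `reduceByKey`. The sketch below mimics those three steps in plain Python so it runs without a Spark installation. It is an illustration, not the slide's code, and `word_count` is a hypothetical name.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    # flatMap: split every line into words
    words = chain.from_iterable(line.split() for line in lines)
    # map: emit a (word, 1) pair for each word
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the 1s for each distinct word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    text = ["YARN runs Spark", "Spark runs on YARN and HDFS"]
    print(sorted(word_count(text).items()))
```

In Spark, the (word, 1) pairs are partitioned across the cluster and `reduceByKey` combines them per key, so the same three-step logic scales beyond a single machine.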
  • 22. HOYA: HBase (NoSQL) on YARN. Dynamic scaling: on-demand cluster size that grows and shrinks with load. Easier deployment: APIs to create, start, stop and delete HBase clusters. Availability: recover from RegionServer loss with a new container.
  • 23. Apache REEF (Retainable Evaluator Execution Framework). Machine learning: a framework well suited for building machine learning jobs. Scalable / fault tolerant: makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. Maintain state: users can build jobs that use data from where it’s needed and also maintain state after jobs are done.
  • 24. Real-Time Stream Processing: Apache Storm streaming.
  • 25. Heterogeneous Storage. Diagram, THEN vs. NOW: the NameNode and its storage; now with multiple storage types (SATA, SSD, Fusion IO).
  • 26. Hadoop Roadmap. Apache Hadoop 2.5 (Q3 2014): NodeManager restart without disruption. Apache Hadoop 2.6 (Q4 2014): memory as a storage tier; dynamic resource configuration; support for Docker containers.
  • 27. I KNOW YOU HAVE QUESTIONS
  • 28. THANK YOU! http://bigdatajoe.io/ http://bigdatacentric.com/ @bigdatajoerossi bigdatajoerossi@gmail.com