Dubai Big Data in Finance, Intro to Hadoop 2-Apr-14 - Michael Segel


A high-level introduction to Hadoop and its layer cake. Presented at Dubai's Big Data in Finance on Apr 2nd, 2014.

Published in: Economy & Finance, Technology

  1. INTRODUCTION TO THE HADOOP ECOSYSTEM: BAKING A LAYER CAKE AND BEYOND… “Qu’ils mangent de la brioche.”
  2. BEFORE WE BEGIN: Questions for the audience… How many of you have been working with Hadoop for more than 3 months? For more than 6 months? For more than 1 year? How many of you have heard about this thing called ‘Hadoop’ / ‘Big Data’ and thought it would be fun to check it out?
  3. About the Speaker: BSCIS, The College of Engineering, The Ohio State University. ‘Big Data’ consultant with > 25 years in IT. Working solely in the ‘Big Data’ space since 2009. Founded the Chicago area Hadoop User Group (CHUG) in April 2010: 1600+ members from over 200 different companies across all industries in the Chicagoland area. Has routinely spoken about Hadoop at conferences around the US. Guest lecture at Illinois Institute of Technology. Co-authored papers found on InfoQ. MapR Admin, Cloudera Admin & Developer certified. Email: MSegel (at) segel.com. Skype: Michael_Segel.
  4. What is Hadoop? ‘A framework of software tools to allow one to take a large problem and process individual pieces in parallel.’
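
The definition on this slide can be illustrated with a toy word count. This is only a sketch in plain Python, not Hadoop code: worker threads on one machine stand in for cluster nodes, and the function names (`split_into_pieces`, `count_words`, `merge_counts`, `parallel_word_count`) are made up for the illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(lines, n_pieces):
    """Divide the large problem into independent pieces."""
    return [lines[i::n_pieces] for i in range(n_pieces)]


def count_words(piece):
    """Process one piece without looking at any other piece."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Combine the partial results into a single answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(lines, n_pieces=4):
    """Split, process the pieces concurrently, then merge."""
    pieces = split_into_pieces(lines, n_pieces)
    # Threads are an assumption of this sketch; a cluster would use many machines.
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        partials = list(pool.map(count_words, pieces))
    return merge_counts(partials)
```

Calling `parallel_word_count(["let them eat cake", "cake and cake"])` returns a `Counter` of word frequencies. The point is the shape of the computation (split, work on pieces independently, merge), not the threading itself.
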
  5. Our Hadoop Layer Cake: Circa 2010. [Diagram layers: Storage, Job Control, Data Access, Programming Languages]
  6. Our Hadoop Layer Cake: Circa 2013, Hadoop 2.0. [Diagram layers: Data Access, Storage, Job Control, Resource Control, Real Time Messages, Data Frameworks] Confused? This is just the tip of the iceberg.
  7. The only constant is change… Hadoop is a disruptive technology, forcing the enterprise to rethink how it handles data. The core Apache framework is just the starting point. Disruption allows new vendors to compete with established vendors. If you can build a better mousetrap, you will attract customers. Hadoop plays nice with others…
  8. Myth: PROPRIETARY SOFTWARE IS BAD. Reality: VENDOR LOCK-IN IS BAD. “Qu’ils mangent de la brioche.” (‘Let them eat cake’)
  9. Myth: HADOOP IS ONLY GOOD FOR BATCH PROCESSING. Reality: HADOOP CAN ALSO BE USED FOR ‘REAL TIME’ PROBLEMS.
  10. [CENSORED] REAL TIME HADOOP: SINGLE DATA CENTER SOLUTION. Nightly batch jobs create the next day’s advertising lists. Client phone connects to the web service. Web service talks to the Ad Engine. Phone connects to the Ad Engine to get an ad. Ad Engine connects to HBase to get the list of potential ads to display, sending the correct ad to the phone.
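
The batch-plus-lookup pattern on this slide can be sketched as follows. Everything here is hypothetical: a plain dict stands in for the HBase table, the per-user keying and the `default_ad` fallback are assumptions, and `nightly_batch` and `AdEngine` are illustrative names, not part of the client system.

```python
def nightly_batch(raw_events):
    """Batch side: build the next day's ad list per user from (user_id, ad_id) pairs."""
    ad_lists = {}
    for user_id, ad_id in raw_events:
        ads = ad_lists.setdefault(user_id, [])
        if ad_id not in ads:  # keep each candidate ad once, in first-seen order
            ads.append(ad_id)
    return ad_lists


class AdEngine:
    """Real-time side: answers each phone request with a single key lookup."""

    def __init__(self, store):
        # 'store' is a dict standing in for the HBase table (assumption).
        self.store = store

    def get_ad(self, user_id, default_ad="house-ad"):
        candidates = self.store.get(user_id, [])
        # Picking the first candidate is a placeholder for real ad selection.
        return candidates[0] if candidates else default_ad
```

The design point is that the expensive work happens in the nightly batch, so the request path only does a cheap keyed read.
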
  11. Myth: HADOOP IS A STANDALONE SYSTEM AND WILL REPLACE TRADITIONAL VENDORS’ PRODUCTS. Reality: HADOOP IS PART OF THE ENTERPRISE. IT CAN BE STANDALONE, OR IT CAN WORK WITH EXISTING INFRASTRUCTURE.
  12. HADOOP AND THE ENTERPRISE: WE CAN ALL GET ALONG… Hadoop communicates well with the rest of the enterprise. A central cluster feeds distributed web services with local database backing.
  13. HADOOP AND THE ENTERPRISE: WE CAN ALL GET ALONG… Hadoop communicates well with the rest of the enterprise. Traditional data stores play nice with Hadoop, some seeing HDFS files as external tables.
  14. How Traditional Vendors View Hadoop: In the beginning they saw Hadoop as a threat that they would crush. Then: if you can’t beat them, join them… Oracle partners with Cloudera. EMC partnered with MapR, then released its own distribution (Green Stack). Teradata partners with Hortonworks. Microsoft partnered with Hortonworks. Intel tried to create its own distro; last week it dumped that distro and made a large investment in Cloudera. IBM has its own distro, yet certifies its tools to run on Cloudera. Cisco partners with MapR. Amazon (AWS) has its own distro and partners with MapR.
  15. Myth: HADOOP CLUSTERS SHOULD BE BUILT ON COMMODITY HARDWARE. Reality: YOU CAN DESIGN YOUR CLUSTER AROUND CONSTRAINTS…
  16. ALTERNATIVE CLUSTER LAYOUT: STORAGE / COMPUTE CLUSTER. A higher density of disk and compute cluster. Premium over commodity hardware. I/O latency. Could be part of a virtualization solution.
  17. Myth: HADOOP IS OPEN SOURCE AND THEREFORE FREE. Reality: T.A.N.S.T.A.A.F.L. (‘TANS-TAH-FELL’): THERE AIN’T NO SUCH THING AS A FREE LUNCH.
  18. There ain’t no such thing as a free lunch… Customers are paying for support. Tools are primitive and require work; there is no real point-and-click solution in place, but it is getting there. Hadoop fills the gap where you want a custom solution. Merging semi-structured and structured data is going to be data dependent, requiring customization. Beyond ETL and SQL, custom apps require developer expertise. (You must invest in skills.) Depending on the use case, Time to Value (TtV) will differ. Bottom line: there is a cost reduction over traditional solutions, but it’s not free.
  19. Take away… Hadoop is a tool set that is constantly evolving. Beware of marketing myths… Do your own homework and talk to the vendors. Make them earn your business. T.A.N.S.T.A.A.F.L. applies: you need to make an investment in terms of skills. Hadoop isn’t a separate solution and should be part of your overall enterprise strategy. Hadoop isn’t a silver bullet. By itself, it doesn’t solve your business problems.
  20. YOU CAN HAVE YOUR CAKE AND EAT IT TOO!
  21. QUESTIONS? Thank You For Your Time
  22. What is a layer cake? layer cake, noun [C] US: (1) two or more soft cakes put on top of each other with jam, cream, icing, etc. (= a sweet mixture made from sugar) between the cakes and covering the top and sides; (2) a term for a diagram showing how various parts of a group of components tie together in terms of a functional stack.
  23. What is Hadoop? Storage Layer: The Storage Layer is a distributed file system that provides: uniform access from any machine in the cluster; fast access; resiliency (self healing); redundancy (replication). This is known as HDFS, the Hadoop Distributed File System.
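
Redundancy and self healing can be sketched with a toy block-placement model. This is an illustration only: the round-robin placement below is an assumption of the sketch and is not HDFS's actual placement policy, and `place_blocks`, `readable_after_failure`, and `re_replicate` are hypothetical helper names. The default of three replicas matches HDFS's default replication factor.

```python
def place_blocks(n_blocks, nodes, replication=3):
    """Assign each block to 'replication' distinct nodes (round-robin, for illustration)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_after_failure(placement, failed_node):
    """Redundancy: every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


def re_replicate(placement, failed_node, nodes, replication=3):
    """Self healing: copy under-replicated blocks to spare live nodes."""
    live = [n for n in nodes if n != failed_node]
    healed = {}
    for block, replicas in placement.items():
        survivors = [n for n in replicas if n != failed_node]
        spares = [n for n in live if n not in survivors]
        healed[block] = survivors + spares[: replication - len(survivors)]
    return healed
```

With four nodes and three replicas, losing any one node leaves every block readable, and `re_replicate` brings each block back to three copies on the remaining nodes.
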
  24. What is Hadoop? Job Control Layer: The Job Control Layer manages and schedules jobs to be run (Default [FIFO], Capacity Scheduler, …); manages the overall job and distributes the subprocesses across the cluster; and manages the subprocesses being run on each node in the cluster. This is accomplished by a Job Tracker (cluster level) and Task Trackers (node level).
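
A minimal sketch of the default FIFO behavior described on this slide, assuming a toy model: the `JobTracker` class below is not Hadoop's implementation, each task tracker is reduced to a list of assigned task names, and the round-robin assignment is an assumption (a real scheduler also weighs things like data locality and free slots).

```python
from collections import deque


class JobTracker:
    """Cluster-level coordinator: queues jobs and hands tasks to per-node trackers."""

    def __init__(self, task_trackers):
        # Maps node name -> list of task names assigned to that node's task tracker.
        self.task_trackers = task_trackers
        self.queue = deque()

    def submit(self, job_name, n_tasks):
        """Jobs wait in arrival order (FIFO)."""
        self.queue.append((job_name, n_tasks))

    def run_next(self):
        """Take the oldest job and spread its tasks across the nodes."""
        job_name, n_tasks = self.queue.popleft()
        nodes = list(self.task_trackers)
        for i in range(n_tasks):
            node = nodes[i % len(nodes)]  # round-robin is an illustrative choice
            self.task_trackers[node].append(f"{job_name}-task{i}")
        return job_name
```

Submitting job `a` and then job `b` means `run_next()` always dispatches `a` first, regardless of how small `b` is, which is the trade-off other schedulers are meant to address.
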
  25. What is Hadoop? Data Access Layer: The Data Access Layer allows for higher-level access that can be translated into a Map/Reduce job: Pig (Yahoo!), Hive (Facebook). It also allows for ad hoc access to data outside of the Map/Reduce framework (HBase).
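
To make "translated into a Map/Reduce job" concrete, here is a rough sketch of how a grouped sum (the kind of query one might write as `SELECT key, SUM(value) ... GROUP BY key`) breaks down into map, shuffle, and reduce steps. It is plain Python, not what Pig or Hive actually generate, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(rows, key_col, value_col):
    """Map: emit one (key, value) pair per input row."""
    return [(row[key_col], row[value_col]) for row in rows]


def shuffle_phase(pairs):
    """Shuffle: bring all values for the same key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(rows, key_col, value_col):
    """The higher-level request, expressed as the three phases above."""
    return reduce_phase(shuffle_phase(map_phase(rows, key_col, value_col)))
```

The user only states what they want (`group_by_sum(rows, "ad", "clicks")`); the layer underneath decides how to run it as map and reduce steps.
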
  26. What is Hadoop? Job Flow Control Layer: The Data Access Layer allows for higher-level access that can be translated into a Map/Reduce job: Pig (Yahoo!), Hive (Facebook). It also allows for ad hoc access to data outside of the Map/Reduce framework (HBase), and for processes to be chained together to create a workflow (Oozie)*. *Nowhere else to put it…
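
Chaining processes into a workflow amounts to running steps in dependency order and passing results along. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to do that; it does not use Oozie or its workflow format, and `run_workflow` plus the step names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(actions, depends_on):
    """Run each step after the steps it depends on, feeding their results in.

    actions:    step name -> callable taking the results of its dependencies
    depends_on: step name -> list of step names that must finish first
    """
    order = list(TopologicalSorter(depends_on).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in depends_on.get(name, [])]
        results[name] = actions[name](*inputs)
    return order, results
```

For example, with steps `extract`, `sort`, and `report`, where `sort` depends on `extract` and `report` depends on `sort`, `run_workflow` runs them in that order and hands each step the output of the one before it.
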
  27. List of Apache Incubator projects associated with Hadoop: Storm, Accumulo, Knox, Sentry, Falcon, DataFu, Drill, Tez, Twill, Phoenix, Hadoop Dev Tools, Tajo.
