1. Hadoop workshop
Cloud Connect Shanghai
Sep 15, 2013
Ari Flink – Operations Architect
Mac Fang – Manager, Hadoop development
Dean Zhu – Hadoop Developer
2. Agenda
1. Introductions (5 minutes)
2. Hadoop and Big Data Concepts (20 minutes)
3. Cisco Webex Hadoop architecture (10 minutes)
4. Cisco UCS Hadoop Common Platform Architecture (10 minutes)
5. Exercise 1 (30 minutes)
– Configure a Hadoop single node VM on a laptop
6. Hive and Impala concepts (15 minutes)
7. Exercise 2 (30 minutes)
– Analytics using Apache Hive and Cloudera Impala
8. Q & A
3. Hadoop and Big Data Overview
– Enterprise data management and big data
– Problems, Opportunities and Use case examples
– Hadoop architecture concepts
4. What is Big Data?
For our purposes, big data refers to distributed computing
architectures specifically aimed at the “3 V’s” of data: Volume,
Velocity, and Variety
5. Traditional Enterprise Data Management
Operational (OLTP) systems → ETL → EDW → BI/Reports
– OLTP: Online Transaction Processing (multiple operational systems feed the pipeline)
– ETL: Extract, Transform, and Load
– EDW: Enterprise Data Warehouse
– BI: Business Intelligence (batch processing)
6. Traditional Business Intelligence Questions
Transactional Data (e.g. OLTP): real-time, but limited reporting/analytics
• What are the top 5 most active stocks traded in the last hour?
• How many new purchase orders have we received since noon?
Enterprise Data Warehouse: high value, structured, indexed, cleansed
• How many more hurricane windows are sold in Gulf-area stores during hurricane season vs. the rest of the year?
• What were the top 10 most frequently backordered products over the past year?
7. So what has changed?
The Explosion of Unstructured Data
1.8 trillion gigabytes of data was created in 2011…
• Approx. 500 quadrillion files
• More than 90% is unstructured data
• Quantity doubles every 2 years
• Most unstructured data is neither stored nor analyzed!
[Chart: GB of data (in billions), 2005-2015; unstructured data dwarfs structured data]
Source: Cloudera
8. Enterprise Data Management with Big Data
The traditional pipeline remains: Operational (OLTP) systems → ETL → MPP EDW → BI/Reports and in-memory analytics.
New sources (web and machine data) flow into a big data tier (Hadoop, etc.), which feeds web dashboards directly and exchanges data with the MPP EDW via ETL.
9. Traditional Business Intelligence Questions
Transactional Data (e.g. OLTP): fast data, real-time
• What are the top 5 most active stocks traded in the last hour?
• How many new purchase orders have we received since noon?
Enterprise Data Warehouse: high value, structured, indexed, cleansed
• How many more hurricane windows are sold in Gulf-area stores during hurricane season vs. the rest of the year?
• What were the top 10 most frequently backordered products over the past year?
Big Data: lower value, semi-structured, multi-source, raw/"dirty"
• Which products do customers click on the most and/or spend the most time browsing without buying?
• How do we optimally set pricing for each product in each store for individual customers every day?
• Did the recent marketing launch generate the expected online buzz, and did that translate to sales?
10. Example: Web and Location Analytics
An iPhone searches Amazon for Vizio TVs in Electronics, leaving a proxy log entry:
1336083635.130 10.8.8.158 TCP_MISS/200 8400 GET
http://www.amazon.com/gp/aw/s/ref=is_box_?k=Visio+tv… "Mozilla/5.0 (iPhone;
CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko)
Version/5.1 Mobile/9A405 Safari/7534.48.3"
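Log lines like this are exactly what ends up in HDFS for analysis. As a minimal, hypothetical Java sketch (not part of the deck; the field layout is inferred from the sample above), the interesting fields can be pulled out like so:

import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy parser for proxy log lines shaped like the sample above:
// epoch-seconds client-ip result/status bytes method url ... "user-agent"
public class ProxyLogParser {
    private static final Pattern LINE =
        Pattern.compile("^(\\S+) (\\S+) (\\S+) (\\d+) (\\S+) (\\S+).*\"(.*)\"\\s*$");

    public static void main(String[] args) {
        String line = "1336083635.130 10.8.8.158 TCP_MISS/200 8400 GET "
                + "http://www.amazon.com/gp/aw/s/ref=is_box_?k=Visio+tv "
                + "\"Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X)\"";
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            // The leading field is epoch seconds, with milliseconds
            Date when = new Date((long) (Double.parseDouble(m.group(1)) * 1000));
            System.out.println("time=" + when + " client=" + m.group(2)
                    + " status=" + m.group(3) + " bytes=" + m.group(4)
                    + " url=" + m.group(6) + " userAgent=" + m.group(7));
        }
    }
}

In a mapper, the same extraction would run once per line, emitting pairs such as (URL, 1) or (client IP, bytes).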
11. Big Data and Key Infrastructure Attributes
(What big data isn't)
Usually not blade servers (not enough local storage)
Usually not virtualized (hypervisor only adds overhead)
Usually not highly oversubscribed (significant east-west traffic)
Usually not SAN/NAS ($$$)
Instead: a low-cost, DAS-based, scale-out clustered filesystem
Move the compute to the storage
12. Cost, Performance, and Capacity
Structured data, relational enterprise database: ~$20K/TB (HW:SW $ split 30:70)
Massive scale-out column store: ~$10K/TB
Unstructured data, Hadoop/NoSQL: ~$300-$1K/TB (HW:SW $ split 70:30)
Typical unstructured sources: machine logs, web click stream, call data records, satellite feeds, GPS data, sensor readings, sales data, blogs, emails, video
13. Big Data Software Architectures
14. Three basic big data software architectures
Real-time NoSQL: fast key-value store/retrieve
• HBase (part of Apache Hadoop)*
• DataStax (Cassandra)*
• Oracle NoSQL*
• Amazon Dynamo
Batch-oriented Hadoop: heavy lifting, processing
• Cloudera*
• MapR*
• Intel Hadoop*
• Pivotal HD*
MPP Relational Database: scale-out BI/DW
• Greenplum DB (Pivotal DB)*
• ParAccel*
• Vertica
• Netezza
• Teradata
*Cisco Partners
15. What Is Hadoop?
Hadoop is a distributed, fault-tolerant framework for storing and analyzing data.
Its two primary components are the Hadoop Distributed File System (HDFS) and the MapReduce application engine.
16. Hadoop Components and Operations
Hadoop Distributed File System (HDFS): scalable & fault tolerant
– The filesystem is distributed, stored across all data nodes in the cluster
– Files are divided into multiple large blocks: 64MB default, typically 128MB-512MB
– Data is stored reliably; each block is replicated 3 times by default
Types of node functions
– Name Node: manages HDFS
– Job Tracker: manages MapReduce jobs
– Data Node/Task Tracker: stores blocks / does the work
[Diagram: a file split into blocks 1-6 and distributed across data nodes 1-13 under ToR FEX/switches, alongside the Name Node and Job Tracker]
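A minimal sketch of these concepts from the HDFS client API (the NameNode address is a placeholder, and the property names shown are the Hadoop 1.x-era ones; they vary by version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlocksDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:8020");            // placeholder NameNode
        conf.set("dfs.block.size", String.valueOf(128L * 1024 * 1024)); // 128MB blocks
        conf.set("dfs.replication", "3");                               // default replication
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/foo.txt");
        FSDataOutputStream out = fs.create(file);  // NameNode chooses the data nodes
        out.writeBytes("hello hdfs\n");
        out.close();

        FileStatus st = fs.getFileStatus(file);
        System.out.println("block size = " + st.getBlockSize()
                + ", replication = " + st.getReplication());
    }
}

Note that the client never writes through the Name Node: it asks it which data nodes to use, then streams blocks to those nodes directly.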
17. HDFS Architecture
[Diagram: blocks 1-4, each replicated three times, spread across data nodes 1-15 under three ToR FEX/switches and a core switch]
The Name Node tracks which blocks make up each file and which data nodes hold each block, e.g.:
/usr/sean/foo.txt: blk_1, blk_2
/usr/jacob/bar.txt: blk_3, blk_4
Data node 1: blk_1
Data node 2: blk_2, blk_3
Data node 3: blk_3
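As a toy Java sketch (illustrative names, not Hadoop internals), the Name Node's bookkeeping amounts to two maps: file → blocks, which is persisted, and block → data nodes, which is rebuilt from data node block reports:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the Name Node metadata on this slide.
public class NameNodeMetadata {
    public static void main(String[] args) {
        Map<String, List<String>> fileToBlocks = new HashMap<String, List<String>>();
        fileToBlocks.put("/usr/sean/foo.txt", Arrays.asList("blk_1", "blk_2"));
        fileToBlocks.put("/usr/jacob/bar.txt", Arrays.asList("blk_3", "blk_4"));

        Map<String, List<String>> blockToNodes = new HashMap<String, List<String>>();
        blockToNodes.put("blk_1", Arrays.asList("Data node 1"));
        blockToNodes.put("blk_2", Arrays.asList("Data node 2"));
        blockToNodes.put("blk_3", Arrays.asList("Data node 2", "Data node 3"));

        // A client reading a file asks for its blocks, then for each block's
        // locations, and finally reads from those data nodes directly.
        for (String blk : fileToBlocks.get("/usr/sean/foo.txt")) {
            List<String> nodes = blockToNodes.containsKey(blk)
                    ? blockToNodes.get(blk) : Collections.<String>emptyList();
            System.out.println(blk + " -> " + nodes);
        }
    }
}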
18. Rack Awareness
[Diagram: data nodes 1-15 grouped into logical "racks" 1-3, with each block's replicas spread across racks]
Rack Awareness gives Hadoop the optional ability to group nodes together in logical "racks" (i.e. failure domains)
Logical "racks" may or may not correspond to physical data center racks
Distributes block replicas across different "racks" to avoid the failure domain of a single "rack"
It can also lessen block movement between "racks"
(A toy sketch of the placement rule this enables follows below.)
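A minimal sketch, assuming the default HDFS placement rule (first replica on the writer's node, second in a different "rack", third on another node in the second replica's "rack"); the class and node names are invented for illustration:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model of rack-aware replica placement.
public class ReplicaPlacement {
    static class Node {
        final String name, rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
        public String toString() { return name + "@" + rack; }
    }

    static List<Node> place(Node writer, List<Node> cluster, Random rnd) {
        List<Node> replicas = new ArrayList<Node>();
        replicas.add(writer);                                      // replica 1: local node
        List<Node> offRack = new ArrayList<Node>();
        for (Node n : cluster) if (!n.rack.equals(writer.rack)) offRack.add(n);
        Node second = offRack.get(rnd.nextInt(offRack.size()));
        replicas.add(second);                                      // replica 2: different rack
        List<Node> sameRack = new ArrayList<Node>();
        for (Node n : cluster) if (n.rack.equals(second.rack) && n != second) sameRack.add(n);
        replicas.add(sameRack.get(rnd.nextInt(sameRack.size()))); // replica 3: near replica 2
        return replicas;
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<Node>();
        for (int r = 1; r <= 3; r++)
            for (int n = 1; n <= 5; n++)
                cluster.add(new Node("node" + r + "-" + n, "rack" + r));
        System.out.println(place(cluster.get(0), cluster, new Random()));
    }
}

Losing one "rack" can then cost at most two of a block's three replicas, so every block survives a single-rack failure.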
19. MapReduce Example: Word Count
Input (one line per mapper):
  the quick brown fox
  the fox ate the mouse
  how now brown cow
Map: each mapper emits a (word, 1) pair per word, e.g. (the, 1), (quick, 1), (brown, 1), (fox, 1)
Shuffle & Sort: pairs are grouped by key, and all pairs for a given word are routed to the same reducer
Reduce: each reducer sums the counts for its words
Output:
  Reducer 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
  Reducer 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)
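The same word count as Hadoop MapReduce code, a minimal sketch along the lines of the standard Apache example (Hadoop 2 API):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                ctx.write(word, ONE);               // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));   // emit (word, total)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // partial sums on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The framework handles the shuffle and sort between map and reduce; reusing the reducer as a combiner computes partial sums on the map side, which cuts shuffle traffic on the network.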
20. MapReduce Architecture
[Diagram: the Job Tracker assigns map tasks (M1-M3) and reduce tasks (R1-R2) to Task Trackers 1-15, co-located with the data nodes under the ToR FEX/switches]
Job Tracker bookkeeping, e.g.:
Job1:TT1:Mapper1,Mapper2
Job1:TT4:Mapper3,Reducer1
Job2:TT6:Reducer2
Job2:TT7:Mapper1,Mapper3
21. Cisco Webex Cloud and Hadoop Architecture
22. Global Scale
• 13 datacenters & iPoPs around the globe
• Dedicated network: dual-path 10G circuits between DCs
• Multi-tenant: 95k sites
• Real-time collaboration: voice, desktop sharing, video, chat
[World map; legend: Datacenter / PoP, Leased network link]
23. People make mistakes
Hardware fails
Software fails
Even failovers sometimes fail
25. Cisco UCS and Big Data
Building a big data cluster with the UCS Common Platform Architecture (CPA)
CPA Networking
CPA Sizing and Scaling
26. The evolution of big data deployments
General Purpose IT Data Center: big data runs alongside generic IT servers (SAP, VMware, web, x86)
• Experimental use of Big Data; "skunk works"
• Deployed into IT Ops mandated infrastructures
• Small to medium clusters
Dedicated "Pod" for Big Data: infrastructure purpose built for Big Data
• Big Data has established business value
• App team mandated infrastructure
• Performance matters
• Large or small clusters
27. Hadoop Hardware Evolving in the Enterprise
Typical 2009 Hadoop node: $
• 1RU server
• 4 x 1TB 3.5" spindles
• 2 x 4-core CPU
• 1 x GE
• 24 GB RAM
• Single PSU
• Running Apache Hadoop
Economics favor "fat" nodes:
• 6x-9x more data/node
• 3x-6x more IOPS/node
• Saturated gigabit, 10GE on the rise
• Fewer total nodes lowers licensing/support costs
• Increased significance of node and switch failure
Typical 2013 Hadoop node: $$$
• 2RU server
• 12 x 3TB 3.5" or 24 x 1TB 2.5" spindles
• 2 x 8-core CPU
• 1-2 x 10GE
• 128 GB RAM
• Dual PSU
• Running commercial/licensed distribution
28. Cisco UCS Common Platform Architecture (CPA)
Building Blocks for Big Data
• UCS Manager
• UCS 6200 Series Fabric Interconnects
• Nexus 2232 Fabric Extenders
• LAN, SAN, Management
• UCS C240 M3 Servers
29. CPA Network Design for Big Data
30. CPA: Topology
Single wire for data and management
• 8 x 10GE uplinks per FEX = 2:1 oversubscription (16 servers/rack), no port-channel (static pinning)
• 2 x 10GE links per server for all traffic, data and management
31. CPA Recommended FEX Connectivity
2 FEXs and 2 FIs
• The 2232 FEX has 4 buffer groups: ports 1-8, 9-16, 17-24, 25-32
• Distribute servers across port groups to maximize buffer performance and predictably distribute static pinning on uplinks
32. Can Hadoop really push 10GE?
It can, depending on workload, so tune for it!
• Analytic workloads tend to be lighter on the network
• Transform workloads tend to be heavier on the network
• Hadoop has numerous parameters which affect the network
Take advantage of 10GE CPA by tuning, for example (see the sketch below):
– mapred.reduce.slowstart.completed.maps
– dfs.balance.bandwidthPerSec
– mapred.reduce.parallel.copies
– mapred.reduce.tasks
– mapred.tasktracker.reduce.tasks.maximum
– mapred.compress.map.output
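A minimal Java sketch of setting these parameters programmatically (Hadoop 1.x names as listed on the slide; every value below is an illustrative placeholder, not a recommendation):

import org.apache.hadoop.conf.Configuration;

public class NetworkTuning {
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Start reducers (and their copy phase) only after most maps finish
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.8f);
        // Cap HDFS balancer bandwidth (bytes/sec) so it doesn't crowd job traffic
        conf.setLong("dfs.balance.bandwidthPerSec", 10L * 1024 * 1024);
        // Parallel fetch threads per reducer during the shuffle
        conf.setInt("mapred.reduce.parallel.copies", 20);
        // Number of reduce tasks for the job
        conf.setInt("mapred.reduce.tasks", 32);
        // Reduce slots per TaskTracker
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 4);
        // Compress map output to cut shuffle traffic on the wire
        conf.setBoolean("mapred.compress.map.output", true);
        return conf;
    }
}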
33. CPA Sizing and Scaling for Big Data
34. Cisco UCS Reference Configurations for Big Data
Full Rack UCS Solution Bundle for Hadoop/NoSQL Performance:
• 2 x UCS 6296, 2 x Nexus 2232 PP, 16 x C240 M3 (SFF)
• Per server: 2 x E5-2665 (16 cores), 256GB RAM, 24 x 1TB 7.2K SAS
Full Rack UCS Solution Bundle for Hadoop Capacity:
• 2 x UCS 6296, 2 x Nexus 2232 PP, 16 x C240 M3 (LFF)
• Per server: 2 x E5-2640 (12 cores), 128GB RAM, 12 x 3TB 7.2K SATA
35. Sizing
Part science, part art
Start with the current storage requirement:
– Factor in replication (typically 3x) and compression (varies by data set)
– Factor in 20-30% free space for temp (Hadoop) or up to 50% for some NoSQL systems
– Factor in average daily/weekly data ingest rate
– Factor in expected growth rate (i.e. increase in ingest rate over time)
If the I/O requirement is known, use the next table for guidance
Most big data architectures are very linear, so more nodes = more capacity and better performance
Strike a balance between price/performance of individual nodes vs. the total number of nodes
(A worked example of these rules of thumb follows below.)
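A small sketch applying those rules of thumb; the 100TB starting point, 2:1 compression, and 12 x 3TB node profile are assumptions chosen only for illustration:

// Hypothetical sizing walk-through using the rules of thumb above.
public class ClusterSizing {
    public static void main(String[] args) {
        double sourceTB = 100.0;    // assumed current storage requirement
        double compression = 2.0;   // assumed 2:1 compression (varies by data set)
        double replication = 3.0;   // typical HDFS replication factor
        double tempReserve = 0.25;  // keep 20-30% free for temp/intermediate data

        double onDiskTB = sourceTB / compression * replication;  // 150 TB on disk
        double rawTB = onDiskTB / (1.0 - tempReserve);            // 200 TB raw needed
        double perNodeTB = 12 * 3.0;                              // 12 x 3TB = 36 TB/node
        int nodes = (int) Math.ceil(rawTB / perNodeTB);           // 6 nodes

        System.out.printf("raw capacity needed: %.0f TB -> %d nodes "
                + "(before ingest and growth headroom)%n", rawTB, nodes);
    }
}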
36. CPA sizing and application guidelines
Server
  CPU:                    2 x E5-2690      2 x E5-2665      2 x E5-2640
  Memory (GB):            256              256              128
  Disk drives:            24 x 600GB 10K   24 x 1TB 7.2K    12 x 3TB 7.2K
  IO bandwidth (GB/sec):  2.6              2.0              1.1
Rack-level (16 servers)
  Cores:                  256              256              192
  Memory (TB):            4                4                2
  Capacity (TB):          225              384              576
  IO bandwidth (GB/sec):  41.3             31.9             16.9
Applications:             MPP DB           NoSQL, Hadoop    NoSQL, Hadoop
(Left: best performance; right: best price/TB)
37. Scaling the CPA
• Single rack: 16 servers
• Single domain: up to 10 racks, 160 servers
• Multiple domains, joined by L2/L3 switching
38. Scaling the Common Platform Architecture
Multiple domains based on 16 servers per rack and 2 x 2232 FEXs
Consider intra- and inter-domain bandwidth (all figures per fabric):

Servers per domain (pair of FIs):         160    144    128
Available north-bound 10GE ports:         16     24     32
Southbound oversubscription:              2:1    2:1    2:1
Northbound oversubscription:              5:1    3:1    2:1
Intra-domain server-to-server Gbits/sec:  5      5      5
Inter-domain server-to-server Gbits/sec:  1      1.67   2.5
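As a worked check of the first column (assuming 96-port UCS 6296 fabric interconnects, as in the reference bundles): 10 racks x 8 FEX uplinks consume 80 FI ports per fabric, leaving 16 for northbound, so northbound oversubscription is 80:16 = 5:1; and 16 x 10GE of northbound capacity shared by 160 servers works out to 1 Gbit/sec of inter-domain bandwidth per server per fabric.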
39. Multi-Domain CPA Customer Example
• 10 Gbits/sec intra-domain server-to-server network bandwidth
• 5 Gbits/sec inter-domain server-to-server network bandwidth
• Static pinning from FEX to FI (no port-channel)
40. Recommendations: UCS Domains and Racks
Single-domain recommendation: turn Rack Awareness off, or enable it at the physical rack level
• For simplicity and ease of use, leave Rack Awareness off
• Consider turning it on to limit the physical rack-level fault domain (e.g. localized failures due to physical data center issues: water, power, cooling, etc.)
Multi-domain recommendation: create one Hadoop rack per UCS domain
• With multiple domains, enable Rack Awareness such that each UCS domain is its own Hadoop rack
• Provides HDFS data protection across domains
• Helps minimize cross-domain traffic
41. Exercise 1
Set up a single node VM cluster on the laptop
– Step 1: copy files from USB memory stick
– Step 2: Mac & Dean to fill in …
– Step 3: Mac & Dean to fill in …
– etc
43. Hive
An SQL-like interface to Hadoop
Top-level Apache project: http://hive.apache.org/
Hive history:
– Created at Facebook to allow people to quickly and easily leverage Hadoop without the effort of writing Java MapReduce
– Currently used at many companies for log processing, business intelligence and analytics
44. Hive Components
Shell: allows interactive queries
Driver: session handles, fetch, execute
Compiler: parse, plan, optimize
Execution engine: DAG of stages (MR, HDFS, metadata)
Metastore: schema, location in HDFS, SerDe
45. Data Model
Tables
– Typed columns (int, float, string, boolean)
– Also, list: map (for JSON-like data)
Partitions
– For example, range-partition tables by date
Buckets
– Hash partitions within ranges (useful for sampling, join optimization)
46. Hive

               DBMS                       Hive
Language       SQL-92 standard            Subset of SQL-92 plus Hive extensions
Updates        INSERT, UPDATE, DELETE     INSERT OVERWRITE; no UPDATE or DELETE
Transactions   Yes                        No
Latency        Sub-second                 Minutes to hours
Indexes        Any number of indexes,     No indexes; data is always
               important to performance   scanned in parallel
Dataset size   TBs                        PBs
47. Metastore
Database: namespace containing a set of tables
Holds table definitions (column types, physical layout)
Holds partitioning information
Can be stored in Derby, MySQL, and other relational databases
Source: cc-licensed slide by Cloudera
50. Hive Physical Layout
Warehouse directory in HDFS
– E.g., /user/hive/warehouse
Tables stored in subdirectories of the warehouse
– Partitions form subdirectories of tables
Actual data stored in HDFS files
– E.g. text, SequenceFile, RCFile, Avro
– Arbitrary format with a custom SerDe
51. External and Hive managed tables
Hive managed tables:
– Data is moved to a location under /user/hive/warehouse
– Can be stored in a more efficient format than text, e.g. RCFile
– If you drop the table, the raw data is lost

hive> CREATE TABLE test (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE;

External tables:
– Can overlay multiple tables all pointing to the same raw data
– To create an external table, add the EXTERNAL keyword and point to the location of the data while creating the table

hive> CREATE EXTERNAL TABLE test (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/home/test/data';
52. Hive: Example
Hive looks similar to an SQL database
Relational join on two tables:
– Table of word counts from the Shakespeare collection
– Table of word counts from the Bible

SELECT s.word, s.freq, k.freq FROM shakespeare s
JOIN bible k ON (s.word = k.word)
WHERE s.freq >= 1 AND k.freq >= 1
ORDER BY s.freq DESC LIMIT 5;

the    25848    62394
I      23031    8854
and    19671    38985
to     18038    13526
of     16700    34654
54. Impala
General-purpose MPP SQL query engine for Hadoop
– Query latency from milliseconds to hours; interactive data exploration
– Runs on the existing Hadoop cluster, on existing HDFS files and hardware
High performance
– Written in C++
– Direct access to HDFS and HBase data; no MapReduce
Unified platform
– Uses existing Hive metadata and query language (HiveQL)
– Submit queries via ODBC or the Thrift API (see the sketch below)
Performance
– Disk throughput limited by hardware to ~100MB/sec
– 3x to 90x faster than Hive, depending on the type of query
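As one concrete path, a HiveQL query like the one on slide 52 can be submitted from Java through the HiveServer2 JDBC driver; a minimal sketch follows (host, port, and credentials are placeholders), and the slide's ODBC route into Impala is analogous:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryClient {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host and port below are placeholders
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT s.word, s.freq, k.freq FROM shakespeare s "
                 + "JOIN bible k ON (s.word = k.word) "
                 + "WHERE s.freq >= 1 AND k.freq >= 1 "
                 + "ORDER BY s.freq DESC LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t"
                        + rs.getLong(2) + "\t" + rs.getLong(3));
            }
        }
    }
}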
55. Impala Details
Unified metadata, HiveQL interface
[Diagram: an SQL app connects via ODBC; the Hive Metastore and HDFS NameNode supply unified metadata; a statestored daemon tracks cluster state; each cluster node runs an impalad (query planner, query coordinator, query exec engine) co-located with an HDFS DataNode and HBase]
56. Impala Details
[Same diagram] Each impalad keeps contact with statestored to update its state and to receive metadata for query planning.
57. Impala Details
[Same diagram] The query coordinator initiates execution on the remote impalads.
58. Impala Details
[Same diagram] Intermediate results are streamed between impalads, and query results are streamed back to the client.
59. Exercise 2
Analytics with Hive and Impala
– Step 1: copy test dataset from USB memory stick
– Step 2: Mac & Dean to fill in …
– Step 3: Mac & Dean to fill in …
– etc
Editor's Notes
• Summary slides after each model (Hadoop, NoSQL and MPP): 3 bullets on actual implementation to tie back to a later section. Sean to 27.
• Hadoop is optimized for large streaming reads, not for low latency or fast writes. HDFS is optimized for fewer, larger files (> 100 MB), with a 128MB block size or higher. Files are currently write-once (append support is available in 0.21, but mainly for HBase; otherwise not recommended). Blocks are replicated 3x by default, on three different data nodes. The NameNode stores file metadata in fsimage (/usr/sean/foo.txt: blk_1, blk_2, blk_3), but it doesn't know which data nodes own those blocks until they report in. Blocks are just files on the underlying filesystem (ext3, etc.), e.g. blk_1234; no metadata on the slave node describes the data contained on that slave (or any other). When the NameNode starts up, it starts in safe mode and won't leave it until it knows where at least one copy of 99.999% of blocks is (configurable), based on block reports; it then waits 30 seconds and exits safe mode. The NameNode block map is based solely on slave block reports, always cached in memory, nothing persistent. All data nodes heartbeat into the NameNode every 3 seconds; the NameNode will evict a node after 5 minutes without a heartbeat and re-replicate its "lost" blocks after 10 minutes. As blocks are written, checksums are calculated and stored with the block (blk_1234.meta); upon read, the calculated checksum is compared with the stored one. To avoid bit rot, a daemon re-checks each block's checksum every 3 weeks after the block was written.
• The JobTracker assigns map or reduce tasks to TaskTracker slaves (data nodes) with available "slots". For map tasks, the JobTracker attempts to assign work on local blocks to avoid expensive shipping of blocks across the network. Each task (mapper or reducer) runs in its own child JVM on the slave node. The TaskTracker process kicks off its child tasks based on a preconfigured number of task slots; each child task JVM eats up a chunk of RAM, placing a limit on the total number of slots. Rule of thumb: set aside 25-30% of space for temp storage, outside of HDFS, to hold intermediate map output before it is sent to reducers. If a child JVM dies, the TaskTracker removes it and reports the death to the JobTracker, which attempts to reassign the task to a different TaskTracker. If any specific task fails 4 times, the whole job fails. If a TaskTracker reports a high number of failed tasks, it gets blacklisted for that job; if it gets blacklisted for multiple jobs, it is put on a global blacklist for 24 hours.
• As of Feb 2013.
• CEP: Complex Event Processing.
• Big data projects often start out co-mingled within existing general-purpose data center infrastructure, but eventually outgrow it and need to move to a dedicated "pod". This is usually where we come in.