Oracle Solaris 11 as a Big Data Platform: Apache Hadoop Use Case

Orgad Kimchi, Principal Software Engineer
Oracle ISV Engineering

Disclaimer

The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of
Oracle Corporation.


Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Oracle Confidential, Proprietary Information

Agenda
• Hadoop Overview
• The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster



What is Big Data?
• Big Data is both large, variable datasets and a new set of technologies
• The datasets: extremely large files of unstructured or semi-structured data
• The datasets: large, highly distributed datasets that are otherwise difficult to manage as a single unit of information
• The technologies: tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions



Introduction To Hadoop




What is Hadoop?
• Based on designs published by Google: the Google File System paper (2003) and the MapReduce paper (2004)
• Originally used for generating search indexes and web scores
• Top-level Apache project consisting of two key services:
1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, distributed
2. MapReduce API (Java): can be scripted in other languages
• Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
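The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch. This is not Hadoop's Java API; the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and only show how the map, shuffle, and reduce stages fit together for a word count.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = run_job(["big data on Solaris", "Hadoop processes big data"])
for word in sorted(counts):
    print(word, counts[word])
```

In a real cluster the map and reduce functions run as separate tasks on different machines, and the framework performs the shuffle over the network.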



Components of Hadoop




HDFS
• HDFS is the file system responsible for storing data on the cluster
• Written in Java (based on Google's GFS)
• Sits on top of a native file system (ext3, ext4, xfs, ZFS)
• POSIX-like file permissions model
• Provides redundant storage for massive amounts of data
• HDFS is optimized for large, streaming reads of files
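To make the idea of redundant block storage concrete, here is a rough sketch of how a file might be split into fixed-size blocks and how each block might be copied to several DataNodes. The 64 MB block size and replication factor of 3 are assumptions (common defaults in Hadoop 1.x, not stated in these slides), and the round-robin placement is a simplification; real HDFS placement is rack-aware. The function names are hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the size in bytes of each block for a file of file_size bytes.

    Assumption: 64 MB blocks. The last block holds only the remainder.
    """
    num_blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return [min(block_size, file_size - i * block_size) for i in range(num_blocks)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustration only: this is not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200 * MB)
placement = place_replicas(len(blocks), ["data-node1", "data-node2", "data-node3"])
raw_bytes = sum(blocks) * 3  # total bytes stored across all replicas
print(len(blocks), "blocks;", raw_bytes // MB, "MB of raw storage")
```

Losing one DataNode in this layout leaves at least two copies of every block, which is the point of replicating rather than relying on a single disk.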



The Five Hadoop Daemons
Hadoop comprises five separate daemons:
• NameNode: Holds the metadata for HDFS
• Secondary NameNode: Performs housekeeping functions for the NameNode
• DataNode: Stores actual HDFS data blocks
• JobTracker: Manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates MapReduce stages
• TaskTracker: Responsible for instantiating and monitoring individual Map and Reduce tasks
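The five daemons fall into two layers, storage (HDFS) and processing (MapReduce). The small sketch below records that split as data. The "master" and "worker" labels are a common way of classifying the daemons and are an assumption here, not something stated on the slide; the Daemon class and the names() helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Daemon:
    name: str
    layer: str  # "HDFS" or "MapReduce"
    role: str   # "master" or "worker" (assumed classification)


DAEMONS = (
    Daemon("NameNode", "HDFS", "master"),
    Daemon("Secondary NameNode", "HDFS", "master"),
    Daemon("DataNode", "HDFS", "worker"),
    Daemon("JobTracker", "MapReduce", "master"),
    Daemon("TaskTracker", "MapReduce", "worker"),
)


def names(layer=None, role=None):
    """Return daemon names, optionally filtered by layer and/or role."""
    return [
        d.name
        for d in DAEMONS
        if (layer is None or d.layer == layer) and (role is None or d.role == role)
    ]


print("Workers:", names(role="worker"))
print("MapReduce layer:", names(layer="MapReduce"))
```

The worker daemons are the ones that are replicated as a cluster grows, which is why they are the natural candidates for running in cloned zones.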



Hadoop Architecture




The benefits of using Oracle Solaris technologies for a Hadoop cluster



Solaris Zones Hadoop Architecture




Built-in Virtualization
Oracle Solaris 11 Zones
• Secure, lightweight virtualization
• Scales to hundreds of zones per node
• Built-in, no-cost virtualization
• Combines isolation with resource management
• Widely used for:
  • Consolidation
  • Legacy OS support
  • Rapid application deployment
  • Securely protecting applications
Co-engineered with installation, security, ZFS, networking, IPS, SPARC and x86 hypervisors
1 out of 3 Oracle Solaris systems runs Oracle Solaris Zones

The benefits of using Oracle Solaris Zones for a Hadoop cluster
Oracle Solaris Zones Benefits
• Fast provisioning of new cluster members using the Solaris Zones cloning feature
• Very high network throughput between the zones for DataNode replication



Oracle Solaris Zones
• Source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen



Oracle Solaris 11: Storage Virtualization
Secure Datasets for Each Tenant
[Diagram: Finance, HR, and Sales zones, each with its own dataset]
10x storage savings for virtualization
2x storage compression
• Virtual flash-enabled storage pools for speed
• Built-in data services save storage software costs
• File and block sharing
• Wire-speed encryption on disk, over the wire
• Extreme data integrity
• Unlimited scale

The benefits of using Oracle Solaris ZFS for a Hadoop cluster
Oracle Solaris ZFS Benefits
• Immense data capacity: a 128-bit file system, perfect for big datasets
• Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
• Secure data at rest using ZFS encryption



Performance analysis
• Each Oracle Solaris Zone can have a different workload: disk I/O, network I/O, CPU, memory, or a combination of these. In addition, a single Oracle Solaris Zone can overload the entire system's resources.
• DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.



zonestat
The zonestat command lets us monitor all the Solaris zones running in our environment and provides real-time statistics for CPU, memory, and network utilization.
root@global_zone:~# zonestat 10 10
Interval: 1, Duration: 0:00:10
SUMMARY                   Cpus/Online: 128/12  PhysMem: 256G  VirtMem: 259G
                     ---CPU----   --PhysMem-- --VirtMem-- --PhysNet--
               ZONE    USED %PART  USED %USED  USED %USED PBYTE %PUSE
            [total]  118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E  100%
           [system]    0.00 0.00% 9684M 3.69% 40.5G 15.5%
         data-node3   42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E  100%
         data-node1   41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E  100%
         data-node2   33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E  100%
             global    0.34 0.27%  283M 0.10%  420M 0.15%  2192 0.00%
          name-node    0.15 0.11%  419M 0.15%  718M 0.26%   126 0.00%
      sec-name-node    0.00 0.00%  205M 0.07%  363M 0.13%     0 0.00%
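As a rough illustration of how such a report could be post-processed, the sketch below parses the per-zone rows from the sample zonestat output on this slide and picks the zone using the largest share of CPU. The rows are copied from the slide; the parsing approach (whitespace-split columns) and the helper names parse_rows and busiest_zone are assumptions for illustration, not part of zonestat.

```python
SAMPLE = """\
[total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E 100%
[system] 0.00 0.00% 9684M 3.69% 40.5G 15.5%
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
data-node2 33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
name-node 0.15 0.11% 419M 0.15% 718M 0.26% 126 0.00%
sec-name-node 0.00 0.00% 205M 0.07% 363M 0.13% 0 0.00%
"""


def parse_rows(text):
    """Parse rows of the form: ZONE CPU-USED %PART PMEM %USED VMEM %USED ..."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        rows[fields[0]] = {
            "cpu_used": float(fields[1]),
            "cpu_part_pct": float(fields[2].rstrip("%")),
            "physmem_pct": float(fields[4].rstrip("%")),
            "virtmem_pct": float(fields[6].rstrip("%")),
        }
    return rows


def busiest_zone(rows):
    """Return the zone with the highest CPU %PART, ignoring [total]/[system]."""
    zones = {name: row for name, row in rows.items() if not name.startswith("[")}
    return max(zones, key=lambda name: zones[name]["cpu_part_pct"])


rows = parse_rows(SAMPLE)
top = busiest_zone(rows)
data_node_cpu = sum(r["cpu_used"] for n, r in rows.items() if n.startswith("data-node"))
print("Busiest zone:", top)
print("CPUs used by DataNode zones:", round(data_node_cpu, 2))
```

In this sample the three DataNode zones account for almost all of the CPU in use, while the NameNode zones are nearly idle.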



DISK I/O Performance Monitoring




fsstat
The fsstat command lets us monitor file system activity by file system type or mount point and, with the -Z option, per Solaris Zone.

For example, the following monitors writes to all ZFS file systems at 10-second intervals:
root@global_zone:~# fsstat -Z zfs 10 10
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
[Per-zone rows for zfs:global, zfs:data-node1, zfs:name-node, and zfs:sec-name-node; column values not recoverable from the slide text]
DISK I/O - Cont'd
Run the DTrace iopattern script, as shown, to determine whether the disk I/O workload is random or sequential.
root@global_zone:~# /usr/dtrace/DTT/iopattern
%RAN %SEQ  COUNT    MIN      MAX     AVG      KR     KW
  69   31    236   1024  1048576  448830  103441      0
  75   25    577    512  1048576  327938  184306    479
  92    8    598    512  1048576  198293  114275   1525
  74   26    379    512  1048576  330296  121954    294
  66   34    281   1024  1048576  500550  137358      0
  80   20    346   1024  1048576  332114  112218      0
  81   19    444    512  1048576  290734  124694   1366
  65   35    337    512  1048576  490375  161139    244
  75   25    704    512  1048576  353086  241105   1642
  75   25    444   1024  1048576  386634  167642      0
  77   23    666   1024  1048576  397105  258274      0
Visualization

For more information about dim_STAT, see http://dimitrik.free.fr



Flame Graphs

For more information, see http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs



Hadoop on an Oracle SPARC T4-2 Server

• Source: https://blogs.oracle.com/taylor22



For more information

• How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
• How to Build Native Hadoop Libraries for Oracle Solaris 11
• How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)
• Performance Analysis in a Multitenant Cloud Environment Using Hadoop Cluster and Oracle Solaris 11
• My Blog



Follow us on

Questions



