Oracle Solaris 11 as a Big Data Platform
Apache Hadoop Use Case
Orgad Kimchi, Principal Software En...
The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:

Fast provision of new cluster members using the zone cloning feature
Very high network throughput between the zones for data node replication
Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
Secure data at rest using ZFS encryption
For more information see: http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setup-hadoop-zones-1899993.html
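
The zone cloning benefit listed above can be sketched as a dry-run plan. This is a minimal sketch, not the procedure from the linked article: the function name `print_clone_plan`, the `/zones/<name>` zonepath layout, and the new zone name `data-node4` are assumptions made for illustration. The function only prints commands. The source zone generally needs to be halted before it is cloned, so a dedicated template zone may be a better source than a live data node.

```shell
#!/bin/sh
# Dry run: print the commands that would clone an existing zone into a new
# cluster member. Nothing is executed against the system.
# Assumptions (not from the slides): zone roots live under /zones/<name>,
# and data-node4 is a hypothetical new zone name.
print_clone_plan() {
    src=$1   # existing, installed zone to copy (should not be running)
    new=$2   # name of the zone to create
    cfg="/tmp/${new}.cfg"

    # 1. Export the source zone's configuration to a file.
    echo "zonecfg -z ${src} export -f ${cfg}"
    # 2. Manual step: adjust per-zone settings in the exported file.
    echo "# edit ${cfg}: set zonepath=/zones/${new} and any per-zone network settings"
    # 3. Create the new zone's configuration from the edited file.
    echo "zonecfg -z ${new} -f ${cfg}"
    # 4. Clone the installed source zone into the new zone.
    echo "zoneadm -z ${new} clone ${src}"
    # 5. Boot the new zone.
    echo "zoneadm -z ${new} boot"
}

print_clone_plan data-node1 data-node4
```

Review the printed plan, then run the commands by hand from the global zone once the configuration file has been edited.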

  • Storage virtualization is possible through ZFS, the default storage subsystem in Solaris 11. ZFS simplifies storage management through virtual storage pools that can include flash for high-performance data operations. ZFS datasets can be assigned to a specific zone and then encrypted at wire speed to keep data separate in a virtualized environment. ZFS provides both file and block sharing for UNIX and Windows environments. ZFS data services such as deduplication, compression, replication and migration, and snapshots are built in, so customers don't have to purchase extra software or hardware options. ZFS is designed for extreme data integrity: there has never been a reported service case of corrupted data since it first shipped with Solaris 10 in 2006. ZFS is a 128-bit file system designed to scale for the next 50 years of data management. Most other file systems today are 64-bit or smaller.
  • Transcript of "Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case"

    1. Oracle Solaris 11 as a Big Data Platform: Apache Hadoop Use Case. Orgad Kimchi, Principal Software Engineer, Oracle ISV Engineering
    2. Disclaimer. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation. Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Oracle Confidential, Proprietary Information
    3. Agenda
       • Hadoop Overview
       • The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster
    4. What is Big Data? Big Data is both large, variable datasets and a new set of technologies.
       • Datasets: extremely large files of unstructured or semi-structured data
       • Datasets: large and highly distributed, and otherwise difficult to manage as a single unit of information
       • Technologies: tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions
    5. Introduction to Hadoop
    6. What is Hadoop?
       • Based on designs that originated at Google (published from 2003 onward), where they were used to generate search indexes and web scores
       • Top-level Apache project consisting of two key services:
         1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, distributed
         2. MapReduce API (Java); jobs can also be scripted in other languages
       • Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
    7. Components of Hadoop
    8. HDFS
       • HDFS is the file system responsible for storing data on the cluster
       • Written in Java (based on Google’s GFS)
       • Sits on top of a native file system (ext3, ext4, xfs, ZFS)
       • POSIX-like file permissions model
       • Provides redundant storage for massive amounts of data
       • Optimized for large, streaming reads of files
    9. The Five Hadoop Daemons. Hadoop comprises five separate daemons:
       • NameNode: holds the metadata for HDFS
       • Secondary NameNode: performs housekeeping functions for the NameNode
       • DataNode: stores the actual HDFS data blocks
       • JobTracker: manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates the MapReduce stages
       • TaskTracker: instantiates and monitors individual Map and Reduce tasks
    10. Hadoop Architecture
    11. The benefits of using Oracle Solaris technologies for a Hadoop cluster
    12. Solaris Zones Hadoop Architecture
    13. Built-in Virtualization: Oracle Solaris 11 Zones
        • Secure, lightweight virtualization
        • Scales to hundreds of zones per node
        • Built-in, no-cost virtualization
        • Combines isolation with resource management
        • Widely used for consolidation, legacy OS support, rapid application deployment, and securely protecting applications
        • Co-engineered with installation, security, ZFS, networking, IPS, and the SPARC and x86 hypervisors
        • One out of three Oracle Solaris systems runs Oracle Solaris Zones
    14. The benefits of using Oracle Solaris Zones for a Hadoop cluster
        • Fast provisioning of new cluster members using the Solaris Zones cloning feature
        • Very high network throughput between the zones for data node replication
    15. Oracle Solaris Zones. Source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen
    16. Oracle Solaris Zones. Source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen
    17. Oracle Solaris 11: Storage Virtualization. Secure datasets for each tenant
        • Each tenant's zone has its own dataset: Finance zone and Finance dataset, HR zone and HR dataset, Sales zone and Sales dataset
        • 10x storage savings for virtualization
        • 2x storage compression
        • Virtual flash-enabled storage pools for speed
        • Built-in data services save storage software costs
        • File and block sharing
        • Wire-speed encryption on disk and over the wire
        • Extreme data integrity
        • Unlimited scale
    18. The benefits of using Oracle Solaris ZFS for a Hadoop cluster
        • Immense data capacity: a 128-bit file system, well suited to big datasets
        • Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
        • Secure data at rest using ZFS encryption
    19. Performance analysis
        • Each Oracle Solaris Zone can have a different workload: disk I/O, network I/O, CPU, memory, or a combination of these. In addition, a single Oracle Solaris Zone can overload the resources of the entire system.
        • DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.
    20. zonestat. The zonestat command lets us monitor all the Solaris zones running in our environment and provides real-time statistics for CPU, memory, and network utilization.
        root@global_zone:~# zonestat 10 10
        Interval: 1, Duration: 0:00:10
        SUMMARY        Cpus/Online: 128/12   PhysMem: 256G   VirtMem: 259G
                       ---CPU----  --PhysMem-- --VirtMem-- --PhysNet--
                 ZONE   USED %PART  USED %USED  USED %USED PBYTE %PUSE
              [total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E  100%
             [system]   0.00 0.00% 9684M 3.69% 40.5G 15.5%
           data-node3  42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E  100%
           data-node1  41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E  100%
           data-node2  33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E  100%
               global   0.34 0.27%  283M 0.10%  420M 0.15%  2192 0.00%
            name-node   0.15 0.11%  419M 0.15%  718M 0.26%   126 0.00%
        sec-name-node   0.00 0.00%  205M 0.07%  363M 0.13%     0 0.00%
    21. Disk I/O Performance Monitoring
    22. fsstat. The fsstat command lets us monitor file system activity by file system type or mount point and, with the -Z option, per Solaris Zone. For example, monitoring activity (including writes) on all ZFS file systems, per zone, at 10-second intervals:
        root@global_zone:~# fsstat -Z zfs 10 10
         new  name   name  attr  attr lookup rddir  read read  write write
         file remov  chng   get   set    ops   ops   ops bytes   ops bytes
        0 0 0 0 0 0 0 22 0 0 0 0 0 0 744 0 0 151 0 359 0 413 0 14 0 14 11.4K 0 6.01K 5.87M 0 0 3.27K 0 1.41K 1.94M 0 8.72K 0 2.75K 3.95M 0 9.03K 0 2.98K 4.22M 0 51 0 0 0 0 51 0 0 0
        0 zfs:global 7 1.42K zfs:data-node1 22 4.06K zfs:data-node2 21 4.34K zfs:data-node3 0 0 zfs:name-node 0 0 zfs:sec-name-node
    23. Disk I/O, continued. Run the DTrace iopattern script, as shown, to analyze whether the disk I/O workload is random or sequential.
        root@global_zone:~# /usr/dtrace/DTT/iopattern
        %RAN %SEQ  COUNT    MIN     MAX     AVG     KR     KW
          69   31    236   1024 1048576  448830 103441      0
          75   25    577    512 1048576  327938 184306    479
          92    8    598    512 1048576  198293 114275   1525
          74   26    379    512 1048576  330296 121954    294
          66   34    281   1024 1048576  500550 137358      0
          80   20    346   1024 1048576  332114 112218      0
          81   19    444    512 1048576  290734 124694   1366
          65   35    337    512 1048576  490375 161139    244
          75   25    704    512 1048576  353086 241105   1642
          75   25    444   1024 1048576  386634 167642      0
          77   23    666   1024 1048576  397105 258274      0
    24. Visualization. For more information about dim_STAT, see http://dimitrik.free.fr
    25. Flame Graphs. For more information, see http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs
    26. Hadoop on an Oracle SPARC T4-2 Server. Source: https://blogs.oracle.com/taylor22
    27. For more information
        • How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
        • How to Build Native Hadoop Libraries for Oracle Solaris 11
        • How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)
        • Performance Analysis in a Multitenant Cloud Environment Using Hadoop Cluster and Oracle Solaris 11
        • My Blog
    28. Questions
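
The zonestat and iopattern output on slides 20 and 23 can be summarized with a little awk. This is a minimal sketch, assuming the column layouts shown on those slides (zone name and CPU USED in the first two zonestat columns; %RAN, %SEQ, COUNT, MIN, MAX, AVG, KR, KW for iopattern). The function names `busiest_zone` and `io_mix` are hypothetical helpers, not part of Solaris or the DTraceToolkit, and the sample rows are copied from the slides.

```shell
#!/bin/sh
# Report the zone with the highest CPU USED value from zonestat rows on stdin.
# Aggregate rows such as [total] and [system], and header lines, are skipped.
busiest_zone() {
    awk '
        $1 !~ /^\[/ && $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
            if ($2 + 0 > max) { max = $2 + 0; zone = $1 }
        }
        END { if (zone != "") printf "%s cpu_used=%.2f\n", zone, max }'
}

# Summarize iopattern samples on stdin: the percentage of random I/O,
# weighted by the I/O count of each sample, plus total KB read and written.
io_mix() {
    awk '
        NF == 8 && $1 ~ /^[0-9]+$/ {
            ios += $3; ran += $1 * $3; kr += $7; kw += $8
        }
        END {
            if (ios > 0)
                printf "random=%.0f%% read_kb=%d write_kb=%d\n", ran / ios, kr, kw
        }'
}

# Sample rows taken from slide 20.
busiest_zone <<'EOF'
[total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E 100%
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
EOF

# First three samples from slide 23.
io_mix <<'EOF'
%RAN %SEQ COUNT MIN MAX AVG KR KW
69 31 236 1024 1048576 448830 103441 0
75 25 577 512 1048576 327938 184306 479
92 8 598 512 1048576 198293 114275 1525
EOF
```

In a live session the same functions could read the commands' output through a pipe. Weighting the random percentage by COUNT keeps a short, quiet sample from skewing the result.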