Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case
 

The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:

  • Fast provisioning of new cluster members using the zone cloning feature
  • Very high network throughput between the zones for data node replication
  • Better disk I/O utilization and performance through ZFS built-in compression
  • Secure data at rest using ZFS encryption
For more information see: http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setup-hadoop-zones-1899993.html

  • Storage virtualization is possible through ZFS, the default storage subsystem in Solaris 11. ZFS simplifies storage management through the use of virtual storage pools that can include flash for high-performance data operations. ZFS datasets can be assigned to a specific zone and then encrypted at wire speed to keep data separate in a virtualized environment. ZFS provides both file and block sharing for UNIX and Windows environments. ZFS data services such as deduplication, compression, replication and migration, snapshots, and more are built in to ZFS, so customers don’t have to purchase extra software or hardware options. ZFS is designed for extreme data integrity: there has never been a reported service case of corrupted data since 2006, when it first shipped with Solaris 10. ZFS is a 128-bit file system designed to scale for the next 50 years of data management. All other file systems today are 64-bit or less.

Presentation Transcript

  • Oracle Solaris 11 as a Big Data Platform: Apache Hadoop Use Case. Orgad Kimchi, Principal Software Engineer, Oracle ISV Engineering
  • Disclaimer: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation. Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Oracle Confidential, Proprietary Information
  • Agenda
    • Hadoop Overview
    • The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster
  • What is Big Data
    • Big Data is both large and variable datasets and a new set of technologies
    • The datasets: extremely large files of unstructured or semi-structured data, and large, highly distributed datasets that are otherwise difficult to manage as a single unit of information
    • The technologies: tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions
  • Introduction to Hadoop
  • What is Hadoop?
    • Based on designs published by Google (the Google File System paper in 2003 and the MapReduce paper in 2004)
    • Originally used for generating search indexes and web scores
    • Top-level Apache project that consists of two key services:
      1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, distributed
      2. MapReduce API (Java): can also be scripted in other languages
    • Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure
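To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job. It is plain Python rather than the Hadoop Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative only; in a real cluster the framework performs the shuffle and runs map and reduce tasks on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data on zones", "Big data on ZFS"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle be spread across machines, which is the property Hadoop exploits.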
  • Components of Hadoop
  • HDFS
    • HDFS is the file system responsible for storing data on the cluster
    • Written in Java (based on Google’s GFS)
    • Sits on top of a native file system (ext3, ext4, xfs, ZFS)
    • POSIX-like file permissions model
    • Provides redundant storage for massive amounts of data
    • HDFS is optimized for large, streaming reads of files
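The "redundant storage" point can be illustrated with a small sketch of how a file is cut into blocks and how each block is copied to several DataNodes. The 64 MiB block size and replication factor of 3 are assumptions (they match the Hadoop 1.x defaults), the node names are borrowed from the zone names used later in the deck, and the round-robin placement is a simplification: real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # assumed: Hadoop 1.x default block size
REPLICATION = 3                 # assumed: default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for the NameNode's placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = []
    for block in range(num_blocks):
        nodes = [data_nodes[(block + i) % len(data_nodes)]
                 for i in range(replication)]
        placement.append(nodes)
    return placement


if __name__ == "__main__":
    nodes = ["data-node1", "data-node2", "data-node3"]
    blocks = split_into_blocks(200 * 1024 * 1024)
    for size, replicas in zip(blocks, place_replicas(len(blocks), nodes)):
        print(size // (1024 * 1024), "MiB ->", replicas)
```

With three DataNodes and three replicas, every node ends up holding a copy of every block, so the loss of any single node leaves the data readable.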
  • The Five Hadoop Daemons: Hadoop comprises five separate daemons
    • NameNode: holds the metadata for HDFS
    • Secondary NameNode: performs housekeeping functions for the NameNode
    • DataNode: stores the actual HDFS data blocks
    • JobTracker: manages MapReduce jobs, distributes individual tasks to the machines running the TaskTracker, and coordinates the MapReduce stages
    • TaskTracker: responsible for instantiating and monitoring individual Map and Reduce tasks
  • Hadoop Architecture
  • The benefits of using Oracle Solaris technologies for a Hadoop cluster
  • Solaris Zones Hadoop Architecture
  • Built-in Virtualization: Oracle Solaris 11 Zones
    • Secure, lightweight virtualization
    • Scales to hundreds of zones per node
    • Built-in, no-cost virtualization
    • Combines isolation with resource management
    • Widely used for consolidation, legacy OS support, rapid application deployment, and securely protecting applications
    • Co-engineered with installation, security, ZFS, networking, IPS, and the SPARC and x86 hypervisors
    • 1 out of 3 Oracle Solaris systems runs Oracle Solaris Zones
  • The benefits of using Oracle Solaris Zones for a Hadoop cluster
    • Fast provisioning of new cluster members using the Solaris Zones cloning feature
    • Very high network throughput between the zones for data node replication
  • Oracle Solaris Zones (source: http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen)
  • Oracle Solaris 11: Storage Virtualization
    • Secure datasets for each tenant: the Finance, HR, and Sales zones each get their own dataset (Finance Dataset, HR Dataset, Sales Dataset)
    • 10x storage savings for virtualization
    • 2x storage compression
    • Virtual flash-enabled storage pools for speed
    • Built-in data services save storage software costs
    • File and block sharing
    • Wire-speed encryption on disk and over the wire
    • Extreme data integrity
    • Unlimited scale
  • The benefits of using Oracle Solaris ZFS for a Hadoop cluster
    • Immense data capacity: a 128-bit file system, well suited to big datasets
    • Better disk I/O utilization and performance through ZFS built-in compression
    • Secure data at rest using ZFS encryption
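The reasoning behind the compression benefit is that fewer bytes reach the disk for the same logical data. The sketch below illustrates that effect with Python's `zlib` (DEFLATE) as a stand-in; it does not call ZFS, the log records are hypothetical, and the ratio it prints says nothing about what a given ZFS compression setting would achieve on real Hadoop data.

```python
import zlib


def compression_ratio(data, level=6):
    """Return original_size / compressed_size for a DEFLATE-compressed buffer."""
    return len(data) / len(zlib.compress(data, level))


# Hypothetical, repetitive log-style records typical of semi-structured input.
sample = b"".join(
    b"2013-01-11 12:00:%02d INFO data-node%d block report ok\n" % (i % 60, i % 3 + 1)
    for i in range(10_000)
)

if __name__ == "__main__":
    ratio = compression_ratio(sample)
    print(f"{len(sample)} bytes in, about {len(sample) / ratio:.0f} bytes written")
```

On a file system that compresses transparently, the application still reads and writes the full logical data; only the physical I/O shrinks.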
  • Performance analysis
    • Each Oracle Solaris Zone can have a different workload: disk I/O, network I/O, CPU, memory, or a combination of these. In addition, a single Oracle Solaris Zone can overload the resources of the entire system.
    • DTrace is a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.
  • zonestat
    The zonestat command lets us monitor all the Solaris zones running in our environment and provides real-time statistics for CPU, memory, and network utilization.
    root@global_zone:~# zonestat 10 10
    Interval: 1, Duration: 0:00:10
    SUMMARY        Cpus/Online: 128/12   PhysMem: 256G  VirtMem: 259G
                   ---CPU----  --PhysMem-- --VirtMem-- --PhysNet--
             ZONE    USED %PART  USED %USED  USED %USED PBYTE %PUSE
          [total]  118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E  100%
         [system]    0.00 0.00% 9684M 3.69% 40.5G 15.5%
       data-node3   42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E  100%
       data-node1   41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E  100%
       data-node2   33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E  100%
           global    0.34 0.27%  283M 0.10%  420M 0.15%  2192 0.00%
        name-node    0.15 0.11%  419M 0.15%  718M 0.26%   126 0.00%
    sec-name-node    0.00 0.00%  205M 0.07%  363M 0.13%     0 0.00%
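A quick way to read output like this is to rank the zones by CPU use. The following sketch parses the per-zone rows from this slide (copied into a string, without the `[total]` and `[system]` rows) and lists the busiest zones; `parse_cpu` and `busiest` are illustrative helper names, not part of any Solaris tool, and the parser assumes the whitespace-separated layout shown here.

```python
ZONESTAT_ROWS = """\
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
data-node2 33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
name-node 0.15 0.11% 419M 0.15% 718M 0.26% 126 0.00%
sec-name-node 0.00 0.00% 205M 0.07% 363M 0.13% 0 0.00%
"""


def parse_cpu(rows):
    """Map zone name -> (CPUs used, percent of the CPU partition)."""
    usage = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        zone, used, part = fields[0], fields[1], fields[2]
        usage[zone] = (float(used), float(part.rstrip("%")))
    return usage


def busiest(usage, n=3):
    """Return the n zones with the highest CPU usage, busiest first."""
    return sorted(usage, key=lambda zone: usage[zone][0], reverse=True)[:n]


if __name__ == "__main__":
    usage = parse_cpu(ZONESTAT_ROWS)
    for zone in busiest(usage):
        print(zone, usage[zone])
```

In this sample the three DataNode zones account for almost all of the 118.10 CPUs in use, while the NameNode zones are nearly idle.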
  • Disk I/O Performance Monitoring
  • fsstat
    The fsstat command lets us monitor file system activity per file system or per Solaris Zone. For example, to monitor activity on all ZFS file systems, broken down by zone, at 10-second intervals:
    root@global_zone:~# fsstat -Z zfs 10 10
     new  name  name  attr  attr lookup rddir  read  read write write
    file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0 0 0 0 0 0 0 22 0 0 0 0 0 0 744 0 0 151 0 359 0 413 0 14 0 14 11.4K 0 6.01K 5.87M 0 0 3.27K 0 1.41K 1.94M 0 8.72K 0 2.75K 3.95M 0 9.03K 0 2.98K 4.22M 0 51 0 0 0 0 51 0 0 0
    0 zfs:global 7 1.42K zfs:data-node1 22 4.06K zfs:data-node2 21 4.34K zfs:data-node3 0 0 zfs:name-node 0 0 zfs:sec-name-node
  • Disk I/O (continued)
    Run the DTrace iopattern script, as shown, to analyze whether the disk I/O workload is random or sequential.
    root@global_zone:~# /usr/dtrace/DTT/iopattern
    %RAN %SEQ  COUNT    MIN     MAX     AVG      KR     KW
      69   31    236   1024 1048576  448830  103441      0
      75   25    577    512 1048576  327938  184306    479
      92    8    598    512 1048576  198293  114275   1525
      74   26    379    512 1048576  330296  121954    294
      66   34    281   1024 1048576  500550  137358      0
      80   20    346   1024 1048576  332114  112218      0
      81   19    444    512 1048576  290734  124694   1366
      65   35    337    512 1048576  490375  161139    244
      75   25    704    512 1048576  353086  241105   1642
      75   25    444   1024 1048576  386634  167642      0
      77   23    666   1024 1048576  397105  258274      0
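The iopattern samples on this slide can be summarized with a few lines of Python. The sketch below hard-codes the eleven samples, averages the %RAN column, and checks each row for internal consistency: COUNT times AVG (bytes) should roughly equal KR plus KW (kilobytes). The helper names are illustrative, and the one-kilobyte tolerance is an assumption to absorb rounding in the AVG column.

```python
# Columns: %RAN %SEQ COUNT MIN MAX AVG KR KW (samples from this slide).
IOPATTERN = [
    (69, 31, 236, 1024, 1048576, 448830, 103441, 0),
    (75, 25, 577, 512, 1048576, 327938, 184306, 479),
    (92, 8, 598, 512, 1048576, 198293, 114275, 1525),
    (74, 26, 379, 512, 1048576, 330296, 121954, 294),
    (66, 34, 281, 1024, 1048576, 500550, 137358, 0),
    (80, 20, 346, 1024, 1048576, 332114, 112218, 0),
    (81, 19, 444, 512, 1048576, 290734, 124694, 1366),
    (65, 35, 337, 512, 1048576, 490375, 161139, 244),
    (75, 25, 704, 512, 1048576, 353086, 241105, 1642),
    (75, 25, 444, 1024, 1048576, 386634, 167642, 0),
    (77, 23, 666, 1024, 1048576, 397105, 258274, 0),
]


def mean_random_pct(samples):
    """Average of the %RAN column across all samples."""
    return sum(sample[0] for sample in samples) / len(samples)


def throughput_consistent(sample, tolerance_kb=1):
    """Check COUNT * AVG bytes against KR + KW kilobytes for one sample."""
    _, _, count, _, _, avg, kr, kw = sample
    return abs(count * avg / 1024 - (kr + kw)) <= tolerance_kb


if __name__ == "__main__":
    print(f"mean random I/O: {mean_random_pct(IOPATTERN):.1f}%")
    print("rows consistent:", all(throughput_consistent(s) for s in IOPATTERN))
```

For this sample the workload is about three-quarters random and dominated by reads, with I/O sizes up to 1 MB.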
  • Visualization: for more information about dim_STAT, see http://dimitrik.free.fr
  • Flame Graphs: for more information, see http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs
  • Hadoop on an Oracle SPARC T4-2 Server (source: https://blogs.oracle.com/taylor22)
  • For more information
    • How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
    • How to Build Native Hadoop Libraries for Oracle Solaris 11
    • How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)
    • Performance Analysis in a Multitenant Cloud Environment Using Hadoop Cluster and Oracle Solaris 11
    • My Blog
  • Questions