How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)

This is the slide deck from the Oracle OpenWorld 2013 Hands-On Lab "How to Set Up a Hadoop Cluster Using Oracle Solaris":
http://www.oracle.com/technetwork/systems/hands-on-labs/hol-setup-hadoop-solaris-2041770.html

Transcript

  • 1. How to Set Up a Hadoop Cluster with Oracle Solaris [HOL10182]. Orgad Kimchi, Principal Software Engineer
  • 2. Disclaimer: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 3. Agenda
      • Lab Overview
      • Hadoop Overview
      • The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster
  • 4. Lab Overview
      • In this hands-on lab we will present, and demonstrate through exercises, how to set up a Hadoop cluster using Oracle Solaris 11 technologies such as Zones, ZFS, DTrace, and network virtualization.
      • Key topics include the Hadoop Distributed File System and MapReduce.
      • We will also cover the Hadoop installation process and the cluster building blocks: the NameNode, a secondary NameNode, and DataNodes.
  • 5. Lab Overview (continued)
      • During the lab, users will learn how to load data into the Hadoop cluster and run a MapReduce job.
      • This hands-on training lab is for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.
  • 6. Lab Main Topics. This hands-on lab consists of 13 exercises covering various Oracle Solaris and Apache Hadoop technologies:
      1. Install Hadoop.
      2. Edit the Hadoop configuration files.
      3. Configure the Network Time Protocol.
      4. Create the virtual network interfaces (VNICs).
      5. Create the NameNode and the secondary NameNode zones.
      6. Set up the DataNode zones.
      7. Configure the NameNode.
      8. Set up SSH.
      9. Format HDFS from the NameNode.
      10. Start the Hadoop cluster.
      11. Run a MapReduce job.
      12. Secure data at rest using ZFS encryption.
      13. Use Oracle Solaris DTrace for performance monitoring.
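
Exercise 2 (editing the Hadoop configuration files) can be sketched as follows. This is a minimal illustration, not the lab's actual files: it assumes the Hadoop 1.x property names `fs.default.name`, `dfs.replication`, and `mapred.job.tracker`, and the host name `name-node`, port `8021`, and replication value `3` are hypothetical placeholders. The helper `render_site_xml` is also invented for this sketch.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical values; substitute the host names used in your own cluster.
core_site = render_site_xml({"fs.default.name": "hdfs://name-node"})
hdfs_site = render_site_xml({"dfs.replication": 3})
mapred_site = render_site_xml({"mapred.job.tracker": "name-node:8021"})

print(core_site)
```

Generating the three files from one helper keeps the property/value structure identical across `core-site.xml`, `hdfs-site.xml`, and `mapred-site.xml`.
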
  • 7. What is Big Data
      • Big Data is both large and variable datasets and a new set of technologies.
      • The datasets: extremely large files of unstructured or semi-structured data, and large, highly distributed datasets that are otherwise difficult to manage as a single unit of information.
      • The technologies: tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions.
  • 8. Data is Everywhere! Facts & Figures
      • Web: 234M web sites; 7M new sites in 2010
      • Facebook: 500M users; 40M photos per day; 30 billion new pieces of content per month
      • New York Stock Exchange: 1 TB of data per day
      • Web 2.0: 147M blogs and growing; Twitter: 12 TB of data per day
  • 9. Introduction to Hadoop
  • 10. What is Hadoop?
      • Based on ideas that originated at Google in 2003 for generating search indexes and web scores
      • Top-level Apache project consisting of two key services:
          1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, distributed
          2. MapReduce API (Java); jobs can also be scripted in other languages
      • Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
  • 11. Components of Hadoop
  • 12. HDFS
      • HDFS is the file system responsible for storing data on the cluster
      • Written in Java (based on Google’s GFS)
      • Sits on top of a native file system (ext3, ext4, xfs, etc.)
      • POSIX-like file permissions model
      • Provides redundant storage for massive amounts of data
      • HDFS is optimized for large, streaming reads of files
  • 13. The Five Hadoop Daemons. Hadoop comprises five separate daemons:
      • NameNode: holds the metadata for HDFS
      • Secondary NameNode: performs housekeeping functions for the NameNode
      • DataNode: stores the actual HDFS data blocks
      • JobTracker: manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates the MapReduce stages
      • TaskTracker: responsible for instantiating and monitoring individual Map and Reduce tasks
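
The placement of the five daemons across cluster members can be written down as a small data structure and sanity-checked. The sketch below is illustrative only: the zone names and the choice to run the JobTracker alongside the NameNode are assumptions, not something the slides specify, and `validate_layout` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical zone names and daemon placement (assumptions for illustration).
CLUSTER_LAYOUT = {
    "name-node": ["NameNode", "JobTracker"],
    "sec-name-node": ["SecondaryNameNode"],
    "data-node1": ["DataNode", "TaskTracker"],
    "data-node2": ["DataNode", "TaskTracker"],
    "data-node3": ["DataNode", "TaskTracker"],
}

# Daemons that run exactly once per cluster in this five-daemon model.
SINGLETONS = ("NameNode", "SecondaryNameNode", "JobTracker")


def validate_layout(layout):
    """Return a list of problems found in a zone -> daemons mapping."""
    problems = []
    counts = Counter(d for daemons in layout.values() for d in daemons)
    for daemon in SINGLETONS:
        if counts[daemon] != 1:
            problems.append(f"expected exactly one {daemon}, found {counts[daemon]}")
    if counts["DataNode"] < 1:
        problems.append("expected at least one DataNode")
    for zone, daemons in layout.items():
        # A TaskTracker next to a DataNode lets tasks read their data locally.
        if "TaskTracker" in daemons and "DataNode" not in daemons:
            problems.append(f"{zone}: TaskTracker without a local DataNode")
    return problems


print(validate_layout(CLUSTER_LAYOUT))
```
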
  • 14. Hadoop Architecture
  • 15. MapReduce
      • Flow (diagram): very big data → Map → partitioning function → Reduce → result
      • Map: accepts an input key/value pair; emits an intermediate key/value pair
      • Reduce: accepts an intermediate key/value* pair (a key with all of its values); emits an output key/value pair
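
The partitioning step between Map and Reduce can be sketched as below. This is a simplified model, not Hadoop's implementation: it assumes a CRC32-based hash so that results are stable across runs, and the function names `partition` and `shuffle` are invented for the example.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key; a stable hash sends equal keys to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route intermediate (key, value) pairs to reducers and group values by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


# Intermediate pairs as a map phase might emit them.
intermediate = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
buckets = shuffle(intermediate, num_reducers=2)
for reducer_id, groups in enumerate(buckets):
    print(reducer_id, dict(groups))
```

Because the reducer is chosen from the key alone, every value for a given key ends up in a single reducer's input, which is what lets each reducer emit a complete result for its keys.
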
  • 16. MapReduce Example. Counting word occurrences in a document:
      • Input: how many chucks could a woodchuck chuck if a woodchuck could chuck wood
      • Map (4 nodes): how,1 many,1 chucks,1 could,1 a,1 woodchuck,1 chuck,1 if,1 a,1 woodchuck,1 could,1 chuck,1 wood,1
      • Group by key (input to the 2 reduce nodes): a,1:1 chuck,1:1 chucks,1 could,1:1 how,1 if,1 many,1 wood,1 woodchuck,1:1
      • Output: a,2 chuck,2 chucks,1 could,2 how,1 if,1 many,1 wood,1 woodchuck,2
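
The word count on this slide can be reproduced with a few lines of plain Python. This is a single-process sketch of the three stages, not Hadoop code; splitting the sentence into four contiguous chunks is an assumption standing in for the four map nodes.

```python
from collections import defaultdict


def map_words(chunk):
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word, 1) for word in chunk]


def group_by_key(pairs):
    """Shuffle: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce: sum the ones for each word, with keys in sorted order."""
    return {word: sum(ones) for word, ones in sorted(grouped.items())}


text = "how many chucks could a woodchuck chuck if a woodchuck could chuck wood"
words = text.split()

# Split the input into 4 contiguous chunks, one per (simulated) map node.
size = -(-len(words) // 4)  # ceiling division
chunks = [words[i:i + size] for i in range(0, len(words), size)]

intermediate = [pair for chunk in chunks for pair in map_words(chunk)]
counts = reduce_counts(group_by_key(intermediate))
print(counts)
```

The resulting counts match the slide's output line: `a` 2, `chuck` 2, `chucks` 1, `could` 2, `how` 1, `if` 1, `many` 1, `wood` 1, `woodchuck` 2.
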
  • 17. MapReduce Functions
      • MapReduce partitions data into 64 MB chunks (default)
      • Distributes data and jobs across thousands of nodes
      • Tasks are scheduled based on the location of the data
      • The master writes periodic checkpoints
      • If a map worker fails, the master restarts its tasks on a new node
      • Barrier: no reduce can begin until all maps are complete
      • HDFS manages data replication for redundancy
      • The MapReduce library does the hard work for us!
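
The 64 MB default chunk size translates directly into the number of chunks (and hence map tasks) for a given input. The arithmetic below is a sketch; the replication factor of 3 is an assumption (a common HDFS default) that the slides do not state, and the function names are invented for the example.

```python
import math

BLOCK_SIZE_MB = 64  # default chunk size quoted on the slide
REPLICATION = 3     # assumption: not given in the slides


def num_blocks(file_size_mb):
    """Number of chunks needed for a file; the last chunk may be partially filled."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)


def raw_storage_mb(file_size_mb):
    """Disk space consumed across the cluster once every block is replicated."""
    return file_size_mb * REPLICATION


for size_mb in (100, 1000, 1024):
    print(size_mb, "MB ->", num_blocks(size_mb), "blocks,",
          raw_storage_mb(size_mb), "MB raw")
```

For a 1024 MB file this gives 16 chunks and, under the assumed replication factor, 3072 MB of raw storage.
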
  • 18. RDBMS compared to MapReduce
                   Traditional RDBMS           MapReduce
      Data size    Gigabytes                   Petabytes
      Access       Interactive and batch       Batch
      Updates      Read and write many times   Write once, read many times
      Structure    Static schema               Dynamic schema
      Integrity    High                        Low
      Scaling      Nonlinear                   Linear
  • 19. The benefits of using Oracle Solaris technologies for a Hadoop cluster
  • 20. Architecture Layout
  • 21. The benefits of using Oracle Solaris Zones for a Hadoop cluster
      • Fast provisioning of new cluster members using the Solaris Zones cloning feature
      • Very high network throughput between the zones for DataNode replication
  • 22. The benefits of using Oracle Solaris ZFS for a Hadoop cluster
      • Immense data capacity: a 128-bit file system, well suited to big datasets
      • Optimized disk I/O utilization, giving better I/O performance with ZFS built-in compression
      • Secure data at rest using ZFS encryption
  • 23. The benefits of using Oracle Solaris technologies for a Hadoop cluster
      • Multithread awareness: Oracle Solaris understands the correlation between cores and threads, and it provides a fast and efficient thread implementation.
      • DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.
      • SMF: lets you build dependencies between Hadoop services (e.g., starting the MapReduce daemons after the HDFS daemons).
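
The kind of ordering that SMF dependencies enforce (MapReduce daemons after HDFS daemons) can be modeled as a dependency graph. This is only a model of the start order, not an SMF manifest; the service names and the exact dependency edges are assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# service -> services that must already be running (hypothetical edges)
dependencies = {
    "namenode": set(),
    "datanode": {"namenode"},
    "jobtracker": {"namenode", "datanode"},  # MapReduce waits for HDFS
    "tasktracker": {"jobtracker"},
}

# static_order() yields each service only after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
print(start_order)
```

With these edges the HDFS daemons (`namenode`, `datanode`) always come before the MapReduce daemons (`jobtracker`, `tasktracker`).
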
  • 24. For more information
      • How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
      • How to Build Native Hadoop Libraries for Oracle Solaris 11
      • How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-on Lab)
      • My Blog