Introduction to Apache HBase Training

Learn who is best suited to attend the full training, what prior knowledge you should have, and what topics the course covers. Cloudera Curriculum Developer, Jesse Anderson, will discuss the skills you will attain during the course, how they will help you make the most of your HBase deployment in development or production, and how they will prepare you for the Cloudera Certified Specialist in Apache HBase (CCSHB) exam.

  • scan 'table1' scans the entire table; scan 'table1', {LIMIT => 10} scans the first 10 rows in the table; scan 'table1', {STARTROW => 'start', STOPROW => 'stop'} scans between the start and stop rows; scan 'table1', {COLUMNS => ['fam1:col1', 'fam2:col2']} scans the entire table for just those two columns
  • The full code listing. A virtual line by line discussion follows.
  • Note that for the Python code, the row comes back as an array.

Introduction to Apache HBase Training: Presentation Transcript

  • Introduction to Apache HBase Training Jesse Anderson Curriculum Developer and Instructor
  • Agenda • Why Cloudera Training? • Target Audience and Prerequisites • Course Outline • Short Presentation Based on Actual Course Material - Using Scans to Access Data • Q&A
  • Rising demand for Big Data and analytics experts but a DEFICIENCY OF TALENT will result in a shortfall of 32,000 trained professionals by 2015. Source: Accenture, "Analytics in Action," March 2013.
  • Cloudera Trains the Top Companies: Big Data professionals from 55% of the Fortune 100 have attended live Cloudera training, and Cloudera has trained employees from 100% of the top 20 global technology firms to use Hadoop. Source: Fortune, "Fortune 500" and "Global 500," May 2012.
  • Learning Path: Developers
    –Developer Training: learn to code and write MapReduce programs for production; master advanced API topics required for real-world data analysis
    –Data Analyst Training: run full analyses natively on Big Data without BI software; eliminate complexity to perform ad hoc queries in real time
    –HBase Training: design schemas to minimize latency on massive data sets; scale hundreds of thousands of operations per second
    –Intro to Data Science: implement recommenders and data experiments; draw actionable insights from analysis of disparate data
  • Learning Path: Administrators
    –Administrator Training: configure, install, and monitor clusters for optimal performance; use Cloudera Manager to speed deployment and scale the cluster; learn which tools and techniques improve cluster performance
    –HBase Training: implement massively distributed, columnar storage at scale; enable random, real-time read/write access to all data
    –Data Analyst Training: transform and manipulate data to drive high-value utilization; vertically integrate basic analytics into data management
    –Enterprise Training: implement security measures and multi-user functionality
  • Why Cloudera Training?
    1. Broadest Range of Courses – Developer, Admin, Analyst, HBase, Data Science
    2. Most Experienced Instructors – over 15,000 students trained since 2009
    3. Leader in Certification – over 5,000 accredited Cloudera professionals
    4. State of the Art Curriculum – classes updated regularly as Hadoop evolves
    5. Widest Geographic Coverage – most classes offered: 50 cities worldwide plus online
    6. Most Relevant Platform & Community – CDH deployed more than all other distributions combined
    7. Depth of Training Material – hands-on labs and VMs support live instruction
    8. Ongoing Learning – video tutorials and e-learning complement training
  • Cloudera is the best vendor evangelizing the Big Data movement and is doing a great service promoting Hadoop in the industry. Developer training was a great way to get started on my journey.
  • Cloudera Training for Apache HBase About the Course
  • Intended Audience
    –This course was created for people in developer and operations roles, including Developers, DevOps, Database Administrators, Data Warehouse Engineers, and Administrators
    –Also useful for others who want to access HBase: Business Intelligence Developers, ETL Developers, and Quality Assurance Engineers
  • Who Should Not Take This Course
    –Developers who want to learn details of MapReduce programming: recommend Cloudera Developer Training for Apache Hadoop
    –System administrators who want to learn how to install and configure tools: recommend Cloudera Administrator Training for Apache Hadoop
  • Course Prerequisites
    –No prior knowledge of Hadoop is required
    –What is required is an understanding of basic end-user UNIX commands, for example:
      $ mkdir /data
      $ cd /data
      $ rm /home/tomwheeler/salesreport.txt
    –An optional understanding of basic relational database concepts and basic knowledge of SQL, for example:
      SELECT id, first_name, last_name FROM customers ORDER BY last_name;
  • During this course, you will learn:  The core technologies of Apache HBase  How HBase and HDFS work together  How to work with the HBase shell, Java API, and Thrift API  The HBase storage and cluster architecture  The fundamentals of HBase administration  Best practices for installing and configuring HBase  Advanced features of the HBase API  The importance of schema design in HBase  How to work with HBase ecosystem projects Course Objectives
  •  Hadoop Introduction –Hands-On Exercise - Using HDFS  Introduction to HBase  HBase Concepts –Hands-On Exercise - HBase Data Import  The HBase Administration API –Hands-On Exercise - Using the HBase Shell  Accessing Data with the HBase API Part 1 –Hands-On Exercise - Data Access in the HBase Shell  Accessing Data with the HBase API Part 2 –Hands-On Exercise - Using the Developer API Course Outline
  •  Accessing Data with the HBase API Part 3 –Hands-On Exercise - Filters  HBase Architecture Part 1 –Hands-On Exercise - Exploring HBase  HBase Architecture Part 2 –Hands-On Exercise - Flushes and Compactions  Installation and Configuration Part 1  Installation and Configuration Part 2 –Hands-On Exercise - Administration  Row Key Design in HBase Course Outline (cont’d)
  •  Schema Design in HBase –Hands-On Exercise - Detecting Hot Spots  The HBase Ecosystem –Hands-On Exercise - Hive and HBase Course Outline (cont’d)
  • Scans
    –A Scan can be used when the exact row key is not known, or when a group of rows needs to be accessed
    –Scans can be bounded by a start and stop row key: the start row key is included in the results; the stop row is not included, and the Scan will exhaust its data upon hitting the stop row key
    –Scans can be limited to certain column families or column descriptors
  • Scanning
    –A scan without a start and stop row will scan the entire table
    –With a start row of "jordena" and a stop row of "turnerb", the scan will return all rows starting at "jordena" and will not include "turnerb"
    Users Table (row key → fname, lname): aaronsona → Aaron Aaronson; harrise → Ernest Harris; jordena → Adam Jorden; laytonb → Bennie Layton; millerb → Billie Miller; nununezw → William Nunez; rossw → William Ross; sperberp → Phyllis Sperber; turnerb → Brian Turner; walkerm → Martin Walker; zykowskiz → Zeph Zykowski
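  • The start-inclusive, stop-exclusive bound semantics can be sketched in plain Python over the sorted row keys of the Users Table. This is only a model of the bounds, not the HBase client; the `scan_bounds` helper is an illustration, not part of any HBase API:

```python
# Row keys of the Users Table, in sorted order (as HBase stores them).
row_keys = [
    "aaronsona", "harrise", "jordena", "laytonb", "millerb",
    "nununezw", "rossw", "sperberp", "turnerb", "walkerm", "zykowskiz",
]

def scan_bounds(keys, startrow=None, stoprow=None):
    """Return keys in [startrow, stoprow): start included, stop excluded."""
    result = []
    for k in keys:  # keys are already sorted, mirroring HBase's layout
        if startrow is not None and k < startrow:
            continue  # before the start row: skip
        if stoprow is not None and k >= stoprow:
            break     # reached the stop row: the scan exhausts its data here
        result.append(k)
    return result

print(scan_bounds(row_keys, "jordena", "turnerb"))
# → ['jordena', 'laytonb', 'millerb', 'nununezw', 'rossw', 'sperberp']
```

    Note that "jordena" appears in the output but "turnerb" does not, matching the inclusive-start, exclusive-stop rule described above.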
  • Scanning Rows With scan in HBase Shell
    –Retrieve a group of rows with scan. General form:
      hbase> scan 'tablename' [,options]
    –Examples:
      hbase> scan 'table1'
      hbase> scan 'table1', {LIMIT => 10}
      hbase> scan 'table1', {STARTROW => 'start', STOPROW => 'stop'}
      hbase> scan 'table1', {COLUMNS => ['fam1:col1', 'fam2:col2']}
  • Scan Java API: Complete Code

    Scan s = new Scan();
    ResultScanner rs = table.getScanner(s);
    for (Result r : rs) {
        String rowKey = Bytes.toString(r.getRow());
        byte[] b = r.getValue(FAMILY_BYTES, COLUMN_BYTES);
        String user = Bytes.toString(b);
    }
    rs.close(); // close the ResultScanner, which holds the server-side resources
  • Scan Java API: Scan and ResultScanner

    Scan s = new Scan();
    ResultScanner rs = table.getScanner(s);
    for (Result r : rs) {
        String rowKey = Bytes.toString(r.getRow());
        byte[] b = r.getValue(FAMILY_BYTES, COLUMN_BYTES);
        String user = Bytes.toString(b);
    }
    rs.close();

    The Scan object is created and will scan all rows. The scan is executed on the table and a ResultScanner object is returned.
  • Scan Java API: Iterating

    Scan s = new Scan();
    ResultScanner rs = table.getScanner(s);
    for (Result r : rs) {
        String rowKey = Bytes.toString(r.getRow());
        byte[] b = r.getValue(FAMILY_BYTES, COLUMN_BYTES);
        String user = Bytes.toString(b);
    }
    rs.close();

    Using a for loop, you iterate through all Result objects in the ResultScanner. Each Result can be used to get the values.
  • Python Scan Code: Complete Code

    scannerId = client.scannerOpen("tablename")
    row = client.scannerGet(scannerId)
    while row:
        columnvalue = row.columns.get(columnwithcf).value
        row = client.scannerGet(scannerId)
    client.scannerClose(scannerId)
  • Python Scan Code: Open Scanner

    scannerId = client.scannerOpen("tablename")
    row = client.scannerGet(scannerId)
    while row:
        columnvalue = row.columns.get(columnwithcf).value
        row = client.scannerGet(scannerId)
    client.scannerClose(scannerId)

    Call scannerOpen to create a scan object on the Thrift server. This returns a scanner id that uniquely identifies the scanner on the server.
  • Python Scan Code: Get the List

    scannerId = client.scannerOpen("tablename")
    row = client.scannerGet(scannerId)
    while row:
        columnvalue = row.columns.get(columnwithcf).value
        row = client.scannerGet(scannerId)
    client.scannerClose(scannerId)

    The scannerGet method needs to be called with the unique id. This returns a row of results.
  • Python Scan Code: Iterating Through

    scannerId = client.scannerOpen("tablename")
    row = client.scannerGet(scannerId)
    while row:
        columnvalue = row.columns.get(columnwithcf).value
        row = client.scannerGet(scannerId)
    client.scannerClose(scannerId)

    The while loop continues as long as the scanner returns a new row. Columns must be addressed with the column family, ":", and the column descriptor. row is populated by another call to scannerGet and the loop repeats.
  • Python Scan Code: Closing the Scanner

    scannerId = client.scannerOpen("tablename")
    row = client.scannerGet(scannerId)
    while row:
        columnvalue = row.columns.get(columnwithcf).value
        row = client.scannerGet(scannerId)
    client.scannerClose(scannerId)

    The scannerClose method call is very important: it closes the Scan object on the Thrift server. Not calling this method can leak Scan objects on the server.
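  • Since a forgotten scannerClose leaks scanner objects on the server, one way to make the close unconditional is to wrap the open/get/close pattern in a generator with try/finally. This is a sketch, assuming a client object exposing the scannerOpen, scannerGet, and scannerClose methods shown above; scan_rows itself is a hypothetical helper, not part of the Thrift API:

```python
def scan_rows(client, table_name):
    """Yield rows from a Thrift scanner, guaranteeing scannerClose runs."""
    scanner_id = client.scannerOpen(table_name)
    try:
        row = client.scannerGet(scanner_id)
        while row:
            yield row
            row = client.scannerGet(scanner_id)
    finally:
        # Runs when the scan is exhausted, the caller abandons the
        # generator, or an error occurs, so the server-side scanner
        # is never leaked.
        client.scannerClose(scanner_id)
```

    Callers then simply iterate: `for row in scan_rows(client, "tablename"): ...` and never touch the scanner id directly.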
  • Scanner Caching
    –Scan results can be retrieved in batches to improve performance
    –Performance will improve, but memory usage will increase
    –Java API:
      Scan s = new Scan();
      s.setCaching(20);
    –Python with Thrift:
      rowsArray = client.scannerGetList(scannerId, 10)
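  • On the Thrift side, batched retrieval with scannerGetList still needs a loop, because each call returns at most the requested number of rows. A minimal sketch of draining a scanner in batches, assuming a client with the scannerGetList method shown above (scan_rows_batched is an illustrative helper, not part of the API):

```python
def scan_rows_batched(client, scanner_id, batch_size=20):
    """Drain an open Thrift scanner in batches of batch_size rows."""
    rows = []
    while True:
        batch = client.scannerGetList(scanner_id, batch_size)
        if not batch:
            break  # empty batch means the scanner is exhausted
        rows.extend(batch)
    return rows
```

    Larger batch sizes mean fewer round trips to the Thrift server but more rows held in memory per call, the same trade-off setCaching makes in the Java API.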