
Hortonworks Technical Workshop: HBase and Apache Phoenix


HBase is the leading NoSQL database. Tightly integrated with the Hadoop ecosystem, it offers random, real-time read/write capabilities on billions of rows and millions of columns. Apache Phoenix offers a SQL interface to HBase, opening HBase to a large community of SQL developers and enabling interoperability with SQL-compliant applications. The session will cover the essentials of HBase and provide in-depth insight into Apache Phoenix. Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community. Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=de6d0c435c0761adedf3114a100e7483%20

Published in: Technology

Hortonworks Technical Workshop: HBase and Apache Phoenix

  1. SQL on HBase with Phoenix. © Hortonworks Inc. 2014
  2. Agenda
     What Is Apache HBase
     •  High Level Overview.
     •  Technical Detail.
     What Is Apache Phoenix
     •  Overview.
     •  What's New.
     •  Secondary Index Demo.
  3. New Data Requires a New Data Architecture
     Source: IDC
     •  2.8 ZB in 2012
     •  85% from New Data Types
     •  15x Machine Data by 2020
     •  40 ZB by 2020
     Data types shown: OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation
     Modern Database Needs:
     •  More Scalable
     •  Handle New Data Types
     •  Intelligent and Predictive
  4. What Is Apache HBase?
     •  100% Open Source
     •  Store and Process Petabytes of Data
     •  Flexible Schema
     •  Scale out on Commodity Servers
     •  High Performance, High Availability
     •  Integrated with YARN
     •  SQL and NoSQL Interfaces
     [Diagram: YARN (Data Operating System) running HBase RegionServers 1 through N on top of HDFS (Permanent Data Storage)]
     Dynamic Schema. Scales Horizontally to PB of Data. Directly Integrated with Hadoop.
  5. Kinds of Apps Built with HBase
     Interested? See HBase Case Studies later in this document.
     Write Heavy; Low-Latency; Search / Indexing; Messaging; Audit / Log Archive; Advertising; Data Cubes; Time Series; Sensor / Device
  6. HBase is Deeply Integrated with Hadoop
     •  Data is stored in HDFS. You can store more data and re-use existing HDFS expertise.
     •  HBase is integrated with YARN.
     •  Analytics in-place using Hive, Pig, Spark and more.
  7. Who's Using HBase?
  8. HBase Technical Details
     Spring 2014, Version 1.0
  9. HBase Technical Details
     Based on Google BigTable
     •  Dynamic schema.
     •  Good for very sparse datasets.
     •  All data is range-partitioned for trivial horizontal scaling across commodity hardware.
     Directly integrated with HDFS and Hadoop
     •  Analyze data in HBase with any Hadoop ecosystem tools (Hive, Pig, MapReduce, Tez, etc.)
     •  Re-use existing Hadoop skills to run HBase.
  11. Logical Architecture
      Distributed, persistent partitions of a BigTable
      Table A (row keys a through p) is split into Region 1, Region 2, Region 3 and Region 4.
      Region Server 7:    Table A, Region 1;  Table A, Region 2;  Table G, Region 1070;  Table L, Region 25
      Region Server 86:   Table A, Region 3;  Table C, Region 30;  Table F, Region 160;   Table F, Region 776
      Region Server 367:  Table A, Region 4;  Table C, Region 17;  Table E, Region 52;    Table P, Region 1116
      Legend:
      - A single table is partitioned into Regions of roughly equal size.
      - Regions are assigned to Region Servers across the cluster.
      - Region Servers host roughly the same number of regions.
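The range partitioning on slide 11 can be illustrated with a small lookup structure. This is only a sketch of the idea, not HBase's client code: the class `RegionLookup` is hypothetical, and the region start keys ("e", "i", "m") are an assumption read off the diagram, which appears to place four row keys in each region of Table A.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookup {
    // Maps each region's start key to the server that hosts the region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    public void assign(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The region covering a row key is the one with the greatest start key <= rowKey.
    public String serverFor(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("no region covers key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup tableA = new RegionLookup();
        tableA.assign("", "Region Server 7");    // Region 1 (assumed keys a-d)
        tableA.assign("e", "Region Server 7");   // Region 2 (assumed keys e-h)
        tableA.assign("i", "Region Server 86");  // Region 3 (assumed keys i-l)
        tableA.assign("m", "Region Server 367"); // Region 4 (assumed keys m-p)
        System.out.println(tableA.serverFor("k")); // prints "Region Server 86"
    }
}
```

Because the keys are sorted, finding the right server is a single ordered-map lookup, which is why range partitioning scales out without a central routing step per request.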
  12. Logical Data Model
      A sparse, multi-dimensional, sorted map
      Legend:
      - Rows are sorted by rowkey.
      - Within a row, values are located by column family and qualifier.
      - Values also carry a timestamp; there can be multiple versions of a value.
      - Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
      [Figure: Table A, with each cell labelled by rowkey, column family, column qualifier, timestamp and value]
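The "sparse, multi-dimensional, sorted map" on slide 12 can be modelled with nested sorted maps. The sketch below is an in-memory illustration only: the class `SparseSortedMap` and its methods are hypothetical, values are plain strings rather than arbitrary bytes, and the sample cells are made up for the example.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SparseSortedMap {
    // rowkey -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    public void put(String rowKey, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(family + ":" + qualifier,
                             k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written.
    public String getLatest(String rowKey, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        NavigableMap<Long, String> versions = row.get(family + ":" + qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SparseSortedMap table = new SparseSortedMap();
        table.put("a", "cf1", "foo", 1368393847L, "world");
        table.put("a", "cf1", "foo", 1368394925L, "hello"); // newer version of the same cell
        System.out.println(table.getLatest("a", "cf1", "foo")); // prints "hello"
        System.out.println(table.getLatest("a", "cf2", "bar")); // prints "null": absent cells cost nothing
    }
}
```

Cells that were never written simply have no entry, which is what makes the model a good fit for very sparse data.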
  13. HBase HA Overview (Introduced in HDP 2.1)
      [Diagram: Clients, HMaster and Zookeeper (Cluster Topology, Data Placement) in front of two HBase RegionServers, each with an In-Memory Cache. One RegionServer holds Region 0-99 (Primary), Region 100-199 (Standby) and Region 200-299 (Standby); the other holds Region 0-99 (Standby), Region 100-199 (Primary) and Region 200-299 (Primary). HFiles are stored in HDFS, alongside Hive, Pig and MapReduce.]
      •  HBase HA: Real-Time Replication
      •  Low-Latency Reads and Writes
      •  Data Stored to HDFS
      •  Read or Write Directly from Hadoop Tools
  14. Apache Phoenix
      The SQL Skin for HBase
      Spring 2014, Version 1.0
  15. Apache Phoenix
      A SQL Skin for HBase
      •  Provides a SQL interface for managing data in HBase.
      •  Large subset of SQL:1999 mandatory feature set.
      •  Create tables, insert and update data and perform low-latency point lookups through JDBC.
      •  Phoenix JDBC driver easily embeddable in any app that supports JDBC.
      Phoenix Makes HBase Better
      •  Oriented toward online / semi-transactional apps.
      •  If HBase is a good fit for your app, Phoenix makes it even better.
      •  Phoenix gets you out of the "one table per query" model many other NoSQL stores force you into.
  16. Apache Phoenix: Current Capabilities
      Feature                                  Supported?
      Common SQL Datatypes                     Yes
      Inserts and Updates                      Yes
      SELECT, DISTINCT, GROUP BY, HAVING       Yes
      NOT NULL and Primary Key constraints     Yes
      Inner and Outer JOINs                    Yes
      Views                                    Yes
      Subqueries                               HDP 2.2
      Robust Secondary Indexes                 HDP 2.2
  17. Apache Phoenix: Future Capabilities
      Feature                                  Supported?
      Multi-Table Transactions                 Future
      Scalable Joins (Fact-to-Fact)            Future
      Analytics, Windowing Functions           Future
  18. Phoenix Provides Familiar SQL Constructs
      Compare: Phoenix versus Native API
      Code:
          // HBase Native API.
          HBaseAdmin hbase = new HBaseAdmin(conf);
          HTableDescriptor desc = new HTableDescriptor("us_population");
          HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
          HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
          HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
          desc.addFamily(state);
          desc.addFamily(city);
          desc.addFamily(population);
          hbase.createTable(desc);

          // Phoenix DDL.
          CREATE TABLE us_population (
              state CHAR(2) NOT NULL,
              city VARCHAR NOT NULL,
              population BIGINT
              CONSTRAINT my_pk PRIMARY KEY (state, city));
      Notes:
      •  Familiar SQL syntax.
      •  Provides additional constraint checking.
  19. Phoenix: Architecture
      [Diagram: a User Application (Java Application with the Phoenix JDBC Driver) connected to an HBase Cluster containing three Phoenix Coprocessor boxes]
  20. Phoenix Performance
      Phoenix Performance Characterization:
      •  Suitable for 10s of thousands of point-lookups per second.
      •  Suitable for thousands of aggregations / filtered searches per second.
      •  Supports extremely high concurrency.
      Phoenix Performance Optimizations
      •  Column skipping.
      •  Table salting.
      •  Skip scans.
      Performance characteristics:
      •  Index point lookups in milliseconds.
      •  Aggregation and Top-N queries in a few seconds over large datasets.
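Of the optimizations on slide 20, table salting is the easiest to show in a few lines. The idea is to prepend a small hash-derived byte to each row key so that sequential keys spread across regions instead of all landing on one. The sketch below is an illustration under stated assumptions: the class `SaltedKey` is hypothetical, and `Arrays.hashCode` stands in for whatever hash function Phoenix actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedKey {
    // Prepends one salt byte, chosen as hash(rowKey) mod buckets, to the row key.
    // The hash here (Arrays.hashCode) is an assumption made for illustration.
    public static byte[] salt(byte[] rowKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be between 1 and 256");
        }
        int bucket = Math.floorMod(Arrays.hashCode(rowKey), buckets);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    public static void main(String[] args) {
        // Monotonically increasing keys end up under different leading bytes.
        for (String key : new String[] {"event-0001", "event-0002", "event-0003"}) {
            byte[] salted = salt(key.getBytes(StandardCharsets.UTF_8), 4);
            System.out.println(key + " -> bucket " + salted[0]);
        }
    }
}
```

The trade-off is that a range scan over the original key order now has to fan out across every bucket and merge the results.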
  21. Phoenix Use Cases
      Phoenix is for:
      •  Rapidly and easily building an application backed by HBase.
      •  Making use of your existing SQL skills and investment.
      •  High performing aggregations of moderately-sized datasets inside HBase.
      Phoenix is not for:
      •  Sophisticated SQL queries involving large joins or advanced SQL features.
      •  Queries requiring large scans that do not use indexes.
      •  ETL.
  22. Phoenix: Futures
      Short-term focus:
      •  Transactions.
      •  Scalable joins.
      •  Analytical capabilities.
      Long-term focus: Primary interface for HBase.
      •  Build HBase applications using Phoenix.
      •  Configure cluster security and replication using Phoenix.
      •  Integration with BI tools like MicroStrategy.
  23. What's New in Apache Phoenix
  24. What's New in Apache Phoenix
      Phoenix in HDP 2.2
      •  Based on Apache Phoenix 4.2.
      •  8 new features, 143 total improvements and fixes.
      Notable new features:
      •  Robust secondary indexes.
      •  Sub-joins.
      •  Basic window functions.
      •  Bulk loader improvements.
  25. Robust Secondary Index
      Background / Refresher
      •  Phoenix supports local and global secondary indexes.
      •  Updating a global index may require coordination with another RegionServer.
      •  See Phoenix docs if you need info on which to use when.
      Before Phoenix 4.1 (HDP 2.1):
      •  Using global indexes, if the RegionServer serving the index key was down, RegionServers would abort.
      •  Note: Does not affect local indexes.
      Phoenix 4.1+:
      •  If the global index cannot be updated:
         •  The index is temporarily disabled.
         •  Background job is launched to rebuild the index.
         •  Reads will go directly to base tables rather than accessing the index.
         •  Writes will continue to update the index.
      •  Controlled by: phoenix.index.failure.handling.rebuild
  26. Improved SQL: Sub Joins
      Example:
          select * from A
            left join (B join C on B.bc_id = C.bc_id)
            on A.ab_id = B.ab_id and A.ac_id = C.ac_id;
      Caveats related to joins still apply:
      •  Still broadcast joins only.
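The "broadcast joins only" caveat on slide 26 is easier to weigh with the mechanism in view. In a broadcast hash join, one side is loaded into an in-memory hash table that is shipped to wherever the other side is scanned, so that side has to be small enough to fit in memory. The sketch below shows the build and probe phases for a left outer join on a single machine; the class `BroadcastJoinSketch` and the `{joinKey, payload}` row layout are hypothetical, and this is not Phoenix's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastJoinSketch {
    // Left outer hash join. Each row is a two-element array: {joinKey, payload}.
    static List<String> leftJoin(List<String[]> scanned, List<String[]> broadcast) {
        // Build phase: hash the small side. This table is what would be broadcast.
        Map<String, List<String>> hashed = new HashMap<>();
        for (String[] row : broadcast) {
            hashed.computeIfAbsent(row[0], k -> new ArrayList<String>()).add(row[1]);
        }
        // Probe phase: stream the large side and look up each join key.
        List<String> out = new ArrayList<>();
        for (String[] row : scanned) {
            List<String> matches = hashed.get(row[0]);
            if (matches == null) {
                out.add(row[1] + "|null"); // left outer join keeps unmatched rows
            } else {
                for (String match : matches) {
                    out.add(row[1] + "|" + match);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(new String[] {"1", "a1"}, new String[] {"2", "a2"});
        // Stands in for the already-joined (B join C) result.
        List<String[]> bc = Arrays.asList(new String[] {"1", "bc1"}, new String[] {"3", "bc3"});
        System.out.println(leftJoin(a, bc)); // prints [a1|bc1, a2|null]
    }
}
```

This is why fact-to-fact joins are listed as future work in the deck: when neither side fits in memory, there is nothing small enough to broadcast.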
  27. Phoenix: Basic Window Functions
      FIRST_VALUE, LAST_VALUE, NTH_VALUE
      •  No OVER or PARTITION BY.
      •  Function applied to each group based on GROUP BY.
      Example:
          SELECT FIRST_VALUE("column1")
            WITHIN GROUP (ORDER BY column2 ASC)
            FROM table
            GROUP BY column3;
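The semantics of the query on slide 27 can be spelled out in plain Java: group the rows by column3, order each group by column2 ascending, and return column1 from the first row of each group. The class `FirstValueDemo`, the `Row` type and the sample data are hypothetical, and the sketch says nothing about how Phoenix evaluates the function internally.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FirstValueDemo {
    static final class Row {
        final String column1;
        final int column2;
        final String column3;

        Row(String column1, int column2, String column3) {
            this.column1 = column1;
            this.column2 = column2;
            this.column3 = column3;
        }
    }

    // For each column3 group, returns column1 of the row with the smallest column2.
    static Map<String, String> firstValueByGroup(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.column3,
                TreeMap::new,
                Collectors.collectingAndThen(
                        Collectors.minBy(Comparator.comparingInt((Row r) -> r.column2)),
                        best -> best.get().column1))); // groups are never empty
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("x", 2, "g1"),
                new Row("y", 1, "g1"),
                new Row("z", 5, "g2"));
        System.out.println(firstValueByGroup(rows)); // prints {g1=y, g2=z}
    }
}
```

LAST_VALUE would correspond to taking the maximum instead of the minimum, and NTH_VALUE to sorting the group and picking the n-th row.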
  28. ENCODE, DECODE
      DECODE
      •  Supports hexadecimal format.
          DECODE('000000008512af277ffffff8', 'hex')
      ENCODE
      •  Supports hexadecimal and Base62.
          ENCODE(1, 'base62')
      What is Base62?
      •  Used to encode data using only letters and numbers.
      •  Commonly used for things like URL shorteners.
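For readers new to Base62, a minimal encoder makes the "letters and numbers only" point on slide 28 concrete. The class `Base62` is hypothetical, and the digit order (0-9, then A-Z, then a-z) is an assumption: implementations differ on alphabet order, so output for larger values may not match Phoenix's ENCODE. The `decode` method is included only to show the round trip; the slide lists hexadecimal as the only format DECODE supports.

```java
public class Base62 {
    // Assumed digit order: 0-9, A-Z, a-z.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        while (value > 0) {
            digits.append(ALPHABET.charAt((int) (value % 62))); // least significant digit first
            value /= 62;
        }
        return digits.reverse().toString();
    }

    public static long decode(String text) {
        long result = 0;
        for (char c : text.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a Base62 digit: " + c);
            }
            result = result * 62 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(1));    // prints "1"
        System.out.println(encode(62));   // prints "10"
        System.out.println(decode("10")); // prints "62"
    }
}
```

The appeal for URL shorteners is density: every character carries one of 62 values, and none of them needs escaping in a URL.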
  29. Demo
      Phoenix Secondary Indexes
  30. Secondary Index Recap
      Index Management via JDBC:
      •  CREATE INDEX my_index ON my_table (v1);
      •  DROP INDEX my_index ON my_table;
      •  ALTER INDEX my_index ON my_table DISABLE / REBUILD;
      Index population during bulk import:
      •  Uses the CsvBulkLoadTool utility (not psql.py).
      •  Adds the --index-table argument to specify your target index.
          HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf \
            hadoop jar phoenix-4.0.0.jar \
            org.apache.phoenix.mapreduce.CsvBulkLoadTool \
            --table EXAMPLE --input /data/example.csv
