April 2014 HUG : Apache Phoenix
  • About Me: HBase & Phoenix committer; member of the HBase team at Hortonworks
  • Example of CF delete processing
  • From the query plan, we can see that predicates are pushed down to the data. Combined with parallel scans, this achieves a huge performance gain.
  • In the above example, we create a Phoenix table that maps only one column, "f1"."col1"

April 2014 HUG: Apache Phoenix. Presentation Transcript

  • © Hortonworks Inc. 2011. Apache Phoenix: SQL skin over HBase. Jeffrey Zhong, jzhong@hortonworks.com / jeffreyz@apache.org
  • Overview: What is Phoenix? / Major Phoenix Features / Futures / Phoenix in Action / Summary. Architecting the Future of Big Data
  • What is Phoenix?
    – SQL skin for HBase, originally developed at Salesforce.com; now an Apache Incubator project
    – Targets low-latency queries over HBase data
    – Query engine transforms SQL into native HBase APIs (put, delete, parallel scans) instead of Map/Reduce
    – Delivered as a fat JDBC driver (client)
    – Supports features not provided by HBase: secondary indexing, multi-tenancy, simple hash joins, and more
  • Phoenix Semantics Support
    Feature           Supported?
    UPSERT / DELETE   Yes
    SELECT            Yes
    WHERE / HAVING    Yes
    GROUP BY          Yes
    ORDER BY          Yes
    LIMIT             Yes
    Views             Yes
    JOIN              Yes (introduced in 4.0), limited to hash joins
    Transactions      No
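A minimal sketch of the supported DML, using the WEB_STAT schema that appears later in this deck (the specific values here are illustrative):

```sql
-- UPSERT doubles as INSERT and UPDATE: it writes the row if absent,
-- and overwrites the listed columns if the row key already exists.
UPSERT INTO WEB_STAT (HOST, DOMAIN, FEATURE, DATE, USAGE.CORE)
  VALUES ('EU', 'example.com', 'Login', CURRENT_DATE(), 100);

-- DELETE removes all rows matching the predicate.
DELETE FROM WEB_STAT WHERE HOST = 'EU' AND DOMAIN = 'example.com';
```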
  • Why Phoenix?
    – Leverage existing tooling: SQL clients
    – Frees you from writing large amounts of code to do simple things:
      SELECT COUNT(*) FROM WEB_STAT WHERE HOST='EU' AND CORE > 35 GROUP BY DOMAIN;
    – Performance optimizations are transparent to the user: Phoenix breaks queries up into multiple scans and runs them in parallel. For aggregate queries, coprocessors complete partial aggregation on the local region server and return only relevant data to the client
  • Phoenix Query Optimization
    WEB_STAT table schema:
      CREATE TABLE IF NOT EXISTS WEB_STAT (
        HOST CHAR(2) NOT NULL,
        DOMAIN VARCHAR NOT NULL,
        FEATURE VARCHAR NOT NULL,
        DATE DATE NOT NULL,
        USAGE.CORE BIGINT,
        USAGE.DB BIGINT,
        STATS.ACTIVE_VISITOR INTEGER
        CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE)
      );
    0: jdbc:phoenix:localhost> explain SELECT count(*) FROM WEB_STAT WHERE HOST='EU' and CORE > 35 GROUP BY DOMAIN;
      +------------+
      |    PLAN    |
      +------------+
      | CLIENT PARALLEL 32-WAY RANGE SCAN OVER WEB_STAT ['EU'] |
      |     SERVER FILTER BY USAGE.CORE > 35                   |
      |     SERVER AGGREGATE INTO DISTINCT ROWS BY [DOMAIN]    |
      | CLIENT MERGE SORT                                      |
      +------------+
  • Major Features in Phoenix
    – DDL support: CREATE/DROP/ALTER TABLE for adding/removing columns
    – Extend schema at query time: dynamic columns
    – Salting
    – Mapping to an existing HBase table
    – DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows
    – Secondary indexes to improve performance for queries on non-row-key columns (still maturing)
    – Multi-tenancy (available in Phoenix 3.0/4.0)
    – Limited hash join (available in Phoenix 3.0/4.0)
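As a sketch of the UPSERT SELECT mass-transfer feature mentioned above, assuming a hypothetical WEB_STAT_ARCHIVE table with a schema compatible with WEB_STAT:

```sql
-- Bulk-copy older rows from WEB_STAT into an archive table.
-- WEB_STAT_ARCHIVE is an assumed name, not part of the sample schema.
UPSERT INTO WEB_STAT_ARCHIVE
  SELECT * FROM WEB_STAT WHERE DATE < TO_DATE('2014-01-01');
```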
  • Phoenix Futures
    – Improved secondary indexing: tolerant of region split/merge and RegionServer failures
    – Improved JOIN support
    – Transaction support
    – Improved Phoenix / Hive interoperability
    – More at http://phoenix.incubator.apache.org/roadmap.html
  • Mapping an Existing HBase Table
    – create 't1', {NAME=>'f1', VERSIONS => 3}
      put 't1', 'r1', 'f1:col1', 'val1'
      put 't1', 'r2', 'f1:col2', 'val2'
    – Mapping t1 into a Phoenix table: Phoenix stores its own metadata in the SYSTEM.CATALOG table, so you need to create a Phoenix table or view to map the existing HBase table
    – By default, Phoenix upper-cases unquoted identifiers, so it is best practice to always use double quotes:
      create table "t1" (myPK VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);
    0: jdbc:phoenix:localhost> select * from "t1";
      +------------+------------+
      |    MYPK    |    col1    |
      +------------+------------+
      | r1         | val1       |
      | r2         | null       |
      +------------+------------+
      2 rows selected (0.049 seconds)
    0: jdbc:phoenix:localhost> select * from t1;
      Error: ERROR 1012 (42M03): Table undefined. tableName=T1 (state=42M03,code=1012)
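A view is an alternative way to map the same existing table; a sketch against the same "t1". A Phoenix view over an existing HBase table is read-only, and dropping the view does not delete the underlying HBase table:

```sql
-- Read-only mapping of the existing HBase table "t1";
-- DROP VIEW leaves the HBase table and its data intact.
CREATE VIEW "t1" (myPK VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);
```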
  • Changes Behind the Scenes of Mapping
    – Metadata is inserted into the SYSTEM.CATALOG table
    0: jdbc:phoenix:localhost> select table_name, column_name, table_type from system.catalog where table_name='t1';
      +------------+-------------+------------+
      | TABLE_NAME | COLUMN_NAME | TABLE_TYPE |
      +------------+-------------+------------+
      | t1         | null        | u          |
      | t1         | MYPK        | null       |
      | t1         | col1        | null       |
      +------------+-------------+------------+
    – An empty cell is created for each row. It is used to enforce the PRIMARY KEY constraint, because HBase does not store cells with NULL values
    hbase(main):023:0> scan 't1'
      ROW  COLUMN+CELL
      r1   column=f1:_0, timestamp=1397527184229, value=
      r1   column=f1:col1, timestamp=1397527184229, value=val1
      r2   column=f1:_0, timestamp=1397527197205, value=
      r2   column=f1:col2, timestamp=1397527197205, value=val2
  • Mapping an Existing HBase Table, Cont.
    – The bytes already stored in HBase must be serialized the same way Phoenix serializes that data type. Refer to the Phoenix data types page (http://phoenix.incubator.apache.org/language/datatypes.html)
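For instance (a hypothetical sketch; "t2" and "cnt" are assumed names): a value written with HBase's Bytes.toBytes(long) should be mapped as UNSIGNED_LONG rather than BIGINT, because Phoenix's BIGINT flips the sign bit so that negative values sort before positive ones:

```sql
-- "f1"."cnt" is assumed to hold raw Bytes.toBytes(long) data;
-- UNSIGNED_LONG matches that serialization, BIGINT would not.
CREATE VIEW "t2" (pk VARCHAR PRIMARY KEY, "f1"."cnt" UNSIGNED_LONG);
```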
  • Dynamic Columns: Extend Schema During Query
    – HBase can add new columns (qualifiers) after a table is created. In Phoenix, a subset of columns may be specified at table-create time, while the rest can be surfaced at query time through dynamic columns
    – In the previous table mapping, we only mapped one column, "f1"."col1":
      create table "t1" (myPK VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);
    – To get data from col2, we can do:
    0: jdbc:phoenix:localhost> select * from "t1"("f1"."col2" VARCHAR);
      +------------+------------+------------+
      |    MYPK    |    col1    |    col2    |
      +------------+------------+------------+
      | r1         | val1       | null       |
      | r2         | null       | val2       |
      +------------+------------+------------+
      2 rows selected (0.065 seconds)
  • Secondary Index
    – Index data are stored in a separate HBase table, potentially on different region servers than the data table
    – Two types of secondary index:
      Immutable indexes
        – Target tables whose rows are never updated once written
        – When new rows are inserted, updates are sent to the data table and then to the index table
        – The client handles failures
      Mutable indexes (next slide)
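The immutable case is declared on the table, not on the index; a sketch with hypothetical table and column names:

```sql
-- Rows in EVENTS are assumed never to change after insertion,
-- so indexes on it are maintained as immutable indexes.
CREATE TABLE EVENTS (id VARCHAR PRIMARY KEY, ts DATE, payload VARCHAR)
  IMMUTABLE_ROWS=true;

CREATE INDEX EVENTS_BY_TS ON EVENTS (ts);
```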
  • Phoenix Secondary Index, Cont.
    Mutable indexes
      – Implemented through coprocessors
      – Abort the region server when index updates fail (this behavior can be changed with a custom IndexFailurePolicy)
    Courtesy of Jesse Yates, from SF HBase User Group slides
  • Phoenix Secondary Index, Cont.
    – Index creation: the same statement creates both types of index. Immutable indexes are created for tables created with "IMMUTABLE_ROWS=true"; otherwise mutable indexes are created
    – DDL statement:
      CREATE INDEX <index_name> ON <table_name>(<columns_to_index>…) INCLUDE (<columns_to_cover>…);
    – Example:
      create index "t1_index" on "t1" ("f1"."col1")
    – Verify that the index will be used:
    0: jdbc:phoenix:localhost> explain select * from "t1" where "f1"."col1"='val1';
      +------------+
      |    PLAN    |
      +------------+
      | CLIENT PARALLEL 1-WAY RANGE SCAN OVER t1_index ['val1'] |
      +------------+
      1 row selected (0.037 seconds)
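A covered-index variant of the same example, assuming "f1"."col2" has also been mapped in the "t1" definition: INCLUDE copies the covered column's values into the index table, so a matching query can be answered without touching the data table at all:

```sql
-- "t1_cover_idx" is a hypothetical name; "f1"."col2" must already be
-- part of the Phoenix table definition to be covered here.
CREATE INDEX "t1_cover_idx" ON "t1" ("f1"."col1") INCLUDE ("f1"."col2");

-- This query can now be served entirely from the index table:
-- SELECT "col2" FROM "t1" WHERE "f1"."col1" = 'val1';
```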
  • Phoenix Secondary Index, Cont.
    – How index data are stored:
    hbase(main):008:0> scan 't1_index'
      ROW           COLUMN+CELL
      \x00r2        column=0:_0, timestamp=1397611429248, value=
      val1\x00r1    column=0:_0, timestamp=1397611429248, value=
    – Index row keys concatenate the indexed column values, delimited by a zero-byte character, and end with the data table's primary key. If you define covered columns, you will also see cells with their values in the index table
  • Salted Table
    – Salting prevents region-server hotspotting when the row key is monotonically increasing. Phoenix provides a way to salt the row key with a salt byte at table-creation time. For optimal performance, the number of salt buckets should match the number of region servers
      CREATE TABLE t2 (a_key VARCHAR PRIMARY KEY, a_col VARCHAR) SALT_BUCKETS = 20;
  • Resources
    – Apache Phoenix home page: http://phoenix.incubator.apache.org/index.html
    – Mailing lists: http://phoenix.incubator.apache.org/mailing_list.html
    – Latest release: Phoenix 3.0 for HBase 0.94.*, Phoenix 4.0 for HBase 0.98.1+ (http://phoenix.incubator.apache.org/download.html)
    – HDP (Hortonworks Data Platform) 2.1 will ship Phoenix 4.0
  • Try It Yourself
    Assuming the HBase ZooKeeper quorum string is "localhost" and you are in the bin folder of the installation:
    – Load sample data:
      ./psql.py localhost ../examples/WEB_STAT.sql ../examples/WEB_STAT.csv
    – Start the SQL client:
      ./sqlline.py localhost
    – Run the performance test:
      ./performance.py localhost 10000
  • Summary
    – Phoenix vs. HBase native APIs: as a rule of thumb, use Phoenix as your HBase client whenever possible, because Phoenix provides easy-to-use APIs and performance optimizations
  • Questions? Comments?