Apache Big Data EU 2015 - Phoenix

Nick Dimiduk
Nick DimidukEngineer, Hacker, Author
V5	
  
The	
  Evolu*on	
  of	
  a	
  Rela*onal	
  Database	
  Layer	
  over	
  HBase	
  
@ApachePhoenix	
  
h0p://phoenix.apache.org	
  
2015-­‐09-­‐29	
  
Nick	
  Dimiduk	
  (@xefyr)	
  
h0p://n10k.com	
  
#apachebigdata	
  
Agenda	
  
o  What	
  is	
  Apache	
  Phoenix?	
  
o  State	
  of	
  the	
  Union	
  
o  Latest	
  Releases	
  
o  A	
  brief	
  aside	
  on	
  TransacQons	
  
o  What’s	
  Next?	
  
o  Q	
  &	
  A	
  
WHAT	
  IS	
  APACHE	
  PHOENIX	
  
PuUng	
  the	
  SQL	
  back	
  into	
  NoSQL	
  
What	
  is	
  Apache	
  Phoenix?	
  
o  A	
  relaQonal	
  database	
  layer	
  for	
  Apache	
  HBase	
  
o  Query	
  engine	
  
o  Transforms	
  SQL	
  queries	
  into	
  naQve	
  HBase	
  API	
  calls	
  
o  Pushes	
   as	
   much	
   work	
   as	
   possible	
   onto	
   the	
   cluster	
   for	
   parallel	
  
execuQon	
  
o  Metadata	
  repository	
  
o  Typed	
  access	
  to	
  data	
  stored	
  in	
  HBase	
  tables	
  
o  A	
  JDBC	
  driver	
  
o  A	
  top	
  level	
  Apache	
  So_ware	
  FoundaQon	
  project	
  
o  Originally	
  developed	
  at	
  Salesforce	
  
o  A	
  growing	
  community	
  with	
  momentum	
  
What	
  is	
  Apache	
  Phoenix?	
  
==	
  
==	
  
Where	
  Does	
  Phoenix	
  Fit	
  In?	
  Sqoop	
  	
  
RDB	
  Data	
  Collector	
  
Flume	
  	
  
Log	
  Data	
  Collector	
  
Zookeeper	
  
CoordinaQon	
  
YARN	
  /	
  MRv2	
  
Cluster	
  Resource	
  
Manager	
  /	
  
MapReduce	
  
HDFS	
  
Hadoop	
  Distributed	
  File	
  System	
  
GraphX	
  
Graph	
  analysis	
  
framework	
  
Phoenix	
  
Query	
  execuQon	
  engine	
  
HBase	
  
Distributed	
  Database	
  
The	
  Java	
  Virtual	
  Machine	
  
Hadoop	
  
Common	
  JNI	
  
Spark	
  
IteraQve	
  In-­‐Memory	
  
ComputaQon	
  
MLLib	
  
Data	
  mining	
  
Pig	
  
Data	
  ManipulaQon	
  
Hive	
  
Structured	
  Query	
  
Phoenix	
  
JDBC	
  client	
  
Phoenix	
  Puts	
  the	
  SQL	
  Back	
  in	
  NoSQL	
  
o  Familiar	
  SQL	
  interface	
  for	
  (DDL,	
  DML,	
  etc.)	
  
o Read	
  only	
  views	
  on	
  exisQng	
  HBase	
  data	
  
o Dynamic	
  columns	
  extend	
  schema	
  at	
  runQme	
  
o Flashback	
  queries	
  using	
  versioned	
  schema	
  
o  Data	
  types	
  
o  Secondary	
  indexes	
  
o  Joins	
  
o  Query	
  planning	
  and	
  opQmizaQon	
  
o  MulQ-­‐tenancy	
  
Phoenix	
  Puts	
  the	
  SQL	
  Back	
  in	
  NoSQL	
  
Accessing	
  HBase	
  data	
  with	
  Phoenix	
  can	
  be	
  substanQally	
  easier	
  than	
  direct	
  
HBase	
  API	
  use	
  
	
  
	
  
Versus	
  
	
  
HTable t = new HTable(“foo”);
RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT,
new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) {
// blah blah blah Java Java Java
}
s.close();
t.close();
SELECT * FROM foo WHERE bar > 30
HTable t = new HTable(“foo”);
RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT,
new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) {
// blah blah blah Java Java Java
}
s.close();
t.close();
Your	
  BI	
  tool	
  
probably	
  can’t	
  do	
  
this	
  
(And	
  we	
  didn’t	
  include	
  error	
  handling…)	
  
STATE	
  OF	
  THE	
  UNION	
  
So	
  how’s	
  things?	
  
And	
  more	
  ….	
  
State	
  of	
  the	
  Union	
  
o  SQL	
  Support:	
  Broad	
  enough	
  to	
  run	
  TPC	
  queries	
  
o  Joins,	
  Sub-­‐queries,	
  Derived	
  tables,	
  etc.	
  
o  Secondary	
  Indices:	
  Three	
  different	
  strategies	
  
o  Immutable	
  for	
  write-­‐once/append	
  only	
  data	
  	
  
o  Global	
  for	
  read-­‐heavy	
  mutable	
  data	
  
o  Local	
  for	
  write-­‐heavy	
  mutable	
  or	
  immutable	
  data	
  
o  StaQsQcs	
  driven	
  parallel	
  execuQon	
  
o  Monitoring	
  &	
  Management	
  
o  Tracing,	
  metrics	
  
Join	
  and	
  Subquery	
  Support
o Grammar	
  
o inner	
  join;	
  le_/right/full	
  outer	
  join;	
  cross	
  join	
  
o AddiQonal	
  
o semi	
  join;	
  anQ	
  join	
  
o Algorithms	
  
o hash-­‐join;	
  sort-­‐merge-­‐join	
  
o OpQmizaQons	
  
o Predicate	
  push-­‐down;	
  FK-­‐to-­‐PK	
  join	
  opQmizaQon;	
  
Global	
  index	
  with	
  missing	
  data	
  columns;	
  Correlated	
  
subquery	
  rewrite	
  
TPC	
  Example	
  1	
  
Small-­‐QuanQty-­‐Order	
  Revenue	
  Query	
  (Q17)
select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
and p_brand = '[B]'
and p_container = '[C]'
and l_quantity < (
select 0.2 * avg(l_quantity)
from lineitem
where l_partkey = p_partkey
);
CLIENT 4-WAY FULL SCAN OVER lineitem
PARALLEL INNER JOIN TABLE 0
CLIENT 1-WAY FULL SCAN OVER part
SERVER FILTER BY p_partkey = ‘[B]’ AND p_container = ‘[C]’
PARALLEL INNER JOIN TABLE 1
CLIENT 4-WAY FULL SCAN OVER lineitem
SERVER AGGREGATE INTO DISTINCT ROWS BY l_partkey
AFTER-JOIN SERVER FILTER BY l_quantity < $0
TPC	
  Example	
  2	
  
Order	
  Priority	
  Checking	
  Query	
  (Q4)
select o_orderpriority, count(*) as order_count
from orders
where o_orderdate >= date '[D]'
and o_orderdate < date '[D]' + interval '3' month
and exists (
select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
)
group by o_orderpriority
order by o_orderpriority;
CLIENT 4-WAY FULL SCAN OVER orders
SERVER FILTER o_orderdate >= ‘[D]’ AND o_orderdate < ‘[D]’ + 3(d)
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY o_orderpriority
CLIENT MERGE SORT
SKIP-SCAN JOIN TABLE 0
CLIENT 4-WAY FULL SCAN OVER lineitem
SERVER FILTER BY l_commitdate < l_receiptdate
DYNAMIC SERVER FILTER BY o_orderkey IN l_orderkey
Join	
  support	
  -­‐	
  what	
  can't	
  we	
  do?
o Nested	
  Loop	
  Join	
  
o StaQsQcs	
  Guided	
  Join	
  Algorithm	
  
o Smartly	
  choose	
  the	
  smaller	
  table	
  for	
  the	
  build	
  side	
  
o Smartly	
  switch	
  between	
  hash-­‐join	
  and	
  sort-­‐merge-­‐join	
  
o Smartly	
  turn	
  on/off	
  FK-­‐to-­‐PK	
  join	
  opQmizaQon
LATEST	
  RELEASES	
  
The	
  new	
  goodness,	
  from	
  our	
  community	
  to	
  your	
  datacenter!	
  
Phoenix	
  4.4	
  
o  HBase	
  1.0	
  Support	
  
o  Func*onal	
  Indexes	
  
o  User	
  Defined	
  Func*ons	
  
o  Query	
  Server	
  with	
  Thin	
  Driver	
  
o  Union	
  All	
  support	
  
o  TesQng	
  at	
  scale	
  with	
  Pherf	
  
o  MR	
  index	
  build	
  
o  Spark	
  integraQon	
  
o  Date	
  built-­‐in	
  funcQons	
  –	
  WEEK,	
  DAYOFMONTH,	
  etc.	
  
Phoenix	
  4.4:	
  FuncQonal	
  Indexes	
  
o CreaQng	
  an	
  index	
  on	
  an	
  expression	
  as	
  
opposed	
  to	
  just	
  a	
  column	
  value.	
  
o Example	
  without	
  index,	
  full	
  table	
  scan:	
  
SELECT	
  AVG(response_*me)	
  FROM	
  SERVER_METRICS	
  
WHERE	
  DAYOFMONTH(create_*me)	
  =	
  1	
  
o Add	
  funcQonal	
  index,	
  range	
  scan:	
  
CREATE	
  INDEX	
  day_of_month_idx	
  
ON	
  SERVER_METRICS	
  (DAYOFMONTH(create_*me))	
  
INCLUDE	
  (response_*me)	
  
Phoenix	
  4.4:	
  User	
  Defined	
  FuncQons	
  
o  Extension	
  points	
  to	
  Phoenix	
  for	
  domain-­‐specific	
  
funcQons.	
  
o  For	
  example,	
  a	
  geo-­‐locaQon	
  applicaQon	
  might	
  load	
  a	
  
set	
  of	
  UDFs	
  like	
  this:	
  
CREATE	
  FUNCTION	
  WOEID_DISTANCE(INTEGER,INTEGER)	
  	
  
RETURNS	
  INTEGER	
  AS	
  ‘org.apache.geo.woeidDistance’	
  
USING	
  JAR	
  ‘/lib/geo/geoloc.jar’	
  
o  Querying,	
  funcQonal	
  indexing,	
  etc.	
  then	
  possible:	
  
SELECT	
  *	
  FROM	
  woeid	
  a	
  JOIN	
  woeid	
  b	
  ON	
  a.country	
  =	
  b.country	
  
WHERE	
  woeid_distance(a.ID,b.ID)	
  <	
  5	
  
Phoenix	
  4.4:	
  Query	
  Server	
  +	
  Thin	
  Driver	
  
o  Offloads	
  query	
  planning	
  and	
  execuQon	
  to	
  different	
  
server(s)	
  
o  Minimizes	
  client	
  dependencies	
  
o  Enabler	
  for	
  ODBC	
  driver	
  (work	
  in	
  progress)	
  
o  Connect	
  like	
  this	
  instead:	
  
Connection conn = DriverManager.getConnection(
"jdbc:phoenix:thin:url=http://localhost:8765");
o  SQll	
  evolving:	
  no	
  backward	
  compa*bility	
  guarantees	
  yet	
  
	
  
h0p://phoenix.apache.org/server.html	
  
Phoenix	
  4.4	
  
o  HBase	
  1.0	
  Support	
  
o  FuncQonal	
  Indexes	
  
o  User	
  Defined	
  FuncQons	
  
o  Query	
  Server	
  with	
  Thin	
  Driver	
  
o  Union	
  All	
  support	
  
o  Tes*ng	
  at	
  scale	
  with	
  Pherf	
  
o  MR	
  index	
  build	
  
o  Spark	
  integra*on	
  
o  Date	
  built-­‐in	
  func*ons	
  –	
  WEEK,	
  DAYOFMONTH,	
  &c.	
  
	
  
Phoenix	
  4.5	
  
o  HBase	
  1.1	
  Support	
  
o  Client-­‐side	
  per-­‐statement	
  metrics	
  
o  SELECT	
  without	
  FROM	
  
o  Now	
  possible	
  to	
  alter	
  tables	
  with	
  views	
  
o  Math	
  built-­‐in	
  funcQons	
  –	
  SIGN,	
  SQRT,	
  CBRT,	
  EXP,	
  &c.	
  
o  Array	
  built-­‐in	
  funcQons	
  –	
  ARRAY_CAT,	
  ARRAY_FILL,	
  
&c.	
  
o  UDF,	
  View,	
  FuncQonal	
  Index	
  enhancements	
  
o  Pherf	
  and	
  Performance	
  enhancements	
  
TRANSACTIONS	
  
I	
  was	
  told	
  there	
  would	
  be…	
  
TransacQons	
  (Work	
  in	
  Progress)	
  
o  Snapshot	
  isolaQon	
  model	
  
o  Using	
  Tephra	
  (h0p://tephra.io/)	
  
o  Supports	
  REPEATABLE_READ	
  isolaQon	
  level	
  
o  Allows	
  reading	
  your	
  own	
  uncommi0ed	
  data	
  
o  OpQonal	
  
o  Enabled	
  on	
  a	
  table	
  by	
  table	
  basis	
  
o  No	
  performance	
  penalty	
  when	
  not	
  used	
  
o  Work	
  in	
  progress,	
  but	
  close	
  to	
  release	
  
o  Try	
  our	
  txn	
  branch	
  
o  Will	
  be	
  available	
  in	
  next	
  release	
  
OpQmisQc	
  Concurrency	
  Control	
  
•  Avoids cost of locking rows and tables
•  No deadlocks or lock escalations
•  Cost of conflict detection and possible rollback is higher
•  Good if conflicts are rare: short transaction, disjoint
partitioning of work
•  Conflict detection not always necessary: write-once/
append-only data
Tephra	
  Architecture	
  
ZooKeeper	

Tx Manager	

(standby)	

HBase	

Master 1	

Master 2 	

	

RS 1	

RS 2	

 RS 4	

RS 3	

Client 1	

Client 2	

Client N	

Tx Manager	

(active)
time out
try abort
failed
roll back
in HBase
write
to
HBase
do work
Client Tx Manager
none
complete V
abortsucceeded
in progress
start tx
start
start tx
commit
try commit check conflicts
invalid X
invalidate
failed
TransacQon	
  Lifecycle	
  
Tephra	
  Architecture	
  
•  TransactionAware client!
•  Coordinates transaction lifecycle with manager
•  Communicates directly with HBase for reads and writes
•  Transaction Manager
•  Assigns transaction IDs
•  Maintains state on in-progress, committed and invalid transactions!
•  Transaction Processor coprocessor
•  Applies server-side filtering for reads
•  Cleans up data from failed transactions, and no longer visible versions
WHAT’S	
  NEXT?	
  
A	
  brave	
  new	
  world	
  
What’s	
  Next?	
  
o Is	
  Phoenix	
  done?	
  
o What	
  about	
  the	
  Big	
  Picture?	
  
o How	
  can	
  Phoenix	
  be	
  leveraged	
  in	
  the	
  larger	
  
ecosystem?	
  
o Hive,	
  Pig,	
  MR	
  integraQon	
  with	
  Phoenix	
  today,	
  but	
  
it’s	
  not	
  a	
  great	
  story	
  
What’s	
  Next?	
  
You	
  are	
  here	
  
Moving	
  to	
  Apache	
  Calcite	
  
o  Query	
  parser,	
  compiler,	
  and	
  planner	
  framework	
  
o SQL-­‐92	
  compliant	
  (ever	
  argue	
  SQL	
  with	
  Julian?	
  :-­‐)	
  )	
  
o Enables	
  Phoenix	
  to	
  get	
  missing	
  SQL	
  support	
  
o  Pluggable	
  cost-­‐based	
  opQmizer	
  framework	
  
o Sane	
  way	
  to	
  model	
  push	
  down	
  through	
  rules	
  
o  Interop	
  with	
  other	
  Calcite	
  adaptors	
  
o Not	
  for	
  free,	
  but	
  it	
  becomes	
  feasible	
  
o Already	
  used	
  by	
  Drill,	
  Hive,	
  Kylin,	
  Samza	
  
o Supports	
  any	
  JDBC	
  source	
  (i.e.	
  RDBMS	
  -­‐	
  remember	
  
them	
  :-­‐)	
  )	
  
o One	
  cost-­‐model	
  to	
  rule	
  them	
  all	
  
How	
  does	
  Phoenix	
  plug	
  in?	
  
Calcite	
  Parser	
  &	
  Validator
Calcite	
  Query	
  OpQmizer
Phoenix	
  Query	
  Plan	
  Generator
Phoenix	
  RunQme
Phoenix	
  Tables	
  over	
  HBase	
  
JDBC	
  Client
SQL	
  +	
  Phoenix	
  
specific	
  
grammar
Built-­‐in	
  rules	
  
+	
  Phoenix	
  
specific	
  rules
OpQmizaQon	
  Rules	
  
o AggregateRemoveRule	
  
o FilterAggregateTransposeRule	
  
o FilterJoinRule	
  
o FilterMergeRule	
  
o JoinCommuteRule	
  
o PhoenixFilterScanMergeRule	
  
o PhoenixJoinSingleAggregateMergeRule	
  
o …	
  
Query	
  Example	
  	
  
(filter	
  push-­‐down	
  and	
  smart	
  join	
  algorithm)	
  
LogicalFilter	
  
filter:	
  $0	
  =	
  ‘x’
LogicalJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $7
LogicalProject	
  
projects:	
  $0,	
  $5
LogicalTableScan	
  
table:	
  A
LogicalTableScan	
  
table:	
  B
PhoenixTableScan	
  
table:	
  ‘a’	
  
filter:	
  $0	
  =	
  ‘x’	
  
PhoenixServerJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $1
PhoenixServerProject	
  
projects:	
  $2,	
  $0
OpQmizer	
  
(with	
  
RelOptRules	
  &	
  
ConvertRules)	
  
PhoenixTableScan	
  
table:	
  ‘b’	
  
PhoenixServerProject	
  
projects:	
  $0,	
  $2
PhoenixServerProject	
  
projects:	
  $0,	
  $3
Query	
  Example	
  
(filter	
  push-­‐down	
  and	
  smart	
  join	
  algorithm)
ScanPlan	
  
table:	
  ‘a’	
  
skip-­‐scan:	
  pk0	
  =	
  ‘x’	
  
projects:	
  pk0,	
  c3
HashJoinPlan	
  
types	
  {inner}
join-­‐keys:	
  {$1}
projects:	
  $2,	
  $0
Build	
  
hash-­‐key:	
  $1
Phoenix	
  
Implementor	
  
PhoenixTableScan	
  
table:	
  ‘a’	
  
filter:	
  $0	
  =	
  ‘x’	
  
PhoenixServerJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $1
PhoenixServerProject	
  
projects:	
  $2,	
  $0
PhoenixTableScan	
  
table:	
  ‘b’	
  
PhoenixServerProject	
  
projects:	
  $0,	
  $2
PhoenixServerProject	
  
projects:	
  $0,	
  $3
ScanPlan	
  
table:	
  ‘b’	
  
projects:	
  col0,	
  col2	
  
Probe
Interoperability	
  Example	
  
o Joining	
  data	
  from	
  Phoenix	
  and	
  mySQL	
  
EnumerableJoin
PhoenixTableScan JdbcTableScan
Phoenix	
  Tables	
  over	
  HBase	
   mySQL	
  Database	
  
PhoenixToEnumerable	
  
Converter
JdbcToEnumerable	
  
Converter
Query	
  Example	
  1
WITH	
  m	
  AS	
  	
  
	
  	
  	
  	
  (SELECT	
  *	
  	
  
	
  	
  	
  	
  	
  FROM	
  dept_manager	
  dm	
  	
  
	
  	
  	
  	
  	
  WHERE	
  from_date	
  =	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  (SELECT	
  max(from_date)	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  FROM	
  dept_manager	
  dm2	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  WHERE	
  dm.dept_no	
  =	
  dm2.dept_no))	
  	
  
SELECT	
  m.dept_no,	
  d.dept_name,	
  e.first_name,	
  e.last_name	
  	
  
FROM	
  employees	
  e	
  	
  
JOIN	
  m	
  ON	
  e.emp_no	
  =	
  m.emp_no	
  
JOIN	
  departments	
  d	
  ON	
  d.dept_no	
  =	
  m.dept_no	
  
ORDER	
  BY	
  d.dept_no;
Query	
  Example	
  2	
  
SELECT	
  dept_no,	
  Qtle,	
  count(*)	
  	
  
FROM	
  Qtles	
  t	
  	
  
JOIN	
  dept_emp	
  de	
  ON	
  t.emp_no	
  =	
  de.emp_no	
  
WHERE	
  dept_no	
  <=	
  'd006'	
  
GROUP	
  BY	
  rollup(dept_no,	
  *tle)	
  	
  
ORDER	
  BY	
  dept_no,	
  Qtle;	
  
V5	
  
Thanks!	
  
@ApachePhoenix	
  
h0p://phoenix.apache.org	
  
2015-­‐09-­‐29	
  
Nick	
  Dimiduk	
  (@xefyr)	
  
h0p://n10k.com	
  
#apachebigdata	
  
1 of 40

Recommended

Apache Phoenix: Transforming HBase into a SQL Database by
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
16.7K views77 slides
The Evolution of a Relational Database Layer over HBase by
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
2.3K views34 slides
Apache Big Data EU 2015 - HBase by
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
3.3K views40 slides
Apache phoenix: Past, Present and Future of SQL over HBAse by
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
6.3K views41 slides
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase by
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
3.3K views41 slides
Apache phoenix by
Apache phoenixApache phoenix
Apache phoenixUniversity of Moratuwa
1.1K views19 slides

More Related Content

What's hot

April 2014 HUG : Apache Phoenix by
April 2014 HUG : Apache PhoenixApril 2014 HUG : Apache Phoenix
April 2014 HUG : Apache PhoenixYahoo Developer Network
6.2K views21 slides
Apache phoenix by
Apache phoenixApache phoenix
Apache phoenixOsama Hussein
162 views44 slides
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse by
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
36.9K views43 slides
Large-Scale Stream Processing in the Hadoop Ecosystem by
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
1.7K views50 slides
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster by
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
1.3K views32 slides
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ... by
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
4.3K views35 slides

What's hot(20)

Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse by Josh Elser
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Josh Elser36.9K views
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster by mas4share
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
mas4share1.3K views
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ... by HBaseCon
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon4.3K views
Query Compilation in Impala by Cloudera, Inc.
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
Cloudera, Inc.7.8K views
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016 by Gyula Fóra
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Gyula Fóra2.5K views
Phoenix: How (and why) we put the SQL back into the NoSQL by DataWorks Summit
Phoenix: How (and why) we put the SQL back into the NoSQLPhoenix: How (and why) we put the SQL back into the NoSQL
Phoenix: How (and why) we put the SQL back into the NoSQL
DataWorks Summit2K views
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory... by Cloudera, Inc.
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.4.1K views
Real-time Stream Processing with Apache Flink @ Hadoop Summit by Gyula Fóra
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Gyula Fóra2K views
Using Morphlines for On-the-Fly ETL by Cloudera, Inc.
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
Cloudera, Inc.18.8K views
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud by gluent.
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.3.2K views

Viewers also liked

IBM Message Hub service in Bluemix - Apache Kafka in a public cloud by
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudAndrew Schofield
6.6K views28 slides
Being Ready for Apache Kafka - Apache: Big Data Europe 2015 by
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
8.9K views57 slides
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka by
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
2.6K views35 slides
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015 by
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
4.4K views42 slides
IBM Message Hub: Cloud-Native Messaging by
IBM Message Hub: Cloud-Native MessagingIBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native MessagingAndrew Schofield
1.1K views31 slides
Apache Kylin @ Big Data Europe 2015 by
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
2.6K views50 slides

Viewers also liked(20)

IBM Message Hub service in Bluemix - Apache Kafka in a public cloud by Andrew Schofield
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
Andrew Schofield6.6K views
Being Ready for Apache Kafka - Apache: Big Data Europe 2015 by Michael Noll
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll8.9K views
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka by Andrew Schofield
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Andrew Schofield2.6K views
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015 by Robert Metzger
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger4.4K views
IBM Message Hub: Cloud-Native Messaging by Andrew Schofield
IBM Message Hub: Cloud-Native MessagingIBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native Messaging
Andrew Schofield1.1K views
Apache Kylin @ Big Data Europe 2015 by Seshu Adunuthula
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
Seshu Adunuthula2.6K views
Fraud Detection in Real-time @ Apache Big Data Con by Seshika Fernando
Fraud Detection in Real-time @ Apache Big Data ConFraud Detection in Real-time @ Apache Big Data Con
Fraud Detection in Real-time @ Apache Big Data Con
Seshika Fernando2.6K views
Effectively Managing a Hybrid Messaging Environment by Andrew Schofield
Effectively Managing a Hybrid Messaging EnvironmentEffectively Managing a Hybrid Messaging Environment
Effectively Managing a Hybrid Messaging Environment
Andrew Schofield711 views
Hortonworks Technical Workshop: HBase and Apache Phoenix by Hortonworks
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks15K views
Hybrid Messaging with IBM Bluemix by matthew1001
Hybrid Messaging with IBM BluemixHybrid Messaging with IBM Bluemix
Hybrid Messaging with IBM Bluemix
matthew1001263 views
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015 by Sergio Fernández
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Sergio Fernández3.2K views
HBase + Hue - LA HBase User Group by gethue
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Group
gethue9.3K views
Hue: The Hadoop UI - Hadoop Singapore by gethue
Hue: The Hadoop UI - Hadoop SingaporeHue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop Singapore
gethue2.2K views
HBase Low Latency, StrataNYC 2014 by Nick Dimiduk
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
Nick Dimiduk1.8K views
Bring Cartography to the Cloud by Nick Dimiduk
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
Nick Dimiduk2.7K views
The Many Faces of Apache Kafka: Leveraging real-time data at scale by Neha Narkhede
The Many Faces of Apache Kafka: Leveraging real-time data at scaleThe Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scale
Neha Narkhede1.8K views
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud by Matteo Bisi
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing SmartcloudConnect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Matteo Bisi369 views

Similar to Apache Big Data EU 2015 - Phoenix

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark by
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
742 views40 slides
Building Scalable Data Pipelines - 2016 DataPalooza Seattle by
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
5.7K views80 slides
Omid: scalable and highly available transaction processing for Apache Phoenix by
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixDataWorks Summit
870 views36 slides
HBaseCon2015-final by
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
127 views35 slides
Cascading on starfish by
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
798 views14 slides
Programming the Network Data Plane by
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data PlaneC4Media
922 views47 slides

Similar to Apache Big Data EU 2015 - Phoenix(20)

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark by Michael Stack
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack742 views
Building Scalable Data Pipelines - 2016 DataPalooza Seattle by Evan Chan
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Evan Chan5.7K views
Omid: scalable and highly available transaction processing for Apache Phoenix by DataWorks Summit
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache Phoenix
DataWorks Summit870 views
HBaseCon2015-final by Maryann Xue
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
Maryann Xue127 views
Cascading on starfish by Fei Dong
Cascading on starfishCascading on starfish
Cascading on starfish
Fei Dong798 views
Programming the Network Data Plane by C4Media
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data Plane
C4Media922 views
The life of a query (oracle edition) by maclean liu
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
maclean liu2.2K views
Hyperspace: An Indexing Subsystem for Apache Spark by Databricks
Hyperspace: An Indexing Subsystem for Apache SparkHyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
Databricks721 views
Spark (Structured) Streaming vs. Kafka Streams by Guido Schmutz
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
Guido Schmutz5K views
Omid: scalable and highly available transaction processing for Apache Phoenix by DataWorks Summit
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache Phoenix
DataWorks Summit168 views
GraphConnect 2014 SF: From Zero to Graph in 120: Scale by Neo4j
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleGraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
Neo4j796 views
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat... by Flink Forward
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward1.3K views
Web Scale Reasoning and the LarKC Project by Saltlux Inc.
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
Saltlux Inc.498 views
Distributed & Highly Available server applications in Java and Scala by Max Alexejev
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and Scala
Max Alexejev5.6K views
Hadoop and HBase experiences in perf log project by Mao Geng
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
Mao Geng832 views
Streaming SQL by Julian Hyde
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde3.6K views
Fast federated SQL with Apache Calcite by Chris Baynes
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes1.4K views

More from Nick Dimiduk

Apache HBase 1.0 Release by
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
25.6K views26 slides
HBase Blockcache 101 by
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
5.7K views13 slides
Apache HBase Low Latency by
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
12K views38 slides
Apache HBase for Architects by
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
11.2K views34 slides
HBase for Architects by
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
33.7K views21 slides
HBase Client APIs (for webapps?) by
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
9K views35 slides

More from Nick Dimiduk(8)

Apache HBase 1.0 Release by Nick Dimiduk
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
Nick Dimiduk25.6K views
HBase Blockcache 101 by Nick Dimiduk
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
Nick Dimiduk5.7K views
Apache HBase Low Latency by Nick Dimiduk
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk12K views
Apache HBase for Architects by Nick Dimiduk
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
Nick Dimiduk11.2K views
HBase for Architects by Nick Dimiduk
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk33.7K views
HBase Client APIs (for webapps?) by Nick Dimiduk
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
Nick Dimiduk9K views
Pig, Making Hadoop Easy by Nick Dimiduk
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
Nick Dimiduk84.7K views
Introduction to Hadoop, HBase, and NoSQL by Nick Dimiduk
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQL
Nick Dimiduk7.5K views

Recently uploaded

PRODUCT PRESENTATION.pptx by
PRODUCT PRESENTATION.pptxPRODUCT PRESENTATION.pptx
PRODUCT PRESENTATION.pptxangelicacueva6
13 views1 slide
Ransomware is Knocking your Door_Final.pdf by
Ransomware is Knocking your Door_Final.pdfRansomware is Knocking your Door_Final.pdf
Ransomware is Knocking your Door_Final.pdfSecurity Bootcamp
53 views46 slides
Empathic Computing: Delivering the Potential of the Metaverse by
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the MetaverseMark Billinghurst
476 views80 slides
Democratising digital commerce in India-Report by
Democratising digital commerce in India-ReportDemocratising digital commerce in India-Report
Democratising digital commerce in India-ReportKapil Khandelwal (KK)
15 views161 slides
Voice Logger - Telephony Integration Solution at Aegis by
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at AegisNirmal Sharma
31 views1 slide
HTTP headers that make your website go faster - devs.gent November 2023 by
HTTP headers that make your website go faster - devs.gent November 2023HTTP headers that make your website go faster - devs.gent November 2023
HTTP headers that make your website go faster - devs.gent November 2023Thijs Feryn
21 views151 slides

Recently uploaded(20)

Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst476 views
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma31 views
HTTP headers that make your website go faster - devs.gent November 2023 by Thijs Feryn
HTTP headers that make your website go faster - devs.gent November 2023HTTP headers that make your website go faster - devs.gent November 2023
HTTP headers that make your website go faster - devs.gent November 2023
Thijs Feryn21 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院 by IttrainingIttraining
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
Unit 1_Lecture 2_Physical Design of IoT.pdf by StephenTec
Unit 1_Lecture 2_Physical Design of IoT.pdfUnit 1_Lecture 2_Physical Design of IoT.pdf
Unit 1_Lecture 2_Physical Design of IoT.pdf
StephenTec12 views
Case Study Copenhagen Energy and Business Central.pdf by Aitana
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdf
Aitana16 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson66 views

Apache Big Data EU 2015 - Phoenix

  • 1. V5   The  Evolu*on  of  a  Rela*onal  Database  Layer  over  HBase   @ApachePhoenix   h0p://phoenix.apache.org   2015-­‐09-­‐29   Nick  Dimiduk  (@xefyr)   h0p://n10k.com   #apachebigdata  
  • 2. Agenda   o  What  is  Apache  Phoenix?   o  State  of  the  Union   o  Latest  Releases   o  A  brief  aside  on  TransacQons   o  What’s  Next?   o  Q  &  A  
  • 3. WHAT  IS  APACHE  PHOENIX   PuUng  the  SQL  back  into  NoSQL  
  • 4. What  is  Apache  Phoenix?   o  A  relaQonal  database  layer  for  Apache  HBase   o  Query  engine   o  Transforms  SQL  queries  into  naQve  HBase  API  calls   o  Pushes   as   much   work   as   possible   onto   the   cluster   for   parallel   execuQon   o  Metadata  repository   o  Typed  access  to  data  stored  in  HBase  tables   o  A  JDBC  driver   o  A  top  level  Apache  So_ware  FoundaQon  project   o  Originally  developed  at  Salesforce   o  A  growing  community  with  momentum  
  • 5. What  is  Apache  Phoenix?   ==   ==  
  • 6. Where  Does  Phoenix  Fit  In?  Sqoop     RDB  Data  Collector   Flume     Log  Data  Collector   Zookeeper   CoordinaQon   YARN  /  MRv2   Cluster  Resource   Manager  /   MapReduce   HDFS   Hadoop  Distributed  File  System   GraphX   Graph  analysis   framework   Phoenix   Query  execuQon  engine   HBase   Distributed  Database   The  Java  Virtual  Machine   Hadoop   Common  JNI   Spark   IteraQve  In-­‐Memory   ComputaQon   MLLib   Data  mining   Pig   Data  ManipulaQon   Hive   Structured  Query   Phoenix   JDBC  client  
  • 7. Phoenix  Puts  the  SQL  Back  in  NoSQL   o  Familiar  SQL  interface  for  (DDL,  DML,  etc.)   o Read  only  views  on  exisQng  HBase  data   o Dynamic  columns  extend  schema  at  runQme   o Flashback  queries  using  versioned  schema   o  Data  types   o  Secondary  indexes   o  Joins   o  Query  planning  and  opQmizaQon   o  MulQ-­‐tenancy  
  • 8. Phoenix  Puts  the  SQL  Back  in  NoSQL   Accessing  HBase  data  with  Phoenix  can  be  substanQally  easier  than  direct   HBase  API  use       Versus     HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …)); while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close(); SELECT * FROM foo WHERE bar > 30 HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …)); while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close(); Your  BI  tool   probably  can’t  do   this   (And  we  didn’t  include  error  handling…)  
  • 9. STATE  OF  THE  UNION   So  how’s  things?  
  • 11. State  of  the  Union   o  SQL  Support:  Broad  enough  to  run  TPC  queries   o  Joins,  Sub-­‐queries,  Derived  tables,  etc.   o  Secondary  Indices:  Three  different  strategies   o  Immutable  for  write-­‐once/append  only  data     o  Global  for  read-­‐heavy  mutable  data   o  Local  for  write-­‐heavy  mutable  or  immutable  data   o  StaQsQcs  driven  parallel  execuQon   o  Monitoring  &  Management   o  Tracing,  metrics  
  • 12. Join  and  Subquery  Support o Grammar   o inner  join;  le_/right/full  outer  join;  cross  join   o AddiQonal   o semi  join;  anQ  join   o Algorithms   o hash-­‐join;  sort-­‐merge-­‐join   o OpQmizaQons   o Predicate  push-­‐down;  FK-­‐to-­‐PK  join  opQmizaQon;   Global  index  with  missing  data  columns;  Correlated   subquery  rewrite  
  • 13. TPC  Example  1   Small-­‐QuanQty-­‐Order  Revenue  Query  (Q17) select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = '[B]' and p_container = '[C]' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); CLIENT 4-WAY FULL SCAN OVER lineitem PARALLEL INNER JOIN TABLE 0 CLIENT 1-WAY FULL SCAN OVER part SERVER FILTER BY p_partkey = ‘[B]’ AND p_container = ‘[C]’ PARALLEL INNER JOIN TABLE 1 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER AGGREGATE INTO DISTINCT ROWS BY l_partkey AFTER-JOIN SERVER FILTER BY l_quantity < $0
  • 14. TPC  Example  2   Order  Priority  Checking  Query  (Q4) select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '[D]' and o_orderdate < date '[D]' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; CLIENT 4-WAY FULL SCAN OVER orders SERVER FILTER o_orderdate >= ‘[D]’ AND o_orderdate < ‘[D]’ + 3(d) SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY o_orderpriority CLIENT MERGE SORT SKIP-SCAN JOIN TABLE 0 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER FILTER BY l_commitdate < l_receiptdate DYNAMIC SERVER FILTER BY o_orderkey IN l_orderkey
  • 15. Join  support  -­‐  what  can't  we  do? o Nested  Loop  Join   o StaQsQcs  Guided  Join  Algorithm   o Smartly  choose  the  smaller  table  for  the  build  side   o Smartly  switch  between  hash-­‐join  and  sort-­‐merge-­‐join   o Smartly  turn  on/off  FK-­‐to-­‐PK  join  opQmizaQon
  • 16. LATEST  RELEASES   The  new  goodness,  from  our  community  to  your  datacenter!  
  • 17. Phoenix  4.4   o  HBase  1.0  Support   o  Func*onal  Indexes   o  User  Defined  Func*ons   o  Query  Server  with  Thin  Driver   o  Union  All  support   o  TesQng  at  scale  with  Pherf   o  MR  index  build   o  Spark  integraQon   o  Date  built-­‐in  funcQons  –  WEEK,  DAYOFMONTH,  etc.  
  • 18. Phoenix  4.4:  FuncQonal  Indexes   o CreaQng  an  index  on  an  expression  as   opposed  to  just  a  column  value.   o Example  without  index,  full  table  scan:   SELECT  AVG(response_*me)  FROM  SERVER_METRICS   WHERE  DAYOFMONTH(create_*me)  =  1   o Add  funcQonal  index,  range  scan:   CREATE  INDEX  day_of_month_idx   ON  SERVER_METRICS  (DAYOFMONTH(create_*me))   INCLUDE  (response_*me)  
  • 19. Phoenix  4.4:  User  Defined  FuncQons   o  Extension  points  to  Phoenix  for  domain-­‐specific   funcQons.   o  For  example,  a  geo-­‐locaQon  applicaQon  might  load  a   set  of  UDFs  like  this:   CREATE  FUNCTION  WOEID_DISTANCE(INTEGER,INTEGER)     RETURNS  INTEGER  AS  ‘org.apache.geo.woeidDistance’   USING  JAR  ‘/lib/geo/geoloc.jar’   o  Querying,  funcQonal  indexing,  etc.  then  possible:   SELECT  *  FROM  woeid  a  JOIN  woeid  b  ON  a.country  =  b.country   WHERE  woeid_distance(a.ID,b.ID)  <  5  
  • 20. Phoenix  4.4:  Query  Server  +  Thin  Driver   o  Offloads  query  planning  and  execuQon  to  different   server(s)   o  Minimizes  client  dependencies   o  Enabler  for  ODBC  driver  (work  in  progress)   o  Connect  like  this  instead:   Connection conn = DriverManager.getConnection( "jdbc:phoenix:thin:url=http://localhost:8765"); o  SQll  evolving:  no  backward  compa*bility  guarantees  yet     h0p://phoenix.apache.org/server.html  
  • 21. Phoenix  4.4   o  HBase  1.0  Support   o  FuncQonal  Indexes   o  User  Defined  FuncQons   o  Query  Server  with  Thin  Driver   o  Union  All  support   o  Tes*ng  at  scale  with  Pherf   o  MR  index  build   o  Spark  integra*on   o  Date  built-­‐in  func*ons  –  WEEK,  DAYOFMONTH,  &c.    
  • 22. Phoenix  4.5   o  HBase  1.1  Support   o  Client-­‐side  per-­‐statement  metrics   o  SELECT  without  FROM   o  Now  possible  to  alter  tables  with  views   o  Math  built-­‐in  funcQons  –  SIGN,  SQRT,  CBRT,  EXP,  &c.   o  Array  built-­‐in  funcQons  –  ARRAY_CAT,  ARRAY_FILL,   &c.   o  UDF,  View,  FuncQonal  Index  enhancements   o  Pherf  and  Performance  enhancements  
  • 23. TRANSACTIONS   I  was  told  there  would  be…  
  • 24. TransacQons  (Work  in  Progress)   o  Snapshot  isolaQon  model   o  Using  Tephra  (h0p://tephra.io/)   o  Supports  REPEATABLE_READ  isolaQon  level   o  Allows  reading  your  own  uncommi0ed  data   o  OpQonal   o  Enabled  on  a  table  by  table  basis   o  No  performance  penalty  when  not  used   o  Work  in  progress,  but  close  to  release   o  Try  our  txn  branch   o  Will  be  available  in  next  release  
  • 25. OpQmisQc  Concurrency  Control   •  Avoids cost of locking rows and tables •  No deadlocks or lock escalations •  Cost of conflict detection and possible rollback is higher •  Good if conflicts are rare: short transaction, disjoint partitioning of work •  Conflict detection not always necessary: write-once/ append-only data
  • 26. Tephra  Architecture   ZooKeeper Tx Manager (standby) HBase Master 1 Master 2 RS 1 RS 2 RS 4 RS 3 Client 1 Client 2 Client N Tx Manager (active)
  • 27. time out try abort failed roll back in HBase write to HBase do work Client Tx Manager none complete V abortsucceeded in progress start tx start start tx commit try commit check conflicts invalid X invalidate failed TransacQon  Lifecycle  
  • 28. Tephra  Architecture   •  TransactionAware client! •  Coordinates transaction lifecycle with manager •  Communicates directly with HBase for reads and writes •  Transaction Manager •  Assigns transaction IDs •  Maintains state on in-progress, committed and invalid transactions! •  Transaction Processor coprocessor •  Applies server-side filtering for reads •  Cleans up data from failed transactions, and no longer visible versions
  • 29. WHAT’S  NEXT?   A  brave  new  world  
  • 30. What’s  Next?   o Is  Phoenix  done?   o What  about  the  Big  Picture?   o How  can  Phoenix  be  leveraged  in  the  larger   ecosystem?   o Hive,  Pig,  MR  integraQon  with  Phoenix  today,  but   it’s  not  a  great  story  
  • 31. What’s  Next?   You  are  here  
  • 32. Moving  to  Apache  Calcite   o  Query  parser,  compiler,  and  planner  framework   o SQL-­‐92  compliant  (ever  argue  SQL  with  Julian?  :-­‐)  )   o Enables  Phoenix  to  get  missing  SQL  support   o  Pluggable  cost-­‐based  opQmizer  framework   o Sane  way  to  model  push  down  through  rules   o  Interop  with  other  Calcite  adaptors   o Not  for  free,  but  it  becomes  feasible   o Already  used  by  Drill,  Hive,  Kylin,  Samza   o Supports  any  JDBC  source  (i.e.  RDBMS  -­‐  remember   them  :-­‐)  )   o One  cost-­‐model  to  rule  them  all  
  • 33. How  does  Phoenix  plug  in?   Calcite  Parser  &  Validator Calcite  Query  OpQmizer Phoenix  Query  Plan  Generator Phoenix  RunQme Phoenix  Tables  over  HBase   JDBC  Client SQL  +  Phoenix   specific   grammar Built-­‐in  rules   +  Phoenix   specific  rules
  • 34. OpQmizaQon  Rules   o AggregateRemoveRule   o FilterAggregateTransposeRule   o FilterJoinRule   o FilterMergeRule   o JoinCommuteRule   o PhoenixFilterScanMergeRule   o PhoenixJoinSingleAggregateMergeRule   o …  
  • 35. Query  Example     (filter  push-­‐down  and  smart  join  algorithm)   LogicalFilter   filter:  $0  =  ‘x’ LogicalJoin   type:  inner   cond:  $3  =  $7 LogicalProject   projects:  $0,  $5 LogicalTableScan   table:  A LogicalTableScan   table:  B PhoenixTableScan   table:  ‘a’   filter:  $0  =  ‘x’   PhoenixServerJoin   type:  inner   cond:  $3  =  $1 PhoenixServerProject   projects:  $2,  $0 OpQmizer   (with   RelOptRules  &   ConvertRules)   PhoenixTableScan   table:  ‘b’   PhoenixServerProject   projects:  $0,  $2 PhoenixServerProject   projects:  $0,  $3
  • 36. Query  Example   (filter  push-­‐down  and  smart  join  algorithm) ScanPlan   table:  ‘a’   skip-­‐scan:  pk0  =  ‘x’   projects:  pk0,  c3 HashJoinPlan   types  {inner} join-­‐keys:  {$1} projects:  $2,  $0 Build   hash-­‐key:  $1 Phoenix   Implementor   PhoenixTableScan   table:  ‘a’   filter:  $0  =  ‘x’   PhoenixServerJoin   type:  inner   cond:  $3  =  $1 PhoenixServerProject   projects:  $2,  $0 PhoenixTableScan   table:  ‘b’   PhoenixServerProject   projects:  $0,  $2 PhoenixServerProject   projects:  $0,  $3 ScanPlan   table:  ‘b’   projects:  col0,  col2   Probe
  • 37. Interoperability  Example   o Joining  data  from  Phoenix  and  mySQL   EnumerableJoin PhoenixTableScan JdbcTableScan Phoenix  Tables  over  HBase   mySQL  Database   PhoenixToEnumerable   Converter JdbcToEnumerable   Converter
  • 38. Query  Example  1 WITH  m  AS            (SELECT  *              FROM  dept_manager  dm              WHERE  from_date  =                      (SELECT  max(from_date)                        FROM  dept_manager  dm2                        WHERE  dm.dept_no  =  dm2.dept_no))     SELECT  m.dept_no,  d.dept_name,  e.first_name,  e.last_name     FROM  employees  e     JOIN  m  ON  e.emp_no  =  m.emp_no   JOIN  departments  d  ON  d.dept_no  =  m.dept_no   ORDER  BY  d.dept_no;
  • 39. Query  Example  2   SELECT  dept_no,  Qtle,  count(*)     FROM  Qtles  t     JOIN  dept_emp  de  ON  t.emp_no  =  de.emp_no   WHERE  dept_no  <=  'd006'   GROUP  BY  rollup(dept_no,  *tle)     ORDER  BY  dept_no,  Qtle;  
  • 40. V5   Thanks!   @ApachePhoenix   h0p://phoenix.apache.org   2015-­‐09-­‐29   Nick  Dimiduk  (@xefyr)   h0p://n10k.com   #apachebigdata