
De-Mystifying the Apache Phoenix QueryServer


A general introduction to the Apache Phoenix QueryServer. It aims to cover the "what", "why", and "how" behind the technology.

Published in: Software


  1. De-Mystifying the Apache Phoenix QueryServer
     Josh Elser, MTS
     2016-04-13
  2. About me
     • (Recent) Apache Phoenix Committer
     • Apache Calcite Committer and PMC
     • Long-time NoSQL developer, re-learning SQL
     Apache Calcite and Apache Phoenix are projects at the Apache Software Foundation. These names are trademarks of the Foundation.
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  3. Agenda
     What? Why? How?
     Apache Phoenix QueryServer
  4. “What” is Apache Phoenix?
     • Been called many things [1]
       – “We put the SQL back in NoSQL!”
       – “A SQL skin on HBase”
       – “A relational layer on HBase”
       – “Online transaction processing and operational analytics for Hadoop”
     • Built on HDFS and HBase
       – Clients use a JDBC driver
       – Lots of server-side “magic” through HBase Coprocessors
     • A query system capable of both OLAP and OLTP workloads
       – More or less
     [1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5
  5. “What” is the Apache Phoenix QueryServer?
     • An HTTP abstraction of a JDBC Driver
       – Built on Apache Calcite’s Avatica sub-project
     • A standalone service to be run on each node in a cluster
       – An HTTP server
       – Configurable serialization mechanism
     • A new JDBC Driver to use with the QueryServer
       – A glorified HTTP client
       – A new sqlline script
  6. “What” is Apache Calcite?
     • SQL Parser
       – One SQL implementation usable by everyone
     • Cost-Based Optimizer
       – “Optimizations are easy”
     • Pluggable Data Sources
       – Implement your own SQL engine
     • Avatica
       – Calcite sub-project
       – Implements the JDBC-over-HTTP abstraction
       – Written to the JDBC spec, not database-specific
     The coolest project approximately one person can explain
  7. Agenda
     What? Why? How?
     Apache Phoenix QueryServer
  8. “Why” should I care?
     • A true “thin” client
       – No required connection to HBase/ZooKeeper/HDFS
       – Greatly simplifies definition of “Phoenix client”
     • Offload computational resources to cluster
       – QueryServers run on the cluster
       – Not your laptop or some “edge” node
     • Enables non-Java clients
       – The big one
     Because it’s friggin’ cool!
  9. “Why” are non-Java clients important?
     • “Native” bindings in any language
       – HTTP clients are easily implemented
       – Serialization approaches (often) have cross-language support
     • Data in HBase is suddenly easily accessible
       – Standardized table format through Phoenix
       – Well-defined APIs: Python Database API, Ruby ActiveRecord, etc.
     • ODBC and BI Tools
       – The moonshot.
       – The hopes and dreams of services people everywhere.
     Not everyone wants to use Java.
  10. “Why” not <insert rpc framework here> instead of HTTP?
      • HTTP is simple
        – “You have multiple versions of Thrift on the classpath”
        – “You have to use Protobuf 2.4”
      • Designed to be stateless
        – JDBC doesn’t make this easy
        – Can work around it via Avatica’s wire API
      • Statelessness makes scaling easier
        – Pull down any HTTP load balancer
        – Deploy more Avatica servers to scale up
      Because portability sucks
  11. Agenda
      What? Why? How?
      Apache Phoenix QueryServer
  12. “How” does it work?
      • HTTP Server
        – Jetty
        – Phoenix “thick” Driver
      • Serialization mechanism
        – Protocol Buffers
        – JSON
      • Metrics system
        – Dropwizard Metrics
        – Apache Hadoop Metrics2
      • Authentication
        – Kerberos via SPNEGO
        – HTTP Basic or Digest
      The QueryServer itself
  13. “How” does the serialization work?
      • Google Protocol Buffers (v3)
        – “think XML, but smaller, faster, and simpler” [1]
        – 110% supported WRT compatibility
        – Native bindings in most every popular language
        – Clients can use any version of protobuf3
      • JSON
        – Nice for testing
        – 110% unsupported WRT compatibility
        – You will run into issues with mismatched client/server versions
      Please, please, please use Protocol Buffers
      [1] https://developers.google.com/protocol-buffers/
  14. “How” do I make a client?
      • Choose a language
        – Find an HTTP client for that language
        – Install Protobuf bindings for that language
      • Read the Avatica docs [1]
        – Tell us when docs are incorrect/lacking/wrong/boring/lame
      • Write tests
      • Publish the client
        – And tell us!
      Sit down and write code
      [1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
  15. “How” do I get involved?
      • Provide servers for databases
        – A simple project for a specific database
      • Write some tests
      • Proofread the docs
      • Contribute a client
      • Answer questions on Stack Overflow/mailing lists
      Carpe diem
  16. Thanks!
      Email: elserj@apache.org
      Twitter: @josh_elser
      Mailing lists:
        Phoenix: dev@phoenix.apache.org, user@phoenix.apache.org
        Calcite: dev@calcite.apache.org
      Project info:
        https://phoenix.apache.org/server.html
        https://calcite.apache.org/avatica/
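
The "How do I make a client?" slide boils down to "POST serialized messages to the QueryServer over HTTP". A rough sketch of that idea in Python follows. It is not from the talk and rests on several assumptions: a QueryServer reachable at http://localhost:8765/ (hypothetical host, assumed port), JSON serialization enabled on the server, and message and field names (openConnection, createStatement, prepareAndExecute, closeConnection, maxRowCount) taken from one reading of the Avatica JSON reference, which may differ between Avatica versions. JSON is used only because it is readable; the slides recommend Protocol Buffers for anything real.

```python
import json
import urllib.request
import uuid

# Assumption: host and port of a running QueryServer; adjust for your cluster.
QUERY_SERVER_URL = "http://localhost:8765/"


def open_connection_request(connection_id, info=None):
    # The client picks the connection id; "info" carries JDBC-style properties.
    return {"request": "openConnection",
            "connectionId": connection_id,
            "info": info or {}}


def create_statement_request(connection_id):
    return {"request": "createStatement", "connectionId": connection_id}


def prepare_and_execute_request(connection_id, statement_id, sql,
                                max_row_count=100):
    # Assumption: "maxRowCount" is the row-limit field in your Avatica version.
    return {"request": "prepareAndExecute",
            "connectionId": connection_id,
            "statementId": statement_id,
            "sql": sql,
            "maxRowCount": max_row_count}


def close_connection_request(connection_id):
    return {"request": "closeConnection", "connectionId": connection_id}


def post(message, url=QUERY_SERVER_URL):
    # Every call is an independent HTTP POST; all state needed by the
    # server (connection id, statement id) travels inside the message.
    body = json.dumps(message).encode("utf-8")
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_query(sql, url=QUERY_SERVER_URL):
    # Open a connection, create a statement, execute, then always close.
    connection_id = str(uuid.uuid4())
    post(open_connection_request(connection_id), url)
    try:
        statement = post(create_statement_request(connection_id), url)
        return post(prepare_and_execute_request(
            connection_id, statement["statementId"], sql), url)
    finally:
        post(close_connection_request(connection_id), url)
```

Against a live server, something like `run_query("SELECT * FROM my_table")` (table name hypothetical) would return the decoded response message. Because each request carries its own connection and statement ids, the same calls could in principle go through an HTTP load balancer, as the slide on statelessness suggests; whether that needs sticky routing depends on the deployment.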
