SQL on Accumulo


7:30 SQL-on-Accumulo - Don Miner, ClearEdge IT
Running SQL queries over data in Accumulo is easier said than done and has several nuanced design challenges that don't have clear answers. This talk will give an outline of the current state of the art in SQL-on-Accumulo technologies, while giving a realistic view on what is doable and what is not doable today.

Published in: Technology

  1. SQL-on-Accumulo: A talk about how we may be able to do it one day
     DONALD MINER, May 5th, 2014
  2. A brief history of Hadoop and SQL [timeline figure; axis labels: 1980, 2000]
  3. A brief history of Hadoop and SQL [timeline figure; axis labels: 1980, 2000]
  4. [figure; labels: BIG DATA, SQL]
  5. A brief history of Hadoop and SQL [timeline figure; labels: 1980, 2000, SQL-on-Hadoop]
  6. SQL-on-Accumulo would be nice
     Problem: Accumulo is just a data store
       - We’ll have to run the queries somewhere else
  7. WWHBD? (What Would HBase Do?)
  8. WWHBD? - Hive
     • Hive
       - Runs in MapReduce
       - Maps column families and column qualifiers to columns
       - Maintained by the Hive community
     • Impala and Shark inherit functionality from Hive
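The column mapping mentioned on slide 8 can be pictured as a pivot from sorted key-value cells into relational rows. The following is a minimal Python sketch of that idea only; `COLUMN_MAPPING`, `cells_to_rows`, and the sample cells are hypothetical names and data made up for illustration, not part of Hive, HBase, or Accumulo.

```python
from collections import defaultdict

# Hypothetical mapping from (column family, column qualifier) to a SQL column name.
COLUMN_MAPPING = {
    ("info", "name"): "name",
    ("info", "age"): "age",
    ("contact", "email"): "email",
}

def cells_to_rows(cells, mapping, rowid_column="rowid"):
    """Pivot (row, family, qualifier, value) cells into one dict per row ID."""
    rows = defaultdict(dict)
    for row, family, qualifier, value in cells:
        column = mapping.get((family, qualifier))
        if column is None:
            continue  # cells with no mapping are invisible to the SQL table
        rows[row][column] = value
    # Emit rows in row-ID order; a mapped column with no cell becomes None (SQL NULL).
    return [
        {rowid_column: row, **{c: rows[row].get(c) for c in mapping.values()}}
        for row in sorted(rows)
    ]

# Illustrative data only.
cells = [
    ("u001", "info", "name", "alice"),
    ("u001", "info", "age", "34"),
    ("u002", "info", "name", "bob"),
    ("u002", "contact", "email", "bob@example.com"),
    ("u002", "debug", "trace", "ignored"),
]
table = cells_to_rows(cells, COLUMN_MAPPING)
```

Sparse rows fall out naturally here: a row that lacks a mapped cell simply gets a NULL in that column, which is one reason a fixed column mapping is workable over a schemaless store.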
  9. WWHBD? - next level
     Problem: Hive, Impala, and Shark don’t know how HBase works … and don’t care
     • Apache Phoenix
       - Specifically SQL-on-HBase
       - Currently an Apache Incubator project
       - Client-embedded JDBC driver
       - Uses a series of scans and coprocessors
     • Pivotal’s HAWQ and PXF
       - PXF is the external table functionality in HAWQ
       - Native support for HAWQ: uses push-down filters, range scans, etc. to efficiently slurp data into HAWQ
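To make the value of range scans and push-down filters from slide 9 concrete, here is a small Python sketch that counts how many entries a query touches with and without them. `SortedStore` is a toy in-memory stand-in invented for this example; it is not the HBase, Accumulo, Phoenix, or PXF API, and the "server-side" filter is only simulated.

```python
import bisect

class SortedStore:
    """Toy stand-in for a sorted key-value store (hypothetical, not a real API)."""

    def __init__(self, items):
        self._items = sorted(items)               # (key, value) pairs sorted by key
        self._keys = [k for k, _ in self._items]
        self.entries_read = 0                     # entries touched by scans so far

    def full_scan(self):
        """Read every entry; any filtering happens later in the client."""
        for item in self._items:
            self.entries_read += 1
            yield item

    def range_scan(self, start, stop, server_filter=None):
        """Read only keys in [start, stop) and filter before returning data."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key, value in self._items[lo:hi]:
            self.entries_read += 1
            if server_filter is None or server_filter(value):
                yield key, value

store = SortedStore((f"row{i:04d}", i) for i in range(1000))

# Query: WHERE key >= 'row0100' AND key < 'row0110' AND value % 2 = 0

# Engine that does not understand the store: scan everything, filter afterwards.
naive = [(k, v) for k, v in store.full_scan()
         if "row0100" <= k < "row0110" and v % 2 == 0]
naive_reads = store.entries_read

# Engine that understands the store: key predicate becomes a range, rest is pushed down.
store.entries_read = 0
pushed = list(store.range_scan("row0100", "row0110",
                               server_filter=lambda v: v % 2 == 0))
pushed_reads = store.entries_read
```

Both plans return the same rows, but the naive plan touches all 1000 entries while the range scan touches only the 10 in the key range. That gap is the motivation for engines that are aware of how the underlying store is laid out.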
  10. ACCUMULO-143
      people. technology. integrity.
  11. SQL-on-Accumulo Status
      Hive (and somewhat Impala and Shark)
      • GitHub project by Brian Femiano [1]
        - Doesn’t work on new versions
        - Hasn’t been touched in 9 months
        - Wasn’t committed into trunk
      • Some rumors that some orgs have done it themselves (but no public information)
      [1] https://github.com/bfemiano/accumulo-hive-storage-manager (google for “accumulo hive”)
  12. SQL-on-Accumulo Status
      Phoenix
      • Discussion on the mailing list last week
      • Some differences between iterators and coprocessors make this interesting
      Pivotal’s HAWQ and PXF
      • In development
      • Will support visibility labels
      • Pushdown and optimizations with iterators
  13. Visibility Design Problems
      These problems are unique to Accumulo
      • SELECT and visibility labels
        - Assume two cells that differ only in their visibility… Which do I pick in a SELECT?
        - Timestamps have the same problem, but come with a logical assumption (most recent)
      • Authorizations in SQL
        - How do you tell the execution engine which authorizations to use?
        - Table definition? (hard to change)
        - SQL statement? (extend the SQL language?)
        - Based on login? (how do you downgrade?)
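The SELECT ambiguity on slide 13 can be reproduced in a few lines. This Python sketch is a deliberate simplification: visibility is modelled as a plain set of labels that must all be held (AND only), whereas real Accumulo visibility expressions also allow OR and grouping. `Cell`, `visible_cells`, `select_column`, and the sample data are hypothetical names made up for the example.

```python
from typing import FrozenSet, NamedTuple

class Cell(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: FrozenSet[str]  # simplification: every listed label is required
    timestamp: int
    value: str

def visible_cells(cells, authorizations):
    """Return the cells whose required labels are all held by the reader."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.visibility <= auths]

def select_column(cells, authorizations, row, family, qualifier):
    """Return every candidate value for one SQL column of one row."""
    return [
        c.value
        for c in visible_cells(cells, authorizations)
        if (c.row, c.family, c.qualifier) == (row, family, qualifier)
    ]

# Two cells that are identical except for visibility (same timestamp too).
cells = [
    Cell("u001", "info", "location", frozenset({"public"}), 100, "Maryland"),
    Cell("u001", "info", "location", frozenset({"secret"}), 100, "123 Main St"),
]

low = select_column(cells, {"public"}, "u001", "info", "location")
high = select_column(cells, {"public", "secret"}, "u001", "info", "location")
```

A reader holding only `public` gets one value, so the column is well defined. A reader holding both labels gets two values for what SQL treats as a single column, and unlike the timestamp case there is no obvious rule such as "most recent" for choosing between them. The sketch also shows why the engine needs the authorizations passed in from somewhere, which is the second open question on the slide.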
  14. What are the next steps?
      I guess that’s up to the community
  15. A warning about SQL-on-Accumulo
      QUIZ: What is this definition trying to say?
      Big Data:
      • Volume
      • Variety
      • Velocity
      • Veracity
  16. A warning about SQL-on-Accumulo
      QUIZ: What is this definition trying to say?
      Big Data:
      • Volume
      • Variety
      • Velocity
      • Veracity
      Answer: RDBMS/SQL suck at all these things
  17. A warning about SQL-on-Accumulo
      QUIZ: What is this definition trying to say?
      Big Data:
      • Volume
      • Variety
      • Velocity
      • Veracity
      Answer: RDBMS/SQL suck at all these things
      What does SQL-on-Accumulo still suck at?
      *Added context for my internet viewers, since this could be controversial if taken literally and I’m not talking to my slides: I’m trying to say that SQL-on-X can’t solve all of the world’s problems, but it can solve a good number of them very well. It also tees up the conversation that SQL is not the end-all-be-all… there are ways that it could be made better to adapt to “the big data use case”. Don’t take this the wrong way: SQL-on-Hadoop and SQL-on-Accumulo would be incredibly useful, but they don’t solve 100% of the problems.
  18. SQL-on-Accumulo
      Questions?
      DONALD MINER, dminer@clearedgeit.com, @donaldpminer