
Big Data Query Landscape: N1QL and More: Couchbase Connect 2015

This session surveys and compares several major big data query processing systems. You will see the landscape of the area, from analytical to operational data processing. You’ll learn how N1QL (SQL for JSON), the Couchbase query language, aligns with SQL++, a SQL extension that unifies online and offline query processing for big data without losing SQL backward compatibility.

Published in: Technology

  1. BIG DATA QUERY LANDSCAPE – N1QL AND MORE
     Yingyi Bu | Couchbase
  2. About Myself (©2015 Couchbase Inc.)
     • Sr. Software Engineer @ Couchbase
     • Committer @ AsterixDB (research project under Apache incubation)
     • PhD Student @ UC Irvine
     • N1QL
     • SQL++
     Contact: yingyi@couchbase.com, @buyingyi
  3. Agenda
     • Introduction
     • Operational Query Processing
     • Analytical Query Processing
     • Comparison and Unification
     • Summary
  4. Introduction
  5. Introduction [diagram labels: Research Projects, NoSQL, SQL-on-Hadoop, SQL++, Unification]
  6. Introduction: SQL++
     • Language Unification Research
         • SQL Backward Compatible
         • Rich Data Model
         • Configurable Semantics
     • System Unification Research
         • A Single Language Interface
         • Scale-out for Both Workloads
         • Resource Scheduling Underneath
  7. Operational Query Processing
  8. Operational Query Processing

         import java.net.URI;
         import java.util.ArrayList;
         import com.couchbase.client.CouchbaseClient;

         ArrayList<URI> nodes = new ArrayList<URI>();
         // Add one or more nodes of your cluster
         nodes.add(URI.create("http://127.0.0.1:8091/pools"));
         // Try to connect to the cluster
         CouchbaseClient client = null;
         try {
             client = new CouchbaseClient(nodes, "default", "");
         } catch (Exception e) {
             System.err.println("Error connecting to Couchbase: " + e.getMessage());
             System.exit(1);
         }
         // Put the key-value pair into Couchbase.
         client.set("hello", "couchbase!").get();
         // Return the result and cast it to string
         String result = (String) client.get("hello");
         System.out.println(result);

     Put / Get
     • JSON
     • Filtering
     • Flatten
     • Group-by
     • Aggregation
     • Join
     • Ordering
  9. N1QL – SQL for NoSQL
     • Nested Data
     • Heterogeneous Data
     • Dynamic typing

     Sample data:
         [
           { "beer-sample": { "brewery_id": "bro", "abv": {"m1": 1, "m2": 2},
                              "category": "North American Lager", "type": "beer" } },
           { "beer-sample": { "abv": 9.5, "brewery_id": "brouwerij" } }
         ]

     Query:
         SELECT category, type, abv.m1 FROM `beer-sample` WHERE type = "beer"

     Result:
         [ { "category": "North American Lager", "type": "beer", "m1": 1 } ]

     • Standard SELECT pipeline
     • Joins, subqueries, set operators
     • UNNEST and NEST
  10. Cassandra
      • SQL-like query language

      Feature       N1QL   Cassandra
      Lookup        ✔      ✔
      Filtering     ✔      ✔
      Ordering      ✔      ✔
      Aggregation   ✔      ✖
      Join          ✔      ✖
      Subqueries    ✔      ✖
      Unnest        ✔      ✖
      Schema-free   ✔      ✖

          SELECT firstname, lastname FROM users
          WHERE birth_year = 1981 AND country = 'FR'
          ALLOW FILTERING;

          SELECT * FROM posts
          WHERE userid = 'john doe'
            AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01');
  11. MongoDB
      • JavaScript-like language

      Feature       N1QL   MongoDB
      Lookup        ✔      ✔
      Filtering     ✔      ✔
      Ordering      ✔      ✔
      Aggregation   ✔      ✔
      Join          ✔      ✖
      Subqueries    ✔      ✖
      Unnest        ✔      ✔
      Schema-free   ✔      ✔

          db.sales.aggregate([
            { $group: {
                _id: { month: { $month: "$date" },
                       day: { $dayOfMonth: "$date" },
                       year: { $year: "$date" } },
                totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
                averageQuantity: { $avg: "$quantity" },
                count: { $sum: 1 }
            } }
          ])

          db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)
  12. Analytical Query Processing
  13. Hive
      • More data types than SQL
      • Hadoop or Tez as runtime

          INSERT OVERWRITE TABLE school_summary
          SELECT subq1.school, COUNT(1)
          FROM (SELECT a.status, b.school, b.gender
                FROM status_updates a JOIN profiles b
                ON (a.userid = b.userid AND a.ds = '2009-03-20')) subq1
          GROUP BY subq1.school

      [plan diagram: stages M1, R1, M2, R2; operators Scan (a), Scan (b), Filter, Project, ReduceSink, Join, Group-by, FileSink]
  14. Impala
      • ANSI SQL-92
      • HDFS/HBase as the storage
      • Native MPP execution engine

          INSERT OVERWRITE TABLE school_summary
          SELECT subq1.school, COUNT(1)
          FROM (SELECT a.status, b.school, b.gender
                FROM status_updates a JOIN profiles b
                ON (a.userid = b.userid AND a.ds = '2009-03-20')) subq1
          GROUP BY subq1.school

      [plan diagram: operators HDFS Scan (a), HDFS Scan (b), Filter, Project, Hash Join, Pre-Agg, Merge-Agg, HDFS Write]
  15. Spark SQL

      DataFrames:
          import org.apache.spark.sql.hive.HiveContext

          // sc is assumed to be an existing SparkContext
          val ctx = new HiveContext(sc)
          val users = ctx.table("users")
          val young = users.where(users("age") < 21)
          println(young.count())

      SQL:
          SELECT count(*) FROM users WHERE age < 21

      [diagram labels: SQL, DataFrames, Unresolved Logical Plan, Logical Plan, Physical Plans, Selected Physical Plan, RDDs, Cost Model, Catalog]
  16. Drill
      • ANSI SQL-92
      • Nested Data
      • Schema Inference

      Centralized schema
          • Static
          • Managed by DBAs
      Self-describing or schema-less
          • Dynamically evolving
          • Managed by applications
          • Embedded in data

      Formats: CSV, JSON, Parquet, ORC
  17. Comparison and Unification
  18. Comparison and Unification
      • AsterixDB – System Unification Research
      • Query language?
      • Language Comparisons
      • SQL++ – Language Unification Research
      • N1QL and SQL++
      [diagram labels: SQL++, Unification, Research Projects]
  19. AsterixDB (Apache incubator)
      • NoSQL data model with schema flexibility
      • Declarative full-fledged query language (AQL)
      • Partitioned native LSM-based storage
      • Secondary indexes (B-Tree, R-Tree, and keyword index)
      • Single-row transactions
      • Spatial/temporal data types
      • External data (HDFS) access and indexing
      • Native MPP query execution engine
      [diagram labels: Operational, Analytical]
  20. Query Language?

          SELECT subq1.school, COUNT(1)
          FROM (SELECT a.status, a.date, b.school, b.region
                FROM status_updates a JOIN profiles b
                ON (a.userid = b.userid AND a.date = '2009-03-20')) subq1
          GROUP BY subq1.school

      • Relational
      • JSON
      • Nested tuples/collections
      • Partial/missing schema
      • Heterogeneity
      • Complex values

      What if:
      • COUNT(1) is replaced with “(select * from subq1 order by date limit 3)”
      • “school” is not in the schema of the “profiles” table
      • “school” is missing in some profiles
      • “school” is a nested tuple
  21. Language Comparison: Data Model

      System      Top-level Values  Heterogeneity  Arrays  Bags  Maps  Nested Tuples  Primitive Values
      Hive        Bags/Tuples       ✖              ✔       ✖     P     ✔              ✔
      Impala      Bags/Tuples       ✖              ✖       ✖     ✖     ✖              ✔
      Spark SQL   Bags/Tuples       ✖              ✔       ✖     ✔     ✔              ✔
      Drill       Bags/Tuples       ✖              ✔       ✖     ✔     ✔              ✔
      N1QL        Bags/Tuples       ✔              ✔       ✖     ✖     ✔              ✔
      Cassandra   Bags/Tuples       ✖              P       ✖     P     ✖              ✔
      MongoDB     Bags/Tuples       ✔              ✔       ✖     ✖     ✔              ✔
      AsterixDB   Any Values        ✔              ✖       ✔     ✖     ✔              ✔
  22. Language Comparison: Types

      System      Dynamic Type Check  Static Type Check  Any Type  Open Type  Union Type  Optional
      Hive        ✖                   ✔                  ✖         ✖          ✖           ✖
      Impala      ✖                   ✔                  ✖         ✖          ✖           ✖
      Spark SQL   ✖                   ✔                  ✖         ✖          ✖           ✖
      Drill       ✖                   ✔                  ✖         ✖          ✖           ✖
      N1QL        ✔                   ✖                  –         –
      Cassandra   ✖                   ✔                  ✖         ✖          ✖           ✖
      MongoDB     ✔                   ✖                  –         –
      AsterixDB   ✔                   ✔                  ✔         ✔          ✖           ✔
  23. Language Comparison: Path Navigation

      System      Tuple Nav.  Tuple Nav.  Array Nav.  Array Nav.  Map Nav.  Map Nav.
                  absent      mismatch    absent      mismatch    absent    mismatch
      Hive        error       error       null        error       null      error
      Impala      error       error       --          --          --        --
      Spark SQL   error       error       error       error       null      error
      Drill       error       error       error       error       null      error
      N1QL        missing     missing     missing     missing     --        --
      Cassandra   error       error       --          --          --        --
      MongoDB     missing     missing     --          --          --        --
      AsterixDB   null        error       error       error       --        --

      [callout: "No Errors!"]
  24. Language Comparison: SELECT Clause

      System      Project Tuples with     Project Tuples with   Project
                  Non-scalar Subqueries   Nested Collections    Non-Tuples
      Hive        ✖                       ✔                     ✖
      Impala      ✖                       ✖                     ✖
      Spark SQL   ✖                       ✔                     ✖
      Drill       ✖                       ✔                     ✖
      N1QL        ✔                       ✔                     ✔
      Cassandra   ✖                       ✖                     ✖
      MongoDB     ✖                       ✔                     ✔
      AsterixDB   ✔                       ✔                     ✔
  25. Language Comparison: FROM Clause

      System      Subquery  Joins  Inner Unnest  Outer Unnest  Ordinal Positions
      Hive        ✔         ✔      ✔             ✔             ✔
      Impala      ✔         ✔      ✖             ✖             ✖
      Spark SQL   ✔         ✔      ✖             ✖             ✖
      Drill       ✔         ✔      ✔             ✖             ✖
      N1QL        ✔         ✔      ✔             ✔             ✖
      Cassandra   ✖         ✖      ✖             ✖             ✖
      MongoDB     ✖         ✖      ✔             ✖             ✖
      AsterixDB   ✔         ✔      ✔             ✖             ✔
  26. SQL++ (The “++” Part)
      • JSON data model
      • INNER/OUTER FLATTEN clause
      • Arbitrary subqueries in SELECT
      • Configurable parameters for semantics
          • Path navigations
          • Equality evaluations
          • Collection coercions
      [callouts: "Supported by N1QL!", "Made consistent in N1QL!"]
  27. SQL++ Configuration for N1QL

      Configuration  Parameter                 Value
      @path          tuple_nav.absent          missing
                     tuple_nav.type_mismatch   missing
                     array_nav.absent          missing
                     array_nav.type_mismatch   missing
                     map_nav.absent            missing
                     map_nav.type_mismatch     missing
      @eq            complex                   yes
                     type_mismatch             false
                     null_eq_null              null
                     null_eq_value             null
                     null_eq_missing           missing
                     missing_eq_missing        missing
                     missing_eq_value          missing
                     null_and_missing          missing
                     null_and_true             null
                     null_and_null             null
                     missing_and_true          missing
                     missing_and_missing       missing
  28. Summary: N1QL in a Bigger Context
  29. Summary
      • Operational Query Processing
          • Rich Data Model
          • SQL is BACK, but with EXTENSIONS!
      • Analytical Query Processing
          • Rich Data Model is a MUST!
      • Unification
          • The trend!
  30. Thank you. Q & A
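
Slide 9 shows a N1QL query over two differently shaped documents and a one-row result. The Python sketch below imitates that evaluation to show why the second document drops out: it has no `type` field, so the filter never matches. This is not how Couchbase executes N1QL. The `MISSING` sentinel and the `nav` and `run_query` helpers are hypothetical names, and the documents are written without the `beer-sample` wrapper for brevity.

```python
# Sentinel for a field that is absent from a document (modeled on N1QL's MISSING),
# kept distinct from JSON null, which Python represents as None.
MISSING = object()

def nav(value, *path):
    """Follow field names into nested dicts; return MISSING instead of raising."""
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return MISSING
        value = value[key]
    return value

docs = [
    {"brewery_id": "bro", "abv": {"m1": 1, "m2": 2},
     "category": "North American Lager", "type": "beer"},
    {"abv": 9.5, "brewery_id": "brouwerij"},
]

def run_query(docs):
    # Imitates: SELECT category, type, abv.m1 FROM docs WHERE type = 'beer'
    out = []
    for d in docs:
        if nav(d, "type") != "beer":  # an absent type never matches the filter
            continue
        row = {"category": nav(d, "category"),
               "type": nav(d, "type"),
               "m1": nav(d, "abv", "m1")}
        # Fields that evaluate to MISSING are left out of the result object.
        out.append({k: v for k, v in row.items() if v is not MISSING})
    return out

result = run_query(docs)
```

Navigating `abv.m1` on the second document would also yield `MISSING` rather than an error, because there `abv` is a number and not an object.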
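
The Impala plan on slide 14 names a Hash Join followed by Pre-Agg and Merge-Agg operators. As a rough illustration of what those operator names usually mean, the sketch below joins and counts per partition, then merges the partial counts. It is not Impala code. The sample rows, the school names, and the function names are made up for the example, and it assumes `userid` is unique in `profiles`.

```python
from collections import Counter

def hash_join(status_updates, profiles, ds):
    """Filter status updates by date, then join them to profiles on userid."""
    by_user = {p["userid"]: p for p in profiles}  # build side (userid assumed unique)
    for s in status_updates:                       # probe side
        if s["ds"] == ds and s["userid"] in by_user:
            yield by_user[s["userid"]]["school"]

def pre_agg(schools):
    """Partial COUNT per school, computed locally on one partition."""
    return Counter(schools)

def merge_agg(partials):
    """Combine the partial counts from all partitions into the final counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)

# Hypothetical data: one small profiles table, status updates split in two partitions.
profiles = [
    {"userid": 1, "school": "UCI"},
    {"userid": 2, "school": "UCSD"},
    {"userid": 3, "school": "UCI"},
]
partitions = [
    [{"userid": 1, "ds": "2009-03-20"}, {"userid": 2, "ds": "2009-03-20"}],
    [{"userid": 3, "ds": "2009-03-20"}, {"userid": 1, "ds": "2009-03-21"}],
]

school_summary = merge_agg(
    pre_agg(hash_join(part, profiles, "2009-03-20")) for part in partitions
)
```

Counting locally before merging means only one small partial result per partition has to be combined, which is the usual motivation for splitting an aggregate into two phases.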
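
The path navigation table on slide 23 can be read as a per-system policy: what a field access returns when the field is absent, or when the value is not a tuple at all. A minimal sketch of that idea follows, using only the two "Tuple Nav." columns as transcribed. The names `TUPLE_NAV`, `tuple_nav`, and `NavigationError` are hypothetical, and none of this is taken from any of the systems' actual implementations.

```python
MISSING = object()  # stands in for the "missing" outcome; None stands in for null

class NavigationError(Exception):
    """Raised when a system's policy for a navigation failure is 'error'."""

# (absent, mismatch) policies copied from the Tuple Nav. columns of slide 23.
TUPLE_NAV = {
    "Hive":      ("error", "error"),
    "Impala":    ("error", "error"),
    "Spark SQL": ("error", "error"),
    "Drill":     ("error", "error"),
    "N1QL":      ("missing", "missing"),
    "Cassandra": ("error", "error"),
    "MongoDB":   ("missing", "missing"),
    "AsterixDB": ("null", "error"),
}

def _apply(policy, reason):
    if policy == "error":
        raise NavigationError(reason)
    return MISSING if policy == "missing" else None

def tuple_nav(system, value, field):
    """Evaluate value.field under the given system's policy."""
    absent, mismatch = TUPLE_NAV[system]
    if not isinstance(value, dict):
        return _apply(mismatch, f"{value!r} is not a tuple")
    if field not in value:
        return _apply(absent, f"field {field!r} is absent")
    return value[field]

doc = {"abv": 9.5, "brewery_id": "brouwerij"}
n1ql_absent = tuple_nav("N1QL", doc, "category")        # MISSING
asterix_absent = tuple_nav("AsterixDB", doc, "category")  # None (null)
n1ql_mismatch = tuple_nav("N1QL", doc["abv"], "m1")      # MISSING
```

Under the N1QL row no input makes `tuple_nav` raise, which is one way to read the "No Errors!" callout on the slide.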
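
Slide 25 distinguishes inner from outer unnest. The difference is easiest to see on a parent row whose nested collection is empty or absent: inner unnest drops the parent, outer unnest keeps it. The sketch below is a simplified illustration with a hypothetical `unnest` helper and made-up rows. It does not reproduce the exact output shape of any system in the table.

```python
def unnest(rows, field, outer=False):
    """Flatten rows[field] (a list) into one output row per element.

    With outer=True, a row whose list is empty or absent is kept once,
    without the field; with outer=False such a row is dropped.
    """
    out = []
    for row in rows:
        items = row.get(field)
        rest = {k: v for k, v in row.items() if k != field}
        if isinstance(items, list) and items:
            for item in items:
                out.append({**rest, field: item})
        elif outer:
            out.append(rest)
    return out

rows = [
    {"name": "a", "tags": ["x", "y"]},
    {"name": "b", "tags": []},
    {"name": "c"},
]

inner = unnest(rows, "tags")
outer = unnest(rows, "tags", outer=True)
```

Here `inner` contains only the two rows derived from `a`, while `outer` additionally keeps `b` and `c`.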
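
The `@eq` rows on slide 27 describe how equality and AND behave when an operand is null or missing. The sketch below encodes those rows as transcribed from the slide, not as a statement of what any Couchbase release does. `MISSING`, `json_type`, `eq`, and `and_` are hypothetical names. The slide lists no rows for `false`, so the `False` short-circuit in `and_` is an assumption borrowed from ordinary three-valued logic.

```python
class _Missing:
    def __repr__(self):
        return "MISSING"

MISSING = _Missing()  # absent value; None plays the role of JSON null

def json_type(v):
    """Coarse JSON type name, so that 1 and 1.0 count as the same type."""
    if isinstance(v, bool):
        return "boolean"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    if isinstance(v, list):
        return "array"
    return "object"

def eq(a, b):
    # null_eq_missing, missing_eq_missing, missing_eq_value -> missing
    if a is MISSING or b is MISSING:
        return MISSING
    # null_eq_null, null_eq_value -> null
    if a is None or b is None:
        return None
    # type_mismatch -> false
    if json_type(a) != json_type(b):
        return False
    # complex = yes: arrays and objects are compared structurally
    return a == b

def and_(a, b):
    """AND over True, False, None (null) and MISSING."""
    if a is False or b is False:
        return False  # assumption: not listed on the slide
    # null_and_missing, missing_and_true, missing_and_missing -> missing
    if a is MISSING or b is MISSING:
        return MISSING
    # null_and_true, null_and_null -> null
    if a is None or b is None:
        return None
    return True
```

For example, `eq(None, MISSING)` evaluates to `MISSING` and `and_(None, True)` evaluates to `None`, matching the `null_eq_missing` and `null_and_true` rows.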