Quark Virtualization Engine for Analytics

Published in: Technology

  1. Quark Virtualization Engine for Analytics. Rajat Venkatesh, Qubole, @vrajat
  2. Agenda: Motivation, Use Cases, Architecture, Roadmap
  3. Data @Qubole (diagram labels): api.qubole.com, Monitoring & Alerts, Business Analysts, Workload Analysis, Customer Clusters, Amazon RDS, Amazon S3
  4. Multi-Store Architecture (diagram labels): Embedded Thin JDBC JAR, Quark Server, Quark Catalog, Laptop or Server, Amazon Redshift
  5. Narrow Tables
     TPCDS dataset, ~3 billion rows, ORC, Presto 0.119.
     Q3 referenced 3 attributes from store_sales.
     [Chart: Q3, String (512 Bytes); axis values not recoverable]
  6. Narrow Tables
     Table      No. of Queries  Total Columns  Columns Used
     Tickets    25000+          265            74
     Customers  10000+          53             43
     Support    6000+           33             10
  7. TPCDS q3.sql:
       select dt.d_year, item.i_brand_id brand_id, item.i_brand brand,
              sum(ss_ext_sales_price) sum_agg
       from store_sales, item, date_dim dt
       where dt.d_date_sk = store_sales.ss_sold_date_sk
         and store_sales.ss_item_sk = item.i_item_sk
         and item.i_manufact_id = 436
         and dt.d_moy = 12
         -- partition key filters
         and (ss_sold_date_sk between 2451149 and 2451179
           or ss_sold_date_sk between 2451514 and 2451544
           or ss_sold_date_sk between 2451880 and 2451910
           or ss_sold_date_sk between 2452245 and 2452275
           or ss_sold_date_sk between 2452610 and 2452640)
       group by dt.d_year, item.i_brand, item.i_brand_id
       order by dt.d_year, sum_agg desc, brand_id
       limit 100;
     Narrow table:
       create table narrow_store_sales_3m as
       select ss_sold_date_sk, ss_item_sk, ss_ext_sales_price
       from store_sales
       where ss_sold_date_sk >= (julian_day(now() - 3 months));
  8. Materialized View in Quark:
       create view store_sales_view as
       select ss_sold_date_sk, ss_item_sk, ss_ext_sales_price
       from store_sales
       where ss_sold_date_sk >= (julian_day(now() - 3 months));
     stored in narrow_store_sales_3m
  9. Sorted Tables
     • Sort on non-partitioned columns.
     • E.g., in TPCDS, store_sales is partitioned by ss_sold_date_sk and sorted by ss_item_sk.
     [Chart: % Speedup for q27, q3, q42, q52, q55, q7, q89, q98; series: Base Tables, Denormalized]
  10. Materialized View in Quark:
        create view store_sales_sorted as
        select * from store_sales
        where ss_sold_date_sk >= (julian_day(now() - 3 months))
        order by ss_sold_date_sk, ss_item_sk;
      stored in sorted_store_sales_3m
  11. Denormalized Tables
      • Join and store the store_sales and items tables in TPCDS.
      • Only star schema joins supported.
      • FK-PK joins only.
      [Chart: % Speedup for q19, q3, q42, q43, q46, q52, q53, q55, q59, q63, q68, q7, q73, q79, q89, q98; series: Unsorted, Sorted]
  12. Materialized View in Quark:
        create view store_sales_items_view as
        select * from store_sales join items on ss_item_sk = i_item_sk
        where ss_sold_date_sk >= (julian_day(now() - 3 months))
        order by ss_sold_date_sk, ss_item_sk;
      stored in sorted_store_sales_items_3m
  13. OLAP Cubes
      • Cubes are stored in a table.
      • Cube on partial data, e.g. 3 months.
      • Incremental cubes.
        create cube store_sales_cube as
        select sum( … ), …
        from store_sales join items on ss_item_sk = i_item_sk join …
        where ss_sold_date_sk >= (julian_day(now() - 3 months))
        group by i_item_sk, dd_year, …
      stored in sorted_store_sales_cube_3m
  14. Bring your own Storage & SQL Engine
      • Quark supports multiple technologies.
      • Views or cubes can span databases.
        – Store your cube in Redshift, HBase, or Elasticsearch.
      • Redirect your lookup queries to Apache HBase.
  15. Predicate Injection
      Table store_sales partitioned by year, month
        select ....
        from date_dim dt, store_sales, item
        where ....
          -- partition key filters
          and (ss_sold_date_sk between 2451149 and 2451179
            or ss_sold_date_sk between 2451514 and 2451544
            or ss_sold_date_sk between 2451880 and 2451910
            or ss_sold_date_sk between 2452245 and 2452275
            or ss_sold_date_sk between 2452610 and 2452640)
          ....
          -- Inject predicate
          year between 1998 and 2002 and month in (11, 12)
  16. Apache Kylin and Apache Lens Comparison
      ● Quark supports many optimized storage structures
        ○ Materialized Views
        ○ Predicate Injections
      ● Quark encourages a mix of storage and SQL Engines (Apache Kylin)
      ● ANSI SQL (Apache Lens)
      ● DDL Statements
      ● No UI/API or Web Services. JDBC Server/Client only.
  17. Architecture (diagram labels): JDBC Client, Quark Server, Catalog, Hive, DWH, K-V Store, Catalog, Optimizer, Execution Engine, MV and Cube Definitions, Avatica + Protobuf API, Get Catalog, Execute Queries
  18. Contributions to Apache Calcite
      Materialized Views
        [CALCITE-749] Add MaterializationService.TableFactory
        [CALCITE-786] Detect if materialized view can be used to rewrite a query in non-trivial cases
        [CALCITE-787] Star table wrongly assigned to materialized view
        [CALCITE-925] Match materialized views when predicates contain strings and ranges
      OLAP Cubes
        [CALCITE-758] Use more than one lattice in the same query
      Cost Based Optimizer
        [CALCITE-1003] Utility to convert RelNode to SQL
        [CALCITE-1010] FETCH/LIMIT and OFFSET in RelToSqlConverter
        [CALCITE-1109] Fix up condition when pushing Filter through Aggregate
        [CALCITE-1130] Add support for operators IS_NULL and IS_NOT_NULL in RexImplicationChecker
        [CALCITE-1216] Rule to convert Filter-on-Scan to materialized view
  19. Quark as a Service
      1. Register DBs as DbTaps
      2. Submit QuarkCommand
      Account Info including DbTaps
  20. Roadmap
      • Optimizer
        – Materialized Views and Joins.
        – Statistics: choose among MVs or SQL engines.
      • Multi-Store
        – SQL Dialects
        – JIT Function definitions
        – Query Life Cycle & Management
      • ETL
        – Integrate with workflow engines like Apache Oozie or Airflow.
  21. Co-ordinates
      Github: https://github.com/qubole/quark/
      Mailing List: quark-dev@googlegroups.com
      Subscribe: quark-dev+subscribe@googlegroups.com
      Unsubscribe: quark-dev+unsubscribe@googlegroups.com
      Gitter: https://gitter.im/qubole/quark
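
The narrow-table materialized view on slides 7 and 8 can be illustrated with a small sketch. It is not Quark's optimizer (which builds on Apache Calcite); it only shows the substitution idea: a query can be redirected to the narrow backing table when that table holds every column and every row the query needs. The `NarrowView` class, the `choose_table` helper, the sample rows, and the cutoff value 150 (standing in for "three months ago") are all hypothetical, and the sketch assumes the view keeps the three store_sales columns that Q3 reads.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class NarrowView:
    base_table: str         # table the view is defined over
    backing_table: str      # table that stores the view's rows
    columns: frozenset      # columns kept in the backing table
    min_sold_date_sk: int   # view only holds rows at or after this day


# Hypothetical catalog entry; 150 stands in for "three months ago".
VIEWS = [
    NarrowView(
        "store_sales",
        "narrow_store_sales_3m",
        frozenset({"ss_sold_date_sk", "ss_item_sk", "ss_ext_sales_price"}),
        150,
    )
]


def choose_table(views, base_table, needed_columns, query_min_date_sk):
    """Pick a backing table only if it covers the query's columns and rows."""
    for view in views:
        if (
            view.base_table == base_table
            and set(needed_columns) <= view.columns
            and query_min_date_sk >= view.min_sold_date_sk
        ):
            return view.backing_table
    return base_table


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table store_sales (ss_sold_date_sk integer, ss_item_sk integer,"
        " ss_ext_sales_price real, ss_quantity integer)"
    )
    conn.executemany(
        "insert into store_sales values (?, ?, ?, ?)",
        [(100, 1, 10.0, 2), (200, 1, 5.0, 1), (250, 2, 7.5, 3)],
    )
    # Materialize the narrow view: fewer columns, recent rows only.
    conn.execute(
        "create table narrow_store_sales_3m as"
        " select ss_sold_date_sk, ss_item_sk, ss_ext_sales_price"
        " from store_sales where ss_sold_date_sk >= 150"
    )
    needed = ["ss_sold_date_sk", "ss_item_sk", "ss_ext_sales_price"]
    table = choose_table(VIEWS, "store_sales", needed, 200)
    rows = conn.execute(
        f"select ss_item_sk, sum(ss_ext_sales_price) from {table}"
        " where ss_sold_date_sk >= ? group by ss_item_sk order by ss_item_sk",
        (200,),
    ).fetchall()
    conn.close()
    return table, rows
```

A query that needs a column outside the view (for example `ss_quantity`) or rows older than the cutoff falls back to `store_sales`, which is the behaviour a transparent rewrite has to preserve.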
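
The cube on slide 13 stores pre-aggregated rows in a table so that coarser questions can be answered without scanning the base data. The following is a minimal sketch of that idea only, not Quark's `create cube` implementation: the table layout, the `store_sales_cube` name reused from the slide, the `d_year` column placed directly on the fact table, and the sample rows are all simplifying assumptions.

```python
import sqlite3


def build(conn):
    """Create a tiny fact table and a cube grouped by (item, year)."""
    conn.executescript(
        """
        create table store_sales (
            ss_item_sk integer, d_year integer, ss_ext_sales_price real);
        insert into store_sales values
            (1, 2001, 10.0), (1, 2001, 5.0), (1, 2002, 2.0), (2, 2002, 4.0);
        create table store_sales_cube as
            select ss_item_sk, d_year,
                   sum(ss_ext_sales_price) as sum_price,
                   count(*) as row_count
            from store_sales
            group by ss_item_sk, d_year;
        """
    )


def sales_by_year_from_base(conn):
    return conn.execute(
        "select d_year, sum(ss_ext_sales_price) from store_sales"
        " group by d_year order by d_year"
    ).fetchall()


def sales_by_year_from_cube(conn):
    # Roll up the stored partial sums instead of re-reading the fact table.
    return conn.execute(
        "select d_year, sum(sum_price) from store_sales_cube"
        " group by d_year order by d_year"
    ).fetchall()
```

Sums and counts roll up this way because they are additive; measures such as averages would need to be derived from the stored sum and count rather than stored directly.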
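
Slide 15's predicate injection derives a filter on the partition columns (year, month) from filters on ss_sold_date_sk. A rough sketch of one way to compute such a predicate follows. It assumes ss_sold_date_sk is a Julian day number, as the `julian_day(...)` expressions on the earlier slides suggest; the helper names and the offset constant are illustrative and not part of Quark.

```python
from datetime import date

# Assumption: date keys are Julian day numbers. Julian day 2451545 is
# 2000-01-01, whose proleptic Gregorian ordinal in Python is 730120.
JDN_MINUS_ORDINAL = 2451545 - 730120

# The five ranges from the partition key filters on the slide.
SLIDE_RANGES = [
    (2451149, 2451179),
    (2451514, 2451544),
    (2451880, 2451910),
    (2452245, 2452275),
    (2452610, 2452640),
]


def jdn_to_date(jdn):
    """Convert a Julian day number to a calendar date."""
    return date.fromordinal(jdn - JDN_MINUS_ORDINAL)


def partition_predicate(ranges):
    """Build a conservative year/month predicate covering all date ranges."""
    years, months = set(), set()
    for low, high in ranges:
        for jdn in range(low, high + 1):
            day = jdn_to_date(jdn)
            years.add(day.year)
            months.add(day.month)
    month_list = ", ".join(str(m) for m in sorted(months))
    return (
        f"year between {min(years)} and {max(years)}"
        f" and month in ({month_list})"
    )
```

The injected predicate only has to be a superset of the original filter, since the original ss_sold_date_sk conditions stay in the query. Under the Julian-day assumption, the slide's ranges all fall in December of 1998 through 2002, so the wider `month in (11, 12)` shown on the slide is also safe.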
