In search of database nirvana - The challenges of delivering Hybrid Transaction/Analytical Processing

Companies are looking for a single database engine that can address all their varied needs: transactional to analytical workloads; structured, semi-structured, and unstructured data; and graph, document, text search, column, key-value, wide-column, and relational data stores, all on a single platform without the latency of data transformation and replication. They are looking for the ultimate database nirvana.

The term hybrid transactional/analytical processing (HTAP), coined by Gartner, perhaps comes closest to describing this concept. 451 Research uses the terms convergence or converged data platform. The terms multi-model or unified are also used. But can such a nirvana be achieved? Some database vendors claim to have already achieved this nirvana. In this talk we will discuss the following challenges on the path to this nirvana, for you to assess how accurate these claims are:
· What is needed for a single query engine to support all workloads?
· What does it take for that single query engine to support multiple storage engines, each serving a different need?
· Can a single query engine support all data models?
· Can it provide enterprise-caliber capabilities?

Attendees looking to assess query and storage engines would benefit from understanding what the key considerations are when picking an engine to run their targeted workloads. Also, developers working on such engines can better understand capabilities they need to provide in order to run workloads that span the HTAP spectrum.

Published in: Data & Analytics
In search of database nirvana - The challenges of delivering Hybrid Transaction/Analytical Processing

  1. In search of database nirvana
     The challenges of delivering Hybrid Transaction/Analytical Processing
     Rohit Jain, CTO – 2016
     rohit.jain@esgyn.com
     (C) Copyright 2015 Esgyn Corporation Esgyn Confidential
  2. Agenda
     • The swinging database pendulum
     • Hybrid Transaction/Analytical Processing (HTAP) Workloads
     • Query versus storage engines
     • The challenges of HTAP
       ◦ Single query engine for all workloads
       ◦ Supporting multiple storage engines
       ◦ Same data model for all workloads
       ◦ Enterprise-caliber capabilities
     • Conclusion
  3. The swinging database pendulum
     [Diagram labels: RDBMS, NoSQL]
     RDBMS challenges with Big Data
     • High TCO
     • Lack of elastic scalability
     • Did not meet performance requirements
     • No support for semi-structured & unstructured data
     • Inability to parallelize user code
     • No schema flexibility
     • Too complex for simple needs
     Enter NoSQL – polyglot programming & persistence
     • Key value stores
     • Wide column stores (Big Table)
     • Document stores
     • Text search
     • Graph database
     • Column stores
  4. The swinging database pendulum
     [Diagram labels: NoSQL, SQL]
     But enterprises wanted SQL
     • Skills prevalent
     • Existing tools & applications
     • Transaction support often useful
     • More efficient when joins needed
     • Easier than coding MapReduce
     • Merit in rigor of pre-defining columns
     • Uniform metadata across applications
     But still …
     • Too many languages, interfaces, & data structures
     • Too much of gluing technologies together
     • Compatibility between different versions
     • No end-to-end view of workload performance
     • Support contracts with multiple vendors
     • Too many skills required to develop and manage
     • Too much data movement
     • No single solution for varied interfaces & use cases
  5. Hybrid Transaction/Analytical Processing (HTAP) Workloads
     [Spectrum labels: "Essential to operate the business"; "To improve performance of the company"]
     OLTP
     • Mostly transactional
     • Sub-second response
     • Customer experience
     • Large update volume
     • Online updates
     • No historical data
     • High concurrency
     • Scales linearly
     • Normalized data model
     • Custom applications or third-party solutions
     • Keyed updates/queries
     • Mostly SMP; MPP for web-scale
     ODS
     • Can be transactional
     • Sub-second to seconds
     • Customer experience or Business internal
     • Low update volume
     • Batch to streaming feeds from OLTP
     • Some historical data
     • Low concurrency if internal, high otherwise
     • Near linear scale
     • Normalized data model
     • Custom apps/3rd party
     • Keyed queries
     • Mostly MPP
     BI
     • Non-transactional
     • Seconds to minutes
     • Business internal
     • No direct updates
     • Batch to streaming feeds from OLTP/ODS
     • Historical data
     • Low to high concurrency
     • Less linear in scale
     • Dimension data model
     • BI, OLAP, ROLAP tools – reporting and dashboards
     • Ad hoc and scheduled queries and large extracts
     • Mostly MPP
     Analytics
     • Non-transactional
     • Minutes to hours
     • Business internal
     • No direct updates
     • Batch/aggregates from BI
     • Historical and big data
     • Low concurrency
     • Complex queries, nonlinear scale
     • Columnar store
     • Analytical tools
     • Ad hoc queries; Analytics in database
     • Mostly MPP
  6. Query versus storage engines
     [Diagram labels: Hadoop Cluster; Switch; Switch; Operational; Business Intelligence; Analytics; In-memory; Single Query Engine]
     Query Engine
     • Allow clients to connect & submit queries
     • Distribute connections across cluster
     • Compile query
     • Execute query
     • Return results of query to client
     Storage Engine
     • Storage structure
     • Partitioning
     • Automatic data repartitioning
     • Select columns
     • Select rows based on predicates
     • Caching writes and reads
     • Clustering by key
     • Fast access paths or filtering
     • Transactional support
     • Replication
     • Compression & encryption
     • Mixed workload support
     • Bulk data ingest/extract
     • Indexing
     • Colocation or node locality
     • Data governance
     • Security
     • Disaster recovery
     • Backup, archive, restore
     • Multi-temperature data support
  7. The challenges of HTAP: Single query engine for all workloads
     • Data structure – key support, clustering, partitioning
     • Statistics
     • Predicates on non-leading or non-key columns
     • Indexes and materialized views
     • Degree of parallelism
     • Reducing the search space
     • Join type
     • Data flow and access
     • Mixed workload
     • Feature support
  8. The challenges of HTAP: Single query engine for all workloads
     Data structure – key support, clustering, partitioning
     [Diagram: Table A and Table B; partitioned versus non-partitioned; salting / partitioning (hash, range, …) with a salt key; clustered by primary key; multi-column clustering key (A, B, C)]
  9. The challenges of HTAP: Single query engine for all workloads
     Statistics
     Equal-height histograms
     • Unique Entry Count
     • Lowest and highest values
     • Multiple key / join column cardinalities
     • Sampling for fast stats updates
     • Incremental update stats
     • Skew – equal height histograms
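The equal-height histogram on this slide can be sketched in a few lines. This is a minimal illustration, not the statistics code of any particular engine: the function name `equal_height_histogram` and the tuple layout are assumptions made for the example. Each bucket records the lowest and highest value, the row count, and the unique entry count, and bucket boundaries are chosen so that buckets hold roughly the same number of rows.

```python
def equal_height_histogram(values, num_buckets):
    """Return buckets as (low, high, row_count, unique_count) tuples.

    Boundaries are placed so each bucket holds roughly the same number of
    rows. Equal values are never split across two buckets, so a heavily
    repeated (skewed) value shows up as one over-full bucket.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or num_buckets <= 0:
        return []
    buckets = []
    start = 0
    for b in range(1, num_buckets + 1):
        end = (n * b) // num_buckets
        # Extend the bucket so that duplicates of the boundary value stay together.
        while start < end < n and ordered[end] == ordered[end - 1]:
            end += 1
        if end <= start:
            continue  # this bucket was swallowed by a skewed value
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk), len(set(chunk))))
        start = end
    return buckets


uniform = equal_height_histogram(range(1, 9), 4)
skewed = equal_height_histogram([1] * 6 + [2, 3, 4, 5, 6, 7], 3)
```

In the skewed input, the value 1 fills a whole bucket whose unique entry count is 1, which is the kind of signal an optimizer could use to detect skew.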
  10. The challenges of HTAP: Single query engine for all workloads
      [Figure: "Skew Buster"; labels: 80 minutes, 2 minutes]
  11. The challenges of HTAP: Single query engine for all workloads
      Predicates on non-leading or non-key columns

      Week        Item  Store  …
      01/07/2016  1     1      …
      01/07/2016  1     3      …
      01/07/2016  1     5      …
      01/07/2016  2     34     …
      01/07/2016  3     13     …
      01/07/2016  3     3      …
      01/07/2016  4     2      …
      01/07/2016  4     4      …
      01/14/2016  1     2      …
      01/14/2016  1     4      …
      01/14/2016  1     5      …
      01/14/2016  1     35     …
      01/14/2016  3     1      …
      01/14/2016  3     20     …

      Where is item = 1, Stores 2 through 5?
      • Use of various statistics to generate an efficient plan
      • Sequence of column access for column stores
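One way to answer the slide's question without a full scan is to probe once per distinct value of the leading key column, an approach often called a skip scan. The slide does not name the technique, so treat the following as an assumed illustration: the rows are the ones shown on the slide, and `skip_scan` is a hypothetical helper that works on a list sorted by (week, item, store).

```python
from bisect import bisect_left, bisect_right

# Rows from the slide, kept in clustering-key order (week, item, store).
ROWS = sorted([
    ("01/07/2016", 1, 1), ("01/07/2016", 1, 3), ("01/07/2016", 1, 5),
    ("01/07/2016", 2, 34), ("01/07/2016", 3, 13), ("01/07/2016", 3, 3),
    ("01/07/2016", 4, 2), ("01/07/2016", 4, 4),
    ("01/14/2016", 1, 2), ("01/14/2016", 1, 4), ("01/14/2016", 1, 5),
    ("01/14/2016", 1, 35), ("01/14/2016", 3, 1), ("01/14/2016", 3, 20),
])


def skip_scan(rows, item, store_lo, store_hi):
    """Find rows with the given item and store range, with no predicate on week.

    For every distinct week, do one keyed range probe on (week, item, store)
    and then jump straight to the first row of the next week.
    """
    matches = []
    pos = 0
    while pos < len(rows):
        week = rows[pos][0]
        lo = bisect_left(rows, (week, item, store_lo), pos)
        hi = bisect_right(rows, (week, item, store_hi), pos)
        matches.extend(rows[lo:hi])
        # (week, inf) sorts after every (week, item, store) row of this week.
        pos = bisect_right(rows, (week, float("inf")), pos)
    return matches


result = skip_scan(ROWS, item=1, store_lo=2, store_hi=5)
```

Whether this beats a full scan depends on the number of distinct leading-key values, which is one reason the slide ties the choice to statistics.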
  12. The challenges of HTAP: Single query engine for all workloads
      Indexes and materialized views
      Indexes
      • Kinds of indexes and how they are leveraged
      • Unique index
      • Transactional consistency with base table
      • Impact on updates
      • Updates during bulk loads
      Materialized Views
      • Synchronous and asynchronous maintenance
      • Overhead of maintenance
      • Automatic query rewrite
      • User defined materialized views
  13. The challenges of HTAP: Single query engine for all workloads
      Degree of parallelism
      Serial vs parallel plans
      [Diagram labels: Client Application; Master; Multi-fragment Master; ESP; Node 1, Node 2, Node n; Ethernet; HBase Region 1 through HBase Region 5; Filters; Coprocessors; HDFS]
  14. The challenges of HTAP: Single query engine for all workloads
      [Diagram labels: Qry1, Qry2, Qry3, Qry4, Qry5, Qry6, Qry7]
  15. The challenges of HTAP: Single query engine for all workloads
      Reducing the search space
      • Optimizer technology, e.g., Cascades used by Apache Trafodion and Microsoft SQL Server
      • Query plan caching for operational
      • Query plan cache management
      • Extensibility of optimizer to evolve with varied workloads
      • Recognizing query patterns, such as star joins
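Query plan caching for operational workloads can be illustrated with a small sketch. It assumes, as an illustration only, that plans are cached under the query text with literals replaced by placeholders and that the cache is managed with least-recently-used eviction; `PlanCache`, `normalize`, and the `compile_fn` callback are hypothetical names, and a real engine would normalize a parsed query rather than use a regular expression.

```python
import re
from collections import OrderedDict

# Quoted strings (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Collapse whitespace and replace literals with '?' to form the cache key."""
    return _LITERAL.sub("?", " ".join(sql.split()))


class PlanCache:
    """LRU cache of compiled plans keyed by normalized query text."""

    def __init__(self, capacity, compile_fn):
        self.capacity = capacity
        self.compile_fn = compile_fn  # stands in for the (expensive) optimizer
        self.plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql):
        key = normalize(sql)
        if key in self.plans:
            self.plans.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.plans[key]
        self.misses += 1
        plan = self.compile_fn(key)
        self.plans[key] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # evict the least recently used plan
        return plan


cache = PlanCache(capacity=2, compile_fn=lambda text: ("plan", text))
first = cache.get_plan("SELECT * FROM orders WHERE id = 42")
second = cache.get_plan("SELECT * FROM orders  WHERE id = 7")
```

The two keyed lookups differ only in their literal, so the second call reuses the plan compiled for the first and skips compilation.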
  16. The challenges of HTAP: Single query engine for all workloads
      Join type
      Adaptive and parallel joins
      • Nested join
      • Probe cache for nested join
      • Merge join
      • Matching partition join
      • Repartitioned hash join
      • Replication by broadcast hash join
      • Inner / outer child broadcast
      • Dimensional schema star join
      • Inner join
      • Left Join
      • Right Join
      • Full Outer Join
      • Self join
      Cost Premiums for nested joins or serial plans
  17. The challenges of HTAP: Single query engine for all workloads
      [Diagram labels: Compute Cost; Execution Environment; Physical Properties; Estimates; Confidence; Cardinality, Distribution, Correlation; Sensitivity To Estimates; Evaluate Risk; Risk Adjustment; Benefit; Risk]
      Risk Premiums
      • Nested join 20%
      • Merge join 10%
      • Serial plan 5%
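The risk premiums listed on this slide can be read as a surcharge on a plan's estimated cost. The slide does not say how the premiums are combined, so the sketch below assumes they are applied multiplicatively; `risk_adjusted_cost`, `choose_plan`, and the candidate plans are hypothetical.

```python
import math

# Premiums from the slide, expressed as fractions of the estimated cost.
RISK_PREMIUM = {"nested_join": 0.20, "merge_join": 0.10, "serial_plan": 0.05}


def risk_adjusted_cost(estimated_cost, traits):
    """Inflate the estimate by the premium of every risky trait in the plan.

    Assumption: premiums compound multiplicatively. The slide gives only the
    percentages, not the formula.
    """
    factor = 1.0
    for trait in traits:
        factor *= 1.0 + RISK_PREMIUM.get(trait, 0.0)
    return estimated_cost * factor


def choose_plan(candidates):
    """Pick the candidate with the lowest risk-adjusted cost."""
    return min(candidates, key=lambda c: risk_adjusted_cost(c["cost"], c["traits"]))


candidates = [
    {"name": "nested join", "cost": 100.0, "traits": ["nested_join"]},
    {"name": "hash join", "cost": 110.0, "traits": []},
]
best = choose_plan(candidates)
```

Here the nested join is cheaper on paper (100 versus 110), but its 20% premium lifts it to about 120, so the hash join is chosen. The premium trades a little estimated cost for protection against a bad cardinality estimate.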
  18. The challenges of HTAP: Single query engine for all workloads
      Data flow and access
      [Diagram labels: Scan, Scan, Join, Group by]
      • Data flow architecture
      • No materialization of intermediate results
      • Graceful overflow to disk for large memory operations
      • Efficiencies such as pre-fetch
      • Fast path for operational workloads
  19. The challenges of HTAP: Single query engine for all workloads
      Mixed workload
      • Priority / SLA-based execution
      • Allocation of resources by service level
      • Decrease priority with usage increase
      • Anti-starvation / switch between queries based on priority
      [Diagram labels: Query Low, Query Medium, Query High; Queue; Memstore; HBase; HBase Region 1, HBase Region 3, HBase Region 5; queue entries marked Low, Medium, High]
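Two of the bullets above, lowering priority as usage grows and avoiding starvation, can be shown with a toy scheduler. This is a sketch under stated assumptions, not any engine's workload manager: the `Query` class, the one-point penalty per time slice used, and the one-point bonus per slice waited are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    base_priority: int
    used: int = 0    # time slices consumed so far
    waited: int = 0  # slices spent waiting since the query last ran


def effective_priority(q, usage_penalty=1, aging_bonus=1):
    """Priority drops as a query uses resources and rises while it waits."""
    return q.base_priority - usage_penalty * q.used + aging_bonus * q.waited


def run_slices(queries, slices):
    """Give each time slice to the query with the highest effective priority."""
    order = []
    for _ in range(slices):
        current = max(queries, key=effective_priority)
        order.append(current.name)
        for q in queries:
            if q is current:
                q.used += 1
                q.waited = 0
            else:
                q.waited += 1
    return order


schedule = run_slices([Query("high", 3), Query("low", 1)], slices=6)
```

The high-priority query gets most of the slices, but the low-priority query still runs once its waiting bonus outweighs the other query's usage penalty.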
  20. The challenges of HTAP: Single query engine for all workloads
      Feature support
      Operational workloads
      • Referential integrity
      • Stored procedures
      • Triggers
      • Various levels of transactional isolation and consistency
      • …
      BI and Analytics workloads
      • Materialized views
      • Fast / bulk extract, transform, load (ETL)
      • OLAP, time series, statistical, data mining, and other functions
      • …
      Needed by both
      • Scalar and table mapping UDFs
      • Inner, outer, and full outer joins
      • Un-nesting of subqueries
      • Converting correlated subqueries to joins
      • Predicate push down
      • Sort avoidance strategies
      • Constant folding
      • Recursive union
      • …
  21. The challenges of HTAP: Supporting multiple storage engines
      • Statistics
      • Key structure
      • Partitioning
      • Data type support
      • Projection and selection
      • Extensibility
      • Security enforcement
      • Transaction management
      • Metadata support
      • Performance, scale, and concurrency considerations
      • Error handling
      • Other operational aspects
  22. The challenges of HTAP: Supporting multiple storage engines
      Statistics
      • Storage engine statistics, used by query engine
      • Sampling
      • Access to changed data for incremental updates
      • Update counters to determine refresh schedule
      [Diagram label: Refresh]
  23. The challenges of HTAP: Supporting multiple storage engines
      Key structure
      Random single row and range access for operational workloads
      [Diagram: the query engine's multi-column key (A, B, C) corresponds to a single clustering key (A+B+C) in the storage engine; sample key values; "A=2 range access"]
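Mapping a multi-column key onto a storage engine's single clustering key can be sketched as follows. The encoding here is an assumption chosen for the example (each column packed as an unsigned 32-bit big-endian integer), as are the helper names and sample rows. What matters is that the byte order of the encoded key matches the column order, so a predicate such as A = 2 becomes a prefix range scan.

```python
import struct
from bisect import bisect_left


def encode_key(a, b, c):
    """Concatenate A, B, C as big-endian unsigned 32-bit integers.

    Big-endian fixed-width encoding makes byte-wise comparison agree with
    numeric comparison on (a, b, c).
    """
    return struct.pack(">III", a, b, c)


def range_on_a(sorted_keys, a):
    """Return the decoded rows whose leading column equals `a`.

    Scans from the 4-byte prefix for `a` up to the prefix for `a + 1`
    (assumes a < 2**32 - 1).
    """
    lo = bisect_left(sorted_keys, struct.pack(">I", a))
    hi = bisect_left(sorted_keys, struct.pack(">I", a + 1))
    return [struct.unpack(">III", k) for k in sorted_keys[lo:hi]]


# Hypothetical sample rows, stored in clustering-key order.
rows = [(1, 3, 5), (2, 2, 4), (2, 2, 9), (2, 3, 1), (3, 1, 7), (2, 10, 0), (300, 0, 0)]
stored = sorted(encode_key(*row) for row in rows)
a_equals_2 = range_on_a(stored, 2)
```

A text or little-endian encoding would not preserve order (for example, 10 would sort before 2 as text), which is why the choice of key encoding matters to the query engine.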
  24. The challenges of HTAP: Supporting multiple storage engines
      Partitioning
      • Data partitioning across disks and nodes
      • Hash, range, or combination
      • Salting support
      • Query engine imposed salting
      • Repartitioning as the cluster expands/contracts
      • Read/write access while being rebalanced
      • Localize data access to avoid shuffling

      CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;

      [Diagram: INSERT(s) and SELECT(s) spread across PART 1, PART 2, PART 3, PART 4, each an HBase Region on HDFS]
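The effect of `SALT USING 4 PARTITIONS` can be illustrated with a short sketch. The hash function below (CRC32 of the key's text form) is an assumption picked because it is deterministic in Python; it is not claimed to be what any storage or query engine uses, and `salt_of`, `salted_key`, and `partition_counts` are hypothetical helpers.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # matches SALT USING 4 PARTITIONS in the statement above


def salt_of(a):
    """Map a primary key value to a partition number in [0, NUM_PARTITIONS)."""
    return zlib.crc32(str(a).encode("ascii")) % NUM_PARTITIONS


def salted_key(a):
    """Prefix the key with its salt so consecutive keys land in different partitions."""
    return (salt_of(a), a)


def partition_counts(keys):
    """Count how many keys fall into each partition."""
    return Counter(salt_of(k) for k in keys)


counts = partition_counts(range(1000))
```

Without the salt, monotonically increasing keys would all be inserted into the last region. With it, inserts and scans are spread across the partitions, at the cost of a range scan on `a` having to visit every partition.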
  25. The challenges of HTAP: Supporting multiple storage engines
      Data type support
      • Data types supported
      • Query to storage engine data type mapping
      • Value constraint enforcement

      CHARACTER(n): Character string. Fixed-length n
      VARCHAR(n) or CHARACTER VARYING(n): Character string. Variable length. Maximum length n
      BINARY(n): Binary string. Fixed-length n
      BOOLEAN: Stores TRUE or FALSE values
      VARBINARY(n) or BINARY VARYING(n): Binary string. Variable length. Maximum length n
      INTEGER(p): Integer numerical (no decimal). Precision p
      SMALLINT: Integer numerical (no decimal). Precision 5
      INTEGER: Integer numerical (no decimal). Precision 10
      BIGINT: Integer numerical (no decimal). Precision 19
      DECIMAL(p,s): Exact numerical, precision p, scale s. Example: decimal(5,2) is a number that has 3 digits before the decimal and 2 digits after the decimal
      NUMERIC(p,s): Exact numerical, precision p, scale s. (Same as DECIMAL)
      FLOAT(p): Approximate numerical, mantissa precision p. A floating number in base 10 exponential notation. The size argument for this type consists of a single number specifying the minimum precision
      REAL: Approximate numerical, mantissa precision 7
      FLOAT: Approximate numerical, mantissa precision 16
      DOUBLE PRECISION: Approximate numerical, mantissa precision 16
      DATE: Stores year, month, and day values
      TIME: Stores hour, minute, and second values
      TIMESTAMP: Stores year, month, day, hour, minute, and second values
      INTERVAL: Composed of a number of integer fields, representing a period of time, depending on the type of interval
      ARRAY: A set-length and ordered collection of elements
  26. The challenges of HTAP: Supporting multiple storage engines
      Data type support
      • Data types supported
      • Query to storage engine data type mapping
      • Value constraint enforcement
      • Referential constraints
      • Character sets
      • Collations
      • Compression
      • Encryption
  27. The challenges of HTAP: Supporting multiple storage engines
      Projection and selection
      • Projection at storage or query engine level
      • Predicates evaluated by query and storage engines
      • Predicates applied to compressed data
      • Multi-column predicates
      • IN lists; size of IN lists
      • Multiple predicates with ORs and ANDs (pushdown)
      • Evaluate predicates in sequence of filtering effectiveness
      • Predicates comparing different columns of same table
      • Complex expression evaluation
      • Evaluation of functions
      • Default or missing values on retrieval
      [Diagram: projection of selected columns from a table with columns C1 through C6]
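The bullet "Evaluate predicates in sequence of filtering effectiveness" can be made concrete with a small sketch. It assumes each predicate carries an estimated selectivity (the fraction of rows expected to pass), which is how the ordering is derived here; the `filter_rows` helper, the sample table, and the selectivity figures are hypothetical.

```python
def order_by_selectivity(predicates):
    """Sort (name, selectivity, fn) predicates so the most selective runs first."""
    return sorted(predicates, key=lambda p: p[1])


def filter_rows(rows, predicates):
    """Apply ANDed predicates with short-circuiting; count predicate evaluations."""
    ordered = order_by_selectivity(predicates)
    evaluations = 0
    kept = []
    for row in rows:
        for _name, _selectivity, fn in ordered:
            evaluations += 1
            if not fn(row):
                break  # row rejected; skip the remaining predicates
        else:
            kept.append(row)
    return kept, evaluations


# Hypothetical table: 10 stores x 10 items = 100 rows.
table = [{"store": s, "item": i} for s in range(10) for i in range(10)]
predicates = [
    ("store >= 2", 0.8, lambda r: r["store"] >= 2),  # passes about 80% of rows
    ("item = 1", 0.1, lambda r: r["item"] == 1),     # passes about 10% of rows
]
kept, evaluations = filter_rows(table, predicates)
```

Running `item = 1` first costs 110 predicate evaluations for this table (100 plus 10 for the survivors); the opposite order would cost 180. The result set is the same either way.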
  28. The challenges of HTAP: Supporting multiple storage engines
      Extensibility
      Server side extensibility, e.g., HBase coprocessors or Cassandra triggers, to push down:
      • Complex predicate evaluation with expressions and functions
      • Pre-aggregation
      • Collocated joins or index maintenance
      • Transactional support
      • Security enforcement
      • Some ANSI Trigger actions
  29. The challenges of HTAP: Supporting multiple storage engines
      Security enforcement
      • Mapping of security frameworks for the query and storage engines to enforce ANSI SQL security
      • Integration with underlying Hadoop Kerberos security
      • Integration with security solutions, like Sentry or Ranger
      • Integration with security logging and SIEM solutions
  30. The challenges of HTAP: Supporting multiple storage engines
      Transaction management
      • Replication for high availability, backup and restore, and multi-data center support from query & storage engines
      • ACID or BASE transactional support
      • Integration between the query and storage engines, such as write ahead logs, and use of coprocessors
      • Completely scalable and distributed transaction management architecture
      • Multi datacenter support – active-active single or multiple master replication
      • Overhead of transactions on throughput and system resources
      • Online backup and point in time recovery
  31. The challenges of HTAP: Supporting multiple storage engines
      Transaction management
      [Diagram labels: Single-Master; Multiple-Masters]
  32. The challenges of HTAP: Supporting multiple storage engines
      Transaction management
      Point-in-time recovery
      [Timeline labels: Time; Full transactionally consistent snapshot; Snapshots after non-transactional changes such as bulk loads; Transactional changes captured continuously]
  33. The challenges of HTAP: Supporting multiple storage engines
      Transaction management
      Point-in-time recovery
      [Timeline labels: Time; Drop table or erroneous large transactional update; Restore previous full snapshot; Initiate recovery to point-in-time]
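The recovery sequence on these two slides, restoring the previous full snapshot and then replaying captured transactional changes up to the chosen point in time, can be sketched as below. The data layout is an assumption for illustration: snapshots are (timestamp, state) pairs, log entries are (timestamp, key, value) with `None` marking a delete, and `recover` is a hypothetical function name.

```python
import copy


def recover(snapshots, change_log, target_ts):
    """Rebuild the state as of target_ts.

    1. Restore the latest full snapshot taken at or before target_ts.
    2. Replay logged changes with snapshot_ts < ts <= target_ts, in time order.
    """
    candidates = [s for s in snapshots if s[0] <= target_ts]
    if not candidates:
        raise ValueError("no snapshot at or before the requested time")
    snap_ts, snap_state = max(candidates, key=lambda s: s[0])
    state = copy.deepcopy(snap_state)
    for ts, key, value in sorted(change_log, key=lambda e: e[0]):
        if snap_ts < ts <= target_ts:
            if value is None:
                state.pop(key, None)  # logged delete
            else:
                state[key] = value    # logged insert or update
    return state


snapshots = [(10, {"a": 1})]
change_log = [(11, "b", 2), (12, "a", 5), (13, "a", None)]  # ts 13: erroneous delete
before_mistake = recover(snapshots, change_log, target_ts=12)
```

Recovering to timestamp 12 yields the state just before the erroneous change at timestamp 13, which is the scenario the second timeline describes.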
  34. The challenges of HTAP: Supporting multiple storage engines
      Metadata support
      • Mapping storage to query engine metadata
      • Handling storage engine specific options
      • Support provided for external tables
      • Changes to external tables outside of the query engine
      • Operational vs. analytics objects
  35. The challenges of HTAP: Supporting multiple storage engines
      Performance, scale, and concurrency considerations
      • Transactional consistency across bulk loads
      • Rowset inserts and selects
      • Fast scanning options – snapshot scans, prefetching
      • Integration for parallel operations
      • Concurrency and mixed workload capability
      • Elastic scale for Cloud deployments
      [Diagram captions: As nodes are added, the query engine immediately uses them for queries and transactions; the storage engine rebalances data automatically]
  36. The challenges of HTAP: Supporting multiple storage engines
      Error handling
      • Storage and query engine error logging
      • Mapping of storage engine errors to meaningful error messages and resolution options by the query engine
  37. The challenges of HTAP: Supporting multiple storage engines
      Other operational aspects
      • Minimize operational and performance impact of storage engine operational aspects, e.g., compaction or splitting
  38. The challenges of HTAP: Same data model for all workloads …
      Normal form
      • 1NF
      • 2NF
      • 3NF
      • BCNF
      • 4NF
      • 5NF
      • 6NF
      Star Schema
      Snowflake Schema
      Query engine integration with storage engine(s) to support all these data models
  40. The challenges of HTAP: Same data model for all workloads
      … and these!
      NoSQL Data Models
      “NoSQL Data Modeling Techniques” by Ilya Katsov, Highly Scalable Blog
  41. The challenges of HTAP: Enterprise-caliber capabilities
      • High Availability
      • Security
      • Manageability
  42. The challenges of HTAP: Enterprise-caliber capabilities
      High Availability
      • Percentage of uptime: from 99.99% (52.56 minutes of downtime per year) to 99.999% (5.26 minutes of downtime per year)
      • Online operations (data available for reads and writes)
        o Upgrading the OS
        o Upgrading the file system
        o Upgrading the storage engine
        o Upgrading the query engine
        o Redistributing data to accommodate node and/or disk expansions and contractions
        o Changing table definitions, e.g., data type changes, and adding, dropping, or renaming columns
        o Creating/dropping secondary indexes
        o Full and incremental backups
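The downtime figures on this slide follow from simple arithmetic on a 365-day year (525,600 minutes). A minimal sketch; the function name is illustrative and not from the deck:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


for pct in (99.99, 99.999):
    print(f"{pct}% uptime -> {downtime_minutes_per_year(pct):.2f} minutes of downtime per year")
```

Each additional "nine" divides the downtime budget by ten, which is why the online operations listed on the slide matter: at 99.999%, a single offline upgrade can use up the whole year's allowance.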
  44. The challenges of HTAP: Enterprise-caliber capabilities
      Manageability
      Schema Management Performance Management Monitoring Security Management BAR Management Object Management Performance Monitoring Database Monitor User Management Backup Analysis Graphical Object Editor Live Performance Monitoring Event Monitoring Role Management Recovery Cross-Platform Schema Knowledge Data Repository Live Event Monitoring Account Migration Log Backup Bottleneck Analysis Threshold Alerts Audit Report Backup Reports SQL Management Job/Workload Analysis Health Index Alarm Archival Query Builder Job/Workload Wizard Live Health Monitoring Visual Difference Tool Job/Workload Management Response Times Maintenance Configuration Management Data Management Live Job/Workload Monitoring Alert Center Repository Aging OS Provisioning Data Migration OS Analysis Remote Monitoring Automated Maintenance Cluster Provisioning SQL Profiler Capacity Capture Central Monitoring Instance Provisioning Automated Import Capacity Trending Hardware Inventory Change Management Cloud Provisioning Visual Explain Plans Capacity Forecast Hardware Monitoring Schema Capture Configuration Editor Session Management Space Management Schema Compare and Synch Lock Management Reorganization Management Troubleshooting Notifications Process Management Query Cost Simulation Health Analysis Schema Rotation Consistency Checks Historical Reports Problem Correlation Collaboration Online Schema Evolution Bottleneck Tuning Automated Actions Virtual Changes Built-In Automation Access Path Analysis
  45. The challenges of HTAP: Enterprise-caliber capabilities
      Manageability
      • Operational performance by transactions per second
      • Analytical performance by query
      • Overhead of gathering metrics on operational and analytical workloads
      • Configurable statistics collection
      • Workload management by Service Level Objectives
        o Based on priority and/or resource allocation
        o High priority operational workloads vs. analytical workloads
      • End-to-end visibility of transaction and query metrics
      • Metric breakdown down to the query operation
      • Metrics for table access across workloads down to the partition level
      • Skew or bottlenecks
      • Integration with YARN
  46. Conclusion
      • Detailed O’Reilly report: http://www.oreilly.com/data/free/in-search-of-database-nirvana.csp
      • It ain’t easy!!
      • Very few products can even come close
