HEKATON
SQL Server’s Memory-Optimized OLTP Engine
Presented by: Prutha Date and Siraj Memon
Outline
● Introduction
● Design Considerations
● High-Level Architecture
● Storage and Indexing
● Programmability and Query Processing
● Transaction Management and Logging
● Garbage Collection
● Experimental Results
● Conclusion
● Demo
Introduction
● Database Engine: Optimized for Memory-resident data
● Targeted for OLTP workloads
● Integrated into SQL Server and uses T-SQL
● Fully transactional and durable
● Tables - Compiled into machine code
● Two Index Types: Hash Index and Range Index
● High level of concurrency
OLTP (Online Transaction Processing)
T-SQL (Transact-SQL)
Terminology
● Hekaton Table
● Hekaton Index
● Regular Table
● Regular Index
● Compiled Stored Procedure
● Interpreted Stored Procedure
Competitors
● Commercial
● VoltDB
● SAP in-memory computing
● Oracle TimesTen
● IBM SolidDB
● Research
● Hyrise
● H-store
● HyPer
Architectural Principles
● Optimize Indexes for main memory
● Uses lock-free hash tables and Bw-trees for optimized indexing
● Index operations not logged
● Rebuilding indexes during recovery
● Eliminate Latches and Locks
● Latch-free data structure – No latches or spinlocks
● Optimistic Multi-version concurrency control – transaction isolation
● Compile requests to native code
● Decisions: Compile time rather than Runtime
● Converts statements in T-SQL into customized, highly-efficient machine
code
Partitioning – We don’t like..
● Problem with Partitioning
● Secondary Indexes
● Works great ONLY if workload is also partitionable
● Not sufficiently robust for SQL Server
● Any thread can access any part of the database
● Single Shared hash table
High Level Architecture
● Hekaton Storage Engine
● Manages user data and indexes
● Base mechanism for storage, checkpointing and high availability
● Hekaton Compiler
● Abstract tree representation of T-SQL stored procedure
● Compiles the procedure into native code
● Hekaton Runtime System
● Integration with SQL Server resources
● Common library of additional functionality
Hekaton and SQL Server
Storage and Indexing
● Two types of Index
● Hash Index: Lock-free hash tables
● Range Index: Bw-trees
● Use of Multiversioning – Updates create new version
● Reads:
● Read operation specifies a logical read time and only versions whose valid
time overlaps the read time are visible to the read
● At most one version is visible
● Updates:
● Delete Old - Insert New
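The read and update rules above can be illustrated with a minimal Python sketch. It is not Hekaton's implementation: the `Version` class, the list-based store, and the half-open treatment of the valid-time interval are assumptions made for illustration, and timestamps are assumed to be unique.

```python
from dataclasses import dataclass
from typing import List, Optional

INFINITY = float("inf")  # end timestamp of a version that is still current


@dataclass
class Version:
    """One version of a record; valid from `begin` up to `end`."""
    key: str
    payload: str
    begin: float
    end: float = INFINITY


def is_visible(v: Version, read_time: float) -> bool:
    # A version is visible when the read time falls inside its valid time.
    # Treating the interval as half-open is an assumption of this sketch.
    return v.begin <= read_time < v.end


def read(versions: List[Version], key: str, read_time: float) -> Optional[Version]:
    # Valid times of one record's versions do not overlap, so at most one matches.
    for v in versions:
        if v.key == key and is_visible(v, read_time):
            return v
    return None


def update(versions: List[Version], key: str, payload: str, commit_time: float) -> None:
    # An update "deletes" the old version by closing its valid time,
    # then inserts a new version that begins at the same timestamp.
    old = read(versions, key, commit_time)
    if old is not None:
        old.end = commit_time
    versions.append(Version(key, payload, begin=commit_time))


store: List[Version] = []
update(store, "acct-1", "balance=100", commit_time=10)
update(store, "acct-1", "balance=150", commit_time=20)
```

A read at time 15 returns the first version and a read at time 25 returns the second; the old version is never overwritten in place, which is what lets readers proceed without locks.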
Storage and Indexing (continued)
Architecture of Hekaton Compiler
Programmability and Query Processing
● Compile-once Execute-many-times
● High level of language compatibility
● Reuse of SQL Server T-SQL compilation stack
● Output of Hekaton compiler is C code
● Invoking the compiler:
● During creation of a memory optimized table
● During creation of a compiled stored procedure
Schema Compilation
● Hekaton storage engine treats records as opaque objects
● Hekaton compiler provides the engine with customized callback
functions for each table
● Task of Callback functions
● Computing a hash function on a key or record
● Comparing two records
● Serializing a record into a log buffer
● Callback functions are compiled into Native code which makes index
operations extremely efficient
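The division of labor described above can be sketched in Python. Everything here is hypothetical: the `TableCallbacks` structure, the two-integer record layout, and the `HashIndex` class are illustrative stand-ins, and the lambdas play the role of the native code that the Hekaton compiler would generate per table.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TableCallbacks:
    """Per-table functions; stands in for code the compiler would generate."""
    hash_key: Callable[[bytes], int]
    compare: Callable[[bytes, bytes], int]
    serialize: Callable[[bytes], bytes]


# Hypothetical table layout: (id INT, qty INT) packed as two little-endian int32s.
def _id_of(record: bytes) -> int:
    return struct.unpack_from("<i", record, 0)[0]


customer_callbacks = TableCallbacks(
    hash_key=lambda rec: hash(_id_of(rec)),
    compare=lambda a, b: (_id_of(a) > _id_of(b)) - (_id_of(a) < _id_of(b)),
    serialize=lambda rec: struct.pack("<H", len(rec)) + rec,  # length-prefixed
)


class HashIndex:
    """Engine-side index that never inspects record bytes itself."""

    def __init__(self, callbacks: TableCallbacks, buckets: int = 8) -> None:
        self.cb = callbacks
        self.buckets: List[List[bytes]] = [[] for _ in range(buckets)]

    def insert(self, record: bytes) -> None:
        slot = self.cb.hash_key(record) % len(self.buckets)
        self.buckets[slot].append(record)

    def lookup(self, probe: bytes) -> List[bytes]:
        slot = self.cb.hash_key(probe) % len(self.buckets)
        return [r for r in self.buckets[slot] if self.cb.compare(r, probe) == 0]


index = HashIndex(customer_callbacks)
index.insert(struct.pack("<ii", 7, 3))
index.insert(struct.pack("<ii", 15, 9))  # shares a bucket with id 7
matches = index.lookup(struct.pack("<ii", 7, 0))
```

The index code only moves opaque byte strings around; all knowledge of the schema lives in the callbacks, so the same engine code serves every table.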
Compiled Stored Procedure
● Compatibility issues between T-SQL and C datatypes
● Problem Solver:
● MAT (Mixed Abstract Tree)
● PIT (Pure Imperative Tree)
● Each operator implements a common interface so that they can be
composed into arbitrarily complex plans
● Entire query plan is collapsed into a single function using labels and gotos
● Supports both blocking and non-blocking operators
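The contrast between composable operators and a single collapsed function can be sketched in Python. This is only an analogy: the `Scan` and `Filter` classes, the row shape, and `fused_plan` are hypothetical, and because Python has no goto, a single loop stands in for the label-and-goto control flow of the generated C code.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, int]  # hypothetical row shape: (id, name, qty)


class Scan:
    """Operator that produces rows from a table."""

    def __init__(self, table: Iterable[Row]) -> None:
        self.table = table

    def rows(self) -> Iterator[Row]:
        yield from self.table


class Filter:
    """Operator that passes through rows matching a predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def rows(self) -> Iterator[Row]:
        for row in self.child.rows():
            if self.predicate(row):
                yield row


def fused_plan(table: Iterable[Row], min_qty: int) -> List[Row]:
    # What a compiler could emit for Filter(Scan(table)): one function with
    # no calls between operators for each row.
    out: List[Row] = []
    for row in table:
        if row[2] >= min_qty:
            out.append(row)
    return out


orders = [(1, "bolt", 5), (2, "nut", 50), (3, "gear", 12)]
composed = list(Filter(Scan(orders), lambda r: r[2] >= 10).rows())
fused = fused_plan(orders, 10)
```

Both forms return the same rows. The common interface keeps plans composable at compile time, while the collapsed form avoids paying for that interface at run time.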
Example
Fig.1: Sample T-SQL Procedure
Fig.2: Query Plan
Fig.3: Operator interconnections for Sample Procedure
Query Interop
● Restrictions of Compiled Stored Procedures
● Supports limited set of options
● Stored procedures must execute in a predefined security context
● Must execute in the context of a single transaction
● Query interop: a mechanism that enables the conventional query execution
engine to access memory-optimized tables
● Features
● Import and Export for memory optimized tables
● Ad-hoc queries and data repair support
● Support for transactions that access both kinds of tables
● Ease of app migration
Transaction Management
● Hekaton utilizes optimistic multiversion concurrency control (MVCC)
to provide snapshot, repeatable read and serializable transaction
isolation without locking
● Serializable – guarantees that a transaction would see exactly the same
data if all its reads were repeated at the end of the transaction
● Properties to ensure serializability:
● Read stability
● Phantom avoidance
● Timestamps are used to specify
● Valid Time
● Logical Read Time
● Commit/End time
● Version visible if Begin Time < Read Time < End Time
Transaction Commit Processing
● Validation and Dependencies
● Obtain End timestamp
● Validate for Read Stability and Phantom Avoidance
● Commit Dependency
● Dependency counter
● Read barrier
● Commit Logging and Post-Processing
● Changes to database are logged to transaction log
● Update versions with end timestamp of transactions
● Transaction Rollback
● Invalidate all versions created by the transaction using Write Sets.
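The validation step can be sketched in Python under simplifying assumptions. The `Version` and `Transaction` classes, the read set and scan set layout, and the list standing in for a table are hypothetical, and the sketch ignores the transaction's own writes, commit dependencies, and logging.

```python
from dataclasses import dataclass, field
from typing import Callable, List

INFINITY = float("inf")


@dataclass
class Version:
    key: int
    begin: float
    end: float = INFINITY

    def visible_at(self, ts: float) -> bool:
        return self.begin <= ts < self.end


@dataclass
class Transaction:
    begin_ts: float
    read_set: List[Version] = field(default_factory=list)
    scan_set: List[Callable[[Version], bool]] = field(default_factory=list)

    def scan(self, table: List[Version], predicate: Callable[[Version], bool]) -> List[Version]:
        # Remember both what was read and how it was found.
        result = [v for v in table if predicate(v) and v.visible_at(self.begin_ts)]
        self.read_set.extend(result)
        self.scan_set.append(predicate)
        return result


def validate(txn: Transaction, table: List[Version], end_ts: float) -> bool:
    # Read stability: every version read is still visible at the end timestamp.
    if not all(v.visible_at(end_ts) for v in txn.read_set):
        return False
    # Phantom avoidance: repeating each scan at the end timestamp finds nothing new.
    for predicate in txn.scan_set:
        for v in table:
            if predicate(v) and v.visible_at(end_ts) and not v.visible_at(txn.begin_ts):
                return False
    return True


table = [Version(key=1, begin=5), Version(key=2, begin=5)]
txn = Transaction(begin_ts=10)
txn.scan(table, lambda v: v.key < 100)
ok_before = validate(txn, table, end_ts=20)   # nothing changed since the scan

table.append(Version(key=3, begin=15))        # a concurrent insert commits at 15
phantom_ok = validate(txn, table, end_ts=20)  # fails phantom avoidance

table.pop()
table[0].end = 12                             # a concurrent delete commits at 12
stable_ok = validate(txn, table, end_ts=20)   # fails read stability
```

In this sketch a failed validation would lead to rollback, which is where the write set is used to invalidate the versions the transaction created.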
Transaction Durability
● Uses transaction logs and checkpoints to ensure durability
● Integrated with Always-On component that maintains highly available
replicas
● Data on external storage consists of –
● Log streams (logical effects of committed transactions, sufficient to redo them)
● Checkpoint streams (Compressed representation of the log)
● Data Stream (all inserted versions during a timestamp interval)
● Delta Stream (a dense list of integers identifying deleted versions for its
corresponding data stream)
● Note: Index operations are not logged; they are reconstructed on
recovery.
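How a data stream and its delta stream combine on recovery can be sketched in Python. The `StoredVersion` tuple, the integer version identifiers, and the dictionary standing in for an index are hypothetical, and replaying the tail of the log after the checkpoint is omitted.

```python
from typing import Dict, List, NamedTuple, Set


class StoredVersion(NamedTuple):
    version_id: int  # hypothetical identifier referenced by the delta stream
    key: str
    payload: str


def recover(data_stream: List[StoredVersion], delta_stream: List[int]) -> Dict[str, StoredVersion]:
    deleted: Set[int] = set(delta_stream)
    # Use the delta stream as a filter so deleted versions are never loaded.
    live = [v for v in data_stream if v.version_id not in deleted]
    # Index operations were not logged, so the index is rebuilt from the live versions.
    return {v.key: v for v in live}


data_stream = [
    StoredVersion(1, "acct-1", "balance=100"),
    StoredVersion(2, "acct-2", "balance=40"),
    StoredVersion(3, "acct-1", "balance=150"),
]
delta_stream = [1]  # version 1 was replaced by version 3
index = recover(data_stream, delta_stream)
```

Because the data stream is append-only and deletions are recorded separately, both streams can be written and read sequentially.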
Transaction Logging and Checkpoints
● Transaction Logging
● One log record per transaction, generated at commit time
● Does not use WAL (Write-ahead logging)
● Uses a single log stream per database
● Checkpoints
● Continuous Checkpointing
● Streaming I/O
● Checkpoint Files and Checkpoint Process
● Recovery
● Parallelism within Hekaton
● Parallelism between SQL Server and Hekaton
Garbage Collection
● Version of a record is garbage if it is no longer visible to any active
transaction
● Properties of GC subsystem: Non-blocking, co-operative, incremental,
parallelizable and scalable
● Garbage Correctness
● Version whose end timestamp < begin timestamp of the oldest active
transaction is not visible to any transaction
● Version becomes garbage if -
● Deleted (Explicit DELETE or through UPDATE)
● Cannot be read or acted upon by any active transaction
● Transaction Rollback
● Garbage Removal
● Unlink from indexes
● Reclaim the version
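The correctness rule and the two removal steps can be sketched in Python. The `Version` class, the dictionary of version lists standing in for an index, and the single-threaded `collect` function are assumptions for illustration; the non-blocking, cooperative, and parallel aspects of the real subsystem, and versions left behind by rolled-back transactions, are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

INFINITY = float("inf")


@dataclass
class Version:
    key: str
    payload: str
    begin: float
    end: float = INFINITY


def is_garbage(v: Version, oldest_active_begin: float) -> bool:
    # No active transaction can have a read time inside a valid time that
    # ended before the oldest active transaction began.
    return v.end < oldest_active_begin


def collect(index: Dict[str, List[Version]], active_begin_times: List[float]) -> int:
    """Unlink garbage versions from the index and return how many were reclaimed."""
    oldest = min(active_begin_times, default=INFINITY)
    reclaimed = 0
    for key in list(index):
        keep = [v for v in index[key] if not is_garbage(v, oldest)]
        reclaimed += len(index[key]) - len(keep)
        if keep:
            index[key] = keep  # unlink: the chain no longer references garbage
        else:
            del index[key]
    return reclaimed


index = {
    "acct-1": [Version("acct-1", "balance=100", 10, 20), Version("acct-1", "balance=150", 20)],
    "acct-2": [Version("acct-2", "balance=40", 5, 12)],
}
first = collect(index, active_begin_times=[15, 30])  # oldest active began at 15
second = collect(index, active_begin_times=[30])     # that transaction has finished
```

The first pass reclaims only the deleted `acct-2` version; the superseded `acct-1` version survives until the transaction that began at 15 is gone, since that transaction could still read it.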
Experimental Results - CPU Efficiency
[Figure: Comparison of CPU efficiency for lookups]
[Figure: Comparison of CPU efficiency for updates]
Experimental Results - Scaling Under Contention
● Experiment illustrating scalability of Hekaton engine
Conclusion
● Microsoft's database engine optimized for in-memory OLTP workloads
● Fully integrated with SQL Server
● Uses latch-free data structures, multiversion concurrency control,
compiled T-SQL stored procedures
● Ensures durability by logging and checkpointing
● High availability – SQL Server’s Always-On feature
● Order of magnitude improvement in efficiency and scalability with
minimal changes to user applications.
References
● http://vldb.org/pvldb/vol5/p298_per-akelarson_vldb2012.pdf
● http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
● http://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/edbt09shoremt.pdf
● http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf
● https://voltdb.com/
● http://llvm.org/
● http://www.oracle.com/technetwork/database/database-technologies/timesten/overview/index.html
Demo
THANK YOU
Questions??
