Executing Queries on a Sharded Database - Neha Narula
Determining a data storage solution as your web application scales can be the most difficult part of web development, and takes time away from developing application features. MongoDB, Redis, Postgres, Riak, Cassandra, Voldemort, NoSQL, MySQL, NewSQL — the options are overwhelming, and all claim to be elastic, fault-tolerant, durable, and give great performance for both reads and writes. In the first portion of this talk I’ll discuss these different storage solutions and explain what is really important when choosing a datastore — your application data schema and feature requirements.
Shard-Query, an MPP database for the cloud using the LAMP stack - Justin Swanhart
This combined #SFMySQL and #SFPHP meetup talked about Shard-Query. You can find the video to accompany this set of slides here: https://www.youtube.com/watch?v=vC3mL_5DfEM
Conquering "big data": An introduction to shard query - Justin Swanhart
Shard-Query is a middleware solution that enables massively parallel query execution for MySQL databases. It works by splitting single SQL queries into multiple smaller queries or tasks that can run concurrently on one or more database servers. This allows it to scale out query processing across more CPUs and servers for improved performance on large datasets and analytics workloads. It supports both partitioning tables for parallelism within a single server as well as sharding tables across multiple servers. The end result is that it can enable MySQL to perform like other parallel database solutions through distributed query processing.
An introduction to SQL Server in-memory OLTP Engine - Krishnakumar S
This is an introduction to the Microsoft SQL Server In-Memory Engine that was earlier code-named Hekaton. It describes the basic concepts and technologies involved in the in-memory engine. It was presented at the Kerala Microsoft Users Group meeting on May 31, 2014.
Introduction to TokuDB v7.5 and Read Free Replication - Tim Callaghan
TokuDB v7.5 introduced Read Free Replication, allowing MySQL slaves to run with virtually no read IO. This presentation discusses how Fractal Tree indexes work, what they enable in TokuDB, and how they allow TokuDB to uniquely offer this replication innovation.
This presentation can help you apply partitioning when appropriate, and avoid problems when using it. The one-liner is: Simple Works Best. The illustrating demos are on PostgreSQL 12 (maybe 13 by the time of presenting) and show some of the problems and solutions that partitioning can provide. Some of this "experience" is quite old, and the demo runs near-identically on Oracle…
These problems are the same on any database.
This document provides an overview and agenda for a presentation on MySQL 5.6 performance tuning and best practices. The presentation covers analyzing MySQL workload and internals, performance improvements in MySQL 5.6 and 5.7, benchmark results, and pending issues. It emphasizes the importance of monitoring systems to understand performance bottlenecks and the need for an iterative process of monitoring, tuning, optimizing, and improving database performance over time.
The document discusses best practices for running MySQL on Linux, covering choices for Linux distributions, hardware recommendations including using solid state drives, OS configuration such as tuning the filesystem and IO scheduler, and MySQL installation and configuration options. It provides guidance on topics like virtualization, networking, and MySQL variants to help ensure successful and high performance deployment of MySQL on Linux.
SQL Server In-Memory OLTP: What Every SQL Professional Should Know - Bob Ward
Perhaps you have heard the term "In-Memory" but are not sure what it means. If you are a SQL Server Professional then you will want to know. Even if you are new to SQL Server, you will want to learn more about this topic. Come learn the basics of how In-Memory OLTP technology in SQL Server 2016 and Azure SQL Database can boost your OLTP application by 30X. We will compare how In-Memory OLTP works vs. "normal" disk-based tables. We will discuss what is required to migrate your existing data into memory-optimized tables or how to build a new set of data and applications to take advantage of this technology. This presentation will cover the fundamentals of what, how, and why this technology is something every SQL Server Professional should know.
10 things an Oracle DBA should care about when moving to PostgreSQL - PostgreSQL-Consulting
PostgreSQL can handle many of the same workloads as Oracle and provides alternatives to common Oracle features and practices. Some key differences for DBAs moving from Oracle to PostgreSQL include: using shared_buffers instead of the SGA, with a recommended 25-75% of RAM; using pgbouncer instead of a listener; performing backups with pg_basebackup and WAL archiving instead of RMAN; managing undo data in datafiles instead of undo segments; using streaming replication for high availability instead of RAC; and needing to tune autovacuum instead of manually managing redo and undo logs. PostgreSQL is very capable but may not be suited for some extremely high update workloads of 200K+ transactions per second on a single server.
This document summarizes Tumblr's massively sharded MySQL database architecture. Some key points:
- Tumblr has seen huge growth in traffic and data in the past year, necessitating major database scaling. They now have over 175 machines dedicated to MySQL handling over 11TB of relational data across 25 billion rows.
- They use horizontal partitioning (sharding) across multiple MySQL instances to scale writes and handle large data sizes. Sharding by a core column allows querying the appropriate shard.
- Operational challenges include automation for adding/rebalancing shards, handling failures, and migrating/splitting very large datasets across shards without downtime.
- When splitting a large shard, the process
SQL Server 2016 "It Just Runs Faster" - SQLBits 2017 edition - Bob Ward
SQL Server 2016 includes several performance improvements that help it run faster than previous versions:
1. Automatic Soft NUMA partitions workloads across NUMA nodes when there are more than 8 CPUs per node to avoid bottlenecks.
2. Dynamic memory objects are now partitioned by CPU to avoid contention on global memory objects.
3. Redo operations can now be parallelized across multiple tasks to improve performance during database recovery.
In-Memory Database Systems for Big Data Management: SAP HANA Database - George Joseph
SAP HANA is an in-memory database system that stores data in main memory rather than on disk for faster access. It uses a column-oriented approach to optimize analytical queries. SAP HANA can scale from small single-server installations to very large clusters and cloud deployments. Its massively parallel processing architecture and in-memory analytics capabilities enable real-time processing of large datasets.
BRK3288: SQL Server v.Next with support on Linux, Windows and containers was... - Bob Ward
This document discusses Microsoft's plans to deliver SQL Server on Linux and other heterogeneous environments. Key points include:
- SQL Server will be available on Linux, Windows, and Docker containers, allowing choice of operating system. It will support multiple languages and tools.
- Microsoft is delivering more options in response to businesses adopting heterogeneous environments with various data types, languages, and platforms.
- The document outlines SQL Server's capabilities on Linux such as high availability, security, and tools/drivers available now or in development.
JSS 2015: In-memory and operational analytics - David Barbarin
This document contains summaries of presentations and information about the #JSS2015 conference on SQL Server 2015 organized by GUSS. It provides information on speakers David Barbarin and Frédéric Pichaut and topics to be covered including columnstore architecture, columnstore improvements in SQL 2016, in-memory OLTP architecture and improvements, and remaining unsupported in-memory features.
Getting the best of both worlds with Oracle 11g and MySQL Enter... - Ivan Zoratti
The document discusses using Oracle Database 11g and MySQL together. It outlines how MySQL provides a cost-effective solution for online applications through its pluggable storage engine architecture, replication capabilities, and scaling options like sharding. MySQL Enterprise offers additional features for monitoring, management and high availability of MySQL deployments.
TokuDB is an ACID/transactional storage engine that makes MySQL even better by increasing performance, adding high compression, and allowing for true schema agility. All of these features are made possible by Tokutek's Fractal Tree indexes.
This document discusses common mistakes made when implementing Oracle Exadata systems. It describes improperly sized SGAs which can hurt performance on data warehouses. It also discusses issues like not using huge pages, over or under use of indexing, too much parallelization, selecting the wrong disk types, failing to patch systems, and not implementing tools like Automatic Service Request and exachk. The document provides guidance on optimizing these areas to get the best performance from Exadata.
Based on the popular blog series, join me in taking a deep dive and a behind the scenes look at how SQL Server 2016 “It Just Runs Faster”, focused on scalability and performance enhancements. This talk will discuss the improvements, not only for awareness, but expose design and internal change details. The beauty behind ‘It Just Runs Faster’ is your ability to just upgrade, in place, and take advantage without lengthy and costly application or infrastructure changes. If you are looking at why SQL Server 2016 makes sense for your business you won’t want to miss this session.
Design Patterns for Distributed Non-Relational Databases - guestdfd1ec
The document discusses design patterns for distributed non-relational databases, including consistent hashing for key placement, eventual consistency models, vector clocks for determining history, log-structured merge trees for storage layout, and gossip protocols for cluster management without a single point of failure. It raises questions to ask presenters about scalability, reliability, performance, consistency models, cluster management, data models, and real-life considerations for using such systems.
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store - Filipe Silva
The document discusses Connector/J Beyond JDBC and the X DevAPI for Java and MySQL as a Document Store. It provides an agenda that includes an introduction to MySQL as a document store, an overview of the X DevAPI, and how the X DevAPI is implemented in Connector/J. The presentation aims to demonstrate the X DevAPI for developing CRUD-based applications and using MySQL as both a relational database and document store.
The document discusses building a data platform for analytics in Azure. It outlines common issues with traditional data warehouse architectures and recommends building a data lake approach using Azure Synapse Analytics. The key elements include ingesting raw data from various sources into landing zones, creating a raw layer using file formats like Parquet, building star schemas in dedicated SQL pools or Spark tables, implementing alerting using Log Analytics, and loading data into Power BI. Building the platform with Python pipelines, notebooks, and GitHub integration is emphasized for flexibility, testability and collaboration.
This session shows an overview of the features and architecture of SQL Server on Linux and Containers. It covers install, config, performance, security, HADR, Docker containers, and tools. Find the demos on http://aka.ms/bobwardms
Factual is a platform for open data that allows users to share, combine, and publish crowd-sourced data. It aims to store 10 million tables with 1 billion rows of summarized data and 10 billion individual data inputs. This presents challenges for high-volume data storage and retrieval. Factual uses a MapReduce architecture with PostgreSQL for data persistence, caching, and querying. It aims for high availability through redundancy and an eventually consistent approach to handling new data and cache updates.
Run Cloud Native MySQL NDB Cluster in Kubernetes - Bernd Ocklin
The more your database aligns with Cloud Native principles such as resilience, scaling, auto-healing, and data consistency across all nodes, the better it also runs as DBaaS in Kubernetes. I walk through running databases in Kubernetes and demo manual deployment as well as deployment with an NDB operator.
This talk was given at the MySQL Dev Room FOSDEM 2021.
This document summarizes a workshop on migrating from Oracle to PostgreSQL. It discusses migrating the database, including getting Oracle and PostgreSQL instances, understanding how applications interact with databases, and using the ora2pg tool to migrate the database schema and data from Oracle to PostgreSQL.
In these Microsoft slides we can see what SQL 2014 brings in areas such as: memory-optimized tables, changes in cardinality estimation, backup encryption, architecture improvements, Always On, changes in Resource Governor, and data files in Azure.
Business Insight 2014 - Microsoft's new BI and database platform - Erling Skaal... - Microsoft
This document discusses in-memory technologies in Microsoft SQL Server including:
1) In-memory columnstore indexes that can provide over 100x faster query speeds and significant data compression.
2) In-memory OLTP that provides up to 30x faster transaction processing.
3) Using memory technologies to provide faster insights, queries, and transactions for analytics and operational workloads.
Inside SQL Server In-Memory OLTP - SQL Saturday NYC 2017 - Bob Ward
This document provides a high-level summary of In-Memory OLTP in SQL Server:
- In-Memory OLTP stores and processes transactional data entirely in memory using natively compiled stored procedures to avoid concurrency bottlenecks like locks and latches.
- Data is stored in memory-optimized tables using either a hash index or range index for fast lookup. Transactions are logged and written to checkpoint files for durability.
- The Hekaton engine handles all transaction processing in memory without locks by using techniques like multi-version concurrency control and lock-free data structures. Checkpoint files are used to reconstruct the database after a restart.
- Natively compiled stored procedures provide improved performance by
In this second part on Redshift, we present the case study of Movile, a mobile commerce leader with 50 million users, and analyze advanced topics such as compression, built-in SQL macros, and multidimensional indexes for large databases.
This document summarizes a lecture on key-value storage systems. It introduces the key-value data model and compares it to relational databases. It then describes Cassandra, a popular open-source key-value store, including how it maps keys to servers, replicates data across multiple servers, and performs reads and writes in a distributed manner while maintaining consistency. The document also discusses Cassandra's use of gossip protocols to manage cluster membership.
The document discusses database hardware requirements like RAM, disk space, processors and networks and how they impact database performance. It also covers topics like transaction logging, how databases and their related files are structured, and the different SQL data types and statements used to work with databases. Various SQL objects like tables, views, indexes and their creation are explained along with examples.
John Hugg presented on building an operational database for high-performance applications. Some key points:
- He set out to reinvent OLTP databases to be 10x faster by leveraging multicore CPUs and partitioning data across cores.
- The database, called VoltDB, uses Java for transaction management and networking while storing data in C++ for better performance.
- It partitions data and transactions across server cores for parallelism. Global transactions can access all partitions transactionally.
- VoltDB is well-suited for fast data applications like IoT, gaming, ad tech which require high write throughput, low latency, and global understanding of live data.
The document discusses SQL Server 2014's in-memory OLTP engine, which features a new high-performance, memory-optimized transaction processing engine integrated into SQL Server. The in-memory OLTP engine uses lock-free algorithms and native code compilation for high concurrency and efficient processing. It also provides an integrated experience with features like backup/restore and high availability that help reduce the total cost of ownership.
Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about the performance of databases. This is especially true to less widespread, but often very useful NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
The document discusses SQL Server 2014's in-memory OLTP feature. It begins by explaining the need for an in-memory architecture due to hardware trends. It then covers how the in-memory tables store and access data via optimized structures and algorithms. Native compiled stored procedures are also discussed. The benefits are high performance for hot datasets that fit entirely in memory, while limitations include unsupported data types and inability to partially store tables.
This is a summary of the sessions I attended at PASS Summit 2017. Out of the week-long conference, I put together these slides to summarize the conference and present at my company. The slides are about my favorite sessions that I found had the most value, and they include screenshots of demos that I personally developed and tested, just as the speakers did at the conference.
A talk given at VT Code Camp 2019 covering a variety of big data infrastructures. High level summary of distributed relational databases, NoSQL databases, ETL processes, high throughput computing, high performance computing, and hybrid systems.
Kudu is an open source storage layer developed by Cloudera that provides low latency queries on large datasets. It uses a columnar storage format for fast scans and an embedded B-tree index for fast random access. Kudu tables are partitioned into tablets that are distributed and replicated across a cluster. The Raft consensus algorithm ensures consistency during replication. Kudu is suitable for applications requiring real-time analytics on streaming data and time-series queries across large datasets.
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data - Hakka Labs
This document discusses Apache Kudu, an open source columnar storage system for analytics workloads on Hadoop. Kudu is designed to enable both fast analytics queries as well as real-time updates on fast changing data. It aims to fill gaps in the current Hadoop storage landscape by supporting simultaneous high throughput scans, low latency reads/writes, and ACID transactions. An example use case described is for real-time fraud detection on streaming financial data.
DynamoDB is a key-value database that achieves high availability and scalability through several techniques:
1. It uses consistent hashing to partition and replicate data across multiple storage nodes, allowing incremental scalability.
2. It employs vector clocks to maintain consistency among replicas during writes, decoupling version size from update rates.
3. For handling temporary failures, it uses sloppy quorum and hinted handoff to provide high availability and durability guarantees when some replicas are unavailable.
The document discusses Snowflake, a cloud data warehouse that is built for the cloud, multi-tenant, and highly scalable. It uses a shared-data, multi-cluster architecture where compute resources can be scaled independently from storage. Data is stored immutably in micro-partitions across an object store. Virtual warehouses provide isolated compute resources that can access all the data.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and do not follow the RDBMS principles. It describes some of the main types of NoSQL databases including document stores, key-value stores, column-oriented stores, and graph databases. It also discusses how NoSQL databases are designed for massive scalability and do not guarantee ACID properties, instead following a BASE model of Basically Available, Soft state, and Eventually consistent.
3. About me
Shy Engelberg, SQL Server consultant.
shy.engelberg@hynt.co.il
054-7717115
@ShyEngelberg
www.blogs.Microsoft.co.il/ShyEngel
www.hynt.co.il
www.facebook.com/HYNTil
4. Agenda
• What is Hekaton?
• Why now?
• Removing the performance bottlenecks
• Disk
• Locking
• Latches
• Logging
• Interpreted query language
• Performance results
• Integration into SQL Server
5. What is Hekaton?
• Hekaton means "hundred" (100) in Greek. In an archaic stage of Greek mythology, the Hekatoncheires were three giants of incredible strength and ferocity, each of them having a hundred hands (100 hands working together in parallel?) and fifty heads. (Wikipedia)
6. What is Hekaton?
• Hekaton is a new database engine optimized for memory-resident data and OLTP workloads. It is optimized for large main memories and many-core processors. (Ohhh, and it's fully durable!)
• Hekaton's new, boring name is In-Memory OLTP.
• The research and development took 5 years!
• The initial goal was to gain a 100x performance improvement.
9. The Past
• RAM prices were very high:
"…In 1990 I had an IBM PC 8088; it had a 20 meg HDD. It came with 640KB RAM. It was great for the day. It cost me $3000.00…"
• CPUs had a single core.
• SQL Server was designed when it could be assumed that main memory was very expensive, so data needed to reside on disk (except when it was actually needed for processing) - disk-optimized (buffer pools, data pages, costing rules).
11. Today (and also 5 years ago)
• RAM prices are low, and CPU core counts are increasing: a server with 32 cores and 1TB of memory costs about $50K. ($50K = HP DL980 with 2TB of RAM)
• The majority of OLTP databases fit entirely in 1TB, and even the largest OLTP databases can keep the active working set in memory.
• Unlike "Big Data," most OLTP data volumes are growing at more modest rates.
• Data management products are becoming "workload specific".
12. What should we do?
• Our goal is to gain a 10-100x throughput improvement (for OLTP workloads). This cannot be achieved by optimizing existing SQL Server mechanisms.
• Recognizing the trends, SQL Server began building a database engine optimized for large main memories and many-core CPUs.
• Hekaton is not a response to competitors' offerings.
13. What is an OLTP workload?
• OLTP - Online Transaction Processing:
• High concurrency
• Reads and writes at the same time
• Short transactions - the reliability of the data is very important
• Usually works with small datasets
• Used for retail, e-commerce, gaming, Forex, tourism reservation systems, etc.
• As opposed to a DWH reporting workload:
• Writes usually happen in batches
• A single row is not important, but the aggregation of many rows is
• Not normalized
• Usually low concurrency
14. Do we have to build a new engine?
• Usually, OLTP performance improvements came from two sources:
• CPUs getting faster
• Software getting better
• But now, CPUs are not getting any faster, and DBMSs have matured.
Yes, we must build a new engine.
15. What should we do?
In other words:
1. A specialized database engine tuned for OLTP workloads.
2. Fitting most or all of the data required by a workload into main memory.
3. Lower latency for data operations.
17. Is it just an In-Memory DB?
CREATE TABLE [Customer]
(
    [CustomerID] INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    [Name] NVARCHAR(250) NOT NULL
        INDEX [IName] HASH WITH (BUCKET_COUNT = 1000000),
    [CustomerSince] DATETIME NULL
)
WITH
(
    MEMORY_OPTIMIZED = ON,
    DURABILITY = SCHEMA_AND_DATA
);
* Hekaton is defined at a table level.
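Once created, the table is used with ordinary T-SQL through the interop layer; a minimal sketch (the literal values here are made up):

INSERT INTO [Customer] ([CustomerID], [Name], [CustomerSince])
VALUES (1, N'Marty', '1985-10-26');

SELECT [Name]
FROM [Customer]
WHERE [CustomerID] = 1; -- point lookup served by the hash index on CustomerID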
18. 100x performance? How will we do it?
- Architecture concepts:
1. Optimize indexes for main memory
2. Eliminate latches and locks
3. Compile requests to native code
*Me: Use everything the hardware has to offer (multi-cores, large memory), make it scalable, and don't keep anyone waiting.
19. The bottlenecks to overcome
Procedures interpretation
Logging
Latches
Concurrency Locking
Disk IO
22. Traditional DBMS
• Data is saved on disk.
• It is organized in 8KB pages.
• Every table and index is made out of extents - 8 pages, or 64 KB.
• It is advised that the data of a single table reside physically next to each other.
• Data is read from disk into a buffer pool in memory (the minimum read from disk is an extent).
• All data needed for processing is read into the buffer pool.
• Data is also updated in the buffer pool, and a process called Checkpoint writes the data back to disk.
• SQL Server has an algorithm that defines which pages should stay in the buffer pool and which should be removed (only in case of memory pressure).
23. Traditional DBMS
• A table can be a heap or a clustered index.
• A heap is a lot of pages, not sorted by anything, placed in different places in the file.
• A clustered index is a B-tree sorted by one column (or more); the leaf level of the tree holds pages with the table's data. (There can be only one clustered index on a table.)
• A nonclustered index is a B-tree sorted by one column (or more); the leaf level of the tree holds pages with pointers to the location of each row (in a clustered index or a heap).
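As a reference point, a minimal sketch of the disk-based structures described above (dbo.Orders and its columns are hypothetical names):

-- Clustered index: the B-tree leaf level holds the table's data pages.
CREATE CLUSTERED INDEX IX_Orders_OrderID
    ON dbo.Orders (OrderID);

-- Nonclustered index: the leaf level holds pointers to the rows
-- (clustered index keys, or row IDs on a heap).
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);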
24. Yes, In-Memory. Memory-optimized
• The design principle: optimize for byte-addressable memory instead of block-addressable disk, so:
• Memory-optimized tables are not stored on pages like disk-based tables.
• No buffer pool.
• No clustered and nonclustered indexes: indexes hold only pointers to rows. Rows are not sorted in any way. Indexes do not point to other indexes.
• Rows are never modified (for a different reason; we will get to that).
Disk IO
25. Rows and Indexes
[Diagram: data rows such as (1955, Marty), (1985, Marty), (1955, Doc), and (1985, Doc). A hash index IdxA on Year has buckets for 1955, 1985, 2014, …; a hash index IdxB on Name has buckets for Marty, Doc, Einstein, …. Each row carries an IdxA pointer and an IdxB pointer, so the index buckets reach rows through chains of these per-row pointers.]
* This is a simple representation of the row and index; the real structure holds more data and might have a different structure.
Disk IO
26. Data storage - conclusion
• A row consists of a header (to be discussed later), index pointers, and data.
• Memory-optimized tables must have at least one index created on them (the only thing connecting rows to the table is the indexes).
• Records are always accessed via an index lookup.
• Since the number of index pointers is part of the row structure, and rows are never modified, all indexes must be defined at the time your table is created.
Disk IO
27. Something must be written to disk?
• Only data rows are written to disk (no indexes).
• More on that when we talk about durability.
Disk IO
30. Locking
• To make sure a transaction is isolated (it reads only committed data, and a row is changed by only one user at a time), we use a mechanism called locks.
• Before reading or writing a row, a process needs to place a lock on the row.
• If there is already a lock on the row, the process needs to compare the locks for compatibility.
• Readers place locks on the rows they read.
• Writers need exclusive access, so they block both readers and writers.
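A minimal sketch of that blocking on a disk-based table (dbo.Table1 is hypothetical; run the two batches in separate sessions):

-- Session 1: takes an exclusive (X) row lock and holds it until commit.
BEGIN TRANSACTION;
UPDATE dbo.Table1 SET Name = N'Doc' WHERE Id = 1;
-- (transaction intentionally left open)

-- Session 2: under the default READ COMMITTED isolation level, this reader
-- waits for a shared (S) lock and blocks until session 1 commits or rolls back.
SELECT Name FROM dbo.Table1 WHERE Id = 1;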
31. No one needs to wait
• Optimistic multi-version concurrency control.
• Writers do not block readers. Writers do not block writers. No locks are acquired. Never.
• Optimistic - transactions proceed under the (optimistic) assumption that there will be no conflicts with other transactions.
• Multi-version - like snapshot isolation, every transaction accesses only the version of the data that is correct for the time it started.
• Data is never updated in place - every DML statement creates a new version of the row.
• This new concurrency control mechanism is built into the Hekaton data structures. It cannot be turned on or off.
Concurrency Locking
33.-44. Optimistic Multi-Version Concurrency Control - Example
Each row version holds a header with a start and an end timestamp, followed by the index pointers and the data:
(Start timestamp, End timestamp) | IdxA pointer | IdxB pointer | Data
• Time 1: the row "Marty" exists with timestamps (1, ∞).
• Tx1 runs UPDATE Table1 SET Name = 'Doc': a new version "Doc" is created with timestamps (Tx1, ∞), and the end timestamp of "Marty" is set to Tx1. When Tx1 commits at time 3, the timestamps are resolved: Marty (1, 3) and Doc (3, ∞).
• Tx100, a reader with start time 5, runs SELECT Name FROM Table1. It follows the version whose timestamp range contains 5 and reads "Doc".
• Tx2 runs UPDATE Table1 SET Name = 'Einstein': a new version "Einstein" is created with timestamps (Tx2, ∞), and the end timestamp of "Doc" is set to Tx2. When Tx2 commits at time 6, the versions become Doc (3, 6) and Einstein (6, ∞).
• Tx200, a reader with start time 4, still reads "Doc": the version valid at time 4 is the one whose range (3, 6) contains it.
Concurrency Locking
Multi version
45. Multi-version - conclusion
• All versions are equal; they act as rows and are linked to the indexes, and to one another.
• To support the optimism, we validate versions for write conflicts - an attempt to update a record that has been updated since the transaction started.
• A garbage collector removes old row versions from memory.
• The model also supports the REPEATABLE READ and SERIALIZABLE isolation levels (but that's for another time).
Concurrency Locking
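In T-SQL the optimism surfaces as an error instead of a wait. A hedged sketch of the retry pattern (dbo.Table1 is a hypothetical memory-optimized table; 41302 is the documented update-conflict error number):

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Table1 WITH (SNAPSHOT)
        SET Name = N'Einstein'
        WHERE Id = 1;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
    -- 41302: the row was updated by another transaction since this one started.
    IF ERROR_NUMBER() = 41302
        PRINT 'Write conflict; the application should retry the transaction.';
    ELSE
        THROW;
END CATCH;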
47. No more threads waiting
We want to reduce latch waits so we can take advantage of multiple CPU cores, so we will use lock-free data structures.
Latches
48. Latches
• Latches are 'region locks': a mechanism used to protect a region of code or data structures against simultaneous thread access.
• Region locks implement an acquire/release pattern: the lock is first acquired, the protected region executes, and then the lock is released.
• All shared data structures must be protected by latches.
• In a system with many cores or very high concurrency, the protected region becomes a bottleneck.
• Highly contended resources include the lock manager, the tail of the transaction log, and the last page of a B-tree index.
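Latch contention shows up in the server's wait statistics. A quick, illustrative check against the standard wait-stats DMV (interpreting the numbers is workload-specific):

-- Page-latch waits are the classic symptom of a hot page, e.g. the
-- last page of an ever-increasing B-tree index.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGELATCH%'
ORDER BY wait_time_ms DESC;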
49. Lock-free mechanism
• Hekaton uses lock-free data structures.
• The underlying algorithms are complex, research-grade material, so they will not be explained here.
52. Durability?
• Durability was one of the main goals of the development: we want data to be available after a shutdown or an unexpected crash.
• RAM may be cheap, but it still can't survive a power outage.
• Conclusion: we must write something to disk.
53. Logging today
• Because dirty data can be written to the data files by a checkpoint, every action and its UNDO information must be written to the transaction log.
• Data is logged by physical structure: if a row changes, we see a log record for every index on that table.
54. Logging
• To support high throughput, the following concepts are applied:
• Index operations are not logged (no log records for physical structure modifications; the work is pushed to recovery).
• No undo information is logged; only committed transactions are.
• Each transaction is logged in a single, potentially large, log record. (Fewer log records minimize the log-header overhead and reduce contention when inserting into the log buffer.)
• Hekaton tries to group multiple log records into one large I/O.
• Hekaton is designed to support multiple concurrently generated log streams per database, to avoid scaling bottlenecks at the tail of the log.
• Combine this with "Delayed Durability" (new in 2014) and you have a hoverboard.
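As a sketch (the database name MyDb is illustrative), delayed durability is allowed at the database level and then requested per commit:

ALTER DATABASE MyDb SET DELAYED_DURABILITY = ALLOWED;

BEGIN TRAN;
-- ... DML here ...
-- The commit returns without waiting for the log flush to disk:
COMMIT TRAN WITH (DELAYED_DURABILITY = ON);

The trade-off: the most recent commits can be lost on a crash, in exchange for removing the log flush from the commit path.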
55. Checkpoint & Recovery
• We can't rely on the transaction log alone for durability: no log truncation would occur, and recovery would take forever.
• Checkpoint files are effectively a compressed version of the logged transactions.
• Checkpointing is optimized for sequential access (data is only written, never updated or deleted in place).
• Checkpoint-related I/O occurs incrementally and continuously.
• Multiple checkpoint files exist, to allow the recovery process to run in parallel.
• Indexes are rebuilt during recovery.
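The data and delta checkpoint files can be inspected through a DMV; a minimal look (the column selection is illustrative):

SELECT file_type_desc,     -- DATA or DELTA
       state_desc,         -- PRECREATED, UNDER CONSTRUCTION, ACTIVE, ...
       file_size_in_bytes
FROM sys.dm_db_xtp_checkpoint_files;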
58. Query Interpretation
• The current interpreter (which takes a physical query plan as input) is totally generic: it supports every table, every type, etc.
• It performs many run-time checks during the execution of even simple statements.
• It is not fast, but it was fast enough when data came from disk.
• Today, CPUs are not getting any faster, so we need to lower the number of CPU instructions used for query processing and business logic execution.
59. Native stored procedure compilation
• The primary goal is to support efficient execution of compile-once-and-execute-many-times workloads, as opposed to optimizing the execution of ad hoc queries.
• Natively compiled SPs may interact with Hekaton tables only.
• The In-Memory OLTP compiler leverages the query optimizer to create an efficient execution plan for each of the queries in the stored procedure.
• The stored procedure is translated into C and compiled to native code (a DLL). The DLL is slim and specific to the query.
60. Improvements
• The procedure is compiled as a single function: we avoid costly argument passing between functions and expensive function calls.
• Rows do not pass through every operator when that is not needed.
• To avoid run-time checks, compiled stored procedures execute in a predefined security context.
• Compiled stored procedures must be schema-bound, to avoid costly schema locks.
61. Things to remember
• Natively compiled stored procedures are not automatically recompiled if the data in the table changes.
• There are some limitations on the T-SQL surface area we can use (for now).
• They must be created with an execution (security) context.
• Using natively compiled SPs gives us the biggest performance boost!
64. CPU Efficiency for Lookups
• Random lookups in a table with 10M rows
• All data in memory
• Intel Xeon W3520 2.67 GHz

Transaction size    CPU cycles (millions)         Speedup
(#lookups)          SQL Table    Hekaton Table
1                   0.734        0.040            10.8X
10                  0.937        0.051            18.4X
100                 2.72         0.150            18.1X
1,000               20.1         1.063            18.9X
10,000              201          9.85             20.4X

• Hekaton performance: 2.7M lookups/sec/core
65. CPU Efficiency for Updates
• Random updates, 10M rows, one index, snapshot isolation
• Log IO disabled (disk became the bottleneck)
• Intel Xeon W3520 2.67 GHz

Transaction size    CPU cycles (millions)         Speedup
(#updates)          SQL Table    Hekaton Table
1                   0.910        0.045            20.2X
10                  1.38         0.059            23.4X
100                 8.17         0.260            31.4X
1,000               41.9         1.50             27.9X
10,000              439          14.4             30.5X

• Hekaton performance: 1.9M updates/sec/core
66. High Contention Throughput
• Workload: read/insert into a table with a unique index
• Insert txn (50%): append a batch of 100 rows
• Read txn (50%): read the last inserted batch of rows
68. SQL integration
• The engine is completely integrated with SQL 2014:
• No hidden licensing fees.
• No need to copy data.
• No need to support a new technology.
• No need to maintain 2 DBs.
• Migration can be done in stages.
• The Hekaton engine is transparent to the application.
69. How is it integrated?
• Use your existing DBs.
• In-Memory tables and disk tables can be joined together easily.
• Use the same installation and connection interface.
• Use the same T-SQL language.
• Back up the same way you always did.
• Manage and maintain the DB and storage in the same way, using the same tools.
• The same tools you're used to: DMVs, SSMS, perf counters, Resource Governor...
• Out-of-the-box Integration with SQL HA solutions.
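For example, in interop mode a memory-optimized table joins with a classic disk-based table in ordinary T-SQL (table and column names are illustrative):

SELECT c.Name, o.OrderDate
FROM dbo.Customer AS c          -- memory-optimized table
JOIN dbo.OrderHistory AS o      -- regular disk-based table
    ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 42;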
71. How to get started
Some scripts and basic knowledge
72. Migration is as easy as:
1. Upgrade your DB to run on a SQL 2014 instance.
2. Identify performance-bottleneck tables, create them as memory-optimized, and migrate the data.
3. Continue querying the DB without any change, using interop mode.
4. Identify the required code changes and migrate procedures to native mode.
• No additional hardware or licensing is required.
• New tools help us identify potential Hekaton tables and problems.
73. Working with Hekaton
• Adding a filegroup (see the sketch below)
• Migrating/creating tables
• Migrating procedures
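Adding the filegroup is plain DDL; a minimal sketch (the database name and path are illustrative):

-- One memory-optimized filegroup per database:
ALTER DATABASE MyDb
    ADD FILEGROUP MyDb_mod CONTAINS MEMORY_OPTIMIZED_DATA;

-- The 'file' is actually a directory where checkpoint files will live:
ALTER DATABASE MyDb
    ADD FILE (NAME = N'MyDb_mod_dir', FILENAME = N'C:\Data\MyDb_mod_dir')
    TO FILEGROUP MyDb_mod;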
76. Create Table DDL
CREATE TABLE [Customer](
    [CustomerID] INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    [Name] NVARCHAR(250) NOT NULL,
    [CustomerSince] DATETIME NULL
        INDEX [ICustomerSince] NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
• This table is memory optimized (MEMORY_OPTIMIZED = ON).
• This table is durable (DURABILITY = SCHEMA_AND_DATA).
• Indexes are specified inline; NONCLUSTERED indexes are supported.
• Hash index: size BUCKET_COUNT at 1-2X the number of unique index key values.
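Once created, the table is queried like any other; point lookups on CustomerID go through the hash index (the values are illustrative):

INSERT INTO dbo.Customer (CustomerID, Name, CustomerSince)
VALUES (42, N'Marty', '2014-06-01');

SELECT Name
FROM dbo.Customer
WHERE CustomerID = 42;   -- equality seek via the hash index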
77. Memory-Optimized Indexes
• Exist only in memory; rebuilt on database startup.
• Do not contain data rows: indexes contain memory pointers to the data rows, so there is no duplication of data.
• All indexes are covering (or none are covering).
80. Accessing Memory Optimized Tables
Interpreted T-SQL Access:
• Access both memory- and disk-based tables
• Less performant
• Virtually full T-SQL surface
• When to use: ad hoc queries, reporting-style queries, speeding up app migration
Natively Compiled Stored Procs:
• Access only memory-optimized tables
• Maximum performance
• Limited T-SQL surface area
• When to use: OLTP-style operations, optimizing performance-critical business logic
81. Create Procedure DDL
CREATE PROCEDURE [dbo].[InsertOrder] @id INT, @date DATETIME
WITH
    NATIVE_COMPILATION,
    SCHEMABINDING,
    EXECUTE AS OWNER
AS
BEGIN ATOMIC
    WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT,
          LANGUAGE = N'us_english')
    -- insert T-SQL here
END
• This proc is natively compiled (NATIVE_COMPILATION).
• Native procs must be schema-bound (SCHEMABINDING).
• Atomic blocks create a transaction if there is none; otherwise they create a savepoint.
• An execution context (EXECUTE AS) is required.
• Session settings are fixed at create time.
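Calling it looks like calling any other procedure (the parameter values are illustrative):

EXEC [dbo].[InsertOrder] @id = 42, @date = '2014-06-01';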
82. Procedure Creation
CREATE PROC DDL → query optimization → code generation and compilation → procedure DLL produced → procedure DLL loaded.
84. Limitations on In-Memory OLTP in SQL 2014
Tables:
• Triggers: no DDL/DML triggers
• Data types: no LOBs, no XML, and no CLR data types
• Constraints: no FOREIGN KEY and no CHECK constraints
• No schema changes (ALTER TABLE): the table must be dropped and recreated
• No adding/removing indexes: the table must be dropped and recreated
Natively Compiled Stored Procedures:
• No outer join, no OR, no subqueries, no CASE
• Limited built-in functions (core math, date/time, and string functions are available)
Many of the largest financial, online retail, and airline reservation systems fall between 500 GB and 5 TB, with working sets that are significantly smaller.
By using the new memory-optimized tables you can speed up data access, particularly in high-concurrency situations, thanks to the lock- and latch-free architecture of the In-Memory OLTP engine. Applications that suffer from heavy contention between concurrent transactions can benefit greatly from migrating just their hot tables to memory-optimized.
If most or all of an application's data can be entirely memory resident, the costing rules that the SQL Server optimizer has used since the very first version become almost completely obsolete. And when there is no wait time required for disk reads, other wait statistics, such as waiting for locks to be released, waiting for latches to be available, or waiting for log writes to complete, can become disproportionately large.
There is no collection of pages or extents, and no partitions or allocation units, that can be referenced to get all the "pages" for a memory-optimized table.
Scalability suffers when the system has shared memory locations that are updated at high rates.
The log contains the logical effects of committed transactions, sufficient to redo each transaction: the changes are recorded as insertions and deletions of row versions, labeled with the table they belong to.