This document discusses SQL Server performance tuning with a focus on leveraging CPU caches through column store compression. It explains how column store compression can bridge the performance gap between IO subsystems and modern processors by achieving breakthrough levels of compression and pipelining batches of data into CPU caches. Examples show significant performance improvements from column store compression and clustering over row-based storage and uncompressed data.
The traditional topics of memory pressure, page life expectancy and memory grants have been covered to the point of saturation in the SQL community. In this deck I want to cover some topics relating to memory and SQL Server which might be considered "off-piste" but are equally relevant, if not more so, to getting the best possible performance out of SQL Server.
An introduction to column store indexes and batch mode (Chris Adkin)
This document discusses column store databases and how they work. It explains that column store databases store data by column rather than row to better utilize modern CPU architectures. It describes how column stores use compression techniques like run-length encoding and dictionaries. It also demonstrates how batch processing and sorting data can improve performance of queries against column stores by keeping more data in CPU caches.
Sql server engine cpu cache as the new ram (Chris Adkin)
This document discusses CPU cache and memory architectures. It begins with a diagram showing the cache hierarchy from L1 to L3 cache within a CPU. It then discusses how larger CPUs have multiple cores, each with their own L1 and L2 caches sharing a larger L3 cache. The document highlights how main memory bandwidth has not kept up with increasing CPU speeds and caches.
Scaling sql server 2014 parallel insert (Chris Adkin)
A slide deck on how to get the best possible performance out of the parallel insert feature introduced in SQL Server 2014 as presented at SQL Bits XIV.
Building scalable application with sql server (Chris Adkin)
Chris Adkin has 15 years of IT experience and 14 years of experience as a DBA working with various sectors. He has over 10 years of experience with SQL Server from version 2000. He provides his email and Twitter contact information. The document then discusses various topics related to database design and performance including OLTP vs OLAP characteristics, data modeling best practices, indexing strategies, query optimization techniques, and concurrency control methods.
In 40 minutes the audience will learn a variety of ways to make a PostgreSQL database suddenly go out of memory on a box with half a terabyte of RAM.
Developer's and DBA's best practices for preventing this will also be discussed, as well as a bit of Postgres and Linux memory management internals.
How PostgreSQL interacts with the disk, what performance problems can arise, and how to solve them by choosing suitable hardware, operating system settings and PostgreSQL settings.
Oracle In-Memory option to improve analytic queries
In-Memory is now the trend for database vendors, but they do not all implement in-memory storage for the same reasons or with the same architecture. The Oracle In-Memory option directly addresses BI reporting. Oracle has always favoured a hybrid approach where we can query the OLTP database (from Oracle 6 onwards, reads do not block writes) and the In-Memory approach follows the same philosophy: run efficient analytic queries on OLTP databases. Their first approach to columnar storage was bitmap indexes in 8i, which were very efficient for ad-hoc queries but not compatible with an OLTP workload. Then came Exadata SmartScan and Hybrid Columnar Compression, which still addressed data warehouse loads and reporting.
The 12c In-Memory option now gives columnar and in-memory efficiency directly on the OLTP database, without changing any design or code. A demo will show how this option can be used.
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014 (Amazon Web Services)
Learn the specifics of Amazon RDS for PostgreSQL's capabilities and extensions that make it powerful. This session covers database data import, performance tuning and monitoring, troubleshooting, security, and leveraging open source solutions with RDS. Throughout, this session focuses on capabilities particular to RDS for PostgreSQL.
The document discusses using PostgreSQL for data warehousing. It covers advantages like complex queries with joins, windowing functions and materialized views. It recommends configurations like separating the data warehouse onto its own server, adjusting memory settings, disabling autovacuum and using tablespaces. Methods of extract, transform and load (ETL) data discussed include COPY, temporary tables, stored procedures and foreign data wrappers.
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha... (Altinity Ltd)
ClickHouse is a powerful open source analytics database that provides fast, scalable performance for data warehousing and real-time analytics use cases. It can handle petabytes of data and queries and scales linearly on commodity hardware. ClickHouse is faster than other databases for analytical workloads due to its columnar data storage and parallel processing. It supports SQL and integrates with various data sources. ClickHouse can run on-premises, in the cloud, or in containers. The ClickHouse operator makes it easy to deploy and manage ClickHouse clusters on Kubernetes.
Devrim Gunduz gives a presentation on Write-Ahead Logging (WAL) in PostgreSQL. WAL logs all transactions to files called write-ahead logs (WAL files) before changes are written to data files. This allows for crash recovery by replaying WAL files. WAL files are used for replication, backup, and point-in-time recovery (PITR) by replaying WAL files to restore the database to a previous state. Checkpoints write all dirty shared buffers to disk and update the pg_control file with the checkpoint location.
Reduce Resource Consumption & Clone in Seconds your Oracle Virtual Environmen... (BertrandDrouvot)
Bertrand Drouvot will present how to minimize resource consumption on a laptop by using Linux containers (LXC) and the btrfs file system. This allows quickly cloning an Oracle virtual environment, software, and databases in seconds using little disk space. Specific use cases that will be demonstrated include cloning a database software home to apply CPU updates, cloning a database to apply CPU updates, and cloning a PDB. The benefits of using LXC for cloning will also be compared to cloning without LXC.
Testing Delphix: easy data virtualization (Franck Pachot)
The document summarizes the author's testing of the Delphix data virtualization software. Some key points:
- Delphix allows users to easily provision virtual copies of database sources on demand for tasks like testing, development, and disaster recovery.
- It works by maintaining incremental snapshots of source databases and virtualizing the data access. Copies can be provisioned in minutes and rewound to past points in time.
- The author demonstrated provisioning a copy of an Oracle database using Delphix and found the process very simple. Delphix integrates deeply with databases.
- Use cases include giving databases to each tester/developer, enabling continuous integration testing, creating QA environments with real
(DAT402) Amazon RDS PostgreSQL: Lessons Learned & New Features (Amazon Web Services)
Learn the specifics of Amazon RDS for PostgreSQL’s capabilities and extensions that make it powerful. This session begins with a brief overview of the RDS PostgreSQL service, how it provides High Availability & Durability and will then deep dive into the new features that we have released since re:Invent 2014, including major version upgrade and newly added PostgreSQL extensions to RDS PostgreSQL. During the session, we will also discuss lessons learned running a large fleet of PostgreSQL instances, including specific recommendations. In addition we will present benchmarking results looking at differences between the 9.3, 9.4 and 9.5 releases.
Introduction to Vacuum Freezing and XID (PGConf APAC)
These are slides used by Masahiko Sawada of NTT, Japan for his presentation at pgDay Asia. He spoke about the internals of VACUUM and the XID wraparound issue in PostgreSQL.
The talk will cover most of the performance enhancements introduced to Scylla over the past 12 months. As throughput was already very good, we focused on Scylla's behaviour under all types of workloads and data models. Scylla improved its latency in all scenarios, improving the behaviour of data models such as large partitions and time series, the I/O scheduler, and the behaviour of streaming and repair.
The document compares the performance of SQL Server Integration Services (SSIS) packages between SQL Server 2005 and 2008. It describes a study that loaded and aggregated over 1 billion rows of data from flat files using SSIS packages on high-end Unisys servers. The study found that SSIS 2008 packages with the optimized dataflow engine were over 3 times faster than equivalent SSIS 2005 packages for the same workload. Hardware upgrades, SQL Server 2008 configuration changes, and SSIS package optimizations all contributed to improved performance.
Out of the box replication in postgres 9.4 (PGConf US) (Denish Patel)
This document contains notes from a presentation on PostgreSQL replication. It discusses write-ahead logs (WAL), replication history in PostgreSQL from versions 7.0 to 9.4, how to set up basic replication, tools for backups and monitoring replication, and demonstrates setting up replication without third party tools using pg_basebackup, replication slots, and pg_receivexlog. It also includes contact information for the presenter and an invitation to join the PostgreSQL Slack channel.
PGConf.ASIA 2019 Bali - Tune Your Linux Box, Not Just PostgreSQL - Ibrar Ahmed (Equnix Business Solutions)
This document discusses tuning Linux and PostgreSQL for performance. It recommends:
- Tuning Linux kernel parameters like huge pages, swappiness, and overcommit memory. Huge pages can improve TLB performance.
- Tuning PostgreSQL parameters like shared_buffers, work_mem, and checkpoint_timeout. Shared_buffers stores the most frequently accessed data.
- Other tips include choosing proper hardware, OS, and database based on workload. Tuning queries and applications can also boost performance.
This presentation discusses optimizing Linux systems for PostgreSQL databases. Linux is a good choice for databases due to its active development, features, stability, and community support. The presentation covers optimizing various system resources like CPU scheduling, memory, storage I/O, and power management to improve database performance. Specific topics include disabling transparent huge pages, tuning block I/O schedulers, and selecting appropriate scaling governors. The overall message is that Linux can be adapted for database workloads through testing and iterative changes.
This document provides information about Pythian, a company that provides database management and consulting services. It begins by introducing the presenter, Christo Kutrovsky, and his background. It then provides details about Pythian, including that it was founded in 1997, has over 200 employees, 200 customers worldwide, and 5 offices globally. It notes Pythian's partnerships and awards. The document emphasizes Pythian's expertise in Oracle, SQL Server, and other technologies. It positions Pythian as a recognized leader in database management.
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (Altinity Ltd)
ProxySQL 2.0 includes several new features such as query cache improvements, GTID causal reads for consistency, native Galera cluster support, Amazon Aurora integration, LDAP authentication, improved SSL support, a new audit log, and performance enhancements. It also adds new monitoring tables, variables, and configuration options to support these features.
1. The company is building advertising management platforms to help customers make smarter decisions and reach business goals faster using real-time data. They lead the online advertising market and strive to build long-term client relationships.
2. They are hiring a Data & BI Team Leader experienced in big data technologies like Hadoop and Impala to deliver real-time insights from large data sets for tasks like fraud detection and predictive analytics.
3. They chose to use Impala for its ability to perform interactive queries directly on HDFS data without relying on MapReduce, its compatibility with HiveQL, and its support through Cloudera Manager.
IBM FlashSystem and other SSDs are being adopted for OLTP and analytics applications. Fast 16Gb flash storage requires a reliable, high-performance network to ensure applications can utilize it effectively. Learn how to plan a high-speed, reliable network to handle the increased demands while delivering reliable application response times. Understand the reliability, performance, and simplified management features of Gen5 FC and Fabric Vision. Be prepared for the next jump in SANs.
SQL Server 2012 introduced columnstore indexes which provide significant performance improvements for data warehouse and analytics queries against large datasets. Columnstore indexes store data by column rather than by row, allowing queries to access only the relevant columns needed. This results in lower I/O and higher data compression compared to row storage. Columnstore indexes also use a new batch processing execution mode which can further improve query performance by processing many rows at once in memory rather than row-by-row. Columnstore indexes require the table to be read-only but provide an easy way to boost query performance for analytics workloads by 10-100x without needing separate data marts or cubes.
The document discusses indexes in SQL Server. It describes internal and external fragmentation that can occur in indexes. Internal fragmentation is unused space between records within a page, while external fragmentation is when page extents are not stored contiguously on disk. It provides examples of identifying fragmentation using system views and the dm_db_index_physical_stats dynamic management function. It also covers best practices for index types, such as numeric and date fields making good candidates while character fields are less efficient. Composite indexes, fill factor, and rebuilding vs. reorganizing indexes are also discussed.
This document provides information about an upcoming presentation on Columnstore Indexes in SQL Server 2014. It notes that the presentation will be recorded so that those who could not attend live can view it later. It requests that anyone with issues about being recorded should leave immediately, and remaining will be taken as consent to the recording. It also states the presentation will be free and will begin in 1 minute.
This talk at the Percona Live MySQL Conference and Expo describes open source column stores and compares their capabilities, correctness and performance.
SQL 2016 Mejoras en InMemory OLTP y Column Store Index (Eduardo Castro)
We look at the improvements SQL Server 2016 brings to In-Memory OLTP, the changes to Column Store Indexes, and their importance for performance.
Regards,
Ing. Eduardo Castro, PhD
Microsoft SQL Server MVP
SQL Server 2016 introduces new capabilities to help improve performance, security, and analytics:
- Operational analytics allows running analytics queries concurrently with OLTP workloads using the same schema. This provides minimal impact on OLTP and best performance.
- In-Memory OLTP enhancements include greater Transact-SQL coverage, improved scaling, and tooling improvements.
- The new Query Store feature acts as a "flight data recorder" for databases, enabling quick performance issue identification and resolution.
The Top Skills That Can Get You Hired in 2017 (LinkedIn)
We analyzed all the recruiting activity on LinkedIn this year and identified the Top Skills employers seek. Starting Oct 24, learn these skills and much more for free during the Week of Learning.
#AlwaysBeLearning https://learning.linkedin.com/week-of-learning
Faster transactions & analytics with the new SQL2016 In-memory technologies (Henk van der Valk)
This document contains information about in-memory technologies in Microsoft SQL Server including:
- In-memory OLTP which provides low-latency updates using memory-optimized tables and natively compiled stored procedures.
- Columnstore indexes which provide high data compression and fast analytical queries by storing data in columns.
- Resource governor which allows binding databases to resource pools to control CPU and memory usage.
- Various server hardware configurations recommended for in-memory workloads.
The document discusses best practices for using Oracle Database In-Memory. It provides an overview of In-Memory and describes how to configure and populate the In-Memory Column Store. It also discusses how the optimizer utilizes In-Memory statistics and hints to optimize queries for In-Memory. Several examples of queries that benefit from In-Memory, such as aggregation queries and queries with predicates, are also provided.
EqualLogic is changing how people experience storage by delivering dynamic virtual storage solutions that address changing business needs in real time at a reasonable cost. The presentation covers how EqualLogic avoids disruptions and underutilization compared to traditional storage, provides high performance for applications, and includes comprehensive data management features like snapshots, cloning and replication at no additional cost. It also demonstrates how EqualLogic integrates with virtualization platforms and simplifies management.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
See a recording of the webinar based on this presentation here on YouTube: https://youtu.be/GgLKodmL5xE
Masterclass series webinars, including on-demand access to all of this years recorded webinars: http://aws.amazon.com/campaigns/emea/masterclass/
Journey Through the Cloud webinar series, including on-demand access to all webinars so far this year: http://aws.amazon.com/campaigns/emea/journey/
Simplifying SQL with CTE's and windowing functions (Clayton Groom)
Too busy to learn the new capabilities of SQL Server? This session will cover several of the new features of the T-SQL language, specifically Common Table Expressions (CTEs) and windowing functions. This will be a code-heavy session with examples that you can readily leverage in your solutions.
The focus will be on techniques to shape and manipulate your data for easier consumption by your application, and to leverage your SQL Server to avoid writing code in your application.
A basic to intermediate understanding of T-SQL is required.
Hekaton is Microsoft SQL Server's in-memory OLTP engine. It allows for creating memory-optimized tables to fully leverage RAM and provide faster performance than disk-based tables. Memory-optimized tables use new row formats and indexing structures like hash and range indexes that are optimized for memory. Stored procedures can be natively compiled for maximum speed when operating on memory-optimized tables. There are some limitations around data types and features supported. Diagnostic objects like DMVs provide visibility into Hekaton's memory usage and performance.
The document introduces IBM's new DS8800 storage system. Key points:
- It provides faster performance, greater efficiency and scalability while maintaining reliability.
- Hardware upgrades include faster POWER6+ processors, higher density 2.5" drives, and more efficient airflow.
- It uses up to 50% less floor space but stores almost double the drives, lowering costs.
- Additional enhancements include simplified management, SSD options, and high availability features.
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum... (DataStax)
Advanced Apache Cassandra operations depends on an understanding of what features are available via the JMX interface. While nodetool exposes many of these, the most useful are still waiting to be discovered. The JMX interface allows the code base to expose functions that operate directly on internal structures, making real time changes to the way the process runs. With this skill in your toolkit there is no limit to the changes you can make.
In this talk Nate McCall, CTO at The Last Pickle, will explain how to explore, secure, and invoke the JMX interface exposed by Cassandra. He'll then move on to what you can do with it such as compacting specific SSTables, changing compaction on a single node, managing repairs, diagnosing latency, viewing cross node timeouts, and others. Whether you are a developer or operator, new or experienced, you will be given a thorough understanding of what all is available via JMX without having to consult the code on your own.
About the Speaker
Nate McCall CTO, The Last Pickle
Nate McCall has 16 years of server-side systems and software development experience. He started his involvement in the Cassandra community in the late fall of 2009 when he became one of the original developers on the Hector Java client. He has contributed a number of patches over the years to the Apache Cassandra code base and continues to be actively involved on the mail lists, issue system and IRC. He has been a DataStax MVP every year since the inception of the program.
Advanced Apache Cassandra Operations with JMX (zznate)
Nodetool is a command line interface for managing a Cassandra node. It provides commands for node administration, cluster inspection, table operations and more. The nodetool info command displays node-specific information such as status, load, memory usage and cache details. The nodetool compactionstats command shows compaction status including active tasks and progress. The nodetool tablestats command displays statistics for a specific table including read/write counts, space usage, cache usage and latency.
Conference: HP Big Data Conference 2015
Session: Real-world Methods for Boosting Query Performance
Presentation: "Extra performance out of thin air"
Presenter: Konstantine Krutiy, Principal Software Engineer / Vertica Whisperer
Company: Localytics
Description:
Learn how to get extra performance out of Vertica from areas you never expected.
This presentation will illustrate how you can improve performance of your Vertica cluster without extra budget.
All you need is ingenuity, knowledge of Vertica internals, and the ability to challenge conventional wisdom.
We will show you real world examples on gaining performance by eliminating unneeded work, eliminating unneeded system waits and making your system operate more efficiently.
Visit my blog http://www.dbjungle.com for more Vertica insights
Performance improvements in PostgreSQL 9.5 and beyond (Tomas Vondra)
This document discusses several performance improvements made in PostgreSQL versions 9.5 and beyond. Some key improvements discussed include:
- Faster sorting through allowing sorting by inlined functions, abbreviated keys for VARCHAR/TEXT/NUMERIC, and Sort Support benefits.
- Improved hash joins through reduced palloc overhead, smaller NTUP_PER_BUCKET, and dynamically resizing the hash table.
- Index improvements like avoiding index tuple copying, GiST and bitmap index scan optimizations, and block range tracking in BRIN indexes.
- Aggregate functions see speedups through using 128-bit integers for internal state instead of NUMERIC in some cases.
- Other optimizations affect PL/pgSQL performance,
The document discusses using the TPC-C benchmark to study Firebird database performance under load. It describes running tests with different Firebird configurations, hardware, and database sizes to determine optimal settings. Analysis found page size, buffer size, and hash slots impact performance, but settings optimized for HDDs did not always help SSD performance which responded differently. The tests provided valuable insights into Firebird performance tuning but also showed more analysis is needed to optimize configurations for different hardware.
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama... (Amazon Web Services)
Amazon Redshift is the new data warehouse service from Amazon Web Services. Redshift offers you fast query performance when analyzing data sets from a few hundred gigabytes to over a petabyte at a fraction of the cost of traditional solutions. In this webinar, we will take a detailed look at Redshift, including a live demonstration. This webinar is ideal for anyone looking to gain deeper insight into their data, without the usual challenges of time, cost and effort.
- Oracle Database 11g Release 2 provides many advanced features to lower IT costs including in-memory processing, automated storage management, database compression, and real application testing capabilities.
- It allows for online application upgrades using edition-based redefinition which allows new code and data changes to be installed without disrupting the existing system.
- Oracle provides multiple upgrade paths from prior database versions to 11g to allow for predictable performance and a safe upgrade process.
The document discusses storage virtualization using IBM's Storwize V7000 and SVC storage arrays. It provides an overview of the key benefits of storage virtualization such as reducing complexity, improving availability and enabling better use of tiered storage. It also summarizes the history and enhancements of the SVC software, features of Storwize V7000 such as Easy Tier and support for VMware vSphere.
This document provides an overview and best practices for using Amazon Redshift as a data warehouse. It discusses ingestion best practices like using multiple files for COPY and primary keys. It also covers data hygiene practices like analyzing tables and vacuuming regularly. Recent features like automatic compression, table restore, UDFs and interleaved sort keys are described. The document provides guidance on migrating workloads and tuning queries, including using WLM queues and the performance monitor in the console.
ClickHouse Materialized Views: The Magic Continues (Altinity Ltd)
Slides for the webinar, presented on February 26, 2020
By Robert Hodges, Altinity CEO
Materialized views are the killer feature of ClickHouse, and the Altinity 2019 webinar on how they work was very popular. Join this updated webinar to learn how to use materialized views to speed up queries hundreds of times. We'll cover basic design, last point queries, using TTLs to drop source data, counting unique values, and other useful tricks. Finally, we'll cover recent improvements that make materialized views more useful than ever.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series (Amazon Web Services)
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
Hypertable is an open source Bigtable clone that manages massive sparse tables with timestamped cell versions using a single primary key index. It is used by companies like Zvents and Baidu to process large amounts of data at scales of billions of cells per day and petabytes of data. Hypertable scales horizontally on commodity hardware and provides high performance through techniques like block caching, bloom filters, and access group optimizations. It is written in C++ for efficiency and provides client APIs in multiple languages.
Similar to Column store indexes and batch processing mode (nx power lite)
This document provides an overview of how to deploy a SQL Server 2019 Big Data Cluster on Kubernetes. It discusses setting up infrastructure with Ubuntu templates, installing Kubespray to manage the Kubernetes cluster lifecycle, and using azdata to deploy the Big Data Cluster. Key steps include creating an Ansible inventory, configuring storage with labels and profiles, and deploying the cluster. The document also offers tips on sizing, upgrades, and next steps like load balancing and monitoring.
Data relay introduction to big data clusters (Chris Adkin)
This document provides an overview of SQL Server 2019 Big Data Clusters, which enable hybrid SQL Server/Spark scale-out data platforms that run on Kubernetes. Big Data Clusters are available in public preview and will generally be available in the second half of 2019. They provide a true scale-out data platform for aggregating data from various sources, using data science tools with sensitive data on the same platform, and storing/querying large amounts of unstructured data with SQL Server tools.
Continuous Integration With Jenkins Docker SQL Server (Chris Adkin)
This document discusses using containers and Jenkins for continuous integration and deployment pipelines. It provides an overview of build pipelines, how they can be implemented as code in Jenkins using scripts or declarative syntax. Demonstrations are shown for simple webhooks, multi-branch pipelines, using build slaves in containers, image layering, and fully containerizing a build environment. Tips are provided on constructing Dockerfiles and using timeouts.
T-SQL programming guidelines, in terms of:
1. Commenting code
2. Code readability
3. General good practise
4. Defensive coding and error handling
5. Coding for performance and scalability
A presentation on best practices for J2EE scalability from requirements gathering through to implementation, including design and architecture along the way.
The document summarizes findings from a project testing batch processing performance using J2EE. It discusses considerations for batch frameworks, infrastructure, caching, logging, design challenges, and whether to use batch processing. It also outlines the design of the batch process used, including leveraging raw JDBC, Oracle caching, and tools for performance monitoring.
The document discusses tuning SQL queries in Oracle databases. It begins by noting that while tools can help, there is no single process for tuning every query as each case depends on factors like the schema design, data distribution and how the optimizer chooses a plan. The document then provides a methodology for investigating and tuning a query with poor performance, including getting the execution plan, checking it visually, and identifying possible causes like stale statistics, missing indexes or inefficient SQL.
5. The "Cache out" Curve
Every time we drop out of a cache and use the next slower one down, we pay a big throughput penalty.
(Chart: throughput against touched data size, stepping down from CPU cache to TLB, NUMA remote memory and storage.)
6. Sequential Versus Random Page CPU Cache Throughput
(Chart: throughput in million pages/sec, 0 to 1,000, against size of accessed memory, 0 to 32 MB, plotted for random pages, sequential pages and a single page; service time + wait time.)
7. Moore's Law Vs. Advancements In Disk Technology
"Transistors per square inch on integrated circuits has doubled every two years since the integrated circuit was invented"
Spinning disk state of play:
- Interfaces have evolved
- Areal density has increased
- Rotation speed has peaked at 15K RPM
- Not much else . . .
Up until NAND flash, disk-based IO subsystems have not kept pace with CPU advancements. With next generation storage (resistive RAM etc.) CPUs and storage may follow the same curve.
8. How do rows travel between iterators?
(Diagram: in row mode execution, control flows down the iterator tree and data flows back up, row by row between each pair of iterators.)
9.
- Query execution which leverages CPU caches.
- Breakthrough levels of compression to bridge the performance gap between IO subsystems and modern processors.
- Better query execution scalability as the degree of parallelism increases.
10. First introduced in SQL Server 2012, greatly enhanced in 2014.
A batch is roughly 1,000 rows in size and is designed to fit into the L2/L3 cache of the CPU; remember the slide on latency.
Moving batches around is very efficient*:
One test showed that regular row-mode hash join consumed about 600 instructions per row, while the batch-mode hash join needed about 85 instructions per row and in the best case (small, dense join domain) was as low as 16 instructions per row.
* From: Enhancements To SQL Server Column Stores, Microsoft Research
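To see batch mode for yourself, a minimal sketch (it assumes the enlarged FactInternetSales table from the test rig described later; the index name and column list are illustrative): create a column store index, then run an aggregate with the actual execution plan captured; the scan and hash aggregate iterators should report an execution mode of "Batch".

-- illustrative index over the columns the query touches
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactInternetSales
ON [dbo].[FactInternetSales] ( [ProductKey], [OrderQuantity], [SalesAmount] );

-- capture the actual plan and check the execution mode of each iterator
SELECT [ProductKey], SUM([SalesAmount])
FROM [dbo].[FactInternetSales]
GROUP BY [ProductKey];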
11. xperf -on base -stackwalk profile
SELECT p.EnglishProductName
,SUM([OrderQuantity])
,SUM([UnitPrice])
,SUM([ExtendedAmount])
,SUM([UnitPriceDiscountPct])
,SUM([DiscountAmount])
,SUM([ProductStandardCost])
,SUM([TotalProductCost])
,SUM([SalesAmount])
,SUM([TaxAmt])
,SUM([Freight])
FROM [dbo].[FactInternetSales] f
JOIN [dbo].[DimProduct] p
ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName
xperf -d stackwalk.etl
xperfview stackwalk.etl
12.
13. Conceptual View . . .
[Diagram: segments are loaded into the blob cache; blobs are broken into batches and pipelined into the CPU cache.]
. . . and what's happening in the call stack
16.
Compressing data going down the column is far superior to compressing data going across the row; we also only retrieve the column data that is of interest. Run length compression is used in order to achieve this. SQL Server 2012 introduced column store compression; SQL Server 2014 adds more features to this.
Column data (Colour): Red, Red, Blue, Blue, Green, Green, Green

Dictionary
Lookup ID    Label
1            Red
2            Blue
3            Green

Segment
Lookup ID    Run Length
1            2
2            2
3            3
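As an illustration only (this is not the engine's internal code), a small T-SQL sketch that derives the run lengths above from the raw Colour values using a gaps-and-islands calculation:

WITH Colours(rn, Colour) AS
(
    SELECT v.rn, v.Colour
    FROM (VALUES (1, 'Red'), (2, 'Red'), (3, 'Blue'), (4, 'Blue'),
                 (5, 'Green'), (6, 'Green'), (7, 'Green')) AS v(rn, Colour)
),
Runs AS
(
    -- rows in the same consecutive run share the same (Colour, grp) pair
    SELECT rn, Colour,
           rn - ROW_NUMBER() OVER (PARTITION BY Colour ORDER BY rn) AS grp
    FROM Colours
)
SELECT Colour, COUNT(*) AS RunLength   -- yields Red 2, Blue 2, Green 3
FROM Runs
GROUP BY Colour, grp
ORDER BY MIN(rn);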
17. SQL Server 2014 Column Store Storage Internals
[Diagram: rows arrive as row groups (A, B, C) of up to 1,048,576 rows; each row group is encoded and compressed one column at a time into segments, which are stored as blobs. Smaller inserts land in delta stores before being encoded and compressed.]
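These structures are visible through the catalog views. A sketch, assuming the enlarged FactInternetSales table described later carries a columnstore index (sys.column_store_segments has shipped since SQL Server 2012):

SELECT p.partition_number,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.on_disk_size          -- compressed size of this column segment blob
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
  ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.FactInternetSales')
ORDER BY s.column_id, s.segment_id;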
18. Column store insert and update paths:
Inserts of 1,048,576 rows and over are encoded and compressed directly into column store segments.
Inserts of less than 1,048,576 rows, and updates, go via a delta store B-tree; an update = an insert into the delta store + an insert into the deletion bitmap.
The tuple mover converts filled delta stores into column store segments.
[Diagram also shows the global dictionary, local dictionary and deletion bitmap.]
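A hedged sketch of how to watch and nudge this machinery in SQL Server 2014; the index name is assumed, and REORGANIZE simply asks the tuple mover to compress any CLOSED delta store row groups:

-- force the tuple mover rather than waiting for it to wake up
ALTER INDEX cci_FactInternetSales ON [dbo].[FactInternetSales] REORGANIZE;

-- row groups move from OPEN (delta store) to CLOSED to COMPRESSED
SELECT row_group_id, state_description, total_rows, deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactInternetSales')
ORDER BY row_group_id;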
20. Query
SELECT a.number
INTO
OrderedSequence
FROM
master..spt_values AS a
CROSS JOIN master..spt_values AS b
CROSS JOIN master..spt_values AS c
WHERE c.number <= 57
ORDER BY a.number
SELECT a.number
INTO
RandomSequence
FROM
master..spt_values AS a
CROSS JOIN master..spt_values AS b
CROSS JOIN master..spt_values AS c
WHERE c.number <= 57
ORDER BY NEWID()
Results (1.5 billion rows per table; uncompressed size 39,233.86 MB; row groups of 1,048,576 rows):

Table              Size After Column Store Compression
OrderedSequence    17.85 MB
RandomSequence     18.48 MB
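A sketch of how figures like these can be reproduced: the SELECT ... INTO statements above build plain heaps, so convert each one to a clustered column store (SQL Server 2014) and compare sp_spaceused output before and after:

EXEC sp_spaceused 'OrderedSequence';   -- heap size before compression
EXEC sp_spaceused 'RandomSequence';

CREATE CLUSTERED COLUMNSTORE INDEX cci_OrderedSequence ON OrderedSequence;
CREATE CLUSTERED COLUMNSTORE INDEX cci_RandomSequence  ON RandomSequence;

EXEC sp_spaceused 'OrderedSequence';   -- size after column store compression
EXEC sp_spaceused 'RandomSequence';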
21.
Feature                                                                            SQL Server 2012   SQL Server 2014
Column store indexes                                                               Yes               Yes
Clustered column store indexes                                                     No                Yes
Updateable column store indexes                                                    No                Yes
Column store archive compression                                                   No                Yes
Columns in a column store index can be dropped                                     No                Yes
Support for GUID, binary, datetimeoffset precision > 2, numeric precision > 18     No                Yes
Enhanced compression by storing short strings natively (instead of 32 bit IDs)     No                Yes
Bookmark support (row_group_id:tuple_id)                                           No                Yes
Mixed row / batch mode execution                                                   No                Yes
Optimized hash build and join in a single iterator                                 No                Yes
Hash memory spills no longer force row mode execution                              No                Yes

Iterators supported in batch mode (2012): scan, filter, project, hash (inner) join and (local) hash aggregate; SQL Server 2014 extends this set.
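Archive compression from the matrix above is applied per index (or per partition) at build or rebuild time; a minimal sketch, assuming a clustered column store index named cci_FactInternetSales:

-- archive compression trades CPU for a smaller on-disk footprint
ALTER INDEX cci_FactInternetSales ON [dbo].[FactInternetSales]
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- and back to the regular column store compression
ALTER INDEX cci_FactInternetSales ON [dbo].[FactInternetSales]
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE);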
22. Disclaimer: your own mileage may vary depending on your data, hardware
and queries
23. Hardware
2 x 2.0 GHz 6 core Xeon CPUs
Hyper-threading enabled
22 GB memory
RAID 0: 6 x 250 GB SATA III HDD 10K RPM
RAID 0: 3 x 80 GB Fusion IO
Software
Windows Server 2012
SQL Server 2014 CTP 2
AdventureWorksDW DimProduct table
Enlarged FactInternetSales table
24. Compression Type / Time (ms)

SELECT SUM([OrderQuantity])
,SUM([UnitPrice])
,SUM([ExtendedAmount])
,SUM([UnitPriceDiscountPct])
,SUM([DiscountAmount])
,SUM([ProductStandardCost])
,SUM([TotalProductCost])
,SUM([SalesAmount])
,SUM([TaxAmt])
,SUM([Freight])
FROM [dbo].[FactInternetSales]

[Chart: elapsed time (ms, scale 0 to 300,000) for the query above under each compression type. No compression: 2,050 MB/s at 85% CPU; row compression: 678 MB/s at 98% CPU; page compression: 256 MB/s at 98% CPU.]
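For reference, the three variants in the chart can be produced by rebuilding the heap between timed runs; a sketch (one statement per test run):

ALTER TABLE [dbo].[FactInternetSales] REBUILD WITH (DATA_COMPRESSION = NONE);
ALTER TABLE [dbo].[FactInternetSales] REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER TABLE [dbo].[FactInternetSales] REBUILD WITH (DATA_COMPRESSION = PAGE);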
28. We will look at the best we can do without column store indexes:
Partitioned heap fact table with page compression for spinning disk
Partitioned heap fact table without any compression for flash storage
Non-partitioned column store indexes on both types of storage, with and without archive compression
SELECT p.EnglishProductName
,SUM([OrderQuantity])
,SUM([UnitPrice])
,SUM([ExtendedAmount])
,SUM([UnitPriceDiscountPct])
,SUM([DiscountAmount])
,SUM([ProductStandardCost])
,SUM([TotalProductCost])
,SUM([SalesAmount])
,SUM([TaxAmt])
,SUM([Freight])
FROM [dbo].[FactInternetSales] f
JOIN [dbo].[DimProduct] p
ON
f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName
30. Join Scalability DOP / Time (ms)
[Chart: elapsed time (ms, scale 0 to 60,000) against degree of parallelism (2 to 24) for four series: hdd column store, hdd column store archive, flash column store, flash column store archive.]
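A sketch of how a DOP sweep like this can be driven: rerun the join query with a different MAXDOP hint per run and record the elapsed time (the hint values mirror the x-axis above):

SELECT p.EnglishProductName,
       SUM(f.SalesAmount) AS SalesAmount
FROM [dbo].[FactInternetSales] AS f
JOIN [dbo].[DimProduct] AS p
  ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName
OPTION (MAXDOP 12);   -- repeat for 2, 4, 6 ... 24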
31.
32. A SQL Server workload should scale up to the limits of the hardware, such that:
All CPU capacity is exhausted, or
All storage IOPS / bandwidth is exhausted.
As concurrency increases, we need to watch out for "the usual suspects" that can throttle throughput back (see the DMV sketch below):
Latch contention
Lock contention
Spinlock contention
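A hedged sketch of the DMVs that surface each suspect; all three views predate SQL Server 2012, though the counters are cumulative since instance start and are best compared as deltas between two samples:

-- latch contention
SELECT TOP (5) latch_class, waiting_requests_count, wait_time_ms
FROM sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;

-- lock contention
SELECT TOP (5) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK[_]%'
ORDER BY wait_time_ms DESC;

-- spinlock contention
SELECT TOP (5) name, collisions, spins, backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;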
36. What most people tend to have:
CPU used for IO consumption + CPU used for decompression < total CPU capacity
Compression works for you.
37. CPU used for IO consumption + CPU used for decompression > total CPU capacity
Compression works against you.
CPU used for IO consumption + CPU used for decompression = total CPU capacity
Nothing to be gained or lost from using compression.
38. No significant difference in terms of performance between column store compression and column store archive compression.
Pre-sorting the data makes little difference to compression ratios.
Batch mode:
Provides a tremendous performance boost with just two schedulers.
Does not provide linear scalability with the hardware available.
Does provide an order of magnitude increase in JOIN performance.
Performs marginally better with column store indexes which do not use archive compression.
39. Enhancements To Column Store Indexes (SQL Server 2014), Microsoft Research
SQL Server Clustered Columnstore Tuple Mover, Remus Rusanu
SQL Server Columnstore Indexes at TechEd 2013, Remus Rusanu
The Effect of CPU Caches and Memory Access Patterns, Thomas Kejser
Note the difference in latency between accessing the on-CPU cache and main memory: accessing main memory incurs a large penalty in terms of lost CPU cycles. This is important, and it is one of the drivers behind the new batch execution mode introduced in SQL Server 2012 to support column store indexes.
This slide follows on from the previous one and quantifies what we lose in terms of throughput as we drop out of the different caches and IO subsystems in the cache / memory / IO subsystem hierarchy.
Each operator has an open(), close() and next() method. An operator pulls data through the plan by calling the next() method on the next operator down; the root operator drives the control flow for the whole plan. Data is moved row by row throughout the entire plan, which is inefficient in terms of CPU instructions per row and prone to expensive CPU cache misses.
Xperf can provide deep insights into the database engine that other tools cannot; in this case we can walk the stack associated with query execution and observe the total CPU consumption up to any point in the stack in milliseconds.
According to the Microsoft research paper on improvements made to column store indexes and batch mode in SQL 2014, the large object cache is new and it stores blob data contiguously in memory without any page breaks. The reason for this is that sequential page access for a CPU cache gives twice the throughput compared to single page access. Refer to slide 73 of “Super Scaling SQL Server Diagnosing and Fixing Hard Problems” by Thomas Kejser.
The slides on cache, memory and disk latency and "the cache out curve" have been building up to this one particular slide. It is leveraging the on-die CPU cache that makes this speed-up possible. The one big takeaway of using column store indexes is on this slide: based on two logical CPUs alone, there are tremendous performance improvements to be had through batch mode.
This slide illustrates the efficiency of the batch mode hash aggregate versus its row mode counterpart; also consider that the time of 78,400 ms is split across two logical CPUs (schedulers).
Run length compression is a generic term that pre-dates the column store functionality in SQL Server; it alludes to the technique of compressing data by converting sequences of values into encoded "run lengths". The database engine scans down a column and stores each unique value it encounters in a dictionary, which can be local to a segment (the basic column store unit of storage, containing roughly a million rows) and / or global to the column store. Where sequences of values are found, they are stored as encoded run lengths. In the example above, the sequence of two red values is stored as 1, 2 and so on.
Delta stores are new to SQL Server 2014; they provide the means by which existing column stores can be inserted into. SQL Server 2014 also introduces column store archive compression. Writes to blobs, which row groups are stored as, are sequential in nature; for trickle inserts, the presence of a delta store (a B-tree) acts as a buffer to mitigate this. Updates take place by setting the column store's deletion bitmap and inserting a new row into the column store via a delta store.
Not much difference. Why? Column stores do not store data in sorted order; however, the encoding and compression process can reorder data to help achieve better levels of compression.
Something on this slide, specifically around compression on flash, requires further investigation. As the level of compression goes up on flash, IO throughput goes down and CPU consumption goes up. Hypothesis: the flash throughput is so high that the CPU resource required to perform the decompression becomes a factor and throttles IO throughput back, the net effect being that elapsed execution time is longer.
This is subtly different to the scenario we had with row and page compression. The hypothesis for sustained CPU consumption going down when column store archive compression is used is that column store archive decompression does not scale that well across multiple schedulers, and / or the process of decompressing individual segments is single threaded in nature.
Something strange happens around a DOP of 10 which I have not yet had the chance to investigate.
With the flash clustered column store index (with / without archive compression) we get reasonable scalability up to a DOP of 10; with the partitioned table on the previous slide, we only got to a DOP of 8 before we started getting diminishing returns.
Elapsed time should decrease in a linear fashion as CPU consumption increases in a linear manner; since it does not, the obvious explanation is that we are burning CPU cycles in a non-productive manner.
If your IO subsystem does not have enough throughput to keep your processors busy, there is value to be had in using compression. On the other hand, if your IO subsystem can keep up with your CPUs and then some, the use of compression can send performance backwards.