SQL Server 2012 introduced columnstore indexes, which provide significant performance improvements for data warehouse and analytics queries against large datasets. Columnstore indexes store data by column rather than by row, so a query reads only the columns it needs. This lowers I/O and allows higher compression than row storage. Columnstore indexes also use a new batch-processing execution mode, which can further improve query performance by processing many rows at once in memory rather than row by row. In SQL Server 2012 a table with a columnstore index becomes read-only, but the index is an easy way to speed up analytics workloads by 10-100x without separate data marts or cubes.
In-memory features are the most promising trend in high-performance data processing. Columnstore indexes are one such feature, and even with their restrictions they can speed up your queries many times over. How do you get more from this feature? In which situations should you use it? Which internal mechanisms make the speedup possible? This session answers these questions.
2. Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• CS indexes Internals
• Adding data to Columnstore index
3. Row Store and Column Store
In row store, data is stored tuple by tuple.
In column store, data is stored column by column
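A minimal sketch of the two layouts, assuming a tiny in-memory table. The sample rows, the column names, and both helper functions are invented for illustration and are not how SQL Server stores data internally.

```python
# Hypothetical sample data; the values are invented for illustration.
rows = [
    (1, "Ann", "Moscow", 31),
    (2, "Boris", "Kyiv", 45),
    (3, "Clara", "Moscow", 28),
]
column_names = ("id", "name", "region", "age")

# Row store: one tuple per record, all attributes kept together.
row_store = list(rows)

# Column store: one list per attribute, values kept together by column.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def names_in_region_row_store(region):
    # Every tuple is visited in full, including id and age,
    # which this query never uses.
    return [r[1] for r in row_store if r[2] == region]


def names_in_region_column_store(region):
    # Only the "region" and "name" columns are touched.
    regions = column_store["region"]
    names = column_store["name"]
    return [names[i] for i, value in enumerate(regions) if value == region]


print(names_in_region_row_store("Moscow"))
print(names_in_region_column_store("Moscow"))
```

Both functions return the same names; the difference is how much data each one has to walk through to get there.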
4. Row Store and Column Store
Most queries do not process all the attributes of a particular relation.
Figure columns: name, address, id, city, state, age
SELECT c.name, c.address
FROM Customers c
WHERE c.region = 'Moscow'
5. Row Store and Column Store
Row Store:
(+) Easy to add/modify a record
(-) Might read in unnecessary data
Column Store:
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses
So column stores are suitable for read-mostly, read-intensive, large data repositories
6. Compression
Trades I/O for CPU
Higher data value locality in column stores
Techniques such as run length encoding far more useful
Schemes
Null Suppression
Dictionary encoding
Run Length encoding
Bit-Vector encoding
Heavyweight schemes
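Two of the schemes listed above, run length encoding and dictionary encoding, can be sketched in a few lines. This is a simplified illustration on an invented column of values; the function names are hypothetical and the actual on-disk formats used by a columnstore engine are more involved.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = {}
    ids = []
    for value in values:
        # A value seen for the first time gets the next free id.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


# Invented sample column: neighbouring values often repeat within a column.
region = ["Kyiv", "Kyiv", "Kyiv", "Moscow", "Moscow", "Kyiv"]

rle = run_length_encode(region)
dictionary, ids = dictionary_encode(region)

print(rle)
print(dictionary, ids)
```

Run length encoding pays off when equal values sit next to each other, which is more likely inside a single column than across the mixed attributes of a row.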
9. Improved Data Warehouse Query performance
• Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
• Performance improvements for “typical” data warehouse queries from 10x to 100x
• Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
10. Good Candidates for Columnstore Indexing
Table candidates:
• Very large fact tables (for example, billions of rows)
• Larger dimension tables (millions of rows) with compression-friendly column data
• If unsure, it is easy to create a columnstore index and test the impact on your query workload
Query candidates (against table with a columnstore index):
• Scan versus seek (columnstore indexes don’t support seek operations)
• Aggregated results far smaller than table size
• Joins to smaller dimension tables
• Filtering on fact / dimension tables – star schema pattern
• Sub-set of columns (being selective in columns versus returning ALL columns)
12. Defining the Columnstore Index
• Columnstore index is nonclustered (secondary)
• Base table can be clustered index or heap
• One CS index per table
• Multiple other nonclustered (B-tree) indexes allowed
• But may not be needed
• CS index must be partition-aligned if table is partitioned
[Figure: base table (clustered index OR heap) with two nonclustered indexes and a nonclustered columnstore index; indexed view; filtered index]
13. Column Segments and Dictionaries
[Figure: a table with columns C1 to C6 split into sets of about 1M rows, labeled segment 1 … segment N; each column within a set forms a column segment; dictionaries are stored alongside]
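A rough sketch of how one column could be cut into segments with per-segment minimum and maximum values recorded as metadata. The function name, the dictionary keys, and the sample dates are assumptions made for this illustration; real segments hold about 1M rows, not three.

```python
def build_segments(values, rows_per_segment):
    """Split one column into fixed-size segments and record min/max for each."""
    segments = []
    for start in range(0, len(values), rows_per_segment):
        chunk = values[start:start + rows_per_segment]
        segments.append({
            "segment_id": len(segments) + 1,
            "row_count": len(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "values": chunk,
        })
    return segments


# Tiny stand-in for a real column (invented values).
dates = [20120101, 20120115, 20120131, 20120201, 20120214, 20120228, 20120301]
segments = build_segments(dates, rows_per_segment=3)

for seg in segments:
    print(seg["segment_id"], seg["row_count"], seg["min"], seg["max"])
```

The last segment is simply shorter when the row count is not a multiple of the segment size.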
14. Memory management
• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• Columnstore segments flow between disk and memory
SELECT C2, SUM(C4)
FROM T
GROUP BY C2;
[Figure: segments of columns T.C1 to T.C4 moving between disk and memory for this query]
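The "needed columns fetched into memory" idea can be mimicked with a small lazy cache. This is only an illustration: the DISK dictionary, the ColumnCache class, and the sample values are hypothetical, and the real buffer management is automatic and far more elaborate.

```python
from collections import defaultdict

# Hypothetical "disk" contents: one list of values per column of table T.
DISK = {
    "C1": [1, 2, 3, 4],
    "C2": ["a", "b", "a", "b"],
    "C3": [0.5, 0.7, 0.9, 1.1],
    "C4": [10, 20, 30, 40],
}


class ColumnCache:
    """Loads a column from the simulated disk the first time it is requested."""

    def __init__(self, disk):
        self._disk = disk
        self.in_memory = {}

    def column(self, name):
        if name not in self.in_memory:
            self.in_memory[name] = list(self._disk[name])  # simulated read
        return self.in_memory[name]


def sum_c4_by_c2(cache):
    """Equivalent of: SELECT C2, SUM(C4) FROM T GROUP BY C2."""
    totals = defaultdict(int)
    for key, value in zip(cache.column("C2"), cache.column("C4")):
        totals[key] += value
    return dict(totals)


cache = ColumnCache(DISK)
result = sum_c4_by_c2(cache)
print(result)
print(sorted(cache.in_memory))  # columns that were actually loaded
```

After the query runs, only C2 and C4 are in the cache; C1 and C3 were never read.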
16. xVelocity
Microsoft SQL Server family of memory-optimized and in-memory technologies:
• xVelocity In-Memory Analytics Engine
• xVelocity Memory-Optimized Columnstore Indexes
The xVelocity engine is designed with 3 principles in mind: Performance, Performance, Performance!
17. How Are These Performance Gains Achieved?
Two complementary technologies:
Storage
• Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).
New “batch mode” execution
• Vector-based query execution capability
• Data can then be processed in batches versus row-by-row
Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O
18. Batch mode processing
• Process ~1000 rows at a time
• Vector operators implemented
• Greatly reduced CPU time (7 to 40X)
[Figure: batch object made up of column vectors and a bitmap of qualifying rows]
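A toy model of batch-at-a-time processing with a bitmap of qualifying rows, as opposed to calling an operator once per row. The function names, the generated data, and the filter threshold are assumptions for illustration; this does not reproduce the actual vector operators in the query processor.

```python
BATCH_SIZE = 1000  # the slide gives roughly 1000 rows per batch


def batches(columns, batch_size=BATCH_SIZE):
    """Yield batches; each batch holds one vector (list slice) per column."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, batch_size):
        yield {
            name: vector[start:start + batch_size]
            for name, vector in columns.items()
        }


def filter_batch(batch, column, predicate):
    """Return a bitmap marking which rows of the batch qualify."""
    return [predicate(value) for value in batch[column]]


def sum_batch(batch, column, bitmap):
    """Sum one column vector, counting only rows flagged in the bitmap."""
    return sum(value for value, keep in zip(batch[column], bitmap) if keep)


# Invented data: 2500 rows, dates cycling through 28 consecutive values.
columns = {
    "date": [20120101 + (i % 28) for i in range(2500)],
    "amount": [1] * 2500,
}

total = 0
batch_count = 0
for batch in batches(columns):
    bitmap = filter_batch(batch, "date", lambda d: d >= 20120115)
    total += sum_batch(batch, "amount", bitmap)
    batch_count += 1

print(batch_count, total)
```

Each operator is invoked once per batch rather than once per row, which is where the per-row call overhead is saved.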
19. Segment Elimination
select Date, count(*)
from dbo.Purchase
where Date >= 20120201
group by Date

column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228
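The elimination decision for this query can be sketched using the segment metadata rows from the slide. The SegmentMeta class and the segments_to_scan function are hypothetical names introduced here; only the min/max comparison itself reflects the technique.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    column_id: int
    segment_id: int
    min_data_id: int
    max_data_id: int


# Metadata rows copied from the slide.
SEGMENTS = [
    SegmentMeta(1, 1, 20120101, 20120131),
    SegmentMeta(1, 2, 20120115, 20120215),
    SegmentMeta(1, 3, 20120201, 20120228),
]


def segments_to_scan(segments, lower_bound):
    """Keep only segments that may contain a value >= lower_bound."""
    return [s.segment_id for s in segments if s.max_data_id >= lower_bound]


# Predicate from the query: Date >= 20120201
print(segments_to_scan(SEGMENTS, 20120201))
```

Segment 1 has a maximum of 20120131, which is below the filter value, so it can be skipped without being read. Segment 2 only partly overlaps the range and still has to be scanned.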
20. Columnstore format + batch mode
Variations:
• Columnstore indexing alone + traditional row mode in Query Processor
• Columnstore indexing + batch mode in Query Processor
• Columnstore indexing + hybrid of batch and traditional row mode in Query Processor
21. Plan operators supported in batch mode
Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build
23. Maintaining Data in a Columnstore Index
Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
ALTER INDEX REBUILD / REORGANIZE not allowed
How can I modify index data?
• Drop columnstore index / make modifications / add columnstore index
• UNION ALL (but be sure to validate performance)
• Partition switches (IN and OUT)
25. Summary
SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
• 10x to 100x performance improvement depending on the schema and query
• I/O reduction and memory savings through columnstore compressed storage
• CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
• No need to create intermediate tables, aggregates, pre-processing and cubes
Interoperability with partitioning
26. Resources
Columnar Storage in SQL Server 2012 (PDF)
SQL Server Columnstore Performance Tuning
Inside the SQL Server 2012 Columnstore Indexes
24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
SQL Server Columnstore Performance Tuning (video)
27. SQL SERVER 2012 - COLUMNSTORE INDEXES
Denis Reznik
Senior Database Architect at The Frayman Group
Microsoft SQL Server MVP
denisreznik@live.ru
@denisreznik
http://reznik.uneta.com.ua