This document contains summaries of presentations and information about the #JSS2015 conference on SQL Server organized by GUSS (the French SQL Server user group). It covers speakers David Barbarin and Frédéric Pichaut and the topics presented, including columnstore architecture, columnstore improvements in SQL Server 2016, in-memory OLTP architecture and improvements, and in-memory features that remain unsupported.
This webinar by Volodymyr Trishyn (Senior Software Engineer, Consultant, GlobalLogic) was delivered at On Air webinar #15 on July 31, 2020.
Webinar agenda:
- SQL Database
- Azure SQL Data Warehouse
- Azure SQL Elastic Database Pool
- Geo-replication
- Distributed Transactions
- Transaction Isolation Level
- Table Partitioning
- Materialized View Pattern
More details and presentation: https://www.globallogic.com/ua/about/events/webinar-azure-sql/
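As a taste of the table-partitioning topic on the agenda, here is a minimal T-SQL sketch of range partitioning. All table, column, and object names are hypothetical and not taken from the webinar:

```sql
-- Partition function: maps order dates to partitions by month boundary.
CREATE PARTITION FUNCTION pf_OrderDate (date)
    AS RANGE RIGHT FOR VALUES ('2020-01-01', '2020-02-01', '2020-03-01');

-- Partition scheme: maps every partition to a filegroup (all to PRIMARY here).
CREATE PARTITION SCHEME ps_OrderDate
    AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- Create the table on the partition scheme; the primary key includes the
-- partitioning column so the index is aligned with the partitions.
CREATE TABLE dbo.Sales (
    SaleId    bigint IDENTITY NOT NULL,
    OrderDate date   NOT NULL,
    Amount    money  NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY (SaleId, OrderDate)
) ON ps_OrderDate (OrderDate);
```

With this layout, old months can be switched out or archived per partition instead of row by row.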
An introduction to the world of Clustered Columnstore Indexes in SQL Server 2014, with explanations of the basic structures and functionality.
Supported data types, limitations, and differences from SQL Server 2012 Nonclustered Columnstore Indexes are all described here.
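For readers new to the feature, a minimal sketch of creating a SQL Server 2014 clustered columnstore index; the table and index names are hypothetical:

```sql
-- Hypothetical fact table (no other indexes: in SQL Server 2014 a table
-- with a clustered columnstore index cannot have additional indexes).
CREATE TABLE dbo.FactSales (
    DateKey     int   NOT NULL,
    ProductKey  int   NOT NULL,
    Quantity    int   NOT NULL,
    SalesAmount money NOT NULL
);

-- The clustered columnstore index replaces the rowstore and becomes the
-- (updatable) primary storage for the table.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;
```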
SkySQL implements a groundbreaking, state-of-the-art architecture based on Kubernetes and ServiceNow, and with a strong emphasis on cloud security – using compartmentalization and indirect access to secure and protect customer databases.
In this session, we’ll walk through the architecture of SkySQL and discuss how MariaDB leverages an advanced Kubernetes operator and powerful ServiceNow configuration/workflow management to deploy and manage databases on cloud infrastructure.
This DataStage online training will equip you with the skills required to work with IBM DataStage. DataStage is an ETL tool that uses graphical notation for data integration. It is IBM's flagship product in Business Intelligence.
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr... — ScyllaDB
To maximize the benefits of ScyllaDB, you must adapt the structure of your data. Data modeling for ScyllaDB should be query-driven based on your access patterns – a very different approach than normalization for SQL tables. In this session, you will learn how tools can help you migrate your existing SQL structures to accelerate your digital transformation and application modernization.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Introducing the ultimate MariaDB cloud, SkySQL — MariaDB plc
SkySQL is the first and only database-as-a-service (DBaaS) engineered for MariaDB by MariaDB, to use a state-of-the-art multi-cloud architecture built on Kubernetes and ServiceNow, and to deploy databases and data warehouses for transactional, analytical and hybrid transactional/analytical workloads.
In this session, we’ll lay out the vision for SkySQL, provide an overview of its capabilities, take a tour of its architecture, and discuss the long-term roadmap. We’ll wrap things up with a live demo of SkySQL, including a preview of its deep learning–based workload analysis and visualization interface.
MySQL can now be used as a document store, combining the flexibility of the document model with the power of the relational model. You'll understand why you can choose MySQL for both your relational and document store needs, avoiding significant trade-offs and the need to adopt multiple solutions.
CCV: migrating our payment processing system to MariaDB — MariaDB plc
CCV is a Dutch payment processor and loyalty provider. Its current payment processing platform is built on top of Microsoft SQL Server, and it is in the process of migrating to MariaDB, with the first production transactions expected to run in 2020. In this session, Ernst Wernicke and Harry Dijkstra of CCV share how they are using MariaDB to meet critical high-availability requirements, including geographic replication, zero data loss, zero downtime (both planned and unplanned) and no single point of failure anywhere.
What to expect from MariaDB Platform X5, part 1 — MariaDB plc
MariaDB Platform X5 will be based on MariaDB Enterprise Server 10.5. This release includes Xpand, a fully distributed storage engine for scaling out, as well as many new features and improvements for DBAs and developers alike, including enhancements to temporal tables, additional JSON functions, a new performance schema, non-blocking schema changes with clustering and a Hashicorp Vault plugin for key management.
In this session, we’ll walk through all of the new features and enhancements available in MariaDB Enterprise Server 10.5. In addition, we will highlight those being backported to maintenance releases of MariaDB Enterprise Server 10.2, 10.3 and 10.4.
ClustrixDB: how distributed databases scale out — MariaDB plc
ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.
In these Microsoft slides we can see what SQL Server 2014 brings in areas such as: memory-optimized tables, changes to cardinality estimation, backup encryption, architecture improvements, AlwaysOn, changes to Resource Governor, and data files in Azure.
MySQL 8.0 is the latest Generally Available version of MySQL. This session will give a brief introduction to MySQL 8.0 and help you upgrade from older versions, understand what utilities are available to make the process smoother and also understand what you need to bear in mind with the new version and considerations for possible behaviour changes and solutions. It really is a simple process.
Making MySQL Great For Business Intelligence — Calpont
This presentation describes how to make MySQL a great database for business intelligence, and presents a special focus on column databases and InfiniDB from Calpont
Scylla Summit 2016: ScyllaDB, Present and Future — ScyllaDB
Where is Scylla now and where is it going? ScyllaDB's CTO Avi Kivity outlines the 3 ScyllaDB Commitments, and gives an overview of the ScyllaDB road map.
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ... — Amazon Web Services
Get a look under the covers: learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. You'll then hear from a customer who has leveraged Redshift in their industry and adopted many of the best practices. Learn more: https://aws.amazon.com/government-education/
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan... — Databricks
At Spark Summit 2017, we described our framework to migrate production Hive workload to Spark with minimal user intervention. After a year of migration, Spark now powers an important part of our batch processing workload. The migration framework supports syntax compatibility analysis, offline/online shadowing, and data validation.
In this session, we first introduce new features and improvements in the migration framework to support bucketed tables and increase automation. Next, we will deep dive into the top technical challenges we encountered and how we addressed them. We improved the syntax compatibility between Hive and Spark from around 51% to 85% by identifying and developing top missing features, fixing incompatible UDFs, and implementing a UDF testing framework. In addition, we developed reliable join operators to improve Spark stability in production when leveraging optimizations such as ShuffledHashJoin.
Finally, we will share an update on our overall migration effort and examples of migrations wins. For example, we were able to migrate one of the most complicated workloads in Facebook from Hive to Spark with more than 2.5X performance gain.
Oracle 12c New Features For Better Performance — Zohar Elkayam
Oracle 12cR1 and 12cR2 came with some great features for better performance and scaling. In this session we will talk about some of the new features that can improve performance greatly: optimizer changes, adaptive plan improvements, changes to statistics gathering, and we'll get to know Oracle 12cR2's new sharding option.
On the agenda:
- Oracle Database In Memory (Column Store)
- Oracle Sharding (12.2.0.1)
- Optimizer changes in 12c
- Statistics changes in 12c.
Presented first at ilOUG - Israel Oracle User Group meetup in February 2017.
[including promised hidden slide.. :) ]
Breakout: Operational Analytics with Hadoop — Cloudera, Inc.
Operationalizing models and responding to large volumes of data, fast, requires bolt-on systems that can struggle with processing (transforming the data), consistency (always responding to data), and scalability (processing and responding to large volumes of data). If the data volume becomes too large, these traditional systems fail to deliver their responses, resulting in significant losses to organizations. Join this breakout to learn how to overcome the roadblocks.
Operational Analytics at Credit Suisse from ThousandEyes Connect — ThousandEyes
Darrell Westbury, Director of Operational Analytics at Credit Suisse, presents on how the global bank collects five types of IT operations data, analyzes it and uses it to derive insights.
Operational Analytics Using Spark and NoSQL Data Stores — DATAVERSITY
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark:
•The emergence of NoSQL, Hadoop and Apache Spark
•NoSQL Use Cases
•The need for operational analytics
•Types of operational analysis
•Key requirements for operational analytics
•Operational analytics using the Basho Data Platform with Apache Spark.
One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit.
This is the slide deck from the webinar. James Taylor, CEO of Decision Management Solutions, and Dean Abbott of Abbott Analytics discuss 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems. Webinar recording available here: https://decisionmanagement.omnovia.com/archives/70931
SQL Server 2022 Programmability & Performance — Gianluca Hotz
SQL Server 2022 has introduced many new features across all areas of the product. In this session, we will focus on the news regarding programmability and performance improvements.
The recent release of SQL Server 2016 SP1, which provides a consistent programming surface area across editions, has generated quite a buzz in the SQL Server community. SQL Server 2016 SP1 allows businesses of all sizes to leverage the full feature set, such as In-Memory technologies, on all editions of SQL Server to get enterprise-grade performance. This presentation focuses on the new improvements, the new limits on lower editions, the differentiating factors, and the key scenarios enabled by SQL Server 2016 SP1 that make it an obvious choice for customers. This session was delivered to the PASS DBA Fundamentals virtual chapter so everyone can learn about these exciting improvements and ensure they are leveraging them to maximize the performance and throughput of their SQL Server environments.
This is a summary of the sessions I attended at PASS Summit 2017. Out of the week-long conference, I put together these slides to summarize the event and present it at my company. The slides cover my favorite sessions, the ones I found most valuable, and include screenshots of demos I personally developed and tested, just as the speakers did at the conference.
Machine Learning on Distributed Systems by Josh Poduska — Data Con LA
Abstract: Most real-world data science workflows require more than multiple cores on a single server to meet scale and speed demands, but there is a general lack of understanding of what machine learning on distributed systems looks like in practice. Gartner and Forrester do not consider distributed execution when they score advanced analytics software solutions. Much formal machine learning training occurs on single-node machines with non-distributed algorithms. In this talk we discuss why an understanding of distributed architectures is important for anyone in the analytical sciences. We will cover the current distributed machine learning ecosystem, review common pitfalls when performing machine learning at scale, and discuss architectural considerations for a machine learning program, such as the roles of storage and compute and under what circumstances they should be combined or separated.
Designing, Building, and Maintaining Large Cubes using Lessons Learned — Denny Lee
This is Nicholas Dritsas, Eric Jacobsen, and my 2007 SQL PASS Summit presentation on designing, building, and maintaining large Analysis Services cubes
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer... — Michael Rys
When processing TB and PB of data, running your Big Data queries at scale and having them perform at peak is essential. In this session, we show you some state-of-the-art tools for analyzing U-SQL job performance, and we discuss in depth the best practices for designing your data layout, both for files and tables, and for writing performant and scalable queries using U-SQL. You will learn how to analyze performance and scale bottlenecks, and pick up several tips on how to make your big data processing scripts both faster and more scalable.
Real-time Big Data Analytics Engine using Impala — Jason Shih
Cloudera Impala is open source under the Apache License, enabling real-time, interactive analytical SQL queries of the data stored in HBase or HDFS. The work was inspired by the Google Dremel paper, which is also the basis for Google BigQuery. It provides access to the same unified storage platform through its own distributed query engine, without using MapReduce. In addition, it uses the same metadata, SQL syntax (HiveQL-like), ODBC driver, and user interface (Hue Beeswax) as Hive. Beyond the traditional Hadoop approach, which aims to provide a low-cost solution for resilient, batch-oriented distributed data processing, more and more effort in the Big Data world is pursuing the right solution for ad-hoc, fast queries and real-time processing of large datasets. In this presentation, we'll explore how to run interactive queries with Impala, the advantages of the approach, its architecture, and how it optimizes data systems, including a practical performance analysis.
Storage Optimization and Operational Simplicity in SAP Adaptive Server Enter... — SAP Technology
This presentation will discuss the key storage optimization and operational simplicity features available in SAP ASE and introduce enhancements such as heat map providing the capability to move data to high/low performing storage devices based on access patterns.
Migrating on-premises workload to Azure SQL Database — PARIKSHIT SAVJANI
Azure SQL Database is a fully managed cloud database service with built-in intelligence, elastic scale, performance, reliability, and data protection that enables enterprises and ISVs to reduce their total cost of ownership and operational costs and overheads. In this session, I will share real-world experience of successfully migrating existing SaaS applications and on-premises workloads for some of our tier-1 customers and ISV partners to the Azure SQL Database service. The session walks through planning, assessment, migration tools and best practices drawn from proven experience migrating real-world applications to the Azure SQL Database service.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce iteration time and the number of iterations, and also enables multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... — John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... — Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
5. #JSS2015
Columnstore Architecture
Row group
> A set of rows (typically 1 million rows)
Segment
> Contains the values from one column for the row group, stored as a LOB
> Unit of transfer between disk and memory
Compression pipeline (Vertipaq):
1- Encoding: dictionary, value (base/scale), bit packing, RLE, …
2- Binary compression (Xpress 8)
(3)- Archive compression (optional)
Tuple mover: compresses closed delta rowgroups into columnstore segments
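The row-group and segment structures described above can be inspected through catalog views. A sketch, assuming a hypothetical columnstore table `dbo.FactSales`:

```sql
-- Row groups: state, row counts and size for a clustered columnstore index.
SELECT row_group_id, state_description, total_rows, deleted_rows, size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales');

-- Per-column segments: encoding type, row count and on-disk size.
SELECT s.segment_id, s.column_id, s.encoding_type, s.row_count, s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON s.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales');
```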
6. #JSS2015
Columnstore & Batch mode
Row mode: Scan > Predicate > Get row > Filtered row
I/O significantly reduced by
> Encoding & compression
> Segment elimination
Next challenge: CPU usage – how to improve performance in this area?
Batch mode: Scan > Predicate > Get batch > Filtered batch, operating on batch objects of column vectors
Process multiple rows per batch for efficiency (~1K rows)
> Use of SIMD instructions
> Optimized for 64-bit values of the register
> Significant reduction in function calls = less CPU time (7 – 40x)
Reduction of CPU latency and cache misses
> Optimized for the CPU L2 cache, avoiding cache misses
> Aggressive memory prefetch (sequential vs random)
7. #JSS2015
SQL 2016 Columnstore Improvements
• Tables:
• Primary key, foreign keys, CDC, triggers, temporal tables
• Change tracking (NCCI only)
• Transaction isolation levels SI and RCSI
• Availability groups and read-only secondary replica support
• Columnstore indexes introduced for operational analytics:
• Disk-based table + NCCI (updatable, with filter capabilities)
• CCI + nonclustered indexes
• In-memory table + NCCI
• ALTER INDEX .. REORGANIZE for dealing with fragmentation
• New or enhanced DMVs:
• sys.dm_db_column_store_row_group_*
• sys.dm_db_index_* and sys.dm_xtp_*
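A sketch of the new SQL 2016 combinations (all names hypothetical): a clustered columnstore table can now carry a primary key and additional nonclustered B-tree indexes, and fragmentation is handled with REORGANIZE.

```sql
-- Hypothetical table combining CCI + PK + nonclustered B-tree index.
CREATE TABLE dbo.Orders (
    OrderID    int   NOT NULL,
    CustomerID int   NOT NULL,
    OrderDate  date  NOT NULL,
    Amount     money NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID),
    INDEX cci_Orders CLUSTERED COLUMNSTORE,
    INDEX ix_Orders_Customer NONCLUSTERED (CustomerID)
);

-- Fragmentation handling: merge compressed row groups and physically
-- remove deleted rows without a full rebuild.
ALTER INDEX cci_Orders ON dbo.Orders REORGANIZE;
```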
9. #JSS2015
Operational analytics: columnstore indexes & In-Memory tables
• Relational table (clustered index/heap) with a B-tree index over hot data
• CCI w/ high compression over cold data, plus delete bitmap and delta rowgroups
• Unified view for OLTP & DW developers
• Real-time operational OLTP on hot data; high-performance DW over past periods
• Data lifecycle: create the columnstore only on cold data, using a filtered predicate to minimize maintenance:
create nonclustered columnstore index ….. where order_status = ‘SHIPPED’
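The slide's filtered-predicate pattern can be spelled out as follows (table, columns and the 'SHIPPED' status are the slide's hypothetical example):

```sql
-- Filtered NCCI covering only "cold" rows, so hot-row DML never
-- touches the columnstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders_cold
ON dbo.Orders (OrderID, CustomerID, OrderDate, Amount, order_status)
WHERE order_status = 'SHIPPED';
```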
11. #JSS2015
OLTP In-Memory Architecture
Inside SQL Server.exe, the Hekaton components live side by side with the existing SQL components:
• Hekaton engine: memory-optimized tables & indexes (durable tables, plus non-durable tables)
• Hekaton compiler: produces generated .dlls for natively compiled SPs and schema
• Existing SQL components: TDS handler and session management (entry point for the client app); buffer pool for tables & indexes; proc/plan cache for ad-hoc T-SQL and SPs; interpreter for T-SQL, query plans and expressions; access methods; parser, catalog, algebrizer, optimizer
• Query interop: lets interpreted T-SQL access memory-optimized tables
• Durability: transaction log, plus checkpoint & recovery against the memory-optimized data filegroup
12. #JSS2015
SQL 2016 - Improved Scaling - Storage
• SQL Server 2014:
– Single offline checkpoint thread
• SQL Server 2016:
– Multiple offline checkpoint threads
– Goal: 1GB/s of log generation [work in progress]
(Diagram: transactions against Hekaton tables, e.g. Insert into Hekaton T1, Del Tran1 (TS 150), Del Tran2 (TS 450), Del Tran3 (TS 250), are logged in the SQL transaction log; the offline checkpoint thread(s) drain the log to the data files on disk.)
13. #JSS2015
SQL 2016 In-Memory Improvements
• Tables:
• ALTER TABLE (offline, requires 2X memory)
• Identity columns
• Indexes on NULLable columns
• COLUMNSTORE indexes
• FOREIGN KEY, CHECK, UNIQUE constraints.
• Change HASH index bucket_count through index REBUILD
• Add/drop index supported
• Stats improvements: auto-update and sampled stats
• More than 8 indexes.
• DML triggers (AFTER triggers only; no INSTEAD OF; natively compiled)
• 2TB of user data in durable tables (max 256GB in SQL 2014)
• Full collations support
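Two of the improvements above, sketched against a hypothetical memory-optimized dbo.SessionState table:

```sql
-- ALTER TABLE on a memory-optimized table is offline and needs roughly
-- 2x the table's memory while it runs.
ALTER TABLE dbo.SessionState
    ADD INDEX ix_SessionState_UserId NONCLUSTERED (UserId);

-- Change a hash index's bucket_count through an index rebuild.
ALTER TABLE dbo.SessionState
    ALTER INDEX ix_SessionState_SessionId
    REBUILD WITH (BUCKET_COUNT = 2097152);
```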
14. #JSS2015
SQL 2016 In-Memory Improvements
• Native Procedure:
• ALTER PROC and sp_recompile (online, with recompilation during execution)
• Nested native procedures
• Natively Compiled Scalar UDFs (Access from both native and interop)
• Native Inline Table-Valued Functions
• EXECUTE AS CALLER
• Security and math built-ins
• Full collations support in native modules
• Query surface area
• LIKE operator
• {LEFT|RIGHT} OUTER JOIN
• Disjunction (OR, NOT)
• UNION [ALL]
• SELECT DISTINCT
• Subqueries (EXISTS, IN, scalar)
• Parallel scan for memory-optimized indexes
• MARS (Multiple Active Result Sets) support
• TDE (Transparent Data Encryption) (on-disk data files encrypted once TDE is enabled)
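A minimal natively compiled procedure exercising some of the new surface area (dbo.Orders and dbo.OrderNotes are hypothetical memory-optimized tables; native modules cannot touch disk-based tables):

```sql
CREATE PROCEDURE dbo.usp_OrdersByCustomer
    @CustomerID int
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Uses several 2016 query surface-area additions:
    -- LEFT OUTER JOIN, disjunction (OR) and SELECT DISTINCT.
    SELECT DISTINCT o.OrderID, o.Amount
    FROM dbo.Orders AS o
    LEFT OUTER JOIN dbo.OrderNotes AS n ON n.OrderID = o.OrderID
    WHERE o.CustomerID = @CustomerID OR o.Amount > 1000;
END;
```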
15. #JSS2015
Remaining In-Memory Unsupported Features
Tables
• DDL triggers and transactional DDL
• Data types: XML, Spatial, CLR types, datetimeoffset, rowversion, sql_variant
• ALTER TABLE ONLINE
• Cross-container transaction limitations: snapshot/snapshot, serializable/serializable; SAVEPOINT
• Online migration of disk-based tables to memory-optimized
• Interop: cross-database queries, MERGE INTO, locking hints
Native Compilation
• CASE, MERGE, JOIN in UPDATE/DELETE; DML with OUTPUT; VIEWs
• Natively compiled table-valued functions
• Automatic and statement-level recompile
• Access to disk-based tables
• Must be schema-bound; no dynamic T-SQL
• Parallelism; limitations in query operators (hash join/agg, merge join)
Management: replication; DB snapshot; CDC; compression
16. #JSS2015
Operational analytics: columnstore indexes & In-Memory tables
• In-Memory OLTP table: hash index and range index over hot data; updateable CCI with tail and DRT
• Relational table (clustered index/heap): B-tree index over hot data; CCI w/ high compression, delete bitmap and delta rowgroups over cold data
• Unified view for OLTP & DW developers
• Real-time operational OLTP on hot data; high-performance DW over past periods
• Data lifecycle: create the columnstore only on cold data, using a filtered predicate to minimize maintenance:
create nonclustered columnstore index ….. where order_status = ‘SHIPPED’
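The in-memory side of this picture can be sketched as a SQL 2016 memory-optimized table with an updateable clustered columnstore index (all names hypothetical): the hash index serves the hot OLTP path while the CCI serves analytics on the same rows.

```sql
CREATE TABLE dbo.Orders_InMem (
    OrderID    int       NOT NULL PRIMARY KEY NONCLUSTERED HASH
                         WITH (BUCKET_COUNT = 1048576),
    CustomerID int       NOT NULL,
    OrderDate  datetime2 NOT NULL,
    Amount     money     NOT NULL,
    INDEX cci_Orders_InMem CLUSTERED COLUMNSTORE
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```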