SlideShare a Scribd company logo
ETL Patterns 
with Clustered Columnstore 
Niko Neugebauer, OH22
Please silence cell phones
Explore Everything PASS Has to Offer 
Free SQL Server and BI Web Events 
Free 1-day Training Events 
Regional Event 
Local User Groups Around the World 
Free Online Technical Training 
This is Community 
Business Analytics Training 
Session Recordings 
PASS Newsletter
Session Evaluations 
ways to access 
Go to 
passsummit.com/evals 
Download the GuideBook App 
and search: PASS Summit 2014 
Follow the QR code link displayed 
on session signage throughout the 
conference venue and in the 
program guide 
Submit by 11:59 PM EST 
Friday Nov. 7 to 
WIN prizes 
Your feedback is 
important and valuable. 
Evaluation Deadline: 
11:59 PM EST, Sunday Nov. 16
Niko Neugebauer 
Microsoft Data Platform Professional 
OH22(http://www.oh22.net) 
SQL Server MVP 
Founder & Co-Founder of 3 Portuguese PASS Chapters 
In-Memory VC co-lead 
Blog: http://www.nikoport.com 
Twitter: @NikoNeugebauer 
LinkedIn: http://pt.linkedin.com/in/webcaravela 
Email: info@webcaravela.com
Oha! Since this is 400-level session: 
And so I assume: 
•You know the Columnstore Structure Elements, such as Row Groups, Delta-Stores, & Deleted Bitmaps. 
•You do understand Segment Elimination. 
•You have heard about Tuple Mover.  
•You know what Columnstore Dictionary is, what types of dictionaries do exist and why. 
•You know what a Batch Mode is.
Clustered Columnstore: 
•Delta-Stores (open & close) 
•Deleted Bitmap 
•Update work as a 
DELETE + INSERT
General Thoughts: 
Working with CCI requires a lot of resources, and so you might want/need to: 
•Use Resource Governor –control memory grants and impact on CPU & IO. (You should consider capping the number of cores & memorygiven to Columnstore.) 
•Use Partitioning–it helps you to control the impact.
ETL, ELT, LTE –My Today’s Agenda 
•E–ExtractData from Columnstore 
•T–Transform(Maintain) 
•L–Load data into Columnstore
Extract 
•Use Batch Processing, unless you prove that it does not help (You will need DOP>=2) Ω 
•Avoid Delta Stores (Force Tuple Mover to close & compress them with Reorganize with (CLOSE_ALL_DELTA_STORES))Ω
Consider Example of this Column:
Extraction with Segment Clustering 
•Use Column Cluserting (You will need 1 to keep it perfect (ref: 2014 RTM + CU2)) Ω 
•Use Partitioning: 
it helps Column Clustering Rebuilds with DOP = 1 (Resource & impact optimization)
Extract continued 
CCI Rebuild is a half-online, this means that you can read data but writing new info is not available. 
•Use SELECT INTO, which became parallel execution plan in SQL Server 2014
Transform (Maintain) 
•Do notrebuild CCI unless you really need it. 
•Columnstore is not a Rowstore. 
•What is the percentage of your 
data that generally became deleted? 
•What is the improvement that you are looking for ? Ω
Transform (Maintain) 
Rebuild if you have a good window and need to realign column clustering. 
If Rebuilding frequently, use partitioning –it helps you to minimize impact and control better 
Create your own Columnstore Resource Group in Resource Governor Ω
Load 
•Use Resource Governor 
•Use Partitioning(especially for Switching In) 
•Always use Batch Processing, unless proven that you don’t need it (You will need DOP>=2) 
•Use BULK Load API (with more than 102.400 rows) 
•Avoid using open Delta-Stores 
•Use Parallel T-SQL (SELECT INTO)
Load Strategies 
ColumnstoreDemo
Load Scenarios 
Things to bear in mind: 
Regarding Dictionaries: 
•Global Dictionaries are created only on the first load/build and than they are never updated 
•The Sampling is Max (1 Million Rows, 1%) 
•Single Threaded
Load Scenarios 
Global Dictionaries are not your Global Saviours: 
•You need to have similar data in your column in order to get their usage 
•You might actually get much more compact & efficient Local Dictionaries
Questions?
Thank you!
Links: 
My blog series on ColumnstoreIndexes (40+ Blogposts): 
•http://www.nikoport.com/columnstore/ 
Remus RusanuIntroduction for Clustered Columnstore: 
•http://rusanu.com/2013/06/11/sql-server-clustered- columnstore-indexes-at-teched-2013/ 
White Paper on the Clustered Columnstore: 
•http://research.microsoft.com/pubs/193599/Apollo3%20- %20Sigmod%202013%20-%20final.pdf
Connect Items 
Implement Batch Mode Support for Row Store: 
•https://connect.microsoft.com/SQLServer/Feedback/Details/ 938021 
Multi-threaded rebuilds of Clustered ColumnstoreIndexes break the sequence of pre-sorted segment ordering (Order Clustering): 
•https://connect.microsoft.com/SQLServer/Feedback/Details/912452 
ColumnstoreSegments Maintenance –Remove & Merge: 
•https://connect.microsoft.com/SQLServer/Feedback/Details/930664

More Related Content

What's hot

Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz
 
Membase Introduction
Membase IntroductionMembase Introduction
Membase Introduction
Membase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa Neddam
Romeo Kienzler
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
Membase
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
Databricks
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
Venu Ryali
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
SingleStore
 
TOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 RecapTOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 Recap
Toronto-Oracle-Users-Group
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
Venu Ryali
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
Venu Ryali
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
HostedbyConfluent
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
jaxLondonConference
 
Image Management at Ebates
Image Management at EbatesImage Management at Ebates
Image Management at Ebates
Rakuten Group, Inc.
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
EDB
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum version
sqlserver.co.il
 
What's New in Infinispan 6.0
What's New in Infinispan 6.0What's New in Infinispan 6.0
What's New in Infinispan 6.0
JBUG London
 

What's hot (19)

Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Membase Introduction
Membase IntroductionMembase Introduction
Membase Introduction
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa Neddam
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
 
TOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 RecapTOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 Recap
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
 
Image Management at Ebates
Image Management at EbatesImage Management at Ebates
Image Management at Ebates
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum version
 
What's New in Infinispan 6.0
What's New in Infinispan 6.0What's New in Infinispan 6.0
What's New in Infinispan 6.0
 

Similar to ETL with Clustered Columnstore - PASS Summit 2014

01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
Smitha Padmanabhan
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the field
JoAnna Cheshire
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
Shehap Elnagar
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
Shehap Elnagar
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices document
Ashwani Pandey
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
Shehap Elnagar
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
Turbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk ProcessingTurbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk Processing
Steven Feuerstein
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should know
Kevin Kline
 
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginnersKoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
Tobias Koprowski
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
Tobias Koprowski
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform
WSO2
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013
Andrejs Vorobjovs
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Jim Czuprynski
 
High Performance PL/SQL
High Performance PL/SQLHigh Performance PL/SQL
High Performance PL/SQL
Steven Feuerstein
 
Clontab webpage
Clontab webpageClontab webpage
Clontab webpage
freelancehyderabad
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
Hiram Fleitas León
 
Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”
EPAM Systems
 
Open Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For ThemselvesOpen Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For Themselves
loriayre
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Flink Forward
 

Similar to ETL with Clustered Columnstore - PASS Summit 2014 (20)

01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the field
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices document
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Turbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk ProcessingTurbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk Processing
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should know
 
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginnersKoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
 
High Performance PL/SQL
High Performance PL/SQLHigh Performance PL/SQL
High Performance PL/SQL
 
Clontab webpage
Clontab webpageClontab webpage
Clontab webpage
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”
 
Open Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For ThemselvesOpen Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For Themselves
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 

Recently uploaded

Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 

Recently uploaded (20)

Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 

ETL with Clustered Columnstore - PASS Summit 2014

  • 1. ETL Patterns with Clustered Columnstore Niko Neugebauer, OH22
  • 3. Explore Everything PASS Has to Offer Free SQL Server and BI Web Events Free 1-day Training Events Regional Event Local User Groups Around the World Free Online Technical Training This is Community Business Analytics Training Session Recordings PASS Newsletter
  • 4. Session Evaluations ways to access Go to passsummit.com/evals Download the GuideBook App and search: PASS Summit 2014 Follow the QR code link displayed on session signage throughout the conference venue and in the program guide Submit by 11:59 PM EST Friday Nov. 7 to WIN prizes Your feedback is important and valuable. Evaluation Deadline: 11:59 PM EST, Sunday Nov. 16
  • 5. Niko Neugebauer Microsoft Data Platform Professional OH22(http://www.oh22.net) SQL Server MVP Founder & Co-Founder of 3 Portuguese PASS Chapters In-Memory VC co-lead Blog: http://www.nikoport.com Twitter: @NikoNeugebauer LinkedIn: http://pt.linkedin.com/in/webcaravela Email: info@webcaravela.com
  • 6. Oha! Since this is 400-level session: And so I assume: •You know the Columnstore Structure Elements, such as Row Groups, Delta-Stores, & Deleted Bitmaps. •You do understand Segment Elimination. •You have heard about Tuple Mover.  •You know what Columnstore Dictionary is, what types of dictionaries do exist and why. •You know what a Batch Mode is.
  • 7. Clustered Columnstore: •Delta-Stores (open & close) •Deleted Bitmap •Update work as a DELETE + INSERT
  • 8. General Thoughts: Working with CCI requires a lot of resources, and so you might want/need to: •Use Resource Governor –control memory grants and impact on CPU & IO. (You should consider capping the number of cores & memorygiven to Columnstore.) •Use Partitioning–it helps you to control the impact.
  • 9. ETL, ELT, LTE –My Today’s Agenda •E–ExtractData from Columnstore •T–Transform(Maintain) •L–Load data into Columnstore
  • 10. Extract •Use Batch Processing, unless you prove that it does not help (You will need DOP>=2) Ω •Avoid Delta Stores (Force Tuple Mover to close & compress them with Reorganize with (CLOSE_ALL_DELTA_STORES))Ω
  • 11. Consider Example of this Column:
  • 12. Extraction with Segment Clustering •Use Column Cluserting (You will need 1 to keep it perfect (ref: 2014 RTM + CU2)) Ω •Use Partitioning: it helps Column Clustering Rebuilds with DOP = 1 (Resource & impact optimization)
  • 13. Extract continued CCI Rebuild is a half-online, this means that you can read data but writing new info is not available. •Use SELECT INTO, which became parallel execution plan in SQL Server 2014
  • 14. Transform (Maintain) •Do notrebuild CCI unless you really need it. •Columnstore is not a Rowstore. •What is the percentage of your data that generally became deleted? •What is the improvement that you are looking for ? Ω
  • 15. Transform (Maintain) Rebuild if you have a good window and need to realign column clustering. If Rebuilding frequently, use partitioning –it helps you to minimize impact and control better Create your own Columnstore Resource Group in Resource Governor Ω
  • 16. Load •Use Resource Governor •Use Partitioning(especially for Switching In) •Always use Batch Processing, unless proven that you don’t need it (You will need DOP>=2) •Use BULK Load API (with more than 102.400 rows) •Avoid using open Delta-Stores •Use Parallel T-SQL (SELECT INTO)
  • 18. Load Scenarios Things to bear in mind: Regarding Dictionaries: •Global Dictionaries are created only on the first load/build and than they are never updated •The Sampling is Max (1 Million Rows, 1%) •Single Threaded
  • 19. Load Scenarios Global Dictionaries are not your Global Saviours: •You need to have similar data in your column in order to get their usage •You might actually get much more compact & efficient Local Dictionaries
  • 22. Links: My blog series on ColumnstoreIndexes (40+ Blogposts): •http://www.nikoport.com/columnstore/ Remus RusanuIntroduction for Clustered Columnstore: •http://rusanu.com/2013/06/11/sql-server-clustered- columnstore-indexes-at-teched-2013/ White Paper on the Clustered Columnstore: •http://research.microsoft.com/pubs/193599/Apollo3%20- %20Sigmod%202013%20-%20final.pdf
  • 23. Connect Items Implement Batch Mode Support for Row Store: •https://connect.microsoft.com/SQLServer/Feedback/Details/ 938021 Multi-threaded rebuilds of Clustered ColumnstoreIndexes break the sequence of pre-sorted segment ordering (Order Clustering): •https://connect.microsoft.com/SQLServer/Feedback/Details/912452 ColumnstoreSegments Maintenance –Remove & Merge: •https://connect.microsoft.com/SQLServer/Feedback/Details/930664