SlideShare a Scribd company logo
1 of 23
Download to read offline
ETL Patterns 
with Clustered Columnstore 
Niko Neugebauer, OH22
Please silence cell phones
Explore Everything PASS Has to Offer 
Free SQL Server and BI Web Events 
Free 1-day Training Events 
Regional Event 
Local User Groups Around the World 
Free Online Technical Training 
This is Community 
Business Analytics Training 
Session Recordings 
PASS Newsletter
Session Evaluations 
ways to access 
Go to 
passsummit.com/evals 
Download the GuideBook App 
and search: PASS Summit 2014 
Follow the QR code link displayed 
on session signage throughout the 
conference venue and in the 
program guide 
Submit by 11:59 PM EST 
Friday Nov. 7 to 
WIN prizes 
Your feedback is 
important and valuable. 
Evaluation Deadline: 
11:59 PM EST, Sunday Nov. 16
Niko Neugebauer 
Microsoft Data Platform Professional 
OH22(http://www.oh22.net) 
SQL Server MVP 
Founder & Co-Founder of 3 Portuguese PASS Chapters 
In-Memory VC co-lead 
Blog: http://www.nikoport.com 
Twitter: @NikoNeugebauer 
LinkedIn: http://pt.linkedin.com/in/webcaravela 
Email: info@webcaravela.com
Oha! Since this is 400-level session: 
And so I assume: 
•You know the Columnstore Structure Elements, such as Row Groups, Delta-Stores, & Deleted Bitmaps. 
•You do understand Segment Elimination. 
•You have heard about Tuple Mover.  
•You know what Columnstore Dictionary is, what types of dictionaries do exist and why. 
•You know what a Batch Mode is.
Clustered Columnstore: 
•Delta-Stores (open & close) 
•Deleted Bitmap 
•Update work as a 
DELETE + INSERT
General Thoughts: 
Working with CCI requires a lot of resources, and so you might want/need to: 
•Use Resource Governor –control memory grants and impact on CPU & IO. (You should consider capping the number of cores & memorygiven to Columnstore.) 
•Use Partitioning–it helps you to control the impact.
ETL, ELT, LTE –My Today’s Agenda 
•E–ExtractData from Columnstore 
•T–Transform(Maintain) 
•L–Load data into Columnstore
Extract 
•Use Batch Processing, unless you prove that it does not help (You will need DOP>=2) Ω 
•Avoid Delta Stores (Force Tuple Mover to close & compress them with Reorganize with (CLOSE_ALL_DELTA_STORES))Ω
Consider Example of this Column:
Extraction with Segment Clustering 
•Use Column Cluserting (You will need 1 to keep it perfect (ref: 2014 RTM + CU2)) Ω 
•Use Partitioning: 
it helps Column Clustering Rebuilds with DOP = 1 (Resource & impact optimization)
Extract continued 
CCI Rebuild is a half-online, this means that you can read data but writing new info is not available. 
•Use SELECT INTO, which became parallel execution plan in SQL Server 2014
Transform (Maintain) 
•Do notrebuild CCI unless you really need it. 
•Columnstore is not a Rowstore. 
•What is the percentage of your 
data that generally became deleted? 
•What is the improvement that you are looking for ? Ω
Transform (Maintain) 
Rebuild if you have a good window and need to realign column clustering. 
If Rebuilding frequently, use partitioning –it helps you to minimize impact and control better 
Create your own Columnstore Resource Group in Resource Governor Ω
Load 
•Use Resource Governor 
•Use Partitioning(especially for Switching In) 
•Always use Batch Processing, unless proven that you don’t need it (You will need DOP>=2) 
•Use BULK Load API (with more than 102.400 rows) 
•Avoid using open Delta-Stores 
•Use Parallel T-SQL (SELECT INTO)
Load Strategies 
ColumnstoreDemo
Load Scenarios 
Things to bear in mind: 
Regarding Dictionaries: 
•Global Dictionaries are created only on the first load/build and than they are never updated 
•The Sampling is Max (1 Million Rows, 1%) 
•Single Threaded
Load Scenarios 
Global Dictionaries are not your Global Saviours: 
•You need to have similar data in your column in order to get their usage 
•You might actually get much more compact & efficient Local Dictionaries
Questions?
Thank you!
Links: 
My blog series on ColumnstoreIndexes (40+ Blogposts): 
•http://www.nikoport.com/columnstore/ 
Remus RusanuIntroduction for Clustered Columnstore: 
•http://rusanu.com/2013/06/11/sql-server-clustered- columnstore-indexes-at-teched-2013/ 
White Paper on the Clustered Columnstore: 
•http://research.microsoft.com/pubs/193599/Apollo3%20- %20Sigmod%202013%20-%20final.pdf
Connect Items 
Implement Batch Mode Support for Row Store: 
•https://connect.microsoft.com/SQLServer/Feedback/Details/ 938021 
Multi-threaded rebuilds of Clustered ColumnstoreIndexes break the sequence of pre-sorted segment ordering (Order Clustering): 
•https://connect.microsoft.com/SQLServer/Feedback/Details/912452 
ColumnstoreSegments Maintenance –Remove & Merge: 
•https://connect.microsoft.com/SQLServer/Feedback/Details/930664

More Related Content

What's hot

Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Membase Introduction
Membase IntroductionMembase Introduction
Membase IntroductionMembase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamRomeo Kienzler
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replicationVenu Ryali
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthSingleStore
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using KafkaVenu Ryali
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidVenu Ryali
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...HostedbyConfluent
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...jaxLondonConference
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to PostgresEDB
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum versionsqlserver.co.il
 
What's New in Infinispan 6.0
What's New in Infinispan 6.0What's New in Infinispan 6.0
What's New in Infinispan 6.0JBUG London
 

What's hot (19)

Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Membase Introduction
Membase IntroductionMembase Introduction
Membase Introduction
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa Neddam
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
 
TOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 RecapTOUG-Oracle Open World 2013 Recap
TOUG-Oracle Open World 2013 Recap
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
 
Image Management at Ebates
Image Management at EbatesImage Management at Ebates
Image Management at Ebates
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum version
 
What's New in Infinispan 6.0
What's New in Infinispan 6.0What's New in Infinispan 6.0
What's New in Infinispan 6.0
 

Similar to ETL with Clustered Columnstore - PASS Summit 2014

Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the fieldJoAnna Cheshire
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersShehap Elnagar
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersShehap Elnagar
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices documentAshwani Pandey
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersShehap Elnagar
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
 
Turbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk ProcessingTurbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk ProcessingSteven Feuerstein
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowKevin Kline
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersTobias Koprowski
 
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginnersKoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginnersTobias Koprowski
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform WSO2
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013Andrejs Vorobjovs
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Jim Czuprynski
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”EPAM Systems
 
Open Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For ThemselvesOpen Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For Themselvesloriayre
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsBowen Li
 

Similar to ETL with Clustered Columnstore - PASS Summit 2014 (20)

01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the field
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices document
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Turbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk ProcessingTurbocharge SQL Performance in PL/SQL with Bulk Processing
Turbocharge SQL Performance in PL/SQL with Bulk Processing
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should know
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
 
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginnersKoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
KoprowskiT_SQLSaturday409_MaintenancePlansForBeginners
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
 
High Performance PL/SQL
High Performance PL/SQLHigh Performance PL/SQL
High Performance PL/SQL
 
Clontab webpage
Clontab webpageClontab webpage
Clontab webpage
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”Чурюканов Вячеслав, “Code simple, but not simpler”
Чурюканов Вячеслав, “Code simple, but not simpler”
 
Open Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For ThemselvesOpen Source Library System Software: Libraries Are Doing it For Themselves
Open Source Library System Software: Libraries Are Doing it For Themselves
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
 

Recently uploaded

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

ETL with Clustered Columnstore - PASS Summit 2014

  • 1. ETL Patterns with Clustered Columnstore Niko Neugebauer, OH22
  • 3. Explore Everything PASS Has to Offer Free SQL Server and BI Web Events Free 1-day Training Events Regional Event Local User Groups Around the World Free Online Technical Training This is Community Business Analytics Training Session Recordings PASS Newsletter
  • 4. Session Evaluations ways to access Go to passsummit.com/evals Download the GuideBook App and search: PASS Summit 2014 Follow the QR code link displayed on session signage throughout the conference venue and in the program guide Submit by 11:59 PM EST Friday Nov. 7 to WIN prizes Your feedback is important and valuable. Evaluation Deadline: 11:59 PM EST, Sunday Nov. 16
  • 5. Niko Neugebauer Microsoft Data Platform Professional OH22(http://www.oh22.net) SQL Server MVP Founder & Co-Founder of 3 Portuguese PASS Chapters In-Memory VC co-lead Blog: http://www.nikoport.com Twitter: @NikoNeugebauer LinkedIn: http://pt.linkedin.com/in/webcaravela Email: info@webcaravela.com
  • 6. Oha! Since this is 400-level session: And so I assume: •You know the Columnstore Structure Elements, such as Row Groups, Delta-Stores, & Deleted Bitmaps. •You do understand Segment Elimination. •You have heard about Tuple Mover.  •You know what Columnstore Dictionary is, what types of dictionaries do exist and why. •You know what a Batch Mode is.
  • 7. Clustered Columnstore: •Delta-Stores (open & close) •Deleted Bitmap •Update work as a DELETE + INSERT
  • 8. General Thoughts: Working with CCI requires a lot of resources, and so you might want/need to: •Use Resource Governor –control memory grants and impact on CPU & IO. (You should consider capping the number of cores & memorygiven to Columnstore.) •Use Partitioning–it helps you to control the impact.
  • 9. ETL, ELT, LTE –My Today’s Agenda •E–ExtractData from Columnstore •T–Transform(Maintain) •L–Load data into Columnstore
  • 10. Extract •Use Batch Processing, unless you prove that it does not help (You will need DOP>=2) Ω •Avoid Delta Stores (Force Tuple Mover to close & compress them with Reorganize with (CLOSE_ALL_DELTA_STORES))Ω
  • 11. Consider Example of this Column:
  • 12. Extraction with Segment Clustering •Use Column Cluserting (You will need 1 to keep it perfect (ref: 2014 RTM + CU2)) Ω •Use Partitioning: it helps Column Clustering Rebuilds with DOP = 1 (Resource & impact optimization)
  • 13. Extract continued CCI Rebuild is a half-online, this means that you can read data but writing new info is not available. •Use SELECT INTO, which became parallel execution plan in SQL Server 2014
  • 14. Transform (Maintain) •Do notrebuild CCI unless you really need it. •Columnstore is not a Rowstore. •What is the percentage of your data that generally became deleted? •What is the improvement that you are looking for ? Ω
  • 15. Transform (Maintain) Rebuild if you have a good window and need to realign column clustering. If Rebuilding frequently, use partitioning –it helps you to minimize impact and control better Create your own Columnstore Resource Group in Resource Governor Ω
  • 16. Load •Use Resource Governor •Use Partitioning(especially for Switching In) •Always use Batch Processing, unless proven that you don’t need it (You will need DOP>=2) •Use BULK Load API (with more than 102.400 rows) •Avoid using open Delta-Stores •Use Parallel T-SQL (SELECT INTO)
  • 18. Load Scenarios Things to bear in mind: Regarding Dictionaries: •Global Dictionaries are created only on the first load/build and than they are never updated •The Sampling is Max (1 Million Rows, 1%) •Single Threaded
  • 19. Load Scenarios Global Dictionaries are not your Global Saviours: •You need to have similar data in your column in order to get their usage •You might actually get much more compact & efficient Local Dictionaries
  • 22. Links: My blog series on ColumnstoreIndexes (40+ Blogposts): •http://www.nikoport.com/columnstore/ Remus RusanuIntroduction for Clustered Columnstore: •http://rusanu.com/2013/06/11/sql-server-clustered- columnstore-indexes-at-teched-2013/ White Paper on the Clustered Columnstore: •http://research.microsoft.com/pubs/193599/Apollo3%20- %20Sigmod%202013%20-%20final.pdf
  • 23. Connect Items Implement Batch Mode Support for Row Store: •https://connect.microsoft.com/SQLServer/Feedback/Details/ 938021 Multi-threaded rebuilds of Clustered ColumnstoreIndexes break the sequence of pre-sorted segment ordering (Order Clustering): •https://connect.microsoft.com/SQLServer/Feedback/Details/912452 ColumnstoreSegments Maintenance –Remove & Merge: •https://connect.microsoft.com/SQLServer/Feedback/Details/930664