This document provides an overview of performance tuning and diagnosing hard problems in SQL Server. It discusses tools like Xperf and PerfView that can be used to analyze performance bottlenecks. Specific techniques include using indexes, avoiding unnecessary data conversions and padding, and understanding how query plan operators such as loop, merge, and hash joins work at a low level. The document demonstrates how to optimize queries through better data modeling, indexing, and rewriting queries to avoid expensive operations.
4. Performance vs. Scalability
(chart: response time vs. resource use)
Scalability: adding more of a HW resource makes things faster.
You can scale without having performance (ex: Hadoop).
You can perform without having scalability (ex: in-memory engines).
5. Our Reasonably Priced Server
• 2-socket Xeon E5645, 2 x 6 cores, 2.4GHz
• NUMA enabled, HT off
• 12 GB RAM
• 1 ioDrive2 Duo: 2.4TB flash, 4K formatted, 64K AUS, 1 stripe, power save off
• Win 2008R2
• SQL 2012
Image Source: DeviantArt
6. Between Disk and Memory
(diagram: four cores, each with its own L1 and L2 cache, sharing an L3; the latency ladder runs 1ns, 10ns, 100ns for the caches, 10us and 100us for flash, 10ms for disk)
7. The "cache out curve"
(chart: throughput per thread vs. touched data size; throughput drops sharply once the data no longer fits in the cache)
9. There are several of these curves
(chart: throughput vs. touched data size, with a drop at each boundary: CPU cache, TLB, NUMA remote, storage)
10. Response Time = Service Time + Wait Time
Service time is governed by algorithms and data structures; wait time by "bottlenecks".
11. What you ALREADY know
• DBA tasks
• Installation of OS and SQL
• Basic memory configuration
• Basic Perfmon-style monitoring
• Backup/restore and HA setup
• Basic reading of a query plan
• Basic understanding of database structures
• Adding indexes to tables
• Running a Profiler trace
13. What we Need
• Free tools from MS
• Windows SDK
• In Win8: the "ADK"
• Need .NET 4 to install
14. Where Did the Time Go?
xperf -on Base -f Base.etl
SQLCMD -E -S. -i "Select.sql"
-- Select.sql:
SELECT TOP 100000 *
FROM LINEITEM
INNER JOIN ORDERS
ON O_ORDERKEY = L_ORDERKEY
xperf -stop
18. Quantifying just how stupid XML is
xperf -on Base -f Base.etl
SELECT TOP 1000000 *
FROM ORDERS
JOIN LINEITEM
ON L_ORDERKEY = O_ORDERKEY
FOR XML RAW ('OUTPUT')
(chart: run time with XML vs. "native" format)
19. Which CPU cycles are Expensive?
• "App" tier: web server licensing, >3K USD blades
• Database tier: core licensing, >10K USD
• <XML> ?
20. Diving even Deeper
• What about the time INSIDE the process?
• What if the EXE won't tell us?
21. What is a Debug Symbol?
The source:
  int doStuff(int a, int b, int c)
  {
      return (a + b) / c;
  }
  ...
  doStuff(10, 20, 3);
The machine code in myProg.exe:
  ; the call site
  mov ax,10
  mov bx,20
  mov cx,3
  push ax
  push bx
  push cx
  call <address>
  ; at <address>
  push bp
  mov bp,sp
  mov ax,[bp+8]
  mov bx,[bp+6]
  mov cx,[bp+4]
  add ax,bx
  div cx
  mov dx,ax
  ret
The symbol table in myProg.pdb records <address> = doStuff, which is how a profiler turns raw addresses back into function names.
22. Where do you get PDB files?
• Public symbol server
• Configure the environment:
_NT_SYMBOL_PATH=SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
_NT_SYMCACHE_PATH=C:\SymCache
• Dbghelp.dll
23. Your Own Debug Symbols
• Auto-generated by Visual Studio
24. Adding and Checking Your Symbols
• Symbols are indexed; you have to add them:
cd Bin\x64\Release\
symstore add /f *.pdb /s C:\Symbols /t "MyExe"
• Validate that the symbols can resolve:
cd Bin\x64\Release\
symchk MyExe.exe /v
25. Got .NET and x64?
• Standard Xperf works fine for your own native code
• BUT: before Windows 8, stack walking is broken for x64 .NET
• If you have .NET with 64-bit code, you must NGEN first:
ngen install Bin\x64\Release\MyExe.exe
(ngen lives here: %Windir%\Microsoft.NET\Framework64\<Version>\ngen.exe)
26. .NET tracing is a pain, get a tool!
• Free tool from MS: PerfView
• Not to be confused with xperfview
• Same trace API and file format
• Helps set obscure .NET-specific trace flags
27. And Finally, You Can Do Very Cool Things
Did I tell you about interlocked operations?...
Whiteboard time!
28. What is SQL Server REALLY doing?
• Consider again our LINEITEM table
• How expensive is it to read from that?
• Think ETL code and DW/BI queries
CREATE TABLE LINEITEM (
  [L_ORDERKEY] [int] NOT NULL,
  [L_PARTKEY] [int] NOT NULL,
  [L_SUPPKEY] [int] NOT NULL,
  [L_LINENUMBER] [int] NOT NULL,
  [L_QUANTITY] [decimal](15, 2) NOT NULL,
  [L_EXTENDEDPRICE] [decimal](15, 2) NOT NULL,
  [L_DISCOUNT] [decimal](15, 2) NOT NULL,
  [L_TAX] [decimal](15, 2) NOT NULL,
  [L_RETURNFLAG] [char](1) NOT NULL,
  [L_LINESTATUS] [char](1) NOT NULL,
  [L_SHIPDATE] [date] NOT NULL,
  [L_COMMITDATE] [date] NOT NULL,
  [L_RECEIPTDATE] [date] NOT NULL,
  [L_SHIPINSTRUCT] [char](25) NOT NULL,
  [L_SHIPMODE] [char](10) NOT NULL,
  [L_COMMENT] [varchar](44) NOT NULL
)
(quadrant chart, Data Touched vs. Data Returned: OLTP small/small, BI/DW big/small, Simulation small/big, ETL big/big)
29. SQLCMD - Native Code Test
SQLCMD.EXE: where does the time go?
30. Standard Reading of Data
xperf -on base -stackwalk profile -f stackwalk.etl
SQLCMD -S. -dSlam -E -Q"SELECT * FROM LINEITEM_tpch"    (55 sec)
xperf -stop
xperf -merge stackwalk.etl stackwalkmerge.etl
33. An Educated Guess About Improvements
Before:
CREATE TABLE [dbo].[LINEITEM](
  [L_ORDERKEY] [int] NOT NULL,
  [L_PARTKEY] [int] NOT NULL,
  [L_SUPPKEY] [int] NOT NULL,
  [L_LINENUMBER] [int] NOT NULL,
  [L_QUANTITY] [decimal](15, 2) NOT NULL,
  [L_EXTENDEDPRICE] [decimal](15, 2) NOT NULL,
  [L_DISCOUNT] [decimal](15, 2) NOT NULL,
  [L_TAX] [decimal](15, 2) NOT NULL,
  [L_RETURNFLAG] [char](1) NOT NULL,
  [L_LINESTATUS] [char](1) NOT NULL,
  [L_SHIPDATE] [date] NOT NULL,
  [L_COMMITDATE] [date] NOT NULL,
  [L_RECEIPTDATE] [date] NOT NULL,
  [L_SHIPINSTRUCT] [char](25) NOT NULL,
  [L_SHIPMODE] [char](10) NOT NULL,
  [L_COMMENT] [varchar](44) NOT NULL
)
After (cheaper-to-decode types):
CREATE TABLE [dbo].[LINEITEM_native](
  [L_ORDERKEY] [int] NOT NULL,
  [L_PARTKEY] [int] NOT NULL,
  [L_SUPPKEY] [int] NOT NULL,
  [L_LINENUMBER] [int] NOT NULL,
  [L_QUANTITY] money NOT NULL,
  [L_EXTENDEDPRICE] money NOT NULL,
  [L_DISCOUNT] money NOT NULL,
  [L_TAX] money NOT NULL,
  [L_RETURNFLAG] int NOT NULL,
  [L_LINESTATUS] int NOT NULL,
  [L_SHIPDATE] int NOT NULL,
  [L_COMMITDATE] int NOT NULL,
  [L_RECEIPTDATE] int NOT NULL,
  [L_SHIPINSTRUCT] [char](25) NOT NULL,
  [L_SHIPMODE] int NOT NULL,
  [L_COMMENT] char(44) NOT NULL
)
34. Getting Rid of Useless Work
Additional parameters for SQLCMD:
-a32767 -W -s";" -f437
Result: x1.5
36. Let's try that with Native and Unicode…
Result: x5
37. Summary: Moving Data
• SQLNCLI is one of these in disguise
  • ODBC
  • OLEDB
• Pick good data types
  • MONEY over NUMERIC
  • UNICODE if the data arrives like this
• Native protocols vs. flexibility
38. Summary - Xperf
• Get
  • Windows 8 ADK
  • Windows 7 SDK
• Set up symbol paths
• xperf -on Base
  • Standard trace for time; narrow down to process and DLL/EXE
• xperf -on Base -stackwalk Profile
  • Get to the call stack, find the offending function(s)
• Ease of use for .NET: perfview.exe
41. Loop Join
(diagram: for each of the m rows in the outer result, seek into an n-row B-tree at log(n) reads per seek)
Complexity: O(m * log(n))
42. Linked List vs. Tree
(diagram: finding a value in an n-node linked list takes up to n steps; a balanced tree reaches it in log2(n) steps)
43. Basic Argument for Clustered Indexes
Cluster on O_ORDERKEY:
CREATE UNIQUE CLUSTERED INDEX CIX_Key
ON ORDERS_Cluster (O_ORDERKEY)
WITH (FILLFACTOR = 100)
SELECT *
FROM ORDERS_Cluster
WHERE O_ORDERKEY = 3000000
Table 'ORDERS_Cluster'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0
Heap + index on O_ORDERKEY:
CREATE UNIQUE INDEX IX_Key
ON ORDERS_Heap (O_ORDERKEY)
WITH (FILLFACTOR = 100)
SELECT *
FROM ORDERS_Heap
WHERE O_ORDERKEY = 3000000
Table 'ORDERS_Heap'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0
44. But what if we do this a lot?
Cluster on O_ORDERKEY:
CREATE INDEX IX_Customer ON ORDERS_Cluster (O_CUSTKEY)
WITH (FILLFACTOR = 100)
SELECT *
FROM ORDERS_Cluster
WHERE O_CUSTKEY = 47480
Table 'ORDERS_Cluster'. Scan count 1, logical reads 27, physical reads 0
Heap + index on O_ORDERKEY:
CREATE INDEX IX_Customer ON ORDERS_Heap (O_CUSTKEY)
WITH (FILLFACTOR = 100)
SELECT *
FROM ORDERS_Heap
WHERE O_CUSTKEY = 47480
Table 'ORDERS_Heap'. Scan count 1, logical reads 11, physical reads 0
45. How many LOOP joins/sec/core?
7 sec
46. What did we just measure?
xperf -on Base -stackwalk profile
About 40%...
47. What is sqllang.dll?
• The query language itself
• Why so many ExecuteStmt?
• ...with so much CPU use?
48. A Different Way to Measure Loops
1 sec
49. What does THAT look like?
Takeaway: the T-SQL language itself is expensive
50. Test: Singleton Row Fetch
• Sample from LINEITEM
• Force loop join with index seeks
• Do 1.4M seeks
51. Singleton Seeks - Cost of Compression
xperf -on base -stackwalk profile
Compression    | Seek (1.4M seeks) | CPU Load
None - Memory  | 13 sec            | 100% one core
PAGE - Memory  | 24 sec            | 100% one core
None - I/O     | 21 sec            | 100% one core
PAGE - I/O     | 32 sec            | 100% one core
Function                                       | % Weight
CDRecord::LocateColumnInternal                 | 0.82%
DataAccessWrapper::DecompressColumnValue       | 0.47%
SearchInfo::CompareCompressedColumn            | 0.28%
PageComprMgr::DecompressColumn                 | 0.24%
AnchorRecordCache::LocateColumn                | 0.18%
ScalarCompression::AddPadding                  | 0.04%
ScalarCompression::Compare                     | 0.11%
Additional runtime of GetNextRowValuesInternal | 0.14%
Total Compression                              | 2.28%
Total CPU (single core)                        | 8.33%
Compression %                                  | 27.00%
(a repro sketch follows below)
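For reproduction, a minimal sketch of how the PAGE-compressed variant of the test table might be produced (the rebuild statement is standard T-SQL; it is not shown in the original deck):
-- Assumed repro step: rebuild the deck's LINEITEM table with PAGE compression
-- before re-running the 1.4M-seek test; use DATA_COMPRESSION = NONE to revert.
ALTER TABLE dbo.LINEITEM REBUILD WITH (DATA_COMPRESSION = PAGE);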
55. Merge Join
(diagram: two sorted inputs, m rows and n rows, advanced in lockstep)
Complexity: O(m + n)
56. Merge Join - What is Fastest?
SELECT MAX(L_PARTKEY), MAX(O_ORDERDATE)
FROM LINEITEM
INNER MERGE JOIN ORDERS
ON O_ORDERKEY = L_ORDERKEY
...or
SELECT MAX(L_PARTKEY), MAX(O_ORDERDATE)
FROM ORDERS
INNER MERGE JOIN LINEITEM
ON O_ORDERKEY = L_ORDERKEY
59. We can beat SQL Server at this game
SELECT MAX(O_ORDERDATE), MAX(MAX_P)
FROM
  (SELECT L_ORDERKEY, MAX(L_PARTKEY) AS MAX_P
   FROM LINEITEM
   GROUP BY L_ORDERKEY) b
INNER MERGE JOIN ORDERS
ON O_ORDERKEY = b.L_ORDERKEY
60. Hash Join
(diagram: build a hash table from the n-row input, then probe it with the m-row input)
Complexity: O(m + 2n)
61. When Hash Joins hurt you
(chart: runtime in seconds vs. hash memory in MB; runtime climbs steeply once the hash table no longer fits in memory - the "Spill Zone!")
67. What LATCH pattern do we see?
GetNextRangeForChildScan, inside TableScanNew
68. The Fix?...
• Partition the table by a "random" value
• Modulo the key, for example
• Use a SQL Server partition function/schema
(diagram: rows hashed into partitions 0, 1, 2, ..., 253, 254, 255)
A sketch follows below.
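A minimal sketch of that fix, assuming a 16-way modulo split on L_ORDERKEY (the slide's 256-partition version follows the same shape; all object names here are hypothetical):
-- Hypothetical hash partitioning via a persisted computed column.
CREATE PARTITION FUNCTION pf_hash16 (tinyint)
AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);

CREATE PARTITION SCHEME ps_hash16
AS PARTITION pf_hash16 ALL TO ([PRIMARY]);

-- Cut-down column list for illustration only.
CREATE TABLE dbo.LINEITEM_hashed (
    L_ORDERKEY int NOT NULL,
    L_PARTKEY  int NOT NULL,
    HashKey AS CAST(L_ORDERKEY % 16 AS tinyint) PERSISTED NOT NULL
) ON ps_hash16 (HashKey);
Parallel scans then have multiple independent ranges to hand out instead of contending on one.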
74. Example: Column Stores
Goals:
• Compressed
• Prefetch friendly
• Cache-resident code
Product (ID | Value): 1 Beer, 2 Beer, 3 Vodka, 4 Whiskey, 5 Whiskey, 6 Vodka, 7 Vodka
Customer (ID | Customer): 1 Thomas, 2 Thomas, 3 Thomas, 4 Christian, 5 Christian, 6 Alexei, 7 Alexei
Date (ID | Date): 1-7 all 2011-11-25
Sale (ID | Sale): 1 2 GBP, 2 2 GBP, 3 10 GBP, 4 5 GBP, 5 5 GBP, 6 10 GBP, 7 10 GBP
75. Compression is Easy
Range-encoded (ID ranges):
Product' (ID | Value): 1-2 Beer, 3 Vodka, 4-5 Whiskey, 6-7 Vodka
Customer' (ID | Customer): 1-3 Thomas, 4-5 Christian, 6-7 Alexei
Date' (ID | Date): 1-7 2011-11-25
Sale' (ID | Sale): 1-2 2 GBP, 3 10 GBP, 4-5 5 GBP, 6-7 10 GBP
Run-length encoded (RL = run length):
Product' (RL | Value): 2 Beer, 1 Vodka, 2 Whiskey, 2 Vodka
Customer' (RL | Customer): 3 Thomas, 2 Christian, 2 Alexei
Date' (RL | Date): 7 2011-11-25
Sale' (RL | Sale): 2 2 GBP, 1 10 GBP, 4 5 GBP, 2 10 GBP
77. SELECT Product, Customer FROM Table
Product' (RL | Value): 2 Beer, 1 Vodka, 2 Whiskey, 2 Vodka
Customer' (RL | Customer): 3 Thomas, 2 Christian, 2 Alexei
Walk both run-length columns in lockstep:
• 2 steps with Beer, 2 steps with Thomas → Beer/Thomas, Beer/Thomas
• 1 step with Vodka, 1 step with Thomas → Vodka/Thomas
• 2 steps with Whiskey, 2 steps with Christian → Whiskey/Christian, Whiskey/Christian
• 2 steps with Vodka (note: repeated value), 2 steps with Alexei → Vodka/Alexei, Vodka/Alexei
78. Hash Joining with Column Stores
Table (RL | Key): 2 Beer, 1 Vodka, 2 Whiskey, 2 Vodka
DimProduct (Key | Type): Beer Soft, Vodka Strong, Whiskey Strong
SELECT ...
FROM Table
JOIN DimProduct ON Key
WHERE Type = 'Strong'
1. Compute a bloom filter of the keys belonging to 'Strong'
2. Read RL = 2, Beer from Table
3. Compute the bloom value of Beer
4. Match against the filter from 1? No. Do nothing
5. Compute the bloom value for Vodka
6. Match? Yes. Output one row (RL = 1)
7. Compute the bloom value for Whiskey
8. Match? Yes. Output two rows (RL = 2)
(and likewise for the final Vodka run)
• Can prefetch data (new RLE runs)
• Can compute match/no match using only the local CPU cache
• Won't work for OLTP!
79. Why is it so hard to get joins right?
(chart: runtime vs. input sizes n and m; loop, merge, and hash joins each win in a different region)
80. Controlling Joins
Desired Join   | Join Hint                                | Query Hint
LOOP           | [INNER | LEFT | CROSS | FULL] LOOP JOIN  | OPTION (LOOP JOIN)
MERGE          | [INNER | LEFT | CROSS | FULL] MERGE JOIN | OPTION (MERGE JOIN)
HASH           | [INNER | LEFT | CROSS | FULL] HASH JOIN  | OPTION (HASH JOIN)
LOOP with Seek | WITH FORCESEEK / WITH (INDEX (<name>))   | N/A
Note: join hints force the order of the ENTIRE join tree! (examples below)
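For illustration, the hints above applied to the deck's TPC-H tables (a sketch; these exact statements are not in the slides):
-- Join hint: forces the physical join type and the join order.
SELECT MAX(L_PARTKEY)
FROM ORDERS
INNER LOOP JOIN LINEITEM ON L_ORDERKEY = O_ORDERKEY;

-- Query hint: keeps the optimizer's join order but restricts it to hash joins.
SELECT MAX(L_PARTKEY)
FROM ORDERS
JOIN LINEITEM ON L_ORDERKEY = O_ORDERKEY
OPTION (HASH JOIN);

-- FORCESEEK: demand a seek against the clustered table from slide 43.
SELECT *
FROM ORDERS_Cluster WITH (FORCESEEK)
WHERE O_ORDERKEY = 3000000;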
81. What Type of Workload?
(quadrant chart, Data Touched vs. Data Returned: OLTP touches and returns small; BI/DW touches big, returns small; Simulation touches small, returns big; ETL touches and returns big)
82. How to Classify?
(quadrant of perf counters per workload: Index Searches/sec and Probe Scans/sec lean OLTP; Full Scans/sec and Range Scans/sec lean BI/DW; Bulk Copy Rows/sec with Range/Full Scans leans ETL; Simulation: ?)
A sampling sketch follows below.
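One hedged way to sample those counters from T-SQL (counter names as exposed by sys.dm_os_performance_counters; the /sec counters are cumulative, so diff two samples taken a known interval apart):
-- Sketch: read the raw values behind the classification above.
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Full Scans/sec', N'Range Scans/sec',
                       N'Probe Scans/sec', N'Index Searches/sec',
                       N'Bulk Copy Rows/sec');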
83. OLTP System Basic Query Pattern
There should ALWAYS be a fully indexed path to the data.
(quadrant: OLTP sits in the small-touched, small-returned corner)
84. The Super Quick OLTP Tuning Guide
1. Find the worst CPU-consuming query with sys.dm_exec_query_stats (sketch below)
2. Add OPTION (LOOP JOIN) to the offending query
3. Check the estimated query plan
4. If a table spool is found: add an index to remedy, and GOTO 3
5. Happy? If not, GOTO 1
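A minimal sketch of step 1 (a standard plan-cache query; the exact shape is mine, not the deck's):
-- Top CPU consumers since the plan cache was last cleared.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
              (CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset
               END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;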
85. DW/BI System Basic Query Pattern
The query will not be (much) worse than a full scan of a fact partition.
(quadrant: BI/DW sits in the big-touched, small-returned corner)
86. The Super Quick DW Tuning Guide
1. Find the offending query
2. Add OPTION (HASH JOIN) to the query
3. Do the dimension tables have an indexed path to build the hash? If not, add an index
4. Do you get a fact table scan and a hash build of all dimensions? If not, check statistics (especially on facts and skewed columns)
5. Optimize fact table scans:
  1. Partitioning and partition elimination
  2. Column store if you have it (sketch below)
  3. Aggregate views
  4. Bitmap index pushdown (statistics!)
  5. Composite indexes (last resort!)
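A sketch of step 5.2 on the deck's SQL Server 2012, where column stores arrive as nonclustered columnstore indexes (the fact table and columns here are hypothetical):
-- Hypothetical fact table: cover the scanned columns with a columnstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
ON dbo.FactSales (DateKey, ProductKey, CustomerKey, Amount);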
87. The Expected DW Query Plan
(plan diagram: a Fact CSI Scan feeds hash joins whose build sides are batch hash tables built from dimension scans/seeks; partial aggregates below, a hash/stream aggregate on top)
88. Things that Follow from the Desired DW Plan
• At least enough RAM to hold the hash tables of the largest dimension
• De-normalisation helps... a LOT
  • Especially for the large/large joins
• Likely: need to scan fast from disk if RAM is not big enough to hold the fact table
• Compression REALLY matters
91. Where EVERY Server-wide Diagnosis Starts
SELECT *
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (SELECT wait_type FROM #ignorewaits)
  AND waiting_tasks_count > 0
ORDER BY wait_time_ms DESC
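The query assumes an #ignorewaits table of benign wait types; a minimal sketch (this particular list is illustrative, not the deck's):
-- Hypothetical helper: wait types that are normally idle background noise.
CREATE TABLE #ignorewaits (wait_type nvarchar(60) NOT NULL PRIMARY KEY);
INSERT INTO #ignorewaits (wait_type)
VALUES (N'SLEEP_TASK'), (N'LAZYWRITER_SLEEP'), (N'BROKER_TASK_STOP'),
       (N'REQUEST_FOR_DEADLOCK_SEARCH'), (N'XE_TIMER_EVENT'),
       (N'LOGMGR_QUEUE'), (N'CHECKPOINT_QUEUE'), (N'WAITFOR'),
       (N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP');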
92. Common Problems - PAGEIO
• Shows up as waits for PAGEIOLATCH
• You can dig into the details with:
SELECT *
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL)
• Can also Xevent your way to it per query:
CREATE EVENT SESSION [TraceIO] ON SERVER
ADD EVENT sqlserver.file_read_completed(
  ACTION (sqlserver.database_id, sqlserver.session_id))
(a latency variant follows below)
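A hedged variant that turns the raw stall counters from that DMF into average latency per file (the derived columns are mine, not the slide's):
-- Average read/write latency per database file, across all databases.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_ms DESC;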
93. The General I/O Guidance
• I/O, like memory, is a GLOBAL resource for the machine
• When does it make sense to partition a global resource?
  • When you deeply know the workload
  • When the workload is ALREADY partitioned
• When neither of those is true: DON'T partition
• If you have NAND/SSD - why bother?
97. Rules of Thumb - Spindle I/O and DRAM
OLTP:
• One big SAME setup: data files, tempdb
• Dedicated: transaction log
• DRAM: enough to hold most of the DB
Data Warehouse:
• JBOD setup: data files, 1-2 per LUN
• SAME setup: tempdb
• Dedicated: transaction log
• DRAM: enough to hold the largest partition of the largest table
98. You can do a bit better... or worse
• Short stroking
• Elevator sort
• Sequential vs. random
• Weaving
99. Short Stroking Disks
• Intentionally use a lower % of the total space
• Tradeoff: space for speed
• Test: 15K rpm SAS spindle, 300GB
(chart: IOPS falling from ~400 to ~150 as % capacity used rises from 0% to 100%)
100. Why does Short Stroking Work?
(diagram: full stroked vs. short stroked platter)
Disks are typically consumed "from the outside in". If partitions don't use the full disk size, the disk won't use the full platter either. The result: less head movement.
102. Why Chase Sequential I/O?
0
10
20
30
40
50
60
70
80
1
10
100
1000
10000
100000
Sequential Full Stroke Random
Latency(ms)
Log(IOPS)
8K Block Pattern
IOPS
Avg Latency
Max Latency
Service Time + Wait Time
103. • One SATA disk
• Two partitions
• One file on each
• Sequential read on
each file
But all is not well!
File1 File2
Service Time + Wait Time
104. I/O Weaving in action
[Chart: 64K Random vs. 64K Dual Sequential – two "sequential" streams on one disk weave into a random pattern, with IOPS (0-300) and average latency (0-18 ms) close to the random case]
Source: Michael Anderson. Service Time + Wait Time
105. Storage Pool and Weaving
[Diagram: multiple Data and Log volumes carved from one massive, thin-provisioned storage pool – each stream is sequential on its own, but at the pool they interleave into RANDOM I/O!]
Service Time + Wait Time
106. The SAN will properly handle Sharing!
[Chart legend: green: checkpoint, red: tx/sec, black: disk latency] Service Time + Wait Time
107. Numbers to Remember - Spindles
Characteristic          Typical Units
Throughput / Bandwidth  90-125 MB/sec – but ONLY if sequential access!
Operations per Sec      10K RPM spindle: 100-130 IOPS
                        15K RPM spindle: 150-180 IOPS
                        About 2x if short stroking (more later)
Latency                 3-5 ms (compare DRAM: 100 ns)
Capacity                100s of GB to single-digit TB
2012 numbers, will change in future. Service Time + Wait Time
108. • A few hundred IOPS
• Faster if short stroked
• Trade latency for speed with elevator sort
• Sequential is hard to get right
Summary so far… Single Disk
Service Time + Wait Time
109. • Wider stripes are neat
• But scaling is not linear
• Very deep queues help
• But they add latency
• Shared components
Why does a big RAID pile not solve this?
Service Time + Wait Time
111. Getting rid of Sharing
[Diagram, before/after: the I/O path of HBAs, switch, storage ports, cache, CPU and LUNs – "after" doubles the stack (x2) so each workload gets dedicated components instead of shared ones]
112. NAND Flash Basics
[Diagram: a NAND cell – control gate, floating gate, oxide layer, electrons trapped on the floating gate; cells form 4K pages, pages form blocks, and blocks make up the NAND die/package]
113. NAND Flash Problems
• Erase Cycles
• Around 100K
• Rebalancing and reclaim/trim
• Voltage measurement
• Gets worse with density
• Changes over time
• Depends on how you program
• Bit Rot
• Must refresh even on read
• SLC easier to manage than MLC
• But much more expensive!
[Diagram: MLC voltage windows – one of four voltage levels (00/01/10/11) must be distinguished per cell]
116. • Only partially diagnosed as waits in
sys.dm_os_wait_stats
• Task Manager gives a bit more information
• Need: transparency down to the low-level latencies and packets!
Common Problems: ASYNC_NETWORK, OLEDB
Service Time + Wait Time
117. A common Wait Type
The database is really
slow! The code takes
forever to run!
Service Time + Wait Time
118. • We may not always have insight into
what is going on at the client…
Xperf: Diagnosing the Network
xperf -on latency+network
[Screenshot: Xperf summary table]
Service Time + Wait Time
122. Short Story on DPC/ISR handling
[Diagram: the NIC signals "work done" with an IRQ over the PCI bus; a CPU core HALTs execution and fires the ISR routine:]
if (my interrupt)
{
  <Mark Handled>
  Queue DPC
}
[The queued DPC later runs <Do work needed> and <Wake Application>; once the ISR returns, the core can run other stuff again]
Service Time + Wait Time
123. It looks like this…
[Xperf screenshot: DPC and ISR activity]
Service Time + Wait Time
124. • Option 1: Use the HW vendor's tool
• Option 2: Use the Interrupt-Affinity Policy Tool from Microsoft
Setting Interrupt Affinity
Service Time + Wait Time
125. Jumbo Frames and SQL Packets
• Standard network payload (MTU): 1500 B
• Jumbo frames: 9014 B (MTU)
• Standard SQL payload: 4096 B
• Largest SQL payload: 32767 B
SELECT session_id, net_packet_size
FROM sys.dm_exec_connections
Server=foo;Packet size=32767
Service Time + Wait Time
127. Core Evolution
Moore’s “Law”:
“The number of transistors per
square inch on integrated
circuits has doubled every
two years since the
integrated circuit was
invented”
128. • Never faster than a single core
• Smaller servers are faster than bigger ones
• Large L2 caches and more clock speed help
• The algorithm dictates speed
• Latency of Wait Time sets upper limit
• Examples from MSSQL land:
• Formula Engine in MSAS
• Transaction Log Writes
• INSERT/UPDATE/DELETE (as we shall see)
Single Threaded
129. VLF files
• When switching to new VLF – it has to be ”formatted” with
8K sync write
• While this happens, transactions are blocked
• Too many VLF = Too much blocking
• Lesson: Preallocate the database log file in big chunks
• Up to 128 Log Buffers per database
• Spawned on demand, will not be released once spawned
• Transactions will wait for LOGBUFFER if no buffer is available
• Think of this like a pipeline of commits waiting…
[Diagram: the log file as a chain of VLFs, VLF(1)…VLF(6), each needing an 8K formatted write on switch; up to 128 log buffers of <=60K each feed the active VLF]
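As a hedged illustration of the preallocation lesson (the database name and size are examples, not recommendations from the deck):

-- Count VLFs: each row DBCC LOGINFO returns is one VLF
DBCC LOGINFO;
-- Preallocate the log in big chunks instead of letting autogrow fragment it
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 64GB);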
131. • Speed is determined by latency and
code path
• Max log write size: 60K
Zooming in on the Log Writer
[Diagram: commits land in the writer queue; the log writer issues async I/O; the completion port signals the thread that issued the commit – the latency sits in this round trip]
132. Long Distance Replication…
[Diagram: the primary writes a log entry, sends the log over the network, and the secondary writes it and acks back before the commit completes]
Executive Summary:
The speed of light (c)
is not fast enough!
133. • Perfmon will only show milliseconds
• What if we want microseconds?
Getting to the Real Latency
xperf -on latency
134. It's in Memory, so it must be fast?
[Comparison diagram: latency 15-30µs vs. <5µs; RAM DISK; 1.5 sec vs. 1.5 sec]
136. The Effect on UPDATE
Naïve:
UPDATE MyBigTable
SET c6 = 43
Parallel (one key range per worker n):
UPDATE MyBigTable
SET c6 = 43
WHERE key BETWEEN 10^9 * n
          AND 10^9 * (n+1) - 1
[Chart: runtime comparison, smaller is faster – the range-parallel version wins]
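A minimal sketch of the parallel pattern (the table and column come from the slide; the per-session numbering scheme is my assumption):

-- Run one session per worker; each session sets its own @n (0, 1, 2, ...)
DECLARE @n BIGINT = 0;
UPDATE MyBigTable
SET c6 = 43
WHERE [key] BETWEEN 1000000000 * @n
            AND     1000000000 * (@n + 1) - 1;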
139. Amdahl's Law of gated speedup
[Chart: speedup factor vs. number of cores (0-64) for P = 100%, 95%, 90% and 80%]
P = part of the program that can be made parallel (note that this may be 0… or 1)
N = number of CPU cores available
Speedup = 1 / ((1 - P) + P / N)
Example: with P = 95% and N = 32 cores, speedup = 1 / (0.05 + 0.95/32) ≈ 12.5 – nowhere near 32.
141. But those rows have to be stored…
[Diagram: Tables A, B and C, each covered by row locks (LCK), all funnel into one data file in one filegroup]
142. It all Starts with Wait Stats
SELECT *
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (SELECT wait_type FROM #ignorewaits)
  AND waiting_tasks_count > 0
ORDER BY wait_time_ms DESC
DBCC PAGE
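The deck names DBCC PAGE without showing the call; a hedged usage sketch (the database name and page coordinates are placeholders):

DBCC TRACEON (3604);            -- route DBCC output to the client
DBCC PAGE ('MyDb', 1, 42, 3);   -- (database, file id, page id, print option 3 = full dump)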
143. PFS – Hidden Single Page Contention
[Diagram: a data file starts with GAM/SGAM and PFS pages, and one PFS page tracks every 64MB interval. Each 8K PFS page is a bitmap of allocation status – every INSERT INTO TableA must flip the allocated bit on the single PFS page covering its interval]
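The classic mitigation is more data files, so allocations spread across more PFS chains; a sketch (file name, path and size are hypothetical):

-- Each file gets its own PFS/GAM/SGAM chain, spreading allocation contention
ALTER DATABASE MyDb
ADD FILE (NAME = MyDb_data2, FILENAME = 'D:\data\MyDb_2.ndf', SIZE = 32GB);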
145. How many more Files?
[Chart: runtime (260-400) and PAGELATCH_UP waits (log scale, 1 to 10,000,000) vs. number of data files (0-48) – waits drop by orders of magnitude as files are added, and runtime falls with them]
146. • Shared, physical MEMORY structures
can cause bottlenecks (ex: PFS)
• SQL Server must synchronize too…
• Understanding where a structure resides
leads to the tuning fix
• Theory of the engine!
Concurrency: What we learned so far
147. • Commonly misdiagnosed
• CXPACKET does NOT (always) mean
that your DOP is "too high"
CXPACKET
[Chart 1: CXPACKET waits (0-200,000,000) vs. throughput (10-40 MB/sec) – waits climb into the hundreds of millions even as throughput rises]
[Chart 2: throughput (0-50 MB/sec) vs. DOP (1-41) – throughput keeps increasing with DOP despite the CXPACKET waits]
149. • What happens when you get things like:
LATCH_<x>
PAGELATCH_<x>
Step 1: Dig into:
Diagnosing Latches
SELECT *
FROM sys.dm_os_latch_stats
Service Time + Wait Time
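A common refinement of that first dig (the filter and ordering are my addition, not the slide's):

-- BUFFER-class latches surface as PAGELATCH_* waits, so rank the rest
SELECT latch_class, waiting_requests_count, wait_time_ms
FROM sys.dm_os_latch_stats
WHERE latch_class <> 'BUFFER'
ORDER BY wait_time_ms DESC;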
156. UPDATE Hack on Small Tables
Before:
[Diagram: one 8K page holding several rows – multiple LCK_U row locks all queue behind the same PAGELATCH_EX]
After:
ALTER TABLE HotUpdates
ADD Padding CHAR(5000)
NOT NULL DEFAULT ('X')
[Diagram: with the CHAR(5000) padding only one row fits per 8K page, so each LCK_U gets its own PAGELATCH_EX]
157. Test: Updates of pages
Compression     Update 1.4M rows    CPU Load
None - Memory   13 sec              100% one core
PAGE - Memory   54 sec              100% one core
None - I/O      17 sec              100% one core
PAGE - I/O      59 sec              100% one core
L_QUANTITY is NOT NULL, i.e. an in-place UPDATE
160. How long are locks held?
[Chart: lock-held cycle count, avg and std dev, in CPU kilocycles (0-600) for PAGE compression vs. NONE – page compression holds locks for far more cycles]
161. • Sharing is generally bad for scale (but
may be good for performance)
• PAGELATCH and LATCH diagnosis starts
in sys.dm_os_latch_stats
• CXPACKET
• Only important if throughput drops when
DOP goes up
• If this happens, look for another wait/latch
• Table partitioning can be used to work
around concurrency issues
Summary Concurrency – So Far..
162. The Paul Randal INSERT test
160M rows, executed at concurrency,
committing every 1K rows:
EASY tuning?
166. And the Score Is…
[Chart: runtime in seconds (0-35,000) for newguid(), newsequentialid() and IDENTITY keys]
167. What is going on here???
[Diagram: a B-tree under HOBT_ROOT where sequential keys send every insert to the single Max edge, while the Min pages sit idle – all sessions fight over the same right-most page]
168. Tricks to Work Around this
[Diagram: the key space split into ranges (0-1000, 1001-2000, 2001-3000, 3001-4000) with concurrent INSERTs landing in each range – see the sketch below]
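A minimal sketch of the hash trick behind the "IDENTITY+Hash8" bars on the next slide (all names are hypothetical):

-- Prefixing the clustered key with a small persisted hash spreads the
-- inserts over 8 B-tree hot spots instead of one
CREATE TABLE dbo.MyInserts (
    id      BIGINT IDENTITY NOT NULL,
    bucket  AS CAST(id % 8 AS TINYINT) PERSISTED NOT NULL,
    payload CHAR(100) NOT NULL,
    CONSTRAINT PK_MyInserts PRIMARY KEY CLUSTERED (bucket, id)
);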
169. All Cores at ~100%
[Chart: runtime in seconds (0-35,000) for newguid(), newsequentialid(), IDENTITY, IDENTITY+Unique, IDENTITY+Unique+Hash8, IDENTITY+Hash24, IDENTITY+Hash48 and SPID+Offset – the hashed/SPID schemes reach 600K-830K inserts/sec with all cores at ~100%]
170. • Don't use sequential keys
• Page splitting isn't so bad
• Neither are GUIDs
• Generate keys wisely – ideally in the app server
• For a "transparent" speedup, consider our old hash trick
Takeaways, INSERT workload
171. • Minimally Logged
• Single, large
execution
(thousands)
• Unsorted data
• Concurrent Loaders
BULK INSERT Workload
[Diagram: multiple concurrent Bulk Inserts into a single heap]
172. When does BULK INSERT scale break?
Measure:
SELECT * FROM sys.dm_os_latch_stats
Observe waits on ALLOC_FREESPACE_CACHE
Theory (just read BOL):
"Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object."
[Chart: MB/sec (0-250) vs. concurrent BULK INSERT streams (0-30) – throughput flattens after a handful of streams]
173. What is Happening here?
[Diagram: every Bulk Insert allocates new pages through the HOBT cache of free-page information (PFS/GAM/SGAM), grabbing "fat chunks" under the ALLOC_FREESPACE_CACHE latch – and this structure lives in DRAM and L2]
174. • Break up the table by "some key"
• Optional: switch out partitions
• Spin up multiple bulks
• Linear scale:
• 3GB/sec
• 16M LINEITEM rows/sec
Breaking Through the Bottleneck
[Diagram: rows routed by an Area key into separate partitions, one Bulk Insert per partition – see the sketch below]
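A hedged sketch of that layout (partition boundaries, names and file paths are invented for illustration):

-- One heap partition per loader; each BULK INSERT stream owns one Area range
CREATE PARTITION FUNCTION pfArea (INT) AS RANGE LEFT FOR VALUES (100, 200, 300);
CREATE PARTITION SCHEME psArea AS PARTITION pfArea ALL TO ([PRIMARY]);
CREATE TABLE dbo.StagingLineItem (Area INT NOT NULL, payload CHAR(200) NOT NULL)
ON psArea (Area);
-- then, per loader session, something like:
-- BULK INSERT dbo.StagingLineItem FROM 'C:\load\area_1.dat' WITH (TABLOCK);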
175. BULK INSERT - Reloaded
• "Thomas, you might have gotten 16M rows/sec at 3GB/sec insert speed –
but that was on heaps; I have a clustered table"
• Alright then, let us hit a clustered index
[Diagram: a clustered, partitioned table with ranges 1-1000, 1001-2000, 2001-3000, 3001-4000 – each loader takes an X lock on its own partition]
180. • Context switching is expensive
• Typically 10K or more CPU cycles
• If you expect the resource to be held
only shortly, why fall asleep?
What is a Spinlock?
spin_acquire(int* s)
{
    while (*s == 1)
        ;        /* spin until free (a real implementation tests and sets atomically) */
    *s = 1;
}
spin_release(int* s)
{
    *s = 0;
}
181. • Acquire can be very expensive
• SQL Server implements a backoff mechanism
What is a backoff?
spin_acquire(int* s)
{
    int spins = 0;
    while (*s == 1)
    {
        spins++;
        if (spins > threshold)
        {
            <Sleep and WaitForResource>   /* this is the backoff */
        }
    }
    *s = 1;
}
SELECT *
FROM sys.dm_os_spinlock_stats
DBCC SQLPERF (spinlockstats)
183. WRITELOG is I/O – right?
Should be the same as this… or?
No! Because:
184. • Step 1: Copy sqlservr.pdb to the BINN directory
• Step 2: DBCC TRACEON (3656, -1)
• Step 3: Steal the script from:
http://www.microsoft.com/en-us/download/details.aspx?id=26666
Note: for 2012 you additionally need:
• sqlmin.pdb, sqllang.pdb, sqldk.pdb
Diagnosing a Spinlock the Cool way!
185. Spinlock Walkthrough – Extended Events Script
--Get the type value for any given spinlock type
SELECT map_value, map_key, name
FROM sys.dm_xe_map_values
WHERE map_value IN ('SOS_CACHESTORE')

--Create the event session that will capture the callstacks to a bucketizer
CREATE EVENT SESSION spin_lock_backoff ON SERVER
ADD EVENT sqlos.spinlock_backoff (
    ACTION (package0.callstack)
    WHERE type = 144 /* SOS_CACHESTORE */)
ADD TARGET package0.asynchronous_bucketizer (
    SET filtering_event_name = 'sqlos.spinlock_backoff',
        source_type = 1,
        source = 'package0.callstack')
WITH (MAX_MEMORY = 50MB, MEMORY_PARTITION_MODE = PER_NODE)

--Run this section to measure the contention
ALTER EVENT SESSION spin_lock_backoff ON SERVER STATE = START

--Wait to measure the number of backoffs over a 1 minute period
WAITFOR DELAY '00:01:00'

--To view the data:
--1. Ensure the sqlservr.pdb is in the same directory as the sqlservr.exe
--2. Enable this trace flag to turn on symbol resolution
DBCC TRACEON (3656, -1)

--Get the callstacks from the bucketizer target
SELECT event_session_address, target_name, execution_count,
       CAST(target_data AS XML)
FROM sys.dm_xe_session_targets xst
INNER JOIN sys.dm_xe_sessions xs
    ON xst.event_session_address = xs.address
WHERE xs.name = 'spin_lock_backoff'

--Clean up the session
ALTER EVENT SESSION spin_lock_backoff ON SERVER STATE = STOP
DROP EVENT SESSION spin_lock_backoff ON SERVER
187. How to improve a spinlock?
[Diagram: two CPU packages, each with cores and an L1-L3 cache, both running spin_acquire on the same int s – every acquire forces a cache-line transfer between the packages]
191. Bulking at Concurrency
• What's that spin?
xperf -on latency -stackwalk profile
xperf -d trace.etl
xperfview trace.etl
SELECT * FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC
DBCC SQLPERF (spinlockstats)
192. SOS_OBJECT_STORE at high INSERT
• Observed: this spin happens when inserting
• Need: reduce locking overhead
• Fixes that work well here: 8x throughput bonus
193. • Let's try something really silly:
• Run lots of: EXEC emptyProc
• This should be infinitely scalable, right?
Diagnosing another Spinlock
CREATE PROCEDURE emptyProc
AS
RETURN
199. DECLARE @ParmDef NVARCHAR(500)
DECLARE @sql NVARCHAR(500)
SET @sql = N'INSERT INTO dbo_<t>.MyBigTable_<t> WITH (TABLOCK)
  (c1, c2, c3, c4, c5, c6)
  VALUES (@p1, @p2, @p3, @p4, @p5, @p6)'
SET @sql = REPLACE(@sql, '<t>', dbo.ZeroPad(@table, 3))
SET @ParmDef = '@p1 BIGINT, @p2 DATETIME, @p3 CHAR(111), @p4 INT, @p5 INT, @p6 BIGINT'
DECLARE @constDate DATETIME = '1974-12-22'
DECLARE @i INT
WHILE (1=1) BEGIN
  BEGIN TRAN
  SET @i = 1
  WHILE @i <= 1000 BEGIN
    EXEC sys.sp_executesql @sql, @ParmDef
      , @p1 = 1, @p2 = @constDate, @p3 = 'x', @p4 = 42, @p5 = 7, @p6 = 13
    SET @i = @i + 1
  END
  COMMIT TRAN
END
Consider this Test harness code…
200. Spinning on MUTEX
Diagnosing with the trace flag shows the spin's
stack offender:
CSecurityContext::GetUserTokenFromCache
This is REALLY expensive at scale:
WHILE @i <= 1000 BEGIN
  EXEC sys.sp_executesql @sql, …
  SET @i = @i + 1
END
It initializes a new execution context on every loop!
201. Fixing the MUTEX spin
• Instead of:
WHILE @i <= 1000 BEGIN
  EXEC sys.sp_executesql @sql, …
  SET @i = @i + 1
END
• Write:
SET @sql = N'
DECLARE @i INT
WHILE (1=1) BEGIN
  BEGIN TRAN
  SET @i = 1   -- re-arm the inner loop (missing on the original slide)
  WHILE @i <= 1000 BEGIN
    INSERT INTO dbo_<t>.MyBigTable_<t> WITH (TABLOCK)
      (c1, c2, c3, c4, c5, c6)
    VALUES (@p1, @p2, @p3, @p4, @p5, @p6)
    SET @i = @i + 1
  END
  COMMIT TRAN
END'
EXEC sys.sp_executesql @sql, @ParmDef, …
4x throughput bonus
202. • When all other bottlenecks are
gone, sharing happens in the most
unlikely places
• You can use spinlock Xevents inside SQL
Server
• Remember symbol files in BINN
• Trace flag 3656
• This can also be done in XPERF for non-SQL apps
• Ex: Analysis Services
Concurrency, Spinlock Summary
203. • Control of buffers and NUMA for Xperf settings
• By default:
• 4MB of memory
• Spools to disk at the root of the C: drive
• Can do buffer/file control:
• -BufferSize and -MaxBuffers
• -MaxFile and -FileMode Circular
Xperf: controlling buffers
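For example, a hedged invocation combining those switches (the sizes are arbitrary, not the deck's):

xperf -on latency -BufferSize 1024 -MaxBuffers 256 -MaxFile 2048 -FileMode Circular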
204. • Round robin between NUMA nodes
• Inside the NUMA: Pick the one that
looks the least busy
• This is NOT a perfect system
How SQL Server assigns threads
206. • All the tuning won't help you if your
model is wrong
• Tuning gets you far, but to really
scale, you need a good data model
• This is what my other courses are about
But does the Data Model Work?
214. What if…
• Push:
• Seek the first-value page
• UPDATE the reference count
• Pop:
• Seek the last-value page
• UPDATE the reference count
[Diagram: a Min-Max range where push does Msg++ at one end and pop does Msg-- at the other]
218. Summing Up the Message Queue Hack
• UPDATE instead of INSERT/DELETE
• More partitions = more B-trees
• Ring buffer using modulo (see the sketch below)
• Find the concurrency sweet spot
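A minimal sketch of the ring-buffer idea under stated assumptions (the table, columns and modulo size are all invented for illustration):

-- Fixed set of pre-allocated slots: push/pop become in-place UPDATEs,
-- so the B-tree never splits and never shrinks
CREATE TABLE dbo.MsgRing (
    slot     INT NOT NULL PRIMARY KEY CLUSTERED,
    payload  VARBINARY(256) NULL,
    occupied BIT NOT NULL DEFAULT 0
);
-- push message @msg at head position @head
DECLARE @head BIGINT = 0, @msg VARBINARY(256) = 0x01;
UPDATE dbo.MsgRing
SET payload = @msg, occupied = 1
WHERE slot = @head % 1024 AND occupied = 0;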
Editor's Notes
For a great introductory course I recommend the Paul Randal course found here: http://www.sqlskills.com/T_ImmersionInternalsDesign.asp
To get a good runtime, we up the count of rows to 1M
Hint: NGEN lives in %Windir%\Microsoft.NET\framework64\<Version>. Doc on NGEN: http://msdn.microsoft.com/en-us/magazine/cc163610.aspx
Get perfview here: http://www.microsoft.com/en-us/download/details.aspx?id=28567
Different data structures have different time complexities that lend themselves to more or less efficient service times.
Concurrency of JOIN even when single threaded
The B+ tree is a data structure that seeks to block-fetch large areas of data (typically, but not always, 8K) before seeking through the pages in memory. There exist many different ways to lay out the data pages of a B-tree, some of them more friendly to memory prefetch than others. The B-tree also lets you walk the leaf nodes linearly, without paying the log-proportional price to seek. This allows logarithmic time to seek individual pages while still allowing linear time for range scans. Once the expensive price of fetching a page (I/O) has been paid, parsing the page can also be made cheap by exploiting the in-memory structures.
Highlight spill warning
In the course material I have a query that will help you do #1 in this list. If you are curious about ways to optimize the best index-only plan, I recommend the book by Dan Tow called "SQL Tuning".
We will get into WHY the transaction log needs to be dedicated
Elevator sorting orders the I/O requests before sending them to the spindle. Depending on the buffering, this ordering can increase IOPS per spindle quite significantly. However, it comes at the cost of increased latency.
Add the spindle illustration here
Hardware vendors have different implementation of RAID. It really depends on the gear you have and there is really only ONE way to get the true, unbiased answer… Which leads us to the next slide
In certain scenarios with shallow B-trees (e.g. the BizTalk spool), row padding can shift the latch to the internal structure ACCESS_METHODS_HOBT_VIRTUAL_ROOT.
Root splits are expensive, although a split only affects one partition at a time; the trouble comes when many concurrent transactions cause page splits. We are suggesting that partitioning handles this better.