Some Tips about Vertica's Trace
I come from the Oracle performance/tuning world. Oracle has some great utilities that I use for tracking
down problems and solving them. That's why, when I approached Vertica, my first question was: how can I see
what is going on in the system, and what causes queries to run slower than usual? I don't yet have answers to
all of those questions, but I have studied Vertica's trace mechanism and I'd like to share my insights.
There are 3 levels of tracing in Vertica:
1. Select
2. Session
3. System
The trace output populates some of these 3 tables, depending on the level of tracing:
1. session_profiles – aggregated counters for the session, including lock statistics.
2. query_profiles – information about queries, such as the query start time, projections used, and SQL text.
3. execution_engine_profiles – detailed information per operator, such as the clock time of a join operator.
Session & system have three levels of tracing
1. Session - populates ONLY session_profiles table.
2. Query – populates ONLY Query_profile table.
3. ee – populates both Query_profile and execution_engine_profiles.
Let’s start by explaining those levels:
Select – putting PROFILE before our select traces only that select statement.
We can then analyze the results by querying
○ query_profiles
○ execution_engine_profiles
Example – profile select 'testing' from dual;
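A rough sketch of the single-statement workflow: on my version, PROFILE prints a hint with the transaction_id and statement_id it ran under, which you can then substitute below to pull the counters for just that statement.
profile select 'testing' from dual;
-- PROFILE reports the transaction_id/statement_id it was profiled under;
-- substitute them below to see the per-operator counters for that statement only.
select operator_name, path_id, counter_name, counter_value
from v_monitor.execution_engine_profiles
where transaction_id = <trx_id>
  and statement_id = <stm_id>
  and counter_name in ('execution time (us)', 'clock time (us)')
order by counter_value desc;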
Session – the ENABLE_PROFILING command traces all statements in the session.
The output populates the tables according to the profiling type chosen (session, query, or ee).
Example – SELECT ENABLE_PROFILING('ee');
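A minimal session-level sketch (assuming DISABLE_PROFILING is available to switch profiling back off, as it is on my version):
-- turn on operator-level profiling for this session only
SELECT ENABLE_PROFILING('ee');
-- run the statements you want to trace, then inspect the collected counters
select operator_name, counter_name, counter_value
from v_monitor.execution_engine_profiles
order by counter_value desc
limit 20;
-- switch session profiling back off
SELECT DISABLE_PROFILING('ee');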
System – the SET_CONFIG_PARAMETER command traces all queries from all sessions.
The output populates the tables according to the profiling type chosen (session, query, or ee).
Example – SELECT SET_CONFIG_PARAMETER('GlobalEEProfiling', 1);
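Because this enables profiling for every session on every node, remember to turn it back off. A hedged sketch (the CONFIGURATION_PARAMETERS view and its column names are as they appear on my cluster and may differ between versions):
-- check the current value of the switch
select parameter_name, current_value
from v_monitor.configuration_parameters
where parameter_name = 'GlobalEEProfiling';
-- switch system-wide operator profiling back off
SELECT SET_CONFIG_PARAMETER('GlobalEEProfiling', 0);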
Some general notes:
● When we trace at the session or system level,
we get a lot of rows in the execution_engine_profiles table.
To clean those trace tables, use the clear_profiling command.
Example – select clear_profiling('GlobalEEProfiling', 'local');
local – clears the data only for the current session.
global – clears the data of all sessions from all nodes.
● Even when tracing is not enabled, active queries still show up in the query_profiles and
execution_engine_profiles tables.
This can be used to monitor long-running queries in real time (see the sketch below).
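A minimal real-time monitoring sketch, using only columns that also appear in the query_profiles query later in this article:
-- currently executing statements, oldest first
select transaction_id||' / '||statement_id as "trx_id/stm_id",
       user_name,
       substr(replace(query,chr(10),' '),1,40) as query,
       query_start
from v_monitor.query_profiles
where is_executing
order by query_start;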
● If we want to save the history of queries, we can turn on the query repository.
It saves all queries, plus Vertica's internal queries, to a persistent table rather than to the system
tables shown above.
According to Vertica support this will be deprecated in the future and replaced by the data collector
tables (that is a topic for another article).
● execution_engine_profiles is a very detailed table.
It has about 30 distinct operators for each SQL statement (if all of them are active)
and about 50 counters for each operator.
It is mostly useful for a single statement, to see which phase in the plan took the longest time and
used the most resources.
At the system level we can see which operators take most of the system resources.
For example:
This query shows the top 10 most time-consuming operators on your system.
(Remember: when profiling is not enabled, this will show only active queries.)
select *
from (
  SELECT
    row_number() over (order by counter_value desc) as rnum,
    decode(is_executing, true, 'X', null) as run,
    decode(is_executing, true, eep.session_id||'/'||eep.transaction_id||'/'||eep.statement_id, null) as "sid/trx_id/stm_id",
    eep.node_name,
    ses.client_hostname,
    eep.user_name,
    operator_name,
    path_id,
    counter_name,
    counter_value/1000000 as sec
  FROM v_monitor.execution_engine_profiles eep
       left outer join v_monitor.sessions ses on (ses.session_id = eep.session_id)
  WHERE (counter_name = 'execution time (us)' or counter_name = 'clock time (us)')
) a
where rnum <= 10
ORDER BY rnum;
● The steps that I use to identify a problem are:
○ Get an alert for a long-running query, or identify a problematic query, mainly by querying
the query_profiles table.
○ Drill down into the execution_engine_profiles table to see which step in the plan
consumed the most time and resources.
○ Run EXPLAIN on the query.
○ Analyze the explain plan and tune the query.
Here is a simple situation describing the steps used to identify and fix a problem.
1. We got an alert for a long-running query in the db.
Here is a sample output from the query_profiles table:
select
  is_executing as exe,
  qpo.transaction_id||' / '||qpo.statement_id as "trx_id/stm_id",
  qpo.user_name,
  qpo.schema_name||'.'||qpo.table_name as t_name,
  substr(qpo.projections_used,1,30) as proj_use,
  substr(replace(query,chr(10),' '),1,35) as query,
  to_char(query_start,'dd/mm/yy hh24:mi:ss') as query_start,
  decode(is_executing, true,
         extract(seconds from (now()-query_start))
         + (extract(minutes from (now()-query_start)) * 60)
         + (extract(hours from (now()-query_start)) * 3600),
         query_duration_us/1000000) as Sec,
  processed_row_count as rows,
  error_code as err
from query_profiles qpo
where qpo.user_name ilike :1
order by is_executing desc, query_duration_us desc;
output:
exe | trx_id/stm_id | user_name | t_name | proj_use | query | query_start | Sec | rows | err
-----+-----------------------+-----------+--------+--------------------------------+-------------------------------------+-------------------+------------+------+-----
t | 45035996274377652 / 9 | dbadmin | . | lp_15744040.FACT_VISIT_ROOM_fi | select a11.LP_ACCOUNT_ID A | 16/04/12 09:17:31 | 185.417043 | 0 | 0
t | 45035996274380738 / 1 | dbadmin | . | v_monitor.query_profiles_p | select is_executing as exe, qpo.tra | 16/04/12 09:20:36 | 0.000409 | 0 | 0
2. Let’s drill down to the execution_engine_profiles table to check what consumes the most resources.
select "trx/stm",operator_name,path_id,
"execution time(sec)","clock time (sec)",
"estimated rows produced","rows produced",
"estimated rows produced"-"rows produced" as RowsDiff,
"memory reserved (MB)" ,
"memory allocated (MB)"
from (
select transaction_id||' / '||statement_id as "trx/stm",
operator_name,path_id,
sum(decode(counter_name,'execution time (us)',counter_value,null))/1000000 as "execution time(sec)",
sum(decode(counter_name,'clock time (us)',counter_value,null))/1000000 as "clock time (sec)",
sum(decode(counter_name,'estimated rows produced',counter_value,null)) as "estimated rows produced",
sum(decode(counter_name,'rows produced',counter_value,null)) as "rows produced",
sum(decode(counter_name,'memory reserved (bytes)',counter_value/1024/1024,null)) as "memory reserved
(MB)",
sum(decode(counter_name,'memory allocated (bytes)',counter_value/1024/1024,null)) as "memory allocated
(MB)"
from v_monitor.execution_engine_profiles
where transaction_id = <trx_id>
and statement_id = <stm_id>
and counter_value/1000000 > 0
and counter_name in ('execution time (us)','clock time (us)','estimated rows produced','rows produced','memory
reserved (bytes)','memory allocated (bytes)')
group by transaction_id||' / '||statement_id,operator_name,path_id ) a
order by 4 desc
;
Output:
        trx/stm        | operator_name | path_id | execution time(sec) | clock time (sec) | estimated rows produced | rows produced | RowsDiff  | memory reserved (MB) | memory allocated (MB)
-----------------------+---------------+---------+---------------------+------------------+-------------------------+---------------+-----------+----------------------+-----------------------
 45035996274377652 / 2 | GroupByHash   |       2 |                 115 |               57 |               100000000 |               |           |                 1263 |                  1123
 45035996274377652 / 2 | Join          |       3 |                  18 |               29 |               794821764 |      61278976 | 733542788 |                    4 |                    80
 45035996274377652 / 2 | StorageUnion  |       2 |                   2 |                6 |                         |      61268471 |           |                    3 |                     1
 45035996274377652 / 2 | GroupByPipe   |       2 |                     |                  |               400000000 |      61269239 | 338730761 |                      |
 45035996274377652 / 2 | ExprEval      |       3 |                     |                  |               794821764 |      61278976 | 733542788 |                      |
 45035996274377652 / 2 | StorageUnion  |      -1 |                     |                  |                         |               |           |                    3 |
 45035996274377652 / 2 | Scan          |       5 |                     |                  |                         |               |           |                      |                     1
 45035996274377652 / 2 | Scan          |       4 |                     |                  |               794821764 |      61284352 | 733537412 |                      |
 45035996274377652 / 2 | NewEENode     |      -1 |                     |                  |                         |               |           |                   64 |
 45035996274377652 / 2 | ExprEval      |       0 |                     |                  |               100000000 |               |           |                      |
 45035996274377652 / 2 | GroupByPipe   |       1 |                     |                  |               100000000 |               |           |                      |
(11 rows)
sorry for the little fonts.
As we can see, GroupByHash is taking the most resources, and its path_id is 2.
3. Now, let's run EXPLAIN against the bad query:
explain select
a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
count(distinct a11.VS_LP_SESSION_ID) AS Visits,
(count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from lp_15744040.FACT_VISIT_ROOM a11
group by
a11.LP_ACCOUNT_ID;
Output:
Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2)
| | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | | Projection: lp_15744040.FACT_VISIT_ROOM_fix
| | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
4. In the execution plan we can see that path_id 2
is the GROUPBY HASH (SORT OUTPUT) step in the plan.
The problem is that Vertica has to wait for all rows to flow into this step before it can sort and
group them.
As we can see, this step takes the most time and memory.
One way to speed up this step is to create a projection that presorts the rows, which enables the
group by operation to pipeline its results to the next step.
For example, we can create the following projection:
CREATE PROJECTION lp_15744040.FACT_VISIT_ROOM_fix1
(
LP_ACCOUNT_ID,
VS_LP_SESSION_ID,
VS_LP_VISITOR_ID,
VISIT_FROM_DT_TRUNC,
ACCOUNT_ID,
ROOM_ID,
VISIT_FROM_DT_ACTUAL,
VISIT_TO_DT_ACTUAL,
HOT_LEAD_IND
)
AS
SELECT FACT_VISIT_ROOM.LP_ACCOUNT_ID,
FACT_VISIT_ROOM.VS_LP_SESSION_ID,
FACT_VISIT_ROOM.VS_LP_VISITOR_ID,
FACT_VISIT_ROOM.VISIT_FROM_DT_TRUNC,
FACT_VISIT_ROOM.ACCOUNT_ID,
FACT_VISIT_ROOM.ROOM_ID,
FACT_VISIT_ROOM.VISIT_FROM_DT_ACTUAL,
FACT_VISIT_ROOM.VISIT_TO_DT_ACTUAL,
FACT_VISIT_ROOM.HOT_LEAD_IND
FROM lp_15744040.FACT_VISIT_ROOM
ORDER BY FACT_VISIT_ROOM.LP_ACCOUNT_ID,
FACT_VISIT_ROOM.VS_LP_SESSION_ID,
FACT_VISIT_ROOM.VS_LP_VISITOR_ID,
FACT_VISIT_ROOM.VISIT_FROM_DT_TRUNC,
FACT_VISIT_ROOM.ACCOUNT_ID,
FACT_VISIT_ROOM.ROOM_ID,
FACT_VISIT_ROOM.VISIT_FROM_DT_ACTUAL,
FACT_VISIT_ROOM.VISIT_TO_DT_ACTUAL,
FACT_VISIT_ROOM.HOT_LEAD_IND
SEGMENTED BY hash(FACT_VISIT_ROOM.VS_LP_SESSION_ID) ALL NODES ;
Now we run START_REFRESH() to populate the new projection and make it up to date, and we gather statistics again (a rough sketch of these calls follows).
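Roughly, the refresh and statistics steps look like this (a sketch; adjust the table name to your schema):
-- populate the new projection in the background
SELECT START_REFRESH();
-- once it is refreshed, regather statistics so the optimizer knows about the new sort order
SELECT ANALYZE_STATISTICS('lp_15744040.FACT_VISIT_ROOM');
-- optional sanity check: list the table's projections
SELECT GET_PROJECTIONS('lp_15744040.FACT_VISIT_ROOM');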
With the refreshed projection in place, we run the EXPLAIN again:
explain select
a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
count(distinct a11.VS_LP_SESSION_ID) AS Visits,
(count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from lp_15744040.FACT_VISIT_ROOM a11
group by
a11.LP_ACCOUNT_ID;
Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2)
| | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | | Projection: lp_15744040.FACT_VISIT_ROOM_fix1
| | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
Now path_id 2 is not sorting; it is doing GROUPBY PIPELINED.
This is what we wanted.
Vertica will group bulks of rows and send them to the next step as they are produced.
Here are the results:
Before:
Time: First fetch (7 rows): 247037.106 ms. All rows formatted: 247037.177 ms
After:
Time: First fetch (7 rows): 34855.253 ms. All rows formatted: 34855.299 ms
85% decrease in elapsed time.
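The same before/after comparison can be pulled from query_profiles after the fact; a small sketch using only the columns shown earlier:
-- elapsed time of the two runs of the report query, newest first
select to_char(query_start,'dd/mm/yy hh24:mi:ss') as query_start,
       query_duration_us/1000000 as sec,
       substr(replace(query,chr(10),' '),1,60) as query
from v_monitor.query_profiles
where query ilike '%FACT_VISIT_ROOM%group by%'
order by query_start desc
limit 2;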