The document discusses the Performance Schema feature in MySQL 5.5, which instruments and collects data about internal operations to help identify performance bottlenecks. It is implemented as a storage engine that collects data about events like query execution steps, locks, I/O, and threads into tables that provide visibility into where the server spends its time. This helps address the lack of good instrumentation previously available in MySQL for performance tuning.
Percona Live '18 Tutorial: The Accidental DBA (Jenni Snyder)
The document is an agenda for a talk titled "The Accidental DBA" about MySQL database administration. The talk covers MySQL basics like installation, configuration, backups and replication. It is meant to introduce attendees to common MySQL administration tasks and help them get more comfortable with database administration even if they came to it by accident. The speaker is Jenni Snyder, an engineering manager at Yelp who became their first DBA in 2011.
This document discusses the evolution of data analysis and how Couchbase database can help make data analysis more exciting again. In the past, data analysis used to be exciting because it took days to write analysis programs and results were only available overnight. Now with Couchbase, queries can be built and results retrieved in seconds for huge datasets using MapReduce queries. Couchbase allows slicing data in many ways without effort through its database clusters and JavaScript interface.
Fears, misconceptions, and accepted anti patterns of a first time cassandra a... (Kinetic Data)
Cassandra can be successfully used for applications that are not extremely large scale or write-heavy. The document discusses fears, misconceptions, and accepted anti-patterns of first-time Cassandra users. It provides examples from a deployed application called Kinetic Request that uses Cassandra for multi-datacenter replication, durability, and scalability. Common concerns like atomicity, joins, lookups, updates, and queues are addressed, with solutions demonstrated from the real-world application. The key takeaways are that Cassandra has benefits even at moderate scales, the barriers are not as high as perceived, and to gain experience through experimentation and testing.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression, and recommendation systems (collaborative filtering).
The talk was recorded and is available on YouTube: https://www.youtube.com/watch?v=_gFgU3phogQ
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac... (ivmaykov)
This document discusses scaling video analytics using Apache Cassandra. It provides an overview of Ooyala's video analytics platform and the challenges of scaling to support billions of log pings and terabytes of data daily. Cassandra is used to store over 10 terabytes of historical analytics data covering 4 years of growth. The key challenges addressed are scaling to handle enormous data volumes, providing fast processing and query speeds, supporting deep queries over many dimensions of data, ensuring accuracy, and allowing for rapid developer iteration. The document explains how Cassandra's data model and capabilities help meet these challenges through features like linear scalability, tunable consistency, and a rich data model.
The document provides an introduction to Cassandra presented by Nick Bailey. It discusses key Cassandra concepts like cluster architecture, data modeling using CQL, and best practices. Examples are provided to illustrate how to model time-series data and denormalize schemas to support different queries. Tools for testing Cassandra implementations like CCM and client drivers are also mentioned.
MySQL Performance - SydPHP October 2011 (Graham Weldon)
A talk on server-side MySQL optimisations and on using PHP extensions to reduce disk writes, freeing up more I/O capacity for MySQL. This was presented at SydPHP in October 2011.
MySQL Troubleshooting with the Performance Schema (Sveta Smirnova)
This document discusses using the Performance Schema in MySQL to troubleshoot performance issues. It provides an overview of the Performance Schema and what information it collects. It then discusses how to use specific Performance Schema tables like events_statements_history_long, events_stages_history_long, and others to identify statements that examine too many rows, issues with index usage, and which internal operations are taking a long time. The document provides examples of queries to run and what to look for in the Performance Schema output to help troubleshoot and optimize SQL statements.
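A query along these lines surfaces statements that read far more rows than they return — a sketch against MySQL 5.6+, where the 100:1 examined-to-sent threshold is an illustrative heuristic, not a fixed rule:

```sql
-- Statements whose examined-to-sent ratio suggests a missing or unused index
SELECT sql_text, rows_examined, rows_sent
FROM performance_schema.events_statements_history_long
WHERE rows_sent > 0
  AND rows_examined > 100 * rows_sent
ORDER BY rows_examined DESC
LIMIT 10;
```

Note that events_statements_history_long only retains recent statements, so run it while the workload of interest is active.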
This document provides 10 tips for optimizing MySQL database performance at the operating system level. The tips include using SSDs instead of HDDs for faster I/O, allocating large amounts of memory, avoiding swap space, keeping the MySQL version up to date, using file systems without barriers, configuring RAID cards for write-back caching, and leveraging huge pages. Overall, the tips aim to improve I/O speeds and memory usage to enhance MySQL query processing performance.
Performance Schema for MySQL Troubleshooting (Sveta Smirnova)
The Performance Schema in MySQL provides tables and instruments for troubleshooting issues like locks, I/O bottlenecks, slow queries, memory usage, and replication failures. It contains over 500 instruments in MySQL 5.6 and over 800 in 5.7. The tables provide visibility into the internal workings of MySQL to analyze and optimize performance.
This document summarizes MySQL's monitoring mechanisms and how they have evolved over time. It discusses tools like SHOW statements, INFORMATION_SCHEMA, slow/general query logs, and EXPLAIN that provided limited visibility in past versions. MySQL 5.5 introduced the Performance Schema framework for detailed instrumentation. Subsequent versions have expanded instrumentation to provide more developer-focused statistics on statements, stages, I/O, locks and more. New INFORMATION_SCHEMA tables in 5.6 provide additional InnoDB statistics on data dictionary, buffer pool, transactions and compression. The optimizer trace exposes query transformations. Enhanced EXPLAIN now supports more statement types and future improvements will provide a structured EXPLAIN output.
The slide deck contains an introduction to global transaction identifiers (GTIDs) in MySQL Replication. The new reconnection protocol, skipping transactions with GTIDs, replication filters, purging logs, backup/restore, etc. are covered here.
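Skipping a failing transaction works differently under GTIDs than with sql_slave_skip_counter: you commit an empty transaction in its place on the slave. A sketch, where the GTID shown is a placeholder for the offending transaction's actual identifier:

```sql
STOP SLAVE;
SET GTID_NEXT = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:5';  -- placeholder GTID
BEGIN; COMMIT;              -- commit an empty transaction under that GTID
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;
```

The slave then considers that GTID applied and replication resumes past it.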
Priyanka, a MySQL Cluster developer, presented MySQL Cluster at the MySQL User Camp. The slide deck introduces the cluster module: its architecture, auto-sharding, failover, and more.
This document discusses the Performance Schema in MySQL, which records instrumentation data to help profile and monitor database activity. It provides an overview of the Performance Schema's components and tables, how it has evolved between MySQL versions to include more metrics and functionality, and examples of how to query the tables to analyze wait events, statements, stages and other performance data.
MySQL 5.7 New Features for Developers session for DOAG (Oracle user group conference) in 2016. A similar version was also presented in Israel MySQL User Group on November 2016.
This presentation reviews new features in MySQL 5.7: the optimizer, the InnoDB engine, the native JSON data type, and the performance and sys schemas.
Performance Schema for MySQL Troubleshooting (Sveta Smirnova)
Percona Live (https://www.percona.com/live/data-performance-conference-2016/sessions/performance-schema-mysql-troubleshooting)
The Performance Schema in MySQL 5.6, released in February 2013, is a very powerful tool that can help DBAs discover why even the trickiest performance issues occur. Version 5.7 introduces even more instruments and tables. And while all these give you great power, you can get stuck choosing which instrument to use.
In this session, I will start with a description of a typical problem, then show how to use the Performance Schema to find what causes the issue and the reason for the unwanted behavior, and how that information can help you solve the particular problem.
Traditionally, Performance Schema sessions teach what is contained in the tables. I will, in contrast, start from a performance issue, then demonstrate which instruments and tables can help solve it. We will also discuss how to set up the Performance Schema so that it has minimal impact on your server.
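Limiting overhead comes down to enabling only the instruments and consumers you actually need, via the Performance Schema setup tables. A sketch that narrows instrumentation to file I/O waits:

```sql
-- Enable only the instruments of interest...
UPDATE performance_schema.setup_instruments
SET enabled = 'YES', timed = 'YES'
WHERE name LIKE 'wait/io/file/%';

-- ...and only the consumers that should record their events
UPDATE performance_schema.setup_consumers
SET enabled = 'YES'
WHERE name LIKE 'events_waits%';
```

Everything left disabled costs close to nothing at runtime, which is why selective setup matters on production servers.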
This document provides an overview of MySQL query optimization. It discusses MySQL features like storage engines, InnoDB, and indexing. It explains that query optimization is important for performance as data grows. Techniques like explaining query plans, indexing, and rewriting queries to make better use of indexes can improve query performance by 10-100 times. The document includes examples of indexing, query rewriting, and using EXPLAIN plans.
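The before-and-after effect of an index shows up directly in EXPLAIN output; a sketch against a hypothetical orders table (the row counts in the comments are illustrative):

```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ALL (full table scan), rows: the whole table

ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ref, key: idx_customer_id, rows: only matching rows
```

Dropping from a full scan to an index lookup is where the 10-100x improvements mentioned above typically come from.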
MySQL 5.7 proposes several changes to improve performance and consistency, including:
1. Making replication durable by default by setting sync_binlog and the replication repository options.
2. Deprecating features like the InnoDB monitor tables and ALTER IGNORE TABLE in favor of newer alternatives.
3. Simplifying and restricting SQL modes to encourage stricter querying and remove ambiguous options. Explanations for errors and modes will also be improved.
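The durable-by-default replication settings described in point 1 correspond to options like these, shown here as dynamic statements (the repository changes require the slave threads to be stopped first; the same options can go in my.cnf):

```sql
SET GLOBAL sync_binlog = 1;                    -- fsync the binary log at every commit
STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';   -- crash-safe replication metadata
SET GLOBAL relay_log_info_repository = 'TABLE';
START SLAVE;
```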
MySQL 5.6 - Operations and Diagnostics Improvements (Morgan Tocker)
This document discusses MySQL 5.6 and its improvements to operational and diagnostic capabilities. Key enhancements include online DDL operations that do not block reads or writes, buffer pool dump and restore for faster startup, import/export of partitioned tables, and transportable tablespaces. Diagnostic tools were improved with EXPLAIN showing more details, the ability to EXPLAIN updates and deletes, optimizer tracing, and the performance schema providing detailed query level instrumentation and monitoring by default.
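Online DDL in 5.6 can be requested explicitly, so the statement fails up front rather than silently falling back to a blocking copy; a sketch on a hypothetical users table:

```sql
-- Refuse to run unless the change can proceed without blocking reads or writes
ALTER TABLE users
  ADD COLUMN last_login DATETIME,
  ALGORITHM = INPLACE,
  LOCK = NONE;
```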
This document discusses indexing in MySQL databases to improve query performance. It begins by defining an index as a data structure that speeds up data retrieval from databases. It then covers various types of indexes like primary keys, unique indexes, and different indexing algorithms like B-Tree, hash, and full text. The document discusses when to create indexes, such as on columns frequently used in queries like WHERE clauses. It also covers multi-column indexes, partial indexes, and indexes to support sorting, joining tables, and avoiding full table scans. The concepts of cardinality and selectivity are introduced. The document concludes with a discussion of index overhead and using EXPLAIN to view query execution plans and index usage.
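The multi-column index behavior discussed above can be sketched on a hypothetical tickets table; a composite index serves queries that constrain its leftmost column, but not queries that skip it:

```sql
CREATE INDEX idx_status_created ON tickets (status, created_at);

-- Can use the index: the leading column (status) is constrained
SELECT * FROM tickets WHERE status = 'open' AND created_at > '2015-01-01';

-- Cannot use the index: the leading column is absent
SELECT * FROM tickets WHERE created_at > '2015-01-01';

-- Selectivity check: ratios near 1.0 mark a column as a good index candidate
SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity FROM tickets;
```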
This document provides an overview of MySQL for Linux system administrators. It discusses MySQL architecture including storage engines, memory usage, the MySQL server process, and InnoDB transaction processing. It also covers topics like backups and replication, and the agenda includes performance and capacity planning. The goal is to help system administrators understand and manage MySQL databases.
Instrumenting plugins for Performance Schema (Mark Leith)
This document discusses how to instrument plugins for the MySQL Performance Schema to provide visibility into plugin operations and avoid "black holes" in performance data. It covers the main interfaces for instrumenting threads and file, memory, and network operations. An example audit plugin is provided that instruments mutexes, files, and stages, and the Performance Schema output shows the resulting stage and wait events for a query.
MySQL 5.7: Performance Schema Improvements (Mark Leith)
This document discusses improvements to the Performance Schema instrumentation in MySQL 5.7. It provides an overview of what Performance Schema is, how it has evolved from versions 5.5 to 5.6, and key improvements in 5.7, including better memory instrumentation, metadata locking instrumentation, and replication monitoring capabilities.
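The 5.7 memory instrumentation is queryable per allocation event; a sketch that ranks the current top memory consumers:

```sql
SELECT event_name,
       current_number_of_bytes_used / 1024 / 1024 AS current_mb
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY current_number_of_bytes_used DESC
LIMIT 10;
```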
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou... (Eric David Benari, PMP)
The document discusses scalability issues with databases and proposes solutions. It introduces the concept of parallel databases which improve performance through linear scaling of reads, writes, joins and other operations. ParElastic is introduced as a parallel database architecture built on MySQL that addresses scalability through database virtualization and horizontal scaling in a way that is elastic and transparent to applications.
Python Utilities for Managing MySQL Databases (Mats Kindahl)
Managing a MySQL database server can become a full-time job. What we need are tools that bundle a set of related tasks into a common utility. While there are several such utility libraries to choose from, it is often the case that you need to customize them to your needs. The MySQL Utilities library is the answer to that need. It is open source, so you can modify and expand it as you see fit.
This is the presentation from OSCON 2011 in Portland.
In this talk we will take a look at the core concepts of Riak and why you might want to use it for your application. We will then take a look at some customer use cases and how Riak helped them scale with ease.
Joel Jacobson is a Technical Evangelist at Basho Technologies where he helps build the Riak community across Europe. Prior to joining Basho, Joel worked closely with Neo Technologies as part of his role at the consultancy OpenCredo.
Geek Sync | Learn to Troubleshoot Query Performance in Analysis Services (IDERA Software)
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: http://ow.ly/DJWn50A5odn
In this webinar Stan Geiger walks you through how to find and troubleshoot query performance in Analysis Services.
We often use Analysis Services because we "should" get better query performance than if we were querying data from our relational data sources. Analysis Services is very fast, whether using cubes or tabular models, and simple queries require no tuning.
However, complex queries or aggregation queries will often require tuning to make them perform efficiently. This presentation will walk you through how Analysis Services processes queries and how to determine where performance improvements can be made. This will include determining where the bottleneck is and possible ways to resolve the issues.
Survey of Percona Toolkit - Command-line Tools for MySQL (percona2013)
The presentation provides a brief overview of the essential command-line tools for MySQL.
Percona provides an in-depth review of your database and recommends appropriate changes by performing a complete MySQL health check in which we identify inefficiencies, find problems before they occur, and ensure that your MySQL database is in the best condition.
What can we learn from NoSQL technologies? (Ivan Zoratti)
This document summarizes Ivan Zoratti's presentation on NoSQL technologies. It discusses some of the perceived reasons for adopting NoSQL such as flexibility over schemas. It also summarizes key differences between NoSQL and SQL databases, such as schema-less designs and horizontal scaling in NoSQL. Additionally, it covers CAP theorem, examples of NoSQL databases, and when MySQL and NoSQL may each be better fits for different data and application needs.
A Backup Today Saves Tomorrow is a presentation from Percona Live 2013 that provides insight into planning and the tools used today to capture MySQL backups.
Building A Scalable Open Source Storage Solution (Phil Cryer)
The Biodiversity Heritage Library (BHL), like many other projects within biodiversity informatics, maintains terabytes of data that must be safeguarded against loss. Further, a scalable and resilient infrastructure is required to enable continuous data interoperability, as BHL provides unique services to its community of users. This volume of data and associated availability requirements present significant challenges to a distributed organization like BHL, not only in funding capital equipment purchases, but also in ongoing system administration and maintenance. A new standardized system is required to bring new opportunities to collaborate on distributed services and processing across what will be geographically dispersed nodes. Such services and processing include taxon name finding, indexes or GUID/LSID services, distributed text mining, names reconciliation and other computationally intensive tasks, or tasks with high availability requirements.
The document discusses MongoDB use cases, roadmap, and future plans. It outlines how MongoDB has been used for applications like location-based services and e-commerce. The roadmap details features in previous and upcoming releases, including improvements to concurrency, the new aggregation framework, and read preferences. Future plans include integrating Kerberos/LDAP, additional geospatial functions, and implementing full text search.
Overview of data analytics service: Treasure Data Service (SATOSHI TAGOMORI)
Treasure Data provides a data analytics service with the following key components:
- Data is collected from various sources using Fluentd and loaded into PlazmaDB.
- PlazmaDB is the distributed time-series database that stores metadata and data.
- Jobs like queries, imports, and optimizations are executed on Hadoop and Presto clusters using queues, workers, and a scheduler.
- The console and APIs allow users to access the service and submit jobs for processing and analyzing their data.
Narayan Newton presented on recent developments in MySQL. He discussed how MySQL has fragmented into several variants including MariaDB, PerconaDB, and Drizzle. He provided details on improvements in Oracle MySQL 5.5 and 5.6, Percona Server, and MariaDB including new features like virtual and dynamic columns. Newton also covered optimization improvements and clustering options like Percona Cluster, MySQL Cluster, and Drizzle.
This document discusses potential ways to hack cloud computing infrastructures. It begins by introducing cloud computing and focusing on Infrastructure as a Service (IaaS). It then explores attacking storage technologies like snapshots and data deduplication. The document proposes methods for exploiting these technologies, such as modifying virtual disk descriptor files to access other virtual machines' data or identifying data deduplication penalties to potentially access shared files. It notes challenges and limitations but argues that both providers and customers must work to secure these systems.
Big data refers to large, complex datasets that are difficult to process using traditional methods. This document discusses three examples of real-world big data challenges and their solutions. The challenges included storage, analysis, and processing capabilities given hardware and time constraints. Solutions involved switching databases, using Hadoop/MapReduce, and representing complex data structures to enable analysis of terabytes of ad serving data. Flexibility and understanding domain needs were key to feasible versus theoretical solutions.
The document discusses various techniques for improving Rails application performance, including reducing database queries through includes and adding indexes, caching with memoization, fragment caching, and action caching, optimizing object allocations and garbage collection, using background jobs, avoiding disk access, and helpful profiling tools like rack-bug and rack-perftools. Database optimizations focus on avoiding N+1 queries, adding proper indexes, and other tweaks, while caching, minimizing allocations, tuning GC, and reducing disk I/O can improve application performance. A variety of tools are also recommended for profiling applications and identifying bottlenecks.
Building Antifragile Applications with Apache Cassandra / Patrick McFadin
Even with the best infrastructure, failures will occur without warning and are almost guaranteed. Building applications that can resist this fact of life can be both art and science. In this talk, I'll try to eliminate the art portion and focus more on the science. Starting at high level architecture decisions, I will take you through each layer and finally down to actual application code. Using Cassandra as the back end database, we can build layers of fault tolerance that will leave end users completely unaware of the underlying chaos that could be occurring. With a little planning, we can say goodbye to the Fail Whale and the fragility of the traditional RDBMS. Topics will include:
- Application strategies to utilize active-active, diverse, datacenters
- Replicating data with the highest integrity and maximum resilience
- Utilizing Cassandra's built-in fault tolerance
- Architecture of private, cloud or hybrid based applications
- Application driver techniques when using Cassandra
OSDC 2018 | The operational brain: how new paradigms like Machine Learning ar... / NETWAYS
With the advent of IoT, companies have the opportunity to put larger and larger volumes of machine data to work to optimize operations like manufacturing production, safety, security, user experience. Yet, they are finding that the old paradigms of processing this data do not help mainstream developers keep pace with the velocity of data, new analytic algorithms, and the need for real-time insight. Jodok Batlogg, founder and CTO of Crate.io, believes that the solution to this problem lies at the nexus of modern open source distributed database architectures, machine learning/AI, and IoT networking. These technologies will combine to create a new data management paradigm that moves beyond traditional conceptions of databases. He believes the future lies in a central nervous system, an “operational brain” that connects directly to sensory inputs and applies artificial intelligence to control, predict, and monitor systems and things in real time. In this session, Jodok will use-real world, in-production manufacturing and cybersecurity examples of “operational brains” at work to explain the new paradigm, and discuss the concrete steps organizations can take to implement them.
The document discusses analyzing database systems using a 3D method for performance analysis. It introduces the 3D method, which looks at performance from the perspectives of the operating system (OS), Oracle database, and applications. The 3D method provides a holistic view of the system that can help identify issues and direct solutions. It also covers topics like time-based analysis in Oracle, how wait events are classified, and having a diagnostic framework for quick troubleshooting using tools like the Automatic Workload Repository report.
Daniel Austin of PayPal presented on using MySQL Cluster to build a globally distributed database called YESQL. He discussed common myths about big data and NoSQL databases, including that big data always requires NoSQL and that the CAP theorem is simpler than it really is. Austin explained how MySQL Cluster was used to build YESQL to meet requirements like high availability, scalability and global replication within 1000ms. He reviewed the architecture involving tiling across AWS availability zones and lessons learned.
Optimizing WordPress Performance on Shared Web Hosting / Jon Brown
This document discusses performance tweaks that can be made for WordPress sites on shared hosting. It is divided into three acts: inside WordPress, on the shared server, and off the shared server. Inside WordPress, it recommends right-sizing images, checking for 404 errors, keeping the database under control, and using caching plugins. On the shared server, it suggests updating to newer PHP versions, cleaning up the database, and using CloudFlare. Off the server, it only recommends using CDNs like CloudFlare for their free benefits.
Similar to Performance Schema in MySQL (Danil Zburivsky) (20)
One-cloud: the data center management system at Odnoklassniki / Oleg Anastasye... / Ontico
HighLoad++ 2017
Kaliningrad Hall, November 8, 15:00
Abstract:
http://www.highload.ru/2017/abstracts/2964.html
Odnoklassniki consists of more than eight thousand physical servers located in several data centers. Each of these machines used to be specialized for a particular task, both to provide fault isolation and to enable automated infrastructure management.
...
Scaling DNS / Artem Gavrichenkov (Qrator Labs) / Ontico
HighLoad++ 2017
Kaliningrad Hall, November 8, 16:00
Abstract:
http://www.highload.ru/2017/abstracts/3032.html
The DNS protocol is seven years older than the World Wide Web. RFC 882 and 883, the standards defining the core functionality of the domain name system, appeared at the end of 1983, and the first implementation followed only a year later. Naturally, a technology this old, and still in very active use today, could not help accumulating peculiarities that are far from obvious to ordinary users.
...
Building a BigData Platform for Russian Post / Andrey Bashchenko (Luxoft) / Ontico
HighLoad++ 2017
Kaliningrad Hall, November 8, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/3010.html
In this talk I will describe how a BigData platform is helping to transform Russian Post and how we manage the platform's construction and growth. I will share the solutions that worked well for us, for example how splitting the platform into products with clear SLAs and interfaces between them helped us stay in control as the project scaled.
...
Preparing a Test Environment, or How Many Test Instances Do You Need / Aleksa... / Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 10:00
Abstract:
http://www.highload.ru/2017/abstracts/2914.html
What does it take to set up a test environment, you might think? A test box and a copy of the production environment, and the test server is ready. But what do you do when the project is complex? Or large? What if you need to test many versions at once? And what if it is all of the above?
Organizing testing for a large, evolving project with around fifty features in development and testing at any one time is a far from trivial task. The situation is usually complicated by the occasional desire to poke at functionality that is not fully finished yet. In such situations the question often arises: "Where can this be deployed, and where can I click around?"
...
New Data Replication Technologies in PostgreSQL / Alexander Alekseev (Postgre... / Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 18:00
Abstract:
http://www.highload.ru/2017/abstracts/2854.html
From this talk you will learn about PostgreSQL's replication and auto-failover capabilities, including those that became available in PostgreSQL 10.
Among others, the following topics will be covered:
* Types of replication and the problems they solve.
* Setting up streaming replication.
* Setting up logical replication.
* Setting up auto-failover / HA with Stolon and Consul.
After this talk you will be able to set up PostgreSQL replication and auto-failover on your own.
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 17:00
Abstract:
http://www.highload.ru/2017/abstracts/3096.html
PostgreSQL is the world’s most advanced open source database. Indeed! With around 270 configuration parameters in postgresql.conf, plus all the knobs in pg_hba.conf, it is definitely ADVANCED!
How many parameters do you tune? 1? 8? 32? Anyone ever tuned more than 64?
No tuning means below par performance. But how to start? Which parameters to tune? What are the appropriate values? Is there a tool --not just an editor like vim or emacs-- to help users manage the 700-line postgresql.conf file?
Join this talk to understand the performance advantages of appropriately tuning your postgresql.conf file, showcase a new free tool to make PostgreSQL configuration possible for HUMANS, and learn the best practices for tuning several relevant postgresql.conf parameters.
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 16:00
Abstract:
http://www.highload.ru/2017/abstracts/3115.html
During this session we will cover the latest developments in ProxySQL's support for regular expressions (RE2 and PCRE) and how we can use this powerful technique together with ProxySQL's query rules to anonymize live data quickly and transparently. We will explain the mechanism and how to generate these rules quickly, show a live demo covering the challenges we received from the community, and finish the session with an interactive brainstorm, testing queries from the audience.
Experience Developing a Firewall Module for MySQL / Oleg Broslavsky... / Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 15:00
Abstract:
http://www.highload.ru/2017/abstracts/2957.html
We will share our experience developing a firewall module for MySQL using the ANTLR parser generator and the Kotlin language.
We will look at the following questions in detail:
- when and why it makes sense to use ANTLR;
- the specifics of developing an ANTLR grammar for MySQL;
- a performance comparison of ANTLR runtimes for the task of parsing MySQL (C#, Java, Kotlin, Go, Python, PyPy, C++);
- auxiliary DSLs;
- the microservice architecture of the SQL firewall module;
- the results we obtained.
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 14:00
Abstract:
http://www.highload.ru/2017/abstracts/3114.html
ProxySQL aims to be the most powerful proxy in the MySQL ecosystem. It is protocol-aware and able to provide high availability (HA) and high performance with no changes in the application, using several built-in features and integration with clustering software. During this session we will quickly introduce its main features, so to better understand how it works. We will then describe multiple use case scenarios in which ProxySQL empowers large MySQL installations to provide HA with zero downtime, read/write split, query rewrite, sharding, query caching, and multiplexing using SSL across data centers.
MySQL Replication: Advanced Features / Peter Zaitsev (Percona) / Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/2954.html
MySQL Replication is powerful and has gained many advanced features over the years. In this presentation we will look at the replication technology in MySQL 5.7 and its variants, focusing on advanced features: what they mean, and when to use them and when not, including:
When should you use STATEMENT, ROW or MIXED binary log format?
What is GTID in MySQL and MariaDB and why do you want to use them?
What is semi-sync replication and how is it different from lossless semi-sync?
...
Internal Open Source: How to Develop a Mobile Application with a Large Numbe... / Ontico
HighLoad++ 2017
Cape Town Hall, November 8, 12:00
Abstract:
http://www.highload.ru/2017/abstracts/3120.html
The number of developers working on the Sberbank Online mobile applications has grown by an order of magnitude since the beginning of 2016. To keep shipping a quality product, we are radically rebuilding our development process.
At some point the number of internal customers requesting various changes grew so large that the developers became the bottleneck. We introduced a development culture that could loosely be called "internal open source": we kept control over the project's architecture and quality, but allowed anyone who wished to develop new features.
...
A Detailed Look at How Causal Consistency Is Implemented in MongoDB / Mikhail Tyulenev... / Ontico
HighLoad++ 2017
Mumbai Hall, November 8, 18:00
Abstract:
http://www.highload.ru/2017/abstracts/2836.html
With eventually consistent distributed databases there is no guarantee that a read returns the results of the latest writes when the read and the write are performed on different nodes. This limits the system's throughput. Supporting the causal consistency property removes this limitation, which improves scalability without requiring changes to application code.
...
Load Balancing at Wire Speed. No ASICs, No Limits. NFWare Solutions ... / Ontico
HighLoad++ 2017
Mumbai Hall, November 8, 16:00
Abstract:
http://www.highload.ru/2017/abstracts/2858.html
The Odnoklassniki audience exceeds 73 million people in Russia, the CIS, and countries further abroad. OK.ru is also the top social network for video views in the Russian internet and its largest service platform.
The qualitative and quantitative growth of DDoS attacks in recent years has turned them into one of the top problems for the largest internet resources. Depending on the attack vector, one part of the infrastructure or another becomes the bottleneck. With a SYN flood, in particular, the first blow falls on the traffic load-balancing system, and its performance determines success in withstanding the attack.
...
Traffic Interception: Myths and Reality / Evgeny Uskov (Qrator Labs) / Ontico
HighLoad++ 2017
Mumbai Hall, November 8, 15:00
Abstract:
http://www.highload.ru/2017/abstracts/3008.html
It has never happened before, and now it has happened again! By redirecting traffic, Google made several thousand different services unavailable in Japan, most of them in no way connected to Google itself. Incidents like this occur with enviable regularity; they just do not always make the major news. They can have a variety of causes, from network engineers' mistakes to government regulation.
...
And Then the Clouds Will Surely Start to Dance! / Alexey Sushkov (PETER-SERVICE) / Ontico
HighLoad++ 2017
Mumbai Hall, November 8, 14:00
Abstract:
http://www.highload.ru/2017/abstracts/2925.html
Clouds and virtualization are the current trends in IT. Telecom operators build their TelcoClouds on the NFV (Network Functions Virtualization) and SDN (Software-Defined Networking) standards. The talk starts with the basics of virtualization, then examines what NFV and SDN are used for, then flies up to the clouds and comes back down to earth to solve practical problems!
...
How We Made Druid Work at Odnoklassniki / Yuri Nevinitsin (OK.RU) / Ontico
HighLoad++ 2017
Mumbai Hall, November 8, 10:00
Abstract:
http://www.highload.ru/2017/abstracts/3045.html
How we made Druid work at Odnoklassniki.
"Druid is a high-performance, column-oriented, distributed data store" http://druid.io.
We will describe how, by adopting Druid, we dealt with a situation in which our 50-terabyte MSSQL-based statistics system had become:
- slow: the average response time was several times worse than required (it has since improved 20x);
- unstable: at peak hours statistics lagged by up to half an hour (now nothing lags);
- expensive: Microsoft changed its licensing policy, and license costs could have reached millions of dollars.
...
Speeding Up ASP.NET Core / Ilya Verbitskiy (WebStoating s.r.o.) / Ontico
HighLoad++ 2017
Rio de Janeiro Hall, November 8, 18:00
Abstract:
http://www.highload.ru/2017/abstracts/2905.html
More than a year has passed since Microsoft released the first version of ASP.NET Core, its new framework for building web applications, and it wins over more fans every day. ASP.NET Core is built on .NET Core, the cross-platform, open-source version of the .NET platform. C# developers can now use a Mac as their development environment and run their applications on Linux or inside Docker containers.
...
100500 Ways of Caching in Oracle Database, or How to Achieve Maximum Sp... / Ontico
HighLoad++ 2017
Rio de Janeiro Hall, November 8, 14:00
Abstract:
http://www.highload.ru/2017/abstracts/2913.html
The talk starts with the fundamental reasons that led to the appearance of a result cache as part of a DBMS engine, and why some DBMSs have one while others do not.
It then reviews various options for caching the results of both SQL queries and business logic stored in the database, compares caching approaches (hand-written caches versus standard functionality), and gives recommendations on when these approaches are optimal and when they are outright dangerous.
...
Apache Ignite Persistence: Why Persistence for In-Memory, and How It Works... / Ontico
HighLoad++ 2017
Rio de Janeiro Hall, November 8, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/2947.html
Apache Ignite is an open source platform for high-performance distributed processing of large volumes of data using SQL or Java/.NET/C++ APIs. Ignite is used in a wide range of industries: Sberbank, ING, RingCentral, Microsoft, and e-Therapeutics all run solutions built on Ignite. Cluster sizes range from a single node to several hundred, and the nodes may sit in one data center or be geo-distributed across several.
...
HighLoad++ 2017
Rio de Janeiro Hall, November 8, 12:00
Abstract:
http://www.highload.ru/2017/abstracts/3005.html
When we talk about heavily loaded systems and databases with a large number of parallel connections, the practical experience of operating and maintaining such projects is of particular interest, including the DBMS tools and mechanisms that DBAs and DevOps engineers can use to monitor a database's health and diagnose potential problems early.
...
1. Performance Schema in MySQL
Danil Zburivsky
MySQL DBA and Team Lead at Pythian
Tuesday, October 23, 12
2. About myself
• MySQL DBA and Team Lead at Pythian
• Managing dozens of customers and thousands of MySQL servers
• http://www.pythian.com/news/author/zburivsky/
• @zburivsky
4. MySQL is a great database, but instrumentation sucks
• Slow query log
• SHOW ENGINE INNODB STATUS
• OS stats
• SHOW PROCESSLIST
5-12. I think there is a problem with the database...
• Identify timeframe (use trending tools)
• Collect data during incident
• Collect more data
• Collect as much data as you can!
• Try to make sense of it
• Tune settings, SQL, hardware. Did it help?
• I think there is a problem with the database...
13. Tools that might help you
• pt-stalk (http://www.percona.com/doc/percona-toolkit/2.1/pt-stalk.html)
• innotop (http://code.google.com/p/innotop/)
• Or you write your own stuff
14. What do the “Big Boys” have?
• Oracle: SQL traces, kernel event timings, tons of books on performance tuning
• SQL Server: SQL Profiler, Dynamic Management Views, Extended Events, tons of books on performance tuning
16. The Idea
• Executing a query breaks down into hundreds of smaller tasks
• There are background tasks as well
• We want to instrument it all to know where the server is spending time
17. Implementation
• Instrumentation: measuring when an event begins and ends
• Implemented in the MySQL code at the server and storage engine levels
• Can be enabled/disabled or customized
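A minimal sketch of that customization, assuming MySQL 5.5+ with performance_schema available: instruments and the consumers that record their events can be toggled at runtime through the setup tables, with no restart needed.

```sql
-- Turn on timing for all InnoDB mutex wait instruments
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'wait/synch/mutex/innodb/%';

-- Make sure the wait-event consumers actually record what
-- the instruments produce
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_waits%';
```

Changes take effect immediately for new events; rows already collected are unaffected.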
30. Average wait per thread (I)
SELECT e.THREAD_ID, e.EVENT_NAME, MAX(e.AVG_TIMER_WAIT)
FROM events_waits_summary_by_thread_by_event_name e
LEFT JOIN threads t ON t.THREAD_ID = e.THREAD_ID
WHERE e.EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
  AND t.NAME = 'thread/sql/one_connection'
  AND e.AVG_TIMER_WAIT > 0
GROUP BY t.THREAD_ID, e.EVENT_NAME
ORDER BY t.THREAD_ID, MAX(e.AVG_TIMER_WAIT);
33. Picoseconds, shmicoseconds! ps_helper to the rescue!
• http://www.markleith.co.uk/ps_helper/
• Great examples of using performance_schema
• Useful tools for converting times, bytes, and paths into human-readable format
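If ps_helper is not at hand, the unit conversion is easy to do manually: the timer columns are in picoseconds, so dividing by 10^9 yields milliseconds. A sketch against a summary table that ships with performance_schema:

```sql
-- Top 5 wait events by average time, converted from picoseconds to ms
SELECT EVENT_NAME,
       AVG_TIMER_WAIT / 1000000000 AS avg_wait_ms
  FROM performance_schema.events_waits_summary_global_by_event_name
 WHERE AVG_TIMER_WAIT > 0
 ORDER BY AVG_TIMER_WAIT DESC
 LIMIT 5;
```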
35. No free lunch: PFS overhead
• Instrumentation doesn’t come for free
• In CPU-bound workloads overhead can be ~15-20%
• In IO-bound workloads, 5-8%
• Significant improvements in 5.6
37. What’s new in 5.6?
• performance_schema enabled by default!
• Less overhead: 5-10% for CPU-bound workloads
• Statements, Stages, Actors and Objects
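Statement instrumentation is the headline feature here. A sketch, assuming a 5.6 server with the statements_digest consumer enabled (the default): the digest table groups statements by normalized text, so the most expensive query shapes surface immediately.

```sql
-- Most expensive normalized statements by total execution time
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1000000000 AS total_ms
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;
```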
38. Actors: filter events by user
(performance_schema) > SELECT * FROM setup_actors;
+------+------+------+
| HOST | USER | ROLE |
+------+------+------+
| % | % | % |
+------+------+------+
1 row in set (0.00 sec)
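The default %/%/% row above instruments every connecting account. A sketch of narrowing that down, assuming a hypothetical application user named 'app':

```sql
-- Stop instrumenting everyone...
DELETE FROM performance_schema.setup_actors
 WHERE HOST = '%' AND USER = '%' AND ROLE = '%';

-- ...and instrument only the hypothetical 'app' user from any host
INSERT INTO performance_schema.setup_actors (HOST, USER, ROLE)
VALUES ('%', 'app', '%');
```

Only sessions started after the change pick up the new filtering; existing connections keep their current instrumentation.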
39. Objects: filter out events by
database/table
(performance_schema) > SELECT * FROM setup_objects;
+-------------+--------------------+-------------+---------+-------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | ENABLED | TIMED |
+-------------+--------------------+-------------+---------+-------+
| TABLE | mysql | % | NO | NO |
| TABLE | performance_schema | % | NO | NO |
| TABLE | information_schema | % | NO | NO |
| TABLE | % | % | YES | YES |
+-------------+--------------------+-------------+---------+-------+
4 rows in set (0.00 sec)
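The catch-all %/% row keeps user tables instrumented, and more specific rows take precedence over it. A sketch that excludes a hypothetical 'staging' schema from instrumentation, the same way the mysql and information_schema rows above do:

```sql
-- Skip instrumentation and timing for all tables in 'staging'
INSERT INTO performance_schema.setup_objects
       (OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, ENABLED, TIMED)
VALUES ('TABLE', 'staging', '%', 'NO', 'NO');
```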
41. Summary. Pros.
• Provides insight into internal server metrics
• Flexible
• The relational model lets you build your own views on top of this data
42. Summary. Cons.
• Steep learning curve: not very well
documented
• Overhead for CPU-bound loads can be
significant