A Backup Today Saves Tomorrow is a presentation from Percona Live 2013 that provides insight into planning and the tools used today to capture MySQL backups.
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments, by Jignesh Shah
This document discusses best practices for high availability (HA) and replication of PostgreSQL databases in virtualized environments. It covers enterprise needs for HA, technologies like VMware HA and replication that can provide HA, and deployment blueprints for HA, read scaling, and disaster recovery within and across datacenters. The document also discusses PostgreSQL's different replication modes and how they can be used for HA, read scaling, and disaster recovery.
Hadoop Operations for Production Systems (Strata NYC), by Kathleen Ting
Hadoop is emerging as the standard for big data processing and analytics. However, as usage of Hadoop clusters grows, so do the demands of managing and monitoring these systems.
In this full-day Strata Hadoop World tutorial, attendees will get an overview of all phases of successfully managing Hadoop clusters, with an emphasis on production systems — from installation and configuration management to service monitoring, troubleshooting, and support integration.
We will review tooling capabilities, highlight the ones that have been most helpful to users, and share lessons learned and best practices from users who depend on Hadoop as a business-critical system.
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs, by David Klee
This document discusses virtual CPUs and CPU architecture. It begins by explaining how hypervisor resource queues work and how requests for CPU, memory, storage and networking are placed in queues. It then covers physical CPU architecture including cores, sockets, NUMA and memory locality. It discusses how virtual CPUs are scheduled by the hypervisor and ways to measure scheduling pressure. Finally, it provides recommendations for right-sizing virtual machines and balancing workloads to reduce scheduling delays.
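The scheduling pressure mentioned above is commonly quantified as CPU "ready" time: how long a virtual CPU waited for a physical core. As a hypothetical sketch (the formula mirrors the VM-level ready-summation metric exposed by hypervisors such as vSphere; the numbers are illustrative):

```python
def cpu_ready_percent(ready_ms, interval_ms, vcpus):
    """Accumulated ready time as a fraction of the sample interval,
    normalized per vCPU. Rough rule of thumb: sustained values above
    ~5% per vCPU suggest CPU scheduling contention."""
    return 100.0 * ready_ms / (interval_ms * vcpus)

# 4000 ms of accumulated ready time over a 20 s sample on a 4-vCPU VM
print(cpu_ready_percent(4000, 20000, 4))  # 5.0
```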
MySQL Infrastructure Testing Automation at GitHub, by Ike Walker
The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that help us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, and how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
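The backup bullet above hinges on automated restore validation. GitHub's actual tooling is not shown in the abstract, so the following is a hypothetical Python sketch of the idea: restore a backup, then compare checksums and spot-check rows against the source.

```python
import hashlib
import random

def checksum(rows):
    """Stable fingerprint of a result set, used to compare source vs. restore."""
    h = hashlib.sha256()
    for row in sorted(rows):
        h.update(repr(row).encode())
    return h.hexdigest()

def validate_restore(source_rows, restored_rows, sample_size=3):
    """A restore is trusted only if checksums match and randomly
    spot-checked rows are present in the restored copy."""
    if checksum(source_rows) != checksum(restored_rows):
        return False
    sample = random.sample(list(source_rows), min(sample_size, len(source_rows)))
    return all(row in restored_rows for row in sample)

rows = [(1, "alice"), (2, "bob"), (3, "carol")]
print(validate_restore(rows, list(rows)))  # identical restore
print(validate_restore(rows, rows[:2]))    # truncated restore
```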
This document discusses NameNode high availability (HA) in Hadoop Distributed File System (HDFS). It provides an overview of the current HDFS architecture, goals of NN HA, design approaches considered including active-standby with automatic failover, key use cases, design details around failover control, client failover, shared storage, fencing, and operations/administration. It also outlines future work such as alternative methods for sharing metadata and improving client failover.
Kscope 14 Presentation: Virtual Data Platform, by Kyle Hailey
This document discusses how data constraints impact IT and businesses. It presents virtual data as a solution to alleviate these constraints. Some key points made include:
1) Data constraints strain IT resources, have a huge price tag, and many companies are unaware of their impact. Moving and managing data is difficult and expensive.
2) Virtual data platforms use thin cloning and compression to reduce storage needs and provision environments quickly. This helps development, QA, recovery scenarios and enables faster business intelligence refreshes.
3) Customers using virtual data have seen benefits like doubling development throughput, slashing financial close times by 10x, and cutting surgical recovery times by 8x. Virtual data helps optimize the "factory floor of IT."
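Thin cloning, the core of point 2, can be illustrated in miniature with a copy-on-write layer: the clone stores only its own deltas, while reads fall through to the shared source. A hypothetical Python sketch, with ChainMap standing in for a block-level copy-on-write layer:

```python
from collections import ChainMap

base = {"blk0": "data0", "blk1": "data1"}  # the shared "golden" copy
clone = ChainMap({}, base)                 # thin clone: stores only deltas

clone["blk1"] = "patched"                  # write lands in the delta layer
print(base["blk1"], clone["blk1"])         # data1 patched
print(clone["blk0"])                       # unchanged blocks read through: data0
```

The source copy is never modified, so many clones can share it, which is how thin cloning cuts storage and provisioning time.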
The document describes the evolution of Hadoop operations at LinkedIn from 2009 to 2013. It started with 20 nodes running Hadoop 0.20.0 without configuration management, monitoring, or security. Over the years, LinkedIn made many improvements like adding configuration management, monitoring, capacity scheduling, and security. They also optimized workloads, upgraded hardware, and increased the cluster size to 5000 nodes running Hadoop 1.0.4 by 2013. Future work may include evaluating other frameworks besides Hadoop for all workloads.
My experience with embedding PostgreSQL, by Jignesh Shah
At my current company, we embed PostgreSQL-based technologies in various applications shipped as shrink-wrapped software. In this session we talk about the experience of embedding PostgreSQL where it is not directly exposed to the end user, the issues we encountered, and how they were resolved.
We will talk about the business reasons, the technical architecture of deployments, upgrades, and the security processes for working with embedded PostgreSQL databases.
Severalnines Self-Training: MySQL® Cluster - Part VI, by Severalnines
Part VI of our free self-training slides on MySQL Cluster.
In this part we cover 'Configuration and Installation':
* Data Node configuration
* SQL Node configuration
* Important parameters
* Installation
* Upgrading
Christian Johannsen presents an evaluation of Apache Cassandra as a cloud database. Cassandra is optimized for cloud infrastructure with features like transparent elasticity, scalability, high availability, and easy data distribution and redundancy. It supports multiple data types, is easy to manage, has a low cost, runs on multiple infrastructures, and includes security features. A demo of DataStax OpsCenter and Apache Spark on Cassandra is shown.
Strata + Hadoop World 2012: HDFS: Now and Future, by Cloudera, Inc.
Hadoop 1.0 is a significant milestone as the most stable and robust Hadoop release, tested in production against a variety of applications. It offers improved performance, support for HBase, disk-fail-in-place, WebHDFS, and more over previous releases. The next major release, Hadoop 2.0, offers several significant HDFS improvements including a new append pipeline, federation, wire compatibility, NameNode HA, and further performance improvements. We describe how to take advantage of the new features and their benefits. We also discuss some of the misconceptions and myths about HDFS.
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory and disk configuration parameters, designing schemas and queries efficiently, and leveraging caching strategies.
How to size up an Apache Cassandra cluster (Training), by DataStax Academy
This document discusses how to size a Cassandra cluster based on replication factor, data size, and performance needs. It describes that replication factor, data size, data velocity, and hardware considerations like CPU, memory, and disk type should all be examined to determine the appropriate number of nodes. The goal is to have enough nodes to store data, achieve target throughput levels, and maintain performance and availability even if nodes fail.
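The sizing reasoning above can be reduced to a back-of-the-envelope formula: replicated data volume plus growth headroom, divided by per-node usable capacity. A hypothetical sketch (the 50% headroom default and the capacity figures are illustrative assumptions, not DataStax guidance):

```python
import math

def nodes_needed(raw_data_tb, replication_factor, usable_tb_per_node,
                 growth_headroom=0.5):
    """Estimate node count: raw data times replication factor, padded with
    headroom for growth and compaction, divided by usable capacity per node."""
    replicated = raw_data_tb * replication_factor
    with_headroom = replicated * (1 + growth_headroom)
    return math.ceil(with_headroom / usable_tb_per_node)

# 10 TB raw data, RF=3, 2 TB usable per node, 50% headroom
print(nodes_needed(10, 3, 2))  # 23
```

Throughput and availability targets then set a floor on top of this storage-driven estimate.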
The document provides an overview of five steps to optimize PostgreSQL performance: 1) application design, 2) query tuning, 3) hardware/OS configuration, 4) PostgreSQL configuration, and 5) caching. It discusses best practices for schema design, indexing, queries, transactions, and connection management to improve performance. Key recommendations include normalizing schemas, indexing commonly used columns, batching queries and transactions, using prepared statements, and implementing caching at multiple levels.
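The batching recommendation above can be demonstrated concretely. The sketch below uses Python's stdlib sqlite3 purely as a stand-in for PostgreSQL; the principle, one transaction for many rows instead of one commit per row, carries over:

```python
import sqlite3

def load_rows(conn, rows, batched=True):
    """Insert rows either in one transaction (batched) or one per commit."""
    cur = conn.cursor()
    if batched:
        # One transaction, one executemany call for all rows.
        with conn:
            cur.executemany("INSERT INTO t(v) VALUES (?)", rows)
    else:
        # One transaction per row: every commit forces a separate flush.
        for row in rows:
            with conn:
                cur.execute("INSERT INTO t(v) VALUES (?)", [row[0]])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(v INTEGER)")
load_rows(conn, [(i,) for i in range(1000)])
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1000
```

On a real PostgreSQL server the per-commit path also pays a WAL flush and a network round trip per row, so the gap is far larger than in this in-memory demo.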
This presentation answers many of your questions about PostgreSQL and the Red Hat Cluster Suite.
It reviews how you can create failover/standby capabilities with the following activities:
General PostgreSQL clustering options
Overview of Red Hat Cluster Service
Identification of candidate databases for clustering
Identification of hardware for clustering
Analysis of uptime requirements and data latency
Implementation of clustering
Testing of clustering
PostgreSQL installation tips for RHCS
The document summarizes several industry standard benchmarks for measuring database and application server performance including SPECjAppServer2004, EAStress2004, TPC-E, and TPC-H. It discusses PostgreSQL's performance on these benchmarks and key configuration parameters used. There is room for improvement in PostgreSQL's performance on TPC-E, while SPECjAppServer2004 and EAStress2004 show good performance. TPC-H performance requires further optimization of indexes and query plans.
Backing up your virtual environment best practices, by Interop
- Image-based backups provide faster and more efficient protection of virtual environments compared to traditional agent-based backups. With image-based backups, entire virtual machines are captured in binary image files.
- There are two methods for image-based backups - direct-to-target which has better performance and proxy-based which can preserve SAN investments.
- Best practices for backups include implementing weekly or bi-weekly full backups and daily incremental backups, with additional snapshots, replication, and off-site storage for critical systems based on recovery SLAs. A tiered approach is needed for large environments.
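The weekly-full/daily-incremental cadence described above amounts to a simple scheduling rule. A hypothetical sketch (Sunday as the full-backup day is an illustrative choice):

```python
from datetime import date, timedelta

def backup_type(day, full_weekday=6):
    """Weekly full backup on one weekday (6 = Sunday), incrementals otherwise."""
    return "full" if day.weekday() == full_weekday else "incremental"

# A sample week: Monday 2024-01-01 through Sunday 2024-01-07
week = [date(2024, 1, 1) + timedelta(days=i) for i in range(7)]
print([(d.isoformat(), backup_type(d)) for d in week])
```

Restore cost is the trade-off: recovering from mid-week means the last full plus every incremental since, which is why critical tiers add snapshots and replication on top.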
Virtualizing Tier One Applications - Varrow, by Andrew Miller
This document provides best practices for virtualizing mission critical applications like Exchange and SQL Server. It discusses the top 10 myths about virtualizing business critical applications and provides the truths. It then discusses best practices for virtualizing Exchange, including starting simple, licensing, storage configuration, and high availability options. For SQL Server, it covers starting simple, licensing, storage configuration, migrating, and database best practices. It also discusses tools that can be used for database performance analysis when virtualized like Confio IgniteVM and vCenter Operations.
With MySQL being the most popular open source DBMS in the world, with estimated annual growth of 16 percent until 2020, we can assume that sooner or later an Oracle DBA will be handling a MySQL database in their shop. This beginner/intermediate-level session will take you through my journey as an Oracle DBA during my first 100 days of administering a MySQL database, show several demos, and cover all the roadblocks and successes I had along this path.
This document discusses high availability for HDFS and provides details on NameNode HA design. It begins with an overview of HDFS availability and reliability. It then discusses the initial goals for NameNode HA, which were to support an active and standby NameNode configuration with manual or automatic failover. The document also outlines some high-level use cases and provides a high-level overview of the NameNode HA design.
Design Patterns for Distributed Non-Relational Databases, by guestdfd1ec
The document discusses design patterns for distributed non-relational databases, including consistent hashing for key placement, eventual consistency models, vector clocks for determining history, log-structured merge trees for storage layout, and gossip protocols for cluster management without a single point of failure. It raises questions to ask presenters about scalability, reliability, performance, consistency models, cluster management, data models, and real-life considerations for using such systems.
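Consistent hashing, the first pattern listed, is small enough to sketch. The ring below is a hypothetical minimal implementation with virtual nodes; removing a server only remaps the keys that lived on it, leaving all other placements untouched:

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the ring via a stable hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: each server owns many virtual points,
    and a key belongs to the first server point at or after its hash."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def lookup(self, key):
        i = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))  # one of node-a / node-b / node-c
```

The key property: a naive `hash(key) % N` scheme remaps nearly every key when N changes, while the ring remaps only the departed node's share.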
Clustering can provide high availability and scalability. Shared nothing architectures are best for achieving both high availability and scalability together. Oracle Real Application Cluster (RAC) offers advantages over alternative Oracle clustering configurations, but its scalability is limited. The cost-effectiveness of using RAC in a redundant array of inexpensive servers configuration is small due to its limited scalability. Alternatives may be more suitable depending on specific needs and requirements.
Veeam Availability for the Always-On Enterprise, by Arnaud PAIN
The document discusses the need for modern data centers to have always-available, reliable backup solutions due to 24/7 operations and no tolerance for downtime. It presents Veeam's availability suite as a solution that provides high-speed recovery, data loss avoidance, verified recoverability, leveraged backup data, and complete visibility compared to legacy backup systems. The document also summarizes new features in Veeam v9 including enhanced replication, automated recoverability testing, and integration with EMC storage.
Severalnines Training: MySQL® Cluster - Part IX, by Severalnines
This document discusses best practices for designing a MySQL Cluster database infrastructure. It recommends dedicating instances for data and API nodes and not co-locating them. The number of nodes depends on storage, throughput and redundancy requirements. Hardware recommendations include fast CPUs, RAM sized for the dataset, and SSDs or RAID for storage. Performance planning requires benchmarking typical workloads to determine if resources need scaling. The document provides formulas and tools to help calculate storage and memory needs.
The document discusses HDFS high availability with NameNode HA, which allows two NameNodes - an active and standby - in the same cluster. The active NameNode handles client operations while the standby maintains enough state to provide a fast failover. The NameNodes write edit logs to journal nodes using a Paxos-like protocol to guarantee correctness. A ZooKeeper-based election process automatically fails over the active NameNode role in case of failure.
Severalnines Self-Training: MySQL® Cluster - Part VII, by Severalnines
Part VII of our free self-training slides on MySQL Cluster.
In this installment, we cover 'Management and Administration':
* Backup and Restore
* Geographical Redundancy
* Online and Offline Operations
* Ndbinfo tables
* Reporting
* Single User Mode
* Scaling MySQL Cluster
MySQL Enterprise Backup provides fast, consistent, online backups of MySQL databases. It allows for backing up InnoDB and MyISAM tables while the database is running, minimizing downtime. The tool takes physical backups of the data files rather than logical backups, allowing for very fast restore times compared to alternatives like mysqldump. It supports features like compressed backups, incremental backups, and point-in-time recovery.
MySQL Enterprise Backup provides fast, consistent, online backups of MySQL databases. It allows for full and incremental backups, compressed backups to reduce storage needs, and point-in-time recovery. MySQL Enterprise Backup works by backing up InnoDB data files, copying and compressing the files, and backing up the transaction log files from the time period when the data files were copied. This allows for consistent backups and point-in-time recovery of the database.
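Point-in-time recovery, as described in both summaries above, boils down to restoring a base backup and replaying the transaction log up to a recovery target. A hypothetical toy model (a dict plus timestamped log records stands in for data files and redo logs; this is the concept, not MySQL Enterprise Backup's actual mechanics):

```python
def restore_to_point_in_time(base_snapshot, log_records, target_ts):
    """Start from the base backup, replay logged changes in timestamp
    order, and stop once records pass the recovery target."""
    state = dict(base_snapshot)
    for ts, key, value in sorted(log_records):
        if ts > target_ts:
            break
        state[key] = value
    return state

base = {"balance": 100}                      # state captured by the backup
log = [(1, "balance", 150),                  # changes logged after the backup
       (2, "balance", 90),
       (3, "balance", 999)]                  # e.g. the mistake to roll past

print(restore_to_point_in_time(base, log, target_ts=2))  # {'balance': 90}
```

This is why the tool backs up the transaction logs covering the copy window: without them, the restored data files alone cannot be brought to a consistent, chosen point in time.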
The document discusses the 3-2-1 backup rule and strategies for implementing it using tiered storage approaches. The 3-2-1 rule recommends having 3 copies of your data, stored on 2 different media types, with 1 copy stored offsite. The document then outlines a tiered approach using storage snapshots (Tier 0), a small fast local disk system (Tier 1), a larger cheaper disk system (Tier 2), and offsite archival (Tier 3) to provide redundancy, fast restores, extended retention, and offsite protection in line with the 3-2-1 rule.
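The 3-2-1 rule itself is mechanical enough to encode as a check. A hypothetical sketch that validates a backup plan against the three conditions:

```python
def satisfies_3_2_1(copies):
    """copies: list of (media_type, offsite) pairs describing each backup copy.
    The 3-2-1 rule: >=3 copies, on >=2 distinct media types, >=1 offsite."""
    return (len(copies) >= 3
            and len({media for media, _ in copies}) >= 2
            and any(offsite for _, offsite in copies))

plan = [("disk", False), ("disk", False), ("tape", True)]
print(satisfies_3_2_1(plan))       # meets all three conditions
print(satisfies_3_2_1(plan[:2]))   # fails: too few copies, nothing offsite
```

The tiered layout in the summary maps onto this directly: snapshots and local disk supply the copy count and media diversity, and the archival tier supplies the offsite copy.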
This document outlines the steps to execute a database platform migration using Zero Data Loss Recovery Appliance (ZDLRA). It discusses ZDLRA backup and restore strategies using incremental forever backups and virtual full backups for fast restore. The presentation covers both cross-endian and same-endian database migration processes using ZDLRA, including automating steps with the dbmigusera.pl tool. A customer case study shows how a semiconductor manufacturer consolidated databases to Exadata using ZDLRA for near-zero downtime migration.
This document discusses MySQL performance tuning and various MySQL products and features. It provides information on MySQL 5.6 including improved scalability, new InnoDB features for NoSQL access, and an improved optimizer. It also discusses MySQL Enterprise Monitor for performance monitoring, and the Performance Schema for instrumentation and monitoring internal operations.
Amazon RDS for MySQL – Diagnostics, Security, and Data Migration (DAT302) | A...Amazon Web Services
Learn how to monitor your database performance closely and troubleshoot database issues quickly using a variety of features provided by Amazon RDS and MySQL including database events, logs, and engine-specific features. You also learn about the security best practices to use with Amazon RDS for MySQL. In addition, you learn about how to effectively move data between Amazon RDS and on-premises instances. Lastly, you learn the latest about MySQL 5.6 and how you can take advantage of its newest features with Amazon RDS.
Preventing and Resolving MySQL DowntimeJervin Real
This document summarizes a presentation about preventing and resolving MySQL downtime. It discusses common issues that can cause downtime like disk failures, crashes from bugs or upgrades, unnoticed cluster node failures, and performance problems. Solutions presented include tools from Percona like XtraDB Cluster for high availability, the Percona Monitoring Plugins, and the Percona Toolkit. The presentation also covers preventing issues through monitoring, proper configuration of MySQL, the operating system, and hardware.
Managing MySQL at scale requires the ability to confidently plan for the future while remaining flexible and responsive to the dynamic needs of the present. Quickly responding to requirements to increase performance, deploy additional read-slaves, refresh Dev/Test, QA, and business copies of databases and improve backup and restore times are critical capabilities in the fast paced world of DevOps. In this session, you will learn how to avoid over-provisioning storage to improve performance, reduce replication slave creation times from hours to seconds, significantly shrink backup windows, and slash restore times, all while maintaining the ability to scale storage resources without downtime or performance impact.
Run MongoDB with Confidence: Backing up and Monitoring with MMSMongoDB
- The MongoDB Management Service (MMS) provides monitoring, backup, and automation capabilities for MongoDB deployments.
- MMS monitors deployments through agents that identify server configurations, collect performance metrics, and enable alerts. It provides topology views and charting of key indicators.
- MMS backups MongoDB data by taking snapshots of replica sets and clusters. It stores backups for up to one year and allows point-in-time restores. Restoring a backup is simple and speeds up tasks like launching QA environments.
- MMS automates backups without much overhead. It provides a consistent way to backup sharded clusters through checkpoint restores with a small restoration window.
Proact ExaGrid Seminar Presentation KK 20220419.pdfKarel Kannel
ExaGrid offers a tiered backup storage solution with a disk-cache landing zone for fast backups and restores, and a long-term retention repository for cost-efficient retention. Its scale-out architecture ensures storage and performance scale with increasing data volumes. Key features include fast ingest without inline deduplication, fast restores from the landing zone without rehydration, and retention time-lock capabilities to guard against ransomware deleting or encrypting backups.
- Big Green IT is a Microsoft Gold Partner that provides Azure Backup and Azure Site Recovery services.
- Azure Backup allows backing up of physical servers, virtual machines, files/folders to Azure using the Azure Backup agent, Azure Backup server, or native Azure VM backup.
- Azure Site Recovery allows replication and disaster recovery of on-premises and Azure VMs to Azure.
This document discusses migrating an Oracle Database Appliance (ODA) from a bare metal to a virtualized platform. It outlines the initial situation, desired target, challenges, and solution approach. The key challenges included system downtime during the migration, backup/restore processes, using external storage, and database reorganizations. The solution involved first converting to a virtual platform and then upgrading, using backup/restore, attaching an NGENSTOR Hurricane storage appliance for direct attached storage, and moving database reorganizations to a separate maintenance window. It also discusses the odaback-API tool created to help automate and standardize the migration process.
OOW13: Accelerate your Exadata deployment with the DBA skills you already haveMarc Fielding
The document provides an overview of accelerating an Exadata deployment. It discusses key differences with Exadata such as InfiniBand networking, storage servers, smart scans, hybrid columnar compression, flash cache, and management tools. It also covers recommendations for backups, disaster recovery, data migration, patching, and diagnostic tools when working with Exadata.
The document discusses three important things for IT leaders to know about SQL Server: database performance and speed matter; backups and disaster recovery plans are not all equal; and high availability/disaster recovery (HA/DR) tools provide proactive disaster protection. It provides tips on optimizing database performance through query tuning instead of hardware upgrades. It explains the importance of backing up transaction logs and having comprehensive disaster recovery plans, including solutions like AlwaysOn availability groups. The document promotes the services of SQLWatchmen for database diagnostics, tuning, disaster planning and recovery support.
Maria DB Galera Cluster for High AvailabilityOSSCube
Want to understand how to set high availability solutions for MySQL using MariaDB Galera Cluster? Join this webinar, and learn from experts. During this webinar, you will also get guidance on how to implement MariaDB Galera Cluster.
This document discusses configuring and implementing a MariaDB Galera cluster for high availability on 3 Ubuntu servers. It provides steps to install MariaDB with Galera patches, configure the basic Galera settings, and start the cluster across the nodes. Key aspects covered include state transfers methods, Galera architecture, and important status variables for monitoring the cluster.
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksAmazon Web Services
RDS provides fully managed MySQL, MariaDB, and Aurora database engines. It handles common database tasks to reduce management overhead and allows focusing on applications. Key features include automatic failover, backups/snapshots, scaling, security, compliance support, and integration across AWS services. Best practices involve leveraging multi-AZ, read replicas, monitoring, and storage optimization based on workload needs. Migration options include the Database Migration Service and Schema Conversion Tool.
Slides presented at Percona Live Europe Open Source Database Conference 2019, Amsterdam, 2019-10-01.
Imagine a world where all Wikipedia articles disappear due to a human error or software bug. Sounds unreal? According to some estimations, it would take an excess of hundreds of million person-hours to be written again. To prevent that scenario from ever happening, our SRE team at Wikimedia recently refactored the relational database recovery system.
In this session, we will discuss how we backup 550TB of MariaDB data without impacting the 15 billion page views per month we get. We will cover what were our initial plans to replace the old infrastructure, how we achieved recovering 2TB databases in less than 30 minutes while maintaining per-table granularity, as well as the different types of backups we implemented. Lastly, we will talk about lessons learned, what went well, how our original plans changed and future work.
Big data refers to large, complex datasets that are difficult to process using traditional methods. This document discusses three examples of real-world big data challenges and their solutions. The challenges included storage, analysis, and processing capabilities given hardware and time constraints. Solutions involved switching databases, using Hadoop/MapReduce, and representing complex data structures to enable analysis of terabytes of ad serving data. Flexibility and understanding domain needs were key to feasible versus theoretical solutions.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Monitoring Java Application Security with JDK Tools and JFR Events
A Backup Today Saves Tomorrow
1. A backup today saves you
tomorrow
Because bad things do happen
Ben Mildren, MySQL Team Technical Lead
Andrew Moore, MySQL DBA
2. About Pythian
• Recognized Leader:
– Global industry-leader in remote database administration services and consulting for MySQL,
Oracle, Oracle Applications and Microsoft SQL Server
– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and
Western Union to help manage their complex IT deployments
• Expertise:
– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle
ACEs on staff—10 including 2 ACE Directors and 2 Microsoft MVPs.
– Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle
Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
– Around the clock global remote support for DBA and consulting, systems administration,
special projects or emergency response
5. Disaster Recovery Plan
“a documented process or set of procedures
to recover and protect a business IT
infrastructure in the event of a disaster.”
http://en.wikipedia.org/wiki/Disaster_recovery_plan
7. Disaster Recovery Plan
Designing a Disaster Recovery Plan
• Define your boundaries. What can you afford
to lose? Time or data?
• Backup, what, when, where
• Organize (find what you need at 4am)
• Protect against disaster, removing SPOF
• Document and train
• Test restore, automation, review
9. Disaster Recovery Plan
Recovery Time Objective
“the duration of time and service level within
which a business process must be restored
after a disaster or disruption”
http://en.wikipedia.org/wiki/Recovery_Time_Objective
10. Disaster Recovery Plan
Recovery Time Objective
Includes
• Time allowed to troubleshoot (without recovery/fix)
• The recovery time itself
• Time for communication to stakeholders
11. Disaster Recovery Plan
Recovery Point Objective
“the maximum tolerable period in which data
might be lost from an IT service due to a major
incident”
http://en.wikipedia.org/wiki/Recovery_Point_Objective
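To make the RPO concrete, here is a toy calculation with invented numbers (not from the deck): worst-case data loss is bounded by the interval between recovery points, so the backup schedule must be derived from the RPO rather than the other way around.

```shell
#!/bin/sh
# Illustrative numbers only: the RPO bounds how stale your newest recovery
# point may be, which in turn dictates the backup/binlog shipping schedule.
full_interval_h=24      # nightly full backup only
binlog_ship_min=5       # binary logs also shipped every 5 minutes
echo "fulls only:          worst case ${full_interval_h}h of data lost"
echo "fulls + binlog ship: worst case ${binlog_ship_min}min of data lost"
```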
12. Disaster Recovery Plan
Can you afford…
• Downtime?
• Data loss?
Generally it costs too much $ to
say no to both
22. Why Restore
• Audit
– Your company may be subject to audit
processes
• Legal
– You may be required to supply data as
part of legal proceedings
• Testing
– “Other” environments
– Testing backup process
– Verification of backup files
• Scale out
– Building replicas
Restoring a backup for reasons other than disaster can include, but is not limited to:
23. How to Restore
A valid Backup
Before you can restore, you need one important ingredient:
24. !MySQL Backups
Some alternatives are NOT backups
• Standby Replicas/ Time delay Replicas
• Passive Cluster Nodes
• RAID
• Storage Snapshots
• Untested backups
25. MySQL Backups
Challenges that face MySQL Backups
• No one tool to rule them all
• Mixed engine environments
• MySQL Surface area
• Production impact of backup
26. MySQL Backup Types
Hot vs. Warm vs. Cold
Logical vs. Physical
Local vs. Remote
Full vs. Point in Time
27. Backup Repository Model
FULL: Complete system images
DIFFERENTIAL: Changes since last full backup
INCREMENTAL: Changes between two points in time.
Backups need to be stored and organized
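As a runnable illustration of the incremental model above (the full-YYYYMMDD / incr-YYYYMMDD file names and layout are invented for this sketch), restoring to the newest point in time requires the most recent full backup plus every incremental taken after it:

```shell
#!/bin/sh
# Restore-chain sketch for an incremental repository: to reach the newest
# point in time you need the latest FULL backup plus every INCREMENTAL
# taken after it. File naming here is purely illustrative.
bkpdir=$(mktemp -d)
cd "$bkpdir"
touch full-20130401 incr-20130402 incr-20130403 full-20130408 incr-20130409
latest_full=$(ls full-* | sort | tail -n 1)
chain="$latest_full"
for inc in $(ls incr-* | sort); do
  # an incremental is part of the chain only if taken after the latest full
  if [ "${inc#incr-}" -gt "${latest_full#full-}" ]; then
    chain="$chain $inc"
  fi
done
echo "restore chain: $chain"
```

Differentials would shorten the chain further: only the latest full plus the latest differential would be needed.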
28. MySQL Backup Types
Hot vs. Warm vs. Cold
Can you take your MySQL server offline to make a backup?
Do you have a replica where your backup will not impact the master?
Logical vs. Physical
Backup the files or dump out the data so that you can recreate your server
Local vs. Remote
Can you afford the latency of a network round-trip
Full vs. Point in Time
Linked to your RPO, how granular should you go?
29. MySQL Backup Tools
Logical
• mysqldump
• mydumper
Physical
• Cold Backup
• MySQL Enterprise Backup
• Xtrabackup
Snapshot
• SAN
• LVM
• ZFS
Frameworks
• Zmanda
• Holland
• Xtrabackup Manager
30. mysqldump
The command line utility to create logical dumps of your schema, database objects and data. A good solution for small to medium datasets (roughly 0GB to 20GB).
Pros
• Packaged with MySQL
• Broad compatibility (engines)
• Flexible use with pipelines
(gzip, sed, awk, pv, ssh, nc)
• No locking with --single-transaction and InnoDB-only tables
Cons
• Single threaded
• Locking by default
• Can be hard to troubleshoot
errors (syntax error on line
14917212938)
• Slow to reload data
• Be wary of foreign keys and
triggers when restoring.
Type: Logical Heat: Hot [InnoDB only] / Warm [MyISAM] Impact: Medium Speed: Slow
31. mysqldump
Backup Examples
Backup all tables
mysqldump -u user -ppass --all-databases > backup.sql
Backup all tables compressed
mysqldump -u user -ppass --all-databases | gzip -5 > backup.sql.gz
Backup with database objects
mysqldump -u user -ppass --all-databases --routines --triggers --events > backup.sql
Backup with no data
mysqldump -u user -ppass --all-databases --no-data --triggers --events > backup.sql
32. Restore Examples
Restore mysqldump
mysql -u user -p < backup.sql
Restore from within
source backup.sql;
Backup & restore one liner
mysqldump db_one | ssh moore@myslave mysql -u user db_one
mysqldump
Restore a compressed dump with binlog off
(echo "set session sql_log_bin=0;" ; zcat dump.sql.gz) | mysql -u user
Restore table from dumpfile using sed
cat dump.sql | sed -n '/^-- Table structure for table `t1`/,/^UNLOCK TABLES;/p' | mysql -u user
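The sed extraction trick can be tried end to end against a tiny fake dump. The section markers below match mysqldump's real output; the file contents are invented for the sketch:

```shell
#!/bin/sh
# Build a minimal fake dump with two table sections, then extract only `t1`
# using the same sed range shown above. Real dumps carry the same markers.
cat > /tmp/dump-demo.sql <<'EOF'
-- Table structure for table `t1`
CREATE TABLE `t1` (id int);
UNLOCK TABLES;
-- Table structure for table `t2`
CREATE TABLE `t2` (id int);
UNLOCK TABLES;
EOF
sed -n '/^-- Table structure for table `t1`/,/^UNLOCK TABLES;/p' \
  /tmp/dump-demo.sql > /tmp/t1-only.sql
grep -c 'CREATE TABLE' /tmp/t1-only.sql   # only t1's section was extracted
```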
33. Tips
RTFM
There’s much more to see and test than what I’ve shown
--where = “id between 10000 and 20000”
--single-transaction
Check the time
Time your backup and recovery durations; this will allow you to set expectations if you need to use the backups. The time and pv unix programs will help you.
time mysqldump > backup.sql
real 0m0.108s
user 0m0.003s
sys 0m0.006s
pv backup.sql | mysql -u user -p
101MB 0:00:39 [5.6MB/s]
[===> ] 13% ETA 0:03:12
mysqldump
34. mydumper
A parallel logical dumper for MySQL developed and maintained by ex-MySQL
employees.
Type: Logical Heat: Warm Impact: Medium Speed: Fast
Pros
• It’s fast! Multithreaded
• Human readable output
• Good solution for larger datasets
(multi-threaded)
• Native compression
• Dump remote host
• Compatible with Drizzle
• Nearly hot if using only innodb tables
Cons
• No official binaries shipped
• Slower restore than physical backups but faster than mysqldump
• Caveats with restoration
• Relies on mysqldump for database
objects (routines, events, etc)
https://launchpad.net/mydumper
35. Build from source;
[moore@bkphost ~]# wget https://launchpad.net/mydumper/0.5/0.5.2/+download/mydumper-0.5.2.tar.gz
[moore@bkphost ~]# tar xzf mydumper-0.5.2.tar.gz && cd mydumper-0.5.2
[moore@bkphost ~]# cmake .
-- The CXX compiler identification is GNU
…
-- Build files have been written to: ~/mydumper-0.5.2
[moore@bkphost ~]# make
Scanning dependencies of target mydumper
[ 20%] Building C object CMakeFiles/mydumper.dir/
…
[ 80%] Built target mydumper
Scanning dependencies of target myloader
[100%] Built target myloader
[moore@bkphost ~]# mydumper --help
…
Dependencies
(MySQL, GLib, ZLib, PCRE)
mydumper
36. mydumper
Example of mydumper & myloader
Dump data
[moore@bkphost ~]# mydumper -h localhost -u bkpuser -p secret --database world --outputdir /backup_dir --verbose 3
** Message: Connected to a MySQL server
** Message: Started dump at: 2013-04-15 12:22:48
** Message: Thread 1 connected using MySQL connection ID 4
** Message: Thread 1 dumping data for `world`.`City`
** Message: Thread 1 shutting down
** Message: Non-InnoDB dump complete, unlocking tables
** Message: Finished dump at: 2013-04-15 12:22:54
Restore data
[moore@bkphost ~]# myloader -h localhost -u bkpuser -p secret --database world -d /backup_dir --verbose 3
** Message: n threads created
** Message: Creating table `world`.`City`
** Message: Thread 1 restoring `world`.`City` part 0
** Message: Thread 1 shutting down
37. Percona Xtrabackup 2.x
One of the strongest and most widely used solutions for consistently backing up MySQL, focused on XtraDB/InnoDB but with support for non-transactional tables too. Xtrabackup makes use of the XtraDB/InnoDB crash recovery cycle to apply the redo logs to the data when preparing for a restore. This prepare phase can happen at the end of the backup or as part of the recovery phase.
Type: Physical Heat: Hot Impact: Low Speed: Fast
Pros
• Free, GPL Licensed
• Hot & Physical
• Throttles to keep load low
• Native compression (qpress)
• Export tables
• Parallel backup
• Wide OS compatibility
• Consistent backup of MyISAM
• Great documentation and recipes
• Compatible with XtraDB Cluster
Cons
• Windows version in Alpha
• Multiple stage restore
• Cannot prepare compressed (on the fly)
backups
• Qpress compression < gzip/pigz
percona.com/software/percona-xtrabackup
38. Xtrabackup
C program that takes care of copying XtraDB/InnoDB data files and logs. It tails the redo log (ib_logfile) files while activity on the server continues and the XtraDB/InnoDB files are copied into the backup location.
Innobackupex
The perl script that oversees the backup process. This allows the backup to handle the non-transactional tables. It connects to the server once the Xtrabackup binary suspends after copying the XtraDB/InnoDB files. Innobackupex issues a “FLUSH TABLES WITH READ LOCK” so that the supported non-transactional tables are not written to. If --safe-slave-backup was issued then Innobackupex stops and starts the slave before and after the file copy.
Tar4ibd
A special version of tar that understands how to handle InnoDB / XtraDB data files. Archives built using the stream-to-tar options need to be extracted using tar’s -i option or the extraction will not succeed.
Xbstream
Custom streaming format which allows the dynamic compression and parallel copy of files to improve the performance of
the overall backup.
Percona Xtrabackup 2.x
41. More Examples
Optimizing the restore phase
innobackupex --use-memory=2G --apply-log /path/to/backup
Backups with pigz
innobackupex --stream=[xbstream|tar] . | [pigz|gzip] > backup.gz
Backup Incremental
innobackupex --incremental /path/to/incdest
--incremental-basedir=/path/to/fullbackup
Recipes: http://www.percona.com/doc/percona-xtrabackup/how-tos.html
Many, many more options…
Percona Xtrabackup 2.x
42. Other hints
Stream to another host
ssh or netcat
Native parallel & compress
--compress & --parallel
Non-transactional tables?
Use --rsync for much shorter lock time for a large non-trx surface area
Single table?
--apply-log --export
Percona Xtrabackup 2.x
43. The official MySQL hot backup product. Proprietary license with a large per-server expense. Lacks some of the features of Xtrabackup’s latest version. A solid solution for Enterprise customers.
Type: Physical Heat: Hot Impact: Medium Speed: Fast
Pros
• Hot Backups
• Physical Backups
• Compressed backups
• Throttling to avoid overloading
server
Cons
• It costs real money
• Throttling is a sleep call
whereas PXB uses IOPs
MySQL Enterprise Backup (MEB)
44. Taking a cold backup of your system is one sure way to a consistent backup. By stopping mysqld you can
copy the files you need to the location you want, to gain a fully consistent backup of your data and config.
You have to be able to afford the time to gracefully stop the server and then complete the copy. Buffers
are lost from shutdown so this will impact the performance of the instance when started again.
Type: Physical Heat: Cold Impact: Downtime Speed: Fast
Pros
• Fast
• Consistent Backup across all
engines
• Easily scripted
• No new software to learn
Cons
• It’s cold
• Buffers require warming on
restart
• Unsuitable for 24/7 operations
Cold Backup
45. A new MySQL feature as of version 5.6: the ability to stream all binary logs to another host, providing redundancy for your binary logs, the method used for incremental/point-in-time recovery. A good addition to the backup and recovery arsenal and worthy of a mention. Can stream binlogs from older versions. mysqlbinlog --read-from-remote-server --raw --stop-never
Type: Physical Heat: Hot Impact: Low Speed: Fast
Pros
• Hot streaming
• Physical Backup of binary
logs
• Low processing cost
Cons
• Dependent on connectivity
between servers
• Not a complete backup
• Restore could become
complex if many binlogs are
needed
MySQL Binary Log Streaming (5.6)
46. Techniques as seen in mylvmbackup show that snapshots can be used to make the backup online using copy-on-write technologies. Using a snapshot-capable filesystem or storage layer such as LVM, ZFS, VSS or SAN storage with the same ability can afford you a storage checkpoint where changes can simply be rolled back if issues arise.
Type: Physical Heat: Warm Impact: Varies Speed: Fast
Pros
• Fast
• Warm backups
• Familiar commands as with
storage tools
Cons
• Crash recovery needed for
InnoDB to apply redo log
• The ‘copy on write’ overhead
could impact performance
from the point the snapshot is
created until it’s destroyed.
Snapshot Backups
47. Contrary to popular belief, the vast majority of MySQL backups are warm, not hot.
The FLUSH TABLES WITH READ LOCK statement is still required to guarantee consistency for several aspects of a MySQL backup.
These include:
• MyISAM and other non transactional tables
• .frm files
• Binary log co-ordinates.
The dreaded global READ LOCK!
48. In the Frame
Frameworks for MySQL backups
Traits inherent in the framework concept
– Simplify complex technologies
– Implement specific design ‘rules’
– UI consistency
MySQL Backup Frameworks aim to solve:
– Wrap best practices into a common interface
– Large environment administration pains
– Centralize group configuration
49. ZRM (zmanda recovery manager)
Offering both commercial and GPL versions, ZRM is a great option for organizing and
scheduling your backups. Helps to simplify schedules, restores and verification.
Type: Framework Heat: Varied Impact: Varied Speed: Varied
Pros
• Handles many backup methods,
mysqldump, xtrabackup, snapshot
based full backups inc. EBS & SAN
• Commercial version has a central
dashboard GUI
• Automated alerts & reports
• Integrated into enterprise solutions
such as Symantec NetBackup and
Tivoli TSM
Cons
• The commercial nature hides many of the nice features that make ZRM a desirable solution.
• Xtrabackup, VSS & SAN snapshots
only available in commercial version
• The last community edition release was in 2010.
https://www.zmanda.com
50. Holland
This relatively unsung framework originally comes from Rackspace and is written in Python under a New-BSD license. It is a pluggable framework with MySQL, Xtrabackup, SQLite and Postgres support. Due to its pluggable nature there is scope to back up more than just databases.
Type: Framework Heat: Varied Impact: Varied Speed: Varied
Pros
• Open Source
• Pluggable structure
• In production at RackSpace
• Backup more than MySQL
• Manages retention
Cons
• Small user base (get involved
for the good of mankind!)
• No central control tower
http://hollandbackup.org/
51. Xtrabackup Manager
Winner of the 2012 community award, Xtrabackup Manager (XBM) was created by Lachlan Mulcahy. XBM gives you a way to manage your Xtrabackup tasks for multiple hosts via a command line interface. Written in PHP, XBM uses cron, netcat, pv and ssh to get things done.
Type: Framework Heat: Warm/Hot Impact: low Speed: Fast
Pros
• GPL License
• Low overhead
• Handles incremental
xtrabackups
• Manage schedule, retention
policy
Cons
• Still in beta and development
has slowed
• Tested on 1.x so far
• Small user base (get involved
for the good of mankind!)
https://code.google.com/p/xtrabackup-manager
52. Backup Testing
Regular development / Staging restores
• Can be sufficient when infrastructure is limited
Restore server
• Dedicated
• Shared – Sandbox / mysqld_multi
Verification
• Tests using information schema
• Best to include data taken at time of backup
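Verification can start cheap, well before a full restore test. A deliberately tiny sketch of such a check on a logical dump (the dump content, path and table name are faked here; point it at a real backup file in practice):

```shell
#!/bin/sh
# Cheap sanity checks on a logical dump before trusting it: non-empty,
# carries the mysqldump completion marker, and contains an expected table.
dump=/tmp/verify-demo.sql
printf '%s\n' 'CREATE TABLE `City` (id int);' \
              '-- Dump completed on 2013-04-15 12:22:54' > "$dump"
status=OK
[ -s "$dump" ]                         || status=FAIL   # non-empty?
grep -q '^-- Dump completed' "$dump"   || status=FAIL   # finished cleanly?
grep -q 'CREATE TABLE `City`' "$dump"  || status=FAIL   # expected object?
echo "backup check: $status"
```

This complements, but never replaces, an actual restore followed by information_schema comparisons.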
53. Backup Philosophy
Add redundancy
• Use Logical and Physical types
• Don’t forget your binary logs (5.6 can stream to non-mysql server)
• Local copies (HDD, DAS) avoid network copying
• Remote copies (NFS, SAN)
• Offsite copies (S3, secure tape storage i.e. Iron Mountain)
Monitor those backups
• Check pass/fail and document a process for how to react if a backup fails
• A framework or custom wrapper will help
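A custom wrapper can be as small as a freshness check run from cron. In this sketch the directory, file pattern and 24-hour threshold are assumptions; it prints an alert line a monitoring system could grep for:

```shell
#!/bin/sh
# Freshness check a cron wrapper might run: alert when no backup newer
# than 24h exists in the backup directory.
bkpdir=/tmp/monitor-demo
mkdir -p "$bkpdir"
touch "$bkpdir/backup-now.sql.gz"       # stand-in for last night's backup
recent=$(find "$bkpdir" -name '*.sql.gz' -mtime -1 | wc -l)
if [ "$recent" -ge 1 ]; then
  echo "backup monitor: OK - fresh backup present"
else
  echo "backup monitor: ALERT - no backup in the last 24h"
fi
```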
54. When backups go wrong
mysqldump
syntax error at line 1000101020020
charsets and collation issues (check config)
Xtrabackup
Ghost backups.
Shared tempdir for xtrabackup_checkpoints
56. Thank you
To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian