RealityMine collects digital user behavior data to help companies with marketing, product development, and analyzing user patterns. They are migrating from an on-premise SQL Server data warehouse to Amazon Redshift to handle doubling data volumes. Redshift provides better performance and scalability at lower cost compared to other options. It requires extracting raw data from SQL Server without encoding issues, loading to S3, and transforming in Redshift using a star schema with careful consideration of distribution and sort keys for query performance. Ongoing database maintenance and backups are also different in Redshift.
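The load-and-transform step described above can be sketched in a few lines. The snippet below builds a Redshift `CREATE TABLE` statement with explicit distribution and sort keys, plus a `COPY` statement that ingests the extracted files from S3. Table, column, bucket, and role names are hypothetical, chosen only for illustration.

```python
# Sketch of the Redshift load step: a star-schema fact table with explicit
# DISTKEY/SORTKEY clauses, plus a COPY statement that ingests extracted
# files from S3. All names below are hypothetical placeholders.

def fact_table_ddl(table, columns, distkey, sortkeys):
    """Build a CREATE TABLE statement with DISTKEY and SORTKEY clauses."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"DISTKEY ({distkey})\n"
        f"COMPOUND SORTKEY ({', '.join(sortkeys)});"
    )

def copy_from_s3(table, s3_prefix, iam_role):
    """Build a COPY statement; Redshift parallelizes across the file set."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV GZIP;"
    )

ddl = fact_table_ddl(
    "fact_events",
    [("user_id", "BIGINT"), ("event_ts", "TIMESTAMP"),
     ("event_type", "VARCHAR(64)")],
    distkey="user_id",
    sortkeys=["event_ts"],
)
copy_sql = copy_from_s3(
    "fact_events",
    "s3://my-bucket/extracts/events/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(ddl)
print(copy_sql)
```

Distributing on the join key and sorting on the query's time filter is the usual starting point; the right choice depends on your own query patterns.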
Kafka Tutorial - Introduction to Apache Kafka (Part 1) by Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with small examples from the command line, then expands on this with a multi-server example to demonstrate failover of brokers as well as consumers. It then walks through simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
Powering Interactive Data Analysis at Pinterest by Amazon Redshift, by Jie Li
In the last six months, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Take an in-depth look at data warehousing with Amazon Redshift and get answers to your technical questions. We will cover performance tuning techniques that take advantage of Amazon Redshift's columnar technology and massively parallel processing architecture. We will also discuss best practices for migrating from existing data warehouses, optimizing your schema, loading data efficiently, and using workload management and interleaved sorting.
SQL Server to Redshift Data Load Using SSIS by Marc Leinbach
In this article we will learn how to load data from SQL Server into an Amazon Redshift data warehouse using SSIS. The techniques outlined here can also be applied when extracting data from other relational sources (e.g., loading data from MySQL to Redshift, Oracle to Redshift, etc.). First we will discuss the steps needed to load data into Amazon Redshift and the challenges involved, and then we will simplify the whole process using the SSIS Task for Amazon Redshift Data Transfer.
Kafka and Avro with Confluent Schema Registry by Jean-Paul Azar
The document discusses Confluent Schema Registry, which stores and manages Avro schemas for Kafka clients. It allows producers and consumers to serialize and deserialize Kafka records to and from Avro format. The Schema Registry performs compatibility checks between the schema used by producers and consumers, and handles schema evolution if needed to allow schemas to change over time in a backwards compatible manner. It provides APIs for registering, retrieving, and checking compatibility of schemas.
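The compatibility check described above can be illustrated with a toy rule: a new reader schema can still decode records written with the old schema only if every field it adds carries a default value. Real Avro schema resolution also handles type promotion, unions, and aliases; this sketch, with invented schema names, checks just the added-field rule.

```python
# Toy illustration of the backward-compatibility rule the Schema Registry
# enforces. Real Avro resolution is richer (type promotion, unions,
# aliases); this checks only the "new field needs a default" rule.

def backward_compatible(old_schema, new_schema):
    """Return True if new_schema can read data written with old_schema."""
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            # An added field without a default makes old records unreadable.
            return False
    return True

v1 = {"name": "Employee",
      "fields": [{"name": "id", "type": "long"}]}
# Adding a field WITH a default keeps old records readable.
v2_ok = {"name": "Employee",
         "fields": [{"name": "id", "type": "long"},
                    {"name": "email", "type": "string", "default": ""}]}
# Adding a field WITHOUT a default breaks backward compatibility.
v2_bad = {"name": "Employee",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "email", "type": "string"}]}

print(backward_compatible(v1, v2_ok))   # True
print(backward_compatible(v1, v2_bad))  # False
```

The registry rejects a schema registration that fails this kind of check, which is what lets producers and consumers evolve independently.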
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL? by Clustrix
Many web businesses enjoy a spike in traffic at some point in the year. Whether it's Black Friday, the NFL draft day, or Mother’s Day, your app needs to be able to scale and capture customer value when it is most needed. Downtime is not an option.
For a database, that means having enough capacity to ensure transaction latency stays within acceptable limits. For high capacity apps using MySQL, this means you may need to deploy triple the normal capacity usage to sustain traffic for one day. But what do you do with that hardware for the rest of the year? Do you leave it idling? That unused capacity is costing you an arm and a leg, and wasted expenses make CFOs grumpy.
In Part 3 of our Tech Talk series, we discuss what the options are for scaling down MySQL, as well as explore answers to the following questions:
- How do I figure out the costs of not scaling down?
- How does ClustrixDB scale down differently than MySQL?
- How real is elastic scaling in ClustrixDB? What are the catches?
View the webcast of this Tech Talk on our YouTube channel.
Kafka Tutorial - Introduction to Apache Kafka (Part 2) by Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with small examples from the command line, then expands on this with a multi-server example to demonstrate failover of brokers as well as consumers. It then walks through simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks by Amazon Web Services
Learning Objectives:
- Learn about the capabilities of the PostgreSQL database
- Learn about PostgreSQL offerings on AWS
- Learn how to migrate from Oracle to PostgreSQL with minimal disruption
Near Real-Time Data Analysis With FlyData by FlyData Inc.
This document describes our products. FlyData makes it easy to load data automatically and continuously into Amazon Redshift. You can also refer to our homepage (http://flydata.com/) for more information.
Amazon RDS for Microsoft SQL: Performance, Security, Best Practices (DAT303) ... by Amazon Web Services
Come learn about architecting high-performance applications and production workloads using Amazon RDS for SQL Server. Understand how to migrate your data to an Amazon RDS instance, apply security best practices, and optimize your database instance and applications for high availability.
AWS June 2016 Webinar Series - Amazon Redshift for Big Data Analytics by Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and tune query and database performance.
Learning Objectives:
Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
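One way the columnar technology mentioned above minimizes I/O is via per-block min/max metadata ("zone maps") that let a range query skip blocks entirely. The sketch below is a simplified model, with an invented block size and data, not Redshift's actual on-disk format.

```python
# Rough model of zone-map pruning: record (min, max) per block of a column
# and read only the blocks whose range overlaps the query predicate.
# Block size and data are invented for illustration.

def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def scan(zone_map, lo, hi):
    """Read only blocks whose [min, max] range overlaps [lo, hi]."""
    blocks_read, hits = 0, []
    for bmin, bmax, block in zone_map:
        if bmax < lo or bmin > hi:
            continue            # pruned: no I/O for this block
        blocks_read += 1
        hits.extend(v for v in block if lo <= v <= hi)
    return blocks_read, hits

# A sorted column prunes best: consecutive blocks cover disjoint ranges.
column = list(range(1000))        # e.g. event timestamps, already sorted
zm = build_zone_map(column, block_size=100)
blocks_read, rows = scan(zm, 250, 280)
print(blocks_read, len(rows))     # 1 block touched out of 10
```

This is why a sort key matching the common filter column pays off: on unsorted data every block's range tends to overlap the predicate and nothing can be skipped.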
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
Migrating and Running DBs on Amazon RDS for Oracle by Maris Elsins
The process of migrating Oracle databases to Amazon RDS is quite complex. Some of the challenges are capacity planning, efficient loading of data, dealing with the limitations of RDS, provisioning instance configurations, and the lack of SYSDBA access to the database. The author has migrated over 20 databases to Amazon RDS and will provide insight into how these challenges can be addressed. Once the migrations are done, supporting the databases is very different too, because SYSDBA access is not provided. The author will talk about his experience migrating to and supporting databases on Amazon RDS for Oracle from an Oracle DBA's perspective, and will reveal the problems encountered as well as the solutions applied.
This document provides an overview and update on Amazon Aurora, Amazon's relational database service. It discusses new performance enhancements including improved read performance through caching, NUMA-aware scheduling, and lock compression to reduce contention. New availability features are also summarized, such as automatic repair and replacement of failed database nodes and storage volumes that can grow to 64TB. The document outlines Aurora's architecture advantages over traditional databases for scaling in the cloud through its distributed, self-healing design.
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013 by Amazon Web Services
Migrating data from the existing environments to AWS is a key part of the overall migration to Amazon RDS for most customers. Moving data into Amazon RDS from existing production systems in a reliable, synchronized manner with minimum downtime requires careful planning and the use of appropriate tools and technologies. Because each migration scenario is different, in terms of source and target systems, tools, and data sizes, you need to customize your data migration strategy to achieve the best outcome. In this session, we do a deep dive into various methods, tools, and technologies that you can put to use for a successful and timely data migration to Amazon RDS.
Vlad Vlasceanu, a specialist solutions architect at AWS, presented best practices for deploying SQL Server on Amazon Web Services. He discussed deployment options for SQL Server on Amazon EC2 and Amazon RDS, highlighting their differences. He then provided recommendations for optimizing SQL Server performance and high availability when using Amazon EC2 and Amazon RDS, focusing on storage, availability zones, and configuration management. The presentation aimed to help customers design, deploy, and optimize SQL Server workloads effectively on AWS.
AWS June Webinar Series - Getting Started: Amazon Redshift by Amazon Web Services
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service, for less than $1,000 per TB per year. In this presentation, you'll get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Learn how, with just a few clicks in the AWS Management Console, you can set up a fully functional data warehouse, ready to accept data, without learning any new languages and while easily plugging in the existing business intelligence tools and applications you use today. This webinar is ideal for anyone looking to gain deeper insight into their data without the usual challenges of time, cost, and effort.
In this webinar, you will learn how to:
- Understand what Amazon Redshift is and how it works
- Create a data warehouse interactively through the AWS Management Console
- Load data into your new Amazon Redshift data warehouse from S3
Who should attend:
- IT professionals, developers, and line-of-business managers
AWR Difference Reports are very helpful when overall performance information about two different periods needs to be compared. However, if the requirement is to review trends in the performance of a specific query, the average length of a particular wait event, or the change in a specific statistic over time in order to identify peaks, the AWR Difference Reports are of little help. This presentation concentrates on techniques for extracting information from the Automatic Workload Repository to analyze how things change over time, which is useful both for forecasting and for identifying the specific time periods when issues affect specific areas of the database.
AWS July Webinar Series: Amazon Redshift migration and load data 20150722 by Amazon Web Services
Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze your data for a fraction of the cost of traditional data warehouses.
In this webinar, you will learn how to easily migrate your data from other data warehouses into Amazon Redshift, efficiently load your data with Amazon Redshift's massively parallel processing (MPP) capabilities, and automate data loading with AWS Lambda and AWS Data Pipeline. You will also learn about ETL tools from our partners to extract, transform, and prepare data from disparate data sources before loading it into Amazon Redshift.
Learning Objectives:
Understand common patterns for migrating your data to Amazon Redshift
See live examples of the COPY command that fully parallelizes data ingestion
Learn how to automate the load process using AWS Lambda & AWS Data Pipeline
Techniques for real-time data loading
Options for ETL tools from our partners
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013 by Amazon Web Services
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that costs less than $1,000 per terabyte per year—less than a tenth the price of most traditional data warehousing solutions. In this session, you get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Finally, we announce new features that we've been working on over the past few months.
Measuring Storage Performance
Course practice
Presented by Valerian Ceaus
The document discusses using SQLIO to test the input/output capacity of a disk subsystem. It provides guidance on running SQLIO tests with different I/O types, sizes, and durations. The document also discusses interpreting SQLIO results and monitoring I/O performance using Windows Performance Monitor and Resource Monitor. Key factors that influence I/O performance like outstanding I/Os, queue depth, throughput, and latency are explained.
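The quantities a SQLIO run reports are linked by simple arithmetic: throughput equals IOPS times the I/O size, and by Little's law the number of outstanding I/Os equals IOPS times average latency. The numbers below are illustrative only, not measurements from any particular disk.

```python
# The relationships behind SQLIO's numbers. Throughput = IOPS * I/O size;
# Little's law: outstanding I/Os = IOPS * average latency.
# All figures below are made up for illustration.

def throughput_mb_s(iops, io_size_kb):
    """MB/s delivered at a given IOPS rate and I/O size."""
    return iops * io_size_kb / 1024.0

def outstanding_ios(iops, latency_ms):
    """Average I/Os in flight needed to sustain this IOPS at this latency."""
    return iops * (latency_ms / 1000.0)

iops = 4000          # hypothetical 8 KB random-read result
io_kb = 8
latency_ms = 2.0     # average per-I/O latency at that load

print(throughput_mb_s(iops, io_kb))        # 31.25 MB/s
print(outstanding_ios(iops, latency_ms))   # 8.0 I/Os in flight
```

This is why the queue-depth (`-o`) parameter matters when benchmarking: if you issue fewer outstanding I/Os than the product of the device's achievable IOPS and its latency, you measure your own test harness rather than the disk.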
This document provides best practices for deploying Microsoft SQL Server on Amazon EC2. It discusses using multiple Amazon EBS volumes for tempdb and data files to improve performance. It also covers high availability options like AlwaysOn Availability Groups across Availability Zones and failover cluster instances. The document recommends configuring security groups and network access control lists for security in a VPC.
A quick tour in 16 slides of Amazon's Redshift clustered, massively parallel database.
Find out what differentiates it from the other database products Amazon has, including SimpleDB, DynamoDB and RDS (MySQL, SQL Server and Oracle).
Learn how it stores data on disk in a columnar format and how this relates to performance and interesting compression techniques.
Contrast the difference between Redshift and a MySQL instance and discover how the clustered architecture may help to dramatically reduce query time.
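One of the compression techniques a columnar layout enables, as mentioned above, is run-length encoding: repeated adjacent values collapse into (value, count) pairs, which pays off most on sorted, low-cardinality columns. The data below is invented for illustration.

```python
# Run-length encoding, a compression scheme that columnar storage makes
# effective: a sorted, low-cardinality column collapses to one run per
# distinct value. The column contents are made up.

def rle_encode(values):
    """Encode a sequence as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

# A sorted "country" column: 1000 values become 3 runs.
column = ["DE"] * 500 + ["UK"] * 300 + ["US"] * 200
runs = rle_encode(column)
print(len(column), "values ->", len(runs), "runs")
assert rle_decode(runs) == column
```

In a row store the same values would be interleaved with every other column of each row, so runs like these never form; storing each column contiguously is what makes such encodings effective.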
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about their use case, taking advantage of fast performance on enormous datasets and economies of scale on the AWS platform.
Amazon RDS makes it easy to set up, operate, and scale Oracle Database deployments in the cloud. In this webinar, we'll discuss practical ways of migrating applications to Amazon RDS for Oracle. Customer case studies will illustrate how customers moved to Amazon RDS for Oracle and how they benefited.
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent... by Amazon Web Services
This session dives deep on best practices and considerations for running Microsoft SQL Server on AWS. We cover best practices for deploying SQL Server, how to choose between Amazon EC2 and Amazon RDS, and ways to optimize the performance of your SQL Server deployment for different types of applications. We review in detail how to provision and monitor your SQL Server databases, and how to manage scalability, performance, availability, security, and backup and recovery in both Amazon RDS and Amazon EC2. In addition, we discuss how you can set up a disaster recovery solution between an on-premises SQL Server environment and AWS, using native SQL Server features like log shipping, replication, and AlwaysOn Availability Groups.
Redshift is Amazon's cloud data warehousing service that lets users combine S3 storage with EC2 compute. It uses a columnar data layout and zone maps to optimize analytic queries. Data is distributed across nodes using either an EVEN or a KEY-based approach. Queries are optimized using statistics gathered by ANALYZE operations, while VACUUM reclaims space and restores sort order. Security, monitoring, and backups are managed natively within Redshift.
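The two distribution approaches mentioned above can be modeled in a few lines: EVEN deals rows round-robin across slices, while KEY hashes a column so equal keys land on the same slice (good for co-located joins, risky with skewed keys). Slice count, rows, and column names below are invented.

```python
# Toy model of Redshift's EVEN vs KEY distribution styles.
# Slice count and data are invented for illustration.

def distribute_even(rows, n_slices):
    """Round-robin assignment, like the EVEN distribution style."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices

def distribute_key(rows, n_slices, key):
    """Hash the distribution key, like the KEY distribution style."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        slices[hash(row[key]) % n_slices].append(row)
    return slices

rows = [{"user_id": u, "event": e} for u in range(8) for e in range(10)]
even = distribute_even(rows, 4)
keyed = distribute_key(rows, 4, "user_id")

# EVEN balances perfectly; KEY co-locates each user's rows on one slice.
slices_per_user = {
    u: {i for i, s in enumerate(keyed) for r in s if r["user_id"] == u}
    for u in range(8)
}
print([len(s) for s in even])                              # [20, 20, 20, 20]
print(all(len(v) == 1 for v in slices_per_user.values()))  # True
```

The trade-off this models: joining two tables distributed on the same key needs no data movement, but a hot key (one user with most of the rows) would pile its rows onto a single slice, which EVEN distribution avoids.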
1- Introduction about Database Mirroring Concept
2- References (8 blogs)
3- Note
4- Database mirroring operation mode
5- Database Mirroring Requirement
6- Advantage of Database Mirroring
7- Disadvantage of Database Mirroring
8- Database Mirroring Enhancement in SQL Server 2008
9- Database Mirroring Installation Step by Step
10- High Availability Mode [Automatic Failover]
11- High Availability Mode [Manual Failover]
12- High Safety Mode Without witness server [Manual Failover]
13- Standard listener port in database mirroring
14- Check SQL Server mirroring availability
15- Add or replace witness server to an existing mirroring database
16- How to monitor Database Mirroring
17- Mirroring in a workgroup, not in a DC (Domain Controller)
E-Commerce Success is a Balancing Act. Ensure Success with ClustrixDB. by Clustrix
If you have been having issues with your e-commerce site slowing down or acting up during peak seasons or flash sales, your database may be the cause. ClustrixDB is the only database purpose-built for e-commerce and an excellent alternative to costly replatforming.
Watch this webinar to learn how ClustrixDB allows for scale on e-commerce sites: https://www.brighttalk.com/webcast/7485/129411
Amazon RDS makes it easy to set up, operate, and scale Oracle Database deployments in the cloud. In this webinar, we'll discuss practical ways of migrating applications to Amazon RDS for Oracle. Customer case studies will illustrate how customers moved to Amazon RDS for Oracle and how they benefited.
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...Amazon Web Services
This session dives deep on best practices and considerations for running Microsoft SQL Server on AWS. We cover best practices for deploying SQL Server, how to choose between Amazon EC2 and Amazon RDS, and ways to optimize the performance of your SQL Server deployment for different types of applications. We review in detail how to provision and monitor your SQL Server databases, and how to manage scalability, performance, availability, security, and backup and recovery in both Amazon RDS and Amazon EC2. In addition, we discuss how you can set up a disaster recovery solution between an on-premises SQL Server environment and AWS, using native SQL Server features like log shipping, replication, and AlwaysOn Availability Groups.
Redshift is Amazon's cloud data warehousing service that allows users to interact with S3 storage and EC2 compute. It uses a columnar data structure and zone maps to optimize analytic queries. Data is distributed across nodes using either an even or keyed approach. Sort keys and queries are optimized using statistics from ANALYZE operations while VACUUM reclaims space. Security, monitoring, and backups are managed natively with Redshift.
1- Introduction about Database Mirroring Concept
2- Reference (8 Blogs )
3- Note
4- Database mirroring operation mode
5- Database Mirroring Requirement
6- Advantage of Database Mirroring
7- Disadvantage of Database Mirroring
8- Database Mirroring Enhancement in SQL Server 2008
9- Database Mirroring Installation Step by Step
10- High Availability Mode [Automatic Failover]
11- High Availability Mode [Manual Failover]
12- High Safety Mode Without witness server [Manual Failover]
13- Standard listener port in database mirroring
14- Check SQL server mirroring availability
15- Add or replace witness server to an existing mirroring database
16- How to monitor Database Mirroring
17- Mirroring in a workgroup, not in a DC (Domain Controller)
E-Commerce Success is a Balancing Act. Ensure Success with ClustrixDB.Clustrix
If you have been having issues with your e-commerce site slowing down or acting up during peak seasons or flash sales, your database may be the cause. ClustrixDB is the only database purpose-built for e-commerce and an excellent alternative to costly replatforming.
Watch this webinar to learn how ClustrixDB allows for scale on e-commerce sites: https://www.brighttalk.com/webcast/7485/129411
ClustrixDB 7.5 is the latest release of the only drop-in replacement for MySQL with true scale-out performance. The latest release of ClustrixDB is easier to use, provides more insight into the performance of the database and better utilizes hardware.
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Clustrix
Watch the recording here: https://www.youtube.com/watch?v=ZwERp38ynxQ&feature=youtu.be
In this webinar, Robbie Mihayli, VP of Engineering at Clustrix explores how to set up a SQL RDBMS architecture that scales out and is both elastic and consistent, while simultaneously delivering fault tolerance and ACID compliance.
He also covers how data gets distributed in this architecture, how the query processor works, how rebalancing happens and other architectural elements. Examples cited include cloud deployments and e-commerce use-cases.
In this webinar, you will learn:
1. Five RDBMS scaling strategies along with their trade-offs
2. The importance of having no single point of failure for OLTP (fault tolerance)
3. The vagaries of the cloud and how it impacts using an RDBMS in the cloud
Who should watch?
1. People interested in high performance, real-time database solutions
2. Companies who have MySQL in their infrastructure and are concerned that their growth will soon overwhelm MySQL’s single-box design
3. DBAs who implement ‘read slaves’, ‘multiple-masters’ and ‘sharding’ for MySQL databases and want to learn about better ways to scale
This document provides an overview of basic concepts in databases including:
1. It defines what a database is and examples of databases like a phone book. It also defines what a database refers to in computers as a collection of organized data.
2. It explains the functions of a database to store, delete, organize, use and present data. It provides an example of data stored in an Access database.
3. It defines what a DBMS is and its purpose to create, manage and query databases. It lists examples of common DBMS like Microsoft Access, MySQL, and Oracle.
4. It outlines different database models including hierarchical, network, object-oriented, and relational models and provides examples.
AWS Summit 2011: High Availability Database Architectures in AWS CloudAmazon Web Services
This document discusses database high availability architecture on AWS. It covers general HA principles, HA database options on AWS including Amazon RDS, and technologies for achieving HA such as backups, replication, and Multi-AZ deployments. Asynchronous replication allows for lower latency but lower durability than synchronous replication. Logical replication replicates database statements while physical replication replicates block changes. Amazon RDS provides automated backups, replication, and failover across Availability Zones to improve database availability and durability.
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?Clustrix
The document discusses scaling MySQL databases and alternatives to sharding. It begins by outlining the typical path organizations take to sharding MySQL as their data and usage grows over time. This involves continually upgrading hardware, adding read replicas, and eventually implementing sharding. The document then covers the challenges of sharding, such as data skew across shards, lack of ACID transactions, application changes required, and complex infrastructure needs. As an alternative, the document introduces ClustrixDB, a database that can scale write and read performance linearly just by adding more servers without sharding. It achieves this through automatic data distribution, query fan-out, and data rebalancing. Performance benchmarks show ClustrixDB vastly outscaling alternatives on Amazon
Laine Campbell, CEO of Blackbird, will explain the options for running MySQL at high volumes at Amazon Web Services, exploring options around database as a service, hosted instances/storages and all appropriate availability, performance and provisioning considerations using real-world examples from Call of Duty, Obama for America and many more. Laine will show how to build highly available, manageable and performant MySQL environments that scale in AWS—how to maintain then, grow them and deal with failure. Some of the specific topics covered are:
* Overview of RDS and EC2 – pros, cons and usage patterns/antipatterns.
* Implementation choices in both offerings: instance sizing, ephemeral SSDs, EBS, provisioned IOPS and advanced techniques (RAID, mixed storage environments, etc…)
* Leveraging regions and availability zones for availability, business continuity and disaster recovery.
* Scaling patterns including read/write splitting, read distribution, functional dataset partitioning and horizontal dataset partitioning (aka sharding)
* Common failure modes – AZ and Region failures, EBS corruption, EBS performance inconsistencies and more.
* Managing and mitigating cost with various instance and storage options
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We will cover how each service might help support your application, how much each service costs, and how to get started. We will also have with us Jeongsang Baek, the VP of Engineering from IGAWorks, Korea’s No.1 mobile business platform, who will walk us through their architecture and share with us the key insights that they gained from using the various AWS database technologies to deliver a reliable, efficient and cost-effective experience.
Beyond Aurora. Scale-out SQL databases for AWS Clustrix
As enterprises move to AWS, they have great choices for MySQL compatible databases. Knowing the best database for the specific job can save you time and money. In this webinar, Lokesh Khosla will discuss high-performance databases for AWS and share his findings based on a benchmark test that simulates the workload of a high-transaction AWS-based solution.
If you work with high transactional workloads, and you need a relational database to keep track of economically valuable items like revenue, inventory and monetary transactions, you'll be interested in this discussion about the strengths and weaknesses of Aurora and other MySQL solutions for AWS.
The document discusses data engineering and compares different data stores. It motivates data engineering to gain insights from data and build data infrastructures. It describes the data engineering ecosystem and various data stores like relational databases, key-value stores, and graph stores. It then compares Amazon Redshift, a cloud data warehouse, to NoSQL databases Cassandra and HBase. Redshift is optimized for analytics with SQL and columnar storage while Cassandra and HBase are better for scalability with eventual consistency. The best data store depends on an organization's architecture, use cases, and tradeoffs between consistency, availability and performance.
Running your database in the cloud presentationManish Singh
This document discusses running databases in the cloud and the challenges involved. It outlines the paradigm shift from on-premise to cloud-hosted databases and how this affects availability, elasticity, manageability and cost. Specific solutions are presented for addressing each challenge, such as database-as-a-service providers that offer automated scaling, high availability and APIs for management. The use case of an ecommerce application's architectural evolution is provided to illustrate how these challenges emerge over time with growth.
AWS Databases
·Database models (SQL vs. NoSQL)
·Amazon Relational Database Service (RDS) concepts, including database instances, security groups, and parameter and option groups
·Amazon DynamoDB concepts, including data model and supported operations
AWS offers a wide variety of database services to fit your application's requirements. The database services are fully managed and can be deployed in minutes with just a few clicks. AWS services include Amazon Relational Database Service (Amazon RDS), compatible with six common database engines; Amazon Aurora, a MySQL-compatible relational database with up to 5x better performance; Amazon DynamoDB, a fast and flexible NoSQL database service; Amazon Redshift, a petabyte-scale data warehouse; and Amazon ElastiCache, an in-memory caching service compatible with Memcached and Redis. AWS also provides AWS Database Migration Service, which lets you migrate your databases to the AWS cloud simply and cost-effectively.
This document discusses the challenges of running databases in the cloud and available solutions. The key challenges are availability, scalability, manageability and cost. Availability requires replication and failover. Scalability involves scaling up resources or scaling out horizontally. Manageability requires self-service tools. Cost savings require pay-per-use elastic scaling without overprovisioning. Database-as-a-Service providers aim to address these challenges by offering managed database services.
This document discusses the challenges of running databases in the cloud and available solutions. The key challenges are availability, scalability, manageability and cost. Availability requires standby servers and replication. Scalability involves scaling up resources or scaling out horizontally by adding servers. Manageability requires self-service tools. Cost savings require pay-per-use elastic scaling without overprovisioning. The document compares building your own database in the cloud versus using a database-as-a-service, and provides examples like Amazon RDS and Xeround that aim to address these challenges.
This document summarizes a presentation by Kevin Kline on strategies for addressing common SQL Server challenges. The presentation covered topics such as tuning disk I/O, managing very large databases, and an overview of Quest software solutions for SQL Server monitoring and performance. Key points included strategies for tiered storage, partitioning very large databases, monitoring disk queue lengths and page reads/writes in SQL Server.
This document provides an overview and comparison of relational and NoSQL databases. Relational databases use SQL and have strict schemas while NoSQL databases are schema-less and include document, key-value, wide-column, and graph models. NoSQL databases provide unlimited horizontal scaling, very fast performance that does not deteriorate with growth, and flexible queries using map-reduce. Popular NoSQL databases include MongoDB, Cassandra, HBase, and Redis.
1) Apache Cassandra in terms of the CAP Theorem
2) What makes Apache Cassandra "Available"?
3) How Apache Cassandra ensures data consistency?
4) Cassandra advantages and disadvantages
5) Frameworks/libraries to access Apache Cassandra + performance comparison
The document discusses how to build cloud-enabled apps that can scale on AWS. It covers scaling vertically by increasing instance sizes, scaling horizontally by adding more instances, using auto-scaling to dynamically scale based on demand, distributing load with an ELB, scaling databases using read replicas and sharding, and taking advantage of managed database services like RDS and DynamoDB for easier administration. It also discusses decomposing applications into small, stateless components and using infrastructure as code for continuous deployment and agility.
AWS Certified Cloud Practitioner Course S11-S17Neal Davis
This deck contains the slides from our AWS Certified Cloud Practitioner video course. It covers:
Section 11 Databases and Analytics
Section 12 Management and Governance
Section 13 AWS Cloud Security and Identity
Section 14 Architecting for the Cloud
Section 15 Accounts, Billing and Support
Section 16 Migration, Machine Learning and More
Section 17 Exam Preparation and Tips
Full course can be found here: https://digitalcloud.training/courses/aws-certified-cloud-practitioner-video-course/
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity, automates time-consuming database administration tasks, and provides you with six familiar database engines to choose from: Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB. In this session, we will take a close look at the capabilities of Amazon RDS and explain how it works. We’ll also discuss the AWS Database Migration Service and AWS Schema Conversion Tool, which help you migrate databases and data warehouses with minimal downtime from on-premises and cloud environments to Amazon RDS and other Amazon services. Gain your freedom from expensive, proprietary databases while providing your applications with the fast performance, scalability, high availability, and compatibility they need.
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
This document summarizes a survey of advanced non-relational database systems, their approaches, applications, and comparison to relational database management systems (RDBMS). It outlines the problem of scaling to meet new web-scale demands, describes how non-relational databases provide a solution by sacrificing consistency for availability and partition tolerance. Examples of non-relational databases are provided, including their data models, APIs, optimizations, and benefits compared to RDBMS such as improved scalability and fault tolerance.
Similar to Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711 (20)
7. Scaling-Up: Reads + Writes
• Keep increasing the size of the (single) database server
• Pros
– Simple, no application changes needed. ‘Click to Scale-up’ on AWS console
– Best solution for Capacity, if it can handle your workload
• Cons
– Capacity limit: most clouds provide at most ~36 vCPUs for a single server
– Going beyond that means leaving the cloud, which is expensive: you're soon often paying 5x the price for 2x the performance
Eventually you ‘hit the wall’, and you literally cannot scale up any further
8. Scaling Reads: Master/Slave
• Add a ‘Slave’ read-server(s) to your ‘Master’ database server
• Pros
– Simple to implement, lots of automation available. AWS has ‘Read Replicas’
– Read/write fan-out can be done at the proxy level
• Cons
– Best for read-heavy workloads- only adds Read performance
– Data consistency issues can occur, especially if the application isn’t coded to
ensure read-consistency between Master & Slave (not an issue with RDS)
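The proxy-level read/write fan-out mentioned on this slide can be sketched as a minimal query router. This is an illustrative toy, not any product's implementation; real deployments use a proxy such as ProxySQL or the RDS reader endpoint, and all server names here are made up:

```python
import itertools

# Statement types that are safe to send to a read replica.
READ_PREFIXES = ("select", "show", "describe", "explain")

class QueryRouter:
    """Toy read/write splitter: reads fan out round-robin across
    replicas, everything else goes to the master. Replication lag is
    ignored, which is exactly the consistency caveat on the slide."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        first_word = sql.lstrip().split(None, 1)[0].lower()
        if first_word in READ_PREFIXES:
            return next(self._replicas)
        return self.master

router = QueryRouter("master-1", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))        # replica-1
print(router.route("UPDATE orders SET state='x'")) # master-1
```

A real proxy also has to handle transactions (pin them to the master) and stale reads, which is where the "application isn't coded to ensure read-consistency" problem comes from.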
9. Scaling Reads + Writes: Master/Master
• Add additional ‘Master’(s) to your ‘Master’ database server
• Pros
– Adds Reads + Write scaling without needing to shard
– Depending on workload (e.g. non-serialized), scaling can approach linear
• Cons
– Adds Write scaling at the cost of read-slaves, which would add even more latency
– Application changes are required to ensure data consistency / conflict resolution
– AWS: Not available on RDS console; ‘roll-your-own’ with EC2
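One of the application changes alluded to above is conflict resolution when the same row is updated on two masters. A common (and lossy) policy is last-write-wins by timestamp, sketched here with hypothetical row dictionaries:

```python
def resolve(version_a, version_b):
    """Last-write-wins: keep the row version with the newer timestamp.
    Simple, but it silently discards the losing write, which is one
    reason master/master needs workload-aware application design."""
    return version_a if version_a["ts"] >= version_b["ts"] else version_b

# The same row updated concurrently on two masters:
row_on_m1 = {"id": 42, "email": "old@example.com", "ts": 100}
row_on_m2 = {"id": 42, "email": "new@example.com", "ts": 105}
print(resolve(row_on_m1, row_on_m2)["email"])  # new@example.com
```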
10. Examples: Master/Master Replication Solutions
• Replication-based synchronous COMMIT solutions:
– Galera (open-source library)
– Percona XtraDB Cluster (leverages Galera replication library)
– Tungsten
• Pros
– Good for High-Availability
– Good for Read scaling
• Cons
– Provides variable Write scale, depending on workload
– Replication has inherent potential consistency and latency issues.
High-transaction workloads such as OLTP (e.g. E-Commerce) are exactly the
workloads that replication struggles the most with
11. Scaling Reads & Writes: Horizontal (‘Regular’) Sharding
• Partitioning tables across separate database servers
• Pros
– Adds both Read and Write scaling, depending on well-chosen sharding keys and low skew
– Most common way to scale-out both Reads and Writes
• Cons
– Loses the ability of an RDBMS to manage transactionality, referential integrity and ACID;
Application must ‘re-invent the wheel’
– Consistent backups across all the shards are very hard to manage
– Data management (skew/hotness) is ongoing significant maintenance
– AWS: Not available on RDS console; ‘roll-your-own’ with EC2
[Diagram: four shards partitioned by key range: SHARD01 (A-K), SHARD02 (L-O), SHARD03 (P-S), SHARD04 (T-Z)]
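The range-sharding layout in the diagram reduces to a lookup the application must perform on every query. A minimal sketch (shard names are hypothetical; real setups use a library or middleware):

```python
import bisect

# Inclusive upper-bound first letter for each shard, per the diagram:
# SHARD01: A-K, SHARD02: L-O, SHARD03: P-S, SHARD04: T-Z
SHARD_BOUNDS = ["K", "O", "S", "Z"]
SHARD_NAMES = ["shard01", "shard02", "shard03", "shard04"]

def shard_for(customer_name):
    """Pick a shard by the first letter of the sharding key.
    Skewed keys (e.g. many names starting with 'S') pile onto one
    shard, which is the data skew / hotness maintenance cost above."""
    letter = customer_name[0].upper()
    return SHARD_NAMES[bisect.bisect_left(SHARD_BOUNDS, letter)]

print(shard_for("Alice"))  # shard01
print(shard_for("Oscar"))  # shard02
print(shard_for("Zane"))   # shard04
```

Note that the moment a query needs rows from two ranges, the application is joining across servers by hand, which is the ACID/referential-integrity loss listed in the cons.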
12. Examples: Horizontal Sharding Solutions
MySQL Fabric
• Pros
– Elasticity: Can add nodes using Python scripts or OpenStack, etc
– Resiliency: Automated load-balancing, auto slave promotion, and master/promotion-aware routing, all transparent to the application
• Cons
– Application needs to provide sharding key per query
– JOINs involving multiple shards not supported
– Data rebalancing across shards is manual operation
ScaleArc
• Pros
– Capacity: Rule-based range or key-based sharding. Automatic read-slave promotion
– Resiliency: Automatically manages MySQL replication, managing Master/Master,
promotion, and fail-over
• Cons
– All queries need to route through ‘smart load balancer’ which manages shards
– Data rebalancing across shards is manual operation
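Routing every query through a shard-aware layer, as these solutions do, implies scatter-gather execution for queries that span shards. A minimal sketch (the shard objects and the `query_shard` stub are placeholders for real per-shard connections):

```python
from concurrent.futures import ThreadPoolExecutor

def query_shard(shard, sql):
    # Stand-in for a real per-shard database call.
    return shard["rows"]

def scatter_gather(shards, sql):
    """Fan a query out to every shard in parallel and merge the
    results: roughly what a shard-aware load balancer does, and what
    the application must hand-roll when middleware can't (e.g. the
    multi-shard JOINs that MySQL Fabric doesn't support)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        parts = pool.map(lambda s: query_shard(s, sql), shards)
    merged = []
    for part in parts:
        merged.extend(part)
    return merged

shards = [{"name": "shard01", "rows": [1, 2]},
          {"name": "shard02", "rows": [3]}]
print(scatter_gather(shards, "SELECT id FROM t"))  # [1, 2, 3]
```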
13. Scaling Reads & Writes: Vertical Sharding
• Separating tables across separate database servers (used by Magento eCommerce 2, etc)
• Pros
– Adds both write and read scaling, depending on well-chosen table distribution
– Much less difficult than ‘regular’ sharding, and can have much of the gains
• Cons
– Loses the ability of an RDBMS to manage transactionality, referential integrity and ACID;
Application must ‘re-invent the wheel’
– Consistent backups across all the shards are very hard to manage
– Data management (skew/hotness) is ongoing significant maintenance
– AWS: Not available on RDS console; ‘roll-your-own’ with EC2
[Diagram: tables split across servers: SHARD01 (tables 1,2), SHARD02 (tables 3,4), SHARD03 (tables 5,6), SHARD04 (tables 7,8)]
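Vertical sharding as drawn above reduces to a static table-to-server map. Any query touching tables on two servers can no longer run as a single transaction, which is the ACID loss called out in the cons. A sketch with illustrative table and server names:

```python
# Static table placement, analogous to the diagram's four shards.
TABLE_MAP = {
    "customers": "shard01", "orders": "shard01",
    "products": "shard02", "inventory": "shard02",
    "payments": "shard03", "invoices": "shard03",
    "logs": "shard04", "sessions": "shard04",
}

def servers_for(tables):
    """Which servers does a query touch? More than one means the
    application must coordinate its own cross-server consistency."""
    return sorted({TABLE_MAP[t] for t in tables})

print(servers_for(["customers", "orders"]))  # ['shard01'] -- still one server
print(servers_for(["orders", "payments"]))   # ['shard01', 'shard03'] -- app's problem
```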
14. Application Workload Partitioning
• Partition entire application + RDBMS stack across several “pods”
• Pros
– Adds both Write and Read scaling
– Flexible: can keep scaling with addition of pods
• Cons
– No data consistency across pods (only suited for cases
where it is not needed)
– Queries / Reports across all pods can be very complex
– Complex environment to setup and support
[Diagram: multiple independent pods, each with its own full application + RDBMS stack]
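Routing in the pod architecture above is typically a deterministic hash of a tenant or user ID, so all of that user's data stays inside one application+database stack. A minimal sketch (pod names are hypothetical):

```python
import hashlib

PODS = ["pod-us-1", "pod-us-2", "pod-eu-1"]

def pod_for(user_id):
    """Stable hash -> pod assignment. Deterministic, so a user always
    lands on the same pod; reports spanning pods must aggregate each
    pod's results separately, per the cons above."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return PODS[int(digest, 16) % len(PODS)]

# The same user always maps to the same pod:
print(pod_for(12345) == pod_for(12345))  # True
```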
16. Elasticity – Flexing Up and Down

Scaling Option | Flex UP | Flex DOWN
Application (for reference) | Easy: add more web nodes | Easy: drop web nodes
Scale-up | RDS: easy. EC2: expensive and awkward | RDS: easy. EC2: difficult and awkward
Master – Slave | Easy: add read replicas or slave(s) | Easy: drop read replicas or slave(s)
Master – Master | Involved | Involved
Sharding | Expensive and complex | Infeasible and/or untenable
Application Partitioning | Expensive and complex | Expensive and complex
18. Resiliency – High-Availability and Fault Tolerance

Scaling Option | Resilience to failures
Application (for reference) | No single point of failure: a failed node is bypassed
Scale-up | RDS: easy with a standby instance. EC2: one large machine is a single point of failure
Master – Slave | RDS: easy. EC2: fail-over to the slave, with potential data consistency issue(s)
Master – Master | RDS: unavailable. EC2: resilient to one of the masters failing
Sharding | RDS: unavailable. EC2: multiple points of failure without redundant hardware
Application Partitioning | RDS: unavailable. EC2: multiple points of failure without redundant hardware
19. Summary: RDBMS Capacity, Elasticity and Resiliency

RDBMS Scaling | Capacity | Elasticity | Resiliency | Application Impact
Scale-up | Many cores: expensive if you exceed cloud instance sizes | RDS: yes. EC2: no | Single point of failure | None
Master – Slave | Reads only | RDS: yes. EC2: yes | Fail-over | Consistent reads require coding
Master – Master | Reads / some writes | RDS: no. EC2: yes | Yes | High: conflict resolution
Sharding | Unbalanced reads/writes | RDS: no. EC2: yes | Multiple points of failure | Very high
ClustrixDB | Scale-out reads + writes | Yes | Can lose node(s) without data loss or downtime | No application changes needed
20. ANOTHER APPROACH: CLUSTRIXDB
§ MYSQL-COMPATIBLE CLUSTERED DATABASE
§ LINEAR SCALE-OUT OF BOTH WRITES & READS
§ HIGH-TRANSACTION, LOW-LATENCY
§ ARCHITECTED FROM THE GROUND-UP TO ADDRESS: CAPACITY, ELASTICITY AND RESILIENCY
21. ClustrixDB: Scale-Out, Fault-Tolerant, MySQL-Compatible
• ACID compliant
• Transactions & joins
• Optimized for OLTP
• Built-in fault tolerance
• Flex-up and flex-down
• Minimal DB admin
• Built to run GREAT in the cloud; also runs GREAT in the data center
25. Example: Heavy Write Workload (AWS Deployment)
The Application
• Inserts: 254 million / day
• Updates: 1.35 million / day
• Reads: 252.3 million / day
• Deletes: 7,800 / day
The Database
• Queries: 5-9k per sec
• CPU load: 45-65%
• Nodes / cores: 10 nodes, 80 cores
Application sees a single RDBMS instance
26. Example: Very Heavy Update Workload (Bare-Metal)
The Application
• Inserts: 31.4 million / day
• Updates: 3.7 billion / day
• Reads: 1 billion / day
• Deletes: 4,300 / day
The Database
• Queries: 35-55k per sec
• CPU load: 25-35%
• Nodes / cores: 8 nodes, 160 cores
Application sees a single RDBMS instance
27. CLUSTRIX RDBMS: TECHNICAL OVERVIEW
§ MYSQL COMPATIBLE SHARED-NOTHING CLUSTERED RDBMS
§ FULL TRANSACTIONAL ACID COMPLIANCE ACROSS ALL NODES
§ ARCHITECTED FROM THE GROUND-UP TO ADDRESS: CAPACITY, ELASTICITY AND RESILIENCY
28. ClustrixDB Overview
Fully Distributed & Consistent Cluster
• Fully consistent and ACID-compliant database
– Cross-node Transactions & JOINs
– Optimized for OLTP
– But also supports reporting SQL
• All servers are read + write
• All servers accept client connections
• Tables & Indexes distributed across all nodes
– Fully automatic distribution, re-balancing
& re-protection
– All Primary and Secondary Keys
[Diagram: SQL-based applications (custom: PHP, Java, Ruby, etc.; packaged: Magento, etc.) at high concurrency connect through a HW or SW load balancer to ClustrixDB on commodity/cloud servers over a private network]
29. ClustrixDB – Shared Nothing Symmetric Architecture
• Database Engine:
– all nodes can perform all database operations (no
leader, aggregator, leaf, data-only, special nodes)
• Query Compiler:
– distribute compiled partial query fragments to the
node containing the ranking replica
• Data: Table Slices:
– All table slices auto-redistributed by the
Rebalancer (default: replicas=2)
• Data Map:
– all nodes know where all replicas are
[Diagram: each ClustrixDB node contains the same four components – Compiler, Data Map, Engine, Data]
31. Database Capacity and Elasticity
• Easy and simple flex-up (and flex-down)
– Flex multiple nodes at the same time
• Data is automatically rebalanced across the cluster
[Diagram: slices S1–S5 and their replicas redistributed across the cluster as nodes are added]
32. Built-in Fault Tolerance
• No single point of failure
– No data loss
– No downtime
• Server node goes down…
– Data is automatically rebalanced across the remaining nodes
[Diagram: slices S1–S5 and their replicas re-protected across the remaining nodes after a node failure]
33. Distributed Query Processing
• Queries are fielded by any peer node
– Routed to the node holding the data
• Complex queries are split into fragments processed in parallel
– Automatically distributed for optimized performance
[Diagram: transactions arrive through a load balancer and are fielded by any ClustrixDB node]
34. Automatic Cluster Data Rebalancing
The ClustrixDB Rebalancer:
• Initial Data: Distributes the data into even slices across nodes
• Data Growth: Splits large slices into smaller slices
• Failed Nodes: Re-protects slices to ensure proper replicas exist
• Flex-Up/Flex-Down: Moves slices to leverage new nodes and/or evacuate nodes
• Skewed Data: Re-distributes the data to even out across nodes
• Hotness Detection: Finds hot slices and balances them across nodes
Patent 8,543,538 - Systems and methods for redistributing data in a relational database
Patent 8,554,726 - Systems and methods for reslicing data in a relational database
35. Replication and Disaster Recovery
• Asynchronous multi-point MySQL 5.6 replication
• Parallel backup, up to 10x faster
• Replicate to any cloud, any datacenter, anywhere
Patent 9,348,883 - Systems and methods for replication replay in a relational database
37. ClustrixDB
• Capacity: massive read/write scalability; very high concurrency; linear throughput scale
• Elasticity: flex UP in minutes; flex DOWN easily; right-size resources on-demand
• Resiliency: automatic, 100% fault tolerance; no single point of failure; battle-tested performance
• Cloud: cloud, VM, or bare-metal; virtual images available; point/click scale-out
46. Yahoo! Cloud Serving Benchmark (YCSB) (AWS)
• 95% reads + 5% writes
– 1 transaction/sec = 1 SQL
• 100% reads
• Over 1 million TPS
– With 3 ms query response
– Using 50 ClustrixDB servers
> 1,000,000 TPS @ 3 ms
ClustrixDB scaled to 50 nodes (c3.2xl, 400 vCPU) in 1 day
47. CLUSTRIX RDBMS: UNDER THE HOOD
§ DISTRIBUTION STRATEGY
§ REBALANCER TASKS
§ QUERY OPTIMIZER
§ EVALUATION MODEL
§ CONCURRENCY CONTROL
48. ClustrixDB key components enabling Scale-Out
• Shared-nothing architecture
– Eliminates potential bottlenecks.
• Independent Index Distribution
– Hash each distribution key to a 64-bit number space divided into ranges with a specific
slice owning each range
• Rebalancer
– Ensures optimal data distribution across all nodes.
– Rebalancer assigns slices to available nodes for data capacity and access balance
• Query Optimizer
– Distributed query planner, compiler, and distributed shared-nothing execution engine
– Executes individual queries with maximum parallelism, and many queries concurrently
• Evaluation Model
– Parallelizes queries, which are distributed to the node(s) with the relevant data.
• Consistency and Concurrency Control
– Using Multi-Version Concurrency Control (MVCC), 2 Phase Locking (2PL) on writes,
and Paxos Consensus Protocol
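The independent index distribution described above can be sketched as follows. This is an illustrative model, not ClustrixDB's actual code: the hash function and the range math are assumptions.

```python
# Illustrative model of hash-range data distribution: each distribution key
# hashes into a 64-bit number space, the space is divided into equal ranges,
# and one slice owns each range. Hash choice is an assumption.
import hashlib

SPACE = 2 ** 64  # 64-bit number space

def key_hash(key: str) -> int:
    """Hash a distribution key into the 64-bit number space."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def slice_for(key: str, n_slices: int) -> int:
    """Map the hash to the slice owning its range of the space."""
    return key_hash(key) * n_slices // SPACE
```

Because every node holds the data map of slice locations, any node can compute this lookup and route directly to the owning slice.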
49. Rebalancer Process
• User tables are vertically partitioned into representations.
• Representations are horizontally partitioned into slices.
• Rebalancer ensures:
– The representation has an appropriate number of slices.
– Slices are well distributed around the cluster on storage devices
– Slices are not placed on server(s) that are being flexed-down.
– Reads from each representation are balanced across the nodes
50. ClustrixDB Rebalancer Tasks
• Flex-UP
– Re-distribute replicas to new nodes
• Flex-DOWN
– Move replicas from the flex-down nodes to other nodes in the cluster
• Under-Protection – when a slice has fewer replicas than desired
– Create a new copy of the slice on a different node.
• Slice Too Big
– Split the slice into several new slices and re-distribute them
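A toy model of two of these tasks, under-protection and slice-too-big (thresholds, data structures, and function names are invented for illustration, not ClustrixDB internals):

```python
# Toy model of the rebalancer's "under-protection" and "slice too big" tasks.
# Thresholds and cluster-state representation are illustrative only.

DESIRED_REPLICAS = 2
MAX_SLICE_ROWS = 1000

def replan(slices, nodes):
    """Return a list of rebalancer actions for the given cluster state.

    slices: {slice_id: {"replicas": [node, ...], "rows": int}}
    nodes:  list of live node names
    """
    actions = []
    for sid, s in slices.items():
        if len(s["replicas"]) < DESIRED_REPLICAS:
            # Under-protection: copy the slice to a node that lacks it
            target = next(n for n in nodes if n not in s["replicas"])
            actions.append(("reprotect", sid, target))
        if s["rows"] > MAX_SLICE_ROWS:
            # Slice too big: split it, then redistribute the new slices
            actions.append(("split", sid))
    return actions
```

The same planner shape extends naturally to flex-up and flex-down: they become "move replicas onto new nodes" and "move replicas off evacuating nodes" actions.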
51. ClustrixDB Query Optimizer
• The ClustrixDB Query Optimizer is modeled on the Cascades optimization framework.
– Other RDBMSs that leverage Cascades include Tandem's NonStop SQL and Microsoft's SQL Server.
– Cost-driven; extensible via a rule-based mechanism
– Top-down approach
• Query Optimizer must answer the following, per SQL query:
– In what order should the tables be joined?
– Which indexes should be used?
– Should the sort/aggregate be non-blocking?
52. ClustrixDB Evaluation Model
• Parallel query evaluation
• Massively Parallel Processing (MPP) for analytic queries
• The Fair Scheduler ensures OLTP is prioritized ahead of OLAP
• Queries are broken into fragments (functions).
• Joins require more data movement by their nature.
– ClustrixDB is able to achieve minimal data movement
– Each representation (table or index) has its own distribution map,
allowing direct look-ups for which node/slice to go to next, removing
broadcasts.
– There is no central node orchestrating data motion. Data moves directly to
the next node it needs to go to, reducing hops to the minimum possible
given the data distribution.
[Diagram: the query "SELECT id, amount FROM donation WHERE id = 15" is compiled into fragments. Fragment 1 looks up the node owning id = 15 and forwards the query to it; Fragment 2 runs "SELECT id, amount" on that node and returns the result.]
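The two-fragment flow shown on this slide can be modeled as follows. The distribution map, node layout, and row data are invented for illustration; ClustrixDB compiles and ships fragments rather than calling functions, so this only models the routing logic:

```python
# Toy model of two-fragment evaluation for:
#   SELECT id, amount FROM donation WHERE id = 15
# All names and data below are illustrative.

# Per-representation distribution map: key range -> owning node
donation_map = {range(0, 100): "node_a", range(100, 200): "node_b"}

# Each node's local slice of the donation table (id -> amount)
node_data = {"node_a": {15: 250}, "node_b": {150: 75}}

def fragment1(row_id):
    """Look up the owning node in the data map and forward (no broadcast)."""
    node = next(n for r, n in donation_map.items() if row_id in r)
    return fragment2(node, row_id)            # <forward to node>

def fragment2(node, row_id):
    """Run the point SELECT on the owning node and return the row."""
    return (row_id, node_data[node][row_id])  # <return>
```

The key point is that `fragment1` consults the representation's own distribution map, so the lookup goes straight to the owning node with no central coordinator and no broadcast.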
53. Concurrency Control
• Readers never interfere with writers (or vice-versa). Writers use explicit locking for updates
• MVCC maintains a version of each row as writers modify rows
• Readers have lock-free snapshot isolation while writers use 2PL to manage conflict
[Diagram: timeline of concurrent readers and writers; two writers hitting the same row conflict and one writer blocks, while readers proceed with no conflict and no blocking]
Lock Conflict Matrix:
• Reader vs. Reader: none
• Reader vs. Writer: none
• Writer vs. Reader: none
• Writer vs. Writer: row lock
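A minimal sketch of the reader/writer behavior described above. This models generic MVCC with per-row write locks, not ClustrixDB's internals; class and method names are invented:

```python
# Minimal MVCC sketch: readers see a snapshot version of each row,
# writers take a per-row lock; readers never block writers or vice versa.

class Row:
    def __init__(self, value):
        self.versions = [(0, value)]   # (commit_ts, value) history
        self.locked_by = None          # writer currently holding the row lock

    def read(self, snapshot_ts):
        """Lock-free snapshot read: latest version visible at snapshot_ts."""
        visible = [v for ts, v in self.versions if ts <= snapshot_ts]
        return visible[-1]

    def write(self, txn_id, value, commit_ts):
        """2PL on writes: only one writer may hold the row lock."""
        if self.locked_by not in (None, txn_id):
            raise RuntimeError("row conflict: writer blocked")
        self.locked_by = txn_id
        self.versions.append((commit_ts, value))

    def commit(self):
        self.locked_by = None
```

Note that `read` never touches `locked_by`: a reader at an older snapshot keeps seeing its version even while a writer holds the lock, which is exactly the "readers never interfere with writers" property on this slide.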
Before we begin-
1. Much of today’s presentation comes from the presentation I did at Percona Live earlier this year
2. In general I'd like to keep it generic, but will focus on AWS, b/c this is an AWS meetup :-D
3. But for reference- our database ClustrixDB does run on any cloud or datacenter
so if you'd like to discuss any other cloud, I'd be happy to answer your ?s
Let’s start by positioning ‘RDBMS’ in the current Database Landscape
There are lots of DB’s out there
Whole spectrum of DBs out there, & it can be confusing
We’re talking about OLTP, the stuff on the left
MySQL is a general-purpose RDBMS
It can be used for OLTP, & for OLAP… but like any general-purpose RDBMS it’s not ideal for either.
This has created an explosion of specific databases, and we can see how they fit across OLTP –to- OLAP, & how they scale (up or out)
Specifically- what we’re talking about today- is OLTP/transactional workloads
When we talk about Scaling a general-purpose RDBMS like MySQL, there can be a lot of trade-offs.
So let’s emphasize three dimensions which are critical to an Enterprise deployment…
And for reference, when I say “MySQL”, I’m going to start with a sweeping generalization and club all the MySQL variants together:
MySQL itself, Percona, MariaDB
Google Cloud SQL
Azure ClearDB
RDS MySQL, & RDS Aurora to some extents
In general, if your code-base leverages MySQL code, then we’re putting them in the same high-level ‘grouping’ for right now
And we’ll differentiate them further later
Now that we’ve Introduced 3 Dimensions for Enterprise Scaling- Capacity, Elasticity, & Resiliency
It’s also very good to keep in mind some of the core Features of an RDBMS
These are critical for the Application
But these are often what are ‘relaxed’ in search of Scale.
But for an application needing an RDBMS, especially OLTP workloads,
These are NOT an option, and need to be addressed in any scaling strategy.
CAP – Consistency ; Availability ; Partition Tolerance (CLX is CP)
BASE – Basically Available ; Soft State ; Eventual Consistency
Latency, Response time- eg Reports for Larry
Pinterest – does NOT WANT TO DEAL W/ READ LATENCY
Each pod is MASTER/MASTER
ACID properties still a challenge with cross-shard transactions, and additional complexity is now added with the management layer
Marketo, Salesforce, etc
Now that we’ve reviewed the main RDBMS scaling strategies, from the standpoint of ‘Capacity’- ie, how much more hardware can you add?
Let’s revisit each scaling strategy from the standpoint of how Elastic each are. How FAST can you scale each strategy?
Rather than going thru the deck again, let’s do it as an overview:
Now let’s review each scaling strategy from the standpoint of how Resilient each is. How fault-tolerant is each strategy?
Staples, Best Buy
Here’s a high-level overview…
But the PROOF is in the pudding- let’s see some examples of how ClustrixDB can scale.
Here’s a whole bunch of pretty lines- what’s important here, is how each line scales
For example, at 20ms CLX is 4X Aurora
Let’s say you have an application that needs 20ms
Simple queries
Fielded by any node
Routed to data node
Complex queries
Split into query fragments
Process fragments in parallel
Building a scalable distributed database requires two things
Distributing the data intelligently
Moving the queries to the data
We've automated away a lot of the complexity in a distributed DB, so users and applications just see a single DB that looks like MySQL
Clustrix supports MySQL replication as both master and slave, so you can replicate both ways.
Within a cluster we saw earlier that all data has multiple copies
For Disaster Recovery (when a whole region loses power) Clustrix has 2 options
Fast Parallel Backup – this is in addition to the slower mysqldump backup
Fast Parallel Replication – This is asynchronous across two Clustrix Clusters
"Imagine if you had to scale MySQL to 50 nodes - how many weeks it would take to get it all working? With Clustrix we did in one day."