Marco Tusa gave a presentation on using ProxySQL to improve performance with Amazon Aurora databases. He explained what ProxySQL and AWS Aurora are and how ProxySQL can be configured to work with Aurora to distribute read queries across replicas for load balancing. Metrics showed that read latency is lower with ProxySQL than when connecting directly to Aurora. The native Aurora connection endpoints are limited, while ProxySQL allows more granular control over query routing and additional features such as caching.
PyData London 2015 - How We Turned EverythingMe Into a Data Driven Company (Arik Fraimovich)
Over the past two years we have been on a mission to make our company data available to everyone at the company to help with better decision making. We first loaded all of our data into AWS's Redshift, but then we needed a tool to make this data accessible to everyone at the company. We first tried "traditional" BI tools, like Tableau and YellowFin, but it didn't feel right and it wasn't giving us the full power of Redshift.
This was the point when we decided to make our own tool - re:dash - that better aligns with our "hacker"/engineering culture at the company. We made a simple tool that allowed you to write a query, utilizing the full power of Redshift, and share its result (along with the query) with your peers.
This high level of transparency in the work with data helped us bring data into everyone's daily work. No longer was it the realm of a few; it was accessible to everyone, with varying levels of knowledge of SQL. Some wrote queries, others just changed existing queries, but everyone had the same level of access.
In this talk I would like to review what we learned from this experience, what new challenges we discovered beyond giving everyone access to the data, and how we plan to tackle them.
In this webinar, we will be covering general best practices for running MongoDB on AWS.
Topics will range from instance selection to storage selection and service distribution to ensure service availability. We will also look at any specific best practices related to using WiredTiger. We will then shift gears and explore recommended strategies for managing your MongoDB instance on AWS.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Configuring MongoDB HA Replica Set on AWS EC2 (ShepHertz)
It has always been a tedious task to choose the right configuration for MongoDB on AWS EC2.
It is always challenging and takes a lot of time to make your system production ready.
Here is a quick guide on how to set up MongoDB on AWS EC2.
Tempto is a product test framework that allows developers to write and execute tests for SQL databases running on Hadoop. Individual test requirements such as data generation, HDFS file copy/storage of generated data and schema creation are expressed declaratively and are automatically fulfilled by the framework. Developers can write tests using Java (using a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results. We will show how we use it for Presto product tests.
Benchto is a benchmark framework that provides an easy and manageable way to define, run and analyze macro benchmarks in a clustered environment. Understanding the behavior of distributed systems is hard and requires good visibility into the state of the cluster and the internals of the tested system. This project was developed for repeatable benchmarking of Hadoop SQL engines, most importantly Presto.
In the big data world, our data stores communicate over an asynchronous, unreliable network to provide a facade of consistency. However, to really understand the guarantees of these systems, we must understand the realities of networks and test our data stores against them.
Jepsen is a tool which simulates network partitions in data stores and helps us understand the guarantees of our systems and their failure modes. In this talk, I will help you understand why you should care about network partitions and how we can test data stores against partitions using Jepsen. I will explain what Jepsen is, how it works, and the kinds of tests it lets you create. We will try to understand the subtleties of distributed consensus and the CAP theorem, and demonstrate how different data stores such as MongoDB, Cassandra, Elasticsearch and Solr behave under network partitions. Finally, I will describe the results of the tests I wrote using Jepsen for Apache Solr and discuss the kinds of rare failures which were found by this excellent tool.
Azure Service Fabric Mesh is a fully managed cluster in the Microsoft cloud for running containerized applications. Any application that runs in a container can be run in a Service Fabric Mesh cluster.
Using AWS S3, CloudFront, Route53 and front-end JavaScript, web applications can be deployed without any server in between. AWS Route53 combined with S3 provides a decent set of name resolution functionalities out of the box.
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa... (DataStax Academy)
Astyanax is the Thrift-protocol-based C* driver widely used and open sourced by Netflix. It was recently integrated with the Java Driver released by DataStax. This talk focuses on the different options available with Astyanax and how it complements the Java Driver.
About Puneet Oberai, Senior Software Engineer at Netflix
Senior Software Engineer at Netflix and proud team member of Netflix CDE (Cloud Data Engineering).
MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments (MongoDB)
When using MongoDB and AWS, you want to design your infrastructure to avoid storage bottlenecks and make the best use of your available storage resources. AWS offers a myriad of storage options, including ephemeral disks, EBS, Provisioned IOPS, and ephemeral SSDs, each offering different performance and persistence characteristics. In this session, we’ll evaluate each of these options in the context of your MongoDB deployment, assessing the benefits and drawbacks of each.
You Know, for Search: Querying 24 Billion Documents in 900ms (Jodok Batlogg)
Who doesn't love building highly available, scalable systems holding multiple terabytes of data? Recently we had the pleasure of cracking some tough nuts to solve these problems, and we'd love to share our findings from designing, building up and operating a 120-node, 6TB Elasticsearch (and Hadoop) cluster with the community.
Anyone who has tried integrating search in their application knows how good and powerful Solr is but always wished it was simpler to get started and simpler to take it to production.
I will talk about the recent features added to Solr making it easier for users and some of the changes we plan on adding soon to make the experience even better.
Building a Near Real Time Search Engine & Analytics for Logs Using Solr (Lucene Revolution)
Presented by Rahul Jain, System Analyst (Software Engineer), IVY Comptech Pvt Ltd
Consolidating and indexing logs to search them in real time poses an array of challenges when you have hundreds of servers producing terabytes of logs every day. Since log events are mostly small, around 200 bytes to a few KB, they are harder to handle: the smaller the log event, the greater the number of documents to index. In this session, we will discuss the challenges we faced and the solutions developed to overcome them. The items that will be covered in the talk are as follows.
Methods to collect logs in real time.
How Lucene was tuned to achieve an indexing rate of 1 GB in 46 seconds
Tips and techniques incorporated/used to manage distributed index generation and search on multiple shards
How choosing a layer based partition strategy helped us to bring down the search response times.
Log analysis and generation of analytics using Solr.
Design and architecture used to build the search platform.
Building a highly scalable website requires understanding the core building blocks of your application environment. In this talk we dive into Jahia core components to understand how they interact and how, by (1) respecting a few architectural practices and (2) fine-tuning Jahia components and the JVM, you will be able to build a highly scalable service.
Securing our data is a complex topic. We can build very strong protection around our data, but nothing will prevent someone who can access it from compromising its integrity or exposing it.
This is because we either underestimate the control we can or should impose, or because we think we do not have the tools to exercise such control.
Nowadays, being able to control and manage what can access our data is a must, yet doing so with standard tools is a nightmare.
This presentation will guide you on a journey in which you will discover how to implement quite robust protection, more than you thought was possible.
Even better, it is possible and your performance will even improve. Cool, right?
We will discuss (a rough sketch of these ideas follows the list):
- Access using a non-standard port
- Implement selective query access
- Define accessibility by location/IP/id
- Reduce the cost of filtering to a minimum
- Automate query discovery
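The abstract does not show its actual implementation; purely as an illustration, here is one way some of these items could be expressed as ProxySQL query rules (a technique used elsewhere in this deck). The user name, client address, hostgroup and rule ids below are hypothetical, and the real talk may implement the same ideas with different tools.
-- Allow report_user only from one known host and only for SELECTs (served from hostgroup 71),
-- and reject anything else that account sends.
INSERT INTO mysql_query_rules (rule_id, active, username, client_addr, match_digest, destination_hostgroup, apply)
VALUES (1, 1, 'report_user', '10.0.5.21', '^SELECT', 71, 1);
INSERT INTO mysql_query_rules (rule_id, active, username, error_msg, apply)
VALUES (2, 1, 'report_user', 'Query not allowed for this account', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;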
Deep Dive into MySQL InnoDB Cluster Read Scale-out Capabilities.pdf (Miguel Araújo)
MySQL's first Innovation Release is out, 8.1.0, and with it, we're introducing MySQL InnoDB Cluster Read Replicas.
The main purpose of secondaries in a MySQL InnoDB Cluster is to be ready to take over when the primary member has failed (High Availability). This is done using MySQL Group Replication. Another common use for the secondaries is to offload read workloads away from the primary. With MySQL InnoDB Cluster Read Replicas, it's now possible to add asynchronous replicas to the database topology, to be used to offload read traffic away from the primary or secondaries, to have dedicated read replicas, special-purpose read replicas (e.g. for reporting), or to scale beyond what the secondaries can handle by adding multiple read replicas.
This talk covers the read replica functionality, showcases its usage in different database architectures, and includes a demonstration of its setup and management.
Amazon Aurora is a MySQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. This session introduces you to Amazon Aurora, explains common use cases for the service, and helps you get started with building your first Amazon Aurora–powered application.
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You... (Amazon Web Services)
Amazon Aurora is a fully managed relational database engine that provides higher performance, availability and durability than previously possible using conventional monolithic database architectures. After launching a year ago, we continued adding many new features and capabilities to Aurora. In this session AWS Aurora experts will discuss the best practices that will help you put these capabilities to the best use. You will also hear from Amazon Aurora customer Intercom on the best practices they adopted for moving live databases with over two billion rows to a new datastore in Amazon Aurora with almost no downtime or lost records.
Intercom was founded to provide a fundamentally new way for Internet businesses to communicate with customers at scale. For growing startups like Intercom, it's natural for the load on datastores to grow on a weekly basis. The usual solution to this problem is to get a bigger box from AWS, but very soon you reach a point where a bigger boat is not an option anymore. You will learn about the benefits of moving to such a datastore, the problems it introduced, and all about the new ability for scaling that was not there before.
On-demand replay: https://www.youtube.com/watch?v=dBLv4V3hRRQ
Large databases such as Oracle RAC are run in many customer environments for enterprise mission-critical systems. Many customers adopting the cloud are skeptical about whether cloud-native services can replace such large-scale database environments. Looking at AWS's cloud-native database services from a technical perspective, this session examines the characteristics of operating and managing large volumes of data and shows that these services are fully capable of serving as a complete replacement for Oracle RAC.
AWS Webcast - AWS Webinar Series for Education #3 - Discover the Ease of AWS ... (Amazon Web Services)
This webinar will emphasize how easy it is to deploy AWS resources with access to various publicly available AMIs, SaaS solutions, and CloudFormation templates to get started quickly with AWS. This session will dig deeper into how to launch critical business applications on AWS, such as deploying an emergency website, launching a SharePoint server and more. The gist of the webinar will be ease of use and the ability to clone environments that the largest customers are running, while taking care of the undifferentiated heavy lifting, to emphasize how easy AWS is to deploy in enterprise settings.
This is the presentation delivered by Karthik.P.R at MySQL User Camp Bangalore on 09th June 2017. ProxySQL is a high-performance MySQL load balancer designed to scale database servers.
Amazon Web Services (AWS) can make hosting scalable, highly available websites and web applications easier and less expensive for Enterprise and Education customers. Join us for an informative webinar on the tools AWS provides to elastically scale your architecture, avoiding underutilized resources while reducing complexity with templates, partners, and tools that do much of the heavy lifting of creating and running a website for you.
Amazon Aurora Getting Started Guide - Level 0 (kartraj)
Introduction to Amazon Aurora
Amazon Aurora: applying a service-oriented architecture to the database
- Aurora makes it easy to run your databases
- Aurora simplifies storage management
- Aurora simplifies data security
- Aurora is highly available
Spark is fast becoming a critical part of Customer Solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early adopter customers helping them build their solutions on Azure Databricks.
In this session, we begin by reviewing typical workload patterns and integration with other Azure services like Azure Storage, Azure Data Lake, IoT / Event Hubs, SQL DW, Power BI, etc. Most importantly, we will share real-world tips and learnings that you can take and apply in your Data Engineering / Data Science workloads.
Amazon Aurora is a cloud-optimized relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The recently announced PostgreSQL-compatibility, together with the original MySQL compatibility, are perfect for new application development and for migrations from overpriced, restrictive commercial databases. In this session, we’ll do a deep dive into the new architectural model and distributed systems techniques behind Amazon Aurora, discuss best practices and configurations, look at migration options and share customer experience from the field.
Amazon Aurora is a MySQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The service is now in preview. Come to our session for an overview of the service and learn how Aurora delivers up to five times the performance of MySQL yet is priced at a fraction of what you'd pay for a commercial database with similar performance and availability.
Speakers:
Ronan Guilfoyle, AWS Solutions Architect
Brian Scanlan, Engineer, Intercom.io
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL (Continuent)
MS Azure Database for MySQL vs. Continuent Tungsten Clusters
Building a Geo-Scale, Multi-Region and Highly Available MySQL Cloud Back-End
This is the third of our High Noon series covering MySQL clustering solutions for high availability (HA), disaster recovery (DR), and geographic distribution.
Azure Database for MySQL is a managed database cluster within Microsoft Azure Cloud that runs MySQL community edition. There are really two deployment options: “Single Server” and “Flexible Server (Preview).” We will look at the Flexible Server version, even though it is still preview, because most enterprise applications require failover, so this is the relevant comparison for Tungsten Clustering.
You may use Tungsten Clustering with native MySQL, MariaDB or Percona Server for MySQL in GCP, AWS, Azure, and/or on-premises data centers for better technological capabilities, control, and flexibility. But learn about the pros and cons!
Enjoy the webinar!
AGENDA
- Goals for the High Noon Webinar Series
- High Noon Series: Tungsten Clustering vs Others
- Microsoft Azure Database for MySQL
- Key Characteristics
- Certification-based Replication
- Azure MySQL Multi-Site Requirements
- Limitations Using Azure MySQL
- How to do better MySQL HA / DR / Geo-Scale?
- Azure MySQL vs Tungsten Clustering
- About Continuent & Its Solutions
PRESENTER
Matthew Lang - Customer Success Director – Americas, Continuent - has over 25 years of experience in database administration, database programming, and system architecture, including the creation of a database replication product that is still in use today. He has designed highly available, scalable systems that have allowed startups to quickly become enterprise organizations, utilizing a variety of technologies including open source projects, virtualization and cloud.
AWS January 2016 Webinar Series - Amazon Aurora for Enterprise Database Appli... (Amazon Web Services)
Relational databases are a cornerstone of the enterprise IT landscape, powering business-critical applications of many kinds. Though they have been around for a while, current commercial relational databases have lagged behind in innovation. Amazon Aurora, a managed database service built for the cloud, is intended to change that. It targets the high-performance needs of business-critical applications with an emphasis on cost-effectiveness.
In this session, we will look into how Aurora fits the needs of applications built and bought by enterprises to power their business.
Learning Objectives:
Learn about the overall architecture, capabilities, and cost-effectiveness of Aurora, comparing it to current commercial database offerings
Explore best practices for enterprises adopting Aurora for existing and new applications, as well as strategies, tools, and techniques for migrating existing databases to Aurora
Who Should Attend:
IT Managers, DBAs, Enterprise and Solution Architects , DevOps Engineers and Developers
AWS Webcast - Webinar Series for State and Local Government #3: Discover the ... (Amazon Web Services)
This webinar will provide an overview of tools that help you deploy AWS resources easily and quickly using publicly available Amazon Machine Images (AMIs), SaaS solutions, and CloudFormation templates. This session will dig deeper into how to launch critical business applications on AWS, such as deploying an emergency website, launching a SharePoint server and more. The focus of the webinar will be on demonstrating the vast AWS ecosystem that helps customers deploy business-critical applications quickly without a steep learning curve.
Amazon Aurora New Features - September 2016 Webinar Series (Amazon Web Services)
Amazon Aurora is a fully managed MySQL-compatible database with high-end commercial database features and performance at one-tenth the cost. Since launching Aurora a year ago we have added many new capabilities and features. Some of these features include encryption, database snapshot sharing, enhanced monitoring, cross-region replication, S3 binary snapshot ingestion and customized failover priority. In this session we'll demonstrate how these features work and discuss how you can make the best use of them.
Learning Objectives:
• Learn about the newly added features of Aurora
• Learn how to use those features
• Learn when and why to use those features
Who Should Attend:
• IT Managers, DBAs, Enterprise and Solution Architects, Devops Engineers and Developers
Percona XtraDB Cluster (PXC) Non-Blocking Operations, What You Need to Know t... (Marco Tusa)
Performing simple DDL operations such as ADD/DROP INDEX in a tightly connected cluster such as PXC can become a nightmare. The metadata lock will prevent data modifications for long periods of time, and to bypass this we need to get creative, for example using a rolling schema upgrade or Percona online-schema-change. With NBO, we can avoid such craziness, at least for a simple operation like adding an index. In this brief talk I will illustrate what you should do to see the negative effect of NOT using NBO, as well as what you should do to use it correctly and what to expect from it.
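For orientation, this is roughly what using NBO looks like in practice; a minimal sketch assuming PXC 8.0.25 or later, with a made-up table and index name rather than anything from the talk:
-- Switch this session's schema-upgrade method to Non-Blocking Operations,
-- run the DDL, then switch back to the default Total Order Isolation.
SET SESSION wsrep_OSU_method = 'NBO';
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);
SET SESSION wsrep_OSU_method = 'TOI';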
The constant pressure to move data into containers and Kubernetes is creating a lot of confusion and misunderstanding.
This is particularly dangerous when talking about relational database management systems.
MySQL, like Oracle, Postgres or SQL Server, is an RDBMS, and as such subject to the erroneous interpretation caused by these new, crazy, shiny things that promise to solve everything. In this short talk we will clarify that, first of all, we are not looking at something new, and second, why we need to be very careful when talking about using Kubernetes and containers for RDBMSs.
Comparing High Availability Solutions with Percona XtraDB Cluster and Percona... (Marco Tusa)
Percona XtraDB Cluster (PXC) is currently the most popular solution for HA in the MySQL ecosystem, and Galera-based solutions such as PXC have been the only viable option when looking for a high grade of HA using synchronous replication.
But Oracle has worked intensively on making Group Replication more solid and easy to use.
It is time to identify if Group Replication and attached solutions, like InnoDB cluster, can compete or even replace solutions based on Galera.
This presentation will focus on comparing the two solutions and how they behave when serving basic HA problems.
Attendees will be able to get a clearer understanding of which solutions will serve them better, and in which cases.
Accessing Data Through Hibernate: What DBAs Should Tell Developers and Vic... (Marco Tusa)
Accessing Data Through Hibernate: What DBAs Should Tell Developers, by Marco Tusa & Francisco Bordenave
This presentation goes through the simple process of accessing data from a Java application: what actually happens when we use a simple direct connection, what happens instead when using an ORM/persistence layer like Hibernate, and how this apparently makes programmers' lives easier and DBAs' days more difficult.
Best practice-high availability-solution-geo-distributed-final (Marco Tusa)
Nowadays, implementing different grades of business continuity for the data layer is a common requirement. When designing architectures that include MySQL as a data layer, we have different options to cover the required target. Nevertheless, we still see a lot of confusion when needing to properly cover concepts such as High Availability and Disaster Recovery, confusion that often leads to improper architecture design and wrong solution implementation. This presentation aims to remove that confusion and provide clear guidelines for designing a robust, flexible and resilient architecture for your data layer.
In this presentation I illustrate how and why InnoDB performs page merges and splits. I will also show what can be done to reduce the impact.
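As a concrete illustration of the kind of knobs involved (not taken from the presentation itself, and the table name is made up), MySQL exposes a fill factor for rebuilt indexes and a per-table merge threshold:
-- Leave 10% free space in index pages when an index is (re)built, reducing future page splits.
SET GLOBAL innodb_fill_factor = 90;
-- Only merge an index page into its neighbor when it falls below 40% full, reducing merge/split churn.
ALTER TABLE orders COMMENT = 'MERGE_THRESHOLD=40';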
Robust HA Solutions - Native Support for PXC and InnoDB cluster in ProxySQL
This talk illustrates and discusses several MySQL reference architectures that implement different grades of tightly coupled database clusters.
We will show how ProxySQL is a natural fit in all of them, and how easily it provides additional stability and functionality improvements.
Are we there Yet?? (The long journey of Migrating from close source to opens... (Marco Tusa)
Migrating from Oracle to MySQL or another open source RDBMS like Postgres is not as straightforward as many think if not well guided. See what it means to do it with someone who has done it already.
MySQL 8 Advanced Tuning with Resource Groups (Marco Tusa)
I have a very noisy secondary application, written by a very, very bad developer, that accesses my servers, mostly with read queries and occasionally with write updates. The reads and writes are obsessive and impact the MAIN application. My task is to limit the impact of this secondary application without affecting the main one. To do that, I will create two resource groups, one for WRITE and another for READ. The first group, Write_app2, will have no CPU affinity but will have the lowest priority (a sketch of what such definitions look like follows).
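A minimal sketch of the resource group commands implied above; Write_app2 comes from the abstract, while Read_app2, the VCPU range, the priorities and the table name are assumptions for illustration:
-- Lowest user-level priority, no CPU affinity, as described for the write group.
CREATE RESOURCE GROUP Write_app2 TYPE = USER THREAD_PRIORITY = 19;
-- A hypothetical read group pinned to two cores with a slightly higher priority.
CREATE RESOURCE GROUP Read_app2 TYPE = USER VCPU = 2-3 THREAD_PRIORITY = 15;
-- Bind a session, or a single statement via a hint, to a group.
SET RESOURCE GROUP Write_app2;
SELECT /*+ RESOURCE_GROUP(Read_app2) */ COUNT(*) FROM orders;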
Advance Sharding Solution with ProxySQL
ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way.
Historically, MySQL lacks sharding capability. This significant missing piece has often caused developers to implement sharding at the application level, or DBAs/SAs to move to another solution.
ProxySQL comes with an elegant and simple solution that allows us to implement sharding with MySQL without the need for significant changes in the code, or any at all.
This brief presentation will illustrate how to successfully configure and use ProxySQL to perform sharding, from a very simple approach based on connection user/IP/port to complicated ones that need to read values inside the queries (a minimal sketch follows).
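A minimal sketch of the simple end of that spectrum, using ProxySQL query rules; the hostgroup numbers, user names and shard-key pattern are hypothetical, not taken from the presentation:
-- Route everything from one application user to shard hostgroup 10, another user to 20.
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (10, 1, 'app_europe', 10, 1),
       (20, 1, 'app_americas', 20, 1);
-- A rule that instead reads a value inside the query text (e.g. a shard key in a comment).
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (30, 1, 'shard_id=2', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;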
Geographically Dispersed Percona XtraDB Cluster Deployment (Marco Tusa)
Geographically Dispersed Percona XtraDB Cluster Deployment
Percona XtraDB Cluster is a very robust, high-performing and widely used solution to answer High Availability needs. But it can be very challenging when we need to deploy the cluster over a geographically dispersed area.
This presentation will briefly discuss the right approach to successfully deploy PXC when it needs to cover multiple geographical sites, close and far.
- What is PXC and what happens in a set of nodes on commit
- Let us clarify what "geo-dispersed" means
- What to keep in mind
- How to measure it correctly
- Use the right way (sync/async)
- Use helpers like replication_manager
After some years, MySQL with Galera became the most common solution for synchronous replication. The cloud (and EC2 in particular) was one of the platforms that most successfully employed MySQL/Galera installations.
This year, with Aurora, Amazon introduced an alternative solution that uses all the flexibility of AWS and the simplicity of RDS.
This presentation describes the behavior of both MySQL/Galera and Aurora, showing the details of how the two different solutions behave when dealing with the same load. We will highlight the strong points of each, and which represents the best tool depending on the needs of the situation.
Attendees will be able to make an informed decision on what kind of solution will be the most efficient with respect to their actual requirements.
This presentation shows how ProxySQL can improve HA in solutions like MySQL async and sync replication without the need to increase platform complexity.
Scaling with sync_replication using Galera and EC2 (Marco Tusa)
Challenging architecture design and proof of concept on a real case study using a synchronous solution.
A customer asked me to investigate and design a MySQL architecture to support their application serving shops around the globe.
It must scale out and scale in according to sales seasons.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
4. What is ProxySQL (in 1 slide)
• ProxySQL has an advanced multi-core architecture.
• It's built from the ground up to support hundreds of thousands of concurrent connections, multiplexed to potentially hundreds of backend servers.
• Query filtering by design
• Query caching
• Embedded configuration distribution (cluster)
• Designed to scale (the largest ProxySQL deployment spans several hundred proxies).
• … and more
5. What is AWS Aurora (in 1 slide)
• Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud
• Features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance
• Delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones
• Fully managed by Amazon Relational Database Service (RDS)
6. The problem
ProxySQL deals with backend servers using:
• Replication Hostgroup
• Async replication
• Scheduler
• PXC, NDB, etc.
AWS Aurora does not use READ_ONLY but INNODB_READ_ONLY:
https://dev.mysql.com/doc/refman/5.7/en/innodb-read-only-instance.html
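In practice this means the monitor has to look at a different variable to tell the writer from the readers; a minimal sketch of that check, with the expected values stated as assumptions based on this slide rather than measured output:
-- On an Aurora replica this is ON, on the writer it is OFF, so it can drive hostgroup placement.
SHOW GLOBAL VARIABLES LIKE 'innodb_read_only';
-- The classic flag ProxySQL checked before this change; per the slide, not the one Aurora uses to mark replicas.
SHOW GLOBAL VARIABLES LIKE 'read_only';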
7. Solution
October 2017, this issue was opened (https://github.com/sysown/proxysql/issues/1195):

MYHGM_MYSQL_REPLICATION_HOSTGROUPS "CREATE TABLE mysql_replication_hostgroups
  (writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY ,
  reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>=0) ,
  check_type VARCHAR CHECK (LOWER(check_type) IN ('read_only','innodb_read_only','super_read_only')) NOT NULL DEFAULT 'read_only' ,
  comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"

mysql> select * from mysql_replication_hostgroups;
+------------------+------------------+------------------+------------+
| writer_hostgroup | reader_hostgroup | check_type       | comment    |
+------------------+------------------+------------------+------------+
|               70 |               71 | innodb_read_only | aws-aurora |
+------------------+------------------+------------------+------------+
1 row in set (0.00 sec)
8. How to implement
First, roll out your Aurora setup:
• Identify the endpoint for EACH instance
• aws rds describe-db-instances
• Web interface

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES
  ('proxysqltestdb.eu-central-1',70,3306,1000,2000),
  ('proxysqltestdb.eu-central-1',71,3306,1000,2000),
  ('proxysqltestdb2.eu-central-1',71,3306,1000,2000),
  ('proxysqltestdb-eu-central-1b.eu-central.1',71,3306,1,2000);

INSERT INTO mysql_replication_hostgroups (writer_hostgroup,reader_hostgroup,comment,check_type)
VALUES (70,71,'aws-aurora','innodb_read_only');

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
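To actually split traffic between hostgroup 70 (writer) and 71 (readers), the usual companion step is a user definition plus a pair of query rules; this is a hedged sketch following common ProxySQL conventions, with the user name and rule ids made up rather than taken from the slides:
-- Sessions for this user land on the writer hostgroup by default.
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('app_user', 'app_pass', 70);
-- Keep locking reads on the writer, send plain SELECTs to the readers.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT.*FOR UPDATE', 70, 1),
       (101, 1, '^SELECT', 71, 1);
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;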
13. Why it happens
The cluster endpoint is an endpoint for an Aurora DB cluster that connects to the current primary instance for that DB cluster. Each Aurora DB cluster has a cluster endpoint and one primary instance. That endpoint receives the read and write requests and sends them to the same instance. Its main use is to perform failover if needed.
Each Aurora DB cluster also has a reader endpoint. If there is more than one Aurora Replica, the reader endpoint directs each connection request to one of the Aurora Replicas. The reader endpoint only load balances connections to available Aurora Replicas in an Aurora DB cluster. It does not load balance specific queries.
If you want to load balance queries to distribute the read workload for a DB cluster, you need to manage that in your application and use instance endpoints to connect directly to Aurora Replicas to balance the load.
14. Why it happens
ProxySQL can redirect the queries as you like and to the instance you want.
How do we read this graph? From left to right:
• read_only test with an Aurora cluster endpoint
• read_only test with ProxySQL
• write_only with an Aurora cluster endpoint
• write_only with ProxySQL
• read and write with an Aurora cluster endpoint
• read and write with ProxySQL
15. Conclusions
• Native AWS cluster endpoints and reader endpoints are limited in what they offer
• With ProxySQL you can very granularly choose how to use each instance, without the need to have the application modify how it works
• Using ProxySQL will allow the use of additional elements like:
  • Query Cache
  • Query rewrite
  • Blocking/firewalling
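To make those last three bullets concrete, here is an illustrative sketch of one query rule for each; the rule ids, table names, patterns and TTL are hypothetical and not taken from the presentation:
-- Query Cache: serve repeated results of a hot read from ProxySQL's cache for 5 seconds (cache_ttl is in milliseconds).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (200, 1, '^SELECT .* FROM product_catalog', 5000, 1);
-- Query rewrite: transparently point a legacy table name at its replacement.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (210, 1, 'FROM old_orders', 'FROM orders', 1);
-- Blocking/firewalling: refuse any DELETE that arrives without a WHERE clause.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (220, 1, '^DELETE FROM [a-zA-Z_]+ *$', 'Blocked by ProxySQL firewall rule', 1);
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;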