The document discusses planning and executing a major version upgrade of PostgreSQL in a production environment. It evaluates upgrade strategies including pg_dump/restore, pg_upgrade, and logical replication. For the author's environment, a two-step upgrade from 9.3 to the latest version is proposed: first pg_upgrade, then logical replication. Significant considerations include minimizing downtime, handling the workload during downtime, evaluating performance on the new version, and creating a rollback plan.
How to Upgrade Major Version of Your Production PostgreSQL
1. How to Upgrade Major Version of Your Production PostgreSQL
Keisuke Suzuki, Software Engineer
2018/12/12 PGConf.ASIA
2. Who am I?
Keisuke Suzuki
• Backend Engineer @ Treasure Data K.K.
– PlazmaDB: distributed storage
– Datatank: data mart
Both use PostgreSQL internally
• Interests: DB / distributed systems / performance optimization
• Twitter: @yajilobee
3. PostgreSQL Versions
4. PostgreSQL Versions
• Major version: released yearly, brings new features
– On-disk DB data are not compatible between different major versions
• Minor version: released at least every 3 months, bug & security fixes only
– DB data are compatible as long as the major version is the same
e.g. 9.6.11 (major version 9.6, minor version 11), 10.6 (major version 10, minor version 6)
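As a first step in planning, the running version can be read from the server itself; a minimal sketch (the host name is hypothetical):

    # prints e.g. "9.3.25" (major 9.3, minor 25)
    $ psql -h plazma-meta-db.example.com -U postgres -c "SHOW server_version;"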
5. EOL of Major Versions
The PostgreSQL Global Development Group supports a major version for 5 years after its initial release.
https://www.postgresql.org/support/versioning/
6. Q: Should we follow the latest Major Version?
A: It depends on the case
-> Upgrading requires downtime and may introduce incompatibilities
• Extended support is provided by vendors
– For mission-critical systems
• SaaS vendors may stop providing old major versions
– e.g. 9.3 retirement on Amazon RDS
✓ Deployment stopped: Aug. 2018
✓ Forced major version upgrade: Nov. 2018
7. Our Version
We were on PostgreSQL 9.3 until May 2018...
8. PostgreSQL Usage in Treasure Data
9. Arm Treasure Data eCDP
10. PlazmaDB: Core Storage Layer of TD
[Diagram: streaming data and bulk load flow into PlazmaDB, which keeps its metadata in PostgreSQL and stores the data itself on AWS S3 / Riak CS]
11. Daily Workload & Storage Size of PlazmaDB
• Import: 500 billion records / day (~5.8 million records / sec)
• Analytical query: 600,000 queries / day, 15 trillion records / day
• Storage size: 5 PB (+5~10 TB / day), 55 trillion records
12. Server Structure of Meta DB (AWS RDS)
• Master: r3.8xlarge (32 vcores, 244GB RAM), Multi-AZ; receives connections from applications
• Read replica: m3.large (2 vcores, 7.5GB RAM), asynchronous streaming replication; no production workload
13. Data Volume
[Diagram: the Meta DB (PostgreSQL, 1 TB) holds partition metadata with GiST indexes for both Realtime Storage and Archive Storage; the data itself (5 PB) lives on AWS S3 / Riak CS]
14. CPU & IO Utilization of PostgreSQL (Meta DB)
• CPU — AVG: 13%, MAX: 25%
• IO — AVG: 700 read IOPS / 1100 write IOPS, MAX: 2500 read IOPS / 3000 write IOPS
15. TPS & WAL Throughput
• TPS — AVG: 1.2k, MAX: 1.5k
• WAL throughput — AVG: 4MB/sec, MAX: 20MB/sec
16. If PlazmaDB Was Down..
[Diagram: the PlazmaDB architecture from slide 10 — streaming data and bulk load, metadata (PostgreSQL), AWS S3 / Riak CS]
17. If PlazmaDB Was Down.. (cont.)
18. Planning Major Version Upgrade
19. Long Way to Complete the Upgrade
• Choose an upgrade strategy
• Choose the next major version
• Plan the operation and its evaluation (including rollback ops)
• Run regression tests on the new major version
20. Major Version Upgrade Strategies
• pg_dump/pg_dumpall: dump the old version to SQL, restore it into the new version
• pg_upgrade: convert the old version's data in place for the new version
• Logical replication: ship logical changes from the old version to the new version
21. Upgrade with pg_dump/pg_dumpall
1. Stop the DB (old version)
2. pg_dump / pg_dumpall to SQL
3. psql / pg_restore into the new version
4. Start the DB (new version)
Downtime: steps 1 to 4 — includes a full data copy
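A minimal shell sketch of this flow, assuming both clusters run on one host on different ports (ports are illustrative; on a managed service the stop/start steps become cutting application traffic over):

    # 2. dump everything from the old cluster (port 5432)...
    # 3. ...and restore it into the freshly initialized new cluster (port 5433)
    $ pg_dumpall -p 5432 | psql -p 5433 -d postgres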
22. Upgrade with pg_upgrade Link Mode
1. Stop the DB
2. pg_upgrade
2.1. Rebuild the system tables for the new version
2.2. Link the old user data files into the new cluster
3. Start the DB
Downtime: steps 1 to 3 — no user data copy
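On a self-managed host this is a single command; a sketch with illustrative paths and versions (RDS runs the equivalent internally). Note that --link uses hard links, so the old cluster must not be started again afterwards:

    $ pg_upgrade \
        --old-bindir  /usr/pgsql-9.3/bin \
        --new-bindir  /usr/pgsql-9.4/bin \
        --old-datadir /var/lib/pgsql/9.3/data \
        --new-datadir /var/lib/pgsql/9.4/data \
        --link
    # then start the new cluster and run the analyze script pg_upgrade generates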
23. Upgrade with Logical Replication
1. Start replication from the old version to the new version & wait for synchronization
2. Stop application connections to the old version
3. Start application connections to the new version
Downtime: steps 2 to 3 — neither DB stops during the upgrade
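On PostgreSQL 10+ this can use native logical replication; a minimal sketch, assuming the schema already exists on the new server and wal_level = logical on the old one (host, database, and publication names are hypothetical):

    # 1a. on the old server: publish all tables
    $ psql -h old-db -d plazma -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"
    # 1b. on the new server: initial copy, then continuous apply
    $ psql -h new-db -d plazma -c "CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-db dbname=plazma user=repl'
        PUBLICATION upgrade_pub;"

On 9.3, as discussed later, a trigger-based tool or an external service is needed instead.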
24. Downtime
pg_dump/pg_dumpall >> pg_upgrade > logical replication
Downtime comes from: a full data copy / rebuilding system tables / switching servers, respectively
pg_dump/pg_dumpall's full data copy: not acceptable for us
25. Downtime
pg_dump/pg_dumpall >> pg_upgrade > logical replication
For the two remaining candidates: how long is the downtime, exactly?
26. Downtime: pg_upgrade in RDS
• Operation
– Upgrade the major version via the modify-db-instance API (see the sketch below)
– RDS performs pg_upgrade link mode internally
• Downtime -> 7-8 mins
• Note
– Downtime starts several minutes after invoking the API; the exact moment cannot be specified
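A sketch of that API call with the AWS CLI (instance identifier and target version are illustrative):

    $ aws rds modify-db-instance \
        --db-instance-identifier plazma-meta-db \
        --engine-version 9.4.24 \
        --allow-major-version-upgrade \
        --apply-immediately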
27. Upgrade of Replicas
• Downtime will be longer if the replicas must be upgraded too
– 1. Stop DBs
– 2. Upgrade the master
– 3. Upgrade the replicas
– 4. Start DBs
28. Logical Replication: How to switch servers?
• Option A: update application configuration
– Distribute the new DB configuration to every app
– Release the new DB only after all connections to the old DB are closed, to avoid write conflicts
• Option B: overwrite the DB endpoint (IP / DNS / proxy etc.)
– Only one point to update
29. Downtime: Logical Replication
• Operation
– Change DB endpoints (DNS) via the modify-db-instance API
✓ 1. plazma.***.com -> plazma-old.***.com
✓ 2. plazma-new.***.com -> plazma.***.com
• Downtime -> 1-2 mins
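Since an RDS endpoint is derived from the instance identifier, the swap can be expressed as two renames (identifiers are illustrative):

    # free up the production name, then move the new instance onto it
    $ aws rds modify-db-instance --db-instance-identifier plazma \
        --new-db-instance-identifier plazma-old --apply-immediately
    $ aws rds modify-db-instance --db-instance-identifier plazma-new \
        --new-db-instance-identifier plazma --apply-immediately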
30. Feasibility of each Strategy
pg_upgrade
• Pros
– Easier operation, especially on AWS RDS
• Cons
– Longer downtime
– Upgrades only 1 major version at a time (AWS RDS)
Logical replication
• Pros
– Shorter downtime
• Cons
– PostgreSQL 9.3 doesn't have native logical replication
31. Logical Replication on PostgreSQL
• Up to 9.3: 3rd-party tools based on triggers
– Bucardo, Londiste, Slony
– Write amplification occurs in order to record deltas
– On AWS RDS, a replication server is required outside the DB servers
• 9.4 - 9.6: 3rd-party tools based on logical decoding
– pglogical, AWS Database Migration Service (DMS)
– DMS provides a managed replication server
• 10 and later: native logical replication
– No external process is required
32. Our Plan
• Step 1: 9.3 -> 9.4 via pg_upgrade
– Operational simplicity and manageable downtime
– Done in May 2018
• Step 2: 9.4 -> latest via DMS
– Less downtime, and a managed replication service is available
– Coming soon...
33. Things to Consider on Upgrade
34. How to handle downtime?
8 minutes of downtime is expected
[Diagram: import and query requests flow Endpoint -> Queue -> Worker -> PlazmaDB; the endpoint responds synchronously, the queue buffers asynchronously, and both sides retry — the queue has capacity for 8 mins]
Requests were delayed but no error was returned
35. Impact of Cold Start
• Shared buffers are empty after booting the DB
– In our env, shared buffer = 160GB
– PostgreSQL page size = 8kB
– Pages to load into the buffer = 160GB / 8kB = 20M pages
• How many IOs are issued depends on several factors
– Number of queries
– Number of pages read per query
– Locality
36. Bonus of pg_upgrade: Page Cache is Hot
• The OS page cache survives the upgrade
– The binary format of the user data files is compatible, and link mode leaves them in place
[Diagram: system tables are rebuilt for the new version while user data files are linked, so the page cache over the user data remains valid]
37. Mitigate Impact of Cold Start
• Adjust RDS Provisioned IOPS to 10k
– Our major select workload scans ~10k pages/sec
– RAM = 244GB, shared buffer = 160GB -> page cache ~ 80GB
✓ ~50% of reads are served by the page cache -> expected IOPS ~ 5k
✓ +5k just in case
• The page cache isn't hot if you need to change servers
– pg_prewarm can warm the buffers explicitly (see the sketch below)
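A minimal pg_prewarm sketch (the relation names are hypothetical):

    $ psql -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;"
    # read a hot table and one of its indexes into shared buffers
    $ psql -c "SELECT pg_prewarm('partitions');"
    $ psql -c "SELECT pg_prewarm('partitions_range_idx');"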
38. Rollback Plan
1. Promote the replica: 1-2 mins
2. Restore from backup (if No. 1 doesn't work): 30-60 mins
[Diagram: 1. plazma.***.com -> plazma-old.***.com, 2. plazma-rr.***.com -> plazma.***.com, 3. promote the replica]
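On RDS, step 1 is a single API call, followed by the endpoint rename shown earlier (identifier illustrative):

    # detach the replica from the master and open it for writes
    $ aws rds promote-read-replica --db-instance-identifier plazma-rr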
39. Evaluate New Major Version
40. Importance of Evaluation
• Rollback is the worst case
– Longer downtime
– Data created after the upgrade must be recovered manually
• Detect degradations by testing
– Interface incompatibility
✓ System tests on the staging environment
– Performance degradation
✓ production != staging in terms of server resources
41. Real World Applications are Complicated..
Many performance-related factors:
• Type of workload
• Users' behaviour
– # of import requests / analytical query requests
– Data skew
• Metadata storage size
• Server size (CPU cores, RAM, storage, network)
• etc.
-> Model the workload and define the target performance
42. Use Metrics for Modeling Workload
• Metrics collectors
– Arm Treasure Data: detailed analysis
– Datadog: visualization
• Metrics
– CPU / IOPS
– Table size
– # of queries / query types
– Cache hit ratio (see the query sketch below)
– SELECT / INSERT / DELETE / UPDATE rows
– ... 50+ metrics
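Several of these come straight from PostgreSQL's statistics views; for example, a common buffer cache hit ratio query (one formulation among many, not necessarily the author's exact definition):

    $ psql -c "SELECT sum(blks_hit)::float8 / (sum(blks_hit) + sum(blks_read))
               AS cache_hit_ratio FROM pg_stat_database;"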
43. Created Benchmark Tool based on Metrics
• Enables production-scale performance measurement without full integration
– Iteratively refined the benchmark workload by adding missing parameters
● Number of queries / types of queries
● DB size
● Data access locality
● Data skew
• Goals of the measurement
– Check whether PostgreSQL 9.4 can process the current production workload (+α)
– Detect metrics that behave differently
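The benchmark tool itself is in-house; purely as an illustration of the idea, a metrics-derived workload can be approximated with pgbench custom scripts, where the table, distribution, and rate below are all hypothetical:

    $ cat select_partitions.sql
    \set ds random(1, 100000)
    SELECT * FROM partitions WHERE data_set_id = :ds;
    # 32 clients, 10 minutes, throttled to the observed ~1.2k TPS
    $ pgbench -n -c 32 -T 600 -R 1200 -f select_partitions.sql plazma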
44. Workload of DB
[Diagram: workloads hitting the Plazma Meta DB — streaming import, bulk load, merge, GC, Presto select, Presto insert/delete, Hive select, Hive insert]
45. Workload of DB
[Same diagram, annotated: 80% of the query workload selects 10k rows/sec; 93% of the import workload inserts 1k rows/sec]
46. Selectivity
• Random sampling from the actual selectivity distribution
e.g.)
– 40% of queries: selectivity = 1
– 5% of queries: selectivity in [0.01, 0.5]
– 5% of queries: selectivity in [0.5, 0.99]
[Plot: selectivity vs. percentile of total # of queries]
47. Data Access Locality
• Metadata size = 1TB
• Shared buffer size = 160GB
• But the hot data set is smaller than the shared buffer
e.g.)
– 85% of the workload comes from 1% of the data sets
– 95% of the workload comes from 5% of the data sets
[Plot: # of partitions read/day vs. # of reads requested per data set/day]
48. Result of Performance Measurement
Observed no performance degradation
49. Operation in Production
50. Final Upgrade Plan
• Preparation
– Increase Provisioned IOPS to 10k for the cold start (see the sketch below)
– Scale the replica server up to the same class as the master, for rollback
• Operation including downtime
– Invoke pg_upgrade via the modify-db-instance API inside a scheduled maintenance window
✓ ~8 mins of downtime, starting several minutes after invoking the API
✓ Expected read IOPS after the upgrade: ~5k
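The IOPS preparation is also a modify-db-instance call (identifier illustrative; an IOPS change can take time to apply, so it is done ahead of the window):

    $ aws rds modify-db-instance --db-instance-identifier plazma-meta-db \
        --iops 10000 --apply-immediately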
51. Actual Impact
Requests were delayed for 9 minutes (~ the DB downtime)
52. Request Queue Depth
• Max 38k reqs * 12 queues ≈ 450k reqs, consumed in 11 mins
• Normally < 1; alert if > 200
53. Actual Impact of Cold Start
[Charts: IOPS and buffer cache misses after the upgrade]
• 5k IOPS was expected; 9k / 19k (~47%) of reads hit the page cache
• This could be stopped..
Summary
• Choose appropriate strategy for your environment
– Downtime: pg_dump >> pg_upgrade > logical replication
– Operation easiness: pg_dump < pg_upgrade < logical replication
– Consider not only DB impact but also entire system impact
– Think of Downtime, Cold Start, Rollback
• Metrics is very important
– Estimate impact, Model workload, Evaluate performance, Measure system
healthiness
54