The document discusses using plProxy and pgBouncer to split a PostgreSQL database horizontally and vertically to improve scalability. It describes how plProxy allows functions to make remote calls to other databases and how pgBouncer can be used for connection pooling. It also summarizes plProxy's RUN ON clause, which allows queries to execute on all partitions, on a random partition, or on specific partitions.
2. Vertical Split
All database access goes through functions; this was a requirement from the start.
Functionality was moved out into separate servers and databases as load increased.
Slony for replication, plpython for remote calls.
As the number of databases grew, we ran into problems:
Slony replication became unmanageable because of its listen/notify architecture and locking.
The search for alternatives ended up creating SkyTools, PgQ and Londiste.
(Diagram: over time, userdb and shopdb are split out into userdb, shopdb, analysisdb and servicedb)
4. Horizontal Split
One server could no longer service one table.
We did the first split with plpython remote calls.
We replaced it with the plProxy language, but it still had several problems:
complex configuration database
internal pooler and complex internal structure
scalability issues
(Diagram: over time, userdb is split into partitions userdb_p0, userdb_p1, userdb_p2, userdb_p3)
5. plProxy Version 2
plProxy second version:
just a remote call language
simplified internal structure
added flexibility
added features
improved configuration and management
Connection pooling separated into pgBouncer.
Resulting architecture is
Scalable
Maintainable
Beautiful in its simplicity :)
(Diagram: FrontEnd WebServer -> userdb (proxy) -> pgBouncer -> userdb_p0, userdb_p1)
7. plProxy: Installation
Download PL/Proxy from http://pgfoundry.org/projects/plproxy and extract.
Build PL/Proxy by running make and make install inside the plproxy directory. If
you're having problems, make sure that pg_config from the PostgreSQL bin directory
is in your path.
To install PL/Proxy in a database, execute the commands in the plproxy.sql file. For
example: psql -f $SHAREDIR/contrib/plproxy.sql mydatabase
Steps 1 and 2 can be skipped if you installed PL/Proxy from a packaging system
such as RPM.
Create a test function to validate that plProxy is working as expected:
CREATE FUNCTION public.get_user_email(text) RETURNS text AS
$_$ connect 'dbname=userdb'; $_$
LANGUAGE plproxy SECURITY DEFINER;
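With CONNECT, plProxy forwards the call to a function of the same name and signature on the remote database, so userdb needs the actual implementation. A minimal sketch (the users table and its columns are assumptions, not from the slides):

```sql
-- On the remote userdb: the real implementation the proxy call resolves to.
CREATE FUNCTION public.get_user_email(i_username text) RETURNS text AS
$$
    SELECT email FROM users WHERE username = i_username;
$$ LANGUAGE sql SECURITY DEFINER;

-- On the proxy database: calling the plProxy function runs the remote one.
SELECT public.get_user_email('someuser');
```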
8. plProxy Language
The language is similar to plpgsql: string quoting, comments, semicolon at the end
of statements. It contains only 4 statements: CONNECT, CLUSTER, RUN and
SELECT.
Each function needs either a CONNECT statement or a pair of CLUSTER + RUN
statements to specify where to run the function.
CONNECT 'libpq connstr'; -- Specifies the exact location where to connect and
execute the query. If several functions have the same connstr, they will use the same
connection.
CLUSTER 'cluster_name'; -- Specifies the exact cluster name to be run on. The cluster
name will be passed to the plproxy.get_cluster_* functions.
CLUSTER cluster_func(..); -- The cluster name can be decided dynamically from the
proxy function arguments. cluster_func should return a text value with the final cluster
name.
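As an illustration, a CLUSTER + RUN pair can dispatch to any partition when the data is the same everywhere. This is a sketch: the function name and the assumption that settings are replicated to every partition are hypothetical, not from the slides.

```sql
-- Any partition of the cluster can answer, so run on a random one.
CREATE FUNCTION public.get_setting(i_key text) RETURNS text AS
$_$
    CLUSTER 'userdb';
    RUN ON ANY;
$_$ LANGUAGE plproxy SECURITY DEFINER;
```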
9. plProxy Language RUN ON ...
RUN ON ALL; -- The query will be run on all partitions in the cluster in parallel.
RUN ON ANY; -- The query will be run on a random partition.
RUN ON <NR>; -- Run on partition number <NR>.
RUN ON partition_func(..); -- Run partition_func(), which should return one or more
hash values (int4). The query will be run on the tagged partitions. If more than one
partition was tagged, the query will be sent to them in parallel.
CREATE FUNCTION public.get_user_email(text) RETURNS text AS
$_$
cluster 'userdb';
run on public.get_hash($1);
$_$
LANGUAGE plproxy SECURITY DEFINER;
10. plProxy Configuration
The plproxy schema and 3 functions are needed for plProxy:
plproxy.get_cluster_partitions(cluster_name text) – initializes
plProxy connect strings to the remote databases.
plproxy.get_cluster_version(cluster_name text) – used by plProxy to
determine whether the configuration has changed and should be read again. Should be
as fast as possible, because it is called for every function call that goes through plProxy.
plproxy.get_cluster_config(in cluster_name text, out key text,
out val text) – can be used to change plProxy parameters such as connection
lifetime.
CREATE FUNCTION plproxy.get_cluster_version(i_cluster text)
RETURNS integer AS $$
SELECT 1;
$$ LANGUAGE sql SECURITY DEFINER;
CREATE FUNCTION plproxy.get_cluster_config(
cluster_name text, OUT "key" text, OUT val text)
RETURNS SETOF record AS $$
SELECT 'connection_lifetime'::text as key, text( 30*60 ) as val;
$$ LANGUAGE sql;
11. plProxy: Get Cluster Partitions
CREATE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
begin
if cluster_name = 'userdb' then
return next 'port=9000 dbname=userdb_p00 user=proxy';
return next 'port=9000 dbname=userdb_p01 user=proxy';
return;
end if;
raise exception 'no such cluster: %', cluster_name;
end; $$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE FUNCTION plproxy.get_cluster_partitions(i_cluster_name text)
RETURNS SETOF text AS $$
declare
r record;
begin
for r in
select connect_string
from plproxy.conf
where cluster_name = i_cluster_name
loop
return next r.connect_string;
end loop;
if not found then
raise exception 'no such cluster: %', i_cluster_name;
end if;
return;
end; $$ LANGUAGE plpgsql SECURITY DEFINER;
12. plProxy: Remote Calls
We use remote calls mostly for read-only queries, in cases where it is not reasonable
to replicate the needed data to the calling database.
For example, balance data changes very often, but whenever making decisions
based on balance we must use the latest value, so we use a remote call to get the
user's balance.
Another use case is when occasionally archived data is needed together with online
data.
(Diagram: shopDB calls get_email on userDB, get_balance on balanceDB, and get_old_orders on archiveDB)
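The balance lookup described above can be written as a simple CONNECT function. A sketch: the connection string and function name follow the diagram, and the implementation is assumed to exist on balanceDB.

```sql
-- Proxy function on shopDB; the real get_balance lives on balanceDB.
CREATE FUNCTION public.get_balance(i_username text) RETURNS numeric AS
$_$ CONNECT 'dbname=balancedb'; $_$
LANGUAGE plproxy SECURITY DEFINER;
```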
13. plProxy: Remote Calls (update)
plProxy remote calls that change data in the remote database inside a transaction
need special handling:
there is no two-phase commit
some mechanism should be used to handle possible problems, for example inserting
events into a PgQ queue and letting a consumer validate that the transaction was
committed or rolled back, and act accordingly.
(Diagram: shopDB calls change_balance on balanceDB; balance change events flow through a balance events queue to a balance change handler)
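The compensation pattern above can be sketched with SkyTools' PgQ: the calling transaction records an event, and a consumer later verifies the remote outcome. The queue name, event type and payload here are hypothetical.

```sql
-- Inside the shopDB transaction that made the remote change_balance call,
-- record an event describing what was attempted:
SELECT pgq.insert_event(
    'balance_events',                  -- queue name (hypothetical)
    'balance_change',                  -- event type (hypothetical)
    'username=someuser&delta=-10.00'   -- payload for the consumer to verify
);
-- A separate PgQ consumer reads the queue and checks on balanceDB whether
-- the change was actually committed, retrying or alerting if it was not.
```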
14. plProxy: Proxy Databases
An additional layer between the application and the databases.
Keeps application database connectivity simpler, giving DBAs and developers more
flexibility for moving data and functionality around.
A security layer: by giving access only to the proxy database, DBAs can be sure that the
user has no way of accessing tables by accident or by any other means, as only the
functions published in the proxy database are visible to the user.
(Diagram: BackOffice Application connects to manualfixDb and backofficeDb proxy databases, which in turn connect to shopDb, userDB and internalDb)
15. plProxy: Run On All
Run on all executes the query once on every partition in the cluster. Partitions are
identified by connect strings.
Useful for gathering stats from several databases or database partitions.
Also usable when the exact partition where the data resides is not known; then the
function may be run on all partitions and only the one that has the data does
something.
CREATE FUNCTION stats._get_stats(
    OUT stat_name text,
    OUT stat_value bigint
) RETURNS SETOF record AS
$_$
    cluster 'userdb';
    run on all;
$_$
LANGUAGE plproxy SECURITY DEFINER;

CREATE FUNCTION stats.get_stats(
    OUT stat_name text,
    OUT stat_value bigint
) RETURNS SETOF record AS
$_$
    select stat_name
         , (sum(stat_value))::bigint
      from stats._get_stats()
     group by stat_name
     order by stat_name;
$_$
LANGUAGE sql SECURITY DEFINER;
16. plProxy: Geographical
plProxy can be used to split a database into partitions based on country code.
In this example the database is split into 'us' and 'row' (rest of the world).
Each function call caused by online users has the country code as one of the
parameters.
All data is replicated into an internal database for use by internal applications and
batch jobs. That also reduces the number of indexes needed in the online databases.
CREATE FUNCTION public.get_cluster(
    i_key_cc text
) RETURNS text AS
$_$
BEGIN
    IF i_key_cc = 'us' THEN
        RETURN 'oltp_us';
    ELSE
        RETURN 'oltp_row';
    END IF;
END;
$_$ LANGUAGE plpgsql;
[Diagram: onlinedb (proxy) → onlinedb_US, onlinedb_ROW, backenddb]
17. plProxy: Partitioning Proxy Functions
We have partitioned most of our databases by username, using the PostgreSQL
hashtext function to get an even distribution between partitions.
When splitting databases we usually prepare the new partitions on other servers and
then switch all traffic at once, to keep our life pleasant.
Multiple exact copies of the proxy database are in use for scalability and availability
reasons.
CREATE FUNCTION public.get_user_email(text) RETURNS text AS
$_$
cluster 'userdb';
run on public.get_hash($1);
$_$ LANGUAGE plproxy SECURITY DEFINER;
CREATE FUNCTION public.get_hash(i_user text) RETURNS integer AS
$_$
BEGIN
return hashtext(lower(i_user));
END;
$_$ LANGUAGE plpgsql SECURITY DEFINER;
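The run on public.get_hash($1) line routes each call by hash: plProxy keeps only the low bits of the returned integer, so with a power-of-two partition count the partition number is hash & (partition_count - 1). A small illustration, assuming the two-partition 'userdb' cluster from the earlier connect-string example:

```sql
-- With two partitions, plProxy picks partition hash & 1, which is
-- why the partition count must be a power of two.
SELECT hashtext(lower('some_user')) & 1 AS partition_no;
```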
18. plProxy: Partitioning Partition Functions
A couple of functions in the partconf schema are added to each partition:
partconf.global_id() – gives globally unique keys
partconf.check_hash() – checks that the function call is in the right partition
partconf.valid_hash() – used as a trigger function
CREATE FUNCTION public.get_user_email(i_username text) RETURNS text AS
$_$
DECLARE
retval text;
BEGIN
PERFORM partconf.check_hash(lower(i_username));
SELECT email
FROM users
WHERE username = lower(i_username)
INTO retval;
RETURN retval;
END;
$_$ LANGUAGE plpgsql SECURITY DEFINER;
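The deck does not show partconf.check_hash() itself. A possible sketch, assuming hypothetical helpers partconf.my_part_nr() and partconf.max_part() that return this partition's number and the hash mask (the real partconf code may differ):

```sql
-- Hypothetical sketch only: raise an error when a key hashes to some
-- other partition, catching calls routed to the wrong database.
CREATE FUNCTION partconf.check_hash(i_key text) RETURNS void AS $$
BEGIN
    IF (hashtext(i_key) & partconf.max_part()) <> partconf.my_part_nr() THEN
        RAISE EXCEPTION 'wrong partition for key %', i_key;
    END IF;
END;
$$ LANGUAGE plpgsql;
```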
19. plProxy: Summary
plProxy adds 1-10 ms of overhead when used together with pgBouncer.
Quote from Gavin M. Roy's blog: “After closely watching machine stats for the first 30
minutes of production, it appears that plProxy has very little if any impact on
machine resources in our infrastructure.”
On the other hand, plProxy adds complexity to development and maintenance, so it
must be used with care, but that is true for almost everything.
Our largest cluster is currently running as 16 partitions on 16 servers.
20. pgBouncer
pgBouncer is a lightweight and robust connection pooler for PostgreSQL.
Low memory requirements (2 kB per connection by default), because pgBouncer does
not need to see a full packet at once.
It is not tied to one backend server; the destination databases can reside on
different hosts.
Supports pausing activity on all or only selected databases.
Supports online reconfiguration of most settings.
Supports online restart/upgrade without dropping client connections.
Supports protocol V3 only, so the backend version must be >= 7.4.
Does not parse SQL, so it is very fast and uses little CPU time.
[Diagram: thousands of application connections → pgBouncer → tens of connections → Database]
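A minimal pgbouncer.ini illustrating this setup; the database name, paths, and pool sizes are examples only.

```ini
; Example only; adjust names, paths and sizes for your setup.
[databases]
; clients connect to "userdb" on pgBouncer, which forwards to the real host
userdb = host=127.0.0.1 port=5432 dbname=userdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```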
21. pgBouncer Pooling Modes
Session pooling – the most polite method. When a client connects, a server connection
is assigned to it for the whole duration it stays connected. When the client
disconnects, the server connection is put back into the pool. Should be used with
legacy applications that won't work with the more efficient pooling modes.
Transaction pooling – a server connection is assigned to the client only for the
duration of a transaction. When pgBouncer notices that the transaction is over, the
server connection is put back into the pool. This is a hack, as it breaks the
application's expectations about its backend connection. Use it only when the
application cooperates by not using features that would break.
Statement pooling – the most aggressive method. This is transaction pooling with a
twist: multi-statement transactions are disallowed. It is meant to enforce
"autocommit" mode on the client, and is mostly targeted at PL/Proxy.
22. pgBouncer Pictures
A nice blog post about pgBouncer:
http://www.last.fm/user/Russ/journal/2008/02/21/654597/
23. pgBalancer
We want to keep pgBouncer small, stable, and focused.
pgBalancer is an experiment with existing software to implement statement-level
load balancing.
So we created a sample configuration, using existing software, that shows how load
could be divided over several read-only databases.
LVS – Linux Virtual Server (http://www.linuxvirtualserver.org/) is used:
different load balancing algorithms
weights on resources
[Diagram: client1 … clientN → LVS (master), keepalived daemon, LVS (slave) → two pgBouncers → shopdb_ro1, shopdb_ro2]
24. SODI Framework
Rapid application development.
Cheap changes.
Most of the design is in metadata, so the application design stays in sync with the
application over time.
Minimizes the number of database roundtrips:
can send multiple recordsets in one function call
can receive several recordsets from one function call
[Diagram of layers:
Application layer – java / php …; UI logic; mostly generated; SODI framework
AppServer layer – java / php …; user authentication; roles and rights; no business logic
Database layer – business logic; plPython; PostgreSQL; SkyTools]
25. President
President is our configuration management application.
The President Java client allows updates; the Web view gives read-only access. All
outside access to the system is over HTTPS, and personal certificates can be
generated if needed.
Kuberner polls ConfDB periodically for commands and executes the commands it
receives.
Harvester runs inside each host and writes machine hardware information into a file
for Kuberner to read.
The Discovery plugin collects specific configuration files and stores them in ConfDB.
Kuberner's file upload plugin uploads new configuration files to the hosts.
[Diagram: President Java client (read-write) and Web view (read-only) → Apache/PHP → ConfDB; Kuberner ↔ hosts running Harvester and Discovery]
26. SkyTools
SkyTools – a Python scripting framework and a collection of useful database scripts.
Most of our internal tools are based on this framework:
(centralized) logging and exception handling
database connection handling
configuration management
starting and stopping scripts
Some of the scripts provided by SkyTools:
londiste – nice and simple replication tool
walmgr – WAL standby management script
serial consumer – script for executing functions based on data in a queue
queue mover – moves events from one queue into another
queue splitter – moves events from one queue into several queues
table dispatcher – writes data from a queue into a partitioned table
cube dispatcher – writes data from a queue into daily tables
27. SkyTools: Queue Mover
Moves data from a source queue in one database to another queue in another
database.
Used to move events from online databases to queue databases:
we don't need to keep events in the online database in case some consumer fails to
process them
it consolidates events when there are several producers, as in the case of
partitioned databases.
[Diagram: two OLTP databases → queue movers → consolidated queue in the batch DB → batch jobs]
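A sketch of a queue mover configuration in the SkyTools ini style; the job, database, and queue names are invented, and the option names should be verified against the SkyTools version in use.

```ini
; Illustrative only; check option names against your SkyTools docs.
[queue_mover]
job_name = user_events_to_batchdb
src_db = dbname=onlinedb
dst_db = dbname=batchdb
pgq_queue_name = user_events
dst_queue_name = user_events
pidfile = ~/pid/%(job_name)s.pid
logfile = ~/log/%(job_name)s.log
```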
28. SkyTools: Queue Splitter
Moves data from a source queue in one database to one or more queues in the target
database, based on the producer. It is another version of queue mover, with its own
benefits.
Used to move events from online databases to queue databases.
Reduces the number of dependencies of the online databases.
[Diagram: producers welcome_email and password_email feed queue user_events; the queue splitter moves the events into queues welcome_email and password_email, consumed by the promotional mailer and the transactional mailer]
29. SkyTools: Table Dispatcher
Takes URL-encoded events as its data source and writes them into tables in the
target database.
Used to partition data. For example, change logs that need to be kept online only for
a short time can be written to daily tables and then dropped as they become
irrelevant.
Also allows selecting which columns have to be written into the target database.
Creates target tables as needed, according to the configuration file.
[Diagram: queue call_records → table dispatchers → table history (history_2007_01, history_2007_02, …) and table cr (cr_2007_01_01, cr_2007_01_02, …)]
30. SkyTools: Cube Dispatcher
Takes URL-encoded events as its data source and writes them into partitioned tables
in the target database. Logutriga is used to create the events.
Used to provide batches of data for business intelligence and data cubes.
Only one instance of each record is stored. For example, if a record is created and
then updated twice, only the latest version of the record stays in that day's table.
Does not support deletes.
[Diagram: producers invoices and payments feed queue cube_events; the cube dispatcher writes into tables invoices (invoices_2007_01_01, invoices_2007_01_02, …) and payments (payments_2007_01_01, payments_2007_01_02, …)]
31. Questions, Problems?
Use the SkyTools, plProxy and pgBouncer mailing lists at PgFoundry.
Skype me: askoja