This presentation can help you apply partitioning when appropriate, and avoid problems when using it. The one-liner is: Simple Works Best. The illustrating demos are on PostgreSQL 12 (maybe 13 by the time of presenting) and show some of the problems and solutions that partitioning can provide. Some of this “experience” is quite old, and the demos run near-identically on Oracle…
These problems are the same on any database.
Ansible is an open-source configuration management and deployment tool, which can be used to manage servers and software installations. This talk will briefly cover Ansible itself, and then explain how Ansible is used to install and configure PostgreSQL on a server. Examples will round up the talk.
Lessons PostgreSQL learned from commercial databases, and didn’t (PGConf APAC)
This is the slide deck used by Illay for his presentation at pgDay Asia 2016, "Lessons PostgreSQL learned from commercial databases, and didn’t". The talk takes you through some of the things that PostgreSQL has done really well, and some things that PostgreSQL can learn from other databases.
JSON is an important data type for transporting data between servers and many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. The JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We then close with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
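As a database-free illustration of the JSON indexing topic, the containment test that JSONB's `@>` operator (and the GIN indexes behind it) accelerates can be sketched in a few lines of Python; `jsonb_contains` is a hypothetical stand-in, not PostgreSQL's implementation:

```python
import json

def jsonb_contains(doc, sub):
    """Rough sketch of JSONB's @> containment: every key/value in
    `sub` must appear (recursively) somewhere in `doc`."""
    if isinstance(sub, dict):
        return isinstance(doc, dict) and all(
            k in doc and jsonb_contains(doc[k], v) for k, v in sub.items()
        )
    if isinstance(sub, list):
        return isinstance(doc, list) and all(
            any(jsonb_contains(d, s) for d in doc) for s in sub
        )
    return doc == sub

row = json.loads('{"sku": "A1", "tags": ["sale", "new"], "dims": {"w": 10}}')
print(jsonb_contains(row, {"tags": ["sale"]}))   # True
print(jsonb_contains(row, {"dims": {"w": 11}}))  # False
```

In PostgreSQL itself, the equivalent predicate would be `data @> '{"tags": ["sale"]}'::jsonb`, typically backed by a GIN index on the column.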
PostgreSQL Enterprise Class Features and Capabilities (PGConf APAC)
These are the slides used by Venkar from Fujitsu for his presentation at pgDay Asia 2016. He spoke about some of the enterprise-class features of the PostgreSQL database.
Query Parallelism in PostgreSQL: What's coming next? (PGConf APAC)
This presentation was delivered by Dilip Kumar (a PostgreSQL contributor) at pgDay Asia 2017. It covers the parallel query features released in v9.6, the infrastructure for parallel query built in previous versions, and the roadmap for parallel query.
This is the presentation used by Umari Shahid of 2ndQuadrant for his presentation at pgDay Asia 2016. It takes you through the usage of the TABLESAMPLE clause of SELECT queries, introduced in PostgreSQL v9.5.
Given on a free DevelopMentor webinar. A high-level overview of big data and the need for Hadoop. Also covers Pig, Hive, YARN, and the future of Hadoop.
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL (Arseny Chernov)
Fast, demo-enabled 60-minute lecture, aligned to the curriculum of the RDBMS/SQL course taught at Singapore University of Technology and Design (SUTD), a collaboration with MIT. More details about this lecture and some photos here: http://bit.ly/sutd-mit-lecture
Faster Data Integration Pipeline Execution using Spark-Jobserver (Databricks)
As you may already know, the open-source Spark Job Server offers a powerful platform for managing Spark jobs, jars, and contexts, turning Spark into a much more convenient and easy-to-use service. The Spark-Jobserver can keep Spark contexts warmed up and readily available for accepting new jobs. At Informatica we are leveraging the Spark-Jobserver offerings to solve the data-visualization use case.
Introduction to Sqoop - Aaron Kimball, Cloudera, Hadoop User Group UK (Skills Matter)
In this talk at the Hadoop User Group UK meeting, Aaron Kimball from Cloudera introduces Sqoop, the open-source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources into Hadoop's distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.
After this session, users will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We'll also cover some deeper technical details of Sqoop's architecture, and take a look at some upcoming aspects of Sqoop's development roadmap.
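As a rough sketch of how such parallel imports are typically divided up, here is a hypothetical Python helper that splits a numeric key range into per-mapper chunks, similar in spirit to what Sqoop does with a split-by column (the helper name and numbers are illustrative only):

```python
def split_key_range(lo, hi, num_mappers):
    """Split the inclusive key range [lo, hi] into contiguous chunks,
    one per mapper, the way a split-by column is divided for a
    parallel import."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        splits.append((start, start + size - 1))
        start += size
    return splits

# 4 mappers each get a WHERE clause like: id >= 1 AND id <= 250
print(split_key_range(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Each chunk becomes an independent query against the source database, which is why the imports can proceed in parallel.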
This talk covers native compilation technology: what it is, why it is required, and how we can apply it to compile tables and procedures to achieve considerable performance gains with very minimal changes.
Big Data Day LA 2016 / NoSQL track - Apache Kudu: Fast Analytics on Fast Data,... (Data Con LA)
Apache Kudu (incubating) is a new storage engine for the Hadoop ecosystem that enables extremely high-speed analytics without imposing data-visibility latencies. This talk provides an introduction to Kudu, and provides an overview of how, when, and why practitioners use Kudu as a platform for building analytics solutions.
Apache Sqoop: A Data Transfer Tool for Hadoop (Cloudera, Inc.)
Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases. This slide deck aims at familiarizing the user with Sqoop and how to use it effectively in real deployments.
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic... (Chicago Hadoop Users Group)
John Leach, Co-Founder and CTO of Splice Machine, with 15+ years of software development and machine learning experience, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
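The snapshot isolation idea referenced above can be illustrated with a toy multi-version store in Python. This is a minimal sketch of timestamp-based visibility in the spirit of Percolator-style designs, not Splice Machine's actual implementation:

```python
class SnapshotStore:
    """Toy multi-version store: every write is kept with its commit
    timestamp, and a transaction only sees versions committed at or
    before its own start timestamp."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)
        self.ts = 0

    def next_ts(self):
        self.ts += 1
        return self.ts

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, start_ts):
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= start_ts]
        return max(visible)[1] if visible else None

store = SnapshotStore()
store.write("balance", 100, commit_ts=store.next_ts())   # commit at ts=1
snapshot_ts = store.next_ts()                            # ts=2: a reader's snapshot
store.write("balance", 50, commit_ts=store.next_ts())    # ts=3: a later commit
print(store.read("balance", snapshot_ts))  # 100 -- the later write is invisible
```

The reader's view stays stable even while concurrent writers commit, which is exactly the property that makes snapshot isolation attractive for a transactional layer over HBase.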
To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/
Andrew Ryan describes how Facebook operates Hadoop to provide access as a shared resource between groups.
More information and video at:
http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hug-feb-2011-recap/
Apache Sqoop: Unlocking Hadoop for Your Relational Database (huguk)
Kathleen Ting, Technical Account Manager @ Cloudera and Sqoop Committer
Unlocking data stored in an organization's RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment as well as transferring data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we'll show you how to keep table data and Hadoop in sync by importing data incrementally as well as how to customize transferred data by calling various database functions.
Citus Architecture: Extending Postgres to Build a Distributed Database (Ozgun Erdogan)
Citus is a distributed database that scales out Postgres. By using the extension APIs, Citus distributes your tables across a cluster of machines and parallelizes SQL queries. This talk describes the Citus architecture by focusing on our learnings in distributed systems. We first describe how Citus leverages PostgreSQL's extension APIs. These APIs are rich enough to store distributed metadata, add new commands to Postgres to help with sharding, parallelize and execute queries in a distributed cluster, and handle automatic failover of machines. Second, we show the architecture of a distributed query planner. We first describe the join order planner and how it chooses between broadcast, co-located, and repartition joins to minimize network I/O. We then show how we map SQL queries into distributed relational algebra, and optimize these plans for parallel execution. Third, we note a primary challenge in distributed systems: no single executor works great for all workloads. We show how Citus chooses between three executors, each one optimized for a different workload: NoSQL, operational analytics, and data warehousing. We then conclude with a demo that shows Citus running on a large cluster.
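To make the sharding idea concrete, here is a toy Python sketch of hash-distributing rows across workers. The node names and the `shard_for` helper are hypothetical; real Citus uses PostgreSQL's hash functions plus shard metadata tables:

```python
import zlib

NUM_SHARDS = 8
WORKERS = ["worker-1", "worker-2", "worker-3"]  # hypothetical node names

def shard_for(distribution_key):
    """Map a distribution-column value to a shard, and that shard to a
    worker node. A stand-in for hash partitioning, not the Citus code."""
    h = zlib.crc32(str(distribution_key).encode())
    shard = h % NUM_SHARDS
    return shard, WORKERS[shard % len(WORKERS)]

# The same tenant id always routes to the same shard, so joins on the
# distribution key can run co-located on a single worker.
print(shard_for(42) == shard_for(42))  # True
```

Routing by a deterministic hash is what lets the planner push co-located joins down to individual workers instead of moving data over the network.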
PostgreSQL as a NoSQL Document Store - The JSON/JSONB data type (Jumping Bean)
Our presentation from PGDay Asia 2016 on the JSON/JSONB data type in Postgres and how you can have the best of both the SQL and NoSQL worlds in one. There is JavaScript in my SQL.
Beyond the DSL - Unlocking the Power of Kafka Streams with the Processor API (A... (confluent)
Kafka Streams is a flexible and powerful framework. The Domain Specific Language (DSL) is an obvious place from which to start, but not all requirements fit the DSL model. Many people are unaware of the Processor API (PAPI), or are intimidated by it because of sinks, sources, edges and stores – oh my! But most of the power of the PAPI can be leveraged simply through the DSL’s #process() method, which lets you attach the general building block Processor interface to your easy-to-use DSL topology, combining the best of both worlds.
In this talk you’ll get a look at the flexibility of the DSL’s process method and the possibilities it opens up. We’ll use real-world use cases, borne from extensive experience in the field with multiple customers, to explore the power of direct write access to the state stores and how to perform range sub-selects. We’ll also see the options that punctuators bring to the table, as well as opportunities for major latency optimisations.
Key takeaways:
* Understanding of how to combine DSL and Processors
* Capabilities and benefits of Processors
* Real-world uses of Processors
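A toy Python stand-in can illustrate the stateful-processor pattern the takeaways describe. This mimics the shape of a Processor with a keyed state store and a punctuate hook; it is a sketch of the pattern, not the actual Kafka Streams API:

```python
class DedupProcessor:
    """Sketch of a Processor-API-style stateful step: a keyed state
    store plus a punctuate() hook, the kind of building block the
    DSL's #process() method lets you attach to a topology."""
    def __init__(self):
        self.store = {}        # stand-in for a Kafka Streams state store
        self.forwarded = []    # stand-in for records forwarded downstream

    def process(self, key, value):
        if self.store.get(key) != value:   # forward only when the value changes
            self.store[key] = value
            self.forwarded.append((key, value))

    def punctuate(self):
        """Periodic callback, e.g. to emit or expire store entries."""
        return dict(self.store)

p = DedupProcessor()
for k, v in [("sensor-1", 20), ("sensor-1", 20), ("sensor-1", 21)]:
    p.process(k, v)
print(p.forwarded)  # [('sensor-1', 20), ('sensor-1', 21)]
```

Direct access to the store (here a plain dict) is what enables the range sub-selects and deduplication tricks the talk explores.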
TokuDB is an ACID/transactional storage engine that makes MySQL even better by increasing performance, adding high compression, and allowing for true schema agility. All of these features are made possible by Tokutek's Fractal Tree indexes.
Scaling Machine Learning Feature Engineering in Apache Spark at Facebook (Databricks)
Machine Learning feature engineering is one of the most critical workloads on Spark at Facebook and serves as a means of improving the quality of each of the prediction models we have in production. Over the last year, we’ve added several features in Spark core/SQL to add first-class support for Feature Injection and Feature Reaping in Spark. Feature Injection is an important prerequisite to (offline) ML training where the base features are injected/aligned with new/experimental features, with the goal of improving model performance over time. From a query engine’s perspective, this can be thought of as a LEFT OUTER join between the base training table and the feature table which, if implemented naively, could get extremely expensive. As part of this work, we added native support for writing indexed/aligned tables in Spark, wherein, if the data in the base table and the injected feature can be aligned during writes, the join itself can be performed inexpensively.
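The benefit of aligned writes can be sketched with a simple merge-based LEFT OUTER join in Python: when both inputs are already sorted on the join key, the join is a single linear pass instead of an expensive shuffle. This is an illustrative sketch under that assumption, not Facebook's implementation:

```python
def aligned_left_join(base_rows, feature_rows):
    """LEFT OUTER join of two tables already sorted ('aligned') on the
    join key, done in one linear merge pass."""
    out, j = [], 0
    for key, base in base_rows:
        # advance the feature cursor; it never moves backwards
        while j < len(feature_rows) and feature_rows[j][0] < key:
            j += 1
        matched = j < len(feature_rows) and feature_rows[j][0] == key
        out.append((key, base, feature_rows[j][1] if matched else None))
    return out

base = [(1, "u1"), (2, "u2"), (3, "u3")]       # base training rows by example id
features = [(1, 0.9), (3, 0.1)]                 # new feature, sparse
print(aligned_left_join(base, features))
# [(1, 'u1', 0.9), (2, 'u2', None), (3, 'u3', 0.1)]
```

Because neither cursor ever revisits a row, the cost is proportional to the input sizes, which is the property that makes injection cheap when writes keep the tables aligned.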
Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about the performance of databases. This is especially true of less widespread, but often very useful, NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
Storage and computation are getting cheaper AND easily accessible on demand in the cloud. We now collect and store some really large data sets, e.g. user activity logs, genome sequencing, sensory data, etc. Hadoop and the ecosystem of projects built around it present simple and easy-to-use tools for storing and analyzing such large data collections on commodity hardware.
Topics Covered
* The Hadoop architecture.
* Thinking in MapReduce.
* Run some sample MapReduce Jobs (using Hadoop Streaming).
* Introduce Pig Latin, an easy-to-use data processing language.
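"Thinking in MapReduce" can be illustrated with the classic word count, sketched here in plain Python with explicit map, shuffle, and reduce phases:

```python
from itertools import groupby

def map_phase(line):
    """Map: emit (word, 1) for every word in an input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: sum all counts that arrived for one key."""
    return (word, sum(counts))

lines = ["big data big ideas", "data pipelines"]
mapped = [kv for line in lines for kv in map_phase(line)]
shuffled = sorted(mapped)  # the shuffle phase groups equal keys together
result = [reduce_phase(k, [c for _, c in g])
          for k, g in groupby(shuffled, key=lambda kv: kv[0])]
print(result)  # [('big', 2), ('data', 2), ('ideas', 1), ('pipelines', 1)]
```

With Hadoop Streaming, the mapper and reducer would be the same logic reading stdin and writing stdout, with the framework performing the sort/shuffle between them.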
Speaker Profile: Mahesh Reddy is an entrepreneur, chasing dreams. He works on large-scale crawling and extraction of structured data from the web. He is a graduate from IIT Kanpur (2000-05) and previously worked at Yahoo! Labs as a Research Engineer/Tech Lead on search and advertising products.
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API (confluent)
Technical breakout during Confluent’s streaming event in Munich, presented by Antony Stubbs, Solution Architect at Confluent. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Persistent Data Structures - partial::Conf (Ivan Vergiliev)
The slides from my talk on Persistent Data Structures at http://partialconf.com/ . The "Implementation" part assumes a bit of prior knowledge on how persistent data structures work, but the rest should be generally accessible.
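For readers without that prior knowledge, the core trick (path copying) can be sketched for a persistent singly linked list in a few lines of Python: an update copies only the nodes on the path to the change and shares the rest between versions.

```python
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def update(node, index, value):
    """Return a NEW list version with position `index` changed,
    copying only the nodes before it (path copying)."""
    if index == 0:
        return Node(value, node.next)       # share the untouched tail
    return Node(node.value, update(node.next, index - 1, value))

def to_list(node):
    out = []
    while node:
        out.append(node.value)
        node = node.next
    return out

v1 = Node(1, Node(2, Node(3)))
v2 = update(v1, 1, 99)
print(to_list(v1))  # [1, 2, 3] -- the old version is untouched
print(to_list(v2))  # [1, 99, 3]
print(v1.next.next is v2.next.next)  # True -- the tail is shared
```

The same idea generalizes to trees, where each update copies only an O(log n) path from the root instead of a list prefix.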
Learn from the author of SQLTXPLAIN the fundamentals of SQL Tuning: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation.
SQL Tuning is a complex and intimidating area of knowledge, and it requires years of frequent practice to master. Nevertheless, there are some concepts and practices that are fundamental to success. From a basic understanding of the Cost-Based Optimizer (CBO) and execution plans, to more advanced topics such as plan stability and the caveats of using SQL Profiles and SQL Plan Baselines, this session is full of advice and experience sharing. Learn what works and what doesn't when it comes to SQL Tuning.
Participants of this session will also learn about several free tools (besides SQLTXPLAIN) that can be used to diagnose a SQL statement performing poorly, and some others to improve Execution Plan Stability.
Whether you are a novice DBA, an experienced DBA, or a developer, there will be something new for you in this session. And if this is your first encounter with SQL Tuning, at least you will learn the basic concepts and steps to succeed in your endeavor.
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS (EDB)
Moving to the cloud is hard, and moving Postgres databases to the cloud is even harder. Public cloud or private cloud? Infrastructure as a Service (IaaS), or Platform as a Service (PaaS)? Kubernetes for the application, or for the database and the application? This talk will juxtapose self-managed Kubernetes and container-based database solutions, Postgres deployments on IaaS, and Postgres DBaaS solutions of which EDB’s DBaaS BigAnimal is the latest example.
The 10 Best PostgreSQL Replication Strategies for Your Enterprise (EDB)
This webinar will help you understand the differences between the various replication approaches, recognize the requirements of each strategy, and get a clear picture of what can be achieved with each one. With that, you will hopefully be better placed to work out which kinds of PostgreSQL replication you really need for your system.
- How physical and logical replication work in PostgreSQL
- Differences between synchronous and asynchronous replication
- Advantages, disadvantages and challenges of multi-master replication
- Which replication strategy is better suited to different use cases
Speaker:
Borys Neselovskyi, Regional Sales Engineer DACH, EDB
------------------------------------------------------------
For more #webinars, visit http://bit.ly/EDB-Webinars
Download free #PostgreSQL whitepapers: http://bit.ly/EDB-Whitepapers
Read our #Postgres Blog http://bit.ly/EDB-Blogs
Follow us on Facebook at http://bit.ly/EDB-FB
Follow us on Twitter at http://bit.ly/EDB-Twitter
Follow us on LinkedIn at http://bit.ly/EDB-LinkedIn
Reach us via email at marketing@enterprisedb.com
When looking for alternatives to Oracle in the cloud, making the switch can seem like hard work. We understand that migration involves more than just the database. Compatibility is a key point, especially when you consider the resources you may already have invested in Oracle, such as Oracle-specific application code. This webinar will explore the options and the main considerations when moving from Oracle databases to the cloud.
- A detailed review of the database offerings available in the cloud
- Critical factors to consider when choosing the most suitable cloud offering
- How EDB's experience with PostgreSQL can help with your decision
- A demonstration of EDB's BigAnimal
Presenter:
Sergio Romera, Senior Sales Engineer EMEA, EDB
Databases like PostgreSQL cannot run on Kubernetes. That is the refrain we hear all the time, and at the same time it is the motivation for us at EDB to tear down that wall, once and for all.
In this webinar we will talk about our journey so far in bringing PostgreSQL to Kubernetes. Find out why we believe that benchmarking the storage and the database before going into production leads to a healthier and longer life for a DBMS, even on Kubernetes.
We will share our process and the results obtained so far, and unveil our plans for the future of Cloud Native PostgreSQL.
The Variations of PostgreSQL Replication (EDB)
Physical replication, logical replication, synchronous, asynchronous, multi-master, horizontal scalability, and so on: many terms are associated with database replication. In this talk we will review the fundamental concepts behind each variation of PostgreSQL replication, and in which cases it is best to use one or the other. The presentation includes a practical part with demonstrations, although it will not be a tutorial on how to configure a cluster. The focus is on understanding each variation in order to choose the best one for the use case.
What you will learn:
- How physical replication works in PostgreSQL
- How logical replication works in PostgreSQL
- Differences between synchronous and asynchronous replication
- What multi-master replication is
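For physical streaming replication, the setup boils down to a handful of configuration parameters. A minimal sketch for PostgreSQL 12 and later, with placeholder hostnames and user names:

```ini
# postgresql.conf on the primary
wal_level = replica                  # use 'logical' to also allow logical replication
max_wal_senders = 10                 # WAL sender processes available for standbys
synchronous_standby_names = ''       # name a standby here to make replication synchronous

# postgresql.conf on the standby (plus an empty standby.signal file in its data directory)
primary_conninfo = 'host=primary.example port=5432 user=replicator'
hot_standby = on                     # allow read-only queries on the standby
```

Logical replication instead uses `CREATE PUBLICATION` on the source and `CREATE SUBSCRIPTION` on the target, replicating row changes per table rather than the WAL byte stream.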
NoSQL and Spatial Database Capabilities using PostgreSQL (EDB)
PostgreSQL is an object-relational database system. NoSQL databases, on the other hand, are non-relational and typically document-oriented. Learn how PostgreSQL gives you flexible options to combine NoSQL workloads with relational query power by offering JSON data types. With PostgreSQL, new capabilities can be developed and plugged into the database as required.
Attend this webinar to learn:
- The new features and capabilities in PostgreSQL for new workloads, requiring greater flexibility in the data model
- NoSQL with JSON and Hstore, and their performance and features for enterprises
- Spatial SQL: advanced spatial capabilities with the PostGIS extension
"Why use PgBouncer? It’s a lightweight, easy to configure connection pooler and it does one job well. As you’d expect from a talk on connection pooling, we’ll give a brief summary of connection pooling and why it increases efficiency. We’ll look at when not to use connection pooling, and we’ll demonstrate how to configure PgBouncer and how it works. But. Did you know you can also do this? 1. Scaling PgBouncer PgBouncer is single threaded which means a single instance of PgBouncer isn’t going to do you much good on a multi-threaded and/or multi-CPU machine. We’ll show you how to add more PgBouncer instances so you can use more than one thread for easy scaling. 2. Read-write / read only routing Using different pgBouncer databases you can route read-write traffic to the primary database and route read-only traffic to a number of standby databases. 3. Load balancing When we use multiple PgBouncer instances, load balancing comes for free. Load balancing can be directed to different standbys, and weighted according to ratios of load. 4. Silent failover You can perform silent failover during promotion of a new primary (assuming you have a VIP/DNS etc that always points to the primary). 5. And even: DoS prevention and protection from “badly behaved” applications! By using distinct port numbers you can provide database connections which deal with sudden bursts of incoming traffic in very different ways, which can help prevent the database from becoming swamped during high activity periods. You should leave the presentation wondering if there is anything PgBouncer can’t do."
In this talk I'll discuss how we can combine the power of PostgreSQL with TensorFlow to perform data analysis. By using the pl/python3 procedural language we can integrate machine learning libraries such as TensorFlow with PostgreSQL, opening the door for powerful data analytics combining SQL with AI. Typical use-cases might involve regression analysis to find relationships in an existing dataset and to predict results based on new inputs, or to analyse time series data and extrapolate future data taking into account general trends and seasonal variability whilst ignoring noise. Python is an ideal language for building custom systems to do this kind of work as it gives us access to a rich ecosystem of libraries such as Pandas and Numpy, in addition to TensorFlow itself.
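A minimal sketch of the pl/python3 integration described above, using NumPy as a lightweight stand-in for a heavier TensorFlow model (the function name and signature are invented):

```sql
-- Assumes plpython3u is available and NumPy is installed for the
-- server's Python; this is an illustrative sketch, not the talk's code.
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION linear_fit(xs float8[], ys float8[])
RETURNS float8[] AS $$
    import numpy as np
    # Least-squares fit y = a*x + b; returns [slope, intercept].
    a, b = np.polyfit(xs, ys, 1)
    return [float(a), float(b)]
$$ LANGUAGE plpython3u;

-- SELECT linear_fit(ARRAY[1,2,3], ARRAY[2,4,6]);
```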
Practical Partitioning in Production with Postgres - EDB
Has your table become too large to handle? Have you thought about chopping it up into smaller pieces that are easier to query and maintain? What if it's in constant use? An introduction to the problems that can arise and how PostgreSQL's partitioning features can help, followed by a real-world scenario of partitioning an existing huge table on a live system. We will be looking at the problems caused by having very large tables in your database and how declarative table partitioning in Postgres can help. Also, how to perform dimensioning before but also after creating huge tables, partitioning key selection, the importance of upgrading to get the latest Postgres features and finally we will dive into a real-world scenario of having to partition an existing huge table in use on a production system.
There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
Internet of Things is a currently a burgeoning market, and is often associated with specialized data-stores. However PostgreSQL is just as capable at this use-case and can offer some compelling advantages. We’ll explore ways to store IoT data in PostgreSQL covering various ways to store and structure this kind of data. How range types and differing types of indexes can be of use. Also taking a quick look at some extensions designed for this use case. Then looking at powerful SQL features which can really help when analyzing IoT data streams, and how the power of a real SQL database can be a key advantage.
I would like you to join me on our journey from a complex, multi-instance Oracle topology to a single logical database in PostgreSQL. Each technology and architectural decision point will be discussed, describing how we arrived at our destination. Five key areas will be covered: - Target architecture - Migration of database objects (tables, indexes, views, synonyms, etc.) - Migration of database code (packages, functions, procedures, triggers) - Application tier - Migration of data, with minimal downtime during cutover. The target architecture is a BDR cluster, where the physical data model and the stored data differ between the logical standbys and the lead master/shadow master. We will discuss how this allowed for the simplification of the topology, and the benefits this delivered. Before you go there: yes, I know PostgreSQL does not have synonyms, so an alternative approach was needed. There is a significant amount of business logic in the database tier, all of which needed to be translated into database code. We will look at the tools and extensions available to reproduce the functionality in PostgreSQL, look at common non-ISO-standard SQL embedded in the application tier along with JDBC challenges, and finally at some of the data movement tools available. Full disclosure: we are still on the journey, but have learnt a lot on the way.
The proposed talk will go through several questions. The first obvious one is: why would I bother to learn the CLI when I can do whatever I need with a GUI tool?
We'll try to answer why knowing the CLI is a MUST for some people (like Postgres DBAs, for example) whereas it's only a bonus for others (like data scientists, for example).
Then we'll go through the psql 101 basics (how to connect, interactive versus non-interactive mode, how to set up the psql environment to work comfortably, and so on...).
The last part will be about tips and tricks that will make anyone's journey with psql more effective and enjoyable. I'm looking for the "TIL" effect in people's eyes.
EDB 13 - New Enhancements for Security and Usability - APJ (EDB)
Database security is always of paramount importance to all organizations. In this webinar, we will explore the security, usability, and portability updates of the latest version of the EDB database server and tools.
Join us in this webinar to learn:
- The new security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity, and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
In this webinar, we will discuss the differences between a physical backup and a logical backup. We will list the advantages and drawbacks, the main considerations, and the tools available for both methods.
- Data loss
- Logical exports
- Standbys
- WALs and recovery
- VM/disk snapshots
- Physical backups
- Conclusion
Come and discover Cloud Native PostgreSQL (CNP), the operator for Kubernetes, directly from the people who designed it and develop it at EDB.
CNP makes it easy to integrate PostgreSQL databases with your applications inside Kubernetes and Red Hat OpenShift Container Platform clusters, thanks to its automated management of the primary/standby architecture, which includes self-healing, failover, switchover, rolling updates, backups, and more.
During the webinar we will cover the following topics:
- DevOps and Cloud Native
- Introduction to Cloud Native PostgreSQL
- Architectures
- Main features
- Usage and configuration examples
- Kubernetes, storage and Postgres
- Demo
- Conclusions
New enhancements for security and usability in EDB 13 (EDB)
EDB 13 enhances our flagship database server and tools. This webinar will explore its security, usability, and portability updates. Join us to learn how EDB 13 can help you improve your PostgreSQL productivity and data protection.
Webinar highlights include:
- New security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
The webinar will review a multi-layered framework for PostgreSQL security, with a deeper focus on limiting access to the database and data, as well as securing the data.
Using the popular AAA (Authentication, Authorization, Auditing) framework we will cover:
- Best practices for authentication (trust, certificate, MD5, SCRAM, etc.).
- Advanced approaches, such as password profiles.
- Deep dive of authorization and data access control for roles, database objects (tables, etc), view usage, row-level security, and data redaction.
- Auditing, encryption, and SQL injection attack prevention.
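As a small illustration of the row-level security portion of the authorization deep dive (all table, column, and policy names here are hypothetical):

```sql
-- Row-level security: each database role only sees its own rows.
CREATE TABLE accounts (
    id    bigserial PRIMARY KEY,
    owner text NOT NULL,
    data  text
);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Policy: a user may only see rows whose owner matches their role name.
-- (Note: the table owner and superusers bypass RLS unless FORCE is set.)
CREATE POLICY accounts_owner_policy ON accounts
    USING (owner = current_user);
```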
Note: this session is delivered in German
Speaker:
Borys Neselovskyi, Sales Engineer, EDB
EDB Cloud Native Postgres includes database container images and a Kubernetes Operator that manage the lifecycle of a database from deployment to operations. This Kubernetes Operator for Postgres is written by EDB entirely from scratch in the Go language and relies exclusively on the Kubernetes API.
Attend this webinar to learn about:
- DevOps & Cloud Native
- Overview of Cloud Native Postgres
- Storage for Postgres workloads in Kubernetes
- Using Cloud Native Postgres
- Demo
1. PDVBV
Partitioning
Simple works Best…
Piet de Visser
Simple Oracle DBA
Piet de Visser - PDVBV
Quotes: “The limitation shows the master” (Goethe); “Simplicity is not a luxury, it is a necessity. Unfortunately, ‘complex’ solutions sell better.” (E.W. Dijkstra). (Škofja Loka-Tolmin, golden horn)
2. PDVBV
PostgreSQL
Click to edit Master title style
Logo Cloud
• Portbase
• (dutch gov)
• Shell
• Philips
• ING bank
• Nokia
• Insinger, BNP
• Etihad
• NHS
• BT
• Claritas, Nielsen
• Unilever
• Exxon
• GE
Don’t waste time on Self-Inflation… but hey, this was such a cool idea (from a marketing guy)…
Logos of my major customers over time. If you want your logo here: Hire me.
3. PDVBV
PostgreSQL
What does it look like..
Couldn’t resist… after this changing room, not allowed to take pictures anymore..
For travel pictures from various continents: some other time…
4. PDVBV
PostgreSQL
Agenda ( 45min +/- my “Dev/DBA” preso.. )
Partitioning…
Why ? … I’ve seen too many “failures”
Summary: Design !!
(see final slides. ;-) )
Top-Tip: Keep It Simple.
Discussion: Please… (I miss the live-, in person-events..)
Agenda. No longer allowed when presenting online (c.f. Connor…)
Oh, BTW: I am known for Typos.. Find a typo = get a drink..
5. PDVBV
PostgreSQL
Basics; What + Why Partitioning ?
• Partitioning: Split 1 table into “Many”
• Two Main Advantages:
• 1. Avoid WAL
• 2. Scan less data on Qrys.
• Many more… later.
– Range, List, Hash…
– Tablespaces => location of data.
– Read-only storage tiers
– Later… (next year’s ppt...)
(competitor: paid-for-EE-option…) Two main advantages, will try to illustrate both.
Pitfalls later… other advantages: later. (add coffee break…)
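The “split 1 table into many” from this slide, as a minimal declarative-partitioning sketch (table and partition names are invented, not from the demo scripts):

```sql
-- Range-partitioned parent: rows land in a partition by logdate.
CREATE TABLE measurements (
    id      bigint  NOT NULL,
    logdate date    NOT NULL,
    amt     numeric
) PARTITION BY RANGE (logdate);

-- Each partition is just a small table with a known range of data.
CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

CREATE TABLE measurements_2022 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```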
6. PDVBV
PostgreSQL
Table and Index. Conventional.
A quick illustration of table and indexes…. Data in the table is randomly spread out,
but the indexes contain ordered lists and pointers to the table-records.
[Diagram: a table with blocks 1-4 and a (global) index on ID pointing into them]
7. PDVBV
PostgreSQL
Partitioned table and (local) index.
A quick illustration of (range) partitions and local indexes…. Partitions are just small tables with
known (ranges) of data.. The database “Knows” those ranges.
[Diagram: partitions 1-4, each a smaller table with its own local index]
Smaller pieces
“known” content
Still “One Table”
Local indexes !
8. PDVBV
PostgreSQL
1st Advantage: Less WAL
• Ins / Upd / Del is “Work…”
– ~ WAL (and vacuum activity)
–Local I/O, streaming, Remote I/O…
• Delete?
–Drop or Truncate is “Much Faster”
• You Can! - Drop Partitions!
• But…
–Only if your partitioning is suitable.
–Only on “drop” or “attach/detach”
Explain deleting old data with drop-partition.
Typical use-case: ingest + removal of data with a limited lifetime in the DB.. You can save half the WAL..
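The drop-versus-delete point can be sketched as follows, assuming a hypothetical table `measurements` range-partitioned by year into `measurements_2021`, `measurements_2022`, etc.:

```sql
-- Deleting a year row-by-row generates WAL for every row,
-- plus follow-up vacuum work:
DELETE FROM measurements WHERE logdate < '2022-01-01';

-- Dropping the partition removes the same data almost instantly,
-- with minimal WAL:
DROP TABLE measurements_2021;

-- Or keep the data around as a standalone table:
ALTER TABLE measurements DETACH PARTITION measurements_2022;
```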
9. PDVBV
PostgreSQL
Drop Partition… (Fast, no WAL)
Instantaneous delete of the “range” inside a partition. Very little effort.
Note: inserts and updates will still generate WAL… and global indexes.. well, just wait.
[Diagram: partitions 1-4 with local indexes; partition 1 is dropped]
pg-# Drop Table PT_1 ;
10. PDVBV
PostgreSQL
Demo time..
• T = Table
• PT = Partitioned table
• Delete from T => WAL
• Delete from PT => still WAL..
• Drop partition => Much More Efficient..
pg=> \i pg_demo_part.sql
pg=> \i pg_demo_part_0.sql
Demo deleting (old) data with drop-partition.
Best use of partitioning IMHO. (Oracle: show problems with global index: demo_part_0a.sql)
11. PDVBV
PostgreSQL
2nd Advantage: (some) Queries Go Faster…
• Scan Less Data
–less blocks, less IO, less Cache
• Typical use-case:
–Queries / Aggregates over 1 or few Partitions.
• Anti-pattern:
–Loop over All Partitions… (later)
• Next slides: show me how..
Ideally, queries scan as little data as possible to return results fast.
Reduce the work…
12. PDVBV
PostgreSQL
Aggregates, FTS over Conventional table
Data can be all over the table..
Hence FTS or inefficient range-scan + rowid-access needed…
[Diagram: one table, blocks 1-4, data spread throughout]
• Data all over the Table..
Select Sum (amt)
Where [range]
Group by ..
• Probably FTS
13. PDVBV
PostgreSQL
Aggregates on Partitions: less data to scan?
Some (most) searches / scans can be limited to just the relevant partitions..
This Will Only Work if we can eliminate sufficient partitions. (Design!!). - note : No Indexes.
[Diagram: partitions 1-4; only the relevant partition needs a scan]
• IF… we know where to look..
• Then… FTS on…
• just 1 Part. ?
• Design !
–Know your data.
–Control your SQL
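Partition elimination from this slide can be checked with EXPLAIN; here is a sketch against a hypothetical table `measurements`, range-partitioned by `logdate` (names invented):

```sql
-- If the WHERE clause constrains the partition key, the planner can
-- prune partitions at plan time:
EXPLAIN (COSTS OFF)
SELECT date_trunc('month', logdate) AS mon, sum(amt)
FROM   measurements
WHERE  logdate >= DATE '2022-01-01'
AND    logdate <  DATE '2022-02-01'
GROUP  BY 1;
-- The plan should show a scan of only the 2022 partition,
-- not a loop over all partitions.
```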
14. PDVBV
PostgreSQL
Demo time..
• T (Table)
• PT (partitioned)
Select Range, SUM(amt)
From T/PT
Where range Between 10000 and 19999
Group by Range;
• pg=> \i pg_demo_part.sql
• pg=> \i pg_demo_part_sum.sql
This is what we will see. In demo.. -- What do we Expect ?
(don’t forget to initiate the data)
15. PDVBV
PostgreSQL
More Queries: Find Specific Records
•Where ID = :n
Find 1 record; Easy, use (local) index.
•Where Active = ’Y’
Find Multiple records, all over…
Index..? But “local” … How many Partitions ?
Global index..? Not yet....
• Anti-pattern:
–Loop over All Partitions…
When you need “Fast” return of a small set, you need an index… Global or Local
But avoid having to loop/scan many partitions…
16. PDVBV
PostgreSQL
Conventional. QRY for 1 record; on PK/UK.
A quick illustration of table and indexes…. Data in the table is randomly spread out,
but the indexes contain ordered lists and pointers to the table-records.
[Diagram: table blocks 1-4 with a (global) index on ID]
ID = 2 ?
PK lookup
17. PDVBV
PostgreSQL
Table, index… QRY for a set; Active=Y
Same situation, different index.
The few active=Y fields can be all over the table (and in all partitions..).
[Diagram: table with scattered active='Y' rows and a (global) index on active]
Active = ‘Y’ ?
Range Scan
18. PDVBV
PostgreSQL
Partitioned table + local index on PK
Searching for the PK or partition key is Easy… Visit 1 local index, and find the record.
CBO can see from the where-clause which (local) index-partition it needs…
[Diagram: lookup Id = 1 goes to exactly one partition and its local index]
PG “knows”:
Only 1 partition…
19. PDVBV
PostgreSQL
LOCAL index, active=Y…
If the SQL does not give us a clue for the partition,
we need to search through every local index… (Partition-Range-All.. looping)
[Diagram: every partition's local index on active must be visited]
Looping over.. 7 x 365 partitions..?
20. PDVBV
PostgreSQL
Demo time..
• T (conventional)
• PT (partitioned)
Select id, active
From T/PT
Where active = ‘Y’;
• Demo the (local) index.
• pg=> \i pg_demo_part.sql
• pg=> \i pg_demo_part_1.sql
This is what we will see. In demo.. -- What do we Expect ?
(don’t forget to initiate the data)
21. PDVBV
PostgreSQL
Soon: Global Indexes; …Problem ?
• Most partitioned-databases… FAILed.
– (old joke: First Attempt In Learning…)
– Some were “saved by hardware”
• Partition by date/time. But…
• PK on integer, varchar or guid.
• PK-Uniqueness enforced by Index…
• Global index… Let me illustrate…
Of the 10 or so partitioned (other) databases I’ve seen, only 2 were a straight-up success.
Some problems could be “hidden in hardware”, and some just failed…
22. PDVBV
PostgreSQL
Partitioned table; Global index; Active=‘Y’
Illustration of GLOBAL indexes… The index is now one single object, pointing to all partitions.
The impact, pro + con, of this will be shown in the next slides..
[Diagram: one global index on active pointing into all partitions]
GLOBAL index,
Points to all Parts
Table still Partitioned..
23. PDVBV
PostgreSQL
Global index; Active=‘Y’
Illustration of GLOBAL indexes…. and the ups and downs; SQL is equally efficient as on a “table”.
But point out the need for a rebuild if you drop 1 partition: 25% of the pointers are gone…
[Diagram: the global index on active serves the query directly]
Potentially effective: no looping.
24. PDVBV
PostgreSQL
Global index; Now drop a Partition…
Illustration of GLOBAL indexes… Can no longer “truncate” the index; the index points to the whole range..
On drop-partition, the WHOLE index will need a rebuild…
[Diagram: after dropping partition 1, the global index still holds its now-dead pointers]
pg-# Drop table PT_1 ;
The challenge of global indexes
25. PDVBV
PostgreSQL
Bonus-Trick: a PK-Key for Partitioning. 1/3
(not saying this is a good idea… YMMV ! )
• Partitions = mostly a “date thing”
– Not always: List-part on Cstmr-ID also happens.
• No Global Indexing
• Only 1 Unique Key
• Hence UK = PK = Partition key.
• (did I say: Up Front Design?)
If no GLOBAL index, and partition on date, then what will be my PK?
Suggestions ?
26. PDVBV
PostgreSQL
• Take a bigint – imagine 2 parts of an integer...
–Date + Sequence: YYYY DDD SSSS nnnnnn
–Date: YYYY DDD SSSS
–Sequence: nnnnnn, cycle at 999,999
• Id = “epoch” (10 digits) + seq (16 digits)
• Id = YYYY DDD SSSSS + seq (18 digits)
• Id = YYYYMMDD HH24MISS + seq (20 digits)
Also check : “GUID as PK” (@franckpachot)
•Lightbulb ?
Bonus-Trick: a PK-Key for Partitioning. 2/3
Artificial PK, order-able, unique on 1M/sec, integer hence small+efficient.
More Suggestions ? DISCUSS!!
27. PDVBV
PostgreSQL
• Two part key (64bit integer)
• Id = YYYY DDD SSSSS 000999 (18 digits, 10 bytes)
• Range partitioning on “YYYY DDD SSSSS 000000”
– EPAS : can automatically create the partitions…
• Limit all Queries on last 30 days:
– Where id > to_number ( to_char ( sysdate – 30 )…. ,
– Hence only limited nr of partitions in each query..
• Discuss ?
Bonus-Trick: a PK-Key for Partitioning. 3/3
Using the “known format” of the ID, we can have automatic (interval-) partitions,
and give each where-clause a 30-day limit. (this slide is the only one that mentions EPAS)
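One possible SQL encoding of the slide's YYYY DDD SSSSS + sequence scheme (a sketch, not the author's exact code; the sequence and function names are invented):

```sql
-- 6-digit sequence, cycling at 999,999 (roughly 1M ids/second headroom).
CREATE SEQUENCE part_id_seq MAXVALUE 999999 CYCLE;

-- id = YYYYDDD (year + day-of-year) shifted left 5 digits for SSSSS
--      (seconds since midnight), then 6 more digits for the sequence:
--      18 digits total, fits a bigint, and sorts in time order.
SELECT to_char(now(), 'YYYYDDD')::bigint * 100000 * 1000000
     + (extract(epoch FROM now())::bigint % 86400) * 1000000
     + nextval('part_id_seq') AS new_id;

-- A "last 30 days" restriction then becomes a range on the id itself,
-- so only a limited number of partitions appear in each query:
-- WHERE id >= to_char(now() - interval '30 days', 'YYYYDDD')::bigint
--             * 100000 * 1000000
```

Since the date is the leading part of the key, range partitioning on the id and partition pruning both work without a separate date column.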
28. PDVBV
PostgreSQL
Summary (the watch of the cstmr)
• Partitioning: Only From Design.
• 1. Less WAL (on drop/attach/detach)
• 2. Faster Queries (need the Partition Key)
• Use(ful) Cases:
– Time Series / Audit data
– Fast Moving data (batch-deletions…)
– List partitioning = Sharding (discuss !)
• Know + Control your Database + App.
In my opinion: For Large sets of fast moving, time-ordered data. Save on Redo, Optimize SQL.
You must understand the limitations! (before digging deeper… )
29. PDVBV
PostgreSQL
Pitfalls; What to Avoid…
• Avoid Global Indexes
–Extra work on drop-partition
• Avoid “Partition Range All”
–Looping, multiplies the work…
• Consequence:
–All Qries Need “The Part-Key”
• Up Front Design!
Two main advantages, will try to illustrate both.
Pitfalls later… other advantages: later.
30. PDVBV
PostgreSQL
Interesting Times Ahead…
• Many Improvements
–(global indexes – soon ?)
• Many Features, Possibilities
–Global Indexes
–List-Partitioning (= Sharding… ?)
–Storage tiers, compression…
• Discuss
–What should be in next year’s ppt...?
Watch this space… Lots of interesting new features + tricks.
Would love to test some of those for Real… But. Beware of over-engineering.
31. PDVBV
PostgreSQL
Don’t Take my word for it…
RTFM: start there!
Test, Play, Test…
@sdjh2000 (Hermann Baer @ vendor)
Simplicity
– In case of doubt: Simplify!
SimpleOracleDba . Blogspot . com
@pdevisser (twitter)
Firefox
literature
Goethe (simplicity)
The majority of times, I have been WRONG. So go see for yourself - but don’t complicate life.
Favorite quote: “Simplicity shows the Master”.
32. PDVBV
PostgreSQL
Quick Q & A (3 min ;-) 3 .. 2 .. 1 .. Zero
• Questions ?
• Reactions ?
• Experiences from the audience ?
• @pdevisser (twitter..)
Question and Answer time. Discussion welcome (what about that Razor?)
Teach me something: Tell me where you do NOT AGREE.
Thank You !
33. PDVBV
He got it …
As Simple as Possible, but not too simple
Simplicity is a Requirement - but Complexity just sells better (EWD).
35. PDVBV
Intermezzo: End of Part-1…
• After the break…
• Q+A, if any
• Bonus Trick PK; Avoid Global Index
• Some Ref-Partitioning, Quirks
• Discussion time…
There is more..
36. PDVBV
Partitioning – P2
Positives and Pitfalls…
Piet de Visser
Simple Oracle DBA
Piet de Visser - PDVBV
Quotes: “The Limitation shows the master” (Goethe); “Simplicity is not a luxury, it is a necessity. Unfortunately, ‘complex’ solutions sell better.” (E.W. Dijkstra). (Škofja Loka - Tolmin, golden horn)
37. PDVBV
• Use-Case: Parent and Child Tables…
– E.g. “Document” and “Properties”
– (big data.. NoSQL ? )
• Note: going back to a Hierarchical data model
– With benefits of “RDBMS” (and less data “in JSON”)
• Discuss ?
– Stricter checking, Better Data Quality!
– BDUF ?
– You need SDUF (Some Design Up Front)
Ref-Partitioning… 1/n
Ref-partitioning can be used on “hierarchies”, for example if your data is “a document”.
But only if you can do some design up front.
38. PDVBV
Ref Partitioning 2/n
Hierarchy of ref-partitioned tables, 3 levels…
*I realize I need better drawing for this… imagine the indexes…
MMT: Parent Table
MMT_CHD
MMT_CHD_CHD
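PostgreSQL has no Oracle-style reference partitioning, so the usual workaround is to carry the partition key down into the child table and partition it by the same ranges. A minimal two-level sketch using the MMT names from the slide (column names and the `part_dt` key are assumptions; PostgreSQL 12 supports foreign keys that reference partitioned tables):

```sql
CREATE TABLE mmt (
    id       bigint NOT NULL,
    part_dt  date   NOT NULL,
    PRIMARY KEY (id, part_dt)          -- the PK must include the partition key
) PARTITION BY RANGE (part_dt);

CREATE TABLE mmt_chd (
    id       bigint NOT NULL,
    mmt_id   bigint NOT NULL,
    part_dt  date   NOT NULL,          -- partition key denormalised from the parent
    PRIMARY KEY (id, part_dt),
    FOREIGN KEY (mmt_id, part_dt) REFERENCES mmt (id, part_dt)
) PARTITION BY RANGE (part_dt);
```

The same pattern repeats for a third level (MMT_CHD_CHD), each level partitioned on the same key so parent and child rows land in matching partitions.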
39. PDVBV
• Demo: SQL > @demo_part_r1
• Global Index came back to haunt us..
– Default indexes (even for partition-key-PK) …. Global
– Default indexes on dependent-tables… Global.
• Check indexes in SQLDeveloper..
• Demo: SQL > @demo_part_r2
• Discuss ?
Ref-Partitioning… 3/3
Instead of stuffing everything in one or several JSON columns, use real tables+columns..
Devs don’t like the limitation of “Design”.
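For comparison: in PostgreSQL every index on a partitioned table is effectively local (one physical index per partition), and a unique index must include the partition key. A sketch against the hypothetical `mmt` table from the slides (assumed range-partitioned on `part_dt`):

```sql
-- OK: the unique index covers the partition key:
CREATE UNIQUE INDEX mmt_uq ON mmt (id, part_dt);

-- Fails, because it does not include the partition key:
-- CREATE UNIQUE INDEX mmt_id_only ON mmt (id);
-- ERROR:  unique constraint on partitioned table must include
--         all partitioning columns
```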
40. PDVBV
Interesting Times Ahead…
• Many Improvements..
–(global indexes – are improving !!)
• Many Other New Features.
–Partial indexing
–Hybrid Partitioned-tbls…. Wow ??!
• Discuss
–What should be in next year’s ppt...
Watch this space… Lots of interesting new features + tricks.
Would love to test some of those for Real… But. Beware of over-engineering.