There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...InfluxData
Flux was designed to work across databases and data stores. In this talk, Adam will walk through the steps necessary for you to add your own database or custom data source to Flux.
InfluxDB 1.0 - Optimizing InfluxDB by Sam DillardInfluxData
Learn how to optimize InfluxDB 1.0 for performance including hardware and architecture choices, schema design, configuration setup, and running queries. In this InfluxDays NYC 2019 presentation, Sam Dillard provides numerous actionable tips and insights into InfluxDB optimization.
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
Justin and I gave this talk in QCon SF 2014 about the Mantis, a stream processing system that features a reactive programming API, auto scaling, and stream locality
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
Dean will provide practical tips and techniques learned from helping hundreds of customers deploy InfluxDB and InfluxDB Enterprise. This includes hardware and architecture choices, schema design, configuration setup, and running queries.
Wayfair Use Case: The four R's of Metrics DeliveryInfluxData
Wayfair currently uses both Graphite and InfluxDB as a time series platform - The data is used by their developers, business stakeholders, by their internal alerting engine. Most importantly their 24x7 Ops Monitoring Center is using this data to constantly analyze the vital signs of Wayfair’s IT infrastructure and storefront operations.
InfluxQL is a powerful query language for InfluxDB, and TICKScript is a domain specific language used by Kapacitor to define tasks involving the extraction, transformation and loading of data and also involving the tracking of arbitrary changes and detection of events within data. The combination of these two can make your monitoring apps powerful. During this session, InfluxData Engineer Michael DeSa will share best practices for using these powerful tools. Prerequisite: Intro To Kapacitor.
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...InfluxData
Flux was designed to work across databases and data stores. In this talk, Adam will walk through the steps necessary for you to add your own database or custom data source to Flux.
InfluxDB 1.0 - Optimizing InfluxDB by Sam DillardInfluxData
Learn how to optimize InfluxDB 1.0 for performance including hardware and architecture choices, schema design, configuration setup, and running queries. In this InfluxDays NYC 2019 presentation, Sam Dillard provides numerous actionable tips and insights into InfluxDB optimization.
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
Justin and I gave this talk in QCon SF 2014 about the Mantis, a stream processing system that features a reactive programming API, auto scaling, and stream locality
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
Dean will provide practical tips and techniques learned from helping hundreds of customers deploy InfluxDB and InfluxDB Enterprise. This includes hardware and architecture choices, schema design, configuration setup, and running queries.
Wayfair Use Case: The four R's of Metrics DeliveryInfluxData
Wayfair currently uses both Graphite and InfluxDB as a time series platform - The data is used by their developers, business stakeholders, by their internal alerting engine. Most importantly their 24x7 Ops Monitoring Center is using this data to constantly analyze the vital signs of Wayfair’s IT infrastructure and storefront operations.
InfluxQL is a powerful query language for InfluxDB, and TICKScript is a domain specific language used by Kapacitor to define tasks involving the extraction, transformation and loading of data and also involving the tracking of arbitrary changes and detection of events within data. The combination of these two can make your monitoring apps powerful. During this session, InfluxData Engineer Michael DeSa will share best practices for using these powerful tools. Prerequisite: Intro To Kapacitor.
Creating and Using the Flux SQL Datasource | Katy Farmer | InfluxData InfluxData
This talk introduces the SQL data source for Flux. It will start with examples of using data from MySQL or Postgres with time series data from InfluxDB. It will then go over the details of how the SQL data source was created.
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
Slides for the Webinar, presented on March 6, 2019
For the webinar video visit https://www.altinity.com/
Extracting business insight from massive pools of machine-generated data is the central analytic problem of the digital era. ClickHouse data warehouse addresses it with sub-second SQL query response on petabyte-scale data sets. In this talk we'll discuss the features that make ClickHouse increasingly popular, show you how to install it, and teach you enough about how ClickHouse works so you can try it out on real problems of your own. We'll have cool demos (of course) and gladly answer your questions at the end.
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafInfluxData
Did you know you can use InfluxDB to monitor your BBQ and to ensure the tastiest results? Join this meetup to learn two different approaches to using a time series database to monitor a BBQ or a smoker. Learn how Will Cooke uses Python, MQTT, Telegraf and InfluxDB 2.0 to monitor his smoker and to gain insight into temperature changes, the stall, and other important stats about his brisket. Scott Anderson will demonstrate how he uses a FireBoard wireless thermometer, Telegraf and InfluxDB 2.0 to continuously work towards the perfect smoke.
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit
Talk Abstract
As with all open-source databases, Accumulo developers often compete between building exciting new features and hacking on performance and stability. As the core features solidify and expand, we see many opportunities to improve performance. An effective methodology for performance improvement is scientific in nature, and follows a well-definite modeling and simulation approach, matching theory to experimentation in an iterative fashion.
Ingest performance is one of the most differentiating characteristics of Accumulo. However, there is much room for improvement for typical ingest-heavy applications. Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest. In bulk ingest, the goal is to maximize throughput without constraining latency. Bulk ingest involves creating a set of files that conform to Accumulo's internal RFile format and then registering those files with Accumulo. MapReduce provides a framework for generating, sorting, and storing key/value pairs, which form the primary elements of preparing RFiles for bulk ingest. MapReduce has been used many times over the years to break sorting records, such as Terasort. We can expect it is a reasonable choice for maximizing bulk ingest throughput. However, the theory often proves challenging to implement as there are many performance pitfalls along the way.
In this talk, we dive deep into optimizing MapReduce for Accumulo bulk ingest. We share detailed theoretical and empirical performance models, we discuss techniques for profiling performance, and we suggest reusable techniques for squeezing the maximum performance out of enterprise-grade Accumulo bulk ingest.
Speaker
Chris McCubbin
Director of Data Science, Sqrrl
Chris is the Director of Data Science for Sqrrl. He has extensive experience with the Hadoop ecosystem and applying scientific computation algorithms to real-world datasets. Previously, Chris developed Big Data analysis tools for the Intelligence Community and applied artificial intelligence techniques to unmanned vehicle systems. He holds a MS in Computer Science and BS in Computer Science and Mathematics from the University of Maryland.
Kapacitor - Real Time Data Processing EnginePrashant Vats
Kapacitor is a native data processing engine.Kapacitor is a native data processing engine.It can process both stream and batch data from InfluxDB.It lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds. Key Kapacitor Capabilities
-Alerting
-ETL (Extraction, Transformation and Loading)
-Action Oriented
-Streaming Analytics
-Anomaly Detection
Kapacitor uses a DSL (Domain Specific Language) called TICKscript to define tasks.
A guide for what to pay attention to when it comes to monitoring and managing your PostgreSQL database. The content is targeted at application developers as opposed to the typical DBA.
Paul will outline his vision around the platform and give the latest updates on IFQL ( a new query language), the decoupling of query and storage, the impact of hybrid cloud environments on architecture, cardinality, and discuss the technical directions of the platform. This talk will walk through the vision and architecture with demonstrations of working prototypes of the projects.
codecentric AG: Using Cassandra and Clojure for Data Crunching backendsDataStax Academy
Choosing tooling for Data Crunching and Analytics is no easy task. Understanding the way Cassandra works and treats the data can open up a lot of opportunities for optimization in your back-end. Learn how we developed a Smart Platform for complex, multidimensional analytics using the best available tools.
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco SlotCitus Data
One of the unique things about postgres is that extensions can add new functionality that falls outside the scope of a SQL database. Several postgres extensions add the ability to perform commands or query data on other servers. These extensions can be combined in interesting ways to form advanced distributed systems on top of postgres.
In this talk we will explore how extensions such as dblink, postgres_fdw, pglogical, pg_cron, and citus together with PL/pgSQL can be used as building blocks for distributed systems. We will give several demonstrations of using PostgreSQL as a distributed computing platform, including a MapReduce implementation that can transform very large tables, and a Kafka-like distributed queue.
Are you using the fastest query tool for Hadoop? Provide and discuss the latest performance results of the industry standard TPC_H benchmarks executed across an assortment of open source query tools such as Hive (using MR, TEZ, LLAP, SPARK), SparkSQL, Presto, and Drill. Additionally, the performance tests will utilize a variety of data sizes and popular storage formats such as ORC, Parquet and Text and compression codecs.
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit
Talk Abstract
Accumulo has a solid theoretical foundation, endowing it with huge scalability, high reliability, and the makings of class-leading performance for NoSQL operations. Several publications show Accumulo achieving multi-petabyte scalability and outperforming other databases in its class by orders of magnitude. However, there are challenges arising in practice that slow down that performance and introduce bottlenecks.
The root of Accumulo's distributed scale and performance while maintaining consistency comes from a multi-level amplification. Zookeeper bootstraps the consistency with a highly durable quorum. The Accumulo root table uses buffering and caching to boost that performance for sorted key/value operations. With the metadata tablets and data tables, Accumulo continues to boost performance and divides and conqures a highly scalable key/value space to leverage the resources of a large cluster. The challenge arrises when metadata operations at the core of Accumulo bottleneck performance for the entire cluster.
In this talk we will describe the Accumulo metadata operations model in detail. With a couple of prototypical application scenarios, we will show a few areas that are current bottlenecks or that we can expect to be bottlenecks in the near future. We will also propose modifications to the current model and outline projects that the community can take on to keep Accumulo in the lead for performance and scalability.
Speaker
Adam Fuchs
Chief Technology Officer, Sqrrl
As the Chief Technology Officer and co-founder of Sqrrl, Adam Fuchs is responsible for ensuring that Sqrrl is leading the world in Big Data Infrastructure technology. Previously at the National Security Agency, Adam was an innovator and technical director for several database projects, handling some of the world’s largest and most diverse data sets. He is a co-founder of the Apache Accumulo project. Adam has a BS in Computer Science from the University of Washington and has completed extensive graduate-level course work at the University of Maryland.
Creating and Using the Flux SQL Datasource | Katy Farmer | InfluxData InfluxData
This talk introduces the SQL data source for Flux. It will start with examples of using data from MySQL or Postgres with time series data from InfluxDB. It will then go over the details of how the SQL data source was created.
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
Slides for the Webinar, presented on March 6, 2019
For the webinar video visit https://www.altinity.com/
Extracting business insight from massive pools of machine-generated data is the central analytic problem of the digital era. ClickHouse data warehouse addresses it with sub-second SQL query response on petabyte-scale data sets. In this talk we'll discuss the features that make ClickHouse increasingly popular, show you how to install it, and teach you enough about how ClickHouse works so you can try it out on real problems of your own. We'll have cool demos (of course) and gladly answer your questions at the end.
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafInfluxData
Did you know you can use InfluxDB to monitor your BBQ and to ensure the tastiest results? Join this meetup to learn two different approaches to using a time series database to monitor a BBQ or a smoker. Learn how Will Cooke uses Python, MQTT, Telegraf and InfluxDB 2.0 to monitor his smoker and to gain insight into temperature changes, the stall, and other important stats about his brisket. Scott Anderson will demonstrate how he uses a FireBoard wireless thermometer, Telegraf and InfluxDB 2.0 to continuously work towards the perfect smoke.
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit
Talk Abstract
As with all open-source databases, Accumulo developers often compete between building exciting new features and hacking on performance and stability. As the core features solidify and expand, we see many opportunities to improve performance. An effective methodology for performance improvement is scientific in nature, and follows a well-definite modeling and simulation approach, matching theory to experimentation in an iterative fashion.
Ingest performance is one of the most differentiating characteristics of Accumulo. However, there is much room for improvement for typical ingest-heavy applications. Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest. In bulk ingest, the goal is to maximize throughput without constraining latency. Bulk ingest involves creating a set of files that conform to Accumulo's internal RFile format and then registering those files with Accumulo. MapReduce provides a framework for generating, sorting, and storing key/value pairs, which form the primary elements of preparing RFiles for bulk ingest. MapReduce has been used many times over the years to break sorting records, such as Terasort. We can expect it is a reasonable choice for maximizing bulk ingest throughput. However, the theory often proves challenging to implement as there are many performance pitfalls along the way.
In this talk, we dive deep into optimizing MapReduce for Accumulo bulk ingest. We share detailed theoretical and empirical performance models, we discuss techniques for profiling performance, and we suggest reusable techniques for squeezing the maximum performance out of enterprise-grade Accumulo bulk ingest.
Speaker
Chris McCubbin
Director of Data Science, Sqrrl
Chris is the Director of Data Science for Sqrrl. He has extensive experience with the Hadoop ecosystem and applying scientific computation algorithms to real-world datasets. Previously, Chris developed Big Data analysis tools for the Intelligence Community and applied artificial intelligence techniques to unmanned vehicle systems. He holds a MS in Computer Science and BS in Computer Science and Mathematics from the University of Maryland.
Kapacitor - Real Time Data Processing EnginePrashant Vats
Kapacitor is a native data processing engine.Kapacitor is a native data processing engine.It can process both stream and batch data from InfluxDB.It lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds. Key Kapacitor Capabilities
-Alerting
-ETL (Extraction, Transformation and Loading)
-Action Oriented
-Streaming Analytics
-Anomaly Detection
Kapacitor uses a DSL (Domain Specific Language) called TICKscript to define tasks.
A guide for what to pay attention to when it comes to monitoring and managing your PostgreSQL database. The content is targeted at application developers as opposed to the typical DBA.
Paul will outline his vision around the platform and give the latest updates on IFQL ( a new query language), the decoupling of query and storage, the impact of hybrid cloud environments on architecture, cardinality, and discuss the technical directions of the platform. This talk will walk through the vision and architecture with demonstrations of working prototypes of the projects.
codecentric AG: Using Cassandra and Clojure for Data Crunching backendsDataStax Academy
Choosing tooling for Data Crunching and Analytics is no easy task. Understanding the way Cassandra works and treats the data can open up a lot of opportunities for optimization in your back-end. Learn how we developed a Smart Platform for complex, multidimensional analytics using the best available tools.
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco SlotCitus Data
One of the unique things about postgres is that extensions can add new functionality that falls outside the scope of a SQL database. Several postgres extensions add the ability to perform commands or query data on other servers. These extensions can be combined in interesting ways to form advanced distributed systems on top of postgres.
In this talk we will explore how extensions such as dblink, postgres_fdw, pglogical, pg_cron, and citus together with PL/pgSQL can be used as building blocks for distributed systems. We will give several demonstrations of using PostgreSQL as a distributed computing platform, including a MapReduce implementation that can transform very large tables, and a Kafka-like distributed queue.
Are you using the fastest query tool for Hadoop? Provide and discuss the latest performance results of the industry standard TPC_H benchmarks executed across an assortment of open source query tools such as Hive (using MR, TEZ, LLAP, SPARK), SparkSQL, Presto, and Drill. Additionally, the performance tests will utilize a variety of data sizes and popular storage formats such as ORC, Parquet and Text and compression codecs.
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit
Talk Abstract
Accumulo has a solid theoretical foundation, endowing it with huge scalability, high reliability, and the makings of class-leading performance for NoSQL operations. Several publications show Accumulo achieving multi-petabyte scalability and outperforming other databases in its class by orders of magnitude. However, there are challenges arising in practice that slow down that performance and introduce bottlenecks.
The root of Accumulo's distributed scale and performance while maintaining consistency comes from a multi-level amplification. Zookeeper bootstraps the consistency with a highly durable quorum. The Accumulo root table uses buffering and caching to boost that performance for sorted key/value operations. With the metadata tablets and data tables, Accumulo continues to boost performance and divides and conqures a highly scalable key/value space to leverage the resources of a large cluster. The challenge arrises when metadata operations at the core of Accumulo bottleneck performance for the entire cluster.
In this talk we will describe the Accumulo metadata operations model in detail. With a couple of prototypical application scenarios, we will show a few areas that are current bottlenecks or that we can expect to be bottlenecks in the near future. We will also propose modifications to the current model and outline projects that the community can take on to keep Accumulo in the lead for performance and scalability.
Speaker
Adam Fuchs
Chief Technology Officer, Sqrrl
As the Chief Technology Officer and co-founder of Sqrrl, Adam Fuchs is responsible for ensuring that Sqrrl is leading the world in Big Data Infrastructure technology. Previously at the National Security Agency, Adam was an innovator and technical director for several database projects, handling some of the world’s largest and most diverse data sets. He is a co-founder of the Apache Accumulo project. Adam has a BS in Computer Science from the University of Washington and has completed extensive graduate-level course work at the University of Maryland.
How the query planner in PostgreSQL works? Index access methods, join execution types, aggregation & pipelining. Optimizing queries with WHERE conditions, ORDER BY and GROUP BY. Composite indexes, partial and expression indexes. Exploiting assumptions about data and denormalization.
Abstract:
Many machine learning algorithms can be implemented to run parallel operations on graphics cards. Deeplearning4j is a Java-based machine learning library, which includes implementations of many popular neural-network algorithms. Deeplearning4j uses uses a library called Nd4j to run matrix algebra operations on either CPUs or GPUs with NVIDIA’s CUDA API.
In this talk, I will show how to get a simple machine learning algorithm running on the GPU. I will also cover how to get started with CUDA development: how to get your code to run on the GPU, how to monitor the device, and how to write code to make effective use of parralelization.
Bio: Gary Sieling is a Lead Software Engineer at IQVIA, in Blue Bell, PA, with an interests in database technologies, machine learning, and software engineering practices. He has been involved in curating talks for a company lunch and learn program and the organizing committee for a tech conference. Building on these experiences, he built a search engine called FindLectures.com to help find great talks and speakers.
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
While losing to Oracle in features, it's losing marginally. While not so long on the market, it's still second best. While not so funky and shiny like new NoSQL DBs, it's arguably most shiny of all relational DBs and it has a colourful history. So, let me tell you about Postgresql architecture and internals, walk you through query path and optimization, let me hint about no hinting and how and why, in another thread we'll talk about MVCC and vacuum and if there will be time for more, we'll have a round of questions.
Speaker: Oscar Westra van Holthe - Kind
Genre & level: Backend, Medior
Data congestion slows down even the fastest database. Want to avoid getting your data jammed? I’ll show you how you can turn your database into an information highway. Get your recipes for query performance improvements here.
This is the speech Shen Li gave at GopherChina 2017.
TiDB is an open source distributed database. Inspired by the design of Google F1/Spanner, TiDB features in infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for data storage and analysis.
In this talk, we will mainly cover the following topics:
- What is TiDB
- TiDB Architecture
- SQL Layer Internal
- Golang in TiDB
- Next Step of TiDB
Did you know that Python preallocates integers from -5 to 257? Reusing them 1000 times, instead of allocating memory for a bigger integer, can save you a couple milliseconds of code’s execution time. If you want to learn more about this kind of optimizations then, … well, probably this presentation is not for you :) Instead of going into such small details, I will talk about more “sane” ideas for writing faster code.
After a brief overview of how you can speed up your Python code in general, we will dig into source code optimization. I will show you some simple and fast ways of measuring the execution time of your code, and then we will discuss examples of how to improve some common code structures.
You will see:
* The fastest way of removing duplicates from a list
* How much faster your code is when you reuse the built-in functions instead of trying to reinvent the wheel
* What is faster than the “for loop”
* If the lookup is faster in a list or a set
* When it’s better to beg for forgiveness than to ask for permission
GALE: Geometric active learning for Search-Based Software EngineeringCS, NcState
Multi-objective evolutionary algorithms (MOEAs) help software engineers find novel solutions to complex problems. When automatic tools explore too many options, they are slow to use and hard to comprehend. GALE is a near-linear time MOEA that builds a piecewise approximation to the surface of best solutions along the Pareto frontier. For each piece, GALE mutates solutions towards the better end. In numerous case studies, GALE finds comparable solutions to standard methods (NSGA-II, SPEA2) using far fewer evaluations (e.g. 20 evaluations, not 1,000). GALE is recommended when a model is expensive to evaluate, or when some audience needs to browse and understand how an MOEA has made its conclusions.
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...GeeksLab Odessa
SubScript - это расширение языка Scala, добавляющее поддержку конструкций и синтаксиса аглебры общающихся процессов (Algebra of Communicating Processes, ACP). SubScript является перспективным расширением, применимым как для разработки высоконагруженных параллельных систем, так и для простых персональных приложений.
This one is about advanced indexing in PostgreSQL. It guides you through basic concepts as well as through advanced techniques to speed up the database.
All important PostgreSQL Index types explained: btree, gin, gist, sp-gist and hashes.
Regular expression indexes and LIKE queries are also covered.
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
Moving to the cloud is hard, and moving Postgres databases to the cloud is even harder. Public cloud or private cloud? Infrastructure as a Service (IaaS), or Platform as a Service (PaaS)? Kubernetes for the application, or for the database and the application? This talk will juxtapose self-managed Kubernetes and container-based database solutions, Postgres deployments on IaaS, and Postgres DBaaS solutions of which EDB’s DBaaS BigAnimal is the latest example.
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
Dieses Webinar hilft Ihnen, die Unterschiede zwischen den verschiedenen Replikationsansätzen zu verstehen, die Anforderungen der jeweiligen Strategie zu erkennen und sich über die Möglichkeiten klar zu werden, was mit jeder einzelnen zu erreichen ist. Damit werden Sie hoffentlich eher in der Lage sein, herauszufinden, welche PostgreSQL-Replikationsarten Sie wirklich für Ihr System benötigen.
- Wie physische und logische Replikation in PostgreSQL funktionieren
- Unterschiede zwischen synchroner und asynchroner Replikation
- Vorteile, Nachteile und Herausforderungen bei der Multi-Master-Replikation
- Welche Replikationsstrategie für unterschiedliche Use-Cases besser geeignet ist
Referent:
Borys Neselovskyi, Regional Sales Engineer DACH, EDB
------------------------------------------------------------
For more #webinars, visit http://bit.ly/EDB-Webinars
Download free #PostgreSQL whitepapers: http://bit.ly/EDB-Whitepapers
Read our #Postgres Blog http://bit.ly/EDB-Blogs
Follow us on Facebook at http://bit.ly/EDB-FB
Follow us on Twitter at http://bit.ly/EDB-Twitter
Follow us on LinkedIn at http://bit.ly/EDB-LinkedIn
Reach us via email at marketing@enterprisedb.com
Cuando busca alternativas a Oracle en la nube, hacer el cambio puede parecer un trabajo duro. Entendemos que la migración involucra más que solo la base de datos. La compatibilidad es un punto clave, especialmente cuando se consideran los recursos que posiblemente ya haya invertido en Oracle, como por ejemplo el código de aplicación específico de Oracle.Este seminario web explorará las opciones y las principales consideraciones al pasar de las bases de datos de Oracle a la nube.
- Revisión detallada de las ofertas de bases de datos disponibles en la nube
- Factores críticos que se deben considerar considerar para elegir la oferta en la nube más adecuada
- Cómo la experiencia de EDB con PostgreSQL puede ayudarlo en su decisión
- Demostración de BigAnimal de EDB
Présentateur:
Sergio Romera, Senior Sales Engineer EMEA, EDB
------------------------------------------------------------
For more #webinars, visit http://bit.ly/EDB-Webinars
Download free #PostgreSQL whitepapers: http://bit.ly/EDB-Whitepapers
Read our #Postgres Blog http://bit.ly/EDB-Blogs
Follow us on Facebook at http://bit.ly/EDB-FB
Follow us on Twitter at http://bit.ly/EDB-Twitter
Follow us on LinkedIn at http://bit.ly/EDB-LinkedIn
Reach us via email at marketing@enterprisedb.com
Database come PostgreSQL non possono girare su Kubernetes. Questo è il ritornello che sentiamo continuamente, ma al tempo stesso la motivazione per noi di EDB di abbattere questo muro, una volta per tutte.
In questo webinar parleremo della nostra avventura finora per portare PostgreSQL su Kubernetes. Scopri perché crediamo che fare benchmark di storage e del database prima di andare in produzione porti a una più sana e longeva vita di un DBMS, anche su Kubernetes.
Condivideremo il nostro processo, i risultati fin qui ottenuti e sveleremo i nostri piani per il futuro con Cloud Native PostgreSQL.
Las Variaciones de la Replicación de PostgreSQLEDB
Replicación física, replicación lógica, síncrona, asíncrona, multi-maestro, escalabilidad horizontal, etc. Son muchos los términos asociados con la replicación de bases de datos. En esta charla revisaremos los conceptos fundamentales detrás de cada variación de la replicación de PostgreSQL, y en qué casos conviene usar una o la otra. La presentación incluye una parte práctica con demostraciones aunque no será un tutorial sobre como configurar un cluster. El enfoque está en entender cada variación para elegir la mejor dependiendo del caso de uso.
Cosas que aprenderán:
- Cómo funciona la replicación física en PostgreSQL
- Cómo funciona la replicación lógica en PostgreSQL
- Diferencias entre replicación síncrona y asíncrona
- Qué es replicación multi-maestro
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
PostgreSQL is an object-relational database system. NoSQL on the other hand is a non-relational database and is document-oriented. Learn how the PostgreSQL database gives one the flexible options to combine NoSQL workloads with the relational query power by offering JSON data types. With PostgreSQL, new capabilities can be developed and plugged into the database as required.
Attend this webinar to learn:
- The new features and capabilities in PostgreSQL for new workloads, requiring greater flexibility in the data model
- NoSQL with JSON, Hstore and its performance and features for enterprises
- Spatial SQL - advanced features in PostGIS application with PostGIS extension
"Why use PgBouncer? It’s a lightweight, easy to configure connection pooler and it does one job well. As you’d expect from a talk on connection pooling, we’ll give a brief summary of connection pooling and why it increases efficiency. We’ll look at when not to use connection pooling, and we’ll demonstrate how to configure PgBouncer and how it works. But. Did you know you can also do this? 1. Scaling PgBouncer PgBouncer is single threaded which means a single instance of PgBouncer isn’t going to do you much good on a multi-threaded and/or multi-CPU machine. We’ll show you how to add more PgBouncer instances so you can use more than one thread for easy scaling. 2. Read-write / read only routing Using different pgBouncer databases you can route read-write traffic to the primary database and route read-only traffic to a number of standby databases. 3. Load balancing When we use multiple PgBouncer instances, load balancing comes for free. Load balancing can be directed to different standbys, and weighted according to ratios of load. 4. Silent failover You can perform silent failover during promotion of a new primary (assuming you have a VIP/DNS etc that always points to the primary). 5. And even: DoS prevention and protection from “badly behaved” applications! By using distinct port numbers you can provide database connections which deal with sudden bursts of incoming traffic in very different ways, which can help prevent the database from becoming swamped during high activity periods. You should leave the presentation wondering if there is anything PgBouncer can’t do."
In this talk I'll discuss how we can combine the power of PostgreSQL with TensorFlow to perform data analysis. By using the pl/python3 procedural language we can integrate machine learning libraries such as TensorFlow with PostgreSQL, opening the door for powerful data analytics combining SQL with AI. Typical use-cases might involve regression analysis to find relationships in an existing dataset and to predict results based on new inputs, or to analyse time series data and extrapolate future data taking into account general trends and seasonal variability whilst ignoring noise. Python is an ideal language for building custom systems to do this kind of work as it gives us access to a rich ecosystem of libraries such as Pandas and Numpy, in addition to TensorFlow itself.
Practical Partitioning in Production with PostgresEDB
Has your table become too large to handle? Have you thought about chopping it up into smaller pieces that are easier to query and maintain? What if it's in constant use? An introduction to the problems that can arise and how PostgreSQL's partitioning features can help, followed by a real-world scenario of partitioning an existing huge table on a live system. We will be looking at the problems caused by having very large tables in your database and how declarative table partitioning in Postgres can help. Also, how to perform dimensioning before but also after creating huge tables, partitioning key selection, the importance of upgrading to get the latest Postgres features and finally we will dive into a real-world scenario of having to partition an existing huge table in use on a production system.
Internet of Things is a currently a burgeoning market, and is often associated with specialized data-stores. However PostgreSQL is just as capable at this use-case and can offer some compelling advantages. We’ll explore ways to store IoT data in PostgreSQL covering various ways to store and structure this kind of data. How range types and differing types of indexes can be of use. Also taking a quick look at some extensions designed for this use case. Then looking at powerful SQL features which can really help when analyzing IoT data streams, and how the power of a real SQL database can be a key advantage.
I would like you to join me on our journey from a complex, multi instance Oracle topology to a single logical database in PostgreSQL. Each technology and architectural decision point will be discussed describing how we arrived at our destination. There are five keys areas that will be covered: - Target architecture - Migration of database objects (tables, indexes, views, synonyms, etc) - Migration of database code (packages, functions, procedures, triggers) - Application tier - Migration of Data - with minimal downtime during cutover The target architecture is a BDR cluster, where the physical data model and data stored is different between the logical standbys and the lead master/shadow master. Will discuss how this allowed for the simplification of the topology, and the benefits this delivered. Before you go there, yes I know PostgreSQL does no have synonyms, but an alternative approach was needed. There is a significant amount of business logic in the database tier all of which needed to be translated into database code. Will look at the tools and extensions available to reproduce the functionality in PostgreSQL. Look at common non-ISO standard SQL embedded in the application tier, along with jdbc challenges. Finally a look at some of the data movement tools available. Full disclosure, we are still on the journey but have learnt a lot on the way.
The proposed talk will go through several questions. The first obvious one is why would I bother learn CLI when I can do whatever I need with a GUI tool?
We'll try to answer why knowing CLI is a MUST for some people (like Postgres DBAs, for example) whereas it's only a bonus for others (like data scientists, for example).
Then we'll go through the basics about 101 (how to connect to, interactive mode versus not interactive more, how to set psql environment to work comfortably and so on...)
The last part will be about tips and tricks that will make anyone's journey with psql more effective and enjoyable. I'm looking for the "TIL" effect in people's eyes.
EDB 13 - New Enhancements for Security and Usability - APJEDB
Database security is always of paramount importance to all organizations. In this webinar, we will explore the security, usability, and portability updates of the latest version of the EDB database server and tools.
Join us in this webinar to learn:
- The new security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity, and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
Dans ce webinar, nous allons parler des différences entre une sauvegarde physique et une sauvegarde logique. Nous allons lister les avantages et inconvénients, les principales considérations et les outils disponibles pour les deux méthodes.
- Perte de données
- Exports logiques
- Standbys
- WALs et Recovery
- Snapshots VM/Disques
- Sauvegardes physique
- Conclusion
Vieni a scoprire Cloud Native PostgreSQL (CNP), l’operatore per Kubernetes, direttamente da coloro che lo hanno ideato e lo sviluppano in EDB.
CNP facilita l’integrazione di database PostgreSQL con le tue applicazioni all’interno di cluster Kubernetes e OpenShift Container Platform di RedHat, grazie alla sua gestione automatica dell’architettura primario/standby che include: self-healing, failover, switchover, rolling update, backup, ecc.
Durante il webinar affronteremo i seguenti punti:
- DevOps e Cloud Native
- Introduzione a Cloud Native PostgreSQL
- Architetture
- Caratteristiche principali
- Esempi di uso e configurazione
- Kubernetes, Storage e Postgres
- Demo
- Conclusioni
New enhancements for security and usability in EDB 13EDB
EDB 13 enhances our flagship database server and tools. This webinar will explore its security, usability, and portability updates. Join us to learn how EDB 13 can help you improve your PostgreSQL productivity and data protection.
Webinar highlights include:
- New security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
The webinar will review a multi-layered framework for PostgreSQL security, with a deeper focus on limiting access to the database and data, as well as securing the data.
Using the popular AAA (Authentication, Authorization, Auditing) framework we will cover:
- Best practices for authentication (trust, certificate, MD5, Scram, etc).
- Advanced approaches, such as password profiles.
- Deep dive of authorization and data access control for roles, database objects (tables, etc), view usage, row-level security, and data redaction.
- Auditing, encryption, and SQL injection attack prevention.
Note: this session is delivered in German
Speaker:
Borys Neselovskyi, Sales Engineer, EDB
EDB Cloud Native Postgres includes database container images and a Kubernetes Operator that manage the lifecycle of a database from deployment to operations. This Kubernetes Operator for Postgres is written by EDB entirely from scratch in the Go language and relies exclusively on the Kubernetes API.
Attend this webinar to learn about:
- DevOps & Cloud Native
- Overview of Cloud Native Postgres
- Storage for Postgres workloads in Kubernetes
- Using Cloud Native Postgres
- Demo
The webinar will review a multi-layered framework for PostgreSQL security, with a deeper focus on limiting access to the database and data, as well as securing the data.
Using the popular AAA (Authentication, Authorization, Auditing) framework we will cover:
- Best practices for authentication (trust, certificate, MD5, Scram, etc).
- Advanced approaches, such as password profiles.
- Deep dive of authorization and data access control for roles, database objects (tables, etc), view usage, row-level security, and data redaction.
- Auditing, encryption, and SQL injection attack prevention.
Note: this session is delivered in French
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. Hi, I’m Michael
Half of the team behind pgMustard
Spent a lot of time looking into EXPLAIN
Background: product management, database tools
pgmustard.com/docs/explain
michael@pgmustard.com
michristofides
3. Picking up from other EXPLAIN talks
Not the basics*
1) Some of the less intuitive arithmetic
2) Some less well covered issues
* postgresql.org/docs/current/performance-tips
thoughtbot: reading EXPLAIN ANALYZE
YouTube: Josh Berkus Explaining EXPLAIN
4. Picking up from other EXPLAIN talks
Not the basics*
1) Arithmetic: why is this query slow?
2) Issues: what can we do about it?
* postgresql.org/docs/current/performance-tips
thoughtbot: reading EXPLAIN ANALYZE
YouTube: Josh Berkus Explaining EXPLAIN
5. Arithmetic: loops
Many of the stats are a per-loop average
This includes costs, rows, timings
Watch out for rounding, especially to 0 rows
7. Nested Loop
(cost=0.84..209.82 rows=16 width=11)
(actual time=0.076..0.368 rows=86 loops=1)
-> Index Only Scan using a on b
(cost=0.42..4.58 rows=9 width=4)
(actual time=0.013..0.019 rows=9 loops=1)
-> Index Scan using x on y
(cost=0.42..22.73 rows=7 width=15)
(actual time=0.012..0.030 rows=10 loops=9)
Nested Loop: 86 rows
Index Scan: 9 * 10 = 90 rows
(Rounding not too bad here)
8. Arithmetic: threads
Costs, rows, and timings are also per-thread
Shown as loops
Threads = workers + 1
Tip: use VERBOSE
<- the leader
11. Nested Loop (... loops=1)
Buffers: shared hit=105
-> Index Only Scan using a on b (... loops=1)
Buffers: shared hit=4
-> Index Scan using x on y (... loops=9)
Buffers: shared hit=101
Nested Loop buffers:
105 - (101 + 4) = 0 blocks
13. Nested Loop
(cost=0.84..209.82 rows=16 width=11)
(actual time=0.076..0.368 rows=86 loops=1)
-> Index Only Scan using a on b
(cost=0.42..4.58 rows=9 width=4)
(actual time=0.013..0.019 rows=9 loops=1)
-> Index Scan using x on y
(cost=0.42..22.73 rows=7 width=15)
(actual time=0.012..0.030 rows=10 loops=9)
Index Scan: 0.030 * 9 = 0.27 ms
Nested Loop: 0.368 - 0.27 - 0.019
= 0.079 ms
14. WITH init AS (
SELECT * FROM pg_sleep_for('100ms')
UNION ALL
SELECT * FROM pg_sleep_for('200ms')
)
(SELECT * FROM init LIMIT 1)
UNION ALL
(SELECT * FROM init);
Credit @felixge
15. Append (actual time=100.359..
300.688 … )
CTE init
-> Append (actual time=100.334..
300.652 … )
-> Function Scan (actual time=100.333..
100.335 … )
-> Function Scan (actual time=200.310..
200.312 … )
-> Limit (actual time=100.358..
100.359 … )
-> CTE Scan a (actual time=100.355..100.356 … )
-> CTE Scan b (actual time=0.001..
200.322 … )
Execution Time: 300.789 ms
Further reading:
flame-explain.com/docs/general/quirk-correction
Some double-counting in this case.
16. Arithmetic: tools can help
eg explain.depesz.com
explain.dalibo.com
flame-explain.com
pgmustard.com
<- fellow calculations nerd
<- 👋
17. Issues: let’s skip the basics
Seq Scans with large filters
Bad row estimates
Sorts and Hashes on disk
18. Issues: inefficient index scans
Looks out for lots of rows being filtered
Filters are per-loop
So again, watch out for rounding
19. -> Index Scan using x on y
(cost=0.42..302502.05 rows=1708602 width=125)
(actual time=172810.219..173876.540 rows=1000 loops=1)
Index Cond: (id = another_id)
Filter: (status = 1)
Rows Removed by Filter: 3125626
Index efficiency: 1000/(1000+3125626) = 0.03%
Watch out for high loops
20. Issues: lossy bitmap scans
When bitmap would otherwise exceed work_mem
Point to a block rather than a row (Tuple Id)
Lossy blocks are a total (ie not per-loop)
21. -> Bitmap Heap Scan on table
(cost=49153.29..4069724.27 rows=3105598 width=1106)
(actual time=591.928..56472.895 rows=3853272 loops=1)
Recheck Cond: (something > something_else)
Rows Removed by Index Recheck: 5905323
Heap Blocks: exact=14280 lossy=1951048
Lossy blocks: 1951048/(1951048+14280) = 99%
Extra rows read: 5.9 million
22. Issues: lots of data read
Requires BUFFERS
Lots of data being read for the amount returned
Can be a sign of bloat
Default block size: 8kB
23. -> Index Scan using x on y
(cost=0.57..2.57 rows=1 width=8)
(actual time=0.064..0.064 rows=1 loops=256753)
Index Cond: (id = another_id)
Filter: (status = 1)
Buffers: shared hit=1146405 read=110636
Caveats: width estimated, rows rounded
Data read: (1146405 + 110636) * 8kB = 10GB
Data returned: 1 * 256753 * 8 bytes = 2MB
24. Issues: planning time
At the end of the query plan
Can be planning related: eg joins, partitions
But other things too: eg extensions, locks
Warning: not included in auto_explain
25. (...)
Planning Time: 27.844 ms
Execution Time: 11.162 ms
Planning proportion:
27.844/(27.844 + 11.162) = 71%
26. Issues: Just In Time compilation
At the end of the query plan
Included in execution time
On by default in PostgreSQL 12 and 13
Start-up time can be a tell-tale
28. Very suspicious actual start-up time
from a JIT dominated plan.
-> Seq Scan on table
(cost=0.00..3.57 rows=72 width=8)
(actual time=2262.312..2262.343 rows=54 loops=1)
Buffers: shared hit=3
29. Issues: triggers
At the end of the query plan
Total time across calls
Check foreign keys indexed
Before triggers vs after triggers
30. Planning Time: 0.227 ms
Trigger: RI_ConstraintTrigger_a_12345 on table
time=83129.491 calls=2222623
Execution Time: 87645.739 ms
Trigger proportion:
83129.491/(0.227 + 87645.739) = 95%
31. Summary: check the arithmetic
Watch out for loops and threads
Watch out for CTEs
Tools can help, if in doubt check two
32. Summary: keep rarer issues in mind
Check the end section first
Also look out for filters, rechecks, lossy
blocks, amount of data
Tools, mailing lists, and communities can help
33. Thank you! Any questions?
michael@pgmustard.com
michristofides
Further reading:
* flame-explain.com/docs/general/quirk-correction
* pgmustard.com/docs/explain
* wiki.postgresql.org/wiki/Slow_Query_Questions