This document discusses PostgreSQL's support for JSON and JSONB data types, including:
- The JsQuery language for querying JSONB data, which allows expressing complex conditions over JSON arrays and fields using a simple textual syntax with operators like # (any element), % (any key), and @> (contains).
- Examples of using JsQuery to find documents containing the tag "NYC", find companies where the CEO or CTO is named Neil, and count products similar to a given ID within a sales rank range.
- Performance comparisons showing JsQuery outperforms the @> operator for complex queries, and is more readable than alternatives using jsonb_array_elements and subqueries.
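A minimal sketch of the JsQuery syntax summarized above, assuming a hypothetical table `docs(data jsonb)` with the jsquery extension installed; the `@@` operator evaluates a jsquery expression against a jsonb value, and the field and ID values are illustrative:

```sql
-- Find documents whose "tags" array contains "NYC" (# = any array element).
SELECT * FROM docs WHERE data @@ 'tags.# = "NYC"';

-- Find companies where the CEO or the CTO is named Neil (| = OR).
SELECT * FROM docs WHERE data @@ 'ceo.name = "Neil" | cto.name = "Neil"';

-- Count products similar to a given ID within a sales rank range (& = AND).
SELECT count(*) FROM docs
WHERE data @@ 'similar_ids.# = "B0002W4TL2" & sales_rank > 10000 & sales_rank < 20000';
```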
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Current 2022 (Hosted by Confluent)
Kafka Streams is a powerful stream processing library that offers stateful operations. Along with the stateful operations is a unique feature of Kafka Streams, Interactive Queries, or IQ. IQ allows you to leverage the state of the application from the outside by directly querying the state stores.
But Kafka Streams is a distributed application; often, no single instance has a state store with all the data. Kafka Streams therefore provides the infrastructure, so a developer doesn't need to worry about querying the correct instance.
But the implementation of communication between app instances (Remote Procedure Call or RPC) is not provided, leaving the developer to Google searches on how to get started building one. In this talk, I'll discuss and demonstrate what's needed to build an RPC mechanism between Kafka Stream instances, including:
* The background of Interactive Queries
* Using Spring Boot to expose your Interactive Query Service
* How to route queries between app instances.
* Building a view to render the results
You'll leave with the knowledge on how to get started building an Interactive Query service and a reference application to get started.
Cassandra Data Modeling - Practical Considerations @ Netflix
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Lambda and Stream Master class - part 1 (José Paumard)
These are the slides of the talk we made with Stuart Marks at Devoxx Belgium 2018. This first part covers Lambda Expressions and API design with functional interfaces.
What is the state of lambda expressions in Java 11? Lambda expressions are the major feature of Java 8, having an impact on most of the API, including the Streams and Collections API. We are now living the Java 11 days; new features have been added and new patterns have emerged. This highly technical Deep Dive session will visit all these patterns, the well-known ones and the new ones, in an interactive hybrid of lecture and laboratory. We present a technique and show how it helps solve a problem. We then present another problem, and give you some time to solve it yourself. Finally, we present a solution, and open for questions, comments, and discussion. Bring your laptop set up with JDK 11 and your favorite IDE, and be prepared to think!
How Netflix Tunes EC2 Instances for Performance (Brendan Gregg)
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg:
"At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Craig will discuss architecture patterns with InfluxDB Enterprise, covering an overview of InfluxDB Enterprise, features, ingestion and query rates, deployment examples, replication patterns, and general advice.
In this Meetup, Victor Perepelitsky, R&D Technical Leader at LivePerson leading the 'Real Time Event Processing Platform' team, will talk about Java 8: the Stream API, lambdas, and method references.
Victor will clarify what functional programming is and how you can use Java 8 to create better software.
Victor will also cover some pain points that Java 8 did not solve and show how you can work around them.
This presentation focuses on optimization of MySQL queries from a developer's perspective. Developers should care about the performance of the application, which includes optimizing SQL queries. It shows the execution plan in MySQL and explains its different formats: tabular, TREE, and JSON/visual explain plans. Optimizer features like optimizer hints and histograms, as well as newer features from the latest releases like hash joins, the TREE explain plan, and EXPLAIN ANALYZE, are covered. Some real examples of slow queries are included and their optimization explained.
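The explain-plan formats mentioned in the abstract can be sketched as follows; the table and query are hypothetical, and the TREE format and EXPLAIN ANALYZE require recent MySQL 8.0 releases:

```sql
-- Traditional tabular plan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Tree-shaped plan (MySQL 8.0.16+).
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 42;

-- JSON plan, also the source for Workbench's Visual Explain.
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;

-- Actually executes the query and reports measured costs and timings (MySQL 8.0.18+).
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```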
How to Analyze and Tune MySQL Queries for Better Performance
Tutorial at Oracle Open World 2015:
Performance of SQL queries plays a big role in application performance. If some queries execute slowly, these queries or the database schema may need tuning. This tutorial covers query processing, optimization methods, and how the MySQL optimizer chooses a specific plan to execute SQL. See demonstrations on how to use tools such as EXPLAIN (including the JSON-based variant), optimizer trace, and performance schema to analyze query plans. See how the Visual Explain functionality in MySQL Workbench helps you to visualize these plans. Based on the analysis, the tutorial covers how to take the next steps for performance tuning. It might mean forcing a particular index, changing the schema, or modifying configuration parameters.
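A brief sketch of the optimizer-trace workflow the tutorial demonstrates; the statement being traced is hypothetical:

```sql
-- Enable optimizer tracing for the current session.
SET optimizer_trace = 'enabled=on';

-- Run the statement you want to analyze.
SELECT * FROM orders WHERE customer_id = 42;

-- Inspect the trace of the decisions the optimizer made for that statement.
SELECT trace FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing back off when done.
SET optimizer_trace = 'enabled=off';
```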
Lambdas and Streams Master Class Part 2 (José Paumard)
These are the slides of the talk we made with Stuart Marks at Devoxx Belgium 2018. This second part covers the Stream API, reduction and the Collector API.
What is the state of lambda expressions in Java 11? Lambda expressions are the major feature of Java 8, having an impact on most of the API, including the Streams and Collections API. We are now living the Java 11 days; new features have been added and new patterns have emerged. This highly technical Deep Dive session will visit all these patterns, the well-known ones and the new ones, in an interactive hybrid of lecture and laboratory. We present a technique and show how it helps solve a problem. We then present another problem, and give you some time to solve it yourself. Finally, we present a solution, and open for questions, comments, and discussion. Bring your laptop set up with JDK 11 and your favorite IDE, and be prepared to think!
Postgres vs Mongo / Oleg Bartunov (Postgres Professional)
RIT++ 2017, Backend Conf
Congress Hall, June 6, 17:00
Abstract:
http://backendconf.ru/2017/abstracts/2781.html
I want to challenge the stereotype that Postgres is a purely relational DBMS from the last century, poorly suited to the realities of modern projects. We recently ran YCSB against the latest versions of Postgres and MongoDB and saw their strengths and weaknesses under different types of workload, which I will talk about. ...
Erlang is successfully used for game servers. Lua is a friendly, small language that is big in game scripting. And VoltDB is a new high-speed ACID database from PostgreSQL inventor Mike Stonebraker. Add some JSON, WebSockets and JavaScript and you have an exciting HTML5 game stack.
This talk is about research and results from the work on a very high throughput game server with embedded Lua for game logic and VoltDB as SQL database backend. The stack aims to maximize robustness, throughput and ease of scripting. Its special feature is linearly scaling true ACID with a goal of a million transactions per second, while being easy on the game logic scripter.
This stack is part of the development effort of games start-up Eonblast for the cross-media Science Fiction franchise Solar Tribes, which will feature film elements merged with cross-platform online games. The stack is created to allow for a unified, non-sharded game world experience, thus for a unified story line and also otherwise transcends current MMO conventions. Players will share the same stories at the same time and be empowered to leave a stronger imprint on their environment. Interaction with other players is taken beyond the limitations currently seen in social games. All of which are technical challenges, asking for fastest possible data access and better data integrity guarantees.
A triple asynchronous server side architecture is described that communicates non-blocking with a fat JS/HTML5 browser client. The related Open Source projects Erlvolt and Fleece are introduced, which were created for this stack. The embedded Lua driver Erlualib and Robert Virding's new project, the Lua VM emulator Luerl are compared and evaluated. A brief overview of capabilities of new databases is given to explain why VoltDB was chosen for the task.
(c) 2012 Eonblast Corporation
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing (MongoDB)
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. I will share the most common mistakes observed and some tips and tricks for avoiding them.
This talk focuses on tuning, analysing, and optimizing MongoDB queries and indexes using the Database Profiler and the explain() function.
Database performance can also be improved by configuring the underlying (Linux) OS with some recommended settings that do not come by default.
Benchmarking is hard. Benchmarking databases, harder. Benchmarking databases that follow different approaches (relational vs document) is even harder.
But the market demands these kinds of benchmarks. Despite the different data models that MongoDB and PostgreSQL expose, many organizations face the challenge of picking either technology. And performance is arguably the main deciding factor.
Join this talk to discover the numbers! After $30K spent on public cloud and months of testing, there are many different scenarios to analyze. Benchmarks on three distinct categories have been performed: OLTP, OLAP and comparing MongoDB 4.0 transaction performance with PostgreSQL's.
What would be faster, MongoDB or PostgreSQL?
Deep Dive on Amazon Aurora MySQL Performance Tuning (DAT429-R1) - AWS re:Invent (Amazon Web Services)
Amazon Aurora offers several options for monitoring and optimizing MySQL database performance. These include Enhanced Monitoring and Performance Insights, an easy-to-use tool for assessing the load on your database and identifying slow-performing queries. In this session, learn how to tune the performance of your Aurora database with MySQL compatibility, whether your application is in development or in production.
Redis is an advanced in memory key-value store designed for a world where "Memory is the new disk and disk is the new tape". Redis has some unique properties -- like blazing read and write speed, rich atomic operations and asynchronous persistence -- which make it ideally suited for a number of situations.
The tutorial covers an introduction to Redis, its data types, performance, sweet spots, design considerations and best practices, and adopters of Redis. It begins with an introduction covering Redis's features, a brief history, its characteristics, and language support. A data types section follows, covering strings, lists, hashes, sets, and sorted sets, along with why Redis should not be thought of as an RDBMS, installation, atomicity of commands, and key expiration.
It then covers operations on lists, sets, sorted sets, hashes, and keys, plus Redis administration commands. A performance section discusses the impact of hardware, payload size, and batch size, along with benchmarks attained by Redis, a demo, data durability, and the advantages of persistence. A section on sweet spots covers use cases such as a cache server, tag clouds, auto-completion, activity feeds, and many more, with case studies including a video marketing platform and a content publishing app.
A neighbouring section on best practices covers design considerations such as avoiding excessively long keys, using human-readable keys, keeping all data in memory, treating polyglot persistence as a smart choice, and many more. The last section lists some adopters of Redis, including Stack Overflow, Craigslist, GitHub, Instagram, and Blizzard Entertainment.
Present and future of Jsonb in PostgreSQL
JSON is a ubiquitous data format supported in almost any popular database. PostgreSQL was the first relational database to receive support for a native textual json type and a very efficient binary jsonb type. The recently published SQL:2016 standard describes the JSON data type and specifies the functions and operators for working with JSON in SQL, which greatly simplifies the further development of JSON support in PostgreSQL. We compare the existing features of the json/jsonb data types with the SQL standard and discuss how json/jsonb support in PostgreSQL could be improved.
PostgreSQL offers application developers rich support for the json data type, combining the known advantages of the JSON data model with the traditional benefits of relational databases, such as a declarative query language, rich query processing, and transaction management with ACID safety guarantees. However, the current json support is far from ideal; for example, json is still a "foreign" data type to SQL. The existing jsquery extension implements its own query language which, while powerful, is opaque to the Postgres planner and optimizer and is not extensible. Extending SQL to support JSON without a commonly accepted standard is a difficult and unpromising task. The recently published SQL:2016 standard describes the JSON data type and specifies the functions and operators for working with JSON in SQL, which makes clear the direction of future development of JSON support in PostgreSQL. We present our ideas and a prototype of a future json data type in PostgreSQL, with some further non-standard extensions and improvements in storage requirements and index support.
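To illustrate the gap discussed above, here is the same lookup written with PostgreSQL's native jsonb operators and with the SQL/JSON functions specified by SQL:2016; the table name is hypothetical, and the standard functions were only a proposal for PostgreSQL at the time of this abstract:

```sql
-- Native jsonb operators: containment (@>) and field extraction (->, ->>).
SELECT data -> 'address' ->> 'city'
FROM accounts
WHERE data @> '{"active": true}';

-- SQL:2016 SQL/JSON style: path expressions via standard functions.
SELECT JSON_VALUE(data, '$.address.city')
FROM accounts
WHERE JSON_EXISTS(data, '$.active ? (@ == true)');
```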
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies (Jonathan Katz)
All data is relational and can be represented through relational algebra, right? Perhaps, but there are other ways to represent data, and the PostgreSQL team continues to work on making it easier and more efficient to do so!
With the upcoming 9.4 release, PostgreSQL is introducing the "JSONB" data type, which allows for fast, compressed storage of JSON-formatted data and quick retrieval. And JSONB comes with all the benefits of PostgreSQL, like its data durability, MVCC, and, of course, access to all the other data types and features in PostgreSQL.
How fast is JSONB? How do we access data stored with this type? What can it do with the rest of PostgreSQL? What can't it do? How can we leverage this new data type and make PostgreSQL scale horizontally? Follow along with our presentation as we try to answer these questions.
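A minimal sketch of the basics behind those questions, assuming a hypothetical `events` table: a JSONB column stores documents in a decomposed binary form, can be queried with containment operators, and can be indexed with GIN:

```sql
-- A jsonb column alongside ordinary relational columns.
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

INSERT INTO events (payload)
VALUES ('{"type": "click", "page": "/home", "tags": ["NYC"]}');

-- A GIN index accelerates containment (@>) and key-existence (?) queries.
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- Containment: find all click events and extract a field as text.
SELECT id, payload ->> 'page' FROM events WHERE payload @> '{"type": "click"}';
```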
Jsquery - the jsonb query language with GIN indexing support (Alexander Korotkov)
PostgreSQL 9.4 has the new jsonb data type, which was designed for efficient work with JSON data. However, its query language is very limited and supports only a few operators. In this talk we introduce jsquery, a query language for jsonb that is flexible, extensible, and has GIN indexing support. Jsquery gives Postgres users the ability to query JSON data efficiently, on par with NoSQL databases. A preliminary prototype was presented at PGCon 2014 and got good feedback, so now we want to show European users the new version of jsquery (with some enhancements), which is compatible with the 9.4 release and can be installed as an extension. We'll also discuss current issues with jsquery and possible ways of improving it.
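As a sketch of how the GIN support mentioned above is used in practice, jsquery ships GIN operator classes such as `jsonb_path_value_ops` that accelerate its `@@` operator; the `docs(data jsonb)` table is hypothetical:

```sql
-- Assumes the jsquery extension is available on the server.
CREATE EXTENSION IF NOT EXISTS jsquery;

-- Index jsonb documents for jsquery lookups.
CREATE INDEX docs_jsquery_idx ON docs USING gin (data jsonb_path_value_ops);

-- This query can now use the index instead of scanning every row.
SELECT * FROM docs WHERE data @@ 'tags.# = "NYC"';
```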
JSON Processing in the Database using PostgreSQL 9.4 :: Data Wranglers DC :: ... (Ryan B Harvey, CSDP, CSM)
This slide deck was prepared for a talk on getting, processing, and reshaping JSON data using PostgreSQL 9.4 at the Data Wranglers DC meetup on January 7, 2015.
Associated materials are on GitHub at:
https://github.com/nihonjinrxs/dwdc-january2015
Meetup information: http://www.meetup.com/Data-Wranglers-DC/events/219112410/
Recently PostgreSQL introduced the JSON and JSONB data types. At first, we ignored this news, we could not see how the JSON types could be of use to us. But then we used it once and then twice and then we almost went overboard with JSONs. This prompted us to stop, take a step back and analyze when a JSON type is useful and when it causes unnecessary hassle.
Overview of the new JSON processing functionality in MySQL: the new JSON type, in-line JSON path expressions, the JSON functions, and how to go about indexing JSON.
NoSQL Best Practices for PostgreSQL / Дмитрий Долгов (Mindojo) - Ontico
HighLoad++ 2017
Singapore hall, November 7, 18:00
Abstract:
http://www.highload.ru/2017/abstracts/2980.html
Every database specialist is already familiar with Jsonb, one of the most attractive features of PostgreSQL, which allows an efficient document-oriented approach without sacrificing consistency or the time-tested techniques of relational databases. But how exactly is this data type implemented, what limitations does it have, and what pitfalls can you unknowingly run into when working with it?
In the talk we will discuss all these questions, along with the advantages and disadvantages of using Jsonb in various situations compared with other existing solutions. We will also cover important best practices (how to write compact queries and how to avoid common, and not so common, problems).
(EN) In recent years NoSQL databases have matured and spread, without however threatening the massive adoption of relational databases. Relational databases, for their part, did not sit still: sooner or later they all introduced native support for the JSON format. In this evening talk we will see how and when to use the native JSON PostgreSQL data types.
JSON is an important data type for transporting data between servers and is used by many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. The JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We will then close with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
2015-12-05 Alexander Korotkov, Ivan Panchenko - Weakly-structured data... - HappyDev
The emergence of a large number of NoSQL DBMSs is driven by the requirements of modern information systems, which most traditional relational databases do not satisfy. One of these requirements is support for data whose structure is not defined in advance. However, choosing a NoSQL database for the sake of schemaless data can mean losing a number of advantages that mature SQL solutions provide, namely transactions and fast row reads from tables. PostgreSQL, a leading relational DBMS, supported weakly-structured data long before NoSQL appeared; this support got a second wind in the latest release in the form of the jsonb data type, which not only supports the JSON standard but also delivers performance comparable to, or even exceeding, the most popular NoSQL DBMSs.
PG Day'14 Russia, Working with weakly-structured data in PostgreSQL, Ол... - pgdayrussia
The talk was presented at the official Russian conference PG Day'14 Russia, dedicated to the development and operation of PostgreSQL.
<sarcasm> MongoDB rules the world of weakly-structured data. The investments attracted by MongoDB often cloud the minds of developers (especially novice and trusting ones), who happily throw themselves into the ocean of possibilities offered by NoSQL (it's cool!). The enthusiasm fades once they realize that nothing comes for free and that they have to hand-write what has worked well for decades in traditional relational databases, which cope perfectly with the loads and data of 99% of projects, and your project is not in the remaining one percent. </sarcasm>
The database world has changed substantially in recent years. The pervasive spread of Internet technologies created the need to work with large volumes of heterogeneous data in real time, which traditional relational DBMSs were not ready for. It is commonly held that the poor scalability and excessive "rigidity" of the relational data model are the main reasons for the appearance and growing popularity of NoSQL databases (hereafter, NoSQL).
In the talk we will cover the conceptual premises behind the appearance of NoSQL and their classification. One of the selling points of NoSQL is support for the JSON data type, which implements a document-oriented data model. The document-oriented data model is more flexible and allows the data schema to be changed "on the fly", which is very hard to do in relational DBMSs, especially in systems under heavy load. Despite the success of NoSQL (actively hyped by its use in some popular Internet projects), many users are not ready to sacrifice data integrity for the sake of scalability, but want schema flexibility in proven and reliable relational DBMSs.
We proposed and implemented support for a document-oriented model in PostgreSQL (version 9.4). For more than 10 years PostgreSQL has been able to work with schema-less data using our hstore extension module. Hstore offers a key-value store while preserving all relational capabilities, which made it the most widely used
JSONB in PostgreSQL is one of the most attractive features for modern application developers, no matter what some RDBMS purists think. People often use a simple one-column-json schema for their projects and rely on the ability of the database to store, index, and query json. Postgres has a long history of supporting non-structured data and pioneered the adoption of JSON by relational databases, so eventually JSON became an official feature (SQL/JSON) of the SQL standard.
10. Find something red
• WITH RECURSIVE t(id, value) AS (
    SELECT * FROM js_test
    UNION ALL
    (
      SELECT
        t.id,
        COALESCE(kv.value, e.value) AS value
      FROM t
      LEFT JOIN LATERAL jsonb_each(
        CASE WHEN jsonb_typeof(t.value) = 'object'
             THEN t.value ELSE NULL END) kv ON true
      LEFT JOIN LATERAL jsonb_array_elements(
        CASE WHEN jsonb_typeof(t.value) = 'array'
             THEN t.value ELSE NULL END) e ON true
      WHERE kv.value IS NOT NULL OR e.value IS NOT NULL
    )
  )
  SELECT js_test.*
  FROM (SELECT id FROM t
        WHERE value @> '{"color": "red"}'
        GROUP BY id) x
  JOIN js_test ON js_test.id = x.id;
• Quite a tricky solution!
12. Jsonb querying an array: simple case
Find bookmarks with tag «NYC»:
SELECT *
FROM js
WHERE js @> '{"tags":[{"term":"NYC"}]}';
13. Jsonb querying an array: complex case
Find companies where CEO or CTO is called Neil.
One could write...
SELECT * FROM company
WHERE js @> '{"relationships":[{"person":
{"first_name":"Neil"}}]}' AND
(js @> '{"relationships":[{"title":"CTO"}]}' OR
js @> '{"relationships":[{"title":"CEO"}]}');
14. Jsonb querying an array: complex case
Each «@>» is processed independently.
SELECT * FROM company
WHERE js @> '{"relationships":[{"person":
{"first_name":"Neil"}}]}' AND
(js @> '{"relationships":[{"title":"CTO"}]}' OR
js @> '{"relationships":[{"title":"CEO"}]}');
Actually, this query searches for companies with some
CEO or CTO and someone called Neil...
15. Jsonb querying an array: complex case
The correct version is:
SELECT * FROM company
WHERE js @> '{"relationships":[{"title":"CEO",
"person":{"first_name":"Neil"}}]}' OR
js @> '{"relationships":[{"title":"CTO",
"person":{"first_name":"Neil"}}]}';
When constructing complex conditions over the same array
element, query length can grow exponentially.
16. Jsonb querying an array: another approach
Using subselect and jsonb_array_elements:
SELECT * FROM company
WHERE EXISTS (
SELECT 1
FROM jsonb_array_elements(js -> 'relationships') t
WHERE t->>'title' IN ('CEO', 'CTO') AND
t ->'person'->>'first_name' = 'Neil');
17. Jsonb querying an array: summary
Using «@>»
• Pros
• Indexing support
• Cons
• Checks only equality for scalars
• Hard to express complex logic
Using subselect and jsonb_array_elements
• Pros
• Full power of SQL can be used to express conditions over an element
• Cons
• No indexing support
• Heavy syntax
18. What do we want?
• Need Jsonb query language
• Simple and effective way to search in arrays (and other iterative
searches)
• More comparison operators (currently only =)
• Types support
• Schema support (constraints on keys, values)
• Indexes support
19. NoSQL for PostgreSQL:
The JsQuery query language
Codefest-2015, Novosibirsk
Alexander Korotkov, Oleg Bartunov, Teodor Sigaev
Postgres Professional
21. Citus dataset
{
"customer_id": "AE22YDHSBFYIP",
"product_category": "Business & Investing",
"product_group": "Book",
"product_id": "1551803542",
"product_sales_rank": 11611,
"product_subcategory": "General",
"product_title": "Start and Run a Coffee Bar (Start & Run a)",
"review_date": {
"$date": 31363200000
},
"review_helpful_votes": 0,
"review_rating": 5,
"review_votes": 10,
"similar_product_ids": [
"0471136174",
"0910627312",
"047112138X",
"0786883561",
"0201570483"
]
}
• 3023162 reviews from Citus,
years 1998-2000
• 1573 MB
22. Jsonb query
• Need Jsonb query language
• Simple and effective way to search in arrays (and other iterative
searches)
• More comparison operators
• Types support
• Schema support (constraints on keys, values)
• Indexes support
• Introduce Jsquery - textual data type and @@ match operator
jsonb @@ jsquery
23. Jsonb query language (Jsquery)
• # - any element of an array
• % - any key
• * - anything
• $ - current element
• Use "double quotes" for keys!
SELECT '{"a": {"b": [1,2,3]}}'::jsonb @@ 'a.b.# = 2';
SELECT '{"a": {"b": [1,2,3]}}'::jsonb @@ '%.b.# = 2';
SELECT '{"a": {"b": [1,2,3]}}'::jsonb @@ '*.# = 2';
select '{"a": {"b": [1,2,3]}}'::jsonb @@ 'a.b.# ($ = 2 OR $ < 3)';
select 'a1."12222" < 111'::jsquery;
path ::= key
| path '.' key_any
| NOT '.' key_any
key ::= '*'
| '#'
| '%'
| '$'
| STRING
….....
key_any ::= key
| NOT
24. Jsonb query language (Jsquery)
value_list
::= scalar_value
| value_list ',' scalar_value
array ::= '[' value_list ']'
scalar_value
::= null
| STRING
| true
| false
| NUMERIC
| OBJECT
…....
Expr ::= path value_expr
| path HINT value_expr
| NOT expr
| NOT HINT value_expr
| NOT value_expr
| path '(' expr ')'
| '(' expr ')'
| expr AND expr
| expr OR expr
path ::= key
| path '.' key_any
| NOT '.' key_any
key ::= '*'
| '#'
| '%'
| '$'
| STRING
….....
key_any ::= key
| NOT
value_expr
::= '=' scalar_value
| IN '(' value_list ')'
| '=' array
| '=' '*'
| '<' NUMERIC
| '<' '=' NUMERIC
| '>' NUMERIC
| '>' '=' NUMERIC
| '@' '>' array
| '<' '@' array
| '&' '&' array
| IS ARRAY
| IS NUMERIC
| IS OBJECT
| IS STRING
| IS BOOLEAN
26. Jsonb query language (Jsquery)
• Type checking
select '{"x": true}' @@ 'x IS boolean'::jsquery,
'{"x": 0.1}' @@ 'x IS numeric'::jsquery;
?column? | ?column?
----------+----------
t | t
IS BOOLEAN
IS NUMERIC
IS ARRAY
IS OBJECT
IS STRING
select '{"a":{"a":1}}' @@ 'a IS object'::jsquery;
?column?
----------
t
select '{"a":["xxx"]}' @@ 'a IS array'::jsquery, '["xxx"]' @@ '$ IS array'::jsquery;
?column? | ?column?
----------+----------
t | t
27. Jsonb query language (Jsquery)
• How many products are similar to "B000089778" and have
product_sales_rank in the range 10000-20000?
• SQL
SELECT count(*) FROM jr WHERE (jr->>'product_sales_rank')::int > 10000
and (jr->> 'product_sales_rank')::int < 20000 and
….boring stuff
• Jsquery
SELECT count(*) FROM jr WHERE jr @@ ' similar_product_ids &&
["B000089778"] AND product_sales_rank( $ > 10000 AND $ < 20000)'
• Mongodb
db.reviews.find( { $and :[ {similar_product_ids: { $in ["B000089778"]}},
{product_sales_rank:{$gt:10000, $lt:20000}}] } ).count()
28. Find something red
• WITH RECURSIVE t(id, value) AS (
    SELECT * FROM js_test
    UNION ALL
    (
      SELECT
        t.id,
        COALESCE(kv.value, e.value) AS value
      FROM t
      LEFT JOIN LATERAL jsonb_each(
        CASE WHEN jsonb_typeof(t.value) = 'object'
             THEN t.value ELSE NULL END) kv ON true
      LEFT JOIN LATERAL jsonb_array_elements(
        CASE WHEN jsonb_typeof(t.value) = 'array'
             THEN t.value ELSE NULL END) e ON true
      WHERE kv.value IS NOT NULL OR e.value IS NOT NULL
    )
  )
  SELECT js_test.*
  FROM (SELECT id FROM t
        WHERE value @> '{"color": "red"}'
        GROUP BY id) x
  JOIN js_test ON js_test.id = x.id;
• Jsquery
SELECT * FROM js_test
WHERE
value @@ '*.color = "red"';
29. «#», «*», «%» usage rules
Each usage of «#», «*», «%» means a separate array element
• Find companies where CEO or CTO is called Neil.
SELECT count(*) FROM company WHERE js @@ 'relationships.#(title in
("CEO", "CTO") AND person.first_name = "Neil")'::jsquery;
count
-------
12
• Find companies with some CEO or CTO and someone called Neil
SELECT count(*) FROM company WHERE js @@ 'relationships(#.title in
("CEO", "CTO") AND #.person.first_name = "Neil")'::jsquery;
count
-------
69
30. Jsonb query language (Jsquery)
explain( analyze, buffers) select count(*) from jb where jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Aggregate (cost=191517.30..191517.31 rows=1 width=0) (actual time=1039.422..1039.423 rows=1 loops=1)
Buffers: shared hit=97841 read=78011
-> Seq Scan on jb (cost=0.00..191514.16 rows=1253 width=0) (actual time=0.006..1039.310 rows=285 loops=1)
Filter: (jb @> '{"tags": [{"term": "NYC"}]}'::jsonb)
Rows Removed by Filter: 1252688
Buffers: shared hit=97841 read=78011
Planning time: 0.074 ms
Execution time: 1039.444 ms
explain( analyze,costs off) select count(*) from jb where jb @@ 'tags.#.term = "NYC"';
QUERY PLAN
--------------------------------------------------------------------
Aggregate (actual time=891.707..891.707 rows=1 loops=1)
-> Seq Scan on jb (actual time=0.010..891.553 rows=285 loops=1)
Filter: (jb @@ '"tags".#."term" = "NYC"'::jsquery)
Rows Removed by Filter: 1252688
Execution time: 891.745 ms
31. Jsquery (indexes)
• GIN opclasses with jsquery support
• jsonb_value_path_ops — uses Bloom filtering for key matching
{"a":{"b":{"c":10}}} → 10.( bloom(a) or bloom(b) or bloom(c) )
• Good for key matching (wildcard support), not good for range queries
• jsonb_path_value_ops — hashes the path (like jsonb_path_ops)
{"a":{"b":{"c":10}}} → hash(a.b.c).10
• No wildcard support, no problem with ranges
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------------------+-------+----------+--------------+---------+-------------
public | jb | table | postgres | | 1374 MB |
public | jb_value_path_idx | index | postgres | jb | 306 MB |
public | jb_gin_idx | index | postgres | jb | 544 MB |
public | jb_path_value_idx | index | postgres | jb | 306 MB |
public | jb_path_idx | index | postgres | jb | 251 MB |
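Building these indexes can be sketched as follows (a hedged sketch; assumes the jsquery extension is installed and the index names mirror the size table above):

```sql
-- The jsquery extension provides the @@ operator and the two opclasses
CREATE EXTENSION IF NOT EXISTS jsquery;

-- Bloom-filter-based opclass: good for wildcard key matching
CREATE INDEX jb_value_path_idx ON jb USING gin (jb jsonb_value_path_ops);

-- Path-hashing opclass: no wildcards, but handles range conditions well
CREATE INDEX jb_path_value_idx ON jb USING gin (jb jsonb_path_value_ops);
```

Which opclass to choose depends on the query mix: wildcard-heavy queries favor jsonb_value_path_ops, range-heavy queries favor jsonb_path_value_ops.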
32. Jsquery (indexes)
explain( analyze,costs off) select count(*) from jb where jb @@ 'tags.#.term = "NYC"';
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (actual time=0.609..0.609 rows=1 loops=1)
-> Bitmap Heap Scan on jb (actual time=0.115..0.580 rows=285 loops=1)
Recheck Cond: (jb @@ '"tags".#."term" = "NYC"'::jsquery)
Heap Blocks: exact=285
-> Bitmap Index Scan on jb_value_path_idx (actual time=0.073..0.073 rows=285
loops=1)
Index Cond: (jb @@ '"tags".#."term" = "NYC"'::jsquery)
Execution time: 0.634 ms
(7 rows)
33. Jsquery (indexes)
explain( analyze,costs off) select count(*) from jb where jb @@ '*.term = "NYC"';
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (actual time=0.688..0.688 rows=1 loops=1)
-> Bitmap Heap Scan on jb (actual time=0.145..0.660 rows=285 loops=1)
Recheck Cond: (jb @@ '*."term" = "NYC"'::jsquery)
Heap Blocks: exact=285
-> Bitmap Index Scan on jb_value_path_idx (actual time=0.113..0.113 rows=285
loops=1)
Index Cond: (jb @@ '*."term" = "NYC"'::jsquery)
Execution time: 0.716 ms
(7 rows)
34. Jsquery (indexes)
explain (analyze, costs off) select count(*) from jr where
jr @@ ' similar_product_ids && ["B000089778"]';
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (actual time=0.359..0.359 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=0.084..0.337 rows=185 loops=1)
Recheck Cond: (jr @@ '"similar_product_ids" && ["B000089778"]'::jsquery)
Heap Blocks: exact=107
-> Bitmap Index Scan on jr_path_value_idx (actual time=0.057..0.057 rows=185
loops=1)
Index Cond: (jr @@ '"similar_product_ids" && ["B000089778"]'::jsquery)
Execution time: 0.394 ms
(7 rows)
35. Jsquery (indexes)
explain (analyze, costs off) select count(*) from jr where
jr @@ ' similar_product_ids && ["B000089778"]
AND product_sales_rank( $ > 10000 AND $ < 20000)';
QUERY PLAN
-------------------------------------------------------------------------------------------
Aggregate (actual time=126.149..126.149 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=126.057..126.143 rows=45 loops=1)
Recheck Cond: (jr @@ '("similar_product_ids" && ["B000089778"] &
"product_sales_rank"($ > 10000 & $ < 20000))'::jsquery)
Heap Blocks: exact=45
-> Bitmap Index Scan on jr_path_value_idx (actual time=126.029..126.029
rows=45 loops=1)
Index Cond: (jr @@ '("similar_product_ids" && ["B000089778"] &
"product_sales_rank"($ > 10000 & $ < 20000))'::jsquery)
Execution time: 129.309 ms
• No statistics, no planning :(
• Not selective, better not to use the index!
37. Jsquery (indexes)
explain (analyze,costs off) select count(*) from jr where
jr @@ ' similar_product_ids && ["B000089778"]'
and (jr->>'product_sales_rank')::int>10000 and (jr->>'product_sales_rank')::int<20000;
---------------------------------------------------------------------------------------
Aggregate (actual time=0.479..0.479 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=0.079..0.472 rows=45 loops=1)
Recheck Cond: (jr @@ '"similar_product_ids" && ["B000089778"]'::jsquery)
Filter: ((((jr ->> 'product_sales_rank'::text))::integer > 10000) AND
(((jr ->> 'product_sales_rank'::text))::integer < 20000))
Rows Removed by Filter: 140
Heap Blocks: exact=107
-> Bitmap Index Scan on jr_path_value_idx (actual time=0.041..0.041 rows=185
loops=1)
Index Cond: (jr @@ '"similar_product_ids" && ["B000089778"]'::jsquery)
Execution time: 0.506 ms
• Potentially, the query could be faster than Mongo!
• If we rewrite the query and use the planner
38. Jsquery (optimizer) — NEW !
• Jsquery now has a built-in simple optimiser.
explain (analyze, costs off) select count(*) from jr where
jr @@ 'similar_product_ids && ["B000089778"]
AND product_sales_rank( $ > 10000 AND $ < 20000)'
-------------------------------------------------------------------------------
Aggregate (actual time=0.422..0.422 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=0.099..0.416 rows=45 loops=1)
Recheck Cond: (jr @@ '("similar_product_ids" && ["B000089778"] AND
"product_sales_rank"($ > 10000 AND $ < 20000))'::jsquery)
Rows Removed by Index Recheck: 140
Heap Blocks: exact=107
-> Bitmap Index Scan on jr_path_value_idx (actual time=0.060..0.060
rows=185 loops=1)
Index Cond: (jr @@ '("similar_product_ids" && ["B000089778"] AND
"product_sales_rank"($ > 10000 AND $ < 20000))'::jsquery)
Execution time: 0.480 ms vs 7 ms MongoDB !
39. Jsquery (optimizer) — NEW !
• Since GIN opclasses can't expose anything extra in the explain output,
the jsquery optimiser has its own explain functions:
• text gin_debug_query_path_value(jsquery) — explain for jsonb_path_value_ops
# SELECT gin_debug_query_path_value('x = 1 AND (*.y = 1 OR y = 2)');
gin_debug_query_path_value
----------------------------
x = 1 , entry 0 +
• text gin_debug_query_value_path(jsquery) — explain for jsonb_value_path_ops
# SELECT gin_debug_query_value_path('x = 1 AND (*.y = 1 OR y = 2)');
gin_debug_query_value_path
----------------------------
AND +
x = 1 , entry 0 +
OR +
*.y = 1 , entry 1 +
y = 2 , entry 2 +
40. Jsquery (optimizer) — NEW !
Jsquery now has a built-in optimiser for simple queries.
It analyzes the query tree and pushes non-selective parts to recheck (like a filter).
Selectivity classes:
1) Equality (x = c)
2) Range (c1 < x < c2)
3) Inequality (x > c1)
4) Is (x is type)
5) Any (x = *)
41. Jsquery (optimizer) — NEW !
AND children can be put into recheck.
# SELECT gin_debug_query_path_value('x = 1 AND y > 0');
gin_debug_query_path_value
----------------------------
x = 1 , entry 0 +
While OR children can't, because we can't handle false negatives.
# SELECT gin_debug_query_path_value('x = 1 OR y > 0');
gin_debug_query_path_value
----------------------------
OR +
x = 1 , entry 0 +
y > 0 , entry 1 +
42. Jsquery (optimizer) — NEW !
Can't do much with NOT, because the hash is lossy. After NOT, false positives
turn into false negatives, which we can't handle.
# SELECT gin_debug_query_path_value('x = 1 AND (NOT y = 0)');
gin_debug_query_path_value
----------------------------
x = 1 , entry 0 +
43. Jsquery (optimizer) — NEW !
• Jsquery optimiser pushes non-selective operators to recheck
explain (analyze, costs off) select count(*) from jr where
jr @@ 'similar_product_ids && ["B000089778"]
AND product_sales_rank( $ > 10000 AND $ < 20000)'
-------------------------------------------------------------------------------
Aggregate (actual time=0.422..0.422 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=0.099..0.416 rows=45 loops=1)
Recheck Cond: (jr @@ '("similar_product_ids" && ["B000089778"] AND
"product_sales_rank"($ > 10000 AND $ < 20000))'::jsquery)
Rows Removed by Index Recheck: 140
Heap Blocks: exact=107
-> Bitmap Index Scan on jr_path_value_idx (actual time=0.060..0.060
rows=185 loops=1)
Index Cond: (jr @@ '("similar_product_ids" && ["B000089778"] AND
"product_sales_rank"($ > 10000 AND $ < 20000))'::jsquery)
Execution time: 0.480 ms
44. Jsquery (HINTING) — NEW !
• Jsquery now has HINTING (if you don't like the optimiser)!
explain (analyze, costs off) select count(*) from jr where jr @@ 'product_sales_rank > 10000'
----------------------------------------------------------------------------------------------------------
Aggregate (actual time=2507.410..2507.410 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=1118.814..2352.286 rows=2373140 loops=1)
Recheck Cond: (jr @@ '"product_sales_rank" > 10000'::jsquery)
Heap Blocks: exact=201209
-> Bitmap Index Scan on jr_path_value_idx (actual time=1052.483..1052.48
rows=2373140 loops=1)
Index Cond: (jr @@ '"product_sales_rank" > 10000'::jsquery)
Execution time: 2524.951 ms
• Better not to use the index — use HINT /*-- noindex */
explain (analyze, costs off) select count(*) from jr where jr @@ 'product_sales_rank /*-- noindex */>
10000';
----------------------------------------------------------------------------------
Aggregate (actual time=1376.262..1376.262 rows=1 loops=1)
-> Seq Scan on jr (actual time=0.013..1222.123 rows=2373140 loops=1)
Filter: (jr @@ '"product_sales_rank" /*-- noindex */ > 10000'::jsquery)
Rows Removed by Filter: 650022
Execution time: 1376.284 ms
45. Jsquery (HINTING) — NEW !
• If you know that the inequality is selective, then use HINT /*-- index */
# explain (analyze, costs off) select count(*) from jr where jr @@ 'product_sales_rank /*-- index*/ >
3000000 AND review_rating = 5'::jsquery;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (actual time=12.307..12.307 rows=1 loops=1)
-> Bitmap Heap Scan on jr (actual time=11.259..12.244 rows=739 loops=1)
Recheck Cond: (jr @@ '("product_sales_rank" /*-- index */ > 3000000 AND "review_rating" =
5)'::jsquery)
Heap Blocks: exact=705
-> Bitmap Index Scan on jr_path_value_idx (actual time=11.179..11.179 rows=739 loops=1)
Index Cond: (jr @@ '("product_sales_rank" /*-- index */ > 3000000 AND "review_rating" =
5)'::jsquery)
Execution time: 12.359 ms vs 1709.901 ms (without hint)
(7 rows)
46. Jsquery use case: schema specification
CREATE TABLE js (
id serial primary key,
v jsonb,
CHECK(v @@ 'name IS STRING AND
coords IS ARRAY AND
NOT coords.# ( NOT (
x IS NUMERIC AND
y IS NUMERIC ) )'::jsquery));
The double negation reads: "no non-numeric coordinates exist" => all coordinates are numeric
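A hedged sketch of how the CHECK constraint above behaves at insert time (assuming the jsquery extension is installed and the js table is created as shown; the payloads are illustrative):

```sql
-- Accepted: name is a string, and every element of coords
-- has numeric x and y
INSERT INTO js (v) VALUES
  ('{"name": "route", "coords": [{"x": 1, "y": 2}, {"x": 3.5, "y": 0}]}');

-- Rejected by the CHECK: the second coordinate has a string "y",
-- so the "no non-numeric coordinate" condition fails
INSERT INTO js (v) VALUES
  ('{"name": "route", "coords": [{"x": 1, "y": 2}, {"x": 3, "y": "oops"}]}');
-- ERROR:  new row for relation "js" violates check constraint
```

This turns the jsonb column into a lightly schema-validated document store without any application-side validation code.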
48. Contrib/jsquery
• Jsquery index support is quite efficient (0.5 ms vs Mongo's 7 ms!)
• Future direction
• Make jsquery planner friendly
• Need statistics for jsonb
• Availability
• Jsquery + opclasses are available as extensions
• Grab it from https://github.com/akorotkov/jsquery (branch master) ,
we need your feedback !
51. postgrespro.ru
• We're hiring engineers for:
• 24x7 support
• Consulting & audits
• Development of admin applications
• Packaging
• We're hiring C developers to work
on Postgres itself:
• A bulletproof, scalable cluster
• Storage engines (in-memory, column
storage...)
52. JsQuery limitations
• Variables are always on the left side
x = 1 – OK
1 = x – Error!
• No calculations in query
x + y = 0 — Error!
• No extra datatypes or search operators
point(x,y) <@ '((0,0),(1,1),(2,1),(1,0))'::polygon — Error!
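These limitations can usually be worked around at the SQL level by extracting the values first and using native operators. A hedged sketch (table, column, and key names are illustrative and assume a jsonb column v):

```sql
-- No calculations inside jsquery: do the arithmetic in SQL instead
SELECT * FROM js
WHERE (v->>'x')::numeric + (v->>'y')::numeric = 0;

-- No extra datatypes in jsquery: cast the extracted values and use
-- the native geometric operators (here, point-in-polygon)
SELECT * FROM js
WHERE point((v->>'x')::float8, (v->>'y')::float8)
      <@ '((0,0),(1,1),(2,1),(1,0))'::polygon;
```

The trade-off: these predicates bypass the jsquery GIN opclasses, so they run as sequential scans unless you build expression indexes on the extracted values.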
55. JsQuery language goals
• Provide a rich enough query language for jsonb in 9.4.
• Indexing support for 'jsonb @@ jsquery':
• Two GIN opclasses are in jsquery itself
• VODKA opclasses were tested on jsquery
It's NOT intended to be the solution for jsonb querying in
the long term!
56. What JsQuery is NOT?
It's not designed to be another extendable, full-weight:
•Parser
•Executor
•Optimizer
It's NOT SQL inside SQL.
57. Jsonb querying an array: summary
Using «@>»
• Pro
• Indexing support
• Cons
• Checks only equality for scalars
• Hard to express complex logic
Using a subselect and
jsonb_array_elements
• Pro
• SQL-rich
• Cons
• No indexing support
• Heavy syntax
JsQuery
• Pro
• Indexing support
• Rich enough for typical
applications
• Cons
• Not extendable
Still looking for a better solution!
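For comparison, the three approaches from the summary applied to a single task: finding documents whose tags array contains the term "NYC" (a sketch; the table and key names are illustrative):

```sql
-- 1. Using @>: indexable, but limited to equality/containment
SELECT * FROM js WHERE v @> '{"tags": [{"term": "NYC"}]}';

-- 2. Using a subselect with jsonb_array_elements: full SQL power,
--    but no index support and heavier syntax
SELECT * FROM js
WHERE EXISTS (SELECT 1
              FROM jsonb_array_elements(v->'tags') t
              WHERE t->>'term' = 'NYC');

-- 3. Using JsQuery: indexable and readable (# = any array element)
SELECT * FROM js WHERE v @@ 'tags.#.term = "NYC"'::jsquery;
```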
59. Jsonb query: future
Users want the jsonb query language to be as rich
as SQL. How do we satisfy them?
Bring all required features to SQL-level!
60. Jsonb query: future
Functional equivalents:
• SELECT * FROM company WHERE EXISTS (SELECT 1
FROM jsonb_array_elements(js->'relationships') t
WHERE t->>'title' IN ('CEO', 'CTO') AND
t->'person'->>'first_name' = 'Neil');
• SELECT count(*) FROM company WHERE js @@ 'relationships(#.title in
("CEO", "CTO") AND #.person.first_name = "Neil")'::jsquery;
• SELECT * FROM company WHERE
ANYELEMENT OF js->'relationships' AS t (
t->>'title' IN ('CEO', 'CTO') AND
t ->'person'->>'first_name' = 'Neil');
61. Jsonb query: ANYELEMENT
Possible implementation steps:
• Implement ANYELEMENT just as syntactic sugar, and only for
arrays.
• Support for various data types (extendable?)
• Handle ANYELEMENT as an expression, not a subselect (problem with
the alias).
• Indexing support over ANYELEMENT expressions.
62. Another idea about ANYELEMENT
Functional equivalents:
• SELECT t
FROM company,
LATERAL (SELECT t FROM
jsonb_array_elements(js->'relationships') t) el;
• SELECT t
FROM company,
ANYELEMENT OF js->'relationships' AS t;