All data is relational and can be represented through relational algebra, right? Perhaps, but there are other ways to represent data, and the PostgreSQL team continues to work on making it easier and more efficient to do so!
With the upcoming 9.4 release, PostgreSQL is introducing the "JSONB" data type which allows for fast, compressed, storage of JSON formatted data, and for quick retrieval. And JSONB comes with all the benefits of PostgreSQL, like its data durability, MVCC, and of course, access to all the other data types and features in PostgreSQL.
How fast is JSONB? How do we access data stored with this type? What can it do with the rest of PostgreSQL? What can't it do? How can we leverage this new data type and make PostgreSQL scale horizontally? Follow along with our presentation as we try to answer these questions.
Working with JSON Data in PostgreSQL vs. MongoDBScaleGrid.io
In this post, we are going to show you tips and techniques on how to effectively store and index JSON data in PostgreSQL vs. MongoDB. Learn more in the blog post: https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-effectively-store-index-json-data-in-postgresql
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, validator, highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives.
In this tutorial (given at BOSS '21 in Copenhagen as part of VLDB '21) the attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.
Presenters: Julian Hyde and Stamatis Zampetakis
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
JSON is an important datatype transporting data between servers and many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We then closed with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
Working with JSON Data in PostgreSQL vs. MongoDBScaleGrid.io
In this post, we are going to show you tips and techniques on how to effectively store and index JSON data in PostgreSQL vs. MongoDB. Learn more in the blog post: https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-effectively-store-index-json-data-in-postgresql
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, validator, highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives.
In this tutorial (given at BOSS '21 in Copenhagen as part of VLDB '21) the attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.
Presenters: Julian Hyde and Stamatis Zampetakis
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
JSON is an important datatype transporting data between servers and many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We then closed with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
Presentation that I gave as a guest lecture for a summer intensive development course at nod coworking in Dallas, TX. The presentation targets beginning web developers with little, to no experience in databases, SQL, or PostgreSQL. I cover the creation of a database, creating records, reading/querying records, updating records, destroying records, joining tables, and a brief introduction to transactions.
Is it easier to add functional programming features to a query language, or to add query capabilities to a functional language? In Morel, we have done the latter.
Functional and query languages have much in common, and yet much to learn from each other. Functional languages have a rich type system that includes polymorphism and functions-as-values and Turing-complete expressiveness; query languages have optimization techniques that can make programs several orders of magnitude faster, and runtimes that can use thousands of nodes to execute queries over terabytes of data.
Morel is an implementation of Standard ML on the JVM, with language extensions to allow relational expressions. Its compiler can translate programs to relational algebra and, via Apache Calcite’s query optimizer, run those programs on relational backends.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
(A talk given by Julian Hyde at Strange Loop 2021, St. Louis, MO, on October 1st, 2021.)
This one is about advanced indexing in PostgreSQL. It guides you through basic concepts as well as through advanced techniques to speed up the database.
All important PostgreSQL Index types explained: btree, gin, gist, sp-gist and hashes.
Regular expression indexes and LIKE queries are also covered.
The paperback version is available on lulu.com there http://goo.gl/fraa8o
This is the first volume of the postgresql database administration book. The book covers the steps for installing, configuring and administering a PostgreSQL 9.3 on Linux debian. The book covers the logical and physical aspect of PostgreSQL. Two chapters are dedicated to the backup/restore topic.
This is a recording of my Advanced Oracle Troubleshooting seminar preparation session - where I showed how I set up my command line environment and some of the main performance scripts I use!
The openCypher Project - An Open Graph Query LanguageNeo4j
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
by Mahesh Pakal, AWS
PostgreSQL is a powerful, enterprise class open source object-relational database system with an emphasis on extensibility and standards-compliance. PostgreSQL boasts many sophisticated features and runs stored procedures in more than a dozen programming languages. We’ll explore the advantages and limitations of PostgreSQL, examples of where it is best suited for use, and examples of who is using PostgreSQL to power their applications.
Presentation that I gave as a guest lecture for a summer intensive development course at nod coworking in Dallas, TX. The presentation targets beginning web developers with little, to no experience in databases, SQL, or PostgreSQL. I cover the creation of a database, creating records, reading/querying records, updating records, destroying records, joining tables, and a brief introduction to transactions.
Is it easier to add functional programming features to a query language, or to add query capabilities to a functional language? In Morel, we have done the latter.
Functional and query languages have much in common, and yet much to learn from each other. Functional languages have a rich type system that includes polymorphism and functions-as-values and Turing-complete expressiveness; query languages have optimization techniques that can make programs several orders of magnitude faster, and runtimes that can use thousands of nodes to execute queries over terabytes of data.
Morel is an implementation of Standard ML on the JVM, with language extensions to allow relational expressions. Its compiler can translate programs to relational algebra and, via Apache Calcite’s query optimizer, run those programs on relational backends.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
(A talk given by Julian Hyde at Strange Loop 2021, St. Louis, MO, on October 1st, 2021.)
This one is about advanced indexing in PostgreSQL. It guides you through basic concepts as well as through advanced techniques to speed up the database.
All important PostgreSQL Index types explained: btree, gin, gist, sp-gist and hashes.
Regular expression indexes and LIKE queries are also covered.
The paperback version is available on lulu.com there http://goo.gl/fraa8o
This is the first volume of the postgresql database administration book. The book covers the steps for installing, configuring and administering a PostgreSQL 9.3 on Linux debian. The book covers the logical and physical aspect of PostgreSQL. Two chapters are dedicated to the backup/restore topic.
This is a recording of my Advanced Oracle Troubleshooting seminar preparation session - where I showed how I set up my command line environment and some of the main performance scripts I use!
The openCypher Project - An Open Graph Query LanguageNeo4j
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
by Mahesh Pakal, AWS
PostgreSQL is a powerful, enterprise class open source object-relational database system with an emphasis on extensibility and standards-compliance. PostgreSQL boasts many sophisticated features and runs stored procedures in more than a dozen programming languages. We’ll explore the advantages and limitations of PostgreSQL, examples of where it is best suited for use, and examples of who is using PostgreSQL to power their applications.
An perspective into the raise of NoSQL systems and an comparison between RDBMS and NoSQL technologies.
The basic idea of the presentation originated while trying to understand the different alternatives available for managing data while building a fast, highly scalable, available, and reliable enterprise application.
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionEtu Solution
講者:Informatica 資深產品顧問 | 尹寒柏
議題簡介:Big Data 時代,比的不是數據數量,而是了解數據的深度。現在,因為 Big Data 技術的成熟,讓非資訊背景的 CXO 們,可以讓過去像是專有名詞的 CI (Customer Intelligence) 變成動詞,從 BI 進入 CI,更連結消費者經濟的脈動,洞悉顧客的意圖。不過,有個 Big Data 時代要 注意的思維,那就是競爭到最後,不單只是看數據量的增長,還要比誰能更了解數據的深度。而 Informatica 正是這個最佳解決的答案。我們透過 Informatica 解決在企業及時提供可信賴數據的巨大壓力;同時隨著日益增高的數據量和複雜程度,Informatica 也有能力提供更快速彙集數據技術,從而讓數據變的有意義並可供企業用來促進效率提升、完善品質、保證確定性和發揮優勢的功能。Inforamtica 提供了更為快速有效地實現此目標的方案,是精誠集團在 Big Data 時代的最佳工具。
It’s been an exciting year for Amazon Aurora, the MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features, include high availability options and new integrations with AWS services. We’ll also discuss the recently-announced Aurora with PostgreSQL compatibility.
A comparison of different solutions for full-text search in web applications using PostgreSQL and other technology. Presented at the PostgreSQL Conference West, in Seattle, October 2009.
Josh Berkus
You've heard that PostgreSQL is the highest-performance transactional open source database, but you're not seeing it on YOUR server. In fact, your PostgreSQL application is kind of poky. What should you do? While doing advanced performance engineering for really high-end systems takes years to learn, you can learn the basics to solve performance issues for 80% of PostgreSQL installations in less than an hour. In this session, you will learn: -- The parts of database application performance -- The performance setup procedure -- Basic troubleshooting tools -- The 13 postgresql.conf settings you need to know -- Where to look for more information.
Postgres vs Mongo / Олег Бартунов (Postgres Professional)Ontico
РИТ++ 2017, Backend Conf
Зал Конгресс-холл, 6 июня, 17:00
Тезисы:
http://backendconf.ru/2017/abstracts/2781.html
Я хочу немного порушить стереотипы, что Postgres - это чисто реляционная СУБД из прошлого века, плохо приспособленная под реалии современных проектов. Недавно мы прогнали YCSB для последних версий Postgres и Mongodb и увидели их плюсы и минусы на разных типах нагрузки, о которых я буду рассказывать. ...
JSONB in PostgreSQL is one of the main attractive feature for modern
application developers, no matter what some RDBMS purists are thinking.
People often use simple one-column-json schema for their projects and rely
on ability of database to store,index and query json. Postgres has long
history of supporting the non-structured data and has pioneered the
adoption of JSON by relational databases, so eventually JSON became and
official feature (SQL/JSON) of SQL standard.
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...pgdayrussia
Доклад был представлен на официальной российской конференции PG Day'14 Russia, посвященной вопросам разработки и эксплуатации PostgreSQL.
<сарказм> MongoDB правит бал в мире слабо-структурированных данных. Привлеченные в MongoDB инвестиции часто затмевают разум (особенно начинающих и доверчивых) разработчиков, которые с радостью бросаются в океан возможностей, предоставляемых NoSQL (это же круто!). Энтузиазм затихает после осознания того факта, что бесплатно ничего не бывает и надо писать своими руками то, что десятилетиями хорошо работает в традиционных реляционных базах данных, которые прекрасно справляются с нагрузками и данными 99% проектов, и ваш проект не входит в оставшийся один процент. </сарказм>
Мир баз данных за последние годы существенно изменился. Всеместное проникновение Интернет-технологий привело к необходимости работы с большим количеством разнородных данных в реальном времени, к чему традиционные реляционные СУБД оказались не готовы. Принято считать, что слабая масштабируемость и излишняя “жесткость” модели данных реляционных СУБД и являются основными причинами появляния и роста популярности NoSQL баз данных (далее, NoSQL).
В докладе мы остановимся на концептуальных предпосылках появления NoSQL и их классификации. Одним из “жупелов” NoSQL является поддержка типа данных JSON, который реализует документо-ориентированную модель данных. Документо-ориентированная модель данных является более гибкой и позволяет менять схему данных “на лету”, что сделать очень трудно в реляционных СУБД, особенно в системах, работающих под большой нагрузкой. Несмотря на успех NoSQL (активно распиаренный использованием в некоторых популярных Интернет-проектах), многие пользователи не готовы приносить в жертву целостность данных в угоду масштабируемости, но хотят иметь гибкость схемы данных в проверенных и надежных реляционных СУБД.
Нами была предложена и реализована поддержка документо-ориентированной модели в PostgreSQL (версия 9.4). Уже более 10 лет в PostgreSQL существует возможность работать со schema-less данными, используя наш модуль расширения hstore. Hstore предлагает хранилище вида "ключ-значение" с сохранением всех реляционных возможностей, что сделало его самым используемым
Present and future of Jsonb in PostgreSQL
Json - is an ubiquitous data format, which supported in almost any popular databases. PostgreSQL was the first relational database, which received support of native textual json and very efficient binary jsonb data types. Recently published SQL 2016 standard describes the JSON data type and specifies the functions and operators to work with json in SQL, which greatly simplifies the further development of json support in PostgreSQL. We compare existing features of json/jsonb data types with proposed SQL standard and discuss the ways how we could improve json/jsonb support in PostgreSQL.
PostgreSQL offers to application developers a rich support of json data type, providing known advantages of the json data model with traditional benefits of relational databases, such as declarative query language, rich query processing, transaction management providing ACID safety guarantees. However, current support of json is far from ideal, for example, json is still "foreign" data type to SQL - existed jsquery extension tries to implement their own query language, which is being powerfull, is opaque to Postgres planner and optimizer and not extendable. Extending SQL to support json, without commonly accepted standard, is difficult and perspectiveless task. Recently published SQL 2016 standard describes the JSON data type and specifies the functions and operators to work with json in SQL, which makes clear the direction of future development of json support in PostgreSQL. We present our ideas and prototype of future json data type in PostgreSQL with some further non-standard extensions and improvements in storage requirement and index support.
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...HappyDev
Появление большого количества NoSQL СУБД обусловлено требованиями современных информационных систем, которым большинство традиционных реляционных баз данных не удовлетворяет. Одним из таких требований является поддержка данных, структура которых заранее не определена. Однако при выборе NoSQL БД ради отсутствия схем данных можно потерять ряд преимуществ, которые дают зрелые SQL-решения, а именно: транзакции, скорость чтения строк из таблиц. PostgreSQL, являющаяся передовой реляционной СУБД, имела поддержку слабо-структурированных данных задолго до появления NoSQL, которая обрела новое дыхание в последнем релизе в виде типа данных jsonb, который не только поддерживает стандарт JSON, но и обладает производительностью, сравнимой или даже превосходящей наиболее популярные NoSQL СУБД.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisation’s like the Guardian the flexibility to aggregate social content and scale out.
JSON Processing in the Database using PostgreSQL 9.4 :: Data Wranglers DC :: ...Ryan B Harvey, CSDP, CSM
This slide deck was prepared for a talk on getting, processing, and reshaping JSON data using PostgreSQL 9.4 at the Data Wranglers DC meetup on January 7, 2015.
Associated materials are on GitHub at:
https://github.com/nihonjinrxs/dwdc-january2015
Meetup information: http://www.meetup.com/Data-Wranglers-DC/events/219112410/
PostgreSQL has kept up the momentum around JSON with version 9.4 featuring JSONB as demand for working with unstructured data continues to grow. In this talk delivered during Postgres Open 2014, Vibhor Kumar, principal systems engineer at EnterpriseDB, offered some scenarios for working with JSON in PostgreSQL and demonstrated performance metrics. This session also gave some instruction on how to use different operations and explored comparisons to BSON.
Jsquery - the jsonb query language with GIN indexing supportAlexander Korotkov
PostgreSQL 9.4 has new jsonb data type, which was designed for efficient work with json data. However, its query language is very limited and supports only a few operators. In this talk we introduce jsquery - the jsonb query language, which is flexible, expandable and has GIN indexing support. Jsquery provides postgres users an ability to talk to json data in an efficient way on par with NoSQL databases. The preliminary prototype was presented at PCGon-2014 and has got a good feedback, so now we want to show to european users the new version of jsquery (with some enhancements), which is compatible with 9.4 release and can be installed as an extension. We'll also discuss current issues of jsquery and possible ways of improvements.
Володимир Гоцик. Getting maximum of python, django with postgres 9.4. PyCon B...Alina Dolgikh
Postgres предоставляет много встроенных возможностей для создания эфективных приложений, использующих базы данных. А в версии 9.4 появляется еще и полноценное JSON поле, при правильном использовании которого, отпадает необходимость использвания NoSQL баз данных. В докладе мы рассмотрим, как использовать этот потенциал по максимуму в своих Python/Django приложениях.
Webinar: General Technical Overview of MongoDB for Dev TeamsMongoDB
In this talk we will focus on several of the reasons why developers have come to love the richness, flexibility, and ease of use that MongoDB provides. First we will give a brief introduction of MongoDB, comparing and contrasting it to the traditional relational database. Next, we’ll give an overview of the APIs and tools that are part of the MongoDB ecosystem. Then we’ll look at how MongoDB CRUD (Create, Read, Update, Delete) operations work, and also explore query, update, and projection operators. Finally, we will discuss MongoDB indexes and look at some examples of how indexes are used.
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Jonathan Katz
Vectors are a centuries old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. The heightened ease-of-use of AI/ML has lead to a surge of interested of storing vector data alongside application data, leading to some unique challenges. PostgreSQL has seen this story before with JSON, when JSON became the lingua franca of the web. So how can you use PostgreSQL to manage your vector data, and what challenges should you be aware of?
In this session, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including looking at best practices for using pgvector, an extension that adds additional vector search capabilities to PostgreSQL. Finally, we'll review ongoing development in both PostgreSQL and pgvector that will make it easier and more performant to search vector data in PostgreSQL.
There are parallels between storing JSON data in PostgreSQL and storing vectors that are produced from AI/ML systems. This lightning talk briefly covers the similarities in use-cases in storing JSON and vectors in PostgreSQL, shows some of the use-cases developers have for querying vectors in Postgres, and some roadmap items for improving PostgreSQL as a vector database.
This talk explores PostgreSQL 15 enhancements (along with some history) and looks at how they improve developer experience (MERGE and SQL/JSON), optimize support for backups and compression, logical replication improvements, enhanced security and performance, and more.
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
Congratulations: you've been selected to build an application that will manage reservations for rooms!
On the surface, this sounds simple, but you are building a system for managing a high traffic reservation web page, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the website checking to see what availability each room has.
Fortunately, PostgreSQL is prepared for this! And even better, we will be using Postgres 14 to make the problem even easier!
We will explore the following PostgreSQL features:
* Data types and their functionality, such as:
* Data/Time types
* Ranges / Multirnages
Indexes such as:
* GiST
* Common Table Expressions and Recursion (though multiranges will make things easier!)
* Set generating functions and LATERAL queries
* Functions and the PL/PGSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primary with SQL, though we will sneak in a little bit of Python and using Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all users made possible by the innovation of PostgreSQL!
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
Passwords: they just seem to work. You connect to your PostgreSQL database and you are prompted for your password. You type in the correct character combination, and presto! you're in, safe and sound.
But what if I told you that all was not as it seemed. What if I told you there was a better, safer way to use passwords with PostgreSQL? What if I told you it was imperative that you upgraded, too?
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), introduced in RFC 5802, as a way to securely authenticate passwords. The SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other, using a series of cryptographic methods.
In this talk, we will look at:
* A history of the evolution of password storage and authentication in PostgreSQL
* How SCRAM works with a step-by-step deep dive into the algorithm (and convince you why you need to upgrade!)
* SCRAM channel binding, which helps prevent MITM attacks during authentication
* How to safely set and modify your passwords, as well as how to upgrade to SCRAM-SHA-256 (which we will do live!)
all of which will be explained by some adorable elephants and hippos!
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers supports it, how to upgrade your passwords to using SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), introduced in RFC 5802, as a way to securely authenticate passwords. The SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other, using a series of cryptographic methods.
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers supports it, how to upgrade your passwords to using SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
Operating PostgreSQL at Scale with KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concept and technologies have made its way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
All this sounds great, but if you are new to the world of containers, it can be very overwhelming to find a place to start. In this talk, which centers around demos, we will see how you can get PostgreSQL up and running in a containerized environment with some advanced sidecars in only a few steps! We will also see how it extends to a larger production environment with Kubernetes, and what the future holds for PostgreSQL in a containerized world.
We will cover the following:
* Why containers are important and what they mean for PostgreSQL
* Create a development environment with PostgreSQL, pgadmin4, monitoring, and more
* How to use Kubernetes to create your own "database-as-a-service"-like PostgreSQL environment
* Trends in the container world and how it will affect PostgreSQL
At the conclusion of the talk, you will understand the fundamentals of how to use container technologies with PostgreSQL and be on your way to running a containerized PostgreSQL environment at scale!
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
Congratulations: you've been selected to build an application that will manage whether or not the rooms for PGConf.EU are being occupied by a session!
On the surface, this sounds simple, but we will be managing the rooms of PGConf.EU, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the PGConf.EU website checking to see what availability each of the PGConf.EU rooms has.
To do this, we will explore the following PGConf.EU features:
* Data types and their functionality, such as:
* Data/Time types
* Ranges
Indexes such as:
* GiST
* SP-Gist
* Common Table Expressions and Recursion
* Set generating functions and LATERAL queries
* Functions and the PL/PGSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primary with SQL, though we will sneak in a little bit of Python and using Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all PGConf.EU attendees made possible by the innovation of PGConf.EU!
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concept and technologies have made its way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL along with pgadmin4 and monitoring
- Running PostgreSQL on Kubernetes with a Demo
- Trends in the container world and how it will affect PostgreSQL
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concept and technologies have made its way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL container
- Extending your setup with a pgadmin4 container
- Container orchestration: What this means, and how to use Kubernetes to leverage database-as-a-service with PostgreSQL
- Trends in the container world and how it will affect PostgreSQL
Developing and Deploying Apps with the Postgres FDWJonathan Katz
I couldn't wait to use the Postgres Foreign Data Wrapper (postgres_fdw) in a project; imagine being able to read and write data to many databases all from a single database! I finally found a project where it made sense to use this amazing technology.
I mapped out my architecture and began to code, and realized there were some things that did not work as expected: I could not call remote functions or insert into a table with a serial primary key and have it autoupdate. I found workarounds (which I will share), so the project went on.
We tested the setup, everything seemed to work well, and then we went to deploy to production. And then the real fun began.
Despite the title, I still love the Postgres FDW but wanted to provide some cautionary tales from a hybrid developer/DBA perspective on how to properly use them in your working environment. This talk will cover:
* Basic Postgres FDW setup in a development environment vs. production environment
* Handling some common FDW uses case that you think are trivial but are not
* Working with advanced Postgres constructs such as schemas and sequences with FDWs
* Putting it all together to make sure your production application is safe with your FDWs
* ...and when you really, really need to make a remote call and it is not supported by a FDW, how to do that too!
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
* Key-value store (hstore)
* JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations through some of my experimentations. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied too.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
2. Who Are We?
● Jonathan S. Katz
– CTO, VenueBook
– jonathan@venuebook.com
– @jkatz05
● Jim Mlodgenski
– CTO, OpenSCG
– jimm@openscg.com
– @jim_mlodgenski
3. Edgar Frank “Ted” Codd
"A Relational Model of Data for
Large Shared Data Banks"
4. The Relational Model
● All data => “n-ary relations”
● Relation => set of n-tuples
● Tuple => ordered set of attribute values
● Attribute Value => (attribute name, type name)
● Type => classification of the data (“domain”)
● Data is kept consistent via “constraints”
● Data is manipulated using “relational algebra”
6. Relation Model ≠ SQL
● (Well yeah, SQL is derived from relational algebra,
but still…)
● SQL deviates from the relational model with:
– duplicate rows
– anonymous columns (think functions, operations)
– strict column order with storage
– NULL
10. Now Back in the Real World…
• Data is imperfect
• Data is stored imperfectly
• Data is sometimes transferred between
different systems
• And sometimes we just don’t want to go
through the hassle of SQL
16. JSON and PostgreSQL
●
Started in 2010 as a Google Summer of Code Project
– https://wiki.postgresql.org/wiki/JSON_datatype_GSo
C_2010
●
Goal: be similar to XML data type functionality in
Postgres
●
Be committed as an extension for PostgreSQL
9.1
17. What Happened?
• Different proposals over how to finalize the
implementation
– binary vs. text
• Core vs Extension
• Discussions between “old” vs. “new” ways of
packaging for extensions
20. PostgreSQL 9.2: JSON
• JSON data type in core PostgreSQL
• based on RFC 4627
• only “strictly” follows if your database encoding is
UTF-8
• text-based format
• checks for validity
21. PostgreSQL 9.2: JSON
SELECT '[{"PUG": "NYC"}]'::json;
json
------------------
[{"PUG": "NYC"}]
SELECT '[{"PUG": "NYC"]'::json;
ERROR: invalid input syntax for type json at character 8
DETAIL: Expected "," or "}", but found "]".
CONTEXT: JSON data, line 1: [{"PUG": "NYC"]
25. PostgreSQL 9.3: JSON Ups its Game
• Added operators and functions to read / prepare JSON
• Added casts from hstore to JSON
26. PostgreSQL 9.3: JSON
Operator Description Example
-> return JSON array element OR
JSON object field
'[1,2,3]'::json -> 0;
'{"a": 1, "b": 2, "c": 3}'::json -> 'b';
->> return JSON array element OR
JSON object field AS text
['1,2,3]'::json ->> 0;
'{"a": 1, "b": 2, "c": 3}'::json ->> 'b';
#> return JSON object using path '{"a": 1, "b": 2, "c": [1,2,3]}'::json #> '{c, 0}';
#>> return JSON object using path AS
text
'{"a": 1, "b": 2, "c": [1,2,3]}'::json #> '{c, 0}';
27. Operator Gotchas
SELECT * FROM category_documents
WHERE data->'title' = 'PostgreSQL';
ERROR: operator does not exist: json = unknown
LINE 1: ...ECT * FROM category_documents WHERE data-
>'title' = 'Postgre...
^HINT: No operator
matches the given name and argument type(s). You
might need to add explicit type casts.
28. Operator Gotchas
SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
-----------------------
{"cat_id":252739,"cat_pages":14,"cat_subcats":0,
"cat_files":0,"title":"PostgreSQL"}
(1 row)
29. For the Upcoming Examples
• Wikipedia English category titles – all 1,823,644 that I downloaded
• Relation looks something like:
Column | Type | Modifiers
-------------+---------+--------------------
cat_id | integer | not null
cat_pages | integer | not null default 0
cat_subcats | integer | not null default 0
cat_files | integer | not null default 0
title | text |
30. Performance?
EXPLAIN ANALYZE SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
---------------------
Seq Scan on category_documents
(cost=0.00..57894.18 rows=9160 width=32) (actual
time=360.083..2712.094 rows=1 loops=1)
Filter: ((data ->> 'title'::text) =
'PostgreSQL'::text)
Rows Removed by Filter: 1823643
Total runtime: 2712.127 ms
31. Performance?
CREATE INDEX category_documents_idx ON
category_documents (data);
ERROR: data type json has no default operator
class for access method "btree"
HINT: You must specify an operator class for
the index or define a default operator class for
the data type.
32. Let’s Be Clever
• json_extract_path, json_extract_path_text
– LIKE (#>, #>>) but with list of args
SELECT json_extract_path(
'{"a": 1, "b": 2, "c": [1,2,3]}’::json,
'c', ‘0’);
--------
1
33. Performance Revisited
CREATE INDEX category_documents_data_idx
ON category_documents
(json_extract_path_text(data, ‘title'));
EXPLAIN ANALYZE
SELECT * FROM category_documents
WHERE json_extract_path_text(data, 'title') = 'PostgreSQL';
--------------------
Bitmap Heap Scan on category_documents (cost=303.09..20011.96 rows=9118
width=32) (actual time=0.090..0.091 rows=1 loops=1)
Recheck Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) =
'PostgreSQL'::text)
-> Bitmap Index Scan on category_documents_data_idx (cost=0.00..300.81
rows=9118 width=0) (actual time=0.086..0.086 rows=1 loops=1)
Index Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[])
= 'PostgreSQL'::text)
Total runtime: 0.105 ms
34. The Relation vs JSON
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• Index Size for “title”
• category - 89MB
• category_documents - 89MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents - 0.070ms
35. JSON => SET
• to_json
• json_each, json_each_text
SELECT * FROM
json_each('{"a": 1, "b": [2,3,4], "c":
"wow"}'::json);
key | value
-----+---------
a | 1
b | [2,3,4]
c | "wow"
37. Populating JSON Records
• json_populate_record
CREATE TABLE stuff (a int, b text, c int[]);
SELECT *
FROM json_populate_record(
NULL::stuff, '{"a": 1, "b": “wow"}'
);
a | b | c
---+-----+---
1 | wow |
SELECT *
FROM json_populate_record(
NULL::stuff, '{"a": 1, "b": "wow", "c": [4,5,6]}’
);
ERROR: cannot call json_populate_record on a nested object
39. JSON Aggregates
• (this is pretty cool)
• json_agg
SELECT b, json_agg(stuff)
FROM stuff
GROUP BY b;
b | json_agg
------+----------------------------------
neat | [{"a":4,"b":"neat","c":[4,5,6]}]
wow | [{"a":1,"b":"wow","c":[1,2,3]}, +
| {"a":3,"b":"wow","c":[7,8,9]}]
cool | [{"a":2,"b":"cool","c":[4,5,6]}]
40. hstore gets in the game
• hstore_to_json
• converts hstore to json, treating all values as strings
• hstore_to_json_loose
• converts hstore to json, but also tries to distinguish
between data types and “convert” them to proper JSON
representations
SELECT hstore_to_json_loose(‘"a key"=>1, b=>t, c=>null,
d=>12345, e=>012345, f=>1.234, g=>2.345e+4');
----------------
{"b": true, "c": null, "d": 12345, "e": "012345", "f":
1.234, "g": 2.345e+4, "a key": 1}
41. Next Steps?
• In PostgreSQL 9.3, JSON became
much more useful, but…
• Difficult to search within JSON
• Difficult to build new JSON objects
42. “Nested hstore”
• Proposed at PGCon 2013 by Oleg Bartunov and Teodor
Sigaev
• Hierarchical key-value storage system that supports
arrays too and stored in binary format
• Takes advantage of GIN indexing mechanism in
PostgreSQL
• “Generalized Inverted Index”
• Built to search within composite objects
• Arrays, fulltext search, hstore
• …JSON?
43. How JSONB Came to Be
• JSON is the “lingua franca per trasmissione la data
nella web”
• The PostgreSQL JSON type was in a text format and
preserved text exactly as input
• e.g. duplicate keys are preserved
• Create a new data type that merges the nested Hstore
work to create a JSON type stored in a binary format:
JSONB
44. JSONB ≠ BSON
BSON is a data type created by MongoDB
as a “superset of JSON”
JSONB lives in PostgreSQL and is just JSON
that is stored in a binary format on disk
45. JSONB Gives Us More Operators
• a @> b - is b contained within a?
• { "a": 1, "b": 2 } @> { "a": 1} -- TRUE
• a <@ b - is a contained within b?
• { "a": 1 } <@ { "a": 1, "b": 2 } -- TRUE
• a ? b - does the key “b” exist in JSONB a?
• { "a": 1, "b": 2 } ? 'a' -- TRUE
• a ?| b - does the array of keys in “b” exist in JSONB a?
• { "a": 1, "b": 2 } ?| ARRAY['b', 'c'] -- TRUE
• a ?& b - does the array of keys in "b" exist in JSONB a?
• { "a": 1, "b": 2 } ?& ARRAY['a', 'b'] -- TRUE
46. JSONB Gives Us Flexibility
SELECT * FROM category_documents WHERE
data @> '{"title": "PostgreSQL"}';
----------------
{"title": "PostgreSQL", "cat_id": 252739,
"cat_files": 0, "cat_pages": 14, "cat_subcats": 0}
SELECT * FROM category_documents WHERE
data @> '{"cat_id": 5432 }';
----------------
{"title": "1394 establishments", "cat_id": 5432,
"cat_files": 0, "cat_pages": 4, "cat_subcats": 2}
47. JSONB Gives us GIN
• Recall - GIN indexes are used to "look inside" objects
• JSONB has two flavors of GIN:
• Standard - supports @>, ?, ?|, ?&
CREATE INDEX category_documents_data_idx USING
gin(data);
• "Path Ops" - supports only @>
CREATE INDEX category_documents_path_data_idx
USING gin(data jsonb_path_ops);
48. JSONB Gives Us Speed
EXPLAIN ANALYZE SELECT * FROM category_documents
WHERE data @> '{"title": "PostgreSQL"}';
------------
Bitmap Heap Scan on category_documents (cost=38.13..6091.65 rows=1824
width=153) (actual time=0.021..0.022 rows=1 loops=1)
Recheck Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on category_documents_path_data_idx
(cost=0.00..37.68 rows=1824 width=0) (actual time=0.012..0.012 rows=1
loops=1)
Index Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
Planning time: 0.070 ms
Execution time: 0.043 ms
49. JSONB + Wikipedia Categories:
By the Numbers
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• category_documents (JSONB) - 325MB
• Index Size for “title”
• category - 89MB
• category_documents (JSON with one key using an expression index) - 89MB
• category_documents (JSONB, all GIN ops) - 311MB
• category_documents (JSONB, just @>) - 203MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents (JSON with one key using an expression index) - 0.070ms
• category_documents (JSONB, all GIN ops) - 0.115ms
• category_documents (JSONB, just @>) - 0.045ms
50. JSONB Gives Us WTF:
A Note On Operator Indexability
EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> ’{ "f1": 10 }’;
QUERY PLAN
-----------
Bitmap Heap Scan on documents (cost=27.75..3082.65 rows=1000 width=66) (actual time=0.029..0.
rows=1 loops=1)
Recheck Cond: (data @> ’{"f1": 10}’::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on documents_data_gin_idx (cost=0.00..27.50 rows=1000 width=0)
(actual time=0.014..0.014 rows=1 loops=1)
Index Cond: (data @> ’{"f1": 10}’::jsonb)
Execution time: 0.084 ms
EXPLAIN ANALYZE SELECT * FROM documents WHERE ’{ "f1": 10 }’ <@ data;
QUERY PLAN
-----------
Seq Scan on documents (cost=0.00..24846.00 rows=1000 width=66) (actual time=0.015..245.924 ro
Filter: (’{"f1": 10}’::jsonb <@ data)
Rows Removed by Filter: 999999
Execution time: 245.947 ms
51. JSON ≠ Schema-less
Some agreements must be made about the document
The document must be validated somewhere
Ensure that all of your code no matter who writes it conforms
to a basic document structure
52. Enter PL/V8
● Write your database functions in Javascript
● Validate your JSON inside of the database
● http://pgxn.org/dist/plv8/doc/plv8.html
CREATE EXTENSION plv8;
53. Create A Validation Function
CREATE OR REPLACE FUNCTION has_valid_keys(doc json)
RETURNS boolean AS
$$
if (!doc.hasOwnProperty('data'))
return false;
if (!doc.hasOwnProperty('meta'))
return false;
return true;
$$ LANGUAGE plv8 IMMUTABLE;
54. Add A Constraint
ALTER TABLE collection
ADD CONSTRAINT collection_key_chk
CHECK (has_valid_keys(doc::json));
scale=# INSERT INTO collection (doc) VALUES ('{"name":
"postgresql"}');
ERROR: new row for relation "collection" violates check
constraint "collection_key_chk"
DETAIL: Failing row contains (ea438788-b2a0-4ba3-b27d-
a58726b8a210, {"name": "postgresql"}).
55. Schema-Less ≠ Web-Scale
Web-Scale needs to run on commodity hardware or the cloud
Web-Scale needs horizontal scalability
Web-Scale needs no single point of failure
56. Enter PL/Proxy
● Developed by Skype
● Allows for scalability and parallelization
● http://pgfoundry.org/projects/plproxy/
● Used by many large organizations around the
world
58. Setting Up A Proxy Server
CREATE EXTENSION plproxy;
CREATE SERVER datacluster FOREIGN DATA WRAPPER plproxy
OPTIONS (connection_lifetime '1800',
p0 'dbname=data1 host=localhost',
p1 'dbname=data2 host=localhost' );
CREATE USER MAPPING FOR PUBLIC SERVER datacluster;
59. Create a “Get” Function
CREATE OR REPLACE FUNCTION get_doc(i_id uuid)
RETURNS SETOF jsonb AS $$
CLUSTER 'datacluster';
RUN ON hashtext(i_id::text) ;
SELECT doc FROM collection WHERE id =
i_id;
$$ LANGUAGE plproxy;
60. Create a “Put” Function
CREATE OR REPLACE FUNCTION put_doc(
i_doc jsonb,
i_id uuid DEFAULT uuid_generate_v4())
RETURNS uuid AS $$
CLUSTER 'datacluster';
RUN ON hashtext(i_id::text);
$$ LANGUAGE plproxy;
61. Need a “Put” Function on the
Shard
CREATE OR REPLACE FUNCTION
put_doc(i_doc jsonb, i_id uuid)
RETURNS uuid AS $$
INSERT INTO collection (id, doc)
VALUES ($2,$1);
SELECT $2;
$$ LANGUAGE SQL;
62. Parallelize A Query
CREATE OR REPLACE FUNCTION get_doc_by_id (v_id varchar)
RETURNS SETOF jsonb AS $$
CLUSTER 'datacluster';
RUN ON ALL;
SELECT doc FROM collection
WHERE doc @> CAST('{"id" : "' || v_id || '"}' AS
jsonb);
$$ LANGUAGE plproxy;