As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageZurich_R_User_Group
Abstract: Both practitioners and researchers spend significant amount of their time on data preparation, cleaning and exploration. It gets more complicated and interesting if a dataset is big, or if it has a lot of groups in it which require per-group analysis. In this talk I will introduce an innovative data.table package as an alternative to the standard data.frame which significantly cuts your programming and execution time with easier code. It is also the first step to working with big data in R. The talk will be beneficial for R users from all disciplines, as well as for big data professionals looking for more explicit data exploration tools.
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
It covers- Introduction to R language, Creating, Exploring data with Various Data Structures e.g. Vector, Array, Matrices, and Factors. Using Methods with examples.
This hands-on R course will guide users through a variety of programming functions in the open-source statistical software program, R. Topics covered include indexing, loops, conditional branching, S3 classes, and debugging. Full workshop materials available from http://projects.iq.harvard.edu/rtc/r-prog
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageZurich_R_User_Group
Abstract: Both practitioners and researchers spend significant amount of their time on data preparation, cleaning and exploration. It gets more complicated and interesting if a dataset is big, or if it has a lot of groups in it which require per-group analysis. In this talk I will introduce an innovative data.table package as an alternative to the standard data.frame which significantly cuts your programming and execution time with easier code. It is also the first step to working with big data in R. The talk will be beneficial for R users from all disciplines, as well as for big data professionals looking for more explicit data exploration tools.
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
It covers- Introduction to R language, Creating, Exploring data with Various Data Structures e.g. Vector, Array, Matrices, and Factors. Using Methods with examples.
This hands-on R course will guide users through a variety of programming functions in the open-source statistical software program, R. Topics covered include indexing, loops, conditional branching, S3 classes, and debugging. Full workshop materials available from http://projects.iq.harvard.edu/rtc/r-prog
Overview of a few ways to group and summarize data in R using sample airfare data from DOT/BTS's O&D Survey.
Starts with naive approach with subset() & loops, shows base R's tapply() & aggregate(), highlights doBy and plyr packages.
Presented at the March 2011 meeting of the Greater Boston useR Group.
Slides for a Machine Learning Course in R,
includes an introduction to R and several ML methods for classification, regression, clustering and dimensionality reduction.
Frequent Pattern Mining with Serialization and De-Serializationiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Introduction to Pandas and Time Series Analysis [PyCon DE]Alexander Hendorf
Most data is allocated to a period or to some point in time. We can gain a lot of insight by analyzing what happened when. The better the quality and accuracy of our data, the better our predictions can become.
Unfortunately the data we have to deal with is often aggregated for example on a monthly basis, but not all months are the same, they may have 28 days, 31 days, have four or five weekends,…. It’s made fit to our calendar that was made fit to deal with the earth surrounding the sun, not to please Data Scientists.
Dealing with periodical data can be a challenge. This talk will show to how you can deal with it with Pandas.
R is a programming language and environment commonly used in statistical computing, data analytics and scientific research.
It is one of the most popular languages used by statisticians, data analysts, researchers and marketers to retrieve, clean, analyze, visualize and present data.
Due to its expressive syntax and easy-to-use interface, it has grown in popularity in recent years.
1) Tree
2) General Tree
3) Binary Tree
4) Full Binay Tree, Complete Binay Tree
5) Binary Tree Traversal (DFS & BFS)
6) Binary Search Tree
7) Reconstruction of Binay Tree
8) Expression Tree
9) Evaluation of postfix expression
10) Infix to Prefix using stack
11) Infix to Postfix using stack
12) Threaded Binary Tree
13) AVL-Tree
14) AVL-Tree Rotation
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed data sets (RDD). And you will learn how the Spark concepts resembles those well-known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
Being a LISP hacker for several years, and a Java-guy for some more, Chris turned to Clojure for production code in 2011. He’s been Project Lead, Software Architect, and VP Tech in the meantime, interested in AI and data-visualization.
Now, working on the heart of data driven marketing for Performance Media in Hamburg, he turned to Apache Spark for some Big Data jobs. Chris released the API-wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
Attached here is a presentation that I made covering some bits and pieces of what I got to discover about Data Science and Machine Learning using R Programming Language.
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
Overview of a few ways to group and summarize data in R using sample airfare data from DOT/BTS's O&D Survey.
Starts with naive approach with subset() & loops, shows base R's tapply() & aggregate(), highlights doBy and plyr packages.
Presented at the March 2011 meeting of the Greater Boston useR Group.
Slides for a Machine Learning Course in R,
includes an introduction to R and several ML methods for classification, regression, clustering and dimensionality reduction.
Frequent Pattern Mining with Serialization and De-Serializationiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Introduction to Pandas and Time Series Analysis [PyCon DE]Alexander Hendorf
Most data is allocated to a period or to some point in time. We can gain a lot of insight by analyzing what happened when. The better the quality and accuracy of our data, the better our predictions can become.
Unfortunately the data we have to deal with is often aggregated for example on a monthly basis, but not all months are the same, they may have 28 days, 31 days, have four or five weekends,…. It’s made fit to our calendar that was made fit to deal with the earth surrounding the sun, not to please Data Scientists.
Dealing with periodical data can be a challenge. This talk will show to how you can deal with it with Pandas.
R is a programming language and environment commonly used in statistical computing, data analytics and scientific research.
It is one of the most popular languages used by statisticians, data analysts, researchers and marketers to retrieve, clean, analyze, visualize and present data.
Due to its expressive syntax and easy-to-use interface, it has grown in popularity in recent years.
1) Tree
2) General Tree
3) Binary Tree
4) Full Binay Tree, Complete Binay Tree
5) Binary Tree Traversal (DFS & BFS)
6) Binary Search Tree
7) Reconstruction of Binay Tree
8) Expression Tree
9) Evaluation of postfix expression
10) Infix to Prefix using stack
11) Infix to Postfix using stack
12) Threaded Binary Tree
13) AVL-Tree
14) AVL-Tree Rotation
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed data sets (RDD). And you will learn how the Spark concepts resembles those well-known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
Being a LISP hacker for several years, and a Java-guy for some more, Chris turned to Clojure for production code in 2011. He’s been Project Lead, Software Architect, and VP Tech in the meantime, interested in AI and data-visualization.
Now, working on the heart of data driven marketing for Performance Media in Hamburg, he turned to Apache Spark for some Big Data jobs. Chris released the API-wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
Attached here is a presentation that I made covering some bits and pieces of what I got to discover about Data Science and Machine Learning using R Programming Language.
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalisation examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including Denormalization techniques thanks to advanced Data Types.
Logical Data Fabric: Architectural ComponentsDenodo
Watch full webinar here: https://bit.ly/39MWm7L
Is the Logical Data Fabric one monolithic technology or does it comprise of various components? If so, what are they? In this presentation, Denodo CTO Alberto Pan will elucidate what components make up the logical data fabric.
Modern Oracle DBAs have spent years acquiring extremely valuable skills, even while facing increased responsibility for growing numbers of diverse multi-version databases, demands to transition to public cloud computing Infrastructure, and a never-ending drumbeat for upskilling and relevance in our industry. It’s the perfect time to consider a transition in your career by leveraging your expertise with the Oracle database in a new role as a Data Engineer (DE).
Data Con LA 2022 - Open Source Large Knowledge Graph FactoryData Con LA
Russell Jurney, Founder, Graphlet AI
The knowledge graph and graph database markets have long asked themselves: why aren't we larger? The vision of the semantic web was that many datasets could be cross-referenced between independent graph databases to map all knowledge on the web from myriad disparate datasets into one or more authoritative ontologies which could be accessed by writing SPARQL queries to work across knowledge graphs. The reality of dirty data made this vision impossible. Most time is spent cleaning data which isn't in the format you need to solve your business problems. Multiple datasets in different formats each have quirks. Deduplicate data using entity resolution is an unsolved problem for large graphs. Once you merge duplicate nodes and edges, you rarely have the edge types you need to make a problem easy to solve. It turns out the most likely type of edge in a knowledge graph that solves your problem easily is defined by the output of a Python program using the machine learning. For large graphs, this program needs to run on a horizontally scalable platform PySpark and extend rather than be isolated inside a graph databases. The quality of developer's experience is critical. In this talk I will review an approach to an Open Source Large Knowledge Graph Factory built on top of Spark that follows the ingest / build / refine / public / query model that open source big data is based upon.
A short study on telecom information models & offeringsSayak Majumder
This document takes a holistic & non-commercial view on some of the available Information Model offerings in tele-communications industry from Traditional Business Intelligence perspective and in context with the Tele Management Forums Information Architecture; SID (Shared Information Data Model, currently known as the Information Framework). This document does not serve as a business use case or a promotional use case for any particular offering. This is rather a knowledge artifact which provides some insight on the currently available solutions. The target audience of this document is the Information management team(s) for various Telecom Operators who deal with the information management solutions and information architectures for telecom service providers.
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...DATAVERSITY
Once developers have a knowledge management model - covered in our August webinar - they still have to deal with real world implementation constraints. Big data is a fact of life for most modern AI/cognitive computing apps, which usually means ingesting, sampling, or analyzing large data sets from disparate sources, ranging from IOT sensors to social media streams to news feeds and weather forecasts. Frequently, historical data in legacy systems will also be required to generate new insights.
This webinar will present a framework to help participants evaluate streaming data management tools, IOT technology stacks, and graph databases as support tools for their modern AI/cognitive computing projects. They will also learn about emerging open source projects and ecosystems that can help kick start their projects today.
Matthias Deeg - Bypassing an Enterprise-Grade Biometric Face Authentication S...hacktivity
Biometric authentication systems have long, checkered history in IT security and are regarded as a highly controversial technology. Many manufacturers and users love them because of their usability and the personal touch they give to human-computer interaction when it comes to an often annoying but necessary task like user authentication. Other people hate them because of data privacy and security concerns. Despite all the controversy, biometric authentication systems are still here and they seem to stay.
In fall 2017, SySS GmbH started a research project concerning the enterprise-grade face authentication system Microsoft Windows Hello Face Authentication based on near infrared technology.
In our talk, we will present the results of our research project concerning the enterprise-grade face authentication system Windows Hello Face Authentication by Microsoft based on near infrared and visible light and will demonstrate how different versions of it can be bypassed by rather simple means.
Come diventare data scientist - Si ringrazie per le slide Paolo Pellegrini, Senior Consultant presso P4I (Partners4Innovation) e referente di tutte le progettualità relative alle tematiche Data Science e Big Data Analytics. Owner del primo gruppo in Italia dedicato dai Data Scientist.
Similar to Data Modeling, Normalization and Denormalization | Nordic PGDay 2018 | Dimitri Fontaine (20)
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Citus Data
A story about powering a 1.5 petabyte internal analytics application at Microsoft with 2816 cores and 18.7 TB of memory in the Citus cluster.
The internal RQV analytics dashboard at Microsoft helps the Windows team to assess the quality of upcoming Windows releases. The system tracks 20,000 diagnostic and quality metrics, digests data from 800 million Windows devices and currently supports over 6 million queries per day, with hundreds of concurrent users. The RQV analytics dashboard relies on Postgres—along with the Citus extension to Postgres to scale out horizontally—and is deployed on Microsoft Azure.
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...Citus Data
When do you use jsonb, and when don’t you? How do you make it fast? What operators are available, and what can they do? How will this change? These are all very good questions, but jsonb support in Postgres moves so fast that it’s hard to keep up.
In this talk, you will get details on these topics, complete with practical examples and real-world stories:
- When to use jsonb, what it’s good for, and when to not use it
- Operators and how to use them effectively
- Indexing, operator support for indexes, and the tradeoffs involved
- Postgres 12 improvements and new features
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Citus Data
One of the strongest features of any database is its extensibility and PostgreSQL comes with a rich extension API. It allows you to define new functions, types, and operators. It even allows you to modify some of its core parts like planner, executor or storage engine. You read it right, you can even change the behavior of PostgreSQL planner. How cool is that?
Such freedom in extensibility created strong extension community around PostgreSQL and made way for a vast amount of extensions such as pg_stat_statements, citus, postgresql-hll and many more.
In this tutorial, we will look at how you can create your own PostgreSQL extension. We will start with more common stuff like defining new functions and types but gradually explore less known parts of the PostgreSQL's extension API like C level hooks which lets you change the behavior of planner, executor and other core parts of the PostgreSQL. We will see how to code, debug, compile and test our extension. After that, we will also look into how to package and distribute our extension for other people to use.
To get the best benefit from the tutorial, C and SQL knowledge would be beneficial. Some knowledge on PostgreSQL internals would also be useful but we will cover the necessary details, so it is not necessary.
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
Postgres is a powerful database, it continues to improve in terms of performance, extensibility, and more broadly in features. However it is not perfect.
Here I'll cover a highly opinionated view of all the areas Postgres falls flat, with some rough thought ideas on how we can make it better. Opinions are all informed by 10 years of interacting with customers running literally millions of databases for users.
When it all goes wrong | PGConf EU 2019 | Will LeinweberCitus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncCitus Data
SQL can seem like an obscure and complex but powerful language. Learning it can be intimidating. As a developer, we can easily be tempted using basic SQL provided by the ORM. But did you know that you can use window functions in some ORMs? Same goes for a lot of other fun SQL functionalities.
In this talk we will explore some advanced SQL features that you might find useful. We will discover the wonderful world of joins (lateral, cross…), subqueries, grouping sets, window functions, common table expressions.
But most importantly this talk is not only a talk to show you how great SQL is. This talk is here to show you how to use it in real life. What are the features supported by your ORM? And how can you use them if they don’t support them?
Wether you know SQL or not, whether you are a developer or a DBA working with developers, you might learn a lot about SQL, ORMs, and application development using Postgres.
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...Citus Data
Many people have asked us: “Why did Microsoft acquire Citus Data?” and “What do you plan to do with the Citus open source extension to Postgres?” Come join us to see the exciting work we are doing with Postgres and open source at Microsoft.
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisCitus Data
Postgres relies heavily on an extension ecosystem, but that is almost 100% dependent on C; which cuts out developers, libraries, and ideas from the world of Postgres. postgres-extension.rs changes that by supporting development of extensions in Rust. Rust is a memory-safe language that integrates nicely in any environment, has powerful libraries, a vibrant ecosystem, and a prolific developer community.
Rust is a unique language because it supports high-level features but all the magic happens at compile-time, and the resulting code is not dependent on an intrusive or bulky runtime. That makes it ideal for integrating with postgres, which has a lot of its own runtime, like memory contexts and signal handlers. postgres-extension.rs offers this integration, allowing the development of extensions in rust, even if deeply-integrated into the postgres internals, and helping handle tricky issues like error handling. This is done through a collection of Rust function declarations, macros, and utility functions that allow rust code to call into postgres, and safely handle resulting errors.
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Citus Data
I spent the early part of my career working on developer tools, operating systems, high-speed file systems, and scale-out storage. Not databases. Frankly, I always thought that databases were a bit boring. So almost 2 years in to my new job at a Postgres company, I continue to be amazed at the enthusiasm of the PostgreSQL developer community and users. I mean, people’s eyes light up when you ask them why they love Postgres. Sure, a lot of us get animated when talking about our newest gadget, or Ronaldo’s phenomenal free-kick goal in the World Cup, or mint chip gelato from La Strega Nocciola—but most platform software simply doesn’t trigger this kind of passion. So why does Postgres? Why is this open source database having such a “moment”? Well, I’ve been trying to understand, looking at this “Postgres moment” from a few different angles. In this talk I’ll share what I’ve observed to be the top 10 business, technology, and community reasons so many of you have so much affection for PostgreSQL.
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncCitus Data
Want to know everything about indexes in postgres? Here are the slides for a postgresql talk, and if you want to know more, you can read articles on www.louisemeta.com.
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Citus Data
Many in today’s developer world look down on marketing. I mean, after all, the marketing team is usually “not technical.” And they’re not developers. It’s 2019 and while we try to promote inclusiveness of all types, inclusiveness doesn’t seem to apply to marketers. Why? Is that OK? Who does that hurt? I grew up in engineering and spent the first 15 years of my career as a developer or an engineering manager of some type. So now that I’m in marketing, it surprised me when one of my engineering colleagues blurted out “But it’s a technical conference!” when he learned one of my talks was accepted to a technical conference.
This keynote is about why developers really need marketing. About how good marketing managers can make it so visitors to your website don’t leave empty-handed, confused about what your technology actually does or why it matters. About how the ability to translate technology into what-users-actually-care-about can make your project be the one that takes off. About why Dormain Drewitz said at Monktoberfest: “I work in product marketing. My preferred programming language is English.” Finally, this talk explores how to be sensitive to the bias against marketing that pervades some of our teams—and how to instead embrace teamwork best practices employed by sailors, where everyone in the boat has an important role to play if you are to win the race.
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineCitus Data
PostgreSQL is the World’s Most Advanced Open Source Relational Database and by the end of this talk you will understand what that means for you, an application developer. What kind of problems PostgreSQL can solve for you, and how much you can rely on PostgreSQL in your daily activities, including unit-testing.
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Citus Data
I’m a Postgres person. Period. After talking to many Rails developers about their application performance, I realized many performance issues can be solved by understanding your database a bit better. So I thought I’d share the statistics Postgres captures for you and how you can use them to find slow queries, un-used indexes, or tables which are not getting vacuumed correctly. This talk will cover Postgres tools and tips for the above, including pgstatstatements, useful catalog tables, and recently added Postgres features such as CREATE STATISTICS.
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberCitus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineCitus Data
PostgreSQL is the World’s Most Advanced Open Source Relational Database and by the end of this talk you will understand what that means for you, an application developer. What kind of problems PostgreSQL can solve for you, and how much you can rely on PostgreSQL in your daily activities, including unit-testing.
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Citus Data
Watch Sai Srirampur, Solutions Engineer at Citus Data (now part of the Microsoft family), give a live demo of how you can use Postgres and the Citus extension to Postgres to manage real-time analytics workloads.
View if you & your application need:
>> A relational database that scales for customer-facing analytics dashboards, with real-time data ingest and a large volume of queries
>> A way to scale out Postgres horizontally, to address the performance hiccups you’re experiencing as you run into the resource limits of single-node Postgres
>> A way to roll-up and pre-aggregate data to build fast data pipelines and enable sub-second response times.
>> A way to consolidate your database platforms, to avoid having separate stores for your transactional and analytics workloads
Using a 4-node Citus database cluster in the cloud, Sai will show you how Citus shards Postgres to give you lightning fast performance, at scale. Also featuring rollups.
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineCitus Data
Most of the time we see finished SQL queries, either in code repositories, blog posts of talk slides. This talk focus on the process of how to write an SQL query, from a problem statement expressed in English to code review and long term maintenance of SQL code.
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberCitus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoCitus Data
I spent the early part of my career working on developer tools, operating systems, high-speed file systems, and scale-out storage. Not databases. Frankly, I always thought that databases were a bit boring. So one year in to my new job at a Postgres company, I continue to be amazed at the enthusiasm of the PostgreSQL developer community and users. I mean, people’s eyes light up when you ask them why they love Postgres. Sure, a lot of us get animated when talking about our newest iPhone, or Ronaldo’s phenomenal free-kick goal in the World Cup, or mint chip gelato from La Strega Nociola—but most platform software simply doesn’t trigger this kind of passion. So why does Postgres? Why is this open source database having such a “moment”? Why now? Well, I’ve been trying to find out, looking at this “Postgres moment” from a few different angles. In this talk I’ll share what I’ve observed to be the top 10 business, technology, and community reasons so many of you have so much affection for PostgreSQL.
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Citus Data
There are a number of data architectures you could use when building a multi-tenant app. Some, such as using one database per customer or one schema per customer. These two options scale to an extent when you have say 10s of tenants. However as you start scaling to hundreds and thousands of tenants, you start running into challenges both from performance and maintenance of tenants perspective. You could solve the above problem by adding the notion of tenancy directly into the logic of your SaaS application. How to implement/automate this in Django-ORM is a challenge? We will talk about how to make the django app tenant aware and at a broader level explain how scale out applications that are built on top of Django ORM and follow a multi tenant data model. We'd take postgresql as our database of choice and the logic/implementation can be extended to any other relational databases as well.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Data Modeling, Normalization and Denormalization | Nordic PGDay 2018 | Dimitri Fontaine
1. Data Modeling, Normalization and Denormalization
Nordic PgDay 2018, Oslo
Dimitri Fontaine
CitusData
March 13, 2018
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 1 / 49
2. Data Modeling, Normalization and Denormalization
Dimitri Fontaine
PostgreSQL Major Contributor
pgloader
CREATE EXTENSION
CREATE EVENT TRIGGER
Bi-Directional Réplication
apt.postgresql.org
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 2 / 49
3. Mastering PostgreSQL in Application Development
I wrote a book!
Mastering PostgreSQL in Application
Development teaches SQL to devel-
oppers: learn to replace thousands of
lines of code with simple queries.
http://MasteringPostgreSQL.com
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 3 / 49
4. Rob Pike, Notes on Programming in C
Rule 5. Data dominates.
If you’ve chosen the right data structures
and organized things well, the algorithms
will almost always be self-evident. Data
structures, not algorithms, are central to
programming. (Brooks p. 102.)
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 4 / 49
5. Database Anomalies
We normalize a database model so as to avoid Database Anomalies. We
also follow simple data structure design rules to make the data easy to
understand, maintain and query.
Database Anomalies
Update anomaly
Insertion anomaly
Deletion anomaly
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 5 / 49
9. Database Design and User Workflow
Show me your flowcharts and conceal your
tables, and I shall continue to be mystified.
Show me your tables, and I won’t usually
need your flowcharts; they’ll be obvious.
(Fred Brooks)
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 9 / 49
10. Database Modeling & Tooling
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 10 / 49
11. Tooling for Database Modeling
We can use psql and SQL scripts to edit database schemas:
BEGIN;
create schema if not exists sandbox;
create table sandbox.category
(
id serial primary key,
name text not null
);
insert into sandbox.category(name)
values ('sport'),('news'),('box office'),('music');
ROLLBACK;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 11 / 49
12. Object Relational Mapping
The R in ORM stands for “Relation”. The result of a SQL query is a
relation. That’s what you should be mapping, not your base tables!
When mapping base tables, you end up trying to solve different complex
issues at the same time:
User Workflow
Consistent view of the whole world at all time
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 12 / 49
14. Basics of the Unix Philosophy: principles
Some design rules that apply to Unix and to database design too:
Rule of Clarity
Clarity is better than cleverness.
Rule of Simplicity
Design for simplicity; add complexity only where you must.
Rule of Transparency
Design for visibility to make inspection and debugging easier.
Rule of Robustness
Robustness is the child of transparency and simplicity.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 14 / 49
15. Normal Forms
The Normal Forms are designed to avoid database anomalies, and they
help in following the listed rules seen before.
1st Normal Form, Codd, 1970
1 There are no duplicated rows in the table.
2 Each cell is single-valued (no repeating groups or arrays).
3 Entries in a column (field) are of the same kind.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 15 / 49
16. Second Normal Form
2nd Normal Form, Codd, 1971
A table is in 2NF if it is in 1NF and if all non-key attributes are
dependent on all of the key. Since a partial dependency occurs when a
non-key attribute is dependent on only a part of the composite key, the
definition of 2NF is sometimes phrased as:
“A table is in 2NF if it is in 1NF and if it has no partial dependencies.”
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 16 / 49
17. Third Normal Form and Boyce-Codd Normal Form
3rd Normal Form (Codd, 1971)
and BCNF (Boyce and Codd, 1974)
3NF A table is in 3NF if it is in 2NF
and if it has no transitive
dependencies.
BCNF A table is in BCNF if it is in
3NF and if every determinant is a
candidate key.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 17 / 49
18. More Normal Forms!
Each level builds on the previous one.
4NF A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.
5NF A table is in 5NF, also called “Projection-join Normal Form”
(PJNF), if it is in 4NF and if every join dependency in the table is a
consequence of the candidate keys of the table.
DKNF A table is in DKNF if every constraint on the table is a
logical consequence of the definition of keys and domains.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 18 / 49
19. Database Constraints
Primary Keys, Surrogate Keys, Foreign Keys, and more. . .
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 19 / 49
20. Primary Keys
First Normal Form requires no duplicated row. I know, let’s use a Primary
Key!
create table sandbox.article
(
id bigserial primary key,
category integer references sandbox.category(id),
pubdate timestamptz,
title text not null,
content text
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 20 / 49
21. Primary Keys, Surrogate Keys
Artificially generated key is named a surrogate key because it is a
substitute for natural key. A natural key would allow preventing
duplicate entries in our data set.
insert into sandbox.article (category, pubdate, title)
values (2, now(), 'Hot from the Press'),
(2, now(), 'Hot from the Press')
returning *;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 21 / 49
22. Primary Keys, Surrogate Keys
Oops.
-[ RECORD 1 ]---------------------------
id | 3
category | 2
pubdate | 2018-03-12 15:15:02.384105+01
title | Hot from the Press
content |
-[ RECORD 2 ]---------------------------
id | 4
category | 2
pubdate | 2018-03-12 15:15:02.384105+01
title | Hot from the Press
content |
INSERT 0 2
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 22 / 49
23. Primary Keys, Surrogate Keys
Fixing the model is easy enough: implement a natural primary key.
create table sandboxpk.article
(
category integer references sandbox.category(id),
pubdate timestamptz,
title text not null,
content text,
primary key(category, pubdate, title)
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 23 / 49
24. Primary Keys, Foreign Keys
Now we have to reference the whole natural key everywhere:
create table sandboxpk.comment
(
a_category integer not null,
a_pubdate timestamptz not null,
a_title text not null,
pubdate timestamptz,
content text,
primary key(a_category, a_pubdate, a_title, pubdate, content),
foreign key(a_category, a_pubdate, a_title)
references sandboxpk.article(category, pubdate, title)
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 24 / 49
25. Primary Keys, Foreign Keys
One solution is to have both a surrogate and a natural key:
create table sandbox.article
(
id integer generated always as identity,
category integer not null references sandbox.category(id)
pubdate timestamptz not null,
title text not null,
content text,
primary key(category, pubdate, title),
unique(id)
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 25 / 49
26. Normalization Helpers: database constraints
To help you implement Normal Forms with strong guarantees even when
having to deal with concurrent access to the database, we have
constraints.
Primary Keys
Foreign Keys
Not Null
Check Constraints
Domains
Exclusion Constraints
1 create table rates
2 (
3 currency text,
4 validity daterange,
5 rate numeric,
6
7 exclude using gist
8 (
9 currency with =,
10 validity with &&
11 )
12 );
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 26 / 49
28. Denormalization
The first rule
of denormalization is that you
don’t
do denormalization.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 28 / 49
29. Denormalization is an optimization technique
Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs,
and these attempts at efficiency actually have a strong negative
impact when debugging and maintenance are considered. We
should forget about small efficiencies, say about 97% of the time:
premature optimization is the
root of all evil. Yet we should not pass up our
opportunities in that critical 3%.
Donald Knuth, "Structured Programming with Goto Statements".
Computing Surveys 6:4 (December 1974), pp. 261–301, §1.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 29 / 49
30. Denormalization: cache data
The main trick: repeat data to make it locally available, breaking
functional dependency rules. You know have a cache.
Implement Cache Invalidation.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 30 / 49
31. Denormalization example
set season 2017
select drivers.surname as driver,
constructors.name as constructor,
sum(points) as points
from results
join races using(raceid)
join drivers using(driverid)
join constructors using(constructorid)
where races.year = :season
group by grouping sets(drivers.surname, constructors.name)
having sum(points) > 150
order by drivers.surname is not null, points desc;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 31 / 49
32. Denormalization example
create view v.season_points as
select year as season, driver, constructor, points
from seasons left join lateral
(
select drivers.surname as driver,
constructors.name as constructor,
sum(points) as points
from results
join races using(raceid)
join drivers using(driverid)
join constructors using(constructorid)
where races.year = seasons.year
group by grouping sets(drivers.surname, constructors.name
order by drivers.surname is not null, points desc
)
as points on true
order by year, driver is null, points desc;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 32 / 49
33. Denormalization example
And now cache the results of the view into a durable relation:
create materialized view cache.season_points as
select * from v.season_points;
create index on cache.season_points(season);
When you need to invalidate the cache, just refresh the view:
refresh materialized view cache.season_points;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 33 / 49
34. Denormalization example
And now rewrite your application’s query as:
select driver, constructor, points
from cache.season_points
where season = 2017
and points > 150;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 34 / 49
35. Other denormalization use cases
Audit Trails
History Tables
Partitionning
Scaling Out
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 35 / 49
36. History tables and audit trails
Another case where you might have to denormalize your database model is
when keeping a history of all changes.
Foreign key references to other tables won’t be possible when those
reference changes and you want to keep a history that, by definition,
doesn’t change.
The schema of your main table evolves and the history table shouldn’t
rewrite the history for rows already written.
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 36 / 49
37. History tables with JSONB
JSONB is very flexible, and can host the archives for all your database
model versions in the same table, or for all your source tables at once even.
create schema if not exists archive;
create type archive.action_t
as enum('insert', 'update', 'delete');
create table archive.older_versions
(
table_name text,
date timestamptz default now(),
action archive.action_t,
data jsonb
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 37 / 49
38. Validity periods
A variant of the historic requirement is to keep track of data changes and
be able to use the value that were valid at a known time. Currency
exchange rates applied to invoices is an example:
create table rates
(
currency text,
validity daterange,
rate numeric,
exclude using gist (currency with =,
validity with &&)
);
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 38 / 49
39. Validity periods
Here’s how to use the data from a known time in the past:
select currency, validity, rate
from rates
where currency = 'Euro'
and validity @> date '2017-05-18';
-[ RECORD 1 ]---------------------
currency | Euro
validity | [2017-05-18,2017-05-19)
rate | 1.240740
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 39 / 49
40. Denormalization Helpers: advanced datatypes
Composite datatypes help with denormalization. It’s possible to keep
several values in the same column thanks to them. Spare matrix becomes
an extra field of jsonb type.
Composite Types
Arrays
JSONb
Enumerated Types
hstore
ltree
intarray
pg_trgm
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 40 / 49
41. Partitioning
Partitioning comes with demormalization trade-offs in PostgreSQL 10:
Index are managed at the partition level
No Primary Key, No Unique Index, No Exclusion Constraint
No Foreign Key pointing to a partitionned table
Lack of ON CONFLICT support
Lack of UPDATE support for re-balancing
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 41 / 49
42. Not Only SQL
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 42 / 49
43. Schemaless design
PostgreSQL includes several composite types (multi-value data). JSONb
allows the implementation of schemaless design right within PostgreSQL.
select jsonb_pretty(data)
from magic.cards
where data @> '{"type":"Enchantment",
"artist":"Jim Murray",
"colors":["White"]
}';
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 43 / 49
44. NoSQL and Durability Trade-Offs
PostgreSQL setup is made with GUC, or Great Unified Configuration. You
can edit values in the postgresql.conf file, or dynamically change it in
the session. Or in the transaction with SET LOCAL. Or have per-user or
per-database settings.
create role dbowner with login;
create role app with login;
create role critical with login in role app inherit;
create role notsomuch with login in role app inherit;
create role dontcare with login in role app inherit;
alter user critical set synchronous_commit to remote_apply;
alter user notsomuch set synchronous_commit to local;
alter user dontcare set synchronous_commit to off;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 44 / 49
45. Automatic Per-Transaction Durability Setting
SET demo.threshold TO 1000;
CREATE OR REPLACE FUNCTION public.syncrep_important_delta()
RETURNS TRIGGER
LANGUAGE PLpgSQL
AS
$$ DECLARE
threshold integer := current_setting('demo.threshold')::int;
delta integer := NEW.abalance - OLD.abalance;
BEGIN
IF delta > threshold
THEN
SET LOCAL synchronous_commit TO on;
END IF;
RETURN NEW;
END;
$$;
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 45 / 49
47. Five sharding data models and which is right?
If you were here this morning you’ve seen Craig’s talk, so that’s about it.
Sharding by geography
Sharding by entity id
Sharding a graph
Time partitioning
Depends. . .
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 47 / 49
48. Denormalization and Sharding
Adding the sharding key to every
table is another case of duplicating
information for maintaining a
cache. Beware of database
anomalies
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 48 / 49
49. Questions?
Now is the time to ask!
https://2018.nordicpgday.org/feedback
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018 49 / 49