What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf EU 2019 | Utku Azman


Many people have asked us: “Why did Microsoft acquire Citus Data?” and “What do you plan to do with the Citus open source extension to Postgres?” Come join us to see the exciting work we are doing with Postgres and open source at Microsoft.

Microsoft | 100:17:7

100,000 employees
17,000,000 partners
7,000,000,000 people

PostgreSQL is more popular than ever

Community fueled momentum
30+ years of active development
DBMS of the year 2 years in a row (ranked by db-engines)
Unmatched community support
Fastest growing DB by popularity: https://db-engines.com/en/ranking_trend/system/PostgreSQL
Most extensible database
Ecosystem

Azure Database for PostgreSQL is fully-managed, community PostgreSQL
Global reach
Security
Scale up & out
Built-in HA
Compliance
Intelligent performance
Easy ecosystem integration
Extension support

Extensions
JSONB
Full text search
Geospatial support
Rich indexing

Twelve months of rapid progress

Performance
 Postgres 10, 11
 Query Store
 Query Performance Insights
 Performance recommendations
 Restart option
 Dblink
 AKS pgBouncer sidecar
 Read replicas in same region
 Read replicas cross region
Scalability
 Scale across general purpose, memory optimized
 16 TB storage
 Resource move
 New SKUs: 64 vCore general purpose, 32 vCore memory optimized
 Hyperscale (Citus)
Intelligence
 Geo-restore
 Query Store
 Query Performance Insights
 Performance recommendations
 HypoPG
 plv8
 Resource Health Check
Security
 VNet Service endpoints
 Advanced threat protection
 Compliance and Certifications
 pgaudit
Available in all public clouds, plus China and government clouds

And more, including:
PostgreSQL contributions
To the community
With the community

Installs (last 30 days): 85K+
Queries run (last 30 days): 260K+

Using data and analytics to improve Windows customer experience

Azure Database for PostgreSQL: Aka.ms/azure-postgres
Citus Community GitHub:
Blog: Aka.ms/azure-postgres-blog

1-day, single track, community PostgreSQL conference
San Francisco, CA
CFP opening next week!
JANUARY 21, 2020

A story about powering a 1.5 petabyte internal analytics application at Microsoft with 2816 cores and 18.7 TB of memory in the Citus cluster. The internal RQV analytics dashboard at Microsoft helps the Windows team to assess the quality of upcoming Windows releases. The system tracks 20,000 diagnostic and quality metrics, digests data from 800 million Windows devices and currently supports over 6 million queries per day, with hundreds of concurrent users. The RQV analytics dashboard relies on Postgres—along with the Citus extension to Postgres to scale out horizontally—and is deployed on Microsoft Azure.
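
To put the figures above in perspective, a back-of-the-envelope calculation helps. The sketch below uses only the numbers quoted in the paragraph. The variable names are illustrative, and it assumes queries are spread evenly across the day and that 1 TB means 10^12 bytes, neither of which the source states.

```python
# Figures quoted in the paragraph above.
QUERIES_PER_DAY = 6_000_000   # "over 6 million queries per day", so this is a lower bound
CORES = 2816
MEMORY_TB = 18.7

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: load is spread evenly over the day (real traffic is likely burstier).
avg_queries_per_second = QUERIES_PER_DAY / SECONDS_PER_DAY

# Assumption: decimal units, i.e. 1 TB = 1000 GB.
memory_gb_per_core = MEMORY_TB * 1000 / CORES

print(f"average load: at least {avg_queries_per_second:.1f} queries/second")
print(f"memory per core: about {memory_gb_per_core:.2f} GB")
```

Under these assumptions the cluster averages roughly 69 queries per second and has a little over 6.6 GB of memory per core. Peak load with hundreds of concurrent users would be higher than the daily average.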

