Skype uses PostgreSQL for its databases and has over 100 database servers handling over 10,000 transactions per second across 200 databases storing billions of records. The databases are split both vertically and horizontally to scale with load. All database access is through stored procedures for security and flexibility. Replication and remote procedure calls connect the databases. Tools like pgBouncer, PL/Proxy, PgQ, and Londiste help partition, connect, and load balance the databases to form a large and complex but manageable distributed database architecture.
2. Skype Databases
– Skype is a communications company that provides calling and chatting services over the internet. Because a lot of activity happens in the p2p network, not everything a user does hits the databases.
– We have used PostgreSQL from the beginning for all our OLTP databases.
– Over 100 database servers and 200 databases.
– Largest OLTP table has several billion records.
– Over 10,000 transactions per second.
– Databases are used mainly for the web store, billing, and other centrally provided services.
– We use low-end servers. The most common server has 4 cores, 16 GB of memory, and 8 internal disks.
3. Stored Procedure API
– Skype has had from the beginning the requirement that all database access must be implemented through stored procedures.
– This approach gives the following benefits:
• Data access and modification logic is in stored procedures, near the data it operates on.
• Simple security model. No need to grant rights on tables, as applications don't need to see any tables.
• Seamless refactoring of underlying databases. No need to change applications when
– underlying tables are optimized or modified
– functionality is moved into separate databases
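A minimal sketch of what such a stored-procedure API might look like (the function, table, and role names here are illustrative, not Skype's actual schema):

```sql
-- Hypothetical example: the application calls this function instead of
-- reading the users table directly.
CREATE FUNCTION get_user_email(i_username text)
RETURNS text AS $$
    SELECT email FROM users WHERE username = i_username;
$$ LANGUAGE sql STABLE SECURITY DEFINER;

-- The application role gets EXECUTE on the function, but no table rights,
-- so the table can later be restructured or moved without app changes.
REVOKE ALL ON users FROM app_role;
GRANT EXECUTE ON FUNCTION get_user_email(text) TO app_role;
```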
4. Vertical Split
– Skype started with a couple of databases.
– Moved functionality out into separate servers and databases as load increased.
– Used replication and remote procedure calls between databases.
[Diagram: over time, a single userdb grows into separate userdb, shopdb, servicedb, and analysisdb databases.]
5. Horizontal Split
– One server could not serve one table anymore.
– Created plProxy for horizontal partitioning of databases.
– Largest cluster is served by 16 servers plus failover servers.
[Diagram: over time, userdb becomes two userdb proxy databases in front of partitions userdb_p0 through userdb_p3.]
7. pgBouncer: PostgreSQL Connection Pooler
– pgBouncer is a lightweight and robust connection pooler for PostgreSQL.
– Reduces thousands of incoming connections to only tens of connections in the database.
– The optimal number of connections is important because each connection uses computer resources, and each new connection is quite expensive, as prepared plans have to be created from scratch each time.
[Diagram: FrontEnd WebServer → pgBouncer → userdb.]
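A minimal pgbouncer.ini sketch of this setup (the database name, addresses, and pool size are illustrative, not Skype's actual configuration):

```ini
[databases]
; clients connect to "userdb" on pgBouncer, which maintains a small
; server-side pool against the real database
userdb = host=127.0.0.1 port=5432 dbname=userdb

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction   ; server connection released after each transaction
default_pool_size = 20    ; tens of backend connections, not thousands
max_client_conn = 5000
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```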
8. plProxy: Remote Call Language
– PL/Proxy is a compact language for remote calls between PostgreSQL databases.
– With PL/Proxy the user can create proxy functions that have the same signature as the remote functions to be called.
– plProxy adds very little overhead.
– Adds complexity to maintenance and development.
[Diagram: shopdb calling into billingdb through a proxy function.]
CREATE FUNCTION get_user_email(username text) RETURNS text AS $$
CONNECT 'dbname=billingdb';
$$ LANGUAGE plproxy;
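Called from the local database, the proxy function behaves like an ordinary function; PL/Proxy forwards the call to billingdb and returns the result, assuming a function with the same signature exists there (the username here is illustrative):

```sql
-- Runs in shopdb; PL/Proxy executes get_user_email('alice') in billingdb
-- and returns its result as if the function were local.
SELECT get_user_email('alice');
```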
9. plProxy: Remote Calls
– We use remote calls mostly for read-only queries, in cases where it is not reasonable to replicate the needed data to the calling database.
– Get balance from accounting database
– Get user email
– Get archived records from archive database
– Take care with read-write calls. There is no two-phase commit.
[Diagram: userDB and shopDb making remote calls into billingDB.]
10. plProxy: Proxy Databases
– Additional layer between applications and databases.
– Keeps database connectivity simple for applications.
– Acts as a security layer.
[Diagram: BackOffice Application → manualfixDb and backofficeDb proxy databases → shopDb, userDB, internalDb.]
11. plProxy: Vertical Partitioning
– plProxy can be used to split a database into partitions based on some category or field, like country code. In this example the database is split into 'ru' and 'row' (rest of the world).
– Another scenario would be to split by product category.
[Diagram: onlinedb (proxy) → onlinedb_RU and onlinedb_ROW.]
12. plProxy: Horizontal Partitioning
– We have partitioned most of our databases by username, using the PostgreSQL hashtext function to get an equal distribution between partitions.
– When splitting databases we usually prepare the new partitions on other servers and then switch all traffic at once, to keep our life simple.
– As proxy databases are stateless, we can have several exact duplicates for load balancing and high availability.
[Diagram: two identical userdb proxy databases in front of partitions userdb_p0 through userdb_p3.]
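A sketch of a hash-partitioned proxy function (the cluster name is illustrative, and the PL/Proxy cluster configuration functions that map the cluster name to partition connection strings are omitted here):

```sql
-- Proxy function in the userdb proxy database. PL/Proxy picks the
-- partition as hashtext(username) modulo the number of partitions
-- and runs the same-named function there.
CREATE FUNCTION get_user_email(username text) RETURNS text AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(username);
$$ LANGUAGE plproxy;
```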
13. SkyTools: Package of Database Tools
– Contains most everything we have found useful in our everyday work with databases and PostgreSQL.
– PgQ, which adds event queues to PostgreSQL.
– Londiste replication.
– Walmgr for WAL-based log shipping.
– DBScript framework, which provides database connectivity, logging, stats management, encoding, decoding, etc. for batch jobs.
– SkyTools contains tens of reusable generic scripts for doing various data-related tasks.
[Diagram: a batch job connecting to databases.]
14. PgQ: PostgreSQL Queueing Implementation
PgQ is a PostgreSQL-based event processing system. It is part of the SkyTools package, which contains several useful tools built on it.
Producers - applications that write events into the queue. A producer can be written in any language that is able to run stored procedures in PostgreSQL (python, java, php, c++).
Consumers - applications that read events from the queue. Consumers can be written in any language that can interact with PostgreSQL (python, java, c++, php).
[Diagram: producer → database (function: pgq.create_event → queue) → consumer.]
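At the SQL level, the producer/consumer flow above looks roughly like this sketch of the SkyTools PgQ API (the queue name, consumer name, event type, and payload are illustrative, and a real consumer would loop and handle empty batches):

```sql
-- One-time setup: create the queue and register a consumer on it.
SELECT pgq.create_queue('user_events');
SELECT pgq.register_consumer('user_events', 'email_sender');

-- Producer: insert an event; it commits or rolls back with the transaction.
SELECT pgq.insert_event('user_events', 'user.created', 'username=alice');

-- Consumer: fetch a batch of events, process them, then close the batch.
SELECT pgq.next_batch('user_events', 'email_sender');  -- returns a batch_id
SELECT * FROM pgq.get_batch_events(<batch_id>);        -- events to process
SELECT pgq.finish_batch(<batch_id>);                   -- mark batch done
```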
15. PgQ: Properties
– Transactional. Event insertion is committed or rolled back together with the other changes the transaction makes.
– Efficient. Database-based queue implementations are usually not efficient, but we managed to solve this because PostgreSQL makes transaction state visible to users.
– Fast. Events are processed in batches, which gives low per-event overhead.
– Flexible. Each queue can have several producers and consumers, and any number of event types handled in it.
– Robust. PgQ guarantees that each consumer sees an event at least once. There are several methods for tracking processed events, depending on business needs.
16. PgQ: Use cases
– Batch jobs. PgQ is a very convenient medium for providing events for background processing:
• email senders
• SMS senders
• external partner communication handling (payment handling)
– Moving data out of online databases into internal databases (logs, detail records).
– Replication. Copying data between databases.
– Partitioning data into daily batches for business intelligence.
17. Londiste: Replication
– We use replication
• to transfer online data into internal databases
• to create failover databases for online databases
• to switch between PostgreSQL versions
• to distribute internally produced data into online servers
• to create read-only replicas for load balancing
– Londiste is implemented using PgQ as the transport layer.
– It has a DBA-friendly command line interface.
[Diagram: shopdb (online) replicating to shopdb_ro (read only), shopdb (failover), and analysisdb (internal).]
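That command line interface is driven from a small config file; a sketch of the SkyTools 2-era workflow for replicating one table (the config file and table name are illustrative, and exact commands vary between SkyTools versions):

```
# Install londiste on the provider (master) and the subscriber (replica)
londiste.py replic.ini provider install
londiste.py replic.ini subscriber install

# Start the replication daemon
londiste.py replic.ini replay -d

# Add a table on both sides; the initial copy happens automatically
londiste.py replic.ini provider add users
londiste.py replic.ini subscriber add users
```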
18. Summary
– We have managed to create a beautiful mess of vertically and horizontally split databases that are connected by hundreds of replication processes and remote calls.
– But all this mess is still quite easy to manage, and we see no architectural limits preventing it from growing by a few more orders of magnitude.
– And all of it is done on open source software, so it's free.
– One of the most beautiful features of all these tools is that you don't need high data load for them to be useful. All of them have features that are useful already on one-server, one-database setups.
– Questions?