The document discusses MongoDB backups and point-in-time recovery (PITR). It covers the reasons for backups including disaster recovery and high availability. It describes logical and physical backup types in MongoDB and tools like mongodump, mongoexport, and filesystem snapshots. It also explains how to perform PITR using incremental backups of the oplog and restoring from backups up to a chosen point-in-time.
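The PITR idea described above (restore a base backup, then replay the oplog up to a chosen moment) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the entry fields `ts`, `op`, `id` and `doc` and the function `restore_to_point_in_time` are hypothetical simplifications, not the real oplog format or any MongoDB tool's API.

```python
from copy import deepcopy


def restore_to_point_in_time(snapshot, snapshot_ts, oplog, target_ts):
    """Replay simplified oplog entries in (snapshot_ts, target_ts] onto a snapshot.

    snapshot: dict mapping document id -> document (the base backup).
    oplog:    list of dicts with hypothetical fields ts, op, id, doc,
              where op is "i" (insert), "u" (update) or "d" (delete).
    """
    data = deepcopy(snapshot)  # never mutate the backup itself
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] <= snapshot_ts:
            continue  # already contained in the base backup
        if entry["ts"] > target_ts:
            break  # stop right before the chosen point in time is passed
        if entry["op"] == "i":
            data[entry["id"]] = dict(entry["doc"])
        elif entry["op"] == "u":
            data.setdefault(entry["id"], {}).update(entry["doc"])
        elif entry["op"] == "d":
            data.pop(entry["id"], None)
    return data


# Base backup taken at ts=10; an accidental delete happens at ts=20.
snapshot = {1: {"name": "alice"}}
oplog = [
    {"ts": 5, "op": "i", "id": 1, "doc": {"name": "alice"}},
    {"ts": 12, "op": "i", "id": 2, "doc": {"name": "bob"}},
    {"ts": 15, "op": "u", "id": 1, "doc": {"name": "alice2"}},
    {"ts": 20, "op": "d", "id": 2, "doc": None},
]
restored = restore_to_point_in_time(snapshot, 10, oplog, target_ts=19)
```

Choosing `target_ts=19` recovers the state just before the delete; entries at or before the snapshot timestamp are skipped because the base backup already contains them.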
A look at what HA is and what PostgreSQL has to offer for building an open source HA solution. Covers various aspects in terms of Recovery Point Objective and Recovery Time Objective. Includes backup and restore, PITR (point in time recovery) and streaming replication concepts.
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK) - Jean-François Gagné
To get better replication speed and less lag, MySQL implements parallel replication in the same schema, also known as LOGICAL_CLOCK. But fully benefiting from this feature is not as simple as just enabling it.
In this talk, I explain in detail how this feature works. I also cover how to optimize parallel replication and the improvements made in MySQL 8.0 and back-ported in 5.7 (Write Sets), greatly improving the potential for parallel execution on replicas (but needing RBR).
Come to this talk to get all the details about MySQL 5.7 and 8.0 Parallel Replication.
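As a rough illustration of the LOGICAL_CLOCK idea, the sketch below assumes each transaction in the binary log carries a `sequence_number` and a `last_committed` value, and that a transaction may start on the replica once every transaction with a sequence number up to its `last_committed` has finished. It further assumes unlimited workers and equal execution times, and it ignores commit-order preservation, so it is a toy model rather than the replica's actual scheduler. The function names are hypothetical.

```python
def earliest_steps(transactions):
    """Return {sequence_number: earliest step at which it can run}.

    transactions: list of (sequence_number, last_committed) in binlog order.
    A transaction waits for all transactions whose sequence_number is
    <= its last_committed; everything else may run alongside it.
    """
    finish = {}  # sequence_number -> step at which that transaction is done
    steps = {}
    for seq, last_committed in transactions:
        start = max(
            (done for s, done in finish.items() if s <= last_committed),
            default=0,
        )
        steps[seq] = start
        finish[seq] = start + 1  # unit execution time (assumption)
    return steps


def group_by_step(steps):
    """Group sequence numbers that can execute in the same step."""
    groups = {}
    for seq, step in steps.items():
        groups.setdefault(step, []).append(seq)
    return [groups[step] for step in sorted(groups)]


binlog = [(1, 0), (2, 0), (3, 0), (4, 2), (5, 2), (6, 4)]
waves = group_by_step(earliest_steps(binlog))
```

In this example transactions 1 to 3 share `last_committed = 0` and can run together, 4 and 5 must wait for 1 and 2, and 6 must wait for 4. Wider commit intervals on the source therefore translate into larger groups on the replica.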
Kubernetes is a solid leader among different cloud orchestration engines and its adoption rate is growing on a daily basis. Naturally people want to run both their applications and databases on the same infrastructure.
There are a lot of ways to deploy and run PostgreSQL on Kubernetes, but most of them are not cloud-native. Around one year ago Zalando started to run an HA setup of PostgreSQL on Kubernetes managed by Patroni. Those experiments were quite successful and produced a Helm chart for Patroni. That chart was useful, albeit with a single problem: Patroni depended on Etcd, ZooKeeper or Consul.
Few people look forward to deploying two applications instead of one and supporting them later on. In this talk I would like to introduce Kubernetes-native Patroni. I will explain how Patroni uses the Kubernetes API to run a leader election and store the cluster state. I’m going to live-demo a deployment of an HA PostgreSQL cluster on Minikube and share our own experience of running more than 130 clusters on Kubernetes.
Patroni is a Python open-source project developed by Zalando in cooperation with other contributors on GitHub: https://github.com/zalando/patroni
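To make the leader-election idea concrete, here is a minimal sketch of a lease protected by optimistic concurrency, the same general mechanism that Kubernetes exposes through an object's resource version. It is not Patroni's code: `VersionedStore`, `try_acquire`, the lease fields and the node names are all hypothetical, and the store is an in-memory stand-in for an API object.

```python
class VersionedStore:
    """In-memory stand-in for an API object guarded by a version counter."""

    def __init__(self):
        self.version = 0
        self.value = None

    def read(self):
        return self.version, self.value

    def compare_and_set(self, expected_version, new_value):
        # The write succeeds only if nobody changed the object since we read it.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


def try_acquire(store, node, now, ttl):
    """Acquire or renew the leader lease; return True if `node` is leader."""
    version, lease = store.read()
    held_by_other = lease is not None and lease["holder"] != node
    if held_by_other and lease["expires"] > now:
        return False  # someone else holds a lease that is still valid
    return store.compare_and_set(version, {"holder": node, "expires": now + ttl})


store = VersionedStore()
first = try_acquire(store, "pg-0", now=0, ttl=30)     # pg-0 becomes leader
blocked = try_acquire(store, "pg-1", now=10, ttl=30)  # lease still valid
renewed = try_acquire(store, "pg-0", now=10, ttl=30)  # leader renews to t=40
takeover = try_acquire(store, "pg-1", now=45, ttl=30) # lease expired
```

The point of the version check is that two nodes racing for an expired lease cannot both win: the second write sees a stale version and is rejected.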
How to do a LIVE-demo with minikube:
1. git clone https://github.com/zalando/patroni
2. cd patroni
3. git checkout feature/demo
4. cd kubernetes
5. open demo.sh and edit line #4 (specify the minikube context)
6. docker build -t patroni .
7. maybe docker push patroni
8. maybe edit patroni_k8s.yaml line #22 and put the name of the patroni image you built there
9. install tmux
10. run tmux in one terminal
11. run bash demo.sh in another terminal and press Enter from time to time
High availability is critical for PostgreSQL database systems, especially for organizations that depend on their databases to support their operations. In this presentation, we will explore the different options available for achieving high availability in PostgreSQL.
This presentation covers MySQL data encryption on disk. How do you encrypt all tablespaces and MySQL-related files for compliance? The new releases in MySQL 8.0 take care of the encryption of the system tablespace and supporting tables, unlike MySQL 5.7.
Deploying MariaDB databases with containers at Nokia Networks - MariaDB plc
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized, high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB - ScyllaDB
Learn why and how Discord’s persistence team recently completed their most ambitious migration yet: moving their massive set of trillions of messages from Cassandra to ScyllaDB. Bo Ingram, Senior Software Engineer at Discord, provides a technical look, including:
- Their reasons for moving from Apache Cassandra to ScyllaDB
- Their strategy for migrating trillions of messages
- How they designed a new storage topology – using a hybrid-RAID1 architecture – for extremely low latency on GCP
- The role of their existing Rust messages service, new Rust data service library, and new Rust data migrator in this project
- What they’ve achieved so far, lessons learned, and what they’re tackling next
This is a presentation given at Bengaluru TechDay, October 2019, for Oracle database admins and architects by Karthik P R (CEO, Mydbops). He explains the possible high availability options in the MySQL ecosystem.
https://www.meetup.com/All-India-Oracle-Users-Group-Bangalore-Chapter/events/265252214/
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistence models, as well as the definition of documents and general data structures.
Ceph Object Storage Reference Architecture Performance and Sizing Guide - Karan Singh
Together with my colleagues on the Red Hat Storage team, I am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
Percona Live 2022 - PBM - The Backup Open Source Tool for MongoDB - Jean Da Silva
Backup and restore are two of the most important things for databases. We don't often use the backup, but in a disaster situation it is crucial that it works.
In this session, we will discuss Percona Backup for MongoDB (PBM for short).
We will walk through the process of taking backups and executing restores. We will also introduce the newest backup method that PBM offers, the physical backup in addition to the logical backup. After the introduction of the backup methods, we will evaluate the backup and restore times, and how to store the backup on remote backup storage.
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL... - NETWAYS
pg_stat_monitor is a statistics collection tool based on PostgreSQL’s contrib module pg_stat_statements. pg_stat_statements provides only basic statistics, which is sometimes not enough. Its major shortcoming is that it accumulates all the queries and statistics but does not provide aggregated statistics or histogram information, so a user needs to calculate the aggregates, which is quite expensive. pg_stat_monitor provides pre-calculated aggregates: it collects and aggregates data on a bucket basis. The size and number of buckets are configured using GUC (Grand Unified Configuration) parameters. The talk will cover the usage of pg_stat_monitor and how it is better than pg_stat_statements.
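The bucket-based aggregation described above can be pictured with a short sketch. It assumes fixed-width time buckets and a few per-query counters; the function `aggregate_by_bucket`, the sample format and the statistic names are hypothetical and do not reflect the extension's actual columns or configuration parameters.

```python
def aggregate_by_bucket(samples, bucket_seconds):
    """Aggregate (timestamp, query, elapsed_ms) samples into time buckets.

    Returns {bucket_start: {query: {"calls", "total_ms", "max_ms", "mean_ms"}}}.
    """
    buckets = {}
    for timestamp, query, elapsed_ms in samples:
        # Every sample falls into the bucket that starts at a multiple
        # of bucket_seconds at or before its timestamp.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        per_query = buckets.setdefault(bucket_start, {})
        stats = per_query.setdefault(
            query, {"calls": 0, "total_ms": 0.0, "max_ms": 0.0}
        )
        stats["calls"] += 1
        stats["total_ms"] += elapsed_ms
        stats["max_ms"] = max(stats["max_ms"], elapsed_ms)
    # Pre-compute the mean so that readers do not have to.
    for per_query in buckets.values():
        for stats in per_query.values():
            stats["mean_ms"] = stats["total_ms"] / stats["calls"]
    return buckets


samples = [
    (0, "SELECT 1", 2.0),
    (30, "SELECT 1", 4.0),
    (61, "SELECT 1", 10.0),
    (65, "SELECT 2", 1.0),
]
result = aggregate_by_bucket(samples, bucket_seconds=60)
```

With 60-second buckets, the first two samples land in the bucket starting at 0 and the last two in the bucket starting at 60, so a slow minute stays visible instead of being averaged into an ever-growing total.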
Protecting data with CSI Volume Snapshots on Kubernetes - DoKC
The talk will cover volume snapshots on Kubernetes CSI, with a deep dive on Kubernetes CSI snapshots. A short demo on the basics of the CSI snapshots on Kubernetes will help show how easy it is to start using CSI snapshots.
After covering the basics on both container schedulers, we’ll deep dive into how backup systems can programmatically interact with the Kubernetes CSI snapshots client. This example will be based on Stork, an open-source backup system that has implemented the CSI spec.
This talk was given by Grant Griffiths for DoK Day Europe @ KubeCon 2022.
A look at some of the ways available to deploy Postgres in a Kubernetes cloud environment, either in small scale using simple configurations, or in larger scale using tools such as Helm charts and the Crunchy PostgreSQL Operator. A short introduction to Kubernetes will be given to explain the concepts involved, followed by examples from each deployment method and observations on the key differences.
Enter the world of cloud computing and software development with PaaS. What it takes to create a production ready application with Heroku and how to run it?
What if …
- Traditional, labour-intensive backup and archive practices for your MySQL, MariaDB, MongoDB and PostgreSQL databases were a thing of the past?
- You could have one backup management solution for all your business data?
- You could ensure integrity of all your backups?
- You could leverage the competitive pricing and almost limitless capacity of cloud-based backup while meeting cost, manageability, and compliance requirements from the business?
Welcome to our webinar on Backup Management with ClusterControl.
ClusterControl’s centralized backup management for open source databases provides you with hot backups of large datasets, point in time recovery in a couple of clicks, at-rest and in-transit data encryption, data integrity via automatic restore verification, cloud backups (AWS, Google and Azure) for Disaster Recovery, retention policies to ensure compliance, and automated alerts and reporting.
Whether you are looking at rebuilding your existing backup infrastructure, or updating it, this webinar is for you!
AGENDA
- Backup and recovery management of local or remote databases
- Logical or physical backups
- Full or Incremental backups
- Position or time-based Point in Time Recovery (for MySQL and PostgreSQL)
- Upload to the cloud (Amazon S3, Google Cloud Storage, Azure Storage)
- Encryption of backup data
- Compression of backup data
- One centralized backup system for your open source databases (Demo)
- Schedule, manage and operate backups
- Define backup policies, retention, history
- Validation - Automatic restore verification
- Backup reporting
SPEAKER
Bartlomiej Oles, Senior Support Engineer at Severalnines, is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.
Robert Bates, SVP Sales Engineering of Crunchy Data explains how you can tackle Data Gravity, Kubernetes, and strategies/best practices to run, scale, and leverage stateful containers in production.
An overview of reference architectures for Postgres - EDB
EDB Reference Architectures are designed to help new and existing users alike to quickly design a deployment architecture that suits their needs. They can be used as either the blueprint for a deployment, or as the basis for a design that enhances and extends the functionality and features offered.
Add-on architectures allow users to easily extend their core database server deployment to add additional features and functionality "building block" style.
In this webinar, we will review the following architectures:
- Single Node
- Multi Node with Asynchronous Replication
- Multi Node with Synchronous Replication
- Add-on Architectures
OSMC 2018 | Why we recommend PMM to our clients by Matthias Crauwels - NETWAYS
As service providers, one of our responsibilities is helping clients understand what contributed to a production downtime incident, and how to prevent it, as far as possible, from happening again. We do this with Incident Reports, and one common recommendation we make is to have a historical monitoring system in place. All our clients have point-in-time monitoring solutions in place, solutions that can alert them when a system is down or behaving in unacceptable ways. But historical monitoring is still not common, and we believe a lot of companies can benefit from deploying it. In most cases, we have recommended Percona Monitoring and Management (PMM) as a good, open source solution for this problem. In this session, we will talk about the reasons why we recommend PMM as a way to prevent incidents, and also to investigate their possible causes when one has happened.
Our secure remote connectivity tool provides full video recording of all work our engineers perform on client systems. We have requirements to analyze the video log to detect suspicious activity, provide forensic and root cause analysis capabilities. Some of the obvious use cases include detection of credit card patterns or personally identifiable information (PII) as well as malicious activity like dropping database objects. We need to process hundreds of gigabytes per day representing thousands of hours of video. Our solution leverages a variety of Hadoop components to perform optical text recognition and indexing, keyboard and mouse movement analysis as well as integration with variety of other data sources such as our monitoring, documentation, ticketing and communication systems. We will present our complete architecture starting from multi-source data ingestion through data processing and analysis up to the end user interface, reporting and integration layer.
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ... - Andrejs Prokopjevs
This presentation is a successor to the "Running Oracle EBS in the cloud." session held at the UKOUG Apps16 event (and at other conferences later). The author would like to go through the latest updates of the year 2022 on what is still relevant, what is not, key recommendations, and a comparison of the certified public cloud platforms. The cloud journey is a topic in constant demand from clients, and there are still uncertainties around the cloud journey options for Oracle E-Business Suite customers.
Always upgrade! There are hundreds of fixes between each PostgreSQL release, and a significant number of them are security fixes! Logical replication allows major upgrades with minimal downtime and manageable drawbacks.
Sharding and things we'd like to see improved - Igor Donchovski
Starting with MongoDB version 5.0, we can reshard a collection using a new shard key. But there are still some things we think could be improved, such as:
Cluster metadata lack of details on chunks
TTL indexes
Improvements in the rebalancing process
Performance issues on rebalancing data
Observability
Redundancy and high availability are the basis for all production deployments. With MongoDB, high availability is achieved with replica sets, which provide automatic failover in case the primary goes down. In this session we will review multiple maintenance scenarios, including the proper steps for keeping high availability while we perform maintenance without causing downtime.
This session will cover Database upgrades, OS server patching, Hardware upgrades, Network maintenance and more.
How MongoDB HA works
Replica set components/deployment topologies
Database upgrades
System patching/upgrade
Network maintenance
Add/Remove members to the replica set
Reconfiguring replica set members
Building indexes
Backups and restores
Exploring the replication and sharding in MongoDB - Igor Donchovski
Redundancy and high availability are the basis for all production deployments. Database systems with large data sets or high throughput applications can challenge the capacity of a single server, such as CPU for high query rates or RAM for large working sets. Adding more CPU and RAM for vertical scaling is limited. Systems need horizontal scaling by distributing data across multiple servers. MongoDB supports horizontal scaling through sharding. Each shard consists of a replica set that provides redundancy and high availability.
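Horizontal scaling by distributing data across servers can be sketched as range-based routing: the shard key space is cut into contiguous ranges and each range is owned by one shard. The boundaries, shard names and `route` function below are hypothetical, and the sketch leaves out chunk splitting, balancing and hashed shard keys.

```python
import bisect

# Lower bounds of each key range; range i covers [bounds[i], bounds[i + 1]).
# The first range is open at the bottom so that every key has an owner.
RANGE_LOWER_BOUNDS = [float("-inf"), 1000, 2000]
RANGE_OWNERS = ["shardA", "shardB", "shardC"]  # hypothetical shard names


def route(shard_key):
    """Return the shard that owns the range containing shard_key."""
    index = bisect.bisect_right(RANGE_LOWER_BOUNDS, shard_key) - 1
    return RANGE_OWNERS[index]


# Group a handful of documents by the shard they would be written to.
placement = {}
for user_id in (-5, 999, 1000, 1500, 5000):
    placement.setdefault(route(user_id), []).append(user_id)
```

Range-based routing keeps neighbouring keys on the same shard, which helps range queries but can concentrate load if keys only ever grow; that trade-off is one reason a hashed key is sometimes preferred.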
Redundancy and high availability are the basis for all production deployments. Database systems with large data sets or high throughput applications can challenge the capacity of a single server, such as CPU for high query rates or RAM for large working sets. Adding more CPU and RAM for vertical scaling is limited. Systems need horizontal scaling by distributing data across multiple servers. MongoDB supports horizontal scaling through sharding.
Redundancy and high availability are the basis for all production deployments. With MongoDB this can be achieved by deploying a replica set. In this talk, we'll explore how MongoDB replication works and what the components are of a replica set. Using examples of wrong deployment configurations, we will highlight how to properly run replica sets in production, whether it comes to on-premise deployment or in the cloud.
In this day and age, maintaining privacy throughout our electronic communications is absolutely necessary. Creating user accounts, and not exposing your MongoDB environment to the wider internet, are basic concepts that have been missed in the past. Once that has been addressed, individuals and organizations interested in becoming PCI compliant must turn to securing their data through encryption. With MongoDB, we have two options for encryption: at rest and transport encryption.
Redundancy and high availability are the basis for all production deployments. With MongoDB this can be achieved by deploying a replica set. In these slides we explore how replication works in MongoDB, why you should use replication, and what its features are, and we go over different deployment use cases. At the end we compare some features with MySQL replication and look at the differences between the two.
Working with MongoDB as a MySQL DBA. Comparing commands from MongoDB to MySQL, similarities and differences. Exploring replication features, failover and recovery, adjusting the variables and checking status, and using DML and DDL with different storage engines.
State of Artificial intelligence Report 2023 - kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Global Situational Awareness of A.I. and where its headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Learn SQL from basic queries to Advance queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performance - roli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main