High Availability Redundancy vs Backup vs Archiving Databases
1. High Availability
Redundancy vs Backup vs Archiving
Databases (MySQL, PostgreSQL, MongoDB), Data
Rafał Gołąb <rafal.golab@codibly.com>
Kraków, 19.02.2015
3. Redundancy, backup, archiving theory
● Redundancy
- establishes a straight copy of an entire system, ready to take over if the original system fails
● Backup
- creates a second copy of data at specific points in time
- ideally keeps multiple historic copies
- must be consistent
● Archiving
- makes a primary copy of selected data with the aim of retaining it in the long term
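As an illustration of "keeping multiple historic copies", here is a minimal retention sketch in Python. The policy (keep every backup from the last 7 days, plus the newest backup from each of the 4 most recent ISO weeks that have one) and the function name are assumptions made for this example, not something the slides prescribe.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4):
    """Return the set of backup dates to retain under a simple policy.

    Keeps every backup taken within the last `daily` days, plus the newest
    backup from each of the `weekly` most recent ISO weeks that contain one.
    """
    keep = set()
    newest_per_week = {}
    for d in sorted(backup_dates):
        age_days = (today - d).days
        if 0 <= age_days < daily:
            keep.add(d)
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        newest_per_week[week] = d  # dates are sorted, so the newest wins
    for week in sorted(newest_per_week)[-weekly:]:
        keep.add(newest_per_week[week])
    return keep


if __name__ == "__main__":
    today = date(2015, 2, 19)
    nightly = [today - timedelta(days=i) for i in range(30)]
    for d in sorted(backups_to_keep(nightly, today)):
        print(d.isoformat())
```

Any backup not in the returned set is a candidate for deletion; a real policy would typically add monthly or yearly tiers as well.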
6. Objectives
● understand how big the problem is
● sleep well
● extend knowledge
● know the differences
● increase data safety
7. What can happen?
● location
● networking
● hardware
● operating system
● data storage
● app layer
What can we do?
● load balancing
● fail-over
● disaster recovery
8. Detailed problem solving
● DNS problems
- round robin
- low TTL
- GSLB (dnsmadeeasy.com, Akamai)
● HTTP problems
- HAProxy, nginx (LB algorithms)
- memcached servers
- failover IP addresses
● MAIL problems
- multiple MX servers
- load-balanced SMTP servers
● DATABASES & STORAGE problems
- covered in the next part of the presentation
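Round robin combined with fail-over, the idea behind several of the items above, can be sketched in a few lines of Python. This is an illustrative toy, not how HAProxy or nginx implement it; the class name and the backend names (web1, web2, web3) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health check when a backend fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Only known backends can be brought back into rotation.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    print([lb.next_backend() for _ in range(4)])
    lb.mark_down("web2")
    print([lb.next_backend() for _ in range(3)])
```

Plain DNS round robin has no equivalent of `mark_down`, which is one reason a low TTL or a health-checking GSLB service is listed alongside it.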
19. LVM - snapshots
LVM snapshots allow for a consistent backup even if files are open during the backup. The
snapshot volume needs enough space to store changes that occur during the backup.
[Figure: LVM snapshot diagram; volume sizes labelled 100GB, 5GB and 100GB]
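The snapshot workflow described on this slide can be sketched as a sequence of commands. The Python helper below only builds the command list (it runs nothing), so it is safe to try anywhere. The volume group `vg0`, logical volume `mysql`, mount point, and archive path are hypothetical, and the 5G snapshot size is simply the figure shown on the slide.

```python
import shlex


def lvm_snapshot_backup_plan(vg, lv, snapshot_size, mount_point, archive_path,
                             snapshot_name=None):
    """Return the commands for a snapshot-based backup, in execution order."""
    snapshot_name = snapshot_name or f"{lv}_snap"
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snapshot_name}"
    return [
        # 1. Create the snapshot; its size must cover changes made during the backup.
        ["lvcreate", "--snapshot", "--size", snapshot_size,
         "--name", snapshot_name, origin],
        # 2. Mount the frozen view read-only.
        ["mount", "-o", "ro", snapshot, mount_point],
        # 3. Archive the files from the snapshot, not from the live volume.
        ["tar", "-czf", archive_path, "-C", mount_point, "."],
        # 4. Clean up so the snapshot stops consuming space.
        ["umount", mount_point],
        ["lvremove", "-f", snapshot],
    ]


if __name__ == "__main__":
    plan = lvm_snapshot_backup_plan("vg0", "mysql", "5G",
                                    "/mnt/mysql_snap", "/backup/mysql.tar.gz")
    for command in plan:
        print(shlex.join(command))
```

For a database, the snapshot is only consistent if writes are quiesced while `lvcreate` runs (for MySQL, typically by holding `FLUSH TABLES WITH READ LOCK` for that moment). Some filesystems also need extra mount options for a snapshot, so treat the `mount` step as a starting point.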
21. MySQL Backup Tools (real examples)
● mylvmbackup
● xtrabackup
- no table locks
- only for InnoDB
● mysqldump
- table locks
- long recovery time
- for small databases
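The locking trade-off noted for mysqldump shows up directly in its options. The sketch below builds (but does not run) a mysqldump command line. It assumes that when every table in the database is InnoDB you would use `--single-transaction`, a standard mysqldump option that takes a consistent dump without locking tables, and otherwise fall back to `--lock-tables`. The function name and the database name `shop` are hypothetical.

```python
import shlex


def mysqldump_command(database, output_file, innodb_only):
    """Build a mysqldump argument list for a single database."""
    command = ["mysqldump", f"--result-file={output_file}"]
    if innodb_only:
        # Consistent snapshot inside one transaction; no table locks needed.
        command.append("--single-transaction")
    else:
        # Non-transactional tables: lock the tables while they are dumped.
        command.append("--lock-tables")
    command.append(database)
    return command


if __name__ == "__main__":
    print(shlex.join(mysqldump_command("shop", "/backup/shop.sql", innodb_only=True)))
    print(shlex.join(mysqldump_command("shop", "/backup/shop.sql", innodb_only=False)))
```

Either way the result is a logical dump that must be replayed statement by statement on restore, which is why the slide lists a long recovery time and recommends it for small databases.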
23. Conclusions
● High availability is a complex problem and differs for each organisation.
● The best practice for protecting your data is to use all of the
solutions (redundancy, backup and archiving) when possible.
● Redundancy isn’t backup
● Backup is more important than redundancy
● Using LVM is the best solution for preparing database backups
24. Thank you for your attention. Questions?
Rafał Gołąb
Linux System Administrator
E-mail: rafal.golab@codibly.com
Mob.: (+48) 506 514 543
www.codibly.com