System software engineers have long been taught that disks are slow and that sequential I/O is the key to performance. With SSD drives, I/O got much faster, but not simpler. In this brave new world of rocket-speed throughputs, an engineer has to distinguish sustained workloads from bursts, (still) take care over I/O buffer sizes, account for disks’ internal parallelism, and study mixed I/O characteristics in advance. In this talk we will share some key performance measurements we are taking on modern hardware at ScyllaDB, and our opinion on the implications for database and system software design.
3. Why HDD is hard to deal with
■ HDD has moving parts inside
● Each IOP is probably a seek
● Seek time can be milliseconds
■ Working with HDD in an efficient way: try not to move the head
● Use sequential IO
● Use larger buffers (batch)
■ DB commitlog was designed with that in mind
4. Why SSD is cool
■ SSD has RAM-like storage inside
● Each IO can be served in constant time
■ Working with SSD in an efficient way: just do the IO
● Spoiler: not really
5. Is your disk fast or slow?
■ SSD is usually described by 4 “speeds”
● Throughput in MB/s
● IOPS in Hz (op/s)
● Both for read and write
■ The larger the “speed” numbers are – the better the disk should be
6. Now why does my IO suck?
■ SSD block overwrite problem
■ Internal caching
■ Internal parallelism
■ Bandwidth depends on buffer size
■ Mixed IO
■ Noisy neighbours (in clouds)
8. Internal structure
■ Read/Write is done in pages (e.g. 4k)
■ Erasure is done in blocks (e.g. 128 pages)
■ Overwrite is not possible
■ Disk controller
● keeps a mapping table from IO offset to in-disk offset
● relocates pages in the background
9. IO sucks because ...
■ The disk has aged
● Virtually sequential IO results in physically random one
● Background GC is taking place
10. How to make it suck faster?
■ Sequential IO with large buffers is back from the dead
■ Discard unused blocks
● Filesystems may do it for you
12. More on internal structure
■ Flash cells are fronted by a faster cache
● Read-ahead
■ Parallel IO lanes
● Large (N * page size) IO may be served by several chips in parallel
● Internal indirection may hide it
13. What’s measured in ads
■ Reported numbers can show burst performance
■ Sustained IO may be, and usually is, somewhat slower
14. How to live with it?
■ Get your disk’s sustained performance
16. Throughput vs IOPS
■ IOPS limit is the ability to process requests
● Measured with the smallest possible buffers (usually page-sized)
■ Throughput is the ability to process data
● Measured with “large” buffers (~1MB and larger)
17. What if the buffer size is in between?
■ It depends on the disk
■ Some drop down to 70% of both bandwidth and IOPS peaks
18. What’s the optimal IO size?
■ Depends on the application
■ Smaller IO size – better latency
■ Larger IO size – better throughput, but only up to a point
20. Is my WRITE safe?
■ Write can be cached at many levels
● Application
● Linux page cache
● In-disk cache
■ Cache means faster but less reliable writes
21. Is my WRITE safe? (cont.)
■ There are different buzzwords that refer to writing for real
● O_DIRECT – prevent Linux from caching
● O_DSYNC – prevent disk from caching
● FUA – force the write onto non-volatile media
■ Not all disks handle O_DSYNC at the same speed as regular writes
22. How to write the data?
■ Check if the disk is O_DSYNC-friendly
● Most cloud disks are
■ Choose between speed and safety
● Losing the last few seconds of writes may be acceptable
24. What’s really-really measured in ads
■ Bandwidth and IOPS of a pure IO
● Only read or only write
■ Mixed mode is considerably worse
● Concurrency matters
● Disks often prefer writes over reads
25. What if I do read and write at the same time?
■ Expect much less from the disk
■ Hold back requests to keep latencies low