OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
Guest lecture at Brown University's Computer Science Operating Systems class, CS167, by Matt Ahrens, co-creator of ZFS. Introduction by professor Tom Doeppner. Recording, March 2017: https://youtu.be/uJGkyMxdNFE
Topics:
- Data structures and algorithms used by ZFS snapshots
- Overview of ZFS on-disk structure
- Data structures used for ZFS space allocation
- RAID-Z compared with traditional RAID-4/5/6
Class website: http://cs.brown.edu/courses/cs167/
Filesystem Showdown: What a Difference a Decade MakesPerforce
In the last 10 years, Ext4 has risen in prominence, ReiserFS has fallen to the wayside, ZFS has been ported to Linux, XFS keeps plugging along, and there's a new kid: Btrfs. NTFS has evolved, too. It's now 2016. How do these filesystems stack up against each other? Does it really make that much of a difference? We’ll show you the results of standard, consistent tests across platforms (Linux vs. Windows) and filesystems to see if the differences are worth choosing one over the other. For simplicity's sake, the tests are performed on identical hardware with out-of-the-box settings.
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
Guest lecture at Brown University's Computer Science Operating Systems class, CS167, by Matt Ahrens, co-creator of ZFS. Introduction by professor Tom Doeppner. Recording, March 2017: https://youtu.be/uJGkyMxdNFE
Topics:
- Data structures and algorithms used by ZFS snapshots
- Overview of ZFS on-disk structure
- Data structures used for ZFS space allocation
- RAID-Z compared with traditional RAID-4/5/6
Class website: http://cs.brown.edu/courses/cs167/
Filesystem Showdown: What a Difference a Decade MakesPerforce
In the last 10 years, Ext4 has risen in prominence, ReiserFS has fallen to the wayside, ZFS has been ported to Linux, XFS keeps plugging along, and there's a new kid: Btrfs. NTFS has evolved, too. It's now 2016. How do these filesystems stack up against each other? Does it really make that much of a difference? We’ll show you the results of standard, consistent tests across platforms (Linux vs. Windows) and filesystems to see if the differences are worth choosing one over the other. For simplicity's sake, the tests are performed on identical hardware with out-of-the-box settings.
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed_Hat_Storage
This session describes how to get the most out of OpenStack Cinder volumes on Ceph.
We’ll discuss:
Performance configuration, tuning, and workloads.
Performance test results of Red Hat Enterprise Linux OpenStack Platform 5, Red Hat Enterprise Linux OpenStack Platform 6, Red Hat Ceph Storage 1.2.3, and Firefly.
Anticipated improvements in performance for Red Hat Ceph Storage 1.3.
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixThe Linux Foundation
Storage systems continue to deliver better performance year after year. High performance solutions are now available off-the-shelf, allowing users to boost their servers with drives capable of achieving several GB/s worth of throughput per host. To fully utilise such devices, workloads with large queue depths are often necessary. In virtual environments, this translates into aggregate workloads coming from multiple virtual machines.
Having previously addressed the impact of low latency devices in virtualised platforms, we are now aiming at optimising aggregate workloads. We will discuss the existing memory grant technologies available in Xen and compare trade-offs and performance implications of each: grant mapping, persistent grants and grant copy. For the first time, we will present grant copy as an alternative and show measurements over 7 GB/s, maxing out a set of local SSDs.
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
This is to introduce the related components in SUSE Linux Enterprise High Availability Extension product to build High Available Storage (ha-lvm/drbd/iscsi/nfs, clvm, ocfs2, cluster-raid1).
Agenda:
The Linux kernel has multiple "tracers" built-in, with various degrees of support for aggregation, dynamic probes, parameter processing, filtering, histograms, and other features. Starting from the venerable ftrace, introduced in kernel 2.6, all the way through eBPF, which is still under development, there are many options to choose from when you need to statically instrument your software with probes, or diagnose issues in the field using the system's dynamic probes. Modern tools include SystemTap, Sysdig, ktap, perf, bcc, and others. In this talk, we will begin by reviewing the modern tracing landscape -- ftrace, perf_events, kprobes, uprobes, eBPF -- and what insight into system activity these tools can offer. Then, we will look at specific examples of using tracing tools for diagnostics: tracing a memory leak using low-overhead kmalloc/kfree instrumentation, diagnosing a CPU caching issue using perf stat, probing network and block I/O latency distributions under load, or merely snooping user activities by capturing terminal input and output.
Speaker:
Sasha is the CTO of Sela Group, a training and consulting company based in Israel that employs over 400 developers world-wide. Most of Sasha's work revolves around performance optimization, production debugging, and low-level system diagnostics, but he also dabbles in mobile application development on iOS and Android. Sasha is the author of two books and three Pluralsight courses, and a contributor to multiple open-source projects. He blogs at http://blog.sashag.net.
XPDS14: Efficient Interdomain Transmission of Performance Data - John Else, C...The Linux Foundation
As users demand greater scalability from Citrix XenServer, the transmission of performance data from guests via xenstore is increasingly becoming a bottleneck. Future use of service domains is likely to make this problem worse. A simple, efficient way of transmitting time-varying datasets between userspace components in different domains is required. This talk will propose a lock-free mechanism to allow interdomain reporting of performance data without relying on continuous xenstore usage, and describe how it fits into the XAPI toolstack.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Corosync and Pacemaker
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.
The components of a cluster are usually connected to each other through fast local area networks ("LAN"), with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low cost microprocessors, high speed networks, and software for high performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
Computer clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world such as IBM's Sequoia
Scylla Summit 2022: Making Schema Changes Safe with RaftScyllaDB
ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve our operational aspects as well as provide strong consistency to the end-user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first end-user visible major improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed_Hat_Storage
This session describes how to get the most out of OpenStack Cinder volumes on Ceph.
We’ll discuss:
Performance configuration, tuning, and workloads.
Performance test results of Red Hat Enterprise Linux OpenStack Platform 5, Red Hat Enterprise Linux OpenStack Platform 6, Red Hat Ceph Storage 1.2.3, and Firefly.
Anticipated improvements in performance for Red Hat Ceph Storage 1.3.
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixThe Linux Foundation
Storage systems continue to deliver better performance year after year. High performance solutions are now available off-the-shelf, allowing users to boost their servers with drives capable of achieving several GB/s worth of throughput per host. To fully utilise such devices, workloads with large queue depths are often necessary. In virtual environments, this translates into aggregate workloads coming from multiple virtual machines.
Having previously addressed the impact of low latency devices in virtualised platforms, we are now aiming at optimising aggregate workloads. We will discuss the existing memory grant technologies available in Xen and compare trade-offs and performance implications of each: grant mapping, persistent grants and grant copy. For the first time, we will present grant copy as an alternative and show measurements over 7 GB/s, maxing out a set of local SSDs.
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
This is to introduce the related components in SUSE Linux Enterprise High Availability Extension product to build High Available Storage (ha-lvm/drbd/iscsi/nfs, clvm, ocfs2, cluster-raid1).
Agenda:
The Linux kernel has multiple "tracers" built-in, with various degrees of support for aggregation, dynamic probes, parameter processing, filtering, histograms, and other features. Starting from the venerable ftrace, introduced in kernel 2.6, all the way through eBPF, which is still under development, there are many options to choose from when you need to statically instrument your software with probes, or diagnose issues in the field using the system's dynamic probes. Modern tools include SystemTap, Sysdig, ktap, perf, bcc, and others. In this talk, we will begin by reviewing the modern tracing landscape -- ftrace, perf_events, kprobes, uprobes, eBPF -- and what insight into system activity these tools can offer. Then, we will look at specific examples of using tracing tools for diagnostics: tracing a memory leak using low-overhead kmalloc/kfree instrumentation, diagnosing a CPU caching issue using perf stat, probing network and block I/O latency distributions under load, or merely snooping user activities by capturing terminal input and output.
Speaker:
Sasha is the CTO of Sela Group, a training and consulting company based in Israel that employs over 400 developers world-wide. Most of Sasha's work revolves around performance optimization, production debugging, and low-level system diagnostics, but he also dabbles in mobile application development on iOS and Android. Sasha is the author of two books and three Pluralsight courses, and a contributor to multiple open-source projects. He blogs at http://blog.sashag.net.
XPDS14: Efficient Interdomain Transmission of Performance Data - John Else, C...The Linux Foundation
As users demand greater scalability from Citrix XenServer, the transmission of performance data from guests via xenstore is increasingly becoming a bottleneck. Future use of service domains is likely to make this problem worse. A simple, efficient way of transmitting time-varying datasets between userspace components in different domains is required. This talk will propose a lock-free mechanism to allow interdomain reporting of performance data without relying on continuous xenstore usage, and describe how it fits into the XAPI toolstack.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Corosync and Pacemaker
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.
The components of a cluster are usually connected to each other through fast local area networks ("LAN"), with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low cost microprocessors, high speed networks, and software for high performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
Computer clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world such as IBM's Sequoia
Scylla Summit 2022: Making Schema Changes Safe with RaftScyllaDB
ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve our operational aspects as well as provide strong consistency to the end-user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first end-user visible major improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Presented at LISA18: https://www.usenix.org/conference/lisa18/presentation/babrou
This is a technical dive into how we used eBPF to solve real-world issues uncovered during an innocent OS upgrade. We'll see how we debugged 10x CPU increase in Kafka after Debian upgrade and what lessons we learned. We'll get from high-level effects like increased CPU to flamegraphs showing us where the problem lies to tracing timers and functions calls in the Linux kernel.
The focus is on tools what operational engineers can use to debug performance issues in production. This particular issue happened at Cloudflare on a Kafka cluster doing 100Gbps of ingress and many multiple of that egress.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Rakib Hossain
We present a program that implemented to execute Adaptive merge sort algorithm in parallel on a GPU based system. Parallel implementation is used to get better performance than serial implementation in runtime perspective. Parallel implementation executes independent executable operation in parallel using large number of cores in GPU based system. Results from a parallel implementation of the algorithm is given and compared with its serial implementation on run time basis. The parallel version is implemented with CUDA platform in a system based on NVIDIA GPU (GTX 650)
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
From Tanel Poder's Troubleshooting Complex Performance Issues series - an example of Oracle SEG$ internal segment contention due to some direct path insert activity.
One of the great challenges of of monitoring any large cluster is how much data to collect and how often to collect it. Those responsible for managing the cloud infrastructure want to see everything collected centrally which places limits on how much and how often. Developers on the other hand want to see as much detail as they can at as high a frequency as reasonable without impacting the overall cloud performance.
To address what seems to be conflicting requirements, we've chosen a hybrid model at HP. Like many others, we have a centralized monitoring system that records a set of key system metrics for all servers at the granularity of 1 minute, but at the same time we do fine-grained local monitoring on each server of hundreds of metrics every second so when there are problems that need more details than are available centrally, one can go to the servers in question to see exactly what was going on at any specific time.
The tool of choice for this fine-grained monitoring is the open source tool collectl, which additionally has an extensible api. It is through this api that we've developed a swift monitoring capability to not only capture the number of gets, put, etc every second, but using collectl's colmux utility, we can also display these in a top-like formact to see exactly what all the object and/or proxy servers are doing in real-time.
We've also developer a second cability that allows one to see what the Virtual Machines are doing on each compute node in terms of CPU, disk and network traffic. This data can also be displayed in real-time with colmux.
This talk will briefly introduce the audience to collectl's capabilities but more importantly show how it's used to augment any existing centralized monitoring infrastructure.
Speakers
Mark Seger
This presentation was given to the Dublin Node (JS) Community on May 29th 2014.
Presented by: Chris Lawless, Kevin Yu Wei Xia, Fergal Carroll @phergalkarl, Ciarán Ó hUallacháin, and Aman Kohli @akohli
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. ZFS Was Slow, Is Faster
Adam Leventhal, CTO Delphix
@ahl
3. My Version of ZFS History
• 2001-2005 The 1st age of ZFS: building the behemoth
– Stability, reliability, features
• 2006-2008 The 2nd age of ZFS: appliance model and open source
– Completing the picture; making it work as advertised; still more features
• 2008-2010 The 3rd age of ZFS: trial by fire
– Stability in the face of real workloads
– Performance in the face of real workloads
4. The 1st Age of OpenZFS
• All the stuff Matt talked about, yes:
– Many platforms
– Many companies
– Many contributors
• Performance analysis on real and varied customer workloads
5. A note about the data
•
•
•
•
•
The data you are about to see is real
The names have been changed to protect the innocent (and guilty)
It was mostly collected with DTrace
We used some other tools as well: lockstat, mpstat
You might wish I had more / different data – I do too
15. ZFS Write Throttle
•
•
•
•
•
Keep transactions to a reasonable size – limit outstanding data
Target a fixed time (1-5 seconds on most systems)
Figure out how much we can write in that time
Don’t accept more than that amount of data in a txg
When we get to 7/8ths of the limit, insert a 10ms delay
16. ZFS Write Throttle
•
•
•
•
•
Keep transactions to a reasonable size – limit outstanding data
Target a fixed time (1-5 seconds on most systems)
Figure out how much we can write in that time
Don’t accept more than that amount of data in a txg
When we get to 7/8ths of the limit, insert a 10ms delay
WTF!?
26. IO Problems
• The choice of IO queue depth was crucial
– Where did the default of 10 come from?!
– Balance between latency and throughput
• Shared IO queue for reads and writes
– Maybe this makes sense for disks… maybe…
• The wrong queue depth caused massive queuing within ZFS
– “What do you mean my SAN is slow? It looks great to me!”
27. New IO Scheduler
•
•
•
•
Choose a limit on the “dirty” (modified) data on the system
As more accumulates, schedule more concurrent IOs
Limits per IO type
If we still can’t keep up, start to limit the rate of incoming data
• Chose defaults as close to the old behavior as possible
• Much more straightforward to measure and tune
32. Name that lock!
> 0xffffff0d4aaa4818::whatis
ffffff0d4aaa4818 is ffffff0d4aaa47fc+20, allocated from taskq_cache
> 0xffffff0d4aaa4818-20::taskq
ADDR
NAME
ACT/THDS Q'ED MAXQ INST
ffffff0d4aaa47fc zio_write_issue
0/ 24 0 26977 -
33. Lock Breakup
•
•
•
•
Broke up the taskq lock for write_issue
Added multiple taskqs, randomly assigned
Recently hit a similar problem for read_interrupt
Same solution
• Worth investigating taskq stats
• A dynamic taskq might be an interesting experiment
• Other lock contention issues resolved
• Still more need additional attention
40. What about all space_map_*() functions?
space_map_truncate
33 times
6ms ( 0%)
space_map_load_wait
1721 times
7ms ( 0%)
space_map_sync
3766 times
210ms ( 0%)
space_map_unload
135 times
1268ms ( 0%)
space_map_free
21694 times
4280ms ( 1%)
space_map_vacate
3643 times
45891ms (12%)
space_map_seg_compare
13124822 times
55423ms (14%)
space_map_add
580809 times
79868ms (21%)
space_map_remove
514181 times
81682ms (21%)
space_map_walk
2081 times
120962ms (32%)
spa_sync
1 times
374818ms (100%)
42. Spacemaps and Metaslabs
• Two things going on here:
– 30,000+ segments per spacemap
– Building the perfect spacemap – close enough would work
– Doing a bunch of work that we can clever our way out of
• Still much to be done:
– Why 200 metaslabs per LUN?
– Allocations can still be very painful
43. The Next Age of OpenZFS
• General purpose and purpose-built OpenZFS products
• Used for varied and demanding uses
• Data-driven discoveries
–
–
–
–
Write throttle needed rethinking
Metaslabs / spacemaps / allocation is fertile ground
Performance nose-dives around 85% of pool capacity
Lock contention impacts high-performance workloads
• What’s next?
–
–
–
–
More workloads; more data!
Feedback on recent enhancements
Connect allocation / scrub to the new IO scheduler
Consider data-driven, adaptive algorithms within OpenZFS