This document discusses shingled magnetic recording (SMR) disks, which achieve higher storage densities by overlapping written tracks. SMR disks can reach 2-3 times the density of conventional disks but natively support only random reads and sequential writes. The document explores two strategies for using SMR disks: 1) masking their operational differences behind a translation layer, or 2) using a specialized file system optimized for their characteristics. Key challenges include supporting random writes, managing bands of tracks, and choosing between an on-disk random access zone and non-volatile RAM for update-heavy data. Workload analysis is needed to determine suitability for general usage.
2. The Big Data Storage
Storage: magnetic disks.
High storage density.
Current: 400-550 GB/in².
30-50% increase per year.
It is reaching its physical limit, the "superparamagnetic limit".
Predicted limit: around 1 TB/in².
5. Conventionally Written Track
Non-overlap.
Track width w (e.g. 25nm).
Guard gaps between tracks (e.g. g = 5nm).
The bottleneck is the written track width:
current read heads can work on a much narrower track,
but it is hard to write a narrower track.
6. Shingled Disk: Overlap Tracks
A wider track is written (e.g. w = 70nm).
Shingled writing overlaps tracks.
The remaining residual track can be much narrower (e.g. r = 10nm).
7. Characteristics
Higher density without significant hardware change:
2-3 times the conventional disk density.
Supports random read / sequential write.
A single write destroys the next k tracks.
Typically, k = 4-8.
Can we do better than a "tape with random read support"?
8. Two High-Level Strategies
Mask the operational differences of a shingled disk:
a drop-in replacement for current disks
that uses the standard block interface.
Specialized file system with little or no hardware masking:
more flexibility in data layout and block management,
and increased knowledge at the file system layer.
9. Strategy One: Masking the Operational Difference
Synergy with SSDs (which suffer slow block erasure):
SSD: Flash Translation Layer (FTL).
Shingled disk: Shingled Translation Layer (STL).
Translates a virtual block address to a logical block address on disk.
How to perform a random write:
one extreme is read-modify-write;
the other extreme is to remap the physical location of the written data.
Benefit:
no change for the user or the system;
a "drop-in" replacement for the current system.
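The remap-on-write extreme above can be sketched as a tiny in-memory translation layer. This is a hypothetical illustration, not a real STL: like an SSD's FTL, random virtual writes are turned into strictly sequential physical writes by remapping each virtual block to a fresh physical location.

```python
# Minimal sketch of a remap-on-write Shingled Translation Layer (STL).
# Hypothetical illustration: a real STL must also track bands, run
# garbage collection, and persist the mapping table.

class RemapSTL:
    def __init__(self, num_physical_blocks):
        self.mapping = {}          # virtual block address -> physical block
        self.frontier = 0          # next sequential physical block to write
        self.capacity = num_physical_blocks

    def write(self, vba, data_store, data):
        """Random virtual writes become strictly sequential physical writes."""
        if self.frontier >= self.capacity:
            raise RuntimeError("out of space: garbage collection needed")
        pba = self.frontier
        self.frontier += 1          # never overwrite in place: shingled-safe
        self.mapping[vba] = pba     # any old location becomes garbage
        data_store[pba] = data

    def read(self, vba, data_store):
        return data_store[self.mapping[vba]]

store = {}
stl = RemapSTL(num_physical_blocks=100)
stl.write(42, store, b"v1")
stl.write(42, store, b"v2")    # update: remapped, old block is now garbage
assert stl.read(42, store) == b"v2"
assert stl.frontier == 2       # two sequential physical writes occurred
```

The old copy at physical block 0 is now garbage; reclaiming it is exactly the garbage-collection cost that makes a sophisticated STL expensive, as the next slide notes.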
10. Strategy One: Masking the Operational Difference
Drawbacks:
Experience with SSDs indicates that performance will be hard to predict
(reverse engineering has been needed on SSDs to achieve higher-level goals).
A sophisticated STL could be expensive.
Data stored at contiguous virtual block addresses could be far apart on disk,
e.g. a database table with frequent edits,
or concurrent downloads of movies.
A large NVRAM (as a cache) might mitigate the problem.
11. Virtual Block Address Translation
Need to quickly translate a virtual block address to a logical block address.
The translation table could be very large:
with capacity 2TB and 8 bytes per entry,
a 4KB block size gives a 4GB translation table,
and a 512-byte block size gives a 32GB translation table.
Some B+-tree-type structure could be used.
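The table sizes above follow directly from (capacity / block size) × entry size; a quick check, using binary units (1TB = 2^40 bytes):

```python
# Translation-table size = (capacity / block_size) * entry_size
TB = 2**40
capacity = 2 * TB
entry = 8                               # bytes per mapping entry

table_4k  = capacity // 4096 * entry    # 4KB blocks
table_512 = capacity // 512  * entry    # 512-byte blocks

print(table_4k  // 2**30, "GB")         # -> 4 GB
print(table_512 // 2**30, "GB")         # -> 32 GB
```

A flat table this large cannot live in RAM on commodity hardware, which is why an indexed structure such as a B+-tree becomes attractive.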
12. Strategy Two: Specialized System
Simple Shingled Translation Layer:
random update via read-modify-write;
a TRIM command to tell the hardware that overwriting subsequent tracks is fine;
support for formatting some part of the disk unshingled.
More sophisticated system software:
avoid writing to the middle of a band;
conceptualize writing as appending to a log;
perform the necessary data remapping and garbage collection.
14. Band Abstraction
Store the bulk of the data in bands.
A band is a collection of b contiguous tracks,
with a buffer of k tracks at the end of each band.
Bands do not interfere with each other.
More flexible.
15. Proportional Capacity Loss
c = 1 − b/(b+k): proportional capacity loss.
With k = 5, to keep c < 0.1 we need b > 45.
Each band then holds about 67.5MB,
which is reasonable for a modern LFS.
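Since c = 1 − b/(b+k) = k/(b+k), requiring c < 0.1 with k = 5 gives b > 9k = 45. A quick check (the 67.5MB figure implies an assumed per-track capacity of 1.5MB, which is a reconstruction, not stated on the slide):

```python
k = 5
b = 45                     # the boundary case: need b > 45 for c < 0.1
c = 1 - b / (b + k)        # equivalently k / (b + k)
print(round(c, 3))         # -> 0.1 exactly at b = 45

track_mb = 1.5             # assumed per-track capacity (hypothetical)
print(b * track_mb, "MB")  # -> 67.5 MB per band
```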
16. Band Usage
1. Only write complete bands.
Each band contains a segment of a log-structured file system.
Assumes data is buffered in NVRAM.
2. Only append to bands.
Less efficient.
3. Circular log inside each band.
Consume data from the head; append data to the tail.
Requires an additional k-track gap between head and tail.
4. Flexible band sizes.
Neighboring bands could be joined.
Not suitable for a general-purpose SWD; included just for completeness.
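Option 3 above (a circular log per band) can be sketched as follows. This is a simplified model (one slot = one track, hypothetical sizes): the k-track gap between tail and head ensures that appending at the tail never destroys live data at the head.

```python
# Sketch of a circular log inside one band of `size` tracks.
# Appends go to the tail; frees consume from the head; at least k free
# tracks must always separate tail from head, because writing one track
# destroys the next k tracks.

class CircularBand:
    def __init__(self, size, k):
        self.size, self.k = size, k
        self.head = 0          # oldest live track
        self.tail = 0          # next track to write
        self.live = 0          # number of live tracks

    def free_tracks(self):
        return self.size - self.live

    def append(self):
        # Safe only while the k-track gap after the tail stays clear of
        # live data at the head, i.e. while live < size - k.
        if self.free_tracks() <= self.k:
            raise RuntimeError("append would destroy live data at head")
        self.tail = (self.tail + 1) % self.size
        self.live += 1

    def consume(self):
        assert self.live > 0
        self.head = (self.head + 1) % self.size
        self.live -= 1

band = CircularBand(size=50, k=5)
for _ in range(45):            # can fill at most size - k = 45 tracks
    band.append()
try:
    band.append()              # the 46th append would close the gap
except RuntimeError:
    print("gap preserved")
band.consume()                 # freeing a track at the head...
band.append()                  # ...makes room to append again
```

Note how the gap makes the band's usable capacity size − k tracks, the same capacity loss quantified on slide 15.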
18. Reserved Space for Random Update
Option 1: NVRAM.
Option 2: Random Access Zone (RAZ).
Every track is followed by k unused tracks.
The density of the RAZ is lower than that of a current disk.
19. How Large a Random Access Zone Can We Have?
Assume that without a RAZ, the capacity of the shingled disk is 2.3 times that of a conventional disk.
If we want to guarantee L = 2 times the conventional disk's capacity,
then with k = 5 the RAZ can hold 3.75% of the total storage capacity.
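The 3.75% figure can be reproduced with a small calculation under one consistent model, which is an assumption on our part: each RAZ track plus its guard gap occupies k residual-track widths, so the RAZ stores data at G/k times conventional density, where G = 2.3 is the shingled density gain.

```python
# Reproduce the 3.75% figure (model is an assumption; see lead-in).
G, k, L = 2.3, 5, 2.0       # shingled gain, destroyed tracks, target capacity

# Split the platter: fraction f of the area is RAZ, the rest is shingled.
# Total capacity in conventional-disk units: G*(1-f) + (G/k)*f = L
f = (G - L) / (G * (1 - 1 / k))    # fraction of physical area given to RAZ

raz_capacity = (G / k) * f         # capacity stored in the RAZ
share = raz_capacity / L           # as a share of total storage capacity
print(round(share * 100, 2), "%")  # -> 3.75 %
```

Note that the RAZ consumes about 16% of the physical area to provide that 3.75% of randomly updatable capacity, which is why it only pays off for a small, hot subset of the data.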
20. Trade-offs for the Two Options
Reserved space for random access:
Option 1: NVRAM.
Faster, but more expensive: roughly 10 times the cost of a RAZ.
Option 2: Random Access Zone (RAZ).
Use part of the disk as a random access zone.
Cheaper but slower.
The trade-offs would be interesting.
21. Usage of NVRAM
1. Buffering data for writing bands.
Be careful about the limited number of write-erase cycles of flash memory.
2. Storing metadata.
Metadata tends to have a higher amount of activity.
In a write-anywhere file system, NVRAM could maintain the log of file system activities.
3. Storing recently created objects.
Temporal locality: a block/object created long ago is less likely to be updated.
If data is first written to NVRAM, we can also place it better on disk.
22. Number of Logs
A log-structured file system is assumed here.
What is the benefit of having more than a single log?
1. Separation between metadata and data,
e.g. access times.
2. Allocating files for more efficient read access later,
e.g. when downloading several movies at the same time:
with only one log, all the movie objects will be interspersed,
which is inefficient for reads.
24. Workload Evaluation
Rate of block updates:
if few blocks are updated frequently,
there is less need for a Random Access Zone / NVRAM,
and the shingled disk is more usable as a replacement for a conventional disk.
Evaluated workloads:
1. General-purpose personal usage for 1 month.
2. Specialized workload: video editing for 3 hours.
3. Specialized workload: music library management.
Negligible block updates; not surprising.
30. Some Points From the Workloads
1. Identifying hot blocks is important, since the volume of hot blocks is small enough to be held in the Random Access Zone / NVRAM.
2. Larger block sizes reduce the accuracy of identifying hot blocks, but not significantly.
3. A file system that distinguishes metadata from user data would be helpful.
32. Shingled Disk Arrays
Can a shingled disk be used in a server environment?
Probably as part of a disk array,
where it would see writes originating from different sources.
Two impacts on the workload:
data striping and workload interleaving.
Replay workloads against a simulated drive
with a log-structured writing scheme to perform in-band updates.
Take-away point: 85% of all disk blocks written were never updated within the hour; 93% were never updated within a day. Data is first stored in NVRAM / RAZ until it reaches a certain age.
One-day trace: with larger blocks, it is more likely that a larger percentage of blocks will be updated. Trade-off: larger blocks reduce per-block overhead but have a higher update rate. The effect is negligible for blocks updated more than 2-4 times.
A shingled-disk-aware file system might be more helpful for optimizing this situation.
Stripe: striped, with the four workloads randomly interleaved at a given burst size. Pure: no striping and no interleaving of different data sources (i.e. data source 1, then data source 2, ...). Dedicated: each disk is dedicated to an individual source workload. Relocating disk bands: unrelated data written in adjacent positions increases the likelihood of an update in the band.