Shingled Disk
The Big Data Storage
 Storage: Magnetic disks
   High storage density
   Current: 400–550 GB/in²
   30–50% increase per year.
 It’s reaching its physical limit…
   “Superparamagnetic Limit”
   Predicted limit: around 1 TB/in²
Review: Hard Disk
Conventionally Written Tracks
 Non-overlapping tracks.
 Track width w (e.g. 25nm).
 Guard gaps between tracks (e.g. g = 5nm).

 Bottleneck is the written track width.
   Current read heads can work on a much narrower track.
   But it is hard to write a narrower track.
Shingled Disk: Overlapping Tracks

 A wider track is written (e.g. w = 70nm).
 Shingled writing overlaps tracks.
 The remaining residual track can be much narrower (e.g. r = 10nm).
Characteristics
 Higher density without significant hardware change.
   2–3 times the density of a conventional disk.
 Supports random read / sequential write.
   A single write destroys the next k tracks.
   Typically, k = 4–8.
 Can we do better than a “tape with random read support”?
Two High-Level Strategies
 Mask the operational difference of a shingled disk.
   Drop-in replacement for current disks.
   Uses the standard block interface.
 Specialized file system with little or no hardware masking.
   More flexibility in data layout and block management.
   More knowledge available at the file system layer.
Strategy One: Masking the
  Operational Difference
 Analogy with SSDs: slow block erasure in SSDs.
   SSD: Flash Translation Layer (FTL).
   Shingled Disk: Shingled Translation Layer (STL).
   Translates a Virtual Block Address to a Logical Block
    Address on disk.
 How to perform random writes:
   One extreme: read-modify-write.
   The other extreme: remap the physical location of the written
    data.
 Benefit
   No changes needed for users or systems.
   “Drop-in” replacement for the current system.
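The remapping extreme can be sketched as a tiny translation layer: a map from Virtual Block Address to Logical Block Address, where every write (including an overwrite) lands at the current sequential log head. All names and the structure below are illustrative, not taken from any real drive firmware; a real STL would also handle garbage collection, map persistence, and band boundaries.

```python
# Minimal sketch of a remapping Shingled Translation Layer (STL).
# Illustrative only: no garbage collection, persistence, or bands.

class RemappingSTL:
    def __init__(self):
        self.table = {}      # Virtual Block Address -> Logical Block Address
        self.log_head = 0    # next free LBA; written strictly sequentially

    def write(self, vba):
        """Remap a (re)written virtual block to the current log head."""
        self.table[vba] = self.log_head
        self.log_head += 1   # the old LBA (if any) becomes garbage

    def read(self, vba):
        """Random reads simply follow the table."""
        return self.table[vba]

stl = RemappingSTL()
stl.write(7)        # first write of VBA 7 -> LBA 0
stl.write(7)        # update: remapped to LBA 1; LBA 0 is now garbage
print(stl.read(7))  # -> 1
```

The update to VBA 7 never rewrites LBA 0 in place, which is exactly what avoids destroying the k overlapped tracks; the cost is the garbage left behind, which the drive must eventually reclaim.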
Strategy One: Masking the
     Operational Difference
 Drawbacks
   Experience with SSDs suggests the performance will be
    hard to predict.
     Users resort to reverse-engineering SSDs to achieve higher-level goals.
   A sophisticated STL could be expensive.
   Data stored at contiguous Virtual Block Addresses could end up
    far apart on disk.
     Database tables with frequent edits.
     Concurrent downloads of movies.
     A large NVRAM cache might mitigate the problem.
Virtual Block Address
             Translation
 Need to quickly translate a Virtual Block Address to a
  Logical Block Address.

 The translation table could be very large.
   Capacity 2TB, each entry 8 bytes.
   Block size 4KB: translation table 4GB.
   Block size 512 bytes: translation table 32GB.
 A B+-tree-like structure could help.
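The table sizes above follow directly from one 8-byte entry per block:

```python
# Flat translation-table size: one 8-byte entry per block.
ENTRY = 8                      # bytes per table entry
CAPACITY = 2 * 2**40           # 2 TB drive

for block in (4096, 512):      # block size in bytes
    entries = CAPACITY // block
    table = entries * ENTRY
    print(f"{block}B blocks: {table // 2**30} GB table")
# 4096B blocks: 4 GB table
# 512B blocks: 32 GB table
```

Neither 4GB nor 32GB fits comfortably in drive RAM, which is why an on-disk B+-tree-like index (with a cached working set) is the natural fallback.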
Strategy Two:
          Specialized System
 Simple Shingled Translation Layer.
   Random update: read-modify-write.
   TRIM command: tells the hardware that overwriting
    subsequent tracks is fine.
   Support for formatting some part of the disk unshingled.
 More sophisticated system software:
   Avoid writing to the middle of a band.
   Conceptualize writing as appending to a log.
   Perform the necessary data remapping and garbage
    collection.
Design Issues for
Shingled Write Disk
Band Abstraction
 Store the bulk of the data in bands.
   A band is a collection of b contiguous tracks.
   A buffer of k tracks sits at the end of each band.
   Bands do not interfere with each other.
     More flexible.
Proportional Capacity Loss
 c = 1 – b/(b+k): proportional capacity loss.
   With k = 5, to keep c < 0.1 we need b > 45.
   Each band then holds 67.5MB.
   Reasonable for a modern LFS.
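The b > 45 bound follows from rearranging c = 1 − b/(b+k) = k/(b+k); the 67.5MB band size additionally assumes about 1.5MB per track (a figure implied by the slide, since 67.5MB / 45 tracks = 1.5MB):

```python
# Proportional capacity loss of the band layout:
#   c = 1 - b/(b+k) = k/(b+k)
# c < c_max  <=>  b > k * (1 - c_max) / c_max
k, c_max = 5, 0.1
b_min = k * (1 - c_max) / c_max
print(b_min)            # -> 45.0 tracks

# Assumed track capacity of 1.5 MB (implied, not stated in the slides).
track_mb = 1.5
print(45 * track_mb)    # -> 67.5 MB per band
```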
Band Usage
1. Write only complete bands.
   Each band contains one segment of a log-structured file system.
   Assumes data is buffered in NVRAM.
2. Append only to bands.
   Less efficient.
3. Circular log inside each band.
   Consume data from the head.
   Append data at the tail.
   Requires an additional k-track gap between head and tail.
4. Flexible band sizes.
   Neighboring bands can be joined.
   Not suitable for a general-purpose SWD; included just for completeness.
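Option 3 can be sketched as a ring buffer of tracks in which appending is refused whenever it would shrink the tail-to-head gap below k tracks, since a shingled append overwrites the next k tracks. The class and its invariant below are an illustrative sketch, not a real on-disk format:

```python
# Sketch of the circular log inside a band (option 3). Appends advance the
# tail, frees advance the head, and at least k free tracks must always
# separate tail from head so an append never destroys live data.

class BandLog:
    def __init__(self, tracks, k):
        self.tracks, self.k = tracks, k
        self.head = 0            # oldest live track
        self.tail = 0            # next track to write
        self.live = 0            # number of live tracks

    def free_tracks(self):
        return self.tracks - self.live

    def append(self):
        # Writing here damages the next k tracks, so keep a k-track gap.
        if self.free_tracks() <= self.k:
            raise RuntimeError("band full: gap would shrink below k tracks")
        self.tail = (self.tail + 1) % self.tracks
        self.live += 1

    def consume(self):
        if self.live == 0:
            raise RuntimeError("band empty")
        self.head = (self.head + 1) % self.tracks
        self.live -= 1

band = BandLog(tracks=50, k=5)
for _ in range(45):              # fill to the maximum of tracks - k tracks
    band.append()
print(band.live)                 # -> 45
```

The k-track gap is the per-band price of in-place circular reuse, on top of the k-track buffer that already separates neighboring bands.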
Reserved Space for
          Random Update
 Option 1: NVRAM.
 Option 2: Random Access Zone (RAZ).
   Every track is followed by k unused tracks.
   The density of the RAZ is lower than that of a current disk.
How Large a Random Access
     Zone Can We Have?
 Assume that without a RAZ, the capacity of the shingled disk is
  2.3 times that of a conventional disk.
   If we want to guarantee L = 2 times the conventional disk:
     With k = 5, the RAZ is 3.75% of the total storage
      capacity.
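One way to recover the 3.75% figure (a reconstruction; the slides state only the inputs and the result) is to assume RAZ space stores 1/k of the shingled density and solve for the disk-area share a that still guarantees L:

```python
# Reconstruction of the 3.75% figure. Assumption (not stated in the
# slides): RAZ density is 1/k of shingled density. Capacities are in
# units of one conventional disk.
S, L, k = 2.3, 2.0, 5

# Area fraction a given to the RAZ satisfies: (1 - a)*S + a*S/k = L
a = (S - L) / (S * (1 - 1 / k))
raz_capacity = a * S / k       # usable RAZ capacity
print(f"RAZ = {raz_capacity / L:.2%} of the guaranteed capacity")
# -> RAZ = 3.75% of the guaranteed capacity
```

Under this assumption the general form is (S − L) / ((k − 1) · L), so the RAZ share shrinks quickly as k grows or as the guaranteed multiple L is relaxed.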
Trade-offs for the Two Options
 Reserved space for random access:
   Option 1: NVRAM
     Faster.
     More expensive: costs roughly 10 times as much as RAZ.
   Option 2: Random Access Zone (RAZ)
     Uses some part of the disk as a random access zone.
     Cheaper but slower.
   The trade-offs would be interesting to explore.
Usage of NVRAM
1. Buffering data for writing bands.
   Be careful about the limited number of write-erase cycles
      of flash memory.

2. Storing metadata.
   Metadata tends to see more update activity.
   In a write-anywhere file system, NVRAM could be used
      to maintain the log of file system activity.

3. Storing recently created objects.
   Temporal locality: a block/object created long ago is
      less likely to be updated.
     If data is first written to NVRAM, we can also place it
      better on disk.
Number of Logs
 A log-structured file system is assumed here.

 What is the benefit of having more than a single log?
  1. Separation of metadata and data.
           E.g. access time.
  2. Allocating files for more efficient read access later.
           E.g. downloading several movies at the same time.
             With only one log, all the movie objects will be interspersed.
             That is inefficient for reads.
Workload for General
Purpose Personal Usage
Workloads Evaluation
 Rate of block updates
   If only a few blocks are updated frequently:
     Less need for a Random Access Zone / NVRAM.
     The shingled disk is better suited to replace a conventional disk.

 Evaluated workloads
  1. General-purpose personal usage for 1 month.
  2. Specialized workload: video editing for 3 hours.
  3. Specialized workload: music library management.
       Negligible block updates.
       Not surprising.
Some Points From the Workloads
1. Identifying hot blocks is important, since the volume of
   hot blocks is small enough to be held in the Random
   Access Zone / NVRAM.

2. Larger block sizes reduce the accuracy of identifying
   hot blocks, but not significantly.

3. Having the file system distinguish metadata from user
   data would be helpful.
Workload for Disk
     Arrays
Shingled Disk Arrays
 Can a shingled disk be used in a server environment?
   Probably as part of a disk array.
   Writes could originate from different sources.

 Two impacts on the workload:
   Data striping.
   Workload interleaving.

 Replay workloads against a simulated drive.
   Log-structured writing scheme to perform in-band updates.
Logical Arrangement of
        Blocks
Shignled disk

More Related Content

What's hot

Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
Rohit Jnagal
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and Limitations
Panagiotis Papadopoulos
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
Steven Francia
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Gluster.org
 
Google File Systems
Google File SystemsGoogle File Systems
Google File SystemsAzeem Mumtaz
 
Userspace Linux I/O
Userspace Linux I/O Userspace Linux I/O
Userspace Linux I/O
Garima Kapoor
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBD
dawnlua
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
Rajesh Ananda Kumar
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
Gluster.org
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin Stopford
JAXLondon_Conference
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
Fengchang Xie
 
Ch23
Ch23Ch23
Distributed replicated block device
Distributed replicated block deviceDistributed replicated block device
Distributed replicated block device
Chanaka Lasantha
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
Rohit Jnagal
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Andrii Vozniuk
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1sprdd
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
Antonio Cesarano
 
Google File System
Google File SystemGoogle File System
Google File System
DreamJobs1
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
Command Prompt., Inc
 

What's hot (20)

Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and Limitations
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
 
Google File Systems
Google File SystemsGoogle File Systems
Google File Systems
 
Userspace Linux I/O
Userspace Linux I/O Userspace Linux I/O
Userspace Linux I/O
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBD
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin Stopford
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 
Ch23
Ch23Ch23
Ch23
 
Distributed replicated block device
Distributed replicated block deviceDistributed replicated block device
Distributed replicated block device
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
 
Google File System
Google File SystemGoogle File System
Google File System
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 

Similar to Shignled disk

Chapter 3
Chapter 3Chapter 3
sheet 3 answers.docx
sheet 3 answers.docxsheet 3 answers.docx
sheet 3 answers.docx
MohamedAyman183185
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
npinto
 
Storage structure1
Storage structure1Storage structure1
Storage structure1
amibuban
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
confluent
 
Index file
Index fileIndex file
Index file
SushantGote
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
Haris456
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
Abhishek Dutta
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf
JESUNPK
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organization
Bishal Ghimire
 
Storage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkStorage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkSisimon Soman
 
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
peknap
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Chapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage SystemsChapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage Systems
Wayne Jones Jnr
 
Zoned Storage
Zoned StorageZoned Storage
Zoned Storage
singh.gurjeet
 
Storage and File Structure in DBMS
Storage and File Structure in DBMSStorage and File Structure in DBMS
Storage and File Structure in DBMS
A. S. M. Shafi
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices Slideshare
 

Similar to Shignled disk (20)

Chapter 3
Chapter 3Chapter 3
Chapter 3
 
sheet 3 answers.docx
sheet 3 answers.docxsheet 3 answers.docx
sheet 3 answers.docx
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
 
Storage structure1
Storage structure1Storage structure1
Storage structure1
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
 
Index file
Index fileIndex file
Index file
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
DAS RAID NAS SAN
DAS RAID NAS SANDAS RAID NAS SAN
DAS RAID NAS SAN
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organization
 
Storage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkStorage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talk
 
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Chapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage SystemsChapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage Systems
 
Zoned Storage
Zoned StorageZoned Storage
Zoned Storage
 
Storage and File Structure in DBMS
Storage and File Structure in DBMSStorage and File Structure in DBMS
Storage and File Structure in DBMS
 
CLFS 2010
CLFS 2010CLFS 2010
CLFS 2010
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Shignled disk

  • 2. The Big Data Storage  Storage: Magnetic disks  High storage density  Current: 400GB/in2 - 550GB/in2  30-50% increase per year.  It’s reaching its physical limit…  “Superparamagnetic Limit”  Predicted limit: around 1TB/in2
  • 5. Conventionally Written Track  Non-overlap  Track width w (e.g. 25nm)  Guard gaps between tracks (e.g. g = 5nm).  Bottleneck is the writing track width.  Current read heads can work on much narrower track.  But it is hard to write narrower track.
  • 6. Shingled Disk: Overlap Tracks  Wilder Track written. (e.g. w = 70nm).  Shingled writing overlaps tracks.  The remaining residual track could be much narrower. (e.g. r = 10nm).
  • 7. Characteristics  Higher Density without significant hardware change.  2-3 times of the conventional disk density.  Support Random Read / Sequential Write  A single write will destroy the next k tracks  Typically, k = 4~8  Can we do better than a “tape with random read support”?
  • 8. Two High-Level Strategies  Mask the operational difference of a Shingled Disk.  Drop-in replacement for current disks.  Uses the standard block interface.  Specialized file system with no/little hardware mask.  More flexibility in the data layout and block management.  Increased knowledge at file system layer.
  • 9. Strategy One: Masking the Operational Difference
     Synergy with SSDs, which face slow block erasure:
       SSD: Flash Translation Layer (FTL).
       Shingled disk: Shingled Translation Layer (STL).
       Translates virtual block addresses to logical block addresses on disk.
     How to perform a random write:
       One extreme: read-modify-write.
       The other extreme: remap the physical location of the written data.
     Benefit
       No changes for users or systems.
       “Drop-in” replacement in current systems.
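The remapping extreme can be sketched as a tiny log-style STL, much like a simplified FTL. The structure and names here are illustrative, not from any particular drive or paper:

```python
# Minimal sketch of a remapping Shingled Translation Layer (STL): a random
# write at a virtual block address is appended at the sequential write head,
# and the virtual -> logical map is updated, as in an SSD FTL.

class STL:
    def __init__(self, capacity_blocks):
        self.mapping = {}        # virtual block address -> logical block address
        self.head = 0            # next sequential logical block to write
        self.capacity = capacity_blocks
        self.disk = {}           # logical block address -> data

    def write(self, vba, data):
        # Instead of read-modify-write, remap: append at the log head.
        if self.head >= self.capacity:
            raise RuntimeError("log full: garbage collection needed")
        lba = self.head
        self.head += 1
        old = self.mapping.get(vba)   # the old location becomes garbage
        self.mapping[vba] = lba
        self.disk[lba] = data
        return old

    def read(self, vba):
        return self.disk[self.mapping[vba]]

stl = STL(100)
stl.write(7, "v1")
stl.write(7, "v2")            # random overwrite remaps; no in-place write
assert stl.read(7) == "v2"
```

The stale copy left behind by each remap is exactly why a real STL also needs garbage collection, which the specialized-system slides address later.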
  • 10. Strategy One: Masking the Operational Difference
     Drawbacks
       Experience with SSDs indicates the performance will be hard to predict.
         Reverse engineering of SSDs has been needed to achieve higher-level goals.
       A sophisticated STL could be expensive.
       Data stored at contiguous virtual block addresses could end up far apart on disk.
         E.g. a database table with frequent edits.
         E.g. concurrent downloads of movies.
       A large NVRAM (as cache) might mitigate the problem.
  • 11. Virtual Block Address Translation
     Need to quickly translate a virtual block address to a logical block address.
     The translation table could be very large.
       Capacity 2 TB, each entry 8 bytes:
         Block size 4 KB: translation table is 4 GB.
         Block size 512 bytes: translation table is 32 GB.
     Use some B+-tree-like structure.
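The table sizes above follow directly from capacity / block size × entry size, which a few lines of arithmetic confirm:

```python
# Checking the slide's translation-table sizes: a 2 TB drive needs one
# 8-byte map entry per block.
capacity = 2 * 2**40            # 2 TB
entry = 8                       # bytes per mapping entry

table_4k  = capacity // 4096 * entry   # 4 KB blocks
table_512 = capacity // 512  * entry   # 512-byte blocks

assert table_4k  == 4  * 2**30   # 4 GB, as on the slide
assert table_512 == 32 * 2**30   # 32 GB, as on the slide
```

A flat in-memory array at these sizes is impractical, which is why the slide suggests a B+-tree-like structure that can keep hot parts of the map cached.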
  • 12. Strategy Two: Specialized System
     Simple Shingled Translation Layer:
       Random update: read-modify-write.
       TRIM command: tell the hardware that overwriting subsequent tracks is fine.
       Support formatting part of the disk unshingled.
     More sophisticated system software:
       Avoid writing to the middle of a band.
       Conceptualize writing as appending to a log.
       Perform the necessary data remapping and garbage collection.
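The read-modify-write fallback can be sketched as three phases over a band. This toy version conservatively rewrites everything from the updated track to the end of the band, since each rewrite destroys its downstream neighbors anyway:

```python
# Sketch of read-modify-write for a random update inside a shingled band:
# everything from the updated track to the end of the band is read out,
# patched, and rewritten strictly sequentially. Illustrative only.

def read_modify_write(band, track, new_value):
    """band: list of track contents; `track` is the index to update."""
    staged = band[track:]                  # read phase: stage the tail of the band
    staged[0] = new_value                  # modify phase: apply the update
    for i, value in enumerate(staged):     # write phase: sequential rewrite
        band[track + i] = value
    return band

band = [f"t{i}" for i in range(8)]
read_modify_write(band, 2, "patched")
assert band == ["t0", "t1", "patched", "t3", "t4", "t5", "t6", "t7"]
```

The cost is proportional to the distance from the update point to the band's end, which is why small bands and update-avoiding layouts matter so much in the following slides.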
  • 14. Band Abstraction
     Store the bulk of the data in bands.
       A band is a collection of b contiguous tracks.
       A buffer of k tracks sits at the end of each band.
     Bands do not interfere with each other.
       More flexible.
  • 15. Proportional Capacity Loss
     c = 1 - b/(b+k): proportional capacity loss.
     With k = 5, keeping c < 0.1 requires b > 45.
     Each band then holds 67.5 MB.
     Reasonable for a modern LFS.
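The b > 45 bound follows directly from the formula, and is easy to verify numerically:

```python
# The slide's capacity-loss bound: c = 1 - b/(b+k). With k = 5 guard tracks
# per band, keeping the loss under 10% forces b > 45 data tracks per band.

def capacity_loss(b, k):
    return 1 - b / (b + k)

k = 5
assert abs(capacity_loss(45, k) - 0.1) < 1e-9   # boundary: exactly 10% loss
assert capacity_loss(46, k) < 0.1               # b > 45 keeps c below 10%
assert capacity_loss(44, k) > 0.1               # b <= 45 exceeds 10%
```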
  • 16. Band Usage
     1. Only write complete bands.
       Each band contains a segment of a log-structured file system.
       Assumes data is buffered in NVRAM.
     2. Only append to bands.
       Less efficient.
     3. Circular log inside each band.
       Consume data from the head.
       Append data at the tail.
       Requires an additional k-track gap between head and tail.
     4. Flexible band sizes.
       Neighboring bands could be joined.
       Not suitable for a general-purpose SWD; included just for completeness.
  • 18. Reserved Space for Random Updates
     Option 1: NVRAM.
     Option 2: Random Access Zone (RAZ).
       Every track is followed by k unused tracks.
       The density of the RAZ is lower than that of a conventional disk.
  • 19. How Large a Random Access Zone Can We Have?
     Assume that without a RAZ, the capacity of the shingled disk is 2.3 times that of a conventional disk.
     If we want to guarantee L = 2 times the conventional capacity:
       With k = 5, the RAZ holds 3.75% of total storage capacity.
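One way to reproduce the 3.75% figure is below. The key assumption is ours, not stated on the slide: each RAZ track consumes the space of k shingled tracks, so RAZ density is the shingled density divided by k.

```python
# Reproducing the slide's 3.75% figure, assuming RAZ density = shingled
# density / k (our modeling assumption; the slide gives only the result).

D = 2.3    # shingled capacity, in units of one conventional disk
L = 2.0    # capacity we want to guarantee, same units
k = 5

raz_density = D / k                      # 0.46 conventional-disk units
a = (D - L) / (D - raz_density)          # fraction of surface given to RAZ
raz_capacity = a * raz_density           # capacity held in the RAZ
share = raz_capacity / L                 # RAZ share of total guaranteed capacity

assert abs(share - 0.0375) < 1e-9        # 3.75%, matching the slide
```

The balance equation behind `a` is (1-a)·D + a·(D/k) = L: shrink the shingled region just enough that total capacity still meets the 2x guarantee.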
  • 20. Trade-offs Between the Two Options
     Reserved space for random access:
       Option 1: NVRAM.
         Faster.
         More expensive: costs roughly 10 times as much as RAZ.
       Option 2: Random Access Zone (RAZ).
         Uses part of the disk as a random access zone.
         Cheaper but slower.
     The trade-offs would be interesting to explore.
  • 21. Usage of NVRAM
     1. Buffer data for writing bands.
       Be careful about the limited number of write-erase cycles of flash memory.
     2. Store metadata.
       Metadata tends to have a higher amount of activity.
       In a write-anywhere file system, NVRAM could be used to maintain the log of file system activities.
     3. Store recently created objects.
       Temporal locality: a block/object created long ago is less likely to be updated.
       If data is first written to NVRAM, we can also achieve better placement of data on disk.
  • 22. Number of Logs
     A log-structured file system is assumed here.
     What is the benefit of having more than a single log?
     1. Separation between metadata and data.
       E.g. access time updates.
     2. Allocate files for more efficient read access later.
       E.g. downloading several movies at the same time.
       With only one log, all the movie objects will be interspersed.
       Inefficient for reading.
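The movie-download example can be made concrete with a tiny placement sketch. The routing rule (one log per stream) is an illustrative policy, not the paper's algorithm:

```python
# Sketch of why multiple logs help: with one log, blocks from concurrent
# streams (e.g. two movie downloads) interleave on disk; with a log per
# stream, each file stays contiguous and sequential reads stay efficient.

from collections import defaultdict

def place(blocks, num_logs):
    """blocks: list of (stream_id, block). Routes each block to log
    stream_id % num_logs and returns each log's contents in write order."""
    logs = defaultdict(list)
    for stream, block in blocks:
        logs[stream % num_logs].append((stream, block))
    return logs

# Two interleaved download streams arriving block by block.
workload = [(0, "a0"), (1, "b0"), (0, "a1"), (1, "b1"), (0, "a2")]

one = place(workload, 1)     # single log: streams interleaved on disk
assert one[0] == workload

two = place(workload, 2)     # per-stream logs: each file is contiguous
assert two[0] == [(0, "a0"), (0, "a1"), (0, "a2")]
assert two[1] == [(1, "b0"), (1, "b1")]
```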
  • 24. Workload Evaluation
     Rate of block updates:
       If few blocks are updated frequently, there is less need for a Random Access Zone / NVRAM,
       and the shingled disk is better suited to replace a conventional disk.
     Evaluated workloads:
     1. General-purpose personal usage for 1 month.
     2. Specialized workload: video editing for 3 hours.
     3. Specialized workload: music library management.
       Negligible block updates.
       Not surprising.
  • 30. Some Points From the Workloads
     1. Identifying hot blocks is important, since the volume of hot blocks is small enough to be held in the Random Access Zone / NVRAM.
     2. Larger block sizes reduce the accuracy of identifying hot blocks, but not significantly.
     3. A file system that distinguishes metadata from user data would be helpful.
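Points 1 and 2 can be illustrated with a minimal hot-block counter. The threshold and offsets are made up for the example; a real study would replay the traces from the previous slides:

```python
# Sketch of hot-block identification: count updates per block and keep the
# small hot set in NVRAM / the Random Access Zone. Coarser block sizes merge
# neighboring blocks and blur the hot set, as point 2 notes.

from collections import Counter

def hot_blocks(updates, block_size, threshold=2):
    """updates: byte offsets of overwrites. Returns the block numbers
    updated at least `threshold` times at the given block granularity."""
    counts = Counter(off // block_size for off in updates)
    return {blk for blk, n in counts.items() if n >= threshold}

updates = [0, 0, 0, 4096, 8192, 8192]       # overwrite offsets in bytes
assert hot_blocks(updates, 4096) == {0, 2}  # blocks 0 and 2 are hot
# With 8 KB blocks, the cold block at offset 4096 merges into hot block 0.
assert hot_blocks(updates, 8192) == {0, 1}
```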
  • 32. Shingled Disk Arrays
     Can the shingled disk be used in a server environment?
       Probably as part of a disk array.
       It could have writes originating from different sources.
     Two impacts on the workload:
       Data striping.
       Workload interleaving.
     Replay workloads against a simulated drive.
     Log-structured writing scheme to perform in-band updates.

Editor's Notes

  1. Take-away point: 85% of all disk blocks written were never updated within the hour. 93% of all disk blocks written were never updated within a day. Data is first stored in NVRAM / RAZ until it reaches a certain age.
  2. One-day trace. With larger blocks, it is more likely that a larger percentage of blocks will be updated. Trade-off: larger blocks reduce per-block overhead but have a higher update rate. Negligible for blocks updated more than 2-4 times.
  3. A shingled-disk aware filesystem might be more helpful to optimize this situation.
  4. Stripe: striped, with the four workloads randomly interleaved at burst size. Pure: no striping, no interleaving of different data sources (i.e. data source 1, then data source 2, …). Dedicated: each disk is dedicated to an individual source workload. Relocating disk bands: unrelated data being written in adjacent positions increases the likelihood of an update in the band.