FILE SYSTEM TOPICS
Lei Xu
Agenda
 Introduction
 VFS
 Optimizations
 Examples
 Q&A
Introduction
 “A file system is a means to organize data expected
to be retained after a program terminates by
providing procedures to store, retrieve and update
data, as well as manage the available space on the
device(s) which contain it.” – from Wikipedia
 Store data
 Organize data
 Access data
 Manage storage resources (e.g. hard drive)
Relationship to Architecture Course
Acknowledgement: slides from the 830 course
 A file system sits between memory and
secondary storage (or remote servers)
 One of the most complex parts of an operating system
 Main R&D focuses:
 Performance: throughput, latency, scalability
 Reliability and availability
 Management: snapshots, etc.
Different types of file systems
 Local file systems
 Store data on local hard drives, SSDs, floppy drives,
optical disks, etc.
 Examples: NTFS, EXT4, HFS+, ZFS
 Network/distributed file systems
 Store data on remote file server(s)
 Examples: NFS, CIFS/Samba, AFP, Hadoop DFS, Ceph
 Pseudo file systems
 Examples: procfs, devfs, tmpfs
 “List of file systems”
 http://en.wikipedia.org/wiki/List_of_file_systems
Overall Architecture of Linux file
system components
Acknowledgement: “Anatomy of the Linux file system”, IBM
developerWorks.
Virtual File System (VFS)
 VFS is the essential abstraction in UNIX-like file systems
 Specifies an interface between the kernel and a concrete file
system
 Introduced by Sun in 1985
 Passes system calls to the underlying file system
 E.g. passes sys_write() to Ext4 (i.e. ext4_write())
 Three major kinds of metadata in VFS
 Metadata: data about data (Wikipedia)
 Super block, dentry, and inode
 Object-oriented design
 Each component defines a set of data members and the functions
to access them
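The dispatch idea can be sketched in a few lines. This is a toy model, not kernel code: the names ToyVFS, ToyExt4, and FileSystem are hypothetical, and the "file system" is just an in-memory dictionary. It only illustrates how one generic entry point forwards a call to whichever concrete file system is mounted at the path.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Interface that every concrete file system implements (hypothetical)."""

    @abstractmethod
    def write(self, path, data):
        """Append data to the file at path and return the bytes written."""


class ToyExt4(FileSystem):
    """Stand-in for a concrete file system; keeps file contents in a dict."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = self.files.get(path, b"") + data
        return len(data)


class ToyVFS:
    """Routes a generic write call to the file system mounted at the path."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point, fs):
        self.mounts[mount_point] = fs

    def sys_write(self, path, data):
        # Choose the longest mount point that is a prefix of the path.
        candidates = [
            m for m in self.mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        if not candidates:
            raise FileNotFoundError(path)
        best = max(candidates, key=len)
        return self.mounts[best].write(path, data)


vfs = ToyVFS()
ext4 = ToyExt4()
vfs.mount("/", ext4)
written = vfs.sys_write("/home/notes.txt", b"hello")
print(written, ext4.files)
```

The caller only ever sees sys_write(); swapping ToyExt4 for another FileSystem subclass requires no change on the caller's side, which is the point of the interface.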
Super block
 A segment of metadata that describes a file system
 Is constructed when a file system is mounted
 Usually, a persistent copy of the super block is stored at the
beginning of the storage device
 Describes:
 File system type, size, status (e.g. dirty bit, read-only bit)
 Block size, maximum file size, device size, etc.
 How to find other metadata and data.
 How to manipulate this data (i.e. sb_ops)
Inode
 “Index node” in Unix-style file systems
 Holds all information about one file (or directory)
 Except its name
 In UNIX-like systems, file names are stored in the directory file,
whose content is an “array” of file names
 E.g. owner, access rights, mode, size, timestamps, etc.
 Pointers to data
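That the name lives outside the inode can be observed from user space. The sketch below assumes a file system that supports hard links (true for typical Linux file systems); the file names are made up for illustration. It gives one file two names and checks that both resolve to the same inode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")
    alias = os.path.join(tmp, "alias.txt")

    with open(original, "w") as f:
        f.write("hello")

    # A hard link adds a second name (directory entry) for the same inode.
    os.link(original, alias)

    st_original = os.stat(original)
    st_alias = os.stat(alias)

    same_inode = st_original.st_ino == st_alias.st_ino
    link_count = st_original.st_nlink   # number of names pointing at the inode
    size = st_original.st_size          # metadata stored in the inode

print(same_inode, link_count, size)
```

Owner, mode, size, and timestamps all come back from os.stat(), while the two names exist only as entries in the directory.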
Directory Entry (dentry)
 A dentry conceptually maps a file name to its
corresponding inode
 Each file/directory has a dentry representing it
 File systems use dentries to look up a file in the
hierarchical namespace
 Each dentry has a pointer to the dentry of its parent
directory
 Each directory's dentry has a list of the dentries of its
subdirectories and files
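A minimal sketch of this structure, assuming a simplified in-memory dentry (the Dentry class, the lookup helper, and the inode numbers are hypothetical, not the kernel's struct dentry): each entry keeps a parent pointer and a table of children, and a path is resolved one component at a time.

```python
class Dentry:
    """Toy directory entry: a name, an inode number, a parent, and children."""

    def __init__(self, name, inode_no, parent=None):
        self.name = name
        self.inode_no = inode_no
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def path(self):
        # Walk the parent pointers back to the root to rebuild the full path.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def lookup(root, path):
    """Resolve a path one component at a time, starting at the root dentry."""
    node = root
    for component in path.strip("/").split("/"):
        if component:
            node = node.children[component]
    return node


root = Dentry("/", inode_no=2)
users = Dentry("Users", 10, parent=root)
john = Dentry("john", 11, parent=users)
docs = Dentry("Documents", 12, parent=john)
course = Dentry("course.pdf", 13, parent=docs)

found = lookup(root, "/Users/john/Documents/course.pdf")
print(found.inode_no, found.path())
```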
Optimizations
 Most file system optimizations are
based on the characteristics of the memory
hierarchy and storage devices.
 Recall:
 RAM: 50-100 ns
 Disks: 5-10 ms
 About 5 orders of magnitude difference
 Almost all widely used local file systems are designed for
hard disk drives, which have their own characteristics
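A quick check of the gap, using only the latency ranges quoted on this slide (the variable names are illustrative): dividing milliseconds by nanoseconds gives a ratio in the tens to hundreds of thousands, i.e. roughly five orders of magnitude.

```python
import math

ram_latency_s = (50e-9, 100e-9)   # 50-100 ns
disk_latency_s = (5e-3, 10e-3)    # 5-10 ms

# Fastest disk vs. slowest RAM, and slowest disk vs. fastest RAM.
smallest_ratio = disk_latency_s[0] / ram_latency_s[1]
largest_ratio = disk_latency_s[1] / ram_latency_s[0]

orders_low = math.log10(smallest_ratio)
orders_high = math.log10(largest_ratio)

print(f"ratio: {smallest_ratio:,.0f}x to {largest_ratio:,.0f}x")
print(f"orders of magnitude: {orders_low:.1f} to {orders_high:.1f}")
```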
Hard Disk Drive (HDD)
 Stores data on one or
more rotating disks,
coated with magnetic
material
 Introduced by IBM in
1956
 Uses magnetic heads to
read and write data
The very early HDD…..
Acknowledge to:
HDD (Cont’d)
 The essential structure of
the HDD has not changed
much…
 Consists of several
disks (platters)
 Each disk is divided into
tracks, each of which
is divided into sectors
 The single most
significant performance factor:
 Seek time
Why seek time matters
 To access data (a sector), the HDD head must
first move to the right track (seek time), then wait for the
disk to rotate to the sector (rotational latency)
 Seek time: 3 ms on high-end server disks, 12 ms on
desktop-level disks [1]
 Average rotational latency: 5.56 ms on 5400 RPM HDDs, 4.17 ms on
7200 RPM HDDs [1]
 As a result, sequential I/O is much faster than
random I/O, because it avoids most of the seek and rotational
delay
[1] http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics
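The rotational figures follow from the spindle speed: on average the head waits half a revolution. The sketch below reproduces them and compares random against sequential access in a deliberately simplified model. It assumes every random request pays a full seek plus the average rotational wait, that a sequential run pays that cost once, and it ignores transfer time; the function names are illustrative.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait is half a revolution: (60 s / rpm) / 2, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def random_read_time_ms(n_requests, seek_ms, rpm):
    """Each random request pays one seek plus the average rotational wait."""
    return n_requests * (seek_ms + avg_rotational_latency_ms(rpm))


def sequential_read_time_ms(seek_ms, rpm):
    """A sequential run pays the positioning cost once (transfer time ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


latency_5400 = avg_rotational_latency_ms(5400)
latency_7200 = avg_rotational_latency_ms(7200)

# 1000 sector reads on a desktop-level disk (12 ms seek, 7200 RPM).
random_cost = random_read_time_ms(1000, seek_ms=12.0, rpm=7200)
sequential_cost = sequential_read_time_ms(seek_ms=12.0, rpm=7200)

print(f"{latency_5400:.2f} ms, {latency_7200:.2f} ms")
print(f"random: {random_cost:.0f} ms, sequential positioning: {sequential_cost:.2f} ms")
```

Under these assumptions the positioning cost of the random workload grows linearly with the number of requests, while the sequential run pays it only once.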
General Optimizations
 Based on two principles:
 RAM access is much faster than disk access
 Sequential I/O is much faster than random I/O on disk
 So we design file systems that
 Make heavy use of CPU/RAM to reduce I/O to disks (various
caches/write buffers)
 Prefer sequential I/O
 Compute the disk layout so that related data is placed
sequentially on disk
Dcache
 Dentry cache (dcache)
 Directories are stored as files on disk.
 For each file lookup, we want to obtain the inode from the
given full file path
 The OS looks up the dentries from the root through all parent
directories in the path.
 E.g. to look up the file “/Users/john/Documents/course.pdf”, the OS
needs to traverse the dentries that represent “/”, “Users”, “john”,
“Documents”, and “course.pdf”
 To accelerate this:
 We use a global hash table (dcache) to map “file path” -> dentry
(in Linux, keyed by parent dentry and name component)
 A two-list solution: one for active dentries, and one for “recently
unused dentries” (LRU).
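The hash table plus LRU list can be sketched as follows. This is a toy model under stated assumptions: ToyDcache and its methods are hypothetical names, the key is a (parent path, name) pair rather than a real parent dentry pointer, and eviction simply drops the least recently released entry once the unused list exceeds a limit.

```python
from collections import OrderedDict


class ToyDcache:
    """Hash table of dentries plus an LRU list of unused ones (toy model)."""

    def __init__(self, max_unused):
        self.table = {}              # (parent_path, name) -> dentry
        self.unused = OrderedDict()  # LRU order: oldest entry first
        self.max_unused = max_unused

    def insert(self, parent_path, name, dentry):
        self.table[(parent_path, name)] = dentry

    def lookup(self, parent_path, name):
        key = (parent_path, name)
        dentry = self.table.get(key)
        if dentry is not None:
            # A hit makes the dentry active again, so it leaves the LRU list.
            self.unused.pop(key, None)
        return dentry

    def release(self, parent_path, name):
        """Mark a dentry unused; evict the least recently used if over limit."""
        key = (parent_path, name)
        if key not in self.table:
            return
        self.unused[key] = True
        self.unused.move_to_end(key)
        while len(self.unused) > self.max_unused:
            victim, _ = self.unused.popitem(last=False)
            del self.table[victim]


cache = ToyDcache(max_unused=2)
for name in ("a", "b", "c"):
    cache.insert("/", name, {"name": name})
for name in ("a", "b", "c"):
    cache.release("/", name)     # releasing "c" pushes "a" out of the cache

hit = cache.lookup("/", "b")     # still cached, becomes active again
miss = cache.lookup("/", "a")    # evicted, would need a disk lookup
print(hit, miss)
```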
Inode cache
 Similar to the dcache,
the OS maintains a cache
of inode objects.
 Each dentry refers to
one inode (an inode can have
several dentries, e.g. hard links)
 Once the dentries that refer to
an inode are evicted, the inode
can be evicted too
[Figure: processes (P1, P2, P10) with their file objects (f0-f3); below them in the VFS, the dentry cache (hash table) with Dentry 0/10/20, the inode cache with Inode 0/10/20, and the page cache (radix tree) with Page Cache 0/10/20.]
Page Cache
 …a “transparent” buffer for disk-backed pages kept in
RAM for fast access… [wikipedia]
 A write-back cache
 Main purpose: reducing the number of I/Os to disk
 Accessed in units of pages (usually 4 KB).
 The page cache is organized per file.
 A radix tree in the inode object.
 Prefetches pages to serve future reads
 Absorbs writes to reduce the number of I/Os
 Dirty (modified) pages are flushed to disk 1) periodically
(e.g. every 30 s or 5 s), or 2) when the OS wants to reclaim RAM
 A flush can also be forced by calling the “fsync()” system call
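How a write-back cache absorbs writes can be shown with a toy per-file cache. Everything here is an assumption made for illustration: ToyPageCache is a hypothetical class, a plain dictionary stands in for the disk, and flush() plays the role of the periodic flusher or an fsync() call. Writes only dirty pages in memory; the "disk" is touched once per dirty page at flush time.

```python
PAGE_SIZE = 4096


class ToyPageCache:
    """Per-file write-back cache: page index -> bytearray, plus a dirty set."""

    def __init__(self, backing):
        self.backing = backing   # dict standing in for the disk: index -> bytes
        self.pages = {}
        self.dirty = set()
        self.disk_writes = 0

    def _load(self, index):
        # Bring the page into RAM on first access (zero-filled if absent).
        if index not in self.pages:
            data = self.backing.get(index, bytes(PAGE_SIZE))
            self.pages[index] = bytearray(data)
        return self.pages[index]

    def write(self, offset, data):
        # Writes only touch RAM and mark the affected pages dirty.
        for i, byte in enumerate(data):
            index, within = divmod(offset + i, PAGE_SIZE)
            self._load(index)[within] = byte
            self.dirty.add(index)

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, offset + length):
            index, within = divmod(pos, PAGE_SIZE)
            out.append(self._load(index)[within])
        return bytes(out)

    def flush(self):
        """Write every dirty page back to the backing store, once each."""
        for index in sorted(self.dirty):
            self.backing[index] = bytes(self.pages[index])
            self.disk_writes += 1
        self.dirty.clear()


disk = {}
cache = ToyPageCache(disk)
for _ in range(100):
    cache.write(10, b"hello")        # 100 small writes to the same page
writes_before_flush = cache.disk_writes
cache.flush()
print(writes_before_flush, cache.disk_writes)
```

In this model one hundred small writes cost a single page write to the backing store, which is the saving the slide describes; the price is that data written since the last flush is lost if the machine fails first.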
Examples
 Several concrete file system designs
 Ext4, classic UNIX-like file system concepts
 NTFS, advanced Windows file system
 ZFS, “the last word in file systems”
 NFS, a standard network file system
 Google File System, a distributed file system built for
special requirements
Ext4
 The latest version of
the “extended file
system” (Ext2/3/4)
 The standard Linux file
system for a long time
 Inspired by UFS from
BSD/Solaris
 Groups files into block
groups
 Keeps file data near
its inode
Ack: http://bit.ly/tjipWY
NTFS
 “New Technology File
System” (NTFS)
 The standard file
system in the Windows
world.
 A Master File Table
(MFT) contains all
metadata.
 A directory is also a file
ZFS
 ZFS: “the last word in file systems”
 The most advanced local file system in production
 128-bit address space (2^128 bytes in theory)
 Larger than the number of grains of sand on Earth…
 A lot of advanced features:
 E.g. transactional commits, end-to-end data integrity, snapshots,
volume management, and much more…
 Designed never to lose data and to always be consistent.
 Every OS community wants to clone or copy its features…
 Btrfs on Linux, ReFS on Windows, ZFS on FreeBSD
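To get a feel for the theoretical 2^128-byte limit, the arithmetic can be done directly with Python's arbitrary-precision integers. The zettabyte conversion uses the decimal definition (10^21 bytes) and is added only for scale.

```python
capacity_bytes = 2 ** 128          # bytes addressable with 128 bits
zettabyte = 10 ** 21               # 1 ZB in bytes (decimal definition)

capacity_in_zb = capacity_bytes // zettabyte
digits = len(str(capacity_bytes))

print(f"2^128 bytes = {capacity_bytes:,} bytes ({digits} decimal digits)")
print(f"about {capacity_in_zb:,} ZB")
```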
NFS
 “Network File System
(NFS)”
 A protocol developed
by Sun in 1984
 A set of RPC calls
 IETF standard
 Supported by all major
OSs
 Simple and efficient
Google File System (GFS)
 A large distributed file
system specially
designed for the
MapReduce framework
 High throughput
 High availability
 Specially designed; not
compatible with the
VFS/POSIX API.
 Requires clients to link against
the GFS library.
 Hadoop DFS clones the
concepts of GFS
More File Systems
 Interesting file systems that are worth exploring
 Btrfs (B-tree FS) from Oracle, expected to be the next
standard Linux file system. It shares many concepts
with ZFS.
 ReFS: Microsoft's file system introduced with Windows Server 2012
(the Windows 8 generation). It shares many concepts with ZFS (too!).
 WAFL (Write Anywhere File Layout) file system from
NetApp.
 FUSE (Filesystem in Userspace): a cross-platform library
that lets developers write file systems that run in
user mode
Thanks
FAQ?
