SlideShare a Scribd company logo
1 of 58
Download to read offline
Distributed file systems
Viet Trung Tran
Department of Information Systems
School of Information and Communication Technology
Hanoi university of science and technology
trungtv@soict.hust.edu.vn
File systems
NTFS
Definitions
•  Filenames
– File Identity
•  Directories (folders)
– Group of files in separate collections
•  Metadata
– Creation time, last access time, last modification time
– Security information (Owner, Group owner)
– Mapping file to its physical location of file (e.g.
location in storage devices)
File system overview
•  Computer file
– A resource for storing information
– Durable, remained available for access
– Data: sequences of bits
•  File system
– Control how computer file are stored and retrieved
– Main operators: READ, WRITE (offset, size), CREATE,
DELETE
Local	
  file	
  systems	
  
Local vs. distributed file systems
NTFS
Distributed file system
•  File system
– Abstraction of storage devices
•  Distributed file system
– Available to remote processes in distributed
systems
•  Benefits
– File sharing
– Uniform view of system from different clients
– Centralized administration
DFS@wikipedia
•  In computing, a distributed file system or network file
system is any file system that allows access to files from
multiple hosts sharing via a computer network.[1] This
makes it possible for multiple users on multiple machines
to share files and storage resources.
•  The client nodes do not have direct access to the
underlying block storage but interact over the network
using a protocol. This makes it possible to restrict access
to the file system depending on access lists or capabilities
on both the servers and the clients, depending on how the
protocol is designed.
Goals
•  Network (Access)Transparency
– Users should be able to access files over a
network as easily as if the files were stored
locally.
– Users should not have to know the physical
location of a file to access it.
•  Transparency can be addressed through
naming and file mounting mechanisms
Components of Access Transparency
•  Location Transparency: file name doesn’t specify
physical location (Ch. 1)
•  Location Independence: files can be moved to
new physical location, no need to change
references to them. (A name is independent of its
addresses – see Ch. 5)
•  Location independence → location transparency,
but the reverse is not necessarily true.
Goals
•  Availability: files should be easily and quickly
accessible.
•  The number of users, system failures, or other
consequences of distribution shouldn’t
compromise the availability.
•  Addressed mainly through replication.
Architectures
•  Client-Server
– Traditional; e.g. Sun Microsystem Network File
System (NFS)
– Cluster-Based Client-Server; e.g., Google File System
(GFS)
•  Symmetric
– Fully decentralized; based on peer-to-peer technology
– e.g., Ivy (uses a Chord DHT approach)
Client-Server Architecture
•  One or more machines (file servers) manage the
file system.
•  Files are stored on disks at the servers
•  Requests for file operations are made from clients
to the servers.
•  Client-server systems centralize storage and
management; P2P systems decentralize it.
The Evolution of Storage
•  Direct attached storage (DAS)
•  Network attached storage (NAS)
•  Storage area network (SAN)
•  Combined technologies
From DAS to NAS
To SAN
NAS file system
SAN file system
Object storage device (OSD)
•  OSDs hold objects rather than blocks, which are
like files in a simple file system
– Objects are identified by a 64 bit Object ID (OID)
– Objects are dynamically created and freed
– Objects are variable length
•  OSDS manage space allocation of objects
– OSD are not dump as conventional storage disks
OSD vs. disk
OSD advantages
•  Data sharing
•  Security
•  Intelligence
OSD form factors
Offloading storage management from
the file system
Object-based file system
Object-based file system
•  Clients have direct access to storage to read and
write file data securely
–  Contrast with SAN?
–  Contrast with NAS?
•  File system includes multiple OSDs
–  Better scalability
•  Multiple file systems share the same OSDs
–  Real storage pooling
•  File server is called the Metadata server (MDS)
Design issues in distributed file
systems
Design issues
•  Naming and Name Resolution
•  Semantics of file sharing
•  Caching
•  Replication
Naming and Name Resolution
•  A name space -- collection of names
•  Name resolution -- mapping a name to an object
•  3 traditional ways
– Concatenate the host name to the names of
files stored on that host
– Mount remote directories onto local directories
– Provide a single global directory
File Sharing Semantics (1/2)
•  Problem: When dealing with distributed file
systems, we need to take into account the
ordering of concurrent read/write operations,
and expected semantics (=consistency).
File Sharing Semantics (2/2)
•  Assume open; reads/writes; close
1  UNIX semantics: value read is the value stored by last write

Writes to an open file are visible immediately to others that have this file opened at
the same time. Easy to implement if one server and no cache.
2  Session semantics:

Writes to an open file by a user is not visible immediately by other users that have
files opened already.

Once a file is closed, the changes made by it are visible by sessions started later.
3  Immutable-Shared-Files semantics:

A sharable file cannot be modified.

File names cannot be reused and its contents may not be altered.

Simple to implement.
4  Transactions: All changes have all-or-nothing property. W1,R1,R2,W2 not allowed
where P1 = W1;W2 and P2 = R1;R2
Caching
•  Server caching: in main memory
•  cache management issue, how much to cache, replacement strategy
•  still slow due to network delay
•  Used in high-performance web-search engine servers
•  Client caching in main memory
•  can be used by diskless workstation
•  faster to access from main memory than disk
•  compete with the virtual memory system for physical memory space
•  Client-cache on a local disk
•  large files can be cached
•  the virtual memory management is simpler
•  a workstation can function even when it is disconnected from the network
Caching tradeoffs
•  Reduces remote accesses ⇒ reduces network
traffic and server load
•  Total network overhead is lower for big chunks of
data (caching) than a series of responses to specific
requests.
•  Disk access can be optimized better for large
requests than random disk blocks
•  Cache-consistency problem is the major drawback.
If there are frequent writes, overhead due to the
consistency problem is significant.
Replication
•  File data is replicated to multiple storage servers
•  Goals
–  Increase reliability
–  improve availability
–  balance the servers workload
•  How to make replication transparent?
•  How to keep the replicas consistent?
– a replica is not updated due to its server failure
– network partitioned
NFS & AFS file systems
Network File System (NFS) 
•  First developed in 1980s by Sun
•  Presented with standard UNIX FS interface
•  Network drives are mounted into local directory
hierarchy
– Type ‘man mount’, 'mount' some time at the prompt if
curious
NFS Protocol
•  Initially completely stateless
– Operated over UDP; did not use TCP streams
– File locking, etc, implemented in higher-level
protocols
•  Modern implementations use TCP/IP & stateful
protocols
NFS Architecture for UNIX systems
•  NFS is implemented using the Virtual File System abstraction, which is
now used for lots of different operating systems:
•  Essence: VFS provides standard file system interface, and allows to
hide difference between accessing local or remote file system.
Server-side Implementation
•  NFS defines a virtual file system
–  Does not actually manage local disk layout on server
•  Server instantiates NFS volume on top of local file
system
–  Local hard drives managed by concrete file systems (EXT,
ReiserFS, ...)
Hard Drive 1
User-visible filesystem
NFS server
NFS client
Hard Drive 2
EXT2 fs ReiserFS
Hard Drive 1 Hard Drive 2
EXT3 fs EXT3 fs
Server filesystem
Typical implementation
•  Assuming a Unix-style scenario in which one machine requires access
to data stored on another machine:
•  The server implements NFS daemon processes in order to make its
data generically available to clients.
•  The server administrator determines what to make available, exporting
the names and parameters of directories.
•  The server security-administration ensures that it can recognize and
approve validated clients.
•  The server network configuration ensures that appropriate clients can
negotiate with it through any firewall system.
•  The client machine requests access to exported data, typically by
issuing a mount command.
•  If all goes well, users on the client machine can then view and interact
with mounted filesystems on the server within the parameters permitted.
NFS Locking
•  NFS v4 supports stateful locking of files
– Clients inform server of intent to lock
– Server can notify clients of outstanding lock requests
– Locking is lease-based: clients must continually
renew locks before a timeout
– Loss of contact with server abandons locks
NFS Client Caching
•  NFS Clients are allowed to cache copies of remote
files for subsequent accesses
•  Supports close-to-open cache consistency
–  When client A closes a file, its contents are synchronized
with the master, and timestamp is changed
–  When client B opens the file, it checks that local
timestamp agrees with server timestamp. If not, it
discards local copy.
–  Concurrent reader/writers must use flags to disable
caching
NFS: Tradeoffs
•  NFS Volume managed by single server
– Higher load on central server
– Simplifies coherency protocols
•  Full POSIX system means it “drops in” very
easily, but isn’t “great” for any specific need
Distributed FS Security
•  Security is a concern at several levels throughout
DFS stack
– Authentication
– Data transfer
– Privilege escalation
•  How are these applied in NFS?
AFS (The Andrew File System)
•  Developed at Carnegie Mellon
•  Strong security, high scalability
– Supports 50,000+ clients at enterprise level
•  AFS heavily influenced Version 4 of NFS.
Local Caching
•  File reads/writes operate on locally cached copy
•  Local copy sent back to master when file is
closed
•  Open local copies are notified of external
updates through callbacks
Symmetric File Systems
Peer-to-Peer
Symmetric Architectures
•  Fully distributed (decentralized) file systems do
not distinguish between client machines and
servers.
•  Most proposed systems are based on a
distributed hash table (DHT) approach for data
distribution across nodes.
•  The Ivy system is typical. It has a 3-layer
structure.
Ivy System Structure
•  The DHT layer implements a Chord scheme for
mapping keys (which represent objects to be
stored) to nodes.
•  The DHash layer is a block-storage layer
– Blocks = logical file blocks
– Different blocks are stored in different locations
•  The top, or file system layer, implements an NFS-
like file system.
Characteristics
•  File data and meta-data stored as blocks in a
DHash P2P system
•  Blocks are distributed and replicated across
multiple sites – increased availability.
•  Ivy is a read-write file system. Writing introduces
consistency issues (of data and meta-data)
Characteristics
•  Presents an NFS-like interface and semantics
•  Performance: 2X-3X slower than NFS
•  Potentially more available because of distributed
nature
Logs
•  Like most P2P systems, trustworthiness of users is not
guaranteed
–  Must be able to undo modifications
•  Network partitioning means the possibility of conflicting
updates – how to manage?
•  Solution: logs – one per user
–  Used to record changes made locally to file data and metadata
–  Contrast to shared data structures
–  Avoids use of locks for updating metadata
Dhash layer
•  Manages data blocks (of a file)
•  Stored as content-hash block or public-key block
•  Content-hash blocks
–  Compute the secure hash of this block to get the key
–  Clients must know the key to look up a block
–  When the block is returned to a client, compute its hash to
verify that this is the correct (uncorrupted) block.
DHash Layer – Public Key Blocks
•  “A public key block requires the block’s key to be
a public key, and the value to be signed using the
private key.”
•  Users can look up a block without the private key,
but cannot change data unless they have the
private key.
•  Ivy layer verifies all the data DHash returns and is
able to protect against malicious or corrupted
data.
DHash Layer
•  The DHash layer replicates each file block B to
the next k successors of the server that stores B.
– (remember how Chord maps keys to nodes)
•  This layer has no concept of files or file systems.
It merely knows about blocks
Ivy – the File System Layer
•  A file is represented by a log of operations
•  The log is a linked list of immutable (can’t be changed)
records.
–  Contains all of the additions made by a single user (to data and
metadata)
•  Each record records a file system operation (open, write,
etc.) as a DHash content-hash block.
•  A log-head node is a pointer to the most recent log entry
Using Logs
•  A user must consult all logs to read file data, (find
records that represent writes) but makes changes
only by adding records to its own log.
•  Logs contain data and metadata
•  Start scan with most recent entry
•  Keep local snapshot of file to avoid having to
scan entire logs
•  Update: Each participant maintains a log of its
changes to the file system
•  Lookup: Each participant scans all logs
•  The view-block has pointers to all log-heads
Combining Logs
•  Block order should reflect causality
•  All users should see same order
•  For each new log record assign
– A sequence # (orders blocks in a single log)
– A tuple with an entry for each log showing the
most recent info about that log (from current
user’s viewpoint)
•  Tuples are compared somewhat like vector timestamps;
either u < v or v < u or v = u or no relation (v and u are
concurrent)
•  Concurrency is the result of simultaneous updates

More Related Content

What's hot

Distributed shred memory architecture
Distributed shred memory architectureDistributed shred memory architecture
Distributed shred memory architectureMaulik Togadiya
 
Interprocess communication
Interprocess communicationInterprocess communication
Interprocess communicationSushil Singh
 
Distributed file system
Distributed file systemDistributed file system
Distributed file systemNaza hamed Jan
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESAAKANKSHA JAIN
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed systemSunita Sahu
 
File replication
File replicationFile replication
File replicationKlawal13
 
File models and file accessing models
File models and file accessing modelsFile models and file accessing models
File models and file accessing modelsishmecse13
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System ImplementationWayne Jones Jnr
 
Naming in Distributed Systems
Naming in Distributed SystemsNaming in Distributed Systems
Naming in Distributed SystemsNandakumar P
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Distributed File Systems
Distributed File Systems Distributed File Systems
Distributed File Systems Maurvi04
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systemssumitjain2013
 
Os Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual MemoryOs Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual Memorysgpraju
 
Communications is distributed systems
Communications is distributed systemsCommunications is distributed systems
Communications is distributed systemsSHATHAN
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Akhil Nadh PC
 

What's hot (20)

Distributed shred memory architecture
Distributed shred memory architectureDistributed shred memory architecture
Distributed shred memory architecture
 
Interprocess communication
Interprocess communicationInterprocess communication
Interprocess communication
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
 
Distributed file system
Distributed file systemDistributed file system
Distributed file system
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
 
File replication
File replicationFile replication
File replication
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
 
File models and file accessing models
File models and file accessing modelsFile models and file accessing models
File models and file accessing models
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
 
Naming in Distributed Systems
Naming in Distributed SystemsNaming in Distributed Systems
Naming in Distributed Systems
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Distributed File Systems
Distributed File Systems Distributed File Systems
Distributed File Systems
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
Naming in Distributed System
Naming in Distributed SystemNaming in Distributed System
Naming in Distributed System
 
Os Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual MemoryOs Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual Memory
 
Communications is distributed systems
Communications is distributed systemsCommunications is distributed systems
Communications is distributed systems
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
 

Viewers also liked

Distributed File System
Distributed File SystemDistributed File System
Distributed File SystemNtu
 
Chapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsChapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsWayne Jones Jnr
 
Distributed File Systems: An Overview
Distributed File Systems: An OverviewDistributed File Systems: An Overview
Distributed File Systems: An OverviewAnant Narayanan
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systemsawesomesos
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirementsAbDul ThaYyal
 
4.file service architecture
4.file service architecture4.file service architecture
4.file service architectureAbDul ThaYyal
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)Sri Prasanna
 
Distributed shared memory shyam soni
Distributed shared memory shyam soniDistributed shared memory shyam soni
Distributed shared memory shyam soniShyam Soni
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory SystemsArush Nagpal
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
 
file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada umardanjumamaiwada
 
Distributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsDistributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsEricsson Labs
 
Database operation with nested transaction handling
Database operation with nested transaction handlingDatabase operation with nested transaction handling
Database operation with nested transaction handlingAshwinPoojary
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...Naoki Shibata
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemMilad Sobhkhiz
 
Back-End application for Distributed systems
Back-End application for Distributed systemsBack-End application for Distributed systems
Back-End application for Distributed systemsAtif Imam
 

Viewers also liked (20)

Distributed File System
Distributed File SystemDistributed File System
Distributed File System
 
Chapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsChapter 17 - Distributed File Systems
Chapter 17 - Distributed File Systems
 
Distributed File Systems: An Overview
Distributed File Systems: An OverviewDistributed File Systems: An Overview
Distributed File Systems: An Overview
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirements
 
12. dfs
12. dfs12. dfs
12. dfs
 
4.file service architecture
4.file service architecture4.file service architecture
4.file service architecture
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)
 
Distributed shared memory shyam soni
Distributed shared memory shyam soniDistributed shared memory shyam soni
Distributed shared memory shyam soni
 
Chap 4
Chap 4Chap 4
Chap 4
 
message passing
 message passing message passing
message passing
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
 
file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada
 
Distributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsDistributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson Labs
 
Database operation with nested transaction handling
Database operation with nested transaction handlingDatabase operation with nested transaction handling
Database operation with nested transaction handling
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Back-End application for Distributed systems
Back-End application for Distributed systemsBack-End application for Distributed systems
Back-End application for Distributed systems
 

Similar to Introduction to distributed file systems

11 distributed file_systems
11 distributed file_systems11 distributed file_systems
11 distributed file_systemslongly
 
File service architecture and network file system
File service architecture and network file systemFile service architecture and network file system
File service architecture and network file systemSukhman Kaur
 
Applications of Distributed Systems
Applications of Distributed SystemsApplications of Distributed Systems
Applications of Distributed Systemssandra sukarieh
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
File system and Deadlocks
File system and DeadlocksFile system and Deadlocks
File system and DeadlocksRohit Jain
 
Introduction to Data Storage and Cloud Computing
Introduction to Data Storage and Cloud ComputingIntroduction to Data Storage and Cloud Computing
Introduction to Data Storage and Cloud ComputingRutuja751147
 
Storage solutions for High Performance Computing
Storage solutions for High Performance ComputingStorage solutions for High Performance Computing
Storage solutions for High Performance Computinggmateesc
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc
 
ITFT_File system interface in Operating System
ITFT_File system interface in Operating SystemITFT_File system interface in Operating System
ITFT_File system interface in Operating SystemSneh Prabha
 
Dfs (Distributed computing)
Dfs (Distributed computing)Dfs (Distributed computing)
Dfs (Distributed computing)Sri Prasanna
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptxVMahesh5
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area NetworkSoumee Maschatak
 

Similar to Introduction to distributed file systems (20)

5.distributed file systems
5.distributed file systems5.distributed file systems
5.distributed file systems
 
Chapter-5-DFS.ppt
Chapter-5-DFS.pptChapter-5-DFS.ppt
Chapter-5-DFS.ppt
 
Nfs
NfsNfs
Nfs
 
11 distributed file_systems
11 distributed file_systems11 distributed file_systems
11 distributed file_systems
 
File service architecture and network file system
File service architecture and network file systemFile service architecture and network file system
File service architecture and network file system
 
Applications of Distributed Systems
Applications of Distributed SystemsApplications of Distributed Systems
Applications of Distributed Systems
 
Ch10 file system interface
Ch10   file system interfaceCh10   file system interface
Ch10 file system interface
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Os6
Os6Os6
Os6
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
File system and Deadlocks
File system and DeadlocksFile system and Deadlocks
File system and Deadlocks
 
Introduction to Data Storage and Cloud Computing
Introduction to Data Storage and Cloud ComputingIntroduction to Data Storage and Cloud Computing
Introduction to Data Storage and Cloud Computing
 
Storage solutions for High Performance Computing
Storage solutions for High Performance ComputingStorage solutions for High Performance Computing
Storage solutions for High Performance Computing
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 
ITFT_File system interface in Operating System
ITFT_File system interface in Operating SystemITFT_File system interface in Operating System
ITFT_File system interface in Operating System
 
Dfs (Distributed computing)
Dfs (Distributed computing)Dfs (Distributed computing)
Dfs (Distributed computing)
 
Gfs final
Gfs finalGfs final
Gfs final
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptx
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area Network
 

More from Viet-Trung TRAN

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Viet-Trung TRAN
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreViet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnViet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processingViet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookViet-Trung TRAN
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studyViet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkViet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learningViet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposalsViet-Trung TRAN
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents Viet-Trung TRAN
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Viet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learningViet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forestsViet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 

More from Viet-Trung TRAN (20)

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on Spark
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 

Recently uploaded

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 

Recently uploaded (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Introduction to distributed file systems

  • 1. Distributed file systems Viet Trung Tran Department of Information Systems School of Information and Communication Technology Hanoi university of science and technology trungtv@soict.hust.edu.vn
  • 3. Definitions •  Filenames – File Identity •  Directories (folders) – Group of files in separate collections •  Metadata – Creation time, last access time, last modification time – Security information (Owner, Group owner) – Mapping file to its physical location of file (e.g. location in storage devices)
  • 4. File system overview •  Computer file – A resource for storing information – Durable, remained available for access – Data: sequences of bits •  File system – Control how computer file are stored and retrieved – Main operators: READ, WRITE (offset, size), CREATE, DELETE
  • 5. Local  file  systems   Local vs. distributed file systems NTFS
  • 6. Distributed file system •  File system – Abstraction of storage devices •  Distributed file system – Available to remote processes in distributed systems •  Benefits – File sharing – Uniform view of system from different clients – Centralized administration
  • 7. DFS@wikipedia •  In computing, a distributed file system or network file system is any file system that allows access to files from multiple hosts sharing via a computer network.[1] This makes it possible for multiple users on multiple machines to share files and storage resources. •  The client nodes do not have direct access to the underlying block storage but interact over the network using a protocol. This makes it possible to restrict access to the file system depending on access lists or capabilities on both the servers and the clients, depending on how the protocol is designed.
  • 8. Goals •  Network (Access)Transparency – Users should be able to access files over a network as easily as if the files were stored locally. – Users should not have to know the physical location of a file to access it. •  Transparency can be addressed through naming and file mounting mechanisms
  • 9. Components of Access Transparency •  Location Transparency: file name doesn’t specify physical location (Ch. 1) •  Location Independence: files can be moved to new physical location, no need to change references to them. (A name is independent of its addresses – see Ch. 5) •  Location independence → location transparency, but the reverse is not necessarily true.
  • 10. Goals •  Availability: files should be easily and quickly accessible. •  The number of users, system failures, or other consequences of distribution shouldn’t compromise the availability. •  Addressed mainly through replication.
  • 11. Architectures •  Client-Server – Traditional; e.g. Sun Microsystem Network File System (NFS) – Cluster-Based Client-Server; e.g., Google File System (GFS) •  Symmetric – Fully decentralized; based on peer-to-peer technology – e.g., Ivy (uses a Chord DHT approach)
  • 12. Client-Server Architecture •  One or more machines (file servers) manage the file system. •  Files are stored on disks at the servers •  Requests for file operations are made from clients to the servers. •  Client-server systems centralize storage and management; P2P systems decentralize it.
  • 13. The Evolution of Storage •  Direct attached storage (DAS) •  Network attached storage (NAS) •  Storage area network (SAN) •  Combined technologies
  • 14. From DAS to NAS
  • 18. Object storage device (OSD) •  OSDs hold objects rather than blocks, which are like files in a simple file system – Objects are identified by a 64 bit Object ID (OID) – Objects are dynamically created and freed – Objects are variable length •  OSDS manage space allocation of objects – OSD are not dump as conventional storage disks
  • 20. OSD advantages •  Data sharing •  Security •  Intelligence
  • 22. Offloading storage management from the file system
  • 24. Object-based file system •  Clients have direct access to storage to read and write file data securely –  Contrast with SAN? –  Contrast with NAS? •  File system includes multiple OSDs –  Better scalability •  Multiple file systems share the same OSDs –  Real storage pooling •  File server is called the Metadata server (MDS)
  • 25. Design issues in distributed file systems
  • 26. Design issues •  Naming and Name Resolution •  Semantics of file sharing •  Caching •  Replication
  • 27. Naming and Name Resolution •  A name space -- collection of names •  Name resolution -- mapping a name to an object •  3 traditional ways – Concatenate the host name to the names of files stored on that host – Mount remote directories onto local directories – Provide a single global directory
  • 28. File Sharing Semantics (1/2) •  Problem: When dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations, and expected semantics (=consistency).
  • 29. File Sharing Semantics (2/2) •  Assume open; reads/writes; close 1  UNIX semantics: value read is the value stored by last write
 Writes to an open file are visible immediately to others that have this file opened at the same time. Easy to implement if one server and no cache. 2  Session semantics:
 Writes to an open file by a user is not visible immediately by other users that have files opened already.
 Once a file is closed, the changes made by it are visible by sessions started later. 3  Immutable-Shared-Files semantics:
 A sharable file cannot be modified.
 File names cannot be reused and its contents may not be altered.
 Simple to implement. 4  Transactions: All changes have all-or-nothing property. W1,R1,R2,W2 not allowed where P1 = W1;W2 and P2 = R1;R2
  • 30. Caching •  Server caching: in main memory •  cache management issue, how much to cache, replacement strategy •  still slow due to network delay •  Used in high-performance web-search engine servers •  Client caching in main memory •  can be used by diskless workstation •  faster to access from main memory than disk •  compete with the virtual memory system for physical memory space •  Client-cache on a local disk •  large files can be cached •  the virtual memory management is simpler •  a workstation can function even when it is disconnected from the network
  • 31. Caching tradeoffs •  Reduces remote accesses ⇒ reduces network traffic and server load •  Total network overhead is lower for big chunks of data (caching) than a series of responses to specific requests. •  Disk access can be optimized better for large requests than random disk blocks •  Cache-consistency problem is the major drawback. If there are frequent writes, overhead due to the consistency problem is significant.
  • 32. Replication •  File data is replicated to multiple storage servers •  Goals –  Increase reliability –  improve availability –  balance the servers workload •  How to make replication transparent? •  How to keep the replicas consistent? – a replica is not updated due to its server failure – network partitioned
  • 33. NFS & AFS file systems
  • 34. Network File System (NFS) •  First developed in 1980s by Sun •  Presented with standard UNIX FS interface •  Network drives are mounted into local directory hierarchy – Type ‘man mount’, 'mount' some time at the prompt if curious
  • 35. NFS Protocol •  Initially completely stateless – Operated over UDP; did not use TCP streams – File locking, etc, implemented in higher-level protocols •  Modern implementations use TCP/IP & stateful protocols
  • 36. NFS Architecture for UNIX systems •  NFS is implemented using the Virtual File System abstraction, which is now used for lots of different operating systems: •  Essence: VFS provides standard file system interface, and allows to hide difference between accessing local or remote file system.
  • 37. Server-side Implementation •  NFS defines a virtual file system –  Does not actually manage local disk layout on server •  Server instantiates NFS volume on top of local file system –  Local hard drives managed by concrete file systems (EXT, ReiserFS, ...) Hard Drive 1 User-visible filesystem NFS server NFS client Hard Drive 2 EXT2 fs ReiserFS Hard Drive 1 Hard Drive 2 EXT3 fs EXT3 fs Server filesystem
  • 38. Typical implementation •  Assuming a Unix-style scenario in which one machine requires access to data stored on another machine: •  The server implements NFS daemon processes in order to make its data generically available to clients. •  The server administrator determines what to make available, exporting the names and parameters of directories. •  The server security-administration ensures that it can recognize and approve validated clients. •  The server network configuration ensures that appropriate clients can negotiate with it through any firewall system. •  The client machine requests access to exported data, typically by issuing a mount command. •  If all goes well, users on the client machine can then view and interact with mounted filesystems on the server within the parameters permitted.
  • 39. NFS Locking •  NFS v4 supports stateful locking of files – Clients inform server of intent to lock – Server can notify clients of outstanding lock requests – Locking is lease-based: clients must continually renew locks before a timeout – Loss of contact with server abandons locks
  • 40. NFS Client Caching •  NFS Clients are allowed to cache copies of remote files for subsequent accesses •  Supports close-to-open cache consistency –  When client A closes a file, its contents are synchronized with the master, and timestamp is changed –  When client B opens the file, it checks that local timestamp agrees with server timestamp. If not, it discards local copy. –  Concurrent reader/writers must use flags to disable caching
  • 41. NFS: Tradeoffs •  NFS Volume managed by single server – Higher load on central server – Simplifies coherency protocols •  Full POSIX system means it “drops in” very easily, but isn’t “great” for any specific need
  • 42. Distributed FS Security •  Security is a concern at several levels throughout DFS stack – Authentication – Data transfer – Privilege escalation •  How are these applied in NFS?
  • 43. AFS (The Andrew File System) •  Developed at Carnegie Mellon •  Strong security, high scalability – Supports 50,000+ clients at enterprise level •  AFS heavily influenced Version 4 of NFS.
  • 44. Local Caching •  File reads/writes operate on locally cached copy •  Local copy sent back to master when file is closed •  Open local copies are notified of external updates through callbacks
  • 46. Symmetric Architectures •  Fully distributed (decentralized) file systems do not distinguish between client machines and servers. •  Most proposed systems are based on a distributed hash table (DHT) approach for data distribution across nodes. •  The Ivy system is typical. It has a 3-layer structure.
  • 47. Ivy System Structure •  The DHT layer implements a Chord scheme for mapping keys (which represent objects to be stored) to nodes. •  The DHash layer is a block-storage layer – Blocks = logical file blocks – Different blocks are stored in different locations •  The top, or file system layer, implements an NFS- like file system.
  • 48. Characteristics •  File data and meta-data stored as blocks in a DHash P2P system •  Blocks are distributed and replicated across multiple sites – increased availability. •  Ivy is a read-write file system. Writing introduces consistency issues (of data and meta-data)
  • 49. Characteristics •  Presents an NFS-like interface and semantics •  Performance: 2X-3X slower than NFS •  Potentially more available because of distributed nature
  • 50. Logs •  Like most P2P systems, trustworthiness of users is not guaranteed –  Must be able to undo modifications •  Network partitioning means the possibility of conflicting updates – how to manage? •  Solution: logs – one per user –  Used to record changes made locally to file data and metadata –  Contrast to shared data structures –  Avoids use of locks for updating metadata
  • 51.
  • 52. Dhash layer •  Manages data blocks (of a file) •  Stored as content-hash block or public-key block •  Content-hash blocks –  Compute the secure hash of this block to get the key –  Clients must know the key to look up a block –  When the block is returned to a client, compute its hash to verify that this is the correct (uncorrupted) block.
  • 53. DHash Layer – Public Key Blocks •  “A public key block requires the block’s key to be a public key, and the value to be signed using the private key.” •  Users can look up a block without the private key, but cannot change data unless they have the private key. •  Ivy layer verifies all the data DHash returns and is able to protect against malicious or corrupted data.
  • 54. DHash Layer •  The DHash layer replicates each file block B to the next k successors of the server that stores B. – (remember how Chord maps keys to nodes) •  This layer has no concept of files or file systems. It merely knows about blocks
  • 55. Ivy – the File System Layer •  A file is represented by a log of operations •  The log is a linked list of immutable (can’t be changed) records. –  Contains all of the additions made by a single user (to data and metadata) •  Each record records a file system operation (open, write, etc.) as a DHash content-hash block. •  A log-head node is a pointer to the most recent log entry
  • 56. Using Logs •  A user must consult all logs to read file data, (find records that represent writes) but makes changes only by adding records to its own log. •  Logs contain data and metadata •  Start scan with most recent entry •  Keep local snapshot of file to avoid having to scan entire logs
  • 57. •  Update: Each participant maintains a log of its changes to the file system •  Lookup: Each participant scans all logs •  The view-block has pointers to all log-heads
  • 58. Combining Logs •  Block order should reflect causality •  All users should see same order •  For each new log record assign – A sequence # (orders blocks in a single log) – A tuple with an entry for each log showing the most recent info about that log (from current user’s viewpoint) •  Tuples are compared somewhat like vector timestamps; either u < v or v < u or v = u or no relation (v and u are concurrent) •  Concurrency is the result of simultaneous updates