Scale and Performance in a Distributed File System
Prof. Giuseppe Cattaneo Giorgio Vitiello
Overview
◎Distributed File System
◎Andrew File System
◎Network File System
◎AFS vs NFS
◎Conclusions
2
Distributed File System
◎File System (FS) controls how data is stored
and retrieved
◎A Distributed File System (DFS) is a set of client and server services that lets multiple hosts access shared information over a network
3
Andrew File System
◎Andrew is a distributed computing
environment that has been under development
at Carnegie Mellon University since 1983
◎Each individual at CMU may eventually possess
an Andrew workstation
◎A fundamental component of Andrew is the
distributed file system
4
Andrew File System
◎It was the first DFS designed to scale to a large number of users
◎Andrew File System presents a homogeneous,
location-transparent file name space to all
the client workstations
5
Andrew File System
◎The goal of Andrew File System is scalability
◎Large scale affects a distributed system in two
ways
○ it degrades performance
○ it complicates administration and day-to-day
operation
◎Andrew copes successfully with these
concerns
6
First Prototype
◎Validate the basic file system architecture
○ obtain feedback on our design as rapidly as possible
○ build a system that was usable enough to make that
feedback meaningful
◎Prototype used by 400 users
◎100 Sun2 workstations with 65 MB local disks
◎6 servers (Sun2s or Vax-750s), each with 2-3 disks of 400 MB
◎Client and server nodes ran 4.2BSD UNIX
7
First Prototype
◎VICE: a set of trusted servers that presents a homogeneous, location-transparent file name space to all the clients
◎VENUS: a client process that caches files from Vice and stores modified copies of files back on the servers they came from
8
9
● Venus would rendezvous with a process listening at a
well-known network address on a server
● This process then created a dedicated process to deal
with all future requests from the client
● All communication and manipulation of data structures
between server processes took place via files in the
underlying file system
10
● Each server contained a directory hierarchy mirroring
the structure of the Vice files stored on it
● The directory hierarchy contained Stub directories that
represented portions of the Vice name space located on
other servers. If a file were not on a server, the search
for its name would end in a stub directory which
identified the server containing that file
11
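The stub-directory mechanism described above amounts to a pathname walk that either finds the data locally or ends at a pointer to another server. The following is a minimal sketch, with invented class and field names (Node, stub_server, the example tree) that stand in for details the slides do not give:

```python
# Hypothetical sketch of the prototype's location mechanism: each server mirrors
# the Vice directory hierarchy, and subtrees stored elsewhere are represented by
# stub directories that record the server holding them.

class Node:
    def __init__(self, name, stub_server=None):
        self.name = name
        self.stub_server = stub_server   # set only for stub directories
        self.children = {}

def locate(root, path):
    """Walk a pathname; return ('here', node) if the file is stored locally,
    or ('remote', server) if the walk ends in a stub directory."""
    node = root
    for component in path.strip("/").split("/"):
        if node.stub_server is not None:
            return ("remote", node.stub_server)
        node = node.children[component]
    if node.stub_server is not None:
        return ("remote", node.stub_server)
    return ("here", node)

# Example tree: /vice/usr lives on server "vice2", everything else locally.
root = Node("/")
vice = Node("vice"); root.children["vice"] = vice
vice.children["usr"] = Node("usr", stub_server="vice2")
vice.children["local.txt"] = Node("local.txt")

print(locate(root, "/vice/local.txt"))   # ('here', <Node>)
print(locate(root, "/vice/usr/foo.c"))   # ('remote', 'vice2')
```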
The Benchmark
◎This benchmark consists of a command script
that operates on a collection of files
constituting an application program
◎Load Unit: load placed on a server by a single
client workstation running this benchmark
◎A load unit corresponds to about five Andrew
users
◎The input to the benchmark is a read-only
source subtree consisting of about 70 files
12
Phases of the benchmark
◎ MakeDir: constructs a target subtree that is identical in
structure to the source subtree
◎ Copy: copies every file from the source subtree to the
target subtree
◎ ScanDir: recursively traverses the target subtree and
examines the status of every file in it. It doesn’t actually
read the contents of any file
◎ ReadAll: scans every byte of every file in the target
subtree once
◎ Make: compiles and links all the files in the target
subtree
13
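The five phases correspond to ordinary file-system operations. The sketch below re-creates their intent in Python as an illustration only; the real benchmark is a command script over a source subtree of about 70 files, and the directory names here are hypothetical:

```python
# Illustrative re-creation of the benchmark's five phases over a small tree.
import os, shutil, subprocess

SRC, DST = "src_tree", "dst_tree"   # hypothetical paths

def make_dir():          # MakeDir: target subtree identical in structure to the source
    for dirpath, _, _ in os.walk(SRC):
        rel = os.path.relpath(dirpath, SRC)
        os.makedirs(os.path.join(DST, rel), exist_ok=True)

def copy():              # Copy: every file from source to target
    for dirpath, _, filenames in os.walk(SRC):
        rel = os.path.relpath(dirpath, SRC)
        for f in filenames:
            shutil.copyfile(os.path.join(dirpath, f), os.path.join(DST, rel, f))

def scan_dir():          # ScanDir: stat every file, never read its contents
    for dirpath, _, filenames in os.walk(DST):
        for f in filenames:
            os.stat(os.path.join(dirpath, f))

def read_all():          # ReadAll: read every byte of every file once
    for dirpath, _, filenames in os.walk(DST):
        for f in filenames:
            with open(os.path.join(dirpath, f), "rb") as fh:
                fh.read()

def make():              # Make: compile and link everything in the target subtree
    subprocess.run(["make", "-C", DST], check=False)   # requires a Makefile and make
```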
14
Performance Observations
◎Venus used two caches
○ one for files
○ the other for status information about files
◎The most frequent system calls
○ TestAuth, validated cache entries
○ GetFileStat, obtained status information about files
absent from the cache
15
● Snapshot of the caches on 12 machines
○ 81% file-cache hit ratio
○ 82% status-cache hit ratio
○ only 6% of Vice calls (Fetch and Store) actually transferred files
16
● The total running time for the benchmark as a function
of server load
● The table also shows the average response time for the
most frequent Vice operation, TestAuth
17
● Software on the servers maintained statistics about CPU and disk utilization and about data transfers to and from the disks
● Table IV presents these data for four servers over a 2-week period
● As the CPU utilizations in the table show, the server loads were not evenly balanced
● CPU utilization was about 40 percent on the two most heavily used servers
18
Performance Observations
◎The performance bottleneck in our prototype
was the server CPU (high CPU utilization)
○ the frequency of context switches between the many
server processes
○ the time spent by the servers in traversing full
pathnames presented by workstations
19
Performance Observations
◎Significant performance improvement is
possible if
○ we reduce the frequency of cache validity checks
○ reduce the number of server processes
○ require workstations rather than the servers to do
pathname traversals
○ balance server usage by reassigning users
20
Changes for performance
◎Unchanged aspects
○ workstations cache entire files from a collection of
dedicated autonomous servers
○ Venus and the server code run as user-level processes
○ communication between servers and clients is based
on the RPC paradigm
○ the mechanism in the workstation kernels to
intercept and forward file requests to Venus is the
same as in the prototype
21
Changes for performance
◎Changed aspects
○ To enhance performance
● Cache management
● Name resolution
● Communication and server process structure
● Low-level storage representation
○ To improve the operability of the system
22
Cache management
◎Caching is the key to Andrew's ability to scale well
◎Venus now caches the contents of directories
and symbolic links in addition to files
◎There are still two separate caches:
○ for status
○ for data
◎Venus uses LRU to keep each of them
bounded in size
23
Cache management
◎ Modifications to a cached file are done locally and are
reflected back to Vice when the file is closed
◎ Venus intercepts only the opening and closing of files
and doesn’t participate in the reading or writing of
individual bytes on a cached copy
◎ For integrity, modifications to a directory are made directly on the server responsible for that directory
◎ Venus reflects the change in its cached copy to avoid
refetching the directory
24
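The open/close contract described on this slide can be summarized in a toy model. The code below is an illustration only, with invented class names and a dirty-set bookkeeping detail that the slides do not specify; it shows a whole file being fetched on open, modified locally, and stored back on close:

```python
# Toy model of Venus's whole-file caching: fetch on open, operate locally,
# store the modified copy back to the server on close.

class Server:
    def __init__(self):
        self.files = {}                       # fid -> bytes
    def fetch(self, fid):
        return self.files.get(fid, b"")
    def store(self, fid, data):
        self.files[fid] = data

class Venus:
    def __init__(self, server):
        self.server = server
        self.cache = {}                       # fid -> bytearray (local disk cache)
        self.dirty = set()

    def open(self, fid):
        if fid not in self.cache:             # fetch the entire file once
            self.cache[fid] = bytearray(self.server.fetch(fid))
        return self.cache[fid]

    def write(self, fid, data):               # local modification only
        buf = self.open(fid)
        buf[:] = data
        self.dirty.add(fid)

    def close(self, fid):
        if fid in self.dirty:                 # reflect local modifications back to Vice
            self.server.store(fid, bytes(self.cache[fid]))
            self.dirty.discard(fid)

server = Server()
venus = Venus(server)
venus.write("fid-1", b"hello")
print(server.files.get("fid-1"))              # None: change not yet visible on the server
venus.close("fid-1")
print(server.files["fid-1"])                  # b'hello'
```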
Cache management
◎Callback mechanism to keep consistent cache
entries
○ Venus assumes that cache entries are valid unless
otherwise notified
○ When a workstation caches a file, the server promises
to notify it before allowing a modification by any
other workstation
◎Callback reduces the number of cache
validation requests received by servers
25
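A hedged sketch of the callback idea follows. Names and structure are invented, and real AFS callbacks also deal with expiry and with callbacks lost to machine or network failures; the point shown is only that the server records which workstations cache a file and breaks the promise before accepting an update from anyone else:

```python
# Minimal sketch of callback-based cache consistency.
from collections import defaultdict

class Server:
    def __init__(self):
        self.data = {}
        self.callbacks = defaultdict(set)     # fid -> clients holding a callback

    def fetch(self, fid, client):
        self.callbacks[fid].add(client)       # promise: notify before any other update
        return self.data.get(fid)

    def store(self, fid, value, writer):
        for client in self.callbacks[fid] - {writer}:
            client.break_callback(fid)        # invalidate remote cached copies first
        self.callbacks[fid] = {writer}
        self.data[fid] = value

class Client:
    def __init__(self, server):
        self.server = server
        self.cache, self.valid = {}, set()

    def read(self, fid):
        if fid in self.valid:                 # no network traffic while the callback holds
            return self.cache[fid]
        self.cache[fid] = self.server.fetch(fid, self)
        self.valid.add(fid)
        return self.cache[fid]

    def write(self, fid, value):
        self.cache[fid] = value
        self.valid.add(fid)
        self.server.store(fid, value, self)

    def break_callback(self, fid):
        self.valid.discard(fid)               # next read must revalidate with the server

srv = Server()
a, b = Client(srv), Client(srv)
a.write("f", 1)
print(b.read("f"))       # 1, fetched with a callback
a.write("f", 2)          # server breaks B's callback first
print(b.read("f"))       # 2, refetched because the callback was broken
```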
Cache management
◎Merits
○ reducing cache validation traffic
○ callback reduces the load on servers considerably
○ callback makes it feasible to resolve pathnames on
workstations
◎Defects
○ callback complicates the system because each server
and Venus now maintain callback state information
○ there is a potential for inconsistency if the callback
state maintained by a Venus gets out of sync with the
corresponding state maintained by the servers
26
Name Resolution
◎In a conventional 4.2BSD system
○ inode: a file has a unique, fixed-length name
○ one or more variable-length pathnames map to this inode
◎The routine that performs this mapping, namei, is usually one of the most heavily used parts of the kernel
◎This caused considerable CPU overhead on the servers and was an obstacle to scaling
27
Name Resolution
◎The notion of two-level names
◎Each Vice file or directory is now identified by a
unique fixed-length Fid (96 bit)
◎Each entry in a directory maps a component of a
pathname to a fid
◎Venus performs the logical equivalent of a namei
operation, and maps Vice pathnames to fids
◎Servers are presented with fids and are unaware
of pathnames
28
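Client-side name resolution then becomes a walk over cached directories, each mapping one pathname component to a fid. The sketch below is illustrative only: the tuple layout of the fid and the table contents are assumptions (the slide states only that a Fid is 96 bits), and a real Venus would fetch a directory from the server on a cache miss:

```python
# Sketch of pathname-to-fid resolution performed by Venus: servers are
# presented with fids, never pathnames.

ROOT_FID = ("vol1", 1, 0)     # illustrative 3-part fid

# Venus's cached directory contents: fid of a directory -> {component: fid}
cached_dirs = {
    ROOT_FID:        {"usr": ("vol1", 2, 0)},
    ("vol1", 2, 0):  {"proj": ("vol2", 7, 1)},
    ("vol2", 7, 1):  {"main.c": ("vol2", 9, 3)},
}

def resolve(path):
    """Logical equivalent of a namei operation, run on the workstation."""
    fid = ROOT_FID
    for component in path.strip("/").split("/"):
        fid = cached_dirs[fid][component]     # would fetch the directory on a miss
    return fid

print(resolve("/usr/proj/main.c"))            # ('vol2', 9, 3) is what the server sees
```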
Communication and Server Process Structure
◎The use of a server process per client didn’t scale
well
◎Server processes couldn’t cache critical shared
information in their address spaces
◎4.2BSD doesn’t permit processes to share virtual
memory
◎The redesign solves these problems by using a
single process to service all clients of a server
29
Communication and Server Process Structure
◎A user-level mechanism supports multiple nonpreemptive Lightweight Processes (LWPs) within one process (sketched below)
◎Context switching between LWPs costs only on the order of a few procedure-call times
◎The number of LWPs is typically five
◎RPC mechanism is implemented entirely outside
the kernel and is capable of supporting many
hundreds or thousands of clients per server
30
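A rough analogy for the nonpreemptive LWPs, using Python generators as a stand-in for the LWP package (which the slides do not detail): each worker runs until it explicitly yields, so a context switch is just a procedure call back into the scheduler.

```python
# Cooperative (nonpreemptive) scheduling inside one process, in the spirit of
# the LWP package: workers yield explicitly, a round-robin scheduler resumes them.
from collections import deque

def worker(name, requests):
    for req in requests:
        print(f"{name} handling {req}")
        yield                                  # explicit yield point = context switch

def run(lwps):
    ready = deque(lwps)
    while ready:
        lwp = ready.popleft()
        try:
            next(lwp)                          # resume until the next yield
            ready.append(lwp)
        except StopIteration:
            pass                               # this LWP has finished

run([worker("lwp-1", ["fetch A", "store B"]),
     worker("lwp-2", ["fetch C"])])
```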
Low-Level Storage Representation
◎ To limit the cost of the namei operations involved in accessing data via pathnames, files are accessed by their inodes
◎ The internal inode interface is not visible to user-level processes, so an appropriate set of system calls was added
◎ The vnode information for a Vice file identifies the inode of the file storing its data
◎ Data access consists of indexing a fid into a table to look up vnode information, followed by an iopen call to read or write the data (sketched below)
31
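A schematic of that data-access path, with an invented table layout and a stub standing in for the iopen system call that was added to 4.2BSD; it only shows the fid-to-vnode-to-inode chain, not real kernel behavior:

```python
# Schematic of the low-level storage path: fid -> vnode info -> inode -> data,
# bypassing namei entirely. Table contents and iopen() are illustrative stubs.

vnode_table = {
    ("vol2", 9, 3): {"inode": 12345, "length": 2048, "version": 7},
}

def iopen(inode, mode):
    """Stand-in for the iopen call used to access a file directly by inode."""
    print(f"opening inode {inode} for {mode}")
    return object()                            # would be a file handle

def access(fid, mode="read"):
    vnode = vnode_table[fid]                   # index the fid into the vnode table
    return iopen(vnode["inode"], mode)         # then read/write via the inode

access(("vol2", 9, 3))
```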
Overall Design
◎Suppose a user process opens a file with
pathname P on a workstation
◎The kernel detects that it is a Vice file and
passes it to Venus on that workstation
◎Venus now uses the cache to examine each
directory component D of P in succession
32
Overall Design
Case 1: If D is in the cache and has a callback on it, it is used without any network communication
Case 2: If D is in the cache but has no callback on it, the appropriate server is contacted, a new copy of D is fetched if it has been updated, and a callback is established on it
Case 3: If D is not in the cache, it is fetched from the appropriate server, and a callback is established on it
33
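The three cases reduce to a short decision procedure. The helper names below (fetch, fetch_if_updated, the FakeServer class) are placeholders for the corresponding client/server calls, not the real Vice interface:

```python
# Decision procedure for one directory component D of pathname P, as a Venus
# LWP might run it. Helper names are illustrative placeholders.

def lookup_component(cache, server, d):
    entry = cache.get(d)
    if entry is not None and entry["callback"]:
        return entry["copy"]                       # Case 1: valid copy, no network traffic
    if entry is not None:                          # Case 2: revalidate with the server
        fresh = server.fetch_if_updated(d, entry["version"])
        if fresh is not None:
            entry["copy"], entry["version"] = fresh
        entry["callback"] = True                   # callback (re)established
        return entry["copy"]
    copy, version = server.fetch(d)                # Case 3: not cached at all
    cache[d] = {"copy": copy, "version": version, "callback": True}
    return copy

class FakeServer:                                  # stand-in for a Vice server
    def __init__(self):
        self.dirs = {"usr": ("usr-contents-v2", 2)}
    def fetch(self, d):
        return self.dirs[d]
    def fetch_if_updated(self, d, version):
        copy, v = self.dirs[d]
        return (copy, v) if v != version else None

cache, srv = {}, FakeServer()
print(lookup_component(cache, srv, "usr"))         # Case 3: fetched, callback established
print(lookup_component(cache, srv, "usr"))         # Case 1: served from the cache
```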
Overall Design
◎File F is identified; the directories along P and F itself are now in the cache, with callbacks on them
◎A cached copy of F is created and Venus returns to the kernel
◎The kernel opens the cached copy of F and returns its handle to the user process
◎Venus regains control when F is closed; if F has been modified locally, Venus updates it on the server
34
Overall Design
◎Future references to this file will involve no
network communication at all, unless a
callback is broken on a component of P
◎An LRU replacement algorithm is periodically
run to reclaim cache space
◎Problem: since the other LWPs in Venus may
be concurrently servicing file access requests
from other processes, accesses to cache data
structures must be synchronized
35
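A small sketch of these two requirements together: an LRU-bounded cache whose data structures are protected against concurrent access. The capacity, the OrderedDict layout, and the use of a thread lock are assumptions standing in for whatever synchronization the LWP package actually provides:

```python
# LRU-bounded cache with explicit synchronization, standing in for Venus's
# cache data structures that multiple LWPs may touch concurrently.
from collections import OrderedDict
from threading import Lock

class BoundedCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()               # oldest entries first
        self.lock = Lock()                         # stand-in for LWP synchronization

    def get(self, key):
        with self.lock:
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)          # mark as most recently used
            return self.entries[key]

    def put(self, key, value):
        with self.lock:
            self.entries[key] = value
            self.entries.move_to_end(key)
            while len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # reclaim least recently used entry

cache = BoundedCache(capacity=2)
cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)
print(cache.get("a"), cache.get("c"))              # None 3 — "a" was evicted
```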
Overall Design
◎Design converged on the following consistency
semantics
○ Writes to an open file by a process on a workstation are
visible to all other processes on the workstation
immediately but are invisible elsewhere in the network
○ Once a file is closed, the changes made to it are visible to
new opens anywhere on the network
○ All other file operations are visible everywhere on the
network immediately after the operation completes
○ Application programs have to cooperate in performing the
necessary synchronization if they care about the
serialization of these operations
36
Second Prototype
◎Two questions
○ Has the anticipated improvement in scalability been realized?
○ What are the characteristics of the system in normal operation?
◎Client workstations
○ IBM RTs
○ 40 or 70 MB of local disk storage
○ first models with 4 MB of RAM
37
● Benchmark
○ An Andrew workstation is 19% slower than a stand-alone workstation
○ The first prototype was 70% slower
○ The Copy and Make phases are the most susceptible to server load
38
● CPU and disk utilization on the server during the benchmark
○ CPU - from 8.1% to 70.9%
○ Disk - from 2.7% to 23.6%
● CPU still limits performance in our system
● Our design changes have improved scalability considerably
○ At a load of 20, the system is still not saturated (50 Andrew users)
39
40
Comparison with a remote-open file system
◎Advantages of caching entire files on local disks in AFS
○ Locality: server load and network traffic are reduced
○ Whole-file transfer approach contacts servers only on
opens and closes
○ Disk caches retain their entries across reboots
○ Caching of entire files simplifies cache management
41
Comparison with a remote-open file system
◎Workstations require local disks for
acceptable performance
◎Files that are larger than the local disk cache
cannot be accessed at all
◎Concurrent read and write semantics across workstations are impossible
42
Comparison with a remote-open file system
◎Could an alternative design have produced
equivalent or better results?
◎How critical to scaling are caching and whole-
file transfer?
◎AFS vs NFS
43
Network File System
◎NFS was developed by Sun Microsystems in 1984
◎It builds on the Open Network Computing Remote Procedure Call (ONC RPC) system
◎Client-server protocol
◎Shares a file, a directory, or a whole file system
◎Runs on the Unix operating system
44
Network File System
◎NFSv1 (1984)
○ developed in-house for experimental purposes
◎NFSv2 (March 1989)
○ released for commercial use, UDP transport, stateless
◎NFSv3 (June 1995)
○ improvements over v2, UDP/TCP transport, stateless
◎NFSv4 (April 2003)
○ stateful server, TCP transport
○ focus on performance, accessibility, scalability, and security
45
46
AFS vs NFS
◎Scalability assessment comparing AFS with NFS
○ NFS is a mature product, not a prototype
○ Sun has invested heavily in NFS
○ different architecture, same hardware
◎Benchmark
○ 18 Sun3 workstations
○ one Sun3 server
○ clients and server on a 10 Mbit Ethernet
○ experiments on Andrew were run with both a cold cache and a warm cache
47
48
● NFS performs slightly better than Andrew at low loads, but not at high loads
● ScanDir, ReadAll, and Make show different performance
● Andrew outperforms NFS on ReadAll
49
● Benchmark time as a function of load units
● CPU utilization as a function of load units
○ Load 1: 22% NFS - 3% AFS
○ Load 18: 100% NFS - 38% AFS with a cold cache, 42% with a warm cache
Server Side
● Load 1
○ NFS, Disk 1: 9%
○ NFS, Disk 2: 3%
○ AFS, Disk 1: 4%
● Load 18
○ NFS, Disk 1: 95%
○ NFS, Disk 2: 19%
○ AFS, Disk 1: 33%
50
Conclusions
◎ AFS is more scalable than NFS
◎ AFS would reduce its overhead further if it were moved into the kernel
◎ Andrew is implemented in user space
◎ NFS runs in kernel space
51
Conclusions
◎ AFS testing deployment
○ 400 workstations, 16 servers, 3500 Andrew users
◎ Goals
○ scaling to 5000 workstations
○ moving Venus and the server code into the kernel
◎ AFS is a strong file system
◎ Excellent performance
◎ It compares favorably with the most prominent
alternative distributed file system
52
Thanks!
Any questions?
53


Editor's Notes

  1. Andrew File System presenta uno spazio di nomi di file omogeneo e trasparente alla posizione su tutte le workstation client
  2. Il nostro obiettivo principale nella creazione di un prototipo era di convalidare l'architettura di base del file system
  3. Per massimizzare il numero di client che possono essere supportati da un server, la maggior parte del lavoro possibile viene eseguita da Venus anziché da Vice
  4. Poiché 4.2BSD non consente la condivisione di spazi di indirizzi tra processi, tutte le comunicazioni e la manipolazione di strutture di dati tra processi del server sono avvenute tramite file nel file system sottostante
  5. Venus si incontrerà con un processo in ascolto su un noto indirizzo di rete su un server Questo processo ha quindi creato un processo dedicato per gestire tutte le richieste future del client Tutte le comunicazioni e la manipolazione delle strutture di dati tra i processi del server sono avvenute tramite file nel file system sottostante
  6. Ogni server conteneva una gerarchia di directory che rispecchiava la struttura dei file Vice memorizzati su di esso. La gerarchia della directory conteneva le directory Stub che rappresentavano porzioni dello spazio dei nomi dei vice situati su altri server. Il database delle posizioni che mappa i file sui server è stato quindi incorporato nella struttura dei file. Se un file non si trovava su un server, la ricerca del suo nome terminerebbe in una directory stub che identificava il server contenente quel file
  7. Questo benchmark consiste in uno script di comando che opera su una raccolta di file che costituiscono un programma applicativo Unità di carico: carica posta su un server da una singola workstation client che esegue questo benchmark Un'unità di carico corrisponde a circa cinque utenti Andrew L'input per il benchmark è una sottostruttura sorgente di sola lettura composta da circa 70 file
  8. MakeDir: crea una sottostruttura di destinazione identica nella struttura alla sottostruttura di origine Copia: copia ogni file dalla sottostruttura di origine alla sottostruttura di destinazione ScanDir: attraversa in modo ricorsivo il sottostruttura di destinazione ed esamina lo stato di ogni file in esso contenuto. In realtà non legge il contenuto di alcun file ReadAll: esegue la scansione di ogni byte di ogni file nella sottostruttura di destinazione una volta Crea: compila e collega tutti i file nella sottostruttura di destinazione
  9. Su una workstation Sun 2 con un disco locale, questo benchmark impiega circa 1000 secondi per il completamento quando tutti i file sono ottenuti localmente. I tempi corrispondenti per le altre macchine sono mostrati nella Tabella I.
  10. La voce TestAuth ha convalidato le voci della cache, mentre GetFileStat ha ottenuto informazioni sullo stato dei file assenti dalla cache.
  11. Un'istantanea delle cache di 12 macchine ha mostrato un tasso medio di riscontri tra cache e file dell'81%, e un rapporto di riscontri tra cache e stato dell'82%, La tabella mostra anche che solo il 6% delle chiamate a Vice (Recupera e archivia) ha effettivamente comportato il trasferimento di file e che il rapporto delle chiamate di recupero per le chiamate di archivio era approssimativamente 2:1.
  12. Il tempo totale di esecuzione del benchmark in funzione del carico del server La tabella mostra anche il tempo medio di risposta per l'operazione Vice più frequente, TestAuth
  13. Un software sui server per mantenere le statistiche sull'uso della CPU e del disco e sui trasferimenti di dati da e verso i dischi La Tabella IV presenta questi dati per quattro server nell'arco di un periodo di 2 settimane Come mostrano gli utilizzi della CPU nella tabella, i carichi dei server non erano equamente bilanciati Utilizzo della CPU di circa il 40 percento per i due server più utilizzati.
  14. la frequenza degli switch di contesto tra i molti processi del server e il tempo trascorso dai server nel percorrere i percorsi completi presentati dalle workstation.
  15. il miglioramento significativo delle prestazioni è possibile se riduciamo la frequenza dei controlli di validità della cache, riduciamo il numero di processi del server, richiediamo workstation anziché i path per attraversare il percorso e bilanciamo l'utilizzo del server riassegnando gli utenti.
  16. Le wrokstations memorizzano nella cache interi file da una raccolta di server autonomi dedicati venus e il codice del server vengono eseguiti come processi a livello di utente la comunicazione tra server e client si basa sul paradigma RPC il meccanismo nei kernel delle workstation per intercettare e inoltrare le richieste di file a Venus è lo stesso del prototipo RPC = Remote Procedure Call Quindi l'RPC consente a un programma di eseguire subroutine "a distanza" su computer remoti, accessibili attraverso una rete. Essenziale al concetto di RPC è l'idea di trasparenza: la chiamata di procedura remota deve essere infatti eseguita in modo il più possibile analogo a quello della chiamata di procedura locale; i dettagli della comunicazione su rete devono essere "nascosti" (resi trasparenti) all'utilizzatore del meccanismo.
  17. Per migliorare le prestazioni Gestione della cache Risoluzione del nome Struttura del processo di comunicazione e server Rappresentazione dello storage di basso livello Per migliorare l'operabilità del sistema
  18. Caching, la chiave per la capacità di Andrew di scalare bene Venus ora memorizza nella cache il contenuto delle directory e dei collegamenti simbolici oltre ai file Ci sono ancora due cache separate: per lo stato per i dati Venus usa LRU per mantenere ognuna di esse delimitata in dimensioni
  19. Le modifiche a un file memorizzato nella cache vengono eseguite localmente e vengono riflesse su Vice quando il file viene chiuso Venus intercetta solo l'apertura e la chiusura di file e non partecipa alla lettura o scrittura di singoli byte su una copia cache Per l'integrità, le modifiche a una directory vengono effettuate direttamente sul server responsabile di quella directory Venus riflette la modifica nella sua copia memorizzata nella cache per evitare il refetching della directory
  20. Meccanismo di callback per mantenere le voci della cache coerenti Venus presume che le voci della cache siano valide salvo diversa comunicazione Quando una workstation memorizza nella cache un file, il server promette di notificarlo prima di consentire una modifica da parte di un'altra workstation La richiamata riduce il numero di richieste di convalida della cache ricevute dai server A small amount of cache validation traffic is still present, usually to replace callbacks lost because of machine or network failures Una piccola quantità di traffico di validazione della cache è ancora presente, in genere per sostituire i callback persi a causa di errori della macchina o della rete
  21. Merits: by reducing cache validation traffic, callback considerably reduces the load on the servers, and it makes it feasible to resolve pathnames on the workstations. Drawbacks: callback complicates the system, since each server and each Venus must now maintain callback state information, and there is a potential for inconsistency if the callback state maintained by a Venus gets out of sync with the corresponding state maintained by the servers.
  22. In a conventional 4.2BSD system, a file has a unique, fixed-length name, its inode, and one or more variable-length pathnames that map to this inode. The routine that performs this mapping, namei, is typically one of the most heavily used and time-consuming parts of the kernel. In the prototype this caused considerable CPU overhead on the servers and was an obstacle to scalability.
  23. SOLUTION: the notion of two-level names. Each Vice file or directory is now identified by a unique, fixed-length Fid (96 bits). Each entry in a directory maps a pathname component to a fid. Venus performs the logical equivalent of a namei operation, mapping Vice pathnames to fids; the servers are presented with fids and are unaware of pathnames.
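A minimal sketch of such a fid and of a directory entry that maps one pathname component to it. The breakdown into three 32-bit fields (volume, vnode number, uniquifier) follows the usual description of the AFS fid; the field and type names here are illustrative.

```c
/* Sketch of a 96-bit fid and a directory entry mapping one pathname
 * component to it.  Names are illustrative. */
#include <stdint.h>
#include <stdio.h>

struct fid {
    uint32_t volume;      /* which volume the object lives in            */
    uint32_t vnode;       /* index into that volume's vnode table        */
    uint32_t uniquifier;  /* distinguishes reuses of the same vnode slot */
};

/* One entry of a (cached) Vice directory: component name -> fid. */
struct dir_entry {
    char       name[32];
    struct fid fid;
};

int main(void)
{
    struct dir_entry e = { "thesis.tex", { 7u, 42u, 1u } };
    printf("%s -> fid <%u.%u.%u>\n",
           e.name, e.fid.volume, e.fid.vnode, e.fid.uniquifier);
    printf("fid size: %zu bytes (96 bits)\n", sizeof(struct fid));
    return 0;
}
```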
  24. Using one server process per client did not scale well: server processes could not cache critical shared information in their address spaces, because 4.2BSD does not allow processes to share virtual memory. The redesign solves these problems by using a single process to serve all clients of a server.
  25. A user-level mechanism supports multiple nonpreemptive Lightweight Processes (LWPs) within a single process. Context switching between LWPs costs only on the order of a few procedure-call times. The number of LWPs (typically five) is determined when a server is initialized and remains fixed thereafter. The RPC mechanism is implemented entirely outside the kernel and is capable of supporting many hundreds or thousands of clients per server.
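The real LWPs are nonpreemptive user-level coroutines inside one process, not kernel threads; as a rough modern analogue only, the sketch below uses a fixed pool of POSIX threads created at start-up (like the typical five LWPs) to show why a single address space lets all workers share cached state.

```c
/* Rough analogue of the single server process with a fixed pool of
 * lightweight workers: five threads share one address space, so they all
 * see the same in-memory state.  Not the actual LWP package.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 5           /* fixed when the server is initialized */
#define REQUESTS    20

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_cache_hits;   /* state visible to every worker        */
static int next_request;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (next_request >= REQUESTS) {      /* no work left */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int req = next_request++;
        shared_cache_hits++;                 /* all workers update it */
        pthread_mutex_unlock(&lock);
        printf("worker %d served request %d\n", id, req);
    }
}

int main(void)
{
    pthread_t tid[NUM_WORKERS];
    int       id[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);

    printf("shared state after all workers: %d hits\n", shared_cache_hits);
    return 0;
}
```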
  26. As mentioned in Section 3.2, we were wary of the cost of the namei operations involved in accessing data via pathnames. We therefore decided to access files by their inodes rather than by pathnames. Since the internal inode interface is not visible to user-level processes, an appropriate set of system calls had to be added. The vnode information for a Vice file identifies the inode of the file storing its data, so data access consists of indexing a fid into a table to look up the vnode information, followed by an iopen call to read or write the data.
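A minimal sketch of that data-access path, assuming a simple per-volume table: the fid's vnode number indexes the table, which records the inode holding the file's data, and a stubbed stand-in for the added iopen call then opens that inode directly, bypassing namei. Since iopen is a non-standard system call described in the paper, it is only simulated here; table contents and names are illustrative.

```c
/* Sketch of the fid -> vnode table -> inode data-access path.
 * iopen is simulated; everything here is illustrative. */
#include <stdint.h>
#include <stdio.h>

struct vnode_info {
    uint32_t inode;        /* inode storing this Vice file's data */
    uint32_t length;       /* (other status fields omitted)       */
};

/* Per-volume vnode table, indexed by the fid's vnode number. */
static struct vnode_info vnode_table[] = {
    { 0, 0 },
    { 12345, 2048 },       /* vnode 1 -> inode 12345 */
    { 67890,  512 },       /* vnode 2 -> inode 67890 */
};

#define VNODE_COUNT (sizeof vnode_table / sizeof vnode_table[0])

/* Stand-in for the added iopen system call: open a file by inode. */
static int iopen_stub(uint32_t inode)
{
    printf("iopen(inode %u)\n", inode);
    return 3;              /* pretend file descriptor */
}

static int open_by_fid(uint32_t vnode_number)
{
    if (vnode_number >= VNODE_COUNT)
        return -1;
    /* no pathname, no namei: go straight from fid to inode */
    return iopen_stub(vnode_table[vnode_number].inode);
}

int main(void)
{
    open_by_fid(1);
    return 0;
}
```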
  27. Remote file access walkthrough. Suppose a user process opens a file with pathname P on a workstation. The kernel, in resolving P, detects that it is a Vice file and passes it to Venus on that workstation. One of the LWPs comprising Venus then uses the cache to examine each directory component D of P in turn:
  28. - If D is in the cache and has a callback on it, it is used without any network communication. - If D is in the cache but has no callback on it, the appropriate server is contacted, a fresh copy of D is fetched if it has been updated, and a callback is established on it. - If D is not in the cache, it is fetched from the appropriate server and a callback is established on it. (A sketch of this lookup loop follows the walkthrough below.)
  29. When the target file F is identified, a current cache copy is created in the same way. Venus then returns to the kernel, which opens the cached copy of F and returns its handle to the user process. Thus, at the end of pathname traversal, all intermediate directories and the target file are in the cache with callbacks on them. Future references to this file involve no network communication at all, unless a callback is broken on a component of P. Venus regains control when the file is closed and, if it has been modified locally, updates it on the appropriate server. An LRU replacement algorithm is run periodically to reclaim cache space.
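The sketch below restates the lookup loop from this walkthrough as code: each component is resolved against the cache, applying the three cases (valid callback, cached without callback, not cached). Server contact is stubbed with printfs, and the structure and names are illustrative rather than the real Venus implementation.

```c
/* Sketch of the Venus-side lookup loop for pathname P.  Illustrative. */
#include <stdio.h>
#include <string.h>

enum cache_state { NOT_CACHED, CACHED_NO_CALLBACK, CACHED_WITH_CALLBACK };

struct cached_obj {
    const char      *name;
    enum cache_state state;
};

/* Tiny "cache" for the demo: state of each component of /vice/usr/f. */
static struct cached_obj cache[] = {
    { "vice", CACHED_WITH_CALLBACK },
    { "usr",  CACHED_NO_CALLBACK   },
    { "f",    NOT_CACHED           },
};

static struct cached_obj *lookup(const char *name)
{
    for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Resolve one component, contacting the server only when needed. */
static void resolve_component(const char *name)
{
    struct cached_obj *obj = lookup(name);

    if (obj && obj->state == CACHED_WITH_CALLBACK) {
        printf("%s: cached with callback, no network traffic\n", name);
    } else if (obj) {
        printf("%s: cached, no callback -> validate with server, "
               "refetch if updated, re-establish callback\n", name);
        obj->state = CACHED_WITH_CALLBACK;
    } else {
        printf("%s: not cached -> fetch from server, establish callback\n",
               name);
    }
}

int main(void)
{
    /* Components of the pathname P = /vice/usr/f, target file last. */
    const char *components[] = { "vice", "usr", "f" };
    for (size_t i = 0; i < 3; i++)
        resolve_component(components[i]);
    return 0;
}
```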
  30. Cache consistency and concurrency control. Problem: since the other LWPs of Venus may concurrently service file access requests from other processes, accesses to the cache data structures must be synchronized.
  31. The design converged on the following consistency semantics: writes to an open file by a process on a workstation are visible immediately to all other processes on that workstation, but are invisible elsewhere in the network; once a file is closed, the changes made to it are visible to new opens anywhere on the network; all other file operations are visible everywhere on the network immediately after the operation completes; application programs must cooperate in performing the necessary synchronization if they care about the serialization of these operations.
  32. Two questions: Has the anticipated improvement in scalability been realized? What are the characteristics of the system in normal operation?
  33. Table VI shows that the Copy and Make phases are most susceptible to server load.
  34. Table VIII presents server CPU and disk utilizations in Andrew. The figures shown are averages over one-hour samples taken from 9:00 A.M. to 5:00 P.M. on weekdays. Most servers show a CPU utilization between 15 and 25 percent. One server, vice4, shows a utilization of 35.8 percent, but its disk utilization is not correspondingly high. The high standard deviation of its CPU utilization leads us to believe that this anomaly was caused by system maintenance activities that happened to run during the day rather than at night. Server vice9, on the other hand, shows a CPU utilization of 37.6 percent with a small standard deviation, and its disk utilization of 12.1 percent is the highest of any server. The high utilization is explained by the fact that this server stores the electronic bulletin boards, a collection of directories that are frequently accessed and modified by many different users.
  35. Caching entire files on local disks in the Andrew File System was motivated primarily by the following considerations of scale: - The locality of file references by typical users makes caching attractive: server load and network traffic are reduced. - A whole-file transfer approach contacts servers only on opens and closes. Read and write operations, which are far more numerous, are transparent to the servers and cause no network traffic. - The study by Ousterhout et al. [4] showed that most files in a 4.2BSD environment are read in their entirety. Whole-file transfer exploits this property by allowing the use of efficient bulk data transfer protocols. - Disk caches retain their entries across reboots, a surprisingly frequent event in workstation environments. Since few of the files accessed by a typical user are likely to have been modified elsewhere in the system, the amount of data refetched after a reboot is usually small. - Finally, caching entire files simplifies cache management: Venus only has to keep track of the files in its cache, not of their individual pages.
  36. Workstations require local disks for acceptable performance. Files larger than the local disk cache cannot be accessed. Concurrent read and write semantics across workstations are impossible.
  37. Could an alternative design have produced equivalent or better results? How critical are caching and whole-file transfer to scaling? AFS vs NFS.
  38. Evaluation of scalability, comparing AFS with NFS. NFS is a mature product, not a prototype, and Sun has invested heavily in it. Different architecture, same hardware and benchmark: 18 Sun3 workstations and one Sun3 server, with clients and server on a 10 Mbit Ethernet. The Andrew experiments consisted of two subsets: a Cold Cache set, in which the caches were cleared before each trial, and a Warm Cache set, in which the caches were left unaltered.