Distributed computing

406 views

Published on

Distributed Computing,dispributed Operating System,Distributed File System

Published in: Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
406
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
25
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Distributed computing

  1. 1. Distributed Operating System and File System Deepak John Department Of I.T,CE-Poonjar
  2. 2. Introduction Distributed OS I. Presents users (and applications) with an integrated computing platform that hides the individual computers. II. Has control over all of the nodes (computers) in the network and allocates their resources to tasks without user involvement. III. the user doesn't know (or care) where his programs are running. Ex:Cluster computer systems,V system, Sprite Deepak John,CE-Poonjar
  3. 3. Operating System layer Applications, services Computer & Platform Middleware OS: kernel, libraries & servers network hardware OS1 Computer & network hardware Node 1 Node 2 Processes, threads, communication, ... OS2 Processes, threads, communication, ... Deepak John,CE-Poonjar
  4. 4. Middleware implements abstractions that support network-wide programming. Examples: RPC and RMI (Sun RPC, Corba, Java RMI) event distribution and filtering (Corba Event Notification, Elvin) resource discovery for mobile and ubiquitous computing support for multimedia streaming Deepak John,CE-Poonjar
  5. 5. Functions that OS should provide for middleware 1. Encapsulation provide a set of operations that meet their clients’ needs 2. Protection protect resource from illegitimate access 3. Concurrent processing support clients access resource concurrently 4. Invocation mechanism: a means of accessing an encapsulated resource Communication : Pass operation parameters and results between resource managers Scheduling: Schedule the processing of the invoked operation Deepak John,CE-Poonjar
  6. 6. Core OS functionality Communication manager Thread manager Memory manager Supervisor Process manager Deepak John,CE-Poonjar
  7. 7. Process manager Handles the creation of and operations upon processes. Thread manager Thread creation, synchronization and scheduling Communication manager Communication between threads attached to different processes on the same computer Memory manager Management of physical and virtual memory Supervisor Dispatching of interrupts, system call traps and other exceptions control of memory management unit and hardware caches processor and floating point unit register manipulations Deepak John,CE-Poonjar
  8. 8. Kernel and Protection Kernel always runs complete access privileges for the physical resources Different execution mode supervisor mode (kernel process) / user mode (user process) system call trap: invocation mechanism for resources managed by kernel An address space: a collection of ranges of virtual memory locations, in each of which a specified combination of memory access rights applies, e.g.: read only or read-write The cost for protection switching between different processes take many processor cycles a system call trap is a more expensive operation than a simple method call Deepak John,CE-Poonjar
  9. 9. Processes and Threads Process A program in execution Problem: sharing between related activities are awkward and expensive a process consists of an execution environment together with one or more threads. An execution environment is a collection of local kernel managed resources to which its threads have access. An execution environment primarily consists of: an address space; thread synchronization and communication resources such as semaphores and communication interfaces (for example, sockets); higher-level resources such as open files and windows. Deepak John,CE-Poonjar
  10. 10. Process address space Stack Text Heap Auxiliary regions 0 2 N Address space a unit of management of a process’s virtual memory Up to 232 bytes and sometimes up to 264 bytes. consists of one or more regions. Region an area of continuous virtual memory that is accessible by the threads of the owning process. can be shared 1. kernel code 2. libraries 3. shared data & communication 4. Copy on write Deepak John,CE-Poonjar
  11. 11. Creation of new process in distributed system Creating process by the operation system Fork, in UNIX Process creation in distributed system The choice of a target host Choice of process host running new processes at their originator’s computer sharing processing load between a set of computers Load sharing system Centralized, Decentralized, Hierarchical sender-initiated,receiver-initiated Deepak John,CE-Poonjar
  12. 12. Load sharing policy Transfer policy: situate a new process locally or remotely Location policy: which node should host the new process 1. Static policy without regard to the current state of the system 2.Adaptive policy applies heuristics to make their allocation decision Creation of a new execution environment 1. Initializing the address space Statically defined format With respect to an existing execution environment, e.g. fork() 2. Copy-on-write scheme A technique that is used to reduce amount of data copy. Deepak John,CE-Poonjar
  13. 13. Copy-on-write – a convenient optimization a) Before write b) After write Shared frame A's page table B's page table Process A’s address space Process B’s address space Kernel RA RB RB copied from RA Deepak John,CE-Poonjar
  14. 14. Threads concept and implementation Thread threads were introduced as a lightweight – operating system provided . a process consists of its address-space, and a set of threads attached to that process. The operating system can perform less expensive context switches between threads attached to the same process and threads attached to the same process can access the same memory etc, such that communication/ synchronisation can be much cheaper and less awkward Deepak John,CE-Poonjar
  15. 15. A server application generally consists of: A single thread, the receiver- thread which receives all the requests, places them in a queue and dispatches those requests to be dealt with by the worker- threads The worker-thread which deals with the request may be a thread in the same process or it may be a thread in another process Deepak John,CE-Poonjar
  16. 16. Client and server with threads Worker pool Server creates a fixed pool of “worker” threads to process the requests when it starts up. The module marked ‘receipt and queuing is typically implemented by an ‘I/O’ thread, Pro: simple Cons 1. Inflexibility: worker threads number unequal to current request number 2. High level of switching between the I/O and worker thread For single thread, server can handle 100 client requests per second Deepak John,CE-Poonjar
  17. 17. Alternative server threading architectures Thread-per-request Server spawn a new worker thread for each new request, destroy it when the request processing finish. Pro: throughput is potentially maximized. Con: overhead of the thread creation and destruction Deepak John,CE-Poonjar
  18. 18. Thread-per-connection Server creates a new worker thread when client creates a connection, destroys the thread when the client closes the connection. Pro: lower thread management overheads compared with the thread- per-request. Con: client may be delayed while a worker thread has several outstanding requests but another thread has no work to perform Thread-per-object Associate a thread with each remote object Pro&Con are similar to thread-per-connection Deepak John,CE-Poonjar
  19. 19. Threads versus multiple processes Creating a thread is (much) cheaper than a process (~10-20times) Switching to a different thread in same process is (much) cheaper (5-50 times) Threads within same process can share data and other resources more conveniently and efficiently (without copying or messages) Threads within a process are not protected from each other Threads implementation can be implemented: in the OS kernel (Win NT, Solaris, Mach) at user level e.g. by a thread library, or in the language (Ada, Java). Deepak John,CE-Poonjar
  20. 20. Java thread constructor and management methods • Thread(ThreadGroup group, Runnable target, String name) Creates a new thread in the SUSPENDED state, which will belong to group and be identified as name; the thread will execute the run() method of target. • setPriority(int newPriority), getPriority() Set and return the thread’s priority. • run() A thread executes the run() method of its target object, if it has one, and otherwise its own run() method (Thread implements Runnable). Methods of objects that inherit from class Thread Deepak John,CE-Poonjar
  21. 21. • start() Change the state of the thread from SUSPENDED to RUNNABLE. • sleep(int millisecs) Cause the thread to enter the SUSPENDED state for the specified time. • yield() Enter the READY state and invoke the scheduler. • destroy() Destroy the thread. Deepak John,CE-Poonjar
  22. 22. Java thread synchronization calls • thread.join(int millisecs) Blocks the calling thread for up to the specified time until thread has terminated. • thread.interrupt() Interrupts thread: causes it to return from a blocking method call such as sleep(). • object.wait(long millisecs, int nanosecs) Blocks the calling thread until a call made to notify() or notifyAll() on object wakes the thread, or the thread is interrupted, or the specified time has elapsed. • object.notify(), object.notifyAll() Wakes, respectively, one or all of any threads that have called wait() on object. Deepak John,CE-Poonjar
  23. 23. Operating system architecture Monolithic Kernel Microkernel Server: Dynamically loaded server program:Kernel code and data: ....... ....... Key: S4 S1 ....... S1 S2 S3 S2 S3 S4 Monolithic kernel and microkernel Deepak John,CE-Poonjar
  24. 24. Monolithic vs Microkernel A monolithic kernel provides all of the services via a single image, that is a single program initialized when the computer boots A microkernel instead implements only the absolute minimum: Basic virtual memory, Basic scheduling and Inter-process communication All other services such as device drivers, the file system, networking etc are implemented as user-level server processes that communicate with each other and the kernel via IPC Deepak John,CE-Poonjar
  25. 25. The Microkernel Approach The major advantages of the microkernel approach include: Extensibility - major functionality can be added without modifying the core kernel of the operating system Modularity - the different functions of the operating system can be forced into modularity behind memory protection barriers. A monolithic kernel must use programming language features or code conventions to attempt to ensure this Robustness -relatively small kernel might be likely to contain fewer bugs than a larger program, however, this point is rather contentious Portability - since only a small portion of the operating system, its smaller kernel, relies on the particulars of a given machine it is easier to port to a new machine architecture Deepak John,CE-Poonjar
  26. 26. The role of the microkernel Middleware Language support subsystem Language support subsystem OS emulation subsystem .... Microkernel Hardware The microkernel supports middleware via subsystems Deepak John,CE-Poonjar
  27. 27. The Monolithic Approach The major advantage of the monolithic approach is the relative efficiency with which operations may be invoked. Since services share an address space with the core of the kernel they need not make system calls to access core-kernel functionality Most operating systems in use today are a kind of hybrid solution Linux is a monolithic kernel, but modules may be dynamically loaded and unloaded at run time. Deepak John,CE-Poonjar
  28. 28. Distributed File Systems Deepak John,CE-Poonjar
  29. 29. Purposes of a Distributed File System Sharing of storage and information across a network Convenience (and efficiency) of a conventional file system Persistent storage that most other services (e.g., Web servers) need Files Files are an abstraction of permanent storage. A file is typically defined as a sequence of similar-sized data items along with a set of attributes. A directory is a file that provides a mapping from text names to internal file identifiers. Deepak John,CE-Poonjar
  30. 30. File attributes updated by system: File length Creation timestamp Read timestamp Write timestamp Attribute timestamp Reference count Owner File type Access control list E.g. for UNIX: rw-rw-r-- updated by owner: Deepak John,CE-Poonjar
  31. 31. File Systems Responsible for the (a) organization, (b)storage, (c) retrieval, (d) naming, (e)sharing, and (f) protection of files. Provide a set of programming operations that characterize the file abstraction,particularly operations to read and write subsequences of data items beginning at any point of a file. Directory module: relates file names to file IDs File module: relates file IDs to particular files Access control module: checks permission for operation requested File access module: reads or writes file data or attributes Block module: accesses and allocates disk blocks Device module: disk I/O and buffering File system modules Deepak John,CE-Poonjar
  32. 32. UNIX file system operations Deepak John,CE-Poonjar
  33. 33. Distributed File System Requirements Deepak John,CE-Poonjar
  34. 34. File Service Architecture An architecture that offers a clear separation of the main concerns in providing access to files is obtained by structuring the file service as three components: A flat file service A directory service A client module. The Client module implements exported interfaces by flat file and directory services on server side. Deepak John,CE-Poonjar
  35. 35. Client computer Server computer Application program Application program Client module Flat file service Directory service Lookup AddName UnName GetNames Read Write Create Delete GetAttributes SetAttributes Deepak John,CE-Poonjar
  36. 36. Flat file service: Concerned with the implementation of operations on the contents of file. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. Directory Service: Provides mapping between text names for the files and their UFIDs. Clients may obtain the UFID of a file by quoting its text name to directory service. Directory service supports functions needed generate directories, to add new files to directories. Client Module: It runs on each computer and provides integrated service (flat file and directory) as a single API to application programs. It holds information about the network locations of flat-file and directory server processes; and achieve better performance through implementation of a cache of recently used file blocks at the client. Deepak John,CE-Poonjar
  37. 37. Flat file service operations Read(FileId, i, n) -> Data — throws BadPosition If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items from a file starting at item i and returns it in Data. Write(FileId, i, Data) — throws BadPosition If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a file, starting at item i, extending the file if necessary. Create() -> FileId Creates a new file of length 0 and delivers a UFID for it. Delete(FileId) Removes the file from the file store. GetAttributes(FileId) -> Attr Returns the file attributes for the file. SetAttributes(FileId, Attr) Sets the file attributes Deepak John,CE-Poonjar
  38. 38. Access control UNIX checks access rights when a file is opened ,subsequent checks during read/write are not necessary distributed environment a. server has to check. b. stateless approaches 1. access check once when UFID is issued • client gets an encoded "capability" (who can access and how) • capability is submitted with each subsequent request 2. access check for each request. Deepak John,CE-Poonjar
  39. 39. File Group A collection of files that can be located on any server or moved between servers while maintaining the same names. Similar to a UNIX filesystem Helps with distributing the load of file serving between several servers. File groups have identifiers which are unique throughout the system (and hence for an open system, they must be globally unique). Used to refer to file groups and files To construct a globally unique ID we use some unique attribute of the machine on which it is created, e.g. IP number, even though the file group may move subsequently. IP address date 32 bits 16 bits File Group ID: Deepak John,CE-Poonjar
  40. 40. Case Study: Sun NFS An industry standard for file sharing on local networks since the 1980s An open standard with clear and simple interfaces. OS independent unix implementation rpc udp or tcp Supports many of the design requirements already mentioned: Transparency,heterogeneity,efficiency,fault tolerance Limited achievement of: Concurrency,replication,consistency,security Deepak John,CE-Poonjar
  41. 41. NFS architecture Client computer Server computer UNIX file system NFS client NFS server UNIX file system Application program Application program Virtual file systemVirtual file system Other filesystem UNIX kernel system calls NFS protocol (remote operations) UNIX Operations on local files Operations on remote files Application program NFS Client Kernel Application program NFS Client Client computer Deepak John,CE-Poonjar
  42. 42. Virtual file system access transparency part of unix kernel NFS file handle, 3 components: filesystem identifier different groups of files i-node (index node) structure for finding the file i-node generation number i-nodes are reused incremented when reused VFS struct for each file system v-node for each open file file handle for remote file i-node number for local file Deepak John,CE-Poonjar
  43. 43. NFS access control and authentication Stateless server, so the user's identity and access rights must be checked by the server on each request. In the local file system they are checked only on open() Every client request is accompanied by the userID and groupID Server is exposed to imposter attacks unless the userID and groupID are protected by encryption Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution Deepak John,CE-Poonjar
  44. 44. • read(fh, offset, count) -> attr, data • write(fh, offset, count, data) -> attr • create(dirfh, name, attr) -> newfh, attr • remove(dirfh, name) status • getattr(fh) -> attr • setattr(fh, attr) -> attr • lookup(dirfh, name) -> fh, attr • rename(dirfh, name, todirfh, toname) • link(newdirfh, newname, dirfh, name) • readdir(dirfh, cookie, count) -> entries NFS server operations (simplified) fh = file handle: Filesystem identifier i-node number i-node generation Read(FileId, i, n) -> Data Write(FileId, i, Data) Create() -> FileId Delete(FileId) GetAttributes(FileId) -> Attr SetAttributes(FileId, Attr) Lookup(Dir, Name) -> FileId AddName(Dir, Name, File) UnName(Dir, Name) GetNames(Dir, Pattern) ->NameSeq Deepak John,CE-Poonjar
  45. 45. Mount service the process of including a new file system is called mounting. communicates with the mount process on the server in a mount protocol. Server maintains a table of clients who have mounted filesystems at that server Each client maintains a table of mounted file systems holding. hard-mounted user process is suspended until request is successful when server is not responding request is retried until it's satisfied soft-mounted if server fails, client returns failure after a small number of retries user process handles the failure Deepak John,CE-Poonjar
  46. 46. THE MOUNT PROTOCOL: The following operations occur: 1. The client's request is sent via RPC to the mount server ( on server machine.) 2. Mount server checks export list containing a) file systems that can be exported, b) legal requesting clients. c) It's legitimate to mount any directory within the legal filesystem. 3. Server returns "file handle" to client. 4. Server maintains list of clients and mounted directories -- this is state information! But this data is only a "hint" and isn't treated as essential. 5. Mounting often occurs automatically when client or server boots. Deepak John,CE-Poonjar
  47. 47. Local and remote file systems jim jane joeann usersstudents usrvmunix Client Server 2 . . . nfs Remote mount staff big bobjon people Server 1 export (root) Remote mount . . . x (root) (root) Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2. Deepak John,CE-Poonjar
  48. 48. Pathname translation pathname: /users/students/dc/abc server doesn't receive the entire pathname for translation, client breaks down the pathnames into parts iteratively translate each part translation is cached Deepak John,CE-Poonjar
  49. 49. Automounter NFS client catches attempts to access 'empty' mount points and routes them to the Automounter Automounter has a table of mount points and multiple candidate serves for each it sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond Keeps the mount table small Deepak John,CE-Poonjar
  50. 50. Server caching caching file pages, directory/file attributes read-ahead: prefetch pages following the most-recently read file pages delayed-write: write to disk when the page in memory is needed for other purposes "sync" flushes "dirty" pages to disk every 30 seconds two write option 1. write-through: write to disk before replying to the client 2. cache and commit: stored in memory cache write to disk before replying to a "commit" request from the client Deepak John,CE-Poonjar
  51. 51. Client caching caches results of read, write, getattr, lookup, readdir clients responsibility to poll the server for consistency timestamp-based methods for consistency validation Tc: time when the cache entry was last validated Tm: time when the block was last modified at the server cache entry is valid if: 1. T - Tc < t, where t is the freshness interval t is adaptively adjusted: files: 3 to 30 seconds depending on freq of updates directories: 30 to 60 seconds 2. Tmclient = Tmserver need validation for all cache accesses Reading Deepak John,CE-Poonjar
  52. 52. writing dirty: modified page in cache flush to disk: file is closed or sync from client bio-daemon (block input-output) read-ahead: after each read request, request the next file block from the server as well delayed write: after a block is filled, it's sent to the server reduce the time to wait for read/write Deepak John,CE-Poonjar
  53. 53. Summary for NFS access transparency: same system calls for local or remote files location transparency: could have a single name space for all files (depending on all the clients to agree the same name space) mobility transparency: mount table need to be updated on each client (not transparent) scalability: can usually support large loads, add processors, disks, servers... file replication: read-only replication, no support for replication of files with updates hardware and OS: many ports fault tolerance: stateless and idempotent consistency: not quite one-copy for efficiency security: added encryption--Kerberos efficiency: pretty efficient, wide-spread use Deepak John,CE-Poonjar
  54. 54. Naming Services Definition Key benefits Resource localization Uniform naming Device independent address (e.g., you can move domain name/web site from one server to another server seamlessly). In a Distributed System, a Naming Service is a specific service whose aim is to provide a consistent and uniform naming of resources, this allowing other programs or services to localize them and obtain the required metadata for interacting with them. Deepak John,CE-Poonjar
  55. 55. Requirements for name spaces Allow simple but meaningful names to be used Potentially infinite number of names Structured to allow similar subnames without clashes to group related names Management of trust Deepak John,CE-Poonjar
  56. 56. Iterative navigation Client 1 2 3 A client iteratively contacts name servers NS1–NS3 in order to resolve a name NS2 NS1 NS3 Name servers Iterative Navigation is the act of chaining multiple Naming Services in order to resolve a single name to the corresponding resource. Iterative Navigation is the act of chaining multiple Naming Services in order to resolve a single name to the corresponding resource. Deepak John,CE-Poonjar
  57. 57. Recursive Navigation The Iterative Navigation can be… Recursive: it is performed by the naming server the server becomes like a client for the next server Non recursive: it is performed by the client or the first server if it is performed by the server it is “server controlled” the server bounces back the next hop to its client Deepak John,CE-Poonjar
  58. 58. Non-recursive and recursive server-controlled navigation A name server NS1 communicates with other name servers on behalf of a client Recursive server-controlled 1 2 3 5 4 client NS2 NS1 NS3 1 2 34 client NS2 NS1 NS3 Non-recursive server-controlled DNS offers recursive navigation as an option, but iterative is the standard technique. Recursive navigation must be used in domains that limit client access to their DNS information for security reasons. Deepak John,CE-Poonjar
  59. 59. The SNS - a Simple Name Service model Stores attributes of named objects such as users, computers and services and group names. Users Computers Services Email server, login info, encoded passwords, home directory Network addresses, architecture, OS, owner Named object Value Service address, version no. Group Mailing lists, group1, group2,... Deepak John,CE-Poonjar
  60. 60. SNS basic design requirements Specify the Types of named objects: users, services, computers and group names and directories. Other types of objects may be integrated; The names are used only within the organization; Efficient name lookup; Access control: everyone can read but Authorized write; Deepak John,CE-Poonjar
  61. 61. DNS - The Internet Domain Name System A distributed naming database (specified in RFC 1034/1305) Name structure reflects administrative structure of the Internet Rapidly resolves domain names to IP addresses exploits caching heavily typical query time ~100 milliseconds Scales to millions of computers partitioned database caching replication Deepak John,CE-Poonjar
  62. 62. Basic DNS algorithm for name resolution (domain name -> IP number) • Look for the name in the local cache • Try a superior DNS server, which responds with: – another recommended DNS server – the IP address (which may not be entirely up to date) Deepak John,CE-Poonjar
  63. 63. DNS server functions and configuration Main function is to resolve domain names for computers, i.e. to get their IP addresses. Other functions: reverse resolution - get domain name from IP address Host information - type of hardware and OS Well-known services - a list of well-known services offered by a host Other attributes can be included (optional) Deepak John,CE-Poonjar
  64. 64. DNS issues Name tables change infrequently, but when they do, caching can result in the delivery of stale data. Clients are responsible for detecting this and recovering Its design makes changes to the structure of the name space difficult. For example: merging previously separate domain trees under a new root moving subtrees to a different part of the structure (e.g. if Scotland became a separate country, its domains should all be moved to a new country-level domain.) . Deepak John,CE-Poonjar

×