Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Nov 24,2000 r.innocente 1
Automatic load balancing and
transparent process migration
Roberto InnocenteRoberto Innocente
ri...
Nov 24,2000 r.innocente 2
Introduction
Purpose of this presentation is an introductionPurpose of this presentation is an i...
Nov 24,2000 r.innocente 3
Scheme of the presentation
1.1. Overview of some Distributed OperatingOverview of some Distribut...
Nov 24,2000 r.innocente 4
Overview of some
distributed OS
Nov 24,2000 r.innocente 5
Distributed O.S.
!! SpriteSprite
!! CharlotteCharlotte
!! AccentAccent
!! AmoebaAmoeba
!! V Syst...
Nov 24,2000 r.innocente 6
Sprite (Douglis,Ousterhout* 1987)
Experimental Network OS developed at UCB.Experimental Network ...
Nov 24,2000 r.innocente 7
Charlotte (Artsy, Finkel 1989)
It’s a Distributed Operating System developedIt’s a Distributed O...
Nov 24,2000 r.innocente 8
Accent (Zayas 1987)
"" developed at CMU as precursor of Machdeveloped at CMU as precursor of Mac...
Nov 24,2000 r.innocente 9
Amoeba (Tanenbaum et al 1990)
"" Tanenbaum,van Renesse,van Staveren, Sharp, Mullender, Jansen, v...
Nov 24,2000 r.innocente 10
V system (Theimer 1987)
"" developed at Stanford, microkernel based,developed at Stanford, micr...
Nov 24,2000 r.innocente 11
MOSIX (Barak 1990,1995)
its roots are back in the mid ’80s :its roots are back in the mid ’80s ...
Nov 24,2000 r.innocente 12
Condor (Litzkow,Livny 1988)
Condor is not a Distributed OS.Condor is not a Distributed OS.
It i...
Nov 24,2000 r.innocente 13
Load balancing
mechanisms
Nov 24,2000 r.innocente 14
Load balancing methods
taxonomy
"" job initiation only (aka job placement, aka remotejob initia...
Nov 24,2000 r.innocente 15
Load balancing algorithms
Two main categories:Two main categories:
"" Non preNon pre--emptive:e...
Nov 24,2000 r.innocente 16
Non pre-emptive load balancing
(aka job placement)
"" centralized initiation:centralized initia...
Nov 24,2000 r.innocente 17
Pre-emptive load balancing
"" randomrandom:: when a computer becomes overloaded it select at ra...
Nov 24,2000 r.innocente 18
Processor load measures
The metric used to evaluate the load is extremely important in a load
b...
Nov 24,2000 r.innocente 19
Process migration
mechanisms
Nov 24,2000 r.innocente 20
User/kernel level migration
"" Kernel level process migration (Mosix):Kernel level process migr...
Nov 24,2000 r.innocente 21
Process migration factors:
"" transparencytransparency:how much a process needs to be:how much ...
Nov 24,2000 r.innocente 22
Transparent:
"" Location transparentLocation transparent
"" Access transparent (or network)Acce...
Nov 24,2000 r.innocente 23
System call interposing
In many cases transparency is obtained interposing some codeIn many cas...
Nov 24,2000 r.innocente 24
Process migration parts:
"" execution state migrationexecution state migration
"" memory space ...
Nov 24,2000 r.innocente 25
Execution state migration
It’s not that difficult. For the mostIt’s not that difficult. For the...
Nov 24,2000 r.innocente 26
Communication state
migration
It is a difficult task and usually it is avoided forcing the
migr...
Nov 24,2000 r.innocente 27
Migratable sockets
Unfortunately TCP/IP has no direct support for this.Unfortunately TCP/IP has...
Nov 24,2000 r.innocente 28
Memory space migration
"" total copytotal copy(Charlotte,LOCUS): process is frozen and the enti...
Nov 24,2000 r.innocente 29
focus on Mosix
Nov 24,2000 r.innocente 30
MOSIX important points
"" Probabilistic info disseminationProbabilistic info dissemination
"" P...
Nov 24,2000 r.innocente 31
Probabilistic info dissemination
"" each node at regular intervals sends info oneach node at re...
Nov 24,2000 r.innocente 32
MOSIX load balancing
algorithm
Mosix uses a preMosix uses a pre--emptive and distributed loadem...
Nov 24,2000 r.innocente 33
Mosix info variables
These are the variables exchanged between Mosix nodes toThese are the vari...
Nov 24,2000 r.innocente 34
Memory ushering algorithm
Normally only the load balancing algorithm is active.Normally only th...
Nov 24,2000 r.innocente 35
MOSIX network protocols
These protocols are used to exchange load info amongThese protocols are...
Nov 24,2000 r.innocente 36
MOSIX process migration
"" It’s aIt’s a kernel level migrationkernel level migration (it’s done...
Nov 24,2000 r.innocente 37
Mosix migration transparency
"" Mosix achieves complete transparency at the priceMosix achieves...
Nov 24,2000 r.innocente 38
MOSIX deputy/remote
mechanism
Deputy
Link layer
Remote
Link layer
MNP
(Mosix
Network
Protocol)
...
Nov 24,2000 r.innocente 39
Mosix performance/1
" While a CPU bound process obtains clear advantages from Mosix
being very ...
Nov 24,2000 r.innocente 40
Mosix performance/2
A more general solution to the I/O bound processesA more general solution t...
Nov 24,2000 r.innocente 41
MOSIX dfsa (direct file system
access)
Deputy
Link layer
Remote
Link layer
MNP
(Mosix
Network
P...
Nov 24,2000 r.innocente 42
Cache coherent global
distributed file systems
To be a MOSIX DFSA it’s not sufficient to be a c...
Upcoming SlideShare
Loading in …5
×

Mosix : automatic load balancing and migration

511 views

Published on

Automatic Load Balancing and Transparent Process Migration

Published in: Software
  • Be the first to comment

Mosix : automatic load balancing and migration

  1. 1. Nov 24,2000 r.innocente 1 Automatic load balancing and transparent process migration Roberto InnocenteRoberto Innocente rinnocente@hotmail.comrinnocente@hotmail.com November 24,2000November 24,2000 Download postscript from :Download postscript from : mosix.psmosix.ps or gzipped postscript from:or gzipped postscript from: mosix.ps.gzmosix.ps.gz or pdf from:or pdf from: mosix.pdfmosix.pdf
  2. 2. Nov 24,2000 r.innocente 2 Introduction Purpose of this presentation is an introductionPurpose of this presentation is an introduction to the most important features of MOSIX:to the most important features of MOSIX: -- automatic load balancingautomatic load balancing -- transparent process migrationtransparent process migration giving an overview of related projects and of thegiving an overview of related projects and of the different possible implementations of thesedifferent possible implementations of these mechanismsmechanisms..
  3. 3. Nov 24,2000 r.innocente 3 Scheme of the presentation 1.1. Overview of some Distributed OperatingOverview of some Distributed Operating SystemsSystems 2.2. Load balancing mechanismsLoad balancing mechanisms 3.3. Process migration mechanismsProcess migration mechanisms 4.4. focus on Mosixfocus on Mosix
  4. 4. Nov 24,2000 r.innocente 4 Overview of some distributed OS
  5. 5. Nov 24,2000 r.innocente 5 Distributed O.S. !! SpriteSprite !! CharlotteCharlotte !! AccentAccent !! AmoebaAmoeba !! V SystemV System !! MOSIXMOSIX !! Condor (this is just a batch system, but offeringCondor (this is just a batch system, but offering job migration)job migration)
  6. 6. Nov 24,2000 r.innocente 6 Sprite (Douglis,Ousterhout* 1987) Experimental Network OS developed at UCB.Experimental Network OS developed at UCB. Was running on Suns and Decstations.Was running on Suns and Decstations. The process migration in this system introduced theThe process migration in this system introduced the simplifying concept of Home Node of a process (thesimplifying concept of Home Node of a process (the originating node where it should still appear to run).originating node where it should still appear to run). After then this concept has been utilized by many otherAfter then this concept has been utilized by many other implementations.implementations. *of Tcl/Tk fame*of Tcl/Tk fame
  7. 7. Nov 24,2000 r.innocente 7 Charlotte (Artsy, Finkel 1989) It’s a Distributed Operating System developedIt’s a Distributed Operating System developed at the UoWM based on the message passingat the UoWM based on the message passing paradigm.paradigm. Provides a wellProvides a well--developed process migrationdeveloped process migration mechanism that leaves nomechanism that leaves no residual dependenciesresidual dependencies (the migrated process can survive to the death of the(the migrated process can survive to the death of the originating node), and therefore is suitable also fororiginating node), and therefore is suitable also for fault tolerant systems.fault tolerant systems.
  8. 8. Nov 24,2000 r.innocente 8 Accent (Zayas 1987) "" developed at CMU as precursor of Machdeveloped at CMU as precursor of Mach "" a kernel not Unix compatible with IPCsa kernel not Unix compatible with IPCs based on a “port” abstractionbased on a “port” abstraction "" process migration has been added byprocess migration has been added by Zayas(1987)Zayas(1987) "" no load balancingno load balancing
  9. 9. Nov 24,2000 r.innocente 9 Amoeba (Tanenbaum et al 1990) "" Tanenbaum,van Renesse,van Staveren, Sharp, Mullender, Jansen, vaTanenbaum,van Renesse,van Staveren, Sharp, Mullender, Jansen, vann Rossum*Rossum*(Experiences with the Amoeba Distributed Operating System(Experiences with the Amoeba Distributed Operating System 1990)1990) "" Zhu,SteketeeZhu,Steketee(An experimental study of Load Balancing on Amoeba,(An experimental study of Load Balancing on Amoeba, 1994)1994) "" doesn’t support virtual memorydoesn’t support virtual memory "" was based on thewas based on the processorprocessor--pool modelpool model, that assumes the # of, that assumes the # of processors is more than the # of processesprocessors is more than the # of processes * of python fame* of python fame
  10. 10. Nov 24,2000 r.innocente 10 V system (Theimer 1987) "" developed at Stanford, microkernel based,developed at Stanford, microkernel based, runs on a set of diskless wksruns on a set of diskless wks "" V has support for both remote executionV has support for both remote execution and process migration, the facilities wereand process migration, the facilities were added having in mind to utilize wks duringadded having in mind to utilize wks during idle periodsidle periods
  11. 11. Nov 24,2000 r.innocente 11 MOSIX (Barak 1990,1995) its roots are back in the mid ’80s :its roots are back in the mid ’80s : !! MOS multicomputer OS (Barak, HUJI 1985)MOS multicomputer OS (Barak, HUJI 1985) !! set of enhancements to BSD/OS: MOSIX ’90set of enhancements to BSD/OS: MOSIX ’90 !! port to LINUX (Mosix 7port to LINUX (Mosix 7thth edition) around ’95:edition) around ’95: the system call based mgmt has beenthe system call based mgmt has been transformed to a /proc filesystem interfacetransformed to a /proc filesystem interface !! enhancements ’98 (memory usheringenhancements ’98 (memory ushering algorithm,...)algorithm,...)
  12. 12. Nov 24,2000 r.innocente 12 Condor (Litzkow,Livny 1988) Condor is not a Distributed OS.Condor is not a Distributed OS. It is a batch system whose purpose is the utilizationIt is a batch system whose purpose is the utilization of the idle time of WKS and PCs.of the idle time of WKS and PCs. It can migrate jobs from machine to machine,It can migrate jobs from machine to machine, but imposes restrictions on the jobs that can bebut imposes restrictions on the jobs that can be migrated and the programs have to be linkedmigrated and the programs have to be linked with a special checkpoint library.with a special checkpoint library. (More details on(More details on http://hpc.sissa.it/batchhttp://hpc.sissa.it/batch))
  13. 13. Nov 24,2000 r.innocente 13 Load balancing mechanisms
  14. 14. Nov 24,2000 r.innocente 14 Load balancing methods taxonomy "" job initiation only (aka job placement, aka remotejob initiation only (aka job placement, aka remote execution)execution) !! staticstatic !! dynamic (almost all batch systems can do this:dynamic (almost all batch systems can do this: NQS and derivatives)NQS and derivatives) "" migration (Mosix, some Distributed OS)migration (Mosix, some Distributed OS) "" job initiation + migrationjob initiation + migration
  15. 15. Nov 24,2000 r.innocente 15 Load balancing algorithms Two main categories:Two main categories: "" Non preNon pre--emptive:emptive: they are used only when a newthey are used only when a new job has to be started, they decide where to run ajob has to be started, they decide where to run a new job (aka job placement, many batch systems)new job (aka job placement, many batch systems) "" PrePre--emptive:emptive: jobs can be migrated duringjobs can be migrated during execution, therefore load balancing can beexecution, therefore load balancing can be dynamically applied duringdynamically applied during job executionjob execution (Distributed OS,Condor)(Distributed OS,Condor)
  16. 16. Nov 24,2000 r.innocente 16 Non pre-emptive load balancing (aka job placement) "" centralized initiation:centralized initiation: there is only 1 load balancer in thethere is only 1 load balancer in the system that is in charge of assigning new jobs to computerssystem that is in charge of assigning new jobs to computers according to the info it collectsaccording to the info it collects "" distributed initiationdistributed initiation: there is a load balancer on each: there is a load balancer on each computer that xfers its load to the others anytime it detectscomputer that xfers its load to the others anytime it detects a load change, the local balancer decides according to thea load change, the local balancer decides according to the load info received where to start new jobsload info received where to start new jobs "" random initiationrandom initiation: when a new job arrives the local load: when a new job arrives the local load balancer selects randomly to which computer to assignbalancer selects randomly to which computer to assign new jobsnew jobs
  17. 17. Nov 24,2000 r.innocente 17 Pre-emptive load balancing "" randomrandom:: when a computer becomes overloaded it select at randomwhen a computer becomes overloaded it select at random another computer to which to transfer a processanother computer to which to transfer a process "" central:central: one load balancer regularly polls the other machine to obtainone load balancer regularly polls the other machine to obtain load info. If it detects a load imbalance it select a computer wload info. If it detects a load imbalance it select a computer with a lowith a low load to migrate a process from an overloaded computerload to migrate a process from an overloaded computer "" neighbourneighbour: (Bryant,Finkel: (Bryant,Finkel 1981) each computer sends a request to its1981) each computer sends a request to its neighbours, when the request is accepted, a pair is constituted.neighbours, when the request is accepted, a pair is constituted.If theyIf they detect a load imbalance between them they migrate some processesdetect a load imbalance between them they migrate some processes from one to another after that the pair is brokenfrom one to another after that the pair is broken "" broadcastbroadcast:: server basedserver based "" distributeddistributed: each node receives a measure of the load of other nodes, if it discovers a load imbalance it would try to migrate some of its processes to a less loaded node
  18. 18. Nov 24,2000 r.innocente 18 Processor load measures The metric used to evaluate the load is extremely important in a load balancing system. More than 20 different metrics are cited in only 1 paper(Mehra & Wah 1993): ! std Unix : # of tasks in the run queue ! size of the free available memory ! rate of CPU context switches ! rate of system calls ! 1 minute load avg ! amount of CPU free time but the conclusion of many studies indicates that the simple rqueue is a better measure than complex load criteria in most situations.
  19. 19. Nov 24,2000 r.innocente 19 Process migration mechanisms
  20. 20. Nov 24,2000 r.innocente 20 User/kernel level migration "" Kernel level process migration (Mosix):Kernel level process migration (Mosix): !! no special requirements, no need to recompileno special requirements, no need to recompile !! transparent migrationtransparent migration "" User level process migration (Condor,...):User level process migration (Condor,...): !! compilation with special libraries (libckpt.a)compilation with special libraries (libckpt.a) !! a snapshot of the running process is taken on signala snapshot of the running process is taken on signal !! the snapshot is restarted on another machinethe snapshot is restarted on another machine !! more details atmore details at http://hpc.sissa.it/batchhttp://hpc.sissa.it/batch
  21. 21. Nov 24,2000 r.innocente 21 Process migration factors: "" transparencytransparency:how much a process needs to be:how much a process needs to be aware of the migrationaware of the migration "" residual dependencyresidual dependency: how much a process is: how much a process is dependent on the source node after migrationdependent on the source node after migration "" performanceperformance:how much the migration affects:how much the migration affects performanceperformance "" complexitycomplexity: how much complex the migration is: how much complex the migration is
  22. 22. Nov 24,2000 r.innocente 22 Transparent: "" Location transparentLocation transparent "" Access transparent (or network)Access transparent (or network) "" Migration transparentMigration transparent "" Failure transparentFailure transparent
  23. 23. Nov 24,2000 r.innocente 23 System call interposing In many cases transparency is obtained interposing some codeIn many cases transparency is obtained interposing some code at the system call level. There are different ways to do it:at the system call level. There are different ways to do it: "" Very easy on message based OSs in which a system call isVery easy on message based OSs in which a system call is just a message to the kerneljust a message to the kernel "" A modified kernel: in this case it can be completelyA modified kernel: in this case it can be completely transparent to the process (MOSIX), reasonably efficienttransparent to the process (MOSIX), reasonably efficient for some kind of processesfor some kind of processes "" A modified system library: in this case the programs needA modified system library: in this case the programs need to be linked with a special library(Condor), reasonablyto be linked with a special library(Condor), reasonably transparenttransparent
  24. 24. Nov 24,2000 r.innocente 24 Process migration parts: "" execution state migrationexecution state migration "" memory space migrationmemory space migration "" communication state migrationcommunication state migration
  25. 25. Nov 24,2000 r.innocente 25 Execution state migration It’s not that difficult. For the mostIt’s not that difficult. For the most part the code can be the same as that used forpart the code can be the same as that used for swapping. In that case the destination is a diskswapping. In that case the destination is a disk while for migration the destination is awhile for migration the destination is a different node.different node.
  26. 26. Nov 24,2000 r.innocente 26 Communication state migration It is a difficult task and usually it is avoided forcing the migrated process to redirect networking operations to the Home Node where they are performed. On Amoeba each thread of a process has its own communication staOn Amoeba each thread of a process has its own communication statete which remembers the stage of the ongoing RPCs, these states arewhich remembers the stage of the ongoing RPCs, these states are kept inkept in the kernelthe kernel !! during migration any message received for a suspended process isduring migration any message received for a suspended process is rejected and a “process migrating” response is returnedrejected and a “process migrating” response is returned !! sending machine will rexmit after some delaysending machine will rexmit after some delay !! sending machine using a special protocol(FLIP) will locate thesending machine using a special protocol(FLIP) will locate the migrated processmigrated process !! the same technique is used for reply msgs to a migrating clientthe same technique is used for reply msgs to a migrating client
  27. 27. Nov 24,2000 r.innocente 27 Migratable sockets Unfortunately TCP/IP has no direct support for this.Unfortunately TCP/IP has no direct support for this. Research is ongoing on different solutions.Research is ongoing on different solutions. A proposed userA proposed user--level implementation requires:level implementation requires: "" substitution of the socket librarysubstitution of the socket library "" relinking of programs with the new libraryrelinking of programs with the new library (Bubak, Zbik et al. 1999)(Bubak, Zbik et al. 1999) In this proposal communication is done between virtualIn this proposal communication is done between virtual addresses (there are servers that keep a correspondence table beaddresses (there are servers that keep a correspondence table betweentween virtual and real addresses). When an endpoint migrates, a stub rvirtual and real addresses). When an endpoint migrates, a stub remains onemains on the original node to respond with a “process migrating” msg. Thethe original node to respond with a “process migrating” msg. The otherother endpoint then should consult the server for the new real addressendpoint then should consult the server for the new real address to contact.to contact.
  28. 28. Nov 24,2000 r.innocente 28 Memory space migration "" total copytotal copy(Charlotte,LOCUS): process is frozen and the entire memory space(Charlotte,LOCUS): process is frozen and the entire memory space is moved, the process is then restarted on remote (this can reqis moved, the process is then restarted on remote (this can require manyuire many seconds and further it can transfer memory that will not be usedseconds and further it can transfer memory that will not be used)) "" prepre--copyingcopying (V System:Theimer): allows the process to continue the(V System:Theimer): allows the process to continue the execution, while its memory space is being transferred to the taexecution, while its memory space is being transferred to the target.Then Vrget.Then V freezes the process on the source and refreezes the process on the source and re--copies the pages modified during thecopies the pages modified during the transfer timetransfer time "" lazylazy--copying or demand pagecopying or demand page(Accent:Zayas): pages are left on the source(Accent:Zayas): pages are left on the source machine and are transferred only at page faults. This mechanismmachine and are transferred only at page faults. This mechanism can leavecan leave dirty pages on all the nodes the process goes through. A small vdirty pages on all the nodes the process goes through. A small variation usedariation used by MOSIX is to freeze the process during the transfer of all theby MOSIX is to freeze the process during the transfer of all the dirty pages anddirty pages and then to transfer from the backing store the other pages when refthen to transfer from the backing store the other pages when referrederred "" shared file servershared file server(Sprite): Sprite uses the network file system as a backing(Sprite): Sprite uses the network file system as a backing store for virtual memory. Dirty pages are flushed on the networstore for virtual memory. Dirty pages are flushed on the network file systemk file system and on the target the process is restarted (it’s just aswapand on the target the process is restarted (it’s just aswap--out,swapout,swap--in)in)
  29. 29. Nov 24,2000 r.innocente 29 focus on Mosix
  30. 30. Nov 24,2000 r.innocente 30 MOSIX important points "" Probabilistic info disseminationProbabilistic info dissemination "" PPM (PrePPM (Pre--emptiveProcessMigration)emptiveProcessMigration) "" Dynamic load balancingDynamic load balancing "" Memory usheringMemory ushering "" Decentralized controlDecentralized control "" Efficient kernel communicationsEfficient kernel communications
  31. 31. Nov 24,2000 r.innocente 31 Probabilistic info dissemination "" each node at regular intervals sends info oneach node at regular intervals sends info on its resources at a random subset of nodesits resources at a random subset of nodes "" each node at the same time keeps a smalleach node at the same time keeps a small buffer with the most recently info receivedbuffer with the most recently info received
  32. 32. Nov 24,2000 r.innocente 32 MOSIX load balancing algorithm Mosix uses a preMosix uses a pre--emptive and distributed loademptive and distributed load balancing algorithm that is essentially basedbalancing algorithm that is essentially based on a Unix load avg metric. This means thaton a Unix load avg metric. This means that each node will try to find a less loaded node toeach node will try to find a less loaded node to which it would try to migrate one of itswhich it would try to migrate one of its processes.processes. When a node is low on memory a different algorithmWhen a node is low on memory a different algorithm prevails: memory usheringprevails: memory ushering
  33. 33. Nov 24,2000 r.innocente 33 Mosix info variables These are the variables exchanged between Mosix nodes toThese are the variables exchanged between Mosix nodes to feed the distributed load balancing algorithm:feed the distributed load balancing algorithm: "" Load (based on a unit of a P IILoad (based on a unit of a P II--400Mhz)400Mhz) "" Number of processorsNumber of processors "" Speed of processorSpeed of processor "" Memory usedMemory used "" Raw Memory UsedRaw Memory Used "" Total MemoryTotal Memory Look atLook at http://hpc.sissa.it/walrus/mjmhttp://hpc.sissa.it/walrus/mjm for a java appletfor a java applet showing these values over a Mosix cluster.showing these values over a Mosix cluster.
  34. 34. Nov 24,2000 r.innocente 34 Memory ushering algorithm Normally only the load balancing algorithm is active.Normally only the load balancing algorithm is active. When memory is under a threshold (paging), the memoryWhen memory is under a threshold (paging), the memory ushering algorithm is started and prevails the load balancingushering algorithm is started and prevails the load balancing algorithm.algorithm. This algorithm would try to migrate the smallest runningThis algorithm would try to migrate the smallest running process to the node with more free memory.process to the node with more free memory. Eventually processes are migrated between remote nodesEventually processes are migrated between remote nodes to create sufficient space on a single nodeto create sufficient space on a single node..
  35. 35. Nov 24,2000 r.innocente 35 MOSIX network protocols These protocols are used to exchange load info amongThese protocols are used to exchange load info among the nodes and to negotiate and carry out process migrationsthe nodes and to negotiate and carry out process migrations between nodes. Mosix uses:between nodes. Mosix uses: !! UDP to txfer load infoUDP to txfer load info !! TCP to migrate processesTCP to migrate processes These communications are not encrypted/ authenticated/These communications are not encrypted/ authenticated/ protected therefore it’s better to use a private network for thprotected therefore it’s better to use a private network for thee MNP.MNP. (Unfortunately the sockets used by Mosix are not reported by(Unfortunately the sockets used by Mosix are not reported by standard tools: e.g. netstat)standard tools: e.g. netstat)
  36. 36. Nov 24,2000 r.innocente 36 MOSIX process migration "" It’s aIt’s a kernel level migrationkernel level migration (it’s done without(it’s done without the necessity of any modification to the programs,the necessity of any modification to the programs, because Mosix changes the linux kernel)because Mosix changes the linux kernel) "" fully transparentfully transparent : the process involved does’nt have any need to know about the migration " strong residual dependency:the migrated process is completely dependent on the deputy for system calls( and particularly I/O and networking)
  37. 37. Nov 24,2000 r.innocente 37 Mosix migration transparency "" Mosix achieves complete transparency at the priceMosix achieves complete transparency at the price of some efficiency interposing to system calls aof some efficiency interposing to system calls a link layer that redirects their execution to thelink layer that redirects their execution to the originating node of the process(originating node of the process(Unique HomeUnique Home NodeNode) .) . "" The skeleton of the process that remains on theThe skeleton of the process that remains on the originating node to execute the system calls,originating node to execute the system calls, receive signals, perform I/O is called in Mosix thereceive signals, perform I/O is called in Mosix the deputydeputy (what Condor calls shadow)(what Condor calls shadow) "" The migrated process is called theThe migrated process is called the remoteremote..
  38. 38. Nov 24,2000 r.innocente 38 MOSIX deputy/remote mechanism Deputy Link layer Remote Link layer MNP (Mosix Network Protocol) UHN (Unique Home Node) Running node
  39. 39. Nov 24,2000 r.innocente 39 Mosix performance/1 " While a CPU bound process obtains clear advantages from Mosix being very little penalized by process migration, I/O bound and network bound processes suffer big drawbacks and are usually not good candidates for migration. " A first solution addressing only part of the problems has been the development of a daemon to support remote initiation (MExec/MPmake) of processes. With the help of this daemon processes that can run in parallel are dispatched during the exec call (you need to change the source and recode the exec as mexec) thus avoiding the creation of a bottle-neck single Home Node where all the I/O would be performed. Gmake is a tool already prepared to be recompiled to exploit its inherent parallelism (follow the indication in the MExec tarball).
  40. 40. Nov 24,2000 r.innocente 40 Mosix performance/2 A more general solution to the I/O bound processesA more general solution to the I/O bound processes performance problem under migration is anyway envisionedperformance problem under migration is anyway envisioned to be the use of ato be the use of a cache coherent global file systemcache coherent global file system: "" cache coherentcache coherent: clients keep their caches synchronized, or only 1clients keep their caches synchronized, or only 1 cache at the server node is keptcache at the server node is kept " global: every file can be seen from every node with the same name: every file can be seen from every node with the same name If such a file system can be used then I/O operations can be exeIf such a file system can be used then I/O operations can be executedcuted directly from the running node without resorting to redirect thedirectly from the running node without resorting to redirect them to them to the Unique Home Node. Mosix recently has incorporated the possibilitUnique Home Node. Mosix recently has incorporated the possibility toy to declare that a file system is a Direct File System Access.declare that a file system is a Direct File System Access. For such file systems Mosix will perform I/O operation directlyFor such file systems Mosix will perform I/O operation directly on theon the running node.running node.
  41. 41. Nov 24,2000 r.innocente 41 MOSIX dfsa (direct file system access) Deputy Link layer Remote Link layer MNP (Mosix Network Protocol) DFSA access UHN (Unique Home Node) Running node
  42. 42. Nov 24,2000 r.innocente 42 Cache coherent global distributed file systems To be a MOSIX DFSA it’s not sufficient to be a cacheTo be a MOSIX DFSA it’s not sufficient to be a cache--coherentcoherent global file system. In fact some additional functions (not availglobal file system. In fact some additional functions (not available in stdable in std Linux superLinux super--blocks) must be available at the superblocks) must be available at the super--block levelblock level:: "" identify,compare,reconstructidentify,compare,reconstruct A prototype DFSA that is just a small software layer over the stA prototype DFSA that is just a small software layer over the std Linuxd Linux file system isfile system is MFS (Mosix File System) provided directly by the MOSIXprovided directly by the MOSIX team. Other possibile candidate DFSA are:team. Other possibile candidate DFSA are: "" GFSGFS (GlobalFileSystem)(GlobalFileSystem) "" xFSxFS (Serverless File System)(Serverless File System) It seems now stable the integration of Mosix with GFS.It seems now stable the integration of Mosix with GFS.

×