Process Migration DEJAN S. MILOJICIC†, FRED DOUGLIS‡, YVES PAINDAVEINE††, RICHARD WHEELER‡‡ and SONGNIAN ZHOU* † HP Labs, ‡ AT&T Labs–Research, †† TOG Research Institute, ‡‡ EMC, and *University of Toronto and Platform Computing Abstract Process migration is the act of transferring a process between two machines. It enables dynamic load distribution, fault resilience, eased system administration, and data access locality. Despite these goals and ongoing research efforts, migration has not achieved widespread use.
With the increasing deployment of distributed systems in general, and distributed operating systems in particular, process migration is again receiving more attention in both research and product development. As high-performance facilities shift from supercomputers to networks of workstations, and with the ever-increasing role of the World Wide Web, we expect migration to play a more important role and eventually to be widely adopted. This survey reviews the field of process migration by summarizing the key concepts and giving an overview of the most important implementations.
Design and implementation issues of process migration are analyzed in general, and then revisited for each of the case studies described: MOSIX, Sprite, Mach and Load Sharing Facility. The benefits and drawbacks of process migration depend on the details of implementation and therefore this paper focuses on practical matters. This survey will help in understanding the potentials of process migration and why it has not caught on. Categories and Subject Descriptors: C. 2. 4 [Computer-Communication Networks]: Distributed Systems – network operating systems; D. . 7 [Operating Systems]: Organization and Design – distributed systems; D. 4. 8 [Operating Systems]: Performance: measurements; D. 4. 2 [Operating Systems]: Storage Management – distributed memories. Additional Key Words and Phrases: process migration, distributed systems, distributed operating systems, load distribution. 1 INTRODUCTION A process is an operating system abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution.
Several implementations have been built for different operating systems, including MOSIX [Barak and Litman, 1985], V [Cheriton, 1988], Accent [Rashid and Robertson, 1981], Sprite [Ousterhout et al. , 1988], Mach [Accetta et al. , 1986], and OSF/1 AD TNC [Zajcew et al. , 1993]. In addition, some systems provide mechanisms that checkpoint active processes and resume their execution in essentially the same state on another machine, including Condor [Litzkow et al. , 1988] and Load Sharing Facility (LSF) [Zhou et al. , 1994].
Process migration enables: • dynamic load distribution, by migrating processes from overloaded nodes to less loaded ones, • fault resilience, by migrating processes from nodes that may have experienced a partial failure, • improved system administration, by migrating processes from the nodes that are about to be shut down or otherwise made unavailable, and • data access locality, by migrating processes closer to the source of some data. Despite these goals and ongoing research efforts, migration has not achieved widespread use.
One reason for this is the complexity of adding transparent migration to systems originally designed to run stand-alone, since designing new systems with migration in mind from the beginning is not a realistic option anymore. Another reason is that there has not been a compelling commercial argument for operating system vendors to support process migration. Checkpoint-restart approaches offer a compromise here, since they can run on more looselycoupled systems by restricting the types of processes that can migrate. In spite of these barriers, process migration continues to attract research.
We believe that the main reason is the potentials offered by mobility as well as the attraction to hard problems, so inherent to the research community. There have been many different goals and approaches to process migration because of the potentials migration can offer to different applications (see Section 2. 3 on goals, Section 4 on approaches and Section 2. 4 on applications). With the increasing deployment of distributed systems in general, and distributed operating systems in particular, the interest in process migration is again on the rise both in research and in product development.
As high-perfor- August 10, 1999 5:48 pm 1. INTRODUCTION Organization of the Paper 2. BACKGROUND 2. 1. Terminology 2. 2. Target Architectures 2. 3. Goals 2. 4. Application Taxonomy 2. 5. Migration Algorithm 2. 6. System Requirements for Migration 2. 7. Load Information Management 2. 8. Distributed Scheduling 2. 9. Alternatives to Migration 3. CHARACTERISTICS 3. 1. Complexity and Operating System Support 3. 2. Performance 3. 3. Transparency 3. 4. Fault Resilience 3. 5. Scalability 3. 6. Heterogeneity 3. 7. Summary 4. EXAMPLES 4. . Early Work 4. 2. Transparent Migration in UNIX-like Systems 4. 3. OS with Message-Passing Interface 4. 4. Microkernels 4. 5. User-space Migrations 4. 6. Application-specific Migration 4. 7. Mobile Objects 4. 8. Mobile Agents 5. CASE STUDIES 5. 1. MOSIX 5. 2. Sprite 5. 3. Mach 5. 4. LSF 6. COMPARISON 7. WHY PROCESS MIGRATION HAS NOT CAUGHT ON 7. 1. Case Analysis 7. 2. Misconceptions 7. 3. True Barriers to Migration Adoption 7. 4. How these Barriers Might be Overcome 8. SUMMARY AND FURTHER RESEARCH ACKNOWLEDGMENTS REFERENCES ally, techniques originally developed for process migration have been employed in developing mobile agents on the World Wide Web. Recent interpreted programming languages, such as Java [Gosling et al. , 1996], Telescript [White, 1996] and Tcl/Tk [Ousterhout, 1994] provide additional support for agent mobility. There exist a few books that discuss process migration [Goscinski, 1991; Barak et al. , 1993; Singhal and Shivaratri, 1994; Milojicic et al. , 1999]; a number of surveys [Smith, 1988; Eskicioglu, 1990; Nuttal, 1994], though none as detailed as this survey; and Ph. D. heses that deal directly with migration [Theimer et al. , 1985; Zayas, 1987a; Lu, 1988; Douglis, 1990; Philippe, 1993; Milojicic, 1993c; Zhu, 1992; Roush, 1995], or that are related to migration [Dannenberg, 1982; Nichols, 1990; Tracey, 1991; Chapin, 1993; Knabe, 1995; Jacqmot, 1996]. This survey reviews the field of process migration by summarizing the key concepts and describing the most important implementations. Design and implementation issues of process migration are analyzed in general and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and LSF.
The benefits and drawbacks of process migration depend on the details of implementation and therefore this paper focuses on practical matters. In this paper we address mainly process migration mechanisms. Process migration policies, such as load information management and distributed scheduling, are mentioned to the extent that they affect the systems being discussed. More detailed descriptions of policies have been reported elsewhere (e. g. , Chapin’s survey ). This survey will help in understanding the potential of process migration.
It attempts to demonstrate how and why migration may be widely deployed. We assume that the reader has a general knowledge of operating systems. Organization of the Paper mance facilities shift from supercomputers to Networks of Workstations (NOW) [Anderson et al. , 1995] and large-scale distributed systems, we expect migration to play a more important role and eventually gain wider acceptance. Operating systems developers in industry have considered supporting process migration, for example Solaris MC [Khalidi et al. 1996], but thus far the availability of process migration in commercial systems is non-existent as we describe below. Checkpoint-restart systems are becoming increasingly deployed for long-running jobs. FiThe paper is organized as follows. Section 2 provides background on process migration. Section 3 describes the process migration by surveying its main characteristics: complexity, performance, transparency, fault resilience, scalability and heterogeneity. Section 4 classifies various implementations of process migration mechanisms and then describes a couple of representatives for each class.
Section 5 describes four case studies of process migration in more detail. In Section 6 we compare the process migration implementations presented earlier. In Section 7 we discuss why we believe that process mi- 2 migrating process (source instance) state transfer Mobility migrating process (destination instance) Hardware Software Active data Process migration (code+data) source node communicating process destination node Passive data Mobile code Mobile agents (code+data+authority) communicating node Figure 1: High Level View of Process Migration.
Process migration consists of extracting the state of the process on the source node, transferring it to the destination node where a new instance of the process is created, and updating the connections with other processes on communicating nodes. (code) Figure 2: Taxonomy of Mobility. task between two machines during execution of its threads. During migration, two instances of the migrating process exist: the source instance is the original process, and the destination instance is the new process created on the destination node. After migration, the destination instance becomes a migrated process.
In systems with a home node, a process that is running on other machines may be called a remote process (from the perspective of the home node) or a foreign process (from the perspective of the hosting node). Remote invocation is the creation of a process on a remote node. Remote invocation is usually a less “expensive” operation than process migration. Although the operation can involve the transfer of some state, such as code or open files, the contents of the address space need not be transferred. Generally speaking, mobility can be classified into hardware and software mobility, as described in Figure 2.
Hardware mobility deals with mobile computing, such as with limitations on the connectivity of mobile computers and mobile IP (see [Milojicic et al. , 1999] for more details). A few techniques in mobile computing have an analogy in software mobility, such as security, locating, naming, and communication forwarding. Software mobility can be classified into the mobility of passive data and active data. Passive data represents traditional means of transferring data between computers; it has been employed ever since the first two computers were connected.
Active data can be further classified into mobile code, process migration and mobile agents. These three classes represent incremental evolution of state transfer. Mobile code, such as Java applets, transfers only code between nodes. Process migration, which is the main theme of this paper, deals primarily with code and data transfer. It also deals with the transfer of authority, for instance access to a shared file system, but in a limited way: authority is under the control of a single administrative domain. Finally, mobile agents transfer code, data, ration has not caught on so far. In the last section we summarize the paper and describe opportunities for further research. 2 BACKGROUND This section gives some background on process migration by providing an overview of process migration terminology, target architectures, goals, application taxonomy, migration algorithms, system requirements, load information management, distributed scheduling, and alternatives to migration. 2. 1 Terminology A process is a key concept in operating systems [Tanenbaum, 1992].
It consists of data, a stack, register contents, and the state specific to the underlying Operating System (OS), such as parameters related to process, memory, and file management. A process can have one or more threads of control. Threads, also called lightweight processes, consist of their own stack and register contents, but share a process’s address space and some of the operating-system-specific state, such as signals. The task concept was introduced as a generalization of the process concept, whereby a process is decoupled into a task and a number of threads.
A traditional process is represented by a task with one thread of control. Process migration is the act of transferring a process between two machines (the source and the destination node) during its execution. Some architectures also define a host or home node, which is the node where the process logically runs. A high-level view of process migration is shown in Figure 1. The transferred state includes the process’s address space, execution point (register contents), communication state (e. g. , open files and message channels) and other operating system dependent state. Task migration represents transferring a and especially authority to act on the owner’s behalf on a wide scale, such as within the entire Internet. 2. 2 Target Architectures Process migration research started with the appearance of distributed processing among multiple processors. Process migration introduces opportunities for sharing processing power and other resources, such as memory and communication channels. It is addressed in early multiprocessor systems [Stone, 1978; Bokhari, 1979]. Current multiprocessor systems, especially symmetric multiprocessors, are scheduled using traditional scheduling methods.
They are not used as an environment for process migration research. Process migration in NUMA (Non-Uniform Memory Access) multiprocessor architectures is still an active area of research [Gait, 1990; Squillante and Nelson, 1991; Vaswani and Zahorjan, 1991; Nelson and Squillante, 1995]. The NUMA architectures have a different access time to the memory of the local processor, compared to the memory of a remote processor, or to a global memory. The access time to the memory of a remote processor can be variable, depending on the type of interconnect and the distance to the remote processor.
Migration in NUMA architectures is heavily dependent on the memory footprint that processes have, both in memory and in caches. Recent research on virtual machines on scalable shared memory multiprocessors [Bugnion, et al. , 1997] represents another potential for migration. Migration of whole virtual machines between processors of a multiprocessor abstracts away most of the complexities of operating systems, reducing the migrateable state only to memory and to state contained in a virtual monitor [Teodosiu, 1999].
Therefore, migration is easier to implement if there is a notion of a virtual machine. Massively Parallel Processors (MPP) are another type of architecture used for migration research [Tritscher and Bemmerl, 1992; Zajcew et al. , 1993]. MPP machines have a large number of processors that are usually shared between multiple users by providing each of them with a subset, or partition, of the processors. After a user relinquishes a partition, it can be reused by another user. MPP computers are typically of a NORMA (NO Remote Memory Access) type, i. e. , there is no remote memory access.
In that respect they are similar to network clusters, except they have a much faster interconnect. Migration represents a convenient tool to achieve repartitioning. Since MPP machines have a large number of processors, the probability of failure is also larger. Migrating a running process from a partially failed node, for example after a bank of memory unrelated to the process fails, allows the process to continue running safely. MPP machines also use migration for load distribution, such as the psched daemon on Cray T3E, or Loadleveler on IBM SP2 machines.
Since its inception, a Local Area Network (LAN) of computers has been the most frequently used architecture for process migration. The bulk of the systems described in this paper, including all of the case studies, are implemented on LANs. Systems such as NOW [Anderson et al. , 1995] or Solaris [Khalidi et al. , 1996] have recently investigated process migration using clusters of workstations on LANs. It was observed that at any point in time many autonomous workstations on a LAN are unused, offering potential for other users based on process migration [Mutka and Livny, 1987].
There is, however, a sociological aspect to the autonomous workstation model. Users are not willing to share their computers with others if this means affecting their own performance [Douglis and Ousterhout, 1991]. The priority of the incoming processes (processing, VM, IPC priorities) may be reduced in order to allow for minimal impact on the workstation’s owner [Douglis and Ousterhout, 1991; Krueger and Chawla, 1991]. Most recently, wide-area networks have presented a huge potential for migration.
The evolution of the Web has significantly improved the relevance and the opportunities for using a wide-area network for distributed computing. This has resulted in the appearance of mobile agents, entities that freely roam the network and represent the user in conducting his tasks. Mobile agents can either appear on the Internet [Johansen et al. , 1995] or in closed networks, as in the original version of Telescript [White, 1996]. 2. 3 Goals The goals of process migration are closely tied with the type of applications that use migration, as described in next section.
The goals of process migration include: Accessing more processing power is a goal of migration when it is used for load distribution. Migration is particularly important in the receiver-initiated distributed scheduling algorithms, where a lightly loaded node announces its availability and initiates process migration from an overloaded node. This was the goal of many systems described in this survey, such as Locus [Walker et al. , 1983], MOSIX [Barak and Shiloh, 1985], and 4 Mach [Milojicic et al. , 1993a]. Load distribution also depends on load information management and distributed scheduling (see Sections 2. and 2. 8). A variation of this goal is harnessing the computing power of temporarily free workstations in large clusters. In this case, process migration is used to evict processes upon the owner’s return, such as in the case of Sprite (see Section 5. 2). Exploitation of resource locality is a goal of migration in cases when it is more efficient to access resources locally than remotely. Moving a process to another end of a communication channel transforms remote communication to local and thereby significantly improves performance.
It is also possible that the resource is not remotely accessible, as in the case when there are different semantics for local and remote accesses. Examples include work by Jul , Milojicic et al. , and Miller and Presotto . Resource sharing is enabled by migration to a specific node with a special hardware device, large amounts of free memory, or some other unique resource. Examples include NOW [Anderson et al. , 1995] for utilizing memory of remote nodes, and the use of parallel make in Sprite [Douglis and Ousterhout, 1991] and work by Skordos  for utilizing unused workstations.
Fault resilience is improved by migration from a partially failed node, or in the case of long-running applications when failures of different kinds (network, devices) are probable [Chu et al. , 1980]. In this context, migration can be used in combination with checkpointing, such as in Condor [Litzkow and Solomon, 1992] or Utopia [Zhou et al. , 1994]. Large-scale systems where there is a likelihood that some of the systems can fail can also benefit from migration, such as in Hive [Chapin95] and OSF/1 AD TNC [Zajc93]. System administration is simplified if long-running computations can be temporarily transferred to other machines.
For example, an application could migrate from a node that will be shutdown, and then migrate back after the node is brought back up. Another example is the repartitioning of large machines, such as in the OSF/1 AD TNC Paragon configuration [Zajcew et al. , 1993]. Mobile computing also increases the demand for migration. Users may want to migrate running applications from a host to their mobile computer as they connect to a network at their current location or back again when they disconnect [Bharat and Cardelli, 1995]. 2. Application Taxonomy The type of applications that can benefit from process migration include: Parallelizable applications can be started on certain nodes, and then migrated at the application level or by a system-wide migration facility in response to things like load balancing considerations. Parallel Virtual Machine (PVM) [Beguelin et al. , 1993] is an example of application-level support for parallel invocation and interprocess communication, while Migratory PVM (MPVM) [Casas et al. , 1995] extends PVM to allow instances of a parallel application to migrate among nodes.
Some other applications are inherently parallelizable, such as the make tool [Baalbergen, 1988]. For example, Sprite provides a migration-aware parallel make utility that distributes a compilation across several nodes [Douglis and Ousterhout, 1991]. Certain processor-bound applications, such as scientific computations, can be parallelized and executed on multiple nodes. An example includes work by Skordos , where an acoustic application is parallelized and executed on a a cluster of workstations.
Applications that perform I/O and other nonidempotent operations are better suited to a system-wide remote execution facility that provides location transparency and, if possible, preemptive migration. Long-running applications, which can run for days or even weeks, can suffer various interruptions, for example partial node failures or administrative shutdowns. Process migration can relocate these applications transparently to prevent interruption. Examples of such systems include work by Freedman  and MPVM [Casas et al. , 1995]. Migration can also be supported at the application level [Zhou et al. 1994] by providing a checkpoint/restart mechanism which the application can invoke periodically or upon notification of an impending interruption. Generic multiuser workloads, for example the random job mix that an undergraduate computer laboratory produces, can benefit greatly from process migration. As users come and go, the load on individual nodes varies widely. Dynamic process migration [Barak and Wheeler, 1989, Douglis and Ousterhout, 1991] can automatically spread processes across all nodes, including those applications that are not enhanced to exploit the migration mechanism.
An individual generic application, which is preemptable, can be used with various goals in mind (see Section 2. 3). Such an application can either migrate it- 5 self, or it can be migrated by another authority. This type of application is most common in various systems described in Section 4 and in the case studies described in Section 5. Note that it is difficult to select such applications without detailed knowledge of past behavior, since many applications are short-lived and do not execute long enough to justify the overhead of migration (see Section 2. 7).
Migration-aware applications are applications that have been coded to explicitly take advantage of process migration. Dynamic process migration can automatically redistribute these related processes if the load becomes uneven on different nodes, e. g. if processes are dynamically created, or there are many more processes than nodes. Work by Skordos , Freedman  and Cardelli  represent this class of application. They are described in more detail in Section 4. 6. Network applications are the most recent example of the potential use of migration: for instance, mobile agents and mobile objects (see Sections 4. and 4. 8). These applications are designed with mobility in mind. Although this mobility differs significantly from the kinds of “process migration” considered elsewhere in this paper, it uses some of the same techniques: location policies, checkpointing, transparency, and locating and communicating with a mobile entity. 2. 5 Migration Algorithm Although there are many different migration implementations and designs, most of them can be summarized in the following steps (see also Figure 3): 1. A migration request is issued to a remote node. After negotiation, migration has been accepted. 2.
A process is detached from its source node by suspending its execution, declaring it to be in a migrating state, and temporarily redirecting communication as described in the following step. 3. Communication is temporarily redirected by queuing up arriving messages directed to the migrated process, and by delivering them to the process after migration. This step continues in parallel with steps 4, 5, and 6, as long as there are additional incoming messages. Once the communication channels are enabled after migration (as a result of step 7), the migrated process is known to the external world. . The process state is extracted, including memory contents; processor state (register contents); communication state (e. g. , opened files and message channels); and relevant kernel context. The communication state and kernel context are OS- dependent. Some of the local OS internal state is not transferable. The process state is typically retained on the source node until the end of migration, and in some systems it remains there even after migration completes. Processor dependencies, such as register and stack contents, have to be eliminated in the case of heterogeneous migration. . A destination process instance is created into which the transferred state will be imported. A destination instance is not activated until a sufficient amount of state has been transferred from the source process instance. After that, the destination instance will be promoted into a regular process. 6. State is transferred and imported into a new instance on the remote node. Not all of the state needs to be transferred; some of the state could be lazily brought over after migration is completed (see lazy evaluation in Section 3. 2). 7.
Some means of forwarding references to the migrated process must be maintained. This is required in order to communicate with the process or to control it. It can be achieved by registering the current location at the home node (e. g. in Sprite), by searching for the migrated process (e. g. in the V Kernel, at the communication protocol level), or by forwarding messages across all visited nodes (e. g. in Charlotte). This step also enables migrated communication channels at the destination and it ends step 3 as communication is permanently redirected. 8.
The new instance is resumed when sufficient state has been transferred and imported. With this step, process migration completes. Once all of the state has been transferred from the original instance, it may be deleted on the source node. 2. 6 System Requirements for Migration To support migration effectively, a system should provide the following types of functionality: • Exporting/importing the process state. The system must provide some type of export/import interfaces that allow the process migration mechanism to extract a process’s state from the source node and import this state on the destination node.
These interfaces may be provided by the underlying operating system, the programming language, or other elements of the programming environment that the process has access to. State includes processor registers, process address space and communication state, such as open message channels in the case of message-based systems, or open files and signal masks in the case of UNIX-like systems. • Naming/accessing the process and its resources. After migration, the migrated process should be accessible by the same name and mechanisms as if migration 6 external communication external communication rocess X process X kernel source node migration request kernel negotiation migr. acceptance destination node kernel source node kernel destination node 1. A migration request is issued to a remote node external communication 5. A destination process instance is created transfer pending messages process X process X external communication kernel source node kernel destination node kernel source node kernel transferable state (code, data, registers, etc. ) destination node 2. A process is detached from its source node 6. State is transferred and imported into a new instance xternal communication process X forwarding reference external communication kernel source node kernel destination node kernel source node process X kernel destination node 3. Temporary Communication redirection (ends in Step 7) external communication 7. Some means of forwarding references, permanent communication redirection external communication process X forwarding reference kernel source node kernel destination node kernel source node process X kernel destination node 4. The process state is extracted 8. The new instance is resumed Figure 3: Migration Algorithm.
Many details have been simplified, such as user v. kernel migration, when is process actually suspended, when is the state transferred, how are message transferred, etc. These details vary subject to particular implementation. never occurred. The same applies to process’s resources, such as threads, communication channels, files and devices. During migration, access to a process and/or some of its resources can be temporarily suspended. Varying degrees of transparency are achieved in naming and accessing resources during and after migration (see Section 3. 3). Cleaning up the process’s non-migratable state. Frequently, the migrated process has associated system state that is not migratable (examples include a local process identifier, pid, and the local time; a local pid is relevant only to the local OS, and every host may have a slightly different value for the local time–something that may or may not matter to a migrating process). Migration must wait until the process finishes or aborts any pending system operation. If the operation can be arbitrarily long, it is typically aborted and restarted on the destination node.
For example, migration can wait for the completion of local file operations or local device requests that are guaranteed to return in a limited time frame. Waiting for a message or accessing a remote device are examples of operations that need to be aborted and restarted on the remote node. Processes that cannot have their non-migrateable state cleaned cannot be considered for migration. 2. 7 Load Information Management The local processes and the resources of local and remote nodes have to be characterized, in order to select a process for migration and a destination node, as well as to justify migration.
This task is commonly known as load information management. Load information is collected and passed to a distributed scheduling policy (see Figure 4). Load information management is concerned with the following three questions: What is load information and how is it represented? The node load is typically represented by one or more of the following load indices: utilization of the CPU, the length of the queue of processes waiting to be executed, 7 Distributed Scheduling Policies activation (when? ) selection (which? ) location (where? ) migration directives Migration Mechanism information dissemination to remote nodes oad information Load Information Management local information collection local node Figure 4: Load Information Management Module collects load information on the local node and disseminates it among the nodes. Distributed Scheduling instructs the migration mechanism when, where, and which process to migrate. the stretch factor (ratio between turnaround- and execution-time—submission to completion v. start to completion) [Ferrari and Zhou 1986], the number of running processes, the number of background processes, paging, communication [Milojicic, 1993c], disk utilization, and the interrupt rate [Hwang et al. 1982]. A process load is typically characterized by process lifetime, CPU usage, memory consumption (virtual and physical), file usage [Hac, 1989a], communication [Lo, 1989], and paging [Milojicic, 1993c]. Kuntz uses a combination of workload descriptions for distributed scheduling [Kunz, 1991]. The application type is considered in Cedar [Hagmann, 1986]. When are load information collection and dissemination activated? These can be periodic or event-based. A typical period is in the range of 1 second or longer, while typical events are process creation, termination, or migration.
The frequency of information dissemination is usually lower than the frequency of information collection, i. e. it is averaged over time in order to prevent instability [Casavant and Kuhl, 1988b]. It also depends on the costs involved with dissemination and the costs of process migration. The lower the costs, the shorter the period can be; the higher the costs, less frequently load information is disseminated. How much information should be transferred? It can be the entire state, but typically only a subset is transferred in order to minimize the transfer costs and have a scalable solution.
In large systems, approximations are applied. For example, only a subset of the information might be transferred, or it might be derived from the subset of all nodes [Barak and Shiloh, 1985; Alon et al. , 1987; Han and Finkel, 1988; Chapin and Spafford, 1994]. There are two important observations derived from the research in load information management. The first one is that just a small amount of information can lead to substantial performance improvements. This observation is related to load distribution in general, but it also applies to process migration. Eager et al. ere among the first to argue that load sharing using minimal load information can gain dramatic improvements in performance over the non-load-sharing case, and perform nearly as well as more complex policies using more information [Eager et al. , 1986b]. The minimal load information they use consists of the process queue length of a small number of successively probed remote nodes. A small amount of state also reduces communication overhead. Kunz comes to the same conclusion using the concept of stochastic learning automata to implement a task scheduler [Kunz, 1991].
The second observation is that the current lifetime of a process can be used for load distribution purposes. The issue is to find how old the process needs to be before it is worth to migrate it. Costs involved with migrating short-lived processes can outweigh the benefits. Leland and Ott were the first to account for the process age in the balancing policy . Cabrera finds that it is possible to predict a process’s expected lifetime from how long it has already lived [Cabrera, 1986]. This justifies migrating processes that manage to live to a certain age. In particular, he finds that over 40% of processes doubled their age.
He also finds that the most UNIX processes are short-lived, more than 78% of the observed processes have a lifetime shorter than 1s and 97% shorter than 4s. Harchol-Balter and Downey explore the correlation between process lifetime and acceptable migration costs [Harchol-Balter and Downey, 1997]. They derive a more accurate form of the process life-time distribution that allows them to predict the life-time correlated to the process age and to derive a cost criterion for migration. Svensson filters out short-running processes by relying on statistics [Svensson, 1990], whereas Wang et al. deploy AI theory for the same purpose [Wang et al. 1993]. 2. 8 Distributed Scheduling This section addresses distributed scheduling closely related to process migration mechanisms. General surveys are presented elsewhere [Wang and Morris, 1985; Casavant and Kuhl, 1988a; Hac, 1989b; Goscinski, 1991; Chapin, 1996]. Distributed scheduling uses the information provided by the load information management module to make mi- 8 gration decisions, as described in Figure 4. The main goal is to determine when to migrate which process where. The activation policy provides the answer to the question when to migrate. Scheduling is activated periodically or it is event-driven.
After activation, the load is inspected, and if it is above/below a threshold, actions are undertaken according to the selected strategy. The selection policy answers the question which process to migrate. The processes are inspected and some of them are selected for migration according to the specified criteria. Where to migrate depends on the location policy algorithm, which chooses a remote node based on the available information. There are a few well-known classes of distributed scheduling policies: • A sender-initiated policy is activated on the node that is overloaded and that wishes to off-load to other nodes.
A sender-initiated policy is preferable for low and medium loaded systems, which have a few overloaded nodes. This strategy is convenient for remote invocation strategies [Eager et al. , 1986a; Krueger and Livny, 1987b; Agrawal and Ezzat, 1987]. • A receiver-initiated policy is activated on underloaded nodes willing to accept the load from overloaded ones. A receiver-initiated policy is preferable for high load systems, with many overloaded nodes and few underloaded ones.
Process migration is particularly well-suited for this strategy, since only with migration can one initiate process transfer at an arbitrary point in time [Bryant and Finkel, 1981; Eager et al. , 1986a; Krueger and Livny, 1988]. • A symmetric policy is the combination of the previous two policies, in an attempt to take advantage of the good characteristics of both of them. It is suitable for a broader range of conditions than either receiver-initiated or sender-initiated strategies alone [Krueger and Livny, 1987b; Shivaratri et al. , 1992]. A random policy chooses the destination node randomly from all nodes in a distributed system. This simple strategy can result in a significant performance improvement [Alon et al. , 1987; Eager et al. , 1986b; Kunz, 1991]. The following are some of the issues in distributed scheduling related to the process migration mechanism: • Adaptability is concerned with the scheduling impact on system behavior [Stankovic, 1984]. Based on the current host and network load, the relative importance of load parameters may change. The policy should adapt to these changes.
Process migration is inherently adaptable because it allows processes to run prior to dispatching them to other nodes, giving them a chance to adapt. Migration can happen at any time (thereby adapting to sudden load changes), whereas initial placement happens only prior to starting a process. Examples of adaptive load distribution include work by Agrawal and Ezzat , Krueger and Livny , Concepcion and Eleazar , Efe and Groselj , Venkatesh and Dattatreya , Shivaratri and Krueger , and Mehra and Wah . Stability is defined as the ability to detect when the effects of further actions (e. g. load scheduling or paging) will not improve the system state as defined by a user’s objective [Casavant and Kuhl, 1988b]. Due to the distributed state, some instability is inevitable, since it is impossible to transfer state changes across the system instantly. However, high levels of instability should be avoided. In some cases it is advisable not to perform any action, e. g. under extremely high loads it is better to abandon load distribution entirely. Process igration can negatively affect stability if processes are migrated back and forth among the nodes, similar to the thrashing introduced by paging [Denning, 1980]. To prevent such behavior a limit on the number of migrations can be imposed. Bryant and Finkel demonstrate how process migration can improve stability [Bryant and Finkel, 1981]. • Approximate and heuristic scheduling is necessary since optimal solutions are hard to achieve. Suboptimal solutions are reached either by approximating the search space with its subset or by using heuristics.
Some of the examples of approximate and heuristic scheduling include work by Efe , Leland and Ott , Lo , Casavant and Kuhl [1988a], and Xu and Hwang . Deploying process migration introduces more determinism and requires fewer heuristics than alternative load distribution mechanisms. Even when incorrect migration decisions are made, they can be alleviated by subsequent migrations, which is not the case with initial process placement where processes have to execute on the same node until the end of its lifetime. • Hierarchical scheduling integrates distributed and centralized scheduling.
It supports distributed scheduling within a group of nodes and centralized scheduling among the groups. This area has attracted much research [Bowen et al. , 1988; Bonomi and Kumar, 1988; Feitelson and Rudolph, 1990; Gupta and Gopinath, 1990; Gopinath and Gupta, 1991; Chapin, 1995]. A process migration mechanism is a good fit for hierarchical scheduling since processes are typically migrated within a LAN or other smaller domain. Only in the case of large load discrepancies are processes migrated between domains, i. e. between peers at higher levels of the hierarchy. 9
The most important question that distributed scheduling studies address related to process migration is whether migration pays off. The answer depends heavily on the assumptions made. For example, Eager et al. compare the receiver- and sender-initiated policies [Eager et al. , 1986a], and show that the sender-initiated policies outperform the receiver-initiated policies for light and moderate system loads. The receiver-initiated policy is better for higher loads, assuming that transfer costs are same. They argue that the transfer costs for the receiver policy, that requires some ind of migration, are much higher than the costs for mechanisms for the sender-initiated strategies, where initial placement suffices. They finally conclude that under no condition could migration provide significantly better performance than initial placement [Eager et al. , 1988]. Krueger and Livny investigate the relationship between load balancing and load sharing [Krueger and Livny, 1988]. They argue that load balancing and load sharing represent various points in a continuum defined by a set of goals and load conditions [Krueger and Livny, 1987].
They claim that the work of Eager et al. [Eager et al. , 1988] is only valid for a part of the continuum, but it cannot be adopted generally. Based on better job distributions than those used by Eager et al. , their simulation results show that migration can improve performance. Harchol-Balter and Downey present the most recent results on the benefits of using process migration [HarcholBalter and Downey, 1997]. They use the measured distribution of process lifetimes for a variety of workloads in an academic environment.
The crucial point of their work is understanding the correct lifetime distribution, which they find to be Pareto (heavy-tailed). Based on the trace-driven simulation, they demonstrate a 35-50% improvement in the mean delay when using process migration instead of remote execution (preemptive v. nonpreemptive scheduling) even when the costs of migration are high. Their work differs from [Eager et al. , 1988] in system model and workload description. Eager et al. model server farms, where the benefits of remote execution are overestimated: there are no associated costs and no affinity toward a particular node.
Harchol-Balter and Downey model a network of workstations where remote execution entails costs, and there exists an affinity toward some of the nodes in a distributed system. The workload that Eager et al. use contains few jobs with non-zero life- times, resulting in a system with little imbalance and little need for process migration. 2. 9 Alternatives to Process Migration Given the relative complexity of implementation, and the expense incurred when process migration is invoked, researchers often choose to implement alternative mechanisms [Shivaratri et al. 1992; Kremien and Kramer, 1992]. Remote execution is the most frequently used alternative to process migration. Remote execution can be as simple as the invocation of some code on a remote node, or it can involve transferring the code to the remote node and inheriting some of the process environment, such as variables and opened files. Remote execution is usually faster than migration because it does not incur the cost of transferring a potentially large process state (such as the address space, which is created anew in the case of remote execution).
For small address spaces, the costs for remote execution and migration can be similar. Remote execution is used in many systems such as COCANET [Rowe and Birman, 1982], Nest [Agrawal and Ezzat, 1987], Sprite [Ousterhout et al. , 1988], Plan 9 [Pike et al. , 1990], Amoeba [Mullender et al. , 1990], Drums [Bond, 1993], Utopia [Zhou et al. , 1994], and Hive [Chapin et al. , 1995]. Remote execution has disadvantages as well. It allows creation of the remote instance only at the time of process creation, as opposed to process migration which allows moving the process at an arbitrary time.
Allowing a process to run on the source node for some period of time is advantageous in some respects. This way, short-lived processes that are not worth migrating are naturally filtered out. Also, the longer a process runs, the more information about its behavior is available, such as whether and with whom it communicates. Based on this additional information, scheduling policies can make more appropriate decisions. Cloning processes is useful in cases where the child process inherits state from the parent process. Cloning is typically achieved using a remote fork mechanism.
A remote fork, followed by the termination of the parent, resembles process migration. The complexity of cloning processes is similar to migration, because the same amount of the process state is inherited (e. g. open files and address space). In the case of migration, the parent is terminated. In the case of cloning, both parent and child may continue to access the same state, introducing distributed shared state, which is typically complex and 10 costly to maintain. Many systems use remote forking [Goldberg and Jefferson, 1987; Smith and Ioannidis, 1989; Zajcew et al. 1993]. Programming language support for mobility enables a wide variety of options, since such systems have almost complete control over the runtime implementation of an application. Such systems can enable self-checkpointing (and hence migratable) applications. They are suitable for entire processes, but also for objects as small as a few bytes, such as in Emerald [Jul et al. , 1988; Jul, 1989] or Ellie [Andersen, 1992]. Finer granularity incurs lower transfer costs. The complexity of maintaining communication channels poses different kinds of problems.
In Emerald, for example, the pointers have to be updated to the source object. Programming language support allows a programmer to introduce more information on object behavior, such as hints about communication and concurrency patterns. Object migration at the middleware level is also possible. Because of the increasing costs of operating system development and the lack of standard solutions for distributed systems and heterogeneity, middleware level solutions have become of more interest [Bernstein, 1996]. Distributed objects are supported in middleware systems such as DCE [Rosenberry et al. , 1992] and CORBA [OMG, 1996].
Object migration at the middleware level has not attracted as much research as process migration in operating systems. One of the reasons is that the early heterogeneity of these systems did not adequately support mobility. Nevertheless, a couple of systems do support mobility at the middleware level, such as DC++ [Schill and Mock, 1993] and the OMG MASIF specification for mobile agents [Milojicic et al. , 1998b] based on OMG CORBA. Mobile agents are becoming increasingly popular. The mobility of agents on the Web emphasizes safety and security issues more than complexity, performance, transparency and heterogeneity.
Mobile agents are implemented on top of safe languages, such as Java [Gosling et al. , 1996], Telescript [White, 1996] and Tcl/ Tk [Ousterhout, 1994]. Compared to process migration, mobile agents have reduced implementation complexity because they do not have to support OS semantics. Performance requirements are different due to the wide-area network communication cost, which is the dominant factor. Heterogeneity is abstracted away at the language level. The early results and opportunities for deployment, as well as the wide interest in the area of mobile agents, indicate a promising future for this form of mobility.
How- applicationspecific migration user-level process migration traditional process migration distributed applications end user applications system libraries user space OS kernel kernel space Figure 5: Migration levels differ in implementation complexity, performance, transparency, and reusability. ever, the issues of security, social acceptance, and commercializable applications have been significantly increased and they represent the main focus of research in the mobile agent community. Mobile agents are described in more detail in Section 4. 8. CHARACTERISTICS This section addresses issues in process migration, such as complexity, performance, transparency, fault resilience, scalability and heterogeneity. These characteristics have a major impact on the effectiveness and deployment of process migration. 3. 1 Complexity and Operating System Support The complexity of implementation and dependency on an operating system are among the obstacles to the wider use of process migration. This is especially true for fullytransparent migration implementations. Migration can be classified according to the level at which it is applied.
It can be applied as part of the operating system kernel, in user space, as part of a system environment, or as a part of the application (see Figure 5). Implementations at different levels result in different performance, complexity, transparency and reusability. User-level migration typically yields simpler implementations, but suffers too much from reduced performance and transparency to be of general use for load distribution. User-space implementations are usually provided for the support of long-running computations [Litzkow and Solomon, 1992].
Migration implemented as part of an application can have poor reusability if modifications are required to the application, as was done in the work by Freedman  and Skordos . This requires familiarity with applications and duplicating some of the mechanisms for each subsequent application, frequently involving effort beyond re-linking the migration part with the application code. It could be somewhat im- 11 proved if parts of migration support is organized in a reusable run-time library. Lower-level migration is more complex to implement, but has better performance, transparency and reusability.
Despite high migration costs, user-level implementations have some benefits with regard to policy. The layers closer to an application typically have more knowledge about its behavior. This knowledge can be used to derive better policies and hence, better overall performance. Similar motivations led to the development of microkernels, such as Mach [Accetta et al. , 1986], Chorus [Rozier, 1992], and Amoeba [Tanenbaum, 1990], which have moved much of their functionality from the kernel into user space.
For example, file servers and networking may be implemented in user space, leaving only a minimal subset of functionality provided in the microkernel, such as virtual memory management, scheduling and interprocess communication. Extensible kernels, such as Spin [Bershad et al. , 1995], Exokernel [Engler et al. , 1995], and Synthetix [Pu et al. , 1995], have taken an alternative approach by allowing user implemented parts to be imported into the kernel. Both microkernels and extensible kernels provide opportunities for extracting a process’s state from the operating system.
There have been many implementations of migration for various operating systems and hardware architectures; many of them required a significant implementation effort and modifications to the underlying kernel [Barak and Shiloh, 1985; Theimer et al. , 1985; Zayas, 1987a; Douglis and Ousterhout, 1991]. This complexity is due to the underlying operating system architecture and specifically its lack of support for the complex interactions resulting from process migration. In the early days, migration required additional OS support, such as extensions for communications forwarding [Artsy et al. 1987], or for data transfer strategies [Theimer et al. , 1985; Zayas, 1987a]. In the case of some subsequent migration implementations, this support already existed in the OS, such as in the case of Mach [Milojicic et al. , 1993a]. In UNIX-like operating systems, support for opened files and signals requires significant interaction with various kernel subsystems [Douglis, 1989; Welch, 1990]. Process migration in message-passing kernels requires significant effort to support message handling [Theimer et al. , 1985; Artsy et al. , 1987; Artsy and Finkel, 1989].
Recent operating systems provide much of this support, such as transparent distributed IPC with message forwarding, and external distributed pagers, which allow easier optimizations and customizing [Black et al. , 1992; Rozier, 1992]. Nevertheless, migration still challenges these mechanisms and frequently breaks them [Douglis and Ousterhout, 1991; Milojicic, 1993c]. 3. 2 Performance Performance is the second important factor that affects the deployment of process migration. Migration performance depends on initial and run-time costs introduced by the act of migration.
The initial costs stem from state transfer. Instead of at migration time, some of the state may be transferred lazily (on-demand), thereby incurring run-time costs. Both types of cost may be significant, depending on the application characteristics, as well as on the ratio of state transferred eagerly/lazily. If only part of the task state is transferred to another node, the task can start executing sooner, and the initial migration costs are lower. This principle is called lazy evaluation: actions are not taken before they are really needed with the hope that they will never be needed.
However, when this is not true, penalties are paid for postponed access. For example, it is convenient to migrate a huge address space on demand instead of eagerly. In the lazy case, part of the space may never be transferred if it is not accessed. However, the source node needs to retain lazily evaluated state throughout the lifetime of the migrated process. A process’s address space usually constitutes by far the largest unit of process state; not surprisingly, the performance of process migration largely depends on the performance of the address space transfer.
Various data transfer strategies have been invented in order to avoid the high cost of address space transfer. • The eager (all) strategy copies all of the address space at the migration time. Initial costs may be in the range of minutes. Checkpoint/restart implementations typically use this strategy, such as Condor [Litzkow and Solomon, 1992] or LSF [Zhou et al. , 1994]. • The eager (dirty) strategy can be deployed if there is remote paging support. This is a variant of the eager (all) strategy that transfers only modified (dirty) pages. Unmodified pages are paged in on request from a backing store.
Eager (dirty) significantly reduces the initial transfer costs when a process has a large address space. Systems supporting eager (dirty) strategy include MOSIX [Barak and Litman, 1985] and Locus [Popek and Walker, 1985] 12 • The Copy-On-Reference (COR) strategy is a network version of demand paging: pages are transferred only upon reference. While dirty pages are brought from the source node, clean pages can be brought either from the source node or from the backing store. The COR strategy has the lowest initial costs, ranging from a few tens to a few hundred microseconds.
However, it increases the run-time costs, and it also requires substantial changes to the underlying operating system and to the paging support [Zayas, 1987a]. • The flushing strategy consists of flushing dirty pages to disk and then accessing them on demand from disk instead of from memory on the source node as in copyon-reference [Douglis and Ousterhout, 1991]. The flushing strategy is like the eager (dirty) transfer strategy from the perspective of the source, and like copyon-reference from the target’s viewpoint. It leaves dependencies on the server, but not on the source node. The precopy strategy reduces the “freeze” time of the process, the time that process is neither executed on the source nor on the destination node. While the process is executed on the source node, the address space is being transferred to the remote node until the number of dirty pages is smaller than a fixed limit. Pages dirtied during precopy have to be copied a second time. The precopy strategy cuts down the freeze time below the costs of the COR technique [Theimer et al. , 1985]. There are also variations of the above strategies.
The most notable example is migration in the Choices operating system [Roush and Campbell, 1996]. It uses a variation of the eager (dirty) strategy which transfers minimal state to the remote node at the time of migration. The remote instance is started while the remainder of the state is transferred in parallel. The initial migration time is reduced to 13. 9ms running on a SparcStation II connected by a 10Mb Ethernet, which is an order of magnitude better than all other reported results, even if results are normalized (see work by Rousch  for more details on normalized performance results).
Leaving some part of the process state on the source or intermediate nodes of the migrated instance results in a residual dependency. Residual dependencies typically occur as a consequence of two implementation techniques: either using lazy evaluation (see definition below), or as a means for achieving transparency in communication, by forwarding subsequent messages to a migrated process. A particular case of residual dependency is the home dependency, which is a dependency on the (home) node where a process was created [Douglis and Ousterhout, 1991].
An example of a home dependency is redirecting systems calls to the home node: for example, local host-dependent calls, calls related to the file system (in the absence of a distributed file system), or operations on local devices. A home dependency can simplify migration, because it is easier to redirect requests to the home node than to support services on all nodes. However, it also adversely affects reliability, because a migrated foreign process will always depend on its home node. The notion of the home dependency is further elaborated upon below in Section 5. (MOSIX) and Section 5. 2 (Sprite). Redirecting communication through the previously established links represents another kind of residual dependency. In general, dependencies left at multiple nodes should be avoided, since they require complex support, and degrade performance and fault resilience. Therefore, some form of periodic or lazy removal of residual dependencies is desirable. For example, the system could flush remaining pages to the backing store, or update residual information on migrated communication channels. 3. Transparency Transparency requires that neither the migrated task nor other tasks in the system can notice migration, with the possible exception of performance effects. Communication with a migrated process could be delayed during migration, but no message can be lost. After migration, the process should continue to communicate through previously opened I/O channels, for example printing to the same console or reading from the same files. Transparency is supported in a variety of ways, depending on the underlying operating system.
Sprite and NOW MOSIX maintain a notion of a home machine that executes all host-specific code [Douglis and Ousterhout, 1991; Barak et al. , 1995]. Charlotte supports IPC through links, which provide for remapping after migration [Finkel et al. , 1989]. Transparency also assumes that the migrated instance can execute all system calls as if it were not migrated. Some user-space migrations do not allow system calls that generate internode signals or file access [Mandelberg and Sunderam, 1988; Freedman, 1991].
Single System Image (SSI) represents a complete form of transparency. It provides a unique view of a system composed of a number of nodes as if there were just one node. A process can be started and communicated with without knowing where it is physically executing. Resources can be transparently accessed from any node in 13 the system as if they were attached to the local node. The underlying system typically decides where to instantiate new processes or where to allocate and access resources. SSI can be applied at different levels of the system.
At the user-level, SSI consists of providing transparent access to objects and resources that comprise a particular programming environment. Examples include Amber [Chase et al. , 1989] and Emerald [Jul, 1989]. At the traditional operating system level, SSI typically consists of a distributed file system and distributed process management, such as in MOSIX [Barak and Litman, 1985], Sprite [Ousterhout et al. , 1988] and OSF/1 AD TNC [Zajcew et al. , 1993]. At the microkernel level, SSI is comprised of mechanisms, such as distributed IPC, distributed memory management, and remote tasking.
A near-SSI is implemented for Mach [Black et al. , 1992] based on these transparent mechanisms, but the policies are supported at the OSF/1 AD server running on top of it. At the microkernel level the programmer needs to specify where to create remote tasks. SSI supports transparent access to a process, as well as to its resources, which simplifies migration. On the other hand, the migration mechanism exercises functionality provided at the SSI level, posing a more stressful workload than normally experienced in systems without migration [Milojicic et al. , 1993a].
Therefore, although a migration implementation on top of SSI may seem less complex, this complexity is pushed down into the SSI implementation. Some location dependencies on another host may be inevitable, such as accessing local devices or accessing kernel-dependent state that is managed by the other host. It is not possible transparently to support such dependencies on the newly visited nodes, other than by forwarding the calls back to the home node, as was done in Sprite [Douglis and Ousterhout, 1991]. 3. 4 Fault Resilience Fault resilience is frequently mentioned as a benefit of process migration.
However, this claim has never been substantiated with a practical implementation, although some projects have specifically addressed fault resilience [Chou and Abraham, 1983; Lu et al. , 1987]. So far the major contribution of process migration for fault resilience is through combination with checkpointing, such as in Condor [Litzkow and Solomon, 1992], LSF Zhou et al. , 1994] and in work by Skordos . Migration was also suggested as a means of fault containment [Chapin et al. , 1995]. Failures play an important role in the implementation of process migration.
They can happen on a source or target machine or on the communication medium. Various migration schemes are more or less sensitive to each type of failure. Residual dependencies have a particularly negative impact on fault resilience. Using them is a trade-off between efficiency and reliability. Fault resilience can be improved in several ways. The impact of failures during migration can be reduced by maintaining process state on both the source and destination sites until the destination site instance is successfully promoted to a regular process and the source node is informed about this.
A source node failure can be overcome by completely detaching the instance from the source node once it is migrated, though this prevents lazy evaluation techniques from being employed. One way to remove communication residual dependencies is to deploy locating techniques, such as multicasting (as used in V kernel Theimer et al. , 1985), rel