Chapter 2 presented a general survey of multimedia schedulers, applications, and programming models. This section, in contrast, explicitly compares elements of HLS with related work described in the literature.
Systems providing hierarchical CPU scheduling can be divided into three broad categories. Homogeneous hierarchical schedulers use the same scheduling algorithm throughout the hierarchy. General heterogeneous hierarchies, such as HLS, allow user-specified scheduling algorithms everywhere in the hierarchy. In between these two extremes are fixed heterogeneous scheduling hierarchies that specify some of the schedulers in the hierarchy (usually including the root scheduler), possibly allowing user-specified schedulers at other locations in the hierarchy. Most of the existing hierarchical scheduling work falls into the last category.
Waldspurger and Weihl developed the currency abstraction in the context of lottery scheduling [90], a randomized proportional share scheduler. Currencies allow hierarchical isolation between groups of threads by providing a level of indirection between requests for scheduling and the actual number of shares that are allocated. Currencies are also supported by the stride scheduler [91], a deterministic proportional share scheduler based on virtual times.
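To make the currency mechanism concrete, the following sketch (illustrative C, with invented names and structure rather than code from the lottery scheduler itself) shows how the level of indirection between tickets and base funding isolates groups of threads: threads in one currency cannot dilute the allocations of threads in another, no matter how many tickets their own currency issues.

    /* A minimal sketch of lottery scheduling with currencies; all
       names and structure are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    struct currency {
        double funding;   /* backing in base-currency tickets */
        double issued;    /* total tickets issued in this currency */
    };

    struct thread {
        const char *name;
        struct currency *cur;
        double tickets;   /* tickets held, denominated in cur */
    };

    /* A thread's value in base tickets: its share of its currency's
       funding, independent of how many tickets the currency issues. */
    static double base_value(const struct thread *t)
    {
        return t->tickets * (t->cur->funding / t->cur->issued);
    }

    static struct thread *hold_lottery(struct thread *ts, int n)
    {
        double total = 0.0;
        for (int i = 0; i < n; i++)
            total += base_value(&ts[i]);
        double winner = total * ((double)rand() / RAND_MAX);
        for (int i = 0; i < n; i++) {
            winner -= base_value(&ts[i]);
            if (winner <= 0.0)
                return &ts[i];
        }
        return &ts[n - 1];
    }

    int main(void)
    {
        /* Two currencies, each funded with 100 base tickets: threads
           in "batch" cannot dilute the allocation of "media" threads. */
        struct currency media = { 100.0, 300.0 };
        struct currency batch = { 100.0, 100.0 };
        struct thread ts[] = {
            { "video",   &media, 200.0 },
            { "audio",   &media, 100.0 },
            { "compile", &batch, 100.0 },
        };
        int wins[3] = { 0 };
        for (int i = 0; i < 100000; i++)
            wins[hold_lottery(ts, 3) - ts]++;
        for (int i = 0; i < 3; i++)
            printf("%s: %d wins\n", ts[i].name, wins[i]);
        return 0;
    }

Running the sketch shows the video and audio threads splitting their currency's half of the machine 2:1, while the compile thread's half is unaffected by anything the media currency does.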
The borrowed virtual time (BVT) scheduler [20] is a proportional share scheduler that allows selected threads to borrow against their future processor allocation, decreasing their scheduling latencies (at the expense of threads that are not allowed to borrow, or that can borrow less). A hierarchical version of BVT was proposed in order to reserve part of the processor bandwidth for multimedia applications, scheduling time-sharing applications in a best-effort manner using a second-level scheduler.
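The core of BVT is a simple selection rule: dispatch the runnable thread with the minimum effective virtual time, where a warped (borrowing) thread subtracts its warp from its actual virtual time. A minimal sketch, with invented field names rather than BVT's actual implementation, follows.

    #include <stdio.h>

    struct bvt_thread {
        const char *name;
        double avt;    /* actual virtual time, advances as the thread runs */
        double warp;   /* how far the thread may borrow against the future */
        int    warped; /* is the thread currently using its warp? */
    };

    /* Effective virtual time: warped threads appear "earlier". */
    static double evt(const struct bvt_thread *t)
    {
        return t->avt - (t->warped ? t->warp : 0.0);
    }

    /* Dispatch the thread with the minimum effective virtual time. */
    static struct bvt_thread *bvt_pick(struct bvt_thread *ts, int n)
    {
        struct bvt_thread *best = &ts[0];
        for (int i = 1; i < n; i++)
            if (evt(&ts[i]) < evt(best))
                best = &ts[i];
        return best;
    }

    int main(void)
    {
        struct bvt_thread ts[] = {
            { "mpeg_player", 100.0, 50.0, 1 },  /* warped: runs first */
            { "text_editor",  90.0,  0.0, 0 },
        };
        printf("next: %s\n", bvt_pick(ts, 2)->name);
        return 0;
    }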
The main contribution of homogeneous hierarchical schedulers is that they provide hierarchical isolation between groups of resource consumers. They do not address the need for different kinds of scheduling support for applications with diverse requirements. Furthermore, the stride, lottery, and BVT schedulers do not address the problem of providing bounded-latency scheduling to real-time applications.
CPU inheritance scheduling [24] permits any thread to act as a scheduler by donating the CPU to other threads. The root of the scheduling hierarchy is a special thread to which the operating system always donates the CPU after a scheduling event occurs. The scheduling hierarchy, then, exists as a set of informal relationships between threads, requiring very little support from the kernel.
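The following sketch (an illustrative model in C, not code from the CPU inheritance prototype) captures the essential structure: scheduler threads donate the CPU down a chain of relationships until a leaf thread is reached, and that thread actually runs.

    #include <stdio.h>

    struct thread {
        const char *name;
        /* NULL for ordinary threads; scheduler threads pick a client. */
        struct thread *(*schedule)(struct thread *self);
        struct thread **clients;
        int nclients, next;
    };

    /* A round-robin scheduler thread: donate the CPU to the next client. */
    static struct thread *rr_pick(struct thread *self)
    {
        struct thread *t = self->clients[self->next];
        self->next = (self->next + 1) % self->nclients;
        return t;
    }

    /* Follow the donation chain from the root until a leaf is reached. */
    static struct thread *dispatch(struct thread *root)
    {
        struct thread *t = root;
        while (t->schedule)
            t = t->schedule(t);   /* t donates the CPU onward */
        return t;
    }

    int main(void)
    {
        struct thread a = { "a", NULL, NULL, 0, 0 };
        struct thread b = { "b", NULL, NULL, 0, 0 };
        struct thread *kids[] = { &a, &b };
        struct thread root = { "rr_sched", rr_pick, kids, 2, 0 };
        for (int i = 0; i < 4; i++)
            printf("run %s\n", dispatch(&root)->name);  /* a b a b */
        return 0;
    }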
CPU inheritance scheduling and HLS are equivalent in the sense that a scheduling hierarchy that can be expressed in one can be expressed in the other. However, HLS implements schedulers in the kernel in order to avoid causing unnecessary context switches. For example, on a 500 MHz PC running Windows 2000, a context switch between threads (the mechanism used for communication between schedulers in CPU inheritance) costs around 7 μs, while a virtual function call (the mechanism used for communication between schedulers in HLS) costs about 20 ns, a difference of roughly a factor of 350. The other main difference between the two systems is that HLS was developed along with guarantees, a mechanism for reasoning about scheduler composition in order to provide guarantees to real-time applications, while CPU inheritance stipulates that all composition issues are the responsibility of the user.
As far as we know, HLS is the first general, heterogeneous hierarchical scheduling system to be implemented in a general-purpose operating system. CPU inheritance scheduling was prototyped in a user-level thread package and also implemented as part of the OSKit [22]. The Bossa domain-specific language (DSL) for application-specific scheduling policies [6] is being implemented in Linux and appears to support a general, heterogeneous scheduling hierarchy. Bossa takes a more aggressive approach than HLS to making schedulers easier to write: the DSL is used to guarantee safety properties of schedulers and also to specialize schedulers to meet the needs of specific applications.
Scheduler activations [3] allow the OS to notify user-level thread schedulers of operating system events that may affect scheduling decisions, such as threads blocking and unblocking and processors being granted and revoked. The virtual processor interface in HLS is a generalization of scheduler activations; the relationship between the two is discussed in detail in Section 4.3.1.4. Nemesis [49] permits two-level scheduling hierarchies using an interface similar to scheduler activations. Both of these systems differ from HLS in that they require the second-level scheduler to execute in user space, they limit the scheduling hierarchy to two levels, and they fix the scheduler at the root of the hierarchy.
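A hypothetical notification interface in this style might look like the following C sketch; the callback names and signatures are illustrative assumptions, not the actual HLS virtual processor interface or the scheduler activations upcalls.

    #include <stdio.h>

    struct vproc  { int id; };   /* a (virtual) processor */
    struct thread { int tid; };

    /* Events a child scheduler must be told about in order to make
       correct decisions. */
    struct sched_callbacks {
        void (*cpu_granted)(struct vproc *vp);
        void (*cpu_revoked)(struct vproc *vp);
        void (*thread_blocked)(struct thread *t);
        void (*thread_unblocked)(struct thread *t);
    };

    static void granted(struct vproc *vp)   { printf("granted vp %d\n", vp->id); }
    static void revoked(struct vproc *vp)   { printf("revoked vp %d\n", vp->id); }
    static void blocked(struct thread *t)   { printf("thread %d blocked\n", t->tid); }
    static void unblocked(struct thread *t) { printf("thread %d unblocked\n", t->tid); }

    int main(void)
    {
        struct sched_callbacks cb = { granted, revoked, blocked, unblocked };
        struct vproc vp = { 0 };
        struct thread t = { 42 };
        cb.cpu_granted(&vp);      /* parent grants a processor */
        cb.thread_blocked(&t);    /* kernel reports a blocking event */
        cb.thread_unblocked(&t);
        cb.cpu_revoked(&vp);      /* parent reclaims the processor */
        return 0;
    }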
The Spin operating system [9] provides functionality similar to scheduler activations, but allows applications to load their own schedulers and thread package implementations into the kernel at run time. Therefore, Spin is more similar to HLS than scheduler activations are, although significant differences remain--Spin has a fixed root scheduler, and supports only two-level scheduling hierarchies. Vassal [14] also allows an application-specified scheduler to be loaded into the OS kernel at run time. Vassal is similar to HLS in that the loaded scheduler may take precedence over the time-sharing scheduler, but Vassal only allows a single scheduler to be loaded at a time, and consequently does not have to deal with the problem of schedulers whose demands for CPU time conflict. Also, in contrast with HLS, Vassal does not provide the loaded scheduler with the full set of notifications about OS events (such as thread blocking and unblocking) that a scheduler needs to be informed of.
The Exokernel [21] is an extensible operating system that uses a CPU donation primitive similar to the one used in CPU inheritance scheduling, allowing unprivileged applications to act as schedulers. However, the root scheduler for the Exokernel is fixed, and is not a real-time scheduler.
Resource containers [5] permit limits on various resources, including processor time, to be applied to hierarchical collections of threads. However, fair scheduling is guaranteed only on long time scales (tens of seconds). Share II [10] and the Solaris Resource Manager [58, Ch. 7] are commercial products that provide hierarchical fair-share scheduling, also on long time scales. Similarly, many general-purpose operating systems provide mechanisms outside of the main time-sharing scheduler for enforcing long-term limits on CPU usage. For example, the SIGXCPU signal in Unix operating systems [26, pp. 201-202] and job objects in Windows 2000 [77, pp. 374-378] can be used to notify or terminate processes or groups of processes that have exceeded their CPU usage limits. All of these mechanisms were designed to achieve long-term fairness among non-real-time applications--they are not suitable for real-time scheduling.
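For example, under Unix a process can request SIGXCPU delivery using the standard setrlimit() interface; in the program below, the handler fires after the process has consumed one second of CPU time. The coarse, seconds-scale granularity illustrates why these mechanisms suit long-term limits rather than real-time scheduling.

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <unistd.h>

    static void on_xcpu(int sig)
    {
        (void)sig;
        /* write() is async-signal-safe, unlike printf() */
        const char msg[] = "CPU limit exceeded, exiting\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(1);
    }

    int main(void)
    {
        /* soft limit 1 s (SIGXCPU), hard limit 2 s (SIGKILL) */
        struct rlimit rl = { .rlim_cur = 1, .rlim_max = 2 };
        signal(SIGXCPU, on_xcpu);
        if (setrlimit(RLIMIT_CPU, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        volatile unsigned long spin = 0;
        for (;;)            /* burn CPU until the soft limit fires */
            spin++;
    }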
Goyal et al. invented start-time fair queuing (SFQ) [28], a proportional share scheduling algorithm that works well in hierarchical environments. They proposed an architecture that uses SFQ schedulers at all points in the scheduling hierarchy except for leaves, which may implement arbitrary scheduling algorithms such as rate monotonic scheduling or earliest deadline first. However, no method for guaranteeing the schedulability of applications running under the leaf schedulers was presented. Also, the guaranteed scheduling latency of SFQ depends on the number of threads being scheduled by an SFQ scheduler, making it a dubious choice for scheduling real-time applications that have deadlines of the same order of magnitude as the length of the system scheduling quantum (as Section 5.3.3 showed).
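The simplified sketch below shows SFQ's tagging discipline applied to CPU scheduling (thread arrival and idle-period handling are omitted): each thread carries start and finish tags, the thread with the minimum start tag runs for a quantum, and its tags then advance in inverse proportion to its weight.

    #include <stdio.h>

    #define QUANTUM 10.0   /* ms */

    struct sfq_thread {
        const char *name;
        double weight;
        double start, finish;   /* SFQ tags */
    };

    /* Virtual time: the start tag of the thread currently in service. */
    static double vtime = 0.0;

    static struct sfq_thread *sfq_pick(struct sfq_thread *ts, int n)
    {
        struct sfq_thread *best = &ts[0];
        for (int i = 1; i < n; i++)
            if (ts[i].start < best->start)
                best = &ts[i];
        return best;
    }

    static void sfq_run_one(struct sfq_thread *ts, int n)
    {
        struct sfq_thread *t = sfq_pick(ts, n);
        vtime = t->start;
        printf("v=%5.1f run %s\n", vtime, t->name);
        t->start = t->finish;   /* charge one quantum at weight w */
        t->finish = t->start + QUANTUM / t->weight;
    }

    int main(void)
    {
        struct sfq_thread ts[] = {
            { "video", 2.0, 0.0, QUANTUM / 2.0 },
            { "shell", 1.0, 0.0, QUANTUM / 1.0 },
        };
        for (int i = 0; i < 6; i++)
            sfq_run_one(ts, 2);   /* video runs twice as often as shell */
        return 0;
    }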
RED-Linux [92] defines a two-level scheduling framework in which each task is characterized by a 4-tuple consisting of priority, start time, finish time, and budget. Multiple leaf schedulers may exist, and each task must belong to one of them. A leaf scheduler is a function from a task’s scheduling parameters to an effective priority; the root scheduler always runs the task that has the highest effective priority. This framework is general enough to implement many existing classes of real-time schedulers. However, it supports no mechanism for resolving conflicts among leaf schedulers, and consequently cannot provide throughput or delay guarantees to applications.
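A minimal sketch of this structure appears below; the two example leaf mappings (an EDF-like one and a fixed-priority one) are illustrative assumptions, not RED-Linux's actual schedulers.

    #include <stdio.h>

    struct task {
        const char *name;
        double priority, start, finish, budget;   /* the 4-tuple */
        double (*eff_prio)(const struct task *);  /* leaf scheduler */
    };

    /* EDF-like leaf: earlier finish time (deadline) -> higher priority. */
    static double edf_like(const struct task *t) { return 1.0 / t->finish; }

    /* Fixed-priority leaf: just use the static priority field. */
    static double fixed(const struct task *t) { return t->priority; }

    /* Root scheduler: always run the highest effective priority. */
    static struct task *root_pick(struct task *ts, int n)
    {
        struct task *best = &ts[0];
        for (int i = 1; i < n; i++)
            if (ts[i].eff_prio(&ts[i]) > best->eff_prio(best))
                best = &ts[i];
        return best;
    }

    int main(void)
    {
        struct task ts[] = {
            { "audio", 0.00, 0.0, 20.0, 5.0, edf_like }, /* deadline t=20 */
            { "shell", 0.01, 0.0,  0.0, 0.0, fixed },
        };
        printf("dispatch: %s\n", root_pick(ts, 2)->name);
        return 0;
    }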
The open environment for real-time applications developed by Deng and Liu [18] and the BSS-I [51] and PShED [52] frameworks developed by Lipari et al. are hierarchical scheduling systems designed around the idea of providing a uniformly slower processor (USP) to each real-time application. The USP abstraction guarantees that any task set (i.e. the set of real-time tasks comprising a real-time application) that can be scheduled without missing any deadlines (by a particular scheduler) on a processor of speed s can also be scheduled by that scheduler if it is given a USP with rate s/f on a processor of speed f.
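For concreteness, consider an illustrative example (the numbers are invented): a task set that an EDF scheduler can schedule without missed deadlines on a 250 MHz processor is also guaranteed to be schedulable by EDF when it is given a USP with rate s/f = 250/1000 = 0.25 on a 1 GHz processor.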
The distinguishing feature of the USP guarantee is that it is characterized only by a rate (a share of a processor) and not by a granularity at which that rate will be granted. Granularity information must be specified dynamically: a scheduler receiving a USP guarantee must inform the root scheduler of every deadline that matters to it. The root scheduler then performs a computation over the deadlines provided by all of its children in order to ensure that no USP uses more than its share of the processor over any time interval that matters to any USP. Therefore, USPs implicitly require all deadlines to be known at run time.
The requirement that all deadlines be specified dynamically adds run-time overhead and is not a good match for some kinds of schedulers. For example, a rate monotonic scheduler retains no information about task deadlines at run time, and consequently cannot make use of a USP guarantee. Similarly, proportional share schedulers and time-sharing schedulers do not have explicit deadlines. Furthermore, the BSS-I and PShED schedulers entail considerable implementation complexity, requiring a balanced-tree data structure if run-time cost linear in the number of active deadlines is to be avoided. The USP guarantee fits into the HLS framework as a special case of the CPU reservation abstraction. However, its restricted programming model--deadlines must be known at run time--limits the utility of the USP approach for scheduling unmodified real-time applications in a general-purpose operating system.
The HLS resource manager differs from all previous CPU resource managers in one important respect: it has available to it a collection of hierarchical schedulers, as well as reflective information about those schedulers, that allows it to match each application request with a scheduler that can meet its requirements. When appropriate, it provides applications with guarantees about the rate and granularity with which they will receive CPU time.
Another important distinction is that many of the resource managers that have appeared in the literature are explicitly designed to support adaptive applications, networked applications, or both. Although the HLS resource manager could be used to support these kinds of applications, they are not its main target. Distributed applications are not a principal concern by assumption: the HLS resource manager is designed to solve problems associated with the allocation of local processor time. Adaptive applications are not a principal concern because the vast majority of existing multimedia applications are not resource self-aware, and do not attempt to dynamically adjust their behavior in response to changing resource availability.
The QoS Broker [65] is a middleware resource manager that reserves network resources (bandwidth, buffer space, etc.) and operating system resources (real-time scheduling capacity, memory, storage, etc.) using information stored in profiles, in order to meet applications’ needs. The goals of the HLS resource manager and the QoS Broker overlap to some extent--both are designed to store resource information for applications and then retrieve and apply it when needed--but the similarities appear to end there because the QoS Broker focuses primarily on the management of network resources and the HLS resource manager focuses primarily on CPU time.
The Dynamic QoS Resource Manager (DQM) [13, 12] is designed to operate without sophisticated scheduling support from the operating system. Real-time applications are expected to support a number of different execution levels, each of which is characterized by a 3-tuple consisting of resource usage, benefit, and period. The DQM then selects an execution level for each application such that high overall benefit is provided to the user. It reduces overall resource usage when applications miss deadlines and increases usage when idle time is detected. Though there are similarities between the HLS resource manager and the DQM (i.e. applications are characterized by the resources they require), the approaches are otherwise very different. DQM was designed under the assumption that applications may be freely modified, and that the services provided by existing general-purpose operating systems are sufficient to schedule multimedia applications. HLS, on the other hand, was designed under the assumption that unmodified multimedia applications must be supported, and that the operating system may be freely modified in order to implement diverse scheduling algorithms. The justification for the HLS approach is that, because there are many applications, it is unacceptable to complicate the application programming model: it is better to modify the operating system once than to require changes to every application.
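A sketch of level selection in this style appears below; the greedy policy (repeatedly degrading the application with the smallest benefit loss per unit of CPU recovered) is an illustrative assumption rather than DQM's published algorithm.

    #include <stdio.h>

    struct level { double usage, benefit, period; };

    struct app {
        const char *name;
        const struct level *levels;   /* sorted most to least demanding */
        int nlevels, cur;
    };

    static double total_usage(const struct app *as, int n)
    {
        double u = 0.0;
        for (int i = 0; i < n; i++)
            u += as[i].levels[as[i].cur].usage;
        return u;
    }

    static void degrade_until_feasible(struct app *as, int n, double capacity)
    {
        while (total_usage(as, n) > capacity) {
            int best = -1;
            double best_ratio = 0.0;
            for (int i = 0; i < n; i++) {
                if (as[i].cur + 1 >= as[i].nlevels)
                    continue;   /* already at the lowest level */
                const struct level *a = &as[i].levels[as[i].cur];
                const struct level *b = &as[i].levels[as[i].cur + 1];
                /* benefit lost per unit of CPU recovered */
                double ratio = (a->benefit - b->benefit) / (a->usage - b->usage);
                if (best < 0 || ratio < best_ratio) {
                    best = i;
                    best_ratio = ratio;
                }
            }
            if (best < 0)
                break;          /* nothing left to degrade */
            as[best].cur++;
        }
    }

    int main(void)
    {
        const struct level video[] = { {0.6, 10, 33}, {0.4, 8, 33}, {0.2, 5, 66} };
        const struct level audio[] = { {0.3, 10, 10}, {0.2, 9, 10} };
        struct app as[] = {
            { "video", video, 3, 0 },
            { "audio", audio, 2, 0 },
        };
        degrade_until_feasible(as, 2, 0.8);
        for (int i = 0; i < 2; i++)
            printf("%s -> level %d\n", as[i].name, as[i].cur);
        return 0;
    }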
Oparah [70] describes a QoS manager (QM) for the Nemesis operating system that is similar in many ways to the DQM: it requires applications to register different modes with the OS, with each mode being characterized by a resource tuple and a value. The QM then attempts to provide high overall value during overload by reducing the resource usage of some or all applications. A novel feature of the QM is a graphical interface that allows users to indicate special preferences such as forbidding the system from taking resources from an important application or correcting suboptimal resource allocations; the QM remembers these corrections and attempts to track user preferences more closely in the future. Like the DQM, the QM focuses on adaptive applications.
The modular resource manager for Rialto [36] divides the system into resource providers and activities. Each activity requests a resource set, a list of amounts of the different resources it requires to operate correctly, from a central planner. During overload, user preferences are consulted to resolve the conflict. The Rialto and HLS resource managers are similar in intent, but differ in a number of details. First, and most importantly, Rialto assumes that applications are resource self-aware, meaning that they are able to make good use of amounts of resources other than their full requirements. We reject this assumption--some applications are simply incapable of functioning correctly when they receive less CPU time than they require, and others are capable of adapting in principle, but have not been implemented in a manner that makes this possible. Second, the Rialto resource manager was designed to allocate multiple resources while the HLS resource manager is only concerned with the CPU. Finally, the Rialto resource manager assumes that resource amounts may be combined linearly, meaning that a resource with overall capacity 1.0 will always be able to support two activities that each require 0.5 of the resource. This assumption is not true for CPU time, for example, when the rate monotonic scheduling algorithm is used, or when schedulability analysis includes elements such as context switch overhead or blocking factors. The HLS resource manager uses scheduler-specific schedulability analysis routines to determine whether a particular set of demands for processing time is feasible or not.
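The rate monotonic case can be made concrete with the classic Liu and Layland utilization bound: a set of n periodic tasks is guaranteed schedulable under RM if its total utilization is at most n(2^(1/n) - 1), which is about 0.828 for n = 2. The short program below (compile with -lm) shows why two activities that each require 0.5 of the CPU cannot simply be combined linearly.

    #include <math.h>
    #include <stdio.h>

    /* Liu and Layland utilization bound for n tasks under RM. */
    static double rm_bound(int n)
    {
        return n * (pow(2.0, 1.0 / n) - 1.0);
    }

    int main(void)
    {
        double demand = 0.5 + 0.5;   /* two tasks, each needing 0.5 */
        double bound  = rm_bound(2); /* ~0.828 */
        printf("demand %.3f vs RM bound %.3f: %s\n", demand, bound,
               demand <= bound ? "guaranteed" : "not guaranteed");
        return 0;
    }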