Many of these are copyrighted by ACM. See the full text of the ACM copyright notice.
Abstract: Regional garbage collection is scalable, with theoretical worst-case bounds for GC latency, MMU, and throughput that are independent of mutator behavior and the volume of reachable storage. Regional collection improves upon the worst-case pause times and MMU seen in most other general-purpose collectors, including garbage-first and concurrent mark/sweep collectors.
Abstract: Regional garbage collection offers a useful compromise between real-time and generational collection. Regional collectors resemble generational collectors, but are scalable: our main theorem guarantees a positive lower bound, independent of mutator and live storage, for the theoretical worst-case minimum mutator utilization (MMU). The theorem also establishes upper bounds for worst-case space usage and collection pauses.
Standard generational collectors are not scalable. Some real-time collectors are scalable, while others assume a well-behaved mutator or provide no worst-case guarantees at all.
Regional collectors cannot compete with hard real-time collectors at millisecond resolutions, but offer efficiency comparable to contemporary generational collectors combined with improved latency and MMU at resolutions on the order of hundreds of milliseconds to a few seconds.
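The MMU guarantee above can be made concrete. For a window size w, MMU(w) is the worst-case fraction of any length-w interval of the run during which the mutator (rather than the collector) executes. A minimal sketch, computing MMU from a hypothetical pause trace (the trace and numbers are illustrative, not measurements from the paper):

```python
# Minimum mutator utilization (MMU): for a window of length `window`,
# the worst-case fraction of any such interval during which the mutator
# runs.  `pauses` is a hypothetical list of (start, duration) GC pauses.

def mmu(pauses, total_time, window):
    """Return the minimum mutator utilization over all windows."""
    worst = 1.0
    # Slide the window so it starts at time 0 and at each pause boundary;
    # for this sketch we assume the minimum is attained at such alignments.
    starts = [0.0] + [s for s, _ in pauses] + [s + d for s, d in pauses]
    for t0 in starts:
        if t0 + window > total_time:
            continue
        paused = 0.0
        for s, d in pauses:
            lo, hi = max(s, t0), min(s + d, t0 + window)
            if hi > lo:
                paused += hi - lo
        worst = min(worst, 1.0 - paused / window)
    return worst

# One 125 ms pause in a 2-second run, measured with a 500 ms window:
trace = [(0.5, 0.125)]
print(mmu(trace, 2.0, 0.5))  # 0.75: the worst window loses 125 of 500 ms
```

A scalable collector is one for which this quantity has a positive lower bound regardless of the mutator.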
Abstract: Scheme began as a sequential implementation of the Actor model, from which Scheme acquired its proper tail recursion and first class continuations; other consequences of its origins include lexical scoping, first class procedures, uniform evaluation, and a unified environment. As Scheme developed, it spun off important new technical ideas such as delimited continuations and hygienic macros while enabling research in compilers, semantics, partial evaluation, and other areas. Dozens of implementations support a wide variety of users and aspirations, exerting pressure on the processes used to specify Scheme.
Abstract: The case expressions of Scheme can and should be implemented efficiently. A three-level dispatch performs well, even when dispatching on symbols, and scales to large case expressions.
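The flavor of multi-level dispatch can be sketched without the Scheme-level details: dispatch first on the type of the key, then use a constant-time table lookup within that type. The structure below is an assumption-laden illustration, not the paper's actual algorithm:

```python
# A toy two-stage case dispatcher: first dispatch on the key's type,
# then a hash-table lookup within that type.  Clause bodies are thunks.
# This sketch only distinguishes integer and symbol (string) keys.

def compile_case(clauses, default):
    """clauses: list of (list_of_keys, thunk).  Returns a dispatcher."""
    int_table = {}   # second level: lookup for integer keys
    sym_table = {}   # second level: lookup for symbol-like keys
    for keys, body in clauses:
        for k in keys:
            (int_table if isinstance(k, int) else sym_table)[k] = body

    def dispatch(key):
        # first level: dispatch on the type of the key
        table = int_table if isinstance(key, int) else sym_table
        return table.get(key, default)()
    return dispatch

classify = compile_case(
    [([0, 1], lambda: "low"),
     (["red", "green"], lambda: "color")],
    lambda: "other")
print(classify(1))       # low
print(classify("red"))   # color
print(classify(99))      # other
```

Because each level is a constant-time lookup, dispatch cost stays flat as the number of clauses grows.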
Abstract: A program's distribution of object lifetimes is one of the factors that determine whether and how much it will benefit from generational garbage collection, and from what kind of generational collector. Linear combinations of radioactive decay models appear adequate for modelling object lifetimes in many programs, especially when the goal is to analyze the relative or theoretical performance of simple generational collectors.
The boundary between models that favor younger-first generational collectors and models that favor older-first generational collectors is mathematically complex, even for highly idealized collectors. For linear combinations of radioactive decay models, non-generational collection is rarely competitive with idealized generational collection, even at that boundary.
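In a radioactive decay model, an object's probability of still being live at age t is exp(-λt); a linear combination weights several such components. A short sketch of the survivor function for one illustrative mixture (the weights and rates are invented for illustration, not taken from the paper):

```python
import math

# Survivor function for a linear combination of radioactive-decay models:
# component i has weight w_i and decay rate lam_i, so the probability an
# object is still live at age t is  sum_i w_i * exp(-lam_i * t).
# The mixture below is illustrative only.

def survival(t, components):
    return sum(w * math.exp(-lam * t) for w, lam in components)

mix = [(0.9, 5.0),   # 90% short-lived objects (high decay rate)
       (0.1, 0.1)]   # 10% long-lived objects (low decay rate)

for t in (0.0, 0.5, 2.0, 10.0):
    print(f"age {t:5.1f}: live fraction {survival(t, mix):.4f}")
```

Varying the weights and rates of such a mixture is what moves a workload across the younger-first/older-first boundary the paper analyzes.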
Abstract: In a computer system that uses a generational garbage collector in which objects are promoted from a "young" generation to an "old" generation, a compiler output designates certain dynamic-allocation instructions as being ones whose resultant allocated objects will be considered "pinned." The compiler associates with such allocation instructions respective segments of the code following the instructions, and objects allocated within one of those segments are considered to remain pinned until program execution passes beyond that segment. The garbage collector refrains from promoting any pinned object, and as a consequence, an instruction that writes a reference into an object field while that object is pinned does not need to be accompanied by a write barrier.
Abstract: Common Larceny is an implementation of Scheme for Microsoft's Common Language Runtime (CLR), which is part of .NET. Common Larceny interoperates with other CLR languages using the JavaDot notation of JScheme, generating the JavaDot interfaces just in time via reflection, and caching them for performance. All other differences between Common Larceny, Petit Larceny, and Larceny derive from differences in the compiler's target language and architecture: IL/CLR, ANSI C, or machine code. The Larceny family therefore offers a case study in the suitability of these target architectures for Scheme.
Abstract: Generational collection has improved the efficiency of garbage collection in fast-allocating programs by focusing on collecting young garbage, but has done little to reduce the cost of collecting a heap containing large amounts of older data. A new generational technique, older-first collection, shows promise in its ability to manage older data.
This paper reports on an implementation study that compared two older-first collectors to traditional (younger-first) generational collectors. One of the older-first collectors performed well and was often effective at reducing the first-order cost of collection relative to younger-first collectors. Older-first collectors perform especially well when objects have queue-like or random lifetimes.
Some larger benchmark data are now available.
Abstract: Generational garbage collection divides a heap up into two or more generations, and usually collects a youngest generation most frequently. Collection of the youngest generation requires identification of pointers into that generation from older generations; a data structure that supports such identification is called a remembered set. Various remembered set mechanisms have been proposed; these generally require mutator code to execute a write barrier when modifying pointer fields. Remembered set data structures can vary in their precision: an imprecise structure requires the garbage collector to do more work to find old-to-young pointers. Generally there is a tradeoff between remembered set precision and barrier cost: a more precise remembered set requires a more elaborate barrier. Many current systems tend to favor more efficient barriers in this tradeoff, as shown by the widespread popularity of relatively imprecise card marking techniques. This imprecision becomes increasingly costly as the ratio between old- and young-generation sizes grows. We propose a technique that maintains more precise remembered sets that scale with old-generation size, using a barrier whose cost is not significantly greater than card marking.
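The card-marking tradeoff described above is easy to see in miniature. A minimal sketch, assuming a heap modelled as a flat list of slots (real collectors mark a byte per card with a shift and a store; this shows only the bookkeeping and the scan cost):

```python
# A toy card-marking write barrier.  The heap is a flat list of slots;
# each card covers CARD_SIZE consecutive slots, and a write dirties the
# whole card containing the written slot.

CARD_SIZE = 4                       # slots per card (illustratively tiny)
heap = [None] * 32
card_table = [False] * (len(heap) // CARD_SIZE)

def write_ref(slot, value):
    heap[slot] = value
    card_table[slot // CARD_SIZE] = True   # imprecise: marks the whole card

def dirty_slots():
    # The collector must scan every slot of every dirty card to find the
    # actual old-to-young pointers -- the imprecision discussed above.
    out = []
    for c, dirty in enumerate(card_table):
        if dirty:
            out.extend(range(c * CARD_SIZE, (c + 1) * CARD_SIZE))
    return out

write_ref(5, "young-object")
print(dirty_slots())   # [4, 5, 6, 7]: one write dirtied a whole card
```

As the old generation grows, the number of slots scanned per dirty card stays fixed but the number of cards grows with it, which is the scaling cost a more precise remembered set avoids.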
Abstract: Scheme and Smalltalk continuations may have unlimited extent. This means that a purely stack-based implementation of continuations, as suffices for most languages, is inadequate. We review several implementation strategies for continuations and compare their performance using instruction counts for the normal case and continuation-intensive synthetic benchmarks for other scenarios, including coroutines and multitasking. All of the strategies constrain a compiler in some way, resulting in indirect costs that are hard to measure directly. We use related measurements on a set of benchmarks to calculate upper bounds for these indirect costs.
Abstract: The report gives a defining description of the programming language Scheme. Scheme is a statically scoped and properly tail-recursive dialect of the Lisp programming language invented by Guy Lewis Steele Jr. and Gerald Jay Sussman. It was designed to have an exceptionally clear and simple semantics and few different ways to form expressions. A wide variety of programming paradigms, including imperative, functional, and message passing styles, find convenient expression in Scheme.
Abstract: The IEEE/ANSI standard for Scheme requires implementations to be properly tail recursive. This ensures that portable code can rely upon the space efficiency of continuation-passing style and other idioms. On its face, proper tail recursion concerns the efficiency of procedure calls that occur within a tail context. When examined closely, proper tail recursion also depends upon the fact that garbage collection can be asymptotically more space-efficient than Algol-like stack allocation.
Proper tail recursion is not the same as ad hoc tail call optimization in stack-based languages. Proper tail recursion often precludes stack allocation of variables, but yields a well-defined asymptotic space complexity that can be relied upon by portable programs.
This paper offers a formal and implementation-independent definition of proper tail recursion for Scheme. It also shows how an entire family of reference implementations can be used to characterize related safe-for-space properties, and proves the asymptotic inequalities that hold between them.
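The space property at stake can be illustrated outside Scheme. Python does not implement proper tail calls, so deep tail recursion overflows its stack; the trampoline below simulates the constant-stack behavior that proper tail recursion guarantees (a sketch of the property, not of the paper's formal definition):

```python
# A tail call written as a returned thunk, driven by a trampoline, uses
# constant stack space -- the behavior a properly tail-recursive language
# provides directly, without the encoding.

def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

def countdown(n):
    # The recursive call is in tail position, so we return it as a thunk
    # instead of growing the stack.
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)

print(trampoline(countdown(1_000_000)))  # done -- no stack overflow
```

Writing `countdown` as ordinary Python recursion would exceed the default recursion limit long before a million calls; proper tail recursion makes such loops-as-calls safe by definition.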
Abstract: Destructive array update optimization is critical for writing scientific codes in functional languages. We present set constraints for an interprocedural update optimization that runs in polynomial time. This is a multi-pass optimization, involving interprocedural flow analyses for aliasing and liveness.
We characterize the soundness of these analyses using small-step operational semantics. We have also proved that any sound liveness analysis induces a correct program transformation.
Abstract: If a fixed exponentially decreasing probability distribution is used to model every object's lifetime, then the age of an object gives no information about its future life expectancy. This radioactive decay model implies that there can be no rational basis for deciding which live objects should be promoted to another generation. Yet there remains a rational basis for deciding how many objects to promote, when to collect garbage, and which generations to collect.
Analysis of the model leads to a new kind of generational garbage collector whose effectiveness does not depend upon heuristics that predict which objects will live longer than others.
This result provides insight into the computational advantages of generational garbage collection, with implications for the management of objects whose life expectancies are difficult to predict.
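The key memoryless property is a one-line calculation: under a single decay rate λ, P(live at t+s | live at t) = exp(-λ(t+s)) / exp(-λt) = exp(-λs), independent of the age t. A numeric check (the rate λ below is illustrative, not from the paper):

```python
import math

# Memorylessness of the radioactive decay model: the probability of
# surviving s more time units is the same at every age t, so age gives
# a collector no basis for choosing which objects to promote.

lam = 0.7           # illustrative decay rate

def survive(t):
    return math.exp(-lam * t)

s = 1.0
for age in (0.0, 2.0, 10.0):
    cond = survive(age + s) / survive(age)
    print(f"age {age:4.1f}: P(survive {s} more) = {cond:.6f}")
# every age prints the same conditional probability, exp(-0.7)
```

This is why, under this model, the rational questions are how many objects to promote and when to collect, not which objects to promote.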
Abstract: Optimizing compilers for higher-order languages need not be terribly complex. The problems created by non-local, non-global variables can be eliminated by allocating all such variables in the heap. Lambda lifting makes this practical by eliminating all non-local variables except for those that would have to be allocated on the heap anyway. The eliminated non-local variables become local variables that can be allocated in registers. Since calls to known procedures are just gotos that pass arguments, lifted lambda expressions are just assembly language labels that have been augmented by a list of symbolic names for the registers that are live at that label.
Abstract: In [a previous] paper, we gave an efficient interprocedural update analysis algorithm for strict functional languages with flat arrays and sequential evaluation. In this paper, we show that the same algorithm extends to a parallel functional language with additional constructs for partitioning and combining arrays. Even with parallel evaluation, the complexity of the analysis remains polynomial. The analysis has been implemented and the results show that several numerical algorithms such as direct and iterative methods for solving linear equations, LU, Cholesky, and QR factorizations, multigrid methods for solving partial differential equations, and non-numeric algorithms such as sorting can be implemented functionally with all updates made destructive. We describe a new array construct to specify a collection of updates on an array and show that problems like histogram, inverse permutation, and polynomial multiplication have efficient parallel functional solutions. Monolithic array operators can be considered a special case of this new construct.
Abstract: This paper describes a modified form of Kohlbecker's algorithm for reliably hygienic (capture-free) macro expansion in block-structured languages. Unlike previous algorithms, the modified algorithm runs in linear instead of quadratic [or exponential] time, copies few constants, does not assume that syntactic keywords (e.g. if) are reserved words, and allows local (scoped) macros to refer to lexical variables in a referentially transparent manner.
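The capture problem that hygiene prevents can be shown with a toy analogy in Python source text (this is an illustration of the problem, not a macro expander, and the expansion-as-string encoding is invented for the example):

```python
import itertools

# A "swap" template that expands to code using a temporary named `tmp`
# captures a user variable that happens to be named `tmp`.  Renaming the
# introduced temporary with a fresh name (a gensym), in the spirit of
# Kohlbecker-style hygienic expansion, avoids the capture.

_counter = itertools.count()

def gensym(base="tmp"):
    return f"{base}_{next(_counter)}"

def naive_swap(a, b):
    # unhygienic: the introduced name `tmp` can collide with user names
    return f"tmp = {a}; {a} = {b}; {b} = tmp"

def hygienic_swap(a, b):
    t = gensym()                    # fresh name, cannot collide
    return f"{t} = {a}; {a} = {b}; {b} = {t}"

print(naive_swap("x", "tmp"))     # tmp = x; x = tmp; tmp = tmp  (broken)
print(hygienic_swap("x", "tmp"))  # fresh temporary, swap is correct
```

Hygienic expansion performs this kind of renaming systematically for every identifier a macro introduces, while still letting scoped macros refer transparently to the lexical variables they were defined among.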
Abstract: Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the IEEE Standard for Binary Floating-Point Arithmetic does not require the result of such a conversion to be the best approximation.
This paper presents an efficient algorithm that always finds the best approximation. The algorithm uses a few extra bits of precision to compute an IEEE-conforming approximation while testing an intermediate result to determine whether the approximation could be other than the best. If the approximation might not be the best, then the best approximation is determined by a few simple operations on multiple-precision integers, where the precision is determined by the input. When using 64 bits of precision to compute IEEE double precision results, the algorithm avoids higher-precision arithmetic over 99% of the time.
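The exact fallback step can be sketched with Python's arbitrary-precision rationals: compare candidate doubles against the decimal value held exactly, and keep the closest. This is a sketch of the idea under stated assumptions (CPython, and `math.nextafter`, which requires Python 3.9+), not the paper's algorithm, which certifies most results without any multiple-precision work:

```python
import math
from fractions import Fraction

# Decide which double best approximates a decimal string by comparing
# a candidate and its two neighboring doubles against the exact decimal
# value, using exact rational arithmetic.

def best_double(decimal_string):
    exact = Fraction(decimal_string)       # the decimal value, exactly
    candidate = float(decimal_string)      # some nearby double
    below = math.nextafter(candidate, -math.inf)
    above = math.nextafter(candidate, math.inf)
    # Fraction(d) converts a double to its exact binary value, so these
    # distances are computed without rounding error.
    return min((candidate, below, above),
               key=lambda d: abs(Fraction(d) - exact))

s = "0.1"
print(best_double(s) == float(s))  # True: CPython's reader is correctly rounded
```

CPython's own decimal-to-float conversion is already correctly rounded, so the check prints True; the point of the sketch is that exact integer (here, rational) arithmetic settles the question definitively when limited-precision arithmetic cannot.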
Last updated 29 October 2011.