Reference-counted garbage collection

#include <libcork/core.h>

The functions in this section implement a reference counting garbage collector. The garbage collector handles reference cycles correctly. It is not a conservative garbage collector — i.e., we don’t assume that every word inside an object might be a pointer. Instead, each garbage-collected object must provide a recursion function that knows how to delve down into any child objects that it references.

The garbage collector is not thread-safe. If your application is multi-threaded, each thread will (automatically) have its own garbage collection context. There are two strategies that you can use when using the garbage collector in a multi-threaded application:

  • Have a single “master” thread be responsible for the lifecycle of every object. This thread is the only one allowed to interact with the garbage collector. No other threads are allowed to call any of the functions in this section, including the cork_gc_incref() and cork_gc_decref() functions. Other threads are allowed to access the objects that are managed by the garbage collector, but the master thread must ensure that all objects are live whenever another thread attempts to use them. This will require some kind of thread-safe communication or synchronization between the master thread and the worker threads.
  • Have a separate garbage collector per thread. Each object is “owned” by a single thread, and the object is managed by that thread’s garbage collector. As with the first strategy, other threads can use any object, as long as the object’s owner thread is able to guarantee that the object will be live for as long as it’s needed. (Eventually we’ll also support migrating an object from one garbage collector to another, but that feature isn’t currently implemented.)

The garbage collection implementation is based on the algorithm described in §3 of [1].

[1]Bacon, DF and Rajan VT. Concurrent cycle collection in reference counted systems. Proc. ECOOP 2001. LNCS 2072.

Creating a garbage collector

void cork_gc_init(void)

Initalizes a garbage collection context for the current thread. Usually, you can allocate this on the stack of your main function:

int
main(int argc, char ** argv)
{
    cork_gc_init();

    // use the GC context

    // and free it when you're done
    cork_gc_done();
}

It’s not required that you call this function at all; if you don’t, we’ll automatically initialize a garbage collection context for the current thread the first time you try to allocate a garbage-collected object. You can call this function, though, if you want to have more control over when the initialization occurs.

void cork_gc_done(void)

Finalize the garbage collection context for the current thread. All objects created in this thread will be freed when this function returns.

You must call this function in each thread that allocates garbage-collected objects, just before that thread finishes. (If your application is single-threaded, then you must call this function before the main function finishes.) If you don’t, you’ll almost certainly get memory leaks.

Managing garbage-collected objects

A garbage collection context can’t be used to manage arbitrary objects, since each garbage-collected class must define some callback functions for interacting with the garbage collector. (The next section contains more details.)

Each garbage-collected class will provide its own constructor function for instantiating a new instance of that class. There aren’t any explicit destructors for garbage-collected objects; instead you manage “references” to the objects. Each pointer to a garbage-collected object is a reference, and each object maintains a count of the references to itself. A newly constructed object starts with a reference count of 1. Whenever you save a pointer to a garbage-collected object, you should increase the object’s reference count. When you’re done with the pointer, you decrease its reference count. When the reference count drops to 0, the garbage collector frees the object.

void *cork_gc_incref(void *obj)

Increments the reference count of an object obj that is managed by the current thread’s garbage collector. We always return obj as a result, which allows you to use the following idiom:

struct my_obj * my_copy_of_obj = cork_gc_incref(obj);
void cork_gc_decref(void *obj)

Decrements the reference count of an object obj that is managed by the current thread’s garbage collector If the reference count drops to 0, then the garbage collector will free the object.

Note

It’s safe to call this function with a NULL obj pointer; in this case, the function acts as a no-op.

Borrowing a reference

While the strategy mentioned above implies that you should call cork_gc_incref() and cork_gc_decref() for every pointer to a garbage-collected object, you can sometimes get away without bumping the reference count. In particular, you can often borrow an existing reference to an object, if you can guarantee that the borrowed reference will exist for as long as you need access to the object. The most common example of this when you pass in a garbage-collected object as the parameter to a function:

int
use_new_reference(struct my_obj *obj)
{
    /* Here we're being pedantically correct, and incrementing obj's
     * reference count since we've got our own pointer to the object. */
    cork_gc_incref(obj);

    /* Do something useful with obj */

    /* And now that we're done with it, decrement the reference count. */
    cork_gc_decref(obj);
}

int
borrowed_reference(struct my_obj *obj)
{
    /* We can assume that the caller has a valid reference to obj, so
     * we're just going to borrow that reference. */

    /* Do something useful with obj */
}

In this example, borrowed_reference doesn’t need to update obj’s reference count. We assume that the caller has a valid reference to obj when it makes the call to borrowed_reference. Moreover, we know that the caller can’t possibly release this reference (via cork_gc_decref()) until borrowed_reference returns. Since we can guarantee that the caller’s reference to obj will exist for the entire duration of borrowed_reference, we don’t need to protect it with an incref/decref pair.

Stealing a reference

Another common pattern is for a “parent” object to maintain a reference to a “child” object. (For example, a container class might maintain references to all of the elements in the container, assuming that the container and elements are all garbage-collected objects.) When you have a network of objects like this, the parent object’s constructor will usually take in a pointer to the child object as a parameter. If we strictly follow the basic referencing counting rules described above, you’ll end up with something like:

struct child  *child = child_new();
struct parent  *parent = parent_new(child);
cork_gc_decref(child);

The child_new constructor gives us a reference to child. The parent_new constructor then creates a new reference to child, which will be stored somewhere in parent. We no longer need our own reference to child, so we immediately decrement its reference count.

This is a common enough occurrence that many constructor functions will instead steal the reference passed in as a parameter. This means that the constructor takes control of the caller’s reference. This allows us to rewrite the example as:

struct parent  *parent = parent_new_stealing(child_new());

For functions that steal a reference, the caller cannot assume that the object pointed to by the stolen reference exists when the function returns. (If there’s an error in parent_new_stealing, for instance, it must release the stolen reference to child to prevent a memory leak.) If a function is going to steal a reference, but you also need to use the object after the function call returns, then you need to explicitly increment the reference count before calling the function:

struct child  *child = child_new();
struct parent  *parent = parent_new_stealing(cork_gc_incref(child));
/* Do something with child. */
/* And then release our reference when we're done. */
cork_gc_decref(child);

Note

It’s important to point out that not every constructor will steal the references passed in as parameters. Moreover, there are some constructors that steal references for some parameters but not for others. It entirely depends on what the “normal” use case is for the constructor. If you’re almost always going to pass in a child object that was just created, and that will always be accessed via the parent, then the constructor will usually steal the reference. If the child can be referenced by many parents, then the constructor will usually not steal the reference. The documentation for each constructor function will explicitly state which references are stolen and which objects it creates new references for.

Writing a new garbage-collected class

When you are creating a new class that you want to be managed by a garbage collector, there are two basic steps you need to follow:

  • Implement a set of callback functions that allow the garbage collector to interact with objects of the new class.
  • Use the garbage collector’s allocation functions to allocate storage for instance of your class.

You won’t need to write a public destructor function, since objects of the new class will be destroyed automatically when the garbage collector determines that they’re no longer needed.

Garbage collector callback interface

Each garbage-collected class must provide an implementation of the “callback interface”:

struct cork_gc_obj_iface
void (*free)(void *obj)

This callback is called when a garbage-collected object is about to be freed. You can perform any special cleanup steps in this callback. You do not need to deallocate the object’s storage, and you do not need to release any references that you old to other objects. Both of these steps will be taken care of for you by the garbage collector.

If your class doesn’t need any additional finalization steps, this entry in the callback interface can be NULL.

void (*recurse)(struct cork_gc *gc, void *obj, cork_gc_recurser recurse, void *ud)

This callback is how you inform the garbage collector of your references to other garbage-collected objects.

The garbage collector will call this function whenever it needs to traverse through a graph of object references. Your implementation of this callback should just call recurse with each garbage-collected object that you hold a reference to. You must pass in gc as the first parameter to each call to recurse, and ud as the third parameter.

Note that it’s fine to call recurse with a NULL object pointer, which makes it slightly easier to write implementations of this callback.

If instances of your class can never contain references to other garbage-collected objects, this entry in the callback interface can be NULL.

void (*cork_gc_recurser)(struct cork_gc *gc, void *obj, void *ud)

An opaque callback provided by the garbage collector when it calls an object’s recurse method.

struct cork_gc

An opaque type representing the current thread’s garbage-collection context. You’ll only need to use this type when implementing a recurse function.

Helper macros

There are several macros declared in libcork/helpers/gc.h that make it easier to define the garbage-collection interface for a new class.

Note

Unlike most libcork modules, these macros are not automatically defined when you include the libcork/core.h header file, since they don’t include a cork_ prefix. Because of this, we don’t want to pollute your namespace unless you ask for the macros. To do so, you must explicitly include their header file:

#include <libcork/helpers/gc.h>
_free_(SYMBOL name)
_recurse_(SYMBOL name)

These macros declare the free and recurse methods for a new class. The functions will be declared with exactly the signatures and parameter names shown in the entries for the free and recurse methods.

You will almost certainly not need to refer to the method implementations directly, since you can use the _gc_*_ macros below to declare the interface struct. But if you do, they’ll be called [name]__free and [name]__recurse. (Note the double underscore.)

_gc_(SYMBOL name)
_gc_no_free_(SYMBOL name)
_gc_no_recurse_(SYMBOL name)
_gc_leaf_(SYMBOL name)

Define the garbage-collection interface struct for a new class. If you defined both free and recurse methods, you should use the _gc_ variant. If you only defined one of the methods, you should use _gc_no_free_ or _gc_no_recurse_. If you didn’t define either method, you should use _gc_free_.

Like the method definitions, you probably won’t need to refer to the interface struct directly, since you can use the cork_gc_new() macro to allocate new instances of the new class. But if you do, it will be called [name]__gc. (Note the double underscore.)

As an example, we can use these macros to define a new tree class:

#include <libcork/helpers/gc.h>

struct tree {
    const char  *name;
    struct tree  *left;
    struct tree  *right;
};

_free_(tree) {
    struct tree  *self = obj;
    cork_strfree(self->name);
}

_recurse_(tree) {
    struct tree  *self = obj;
    recurse(self->left, ud);
    recurse(self->right, ud);
}

_gc_(tree);

Allocating new garbage-collected objects

In your garbage-collected class’s constructor, you must use one of the following functions to allocate the object’s storage. (The garbage collector hides some additional state in the object’s memory region, so you can’t allocate the storage using malloc or cork_new() directly.)

void *cork_gc_alloc(size_t instance_size, struct cork_gc_obj_iface *iface)

Allocates a new garbage-collected object that is instance_size bytes large. iface should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort.

type *cork_gc_new_iface(TYPE type, struct cork_gc_obj_iface *iface)

Allocates a new garbage-collected instance of type. The size of the memory region to allocate is calculated using the sizeof operator, and the result will be automatically cast to type *. iface should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort.

struct name *cork_gc_new(SYMBOL name)

Allocates a new garbage-collected instance of struct [name]. (Note that you don’t pass in the struct part of the type name.) We assume that the garbage collection interface was created using one of the _gc_*_ macros, using the same name parameter.

Using these functions, could instantiate our example tree class as follows:

struct tree *
tree_new(const char *name)
{
    struct tree  *self = cork_gc_new(tree);
    self->name = cork_strdup(name);
    self->left = NULL;
    self->right = NULL;
    return self;
}