Reference-counted garbage collection¶
#include <libcork/core.h>
The functions in this section implement a reference counting garbage collector. The garbage collector handles reference cycles correctly. It is not a conservative garbage collector — i.e., we don’t assume that every word inside an object might be a pointer. Instead, each garbage-collected object must provide a recursion function that knows how to delve down into any child objects that it references.
The garbage collector is not thread-safe. If your application is multi-threaded, each thread will (automatically) have its own garbage collection context. There are two strategies that you can use when using the garbage collector in a multi-threaded application:
- Have a single “master” thread be responsible for the lifecycle of
every object. This thread is the only one allowed to interact with
the garbage collector. No other threads are allowed to call any
of the functions in this section, including the
cork_gc_incref()
andcork_gc_decref()
functions. Other threads are allowed to access the objects that are managed by the garbage collector, but the master thread must ensure that all objects are live whenever another thread attempts to use them. This will require some kind of thread-safe communication or synchronization between the master thread and the worker threads. - Have a separate garbage collector per thread. Each object is “owned” by a single thread, and the object is managed by that thread’s garbage collector. As with the first strategy, other threads can use any object, as long as the object’s owner thread is able to guarantee that the object will be live for as long as it’s needed. (Eventually we’ll also support migrating an object from one garbage collector to another, but that feature isn’t currently implemented.)
The garbage collection implementation is based on the algorithm described in §3 of [1].
[1] | Bacon, DF and Rajan VT. Concurrent cycle collection in reference counted systems. Proc. ECOOP 2001. LNCS 2072. |
Creating a garbage collector¶
-
void
cork_gc_init
(void)¶ Initalizes a garbage collection context for the current thread. Usually, you can allocate this on the stack of your
main
function:int main(int argc, char ** argv) { cork_gc_init(); // use the GC context // and free it when you're done cork_gc_done(); }
It’s not required that you call this function at all; if you don’t, we’ll automatically initialize a garbage collection context for the current thread the first time you try to allocate a garbage-collected object. You can call this function, though, if you want to have more control over when the initialization occurs.
-
void
cork_gc_done
(void)¶ Finalize the garbage collection context for the current thread. All objects created in this thread will be freed when this function returns.
You must call this function in each thread that allocates garbage-collected objects, just before that thread finishes. (If your application is single-threaded, then you must call this function before the
main
function finishes.) If you don’t, you’ll almost certainly get memory leaks.
Managing garbage-collected objects¶
A garbage collection context can’t be used to manage arbitrary objects, since each garbage-collected class must define some callback functions for interacting with the garbage collector. (The next section contains more details.)
Each garbage-collected class will provide its own constructor function
for instantiating a new instance of that class. There aren’t any
explicit destructors for garbage-collected objects; instead you manage
“references” to the objects. Each pointer to a garbage-collected object
is a reference, and each object maintains a count of the references to
itself. A newly constructed object starts with a reference count of
1
. Whenever you save a pointer to a garbage-collected object, you
should increase the object’s reference count. When you’re done with the
pointer, you decrease its reference count. When the reference count
drops to 0
, the garbage collector frees the object.
-
void *
cork_gc_incref
(void *obj)¶ Increments the reference count of an object obj that is managed by the current thread’s garbage collector. We always return obj as a result, which allows you to use the following idiom:
struct my_obj * my_copy_of_obj = cork_gc_incref(obj);
-
void
cork_gc_decref
(void *obj)¶ Decrements the reference count of an object obj that is managed by the current thread’s garbage collector If the reference count drops to
0
, then the garbage collector will free the object.Note
It’s safe to call this function with a
NULL
obj pointer; in this case, the function acts as a no-op.
Borrowing a reference¶
While the strategy mentioned above implies that you should call
cork_gc_incref()
and cork_gc_decref()
for every
pointer to a garbage-collected object, you can sometimes get away
without bumping the reference count. In particular, you can often
borrow an existing reference to an object, if you can guarantee that
the borrowed reference will exist for as long as you need access to the
object. The most common example of this when you pass in a
garbage-collected object as the parameter to a function:
int
use_new_reference(struct my_obj *obj)
{
/* Here we're being pedantically correct, and incrementing obj's
* reference count since we've got our own pointer to the object. */
cork_gc_incref(obj);
/* Do something useful with obj */
/* And now that we're done with it, decrement the reference count. */
cork_gc_decref(obj);
}
int
borrowed_reference(struct my_obj *obj)
{
/* We can assume that the caller has a valid reference to obj, so
* we're just going to borrow that reference. */
/* Do something useful with obj */
}
In this example, borrowed_reference
doesn’t need to update obj’s
reference count. We assume that the caller has a valid reference to
obj when it makes the call to borrowed_reference
. Moreover, we
know that the caller can’t possibly release this reference (via
cork_gc_decref()
) until borrowed_reference
returns. Since
we can guarantee that the caller’s reference to obj will exist for the
entire duration of borrowed_reference
, we don’t need to protect it
with an incref
/decref
pair.
Stealing a reference¶
Another common pattern is for a “parent” object to maintain a reference to a “child” object. (For example, a container class might maintain references to all of the elements in the container, assuming that the container and elements are all garbage-collected objects.) When you have a network of objects like this, the parent object’s constructor will usually take in a pointer to the child object as a parameter. If we strictly follow the basic referencing counting rules described above, you’ll end up with something like:
struct child *child = child_new();
struct parent *parent = parent_new(child);
cork_gc_decref(child);
The child_new
constructor gives us a reference to child. The
parent_new
constructor then creates a new reference to child,
which will be stored somewhere in parent. We no longer need our own
reference to child, so we immediately decrement its reference count.
This is a common enough occurrence that many constructor functions will instead steal the reference passed in as a parameter. This means that the constructor takes control of the caller’s reference. This allows us to rewrite the example as:
struct parent *parent = parent_new_stealing(child_new());
For functions that steal a reference, the caller cannot assume that
the object pointed to by the stolen reference exists when the function
returns. (If there’s an error in parent_new_stealing
, for instance,
it must release the stolen reference to child to prevent a memory
leak.) If a function is going to steal a reference, but you also need
to use the object after the function call returns, then you need to
explicitly increment the reference count before calling the function:
struct child *child = child_new();
struct parent *parent = parent_new_stealing(cork_gc_incref(child));
/* Do something with child. */
/* And then release our reference when we're done. */
cork_gc_decref(child);
Note
It’s important to point out that not every constructor will steal the references passed in as parameters. Moreover, there are some constructors that steal references for some parameters but not for others. It entirely depends on what the “normal” use case is for the constructor. If you’re almost always going to pass in a child object that was just created, and that will always be accessed via the parent, then the constructor will usually steal the reference. If the child can be referenced by many parents, then the constructor will usually not steal the reference. The documentation for each constructor function will explicitly state which references are stolen and which objects it creates new references for.
Writing a new garbage-collected class¶
When you are creating a new class that you want to be managed by a garbage collector, there are two basic steps you need to follow:
- Implement a set of callback functions that allow the garbage collector to interact with objects of the new class.
- Use the garbage collector’s allocation functions to allocate storage for instance of your class.
You won’t need to write a public destructor function, since objects of the new class will be destroyed automatically when the garbage collector determines that they’re no longer needed.
Garbage collector callback interface¶
Each garbage-collected class must provide an implementation of the “callback interface”:
-
struct
cork_gc_obj_iface
¶ -
void
(*free)
(void *obj)¶ This callback is called when a garbage-collected object is about to be freed. You can perform any special cleanup steps in this callback. You do not need to deallocate the object’s storage, and you do not need to release any references that you old to other objects. Both of these steps will be taken care of for you by the garbage collector.
If your class doesn’t need any additional finalization steps, this entry in the callback interface can be
NULL
.
-
void
(*recurse)
(struct cork_gc *gc, void *obj, cork_gc_recurser recurse, void *ud)¶ This callback is how you inform the garbage collector of your references to other garbage-collected objects.
The garbage collector will call this function whenever it needs to traverse through a graph of object references. Your implementation of this callback should just call recurse with each garbage-collected object that you hold a reference to. You must pass in gc as the first parameter to each call to recurse, and ud as the third parameter.
Note that it’s fine to call recurse with a
NULL
object pointer, which makes it slightly easier to write implementations of this callback.If instances of your class can never contain references to other garbage-collected objects, this entry in the callback interface can be
NULL
.
-
void
-
void
(*cork_gc_recurser)
(struct cork_gc *gc, void *obj, void *ud)¶ An opaque callback provided by the garbage collector when it calls an object’s
recurse
method.
-
struct
cork_gc
¶ An opaque type representing the current thread’s garbage-collection context. You’ll only need to use this type when implementing a
recurse
function.
Helper macros¶
There are several macros declared in libcork/helpers/gc.h
that
make it easier to define the garbage-collection interface for a new
class.
Note
Unlike most libcork modules, these macros are not automatically
defined when you include the libcork/core.h
header file, since
they don’t include a cork_
prefix. Because of this, we don’t
want to pollute your namespace unless you ask for the macros. To do
so, you must explicitly include their header file:
#include <libcork/helpers/gc.h>
-
_free_
(SYMBOL name)¶ -
_recurse_
(SYMBOL name)¶ These macros declare the free and recurse methods for a new class. The functions will be declared with exactly the signatures and parameter names shown in the entries for the
free
andrecurse
methods.You will almost certainly not need to refer to the method implementations directly, since you can use the
_gc_*_
macros below to declare the interface struct. But if you do, they’ll be called[name]__free
and[name]__recurse
. (Note the double underscore.)
-
_gc_
(SYMBOL name)¶ -
_gc_no_free_
(SYMBOL name)¶ -
_gc_no_recurse_
(SYMBOL name)¶ -
_gc_leaf_
(SYMBOL name)¶ Define the garbage-collection interface struct for a new class. If you defined both
free
andrecurse
methods, you should use the_gc_
variant. If you only defined one of the methods, you should use_gc_no_free_
or_gc_no_recurse_
. If you didn’t define either method, you should use_gc_free_
.Like the method definitions, you probably won’t need to refer to the interface struct directly, since you can use the
cork_gc_new()
macro to allocate new instances of the new class. But if you do, it will be called[name]__gc
. (Note the double underscore.)
As an example, we can use these macros to define a new tree class:
#include <libcork/helpers/gc.h>
struct tree {
const char *name;
struct tree *left;
struct tree *right;
};
_free_(tree) {
struct tree *self = obj;
cork_strfree(self->name);
}
_recurse_(tree) {
struct tree *self = obj;
recurse(self->left, ud);
recurse(self->right, ud);
}
_gc_(tree);
Allocating new garbage-collected objects¶
In your garbage-collected class’s constructor, you must use one of the
following functions to allocate the object’s storage. (The garbage
collector hides some additional state in the object’s memory region, so
you can’t allocate the storage using malloc
or cork_new()
directly.)
-
void *
cork_gc_alloc
(size_t instance_size, struct cork_gc_obj_iface *iface)¶ Allocates a new garbage-collected object that is instance_size bytes large. iface should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort.
-
type *
cork_gc_new_iface
(TYPE type, struct cork_gc_obj_iface *iface)¶ Allocates a new garbage-collected instance of type. The size of the memory region to allocate is calculated using the
sizeof
operator, and the result will be automatically cast totype *
. iface should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort.
-
struct name *
cork_gc_new
(SYMBOL name)¶ Allocates a new garbage-collected instance of
struct [name]
. (Note that you don’t pass in thestruct
part of the type name.) We assume that the garbage collection interface was created using one of the_gc_*_
macros, using the same name parameter.
Using these functions, could instantiate our example tree class as follows:
struct tree *
tree_new(const char *name)
{
struct tree *self = cork_gc_new(tree);
self->name = cork_strdup(name);
self->left = NULL;
self->right = NULL;
return self;
}