.. _gc: ************************************ Reference-counted garbage collection ************************************ .. highlight:: c :: #include The functions in this section implement a reference counting garbage collector. The garbage collector handles reference cycles correctly. It is **not** a conservative garbage collector — i.e., we don't assume that every word inside an object might be a pointer. Instead, each garbage-collected object must provide a *recursion function* that knows how to delve down into any child objects that it references. The garbage collector is **not** thread-safe. If your application is multi-threaded, each thread will (automatically) have its own garbage collection context. There are two strategies that you can use when using the garbage collector in a multi-threaded application: * Have a single “master” thread be responsible for the lifecycle of every object. This thread is the only one allowed to interact with the garbage collector. **No** other threads are allowed to call any of the functions in this section, including the :c:func:`cork_gc_incref()` and :c:func:`cork_gc_decref()` functions. Other threads are allowed to access the objects that are managed by the garbage collector, but the master thread must ensure that all objects are live whenever another thread attempts to use them. This will require some kind of thread-safe communication or synchronization between the master thread and the worker threads. * Have a separate garbage collector per thread. Each object is “owned” by a single thread, and the object is managed by that thread's garbage collector. As with the first strategy, other threads can use any object, as long as the object's owner thread is able to guarantee that the object will be live for as long as it's needed. (Eventually we'll also support migrating an object from one garbage collector to another, but that feature isn't currently implemented.) The garbage collection implementation is based on the algorithm described in §3 of [1]_. .. [1] Bacon, DF and Rajan VT. *Concurrent cycle collection in reference counted systems*. Proc. ECOOP 2001. LNCS 2072. Creating a garbage collector ============================ .. function:: void cork_gc_init(void) Initalizes a garbage collection context for the current thread. Usually, you can allocate this on the stack of your ``main`` function:: int main(int argc, char ** argv) { cork_gc_init(); // use the GC context // and free it when you're done cork_gc_done(); } It's not required that you call this function at all; if you don't, we'll automatically initialize a garbage collection context for the current thread the first time you try to allocate a garbage-collected object. You can call this function, though, if you want to have more control over when the initialization occurs. .. function:: void cork_gc_done(void) Finalize the garbage collection context for the current thread. All objects created in this thread will be freed when this function returns. You must call this function in each thread that allocates garbage-collected objects, just before that thread finishes. (If your application is single-threaded, then you must call this function before the ``main`` function finishes.) If you don't, you'll almost certainly get memory leaks. Managing garbage-collected objects ================================== A garbage collection context can't be used to manage arbitrary objects, since each garbage-collected class must define some callback functions for interacting with the garbage collector. (The :ref:`next section ` contains more details.) Each garbage-collected class will provide its own constructor function for instantiating a new instance of that class. There aren't any explicit destructors for garbage-collected objects; instead you manage “references” to the objects. Each pointer to a garbage-collected object is a reference, and each object maintains a count of the references to itself. A newly constructed object starts with a reference count of ``1``. Whenever you save a pointer to a garbage-collected object, you should increase the object's reference count. When you're done with the pointer, you decrease its reference count. When the reference count drops to ``0``, the garbage collector frees the object. .. function:: void \*cork_gc_incref(void \*obj) Increments the reference count of an object *obj* that is managed by the current thread's garbage collector. We always return *obj* as a result, which allows you to use the following idiom:: struct my_obj * my_copy_of_obj = cork_gc_incref(obj); .. function:: void cork_gc_decref(void \*obj) Decrements the reference count of an object *obj* that is managed by the current thread's garbage collector If the reference count drops to ``0``, then the garbage collector will free the object. .. note:: It's safe to call this function with a ``NULL`` *obj* pointer; in this case, the function acts as a no-op. .. _borrow-ref: Borrowing a reference --------------------- While the strategy mentioned above implies that you should call :c:func:`cork_gc_incref()` and :c:func:`cork_gc_decref()` for *every* pointer to a garbage-collected object, you can sometimes get away without bumping the reference count. In particular, you can often *borrow* an existing reference to an object, if you can guarantee that the borrowed reference will exist for as long as you need access to the object. The most common example of this when you pass in a garbage-collected object as the parameter to a function:: int use_new_reference(struct my_obj *obj) { /* Here we're being pedantically correct, and incrementing obj's * reference count since we've got our own pointer to the object. */ cork_gc_incref(obj); /* Do something useful with obj */ /* And now that we're done with it, decrement the reference count. */ cork_gc_decref(obj); } int borrowed_reference(struct my_obj *obj) { /* We can assume that the caller has a valid reference to obj, so * we're just going to borrow that reference. */ /* Do something useful with obj */ } In this example, ``borrowed_reference`` doesn't need to update *obj*\ 's reference count. We assume that the caller has a valid reference to *obj* when it makes the call to ``borrowed_reference``. Moreover, we know that the caller can't possibly release this reference (via :c:func:`cork_gc_decref()`) until ``borrowed_reference`` returns. Since we can guarantee that the caller's reference to *obj* will exist for the entire duration of ``borrowed_reference``, we don't need to protect it with an ``incref``/``decref`` pair. .. _steal-ref: Stealing a reference -------------------- Another common pattern is for a “parent” object to maintain a reference to a “child” object. (For example, a container class might maintain references to all of the elements in the container, assuming that the container and elements are all garbage-collected objects.) When you have a network of objects like this, the parent object's constructor will usually take in a pointer to the child object as a parameter. If we strictly follow the basic referencing counting rules described above, you'll end up with something like:: struct child *child = child_new(); struct parent *parent = parent_new(child); cork_gc_decref(child); The ``child_new`` constructor gives us a reference to *child*. The ``parent_new`` constructor then creates a new reference to *child*, which will be stored somewhere in *parent*. We no longer need our own reference to *child*, so we immediately decrement its reference count. This is a common enough occurrence that many constructor functions will instead *steal* the reference passed in as a parameter. This means that the constructor takes control of the caller's reference. This allows us to rewrite the example as:: struct parent *parent = parent_new_stealing(child_new()); For functions that steal a reference, the caller **cannot** assume that the object pointed to by the stolen reference exists when the function returns. (If there's an error in ``parent_new_stealing``, for instance, it must release the stolen reference to *child* to prevent a memory leak.) If a function is going to steal a reference, but you also need to use the object after the function call returns, then you need to explicitly increment the reference count *before* calling the function:: struct child *child = child_new(); struct parent *parent = parent_new_stealing(cork_gc_incref(child)); /* Do something with child. */ /* And then release our reference when we're done. */ cork_gc_decref(child); .. note:: It's important to point out that not every constructor will steal the references passed in as parameters. Moreover, there are some constructors that steal references for some parameters but not for others. It entirely depends on what the “normal” use case is for the constructor. If you're almost always going to pass in a child object that was just created, and that will always be accessed via the parent, then the constructor will usually steal the reference. If the child can be referenced by many parents, then the constructor will usually *not* steal the reference. The documentation for each constructor function will explicitly state which references are stolen and which objects it creates new references for. .. _new-gc-class: Writing a new garbage-collected class ===================================== When you are creating a new class that you want to be managed by a garbage collector, there are two basic steps you need to follow: * Implement a set of callback functions that allow the garbage collector to interact with objects of the new class. * Use the garbage collector's allocation functions to allocate storage for instance of your class. You won't need to write a public destructor function, since objects of the new class will be destroyed automatically when the garbage collector determines that they're no longer needed. Garbage collector callback interface ------------------------------------ Each garbage-collected class must provide an implementation of the “callback interface”: .. type:: struct cork_gc_obj_iface .. member:: void (\*free)(void \*obj) This callback is called when a garbage-collected object is about to be freed. You can perform any special cleanup steps in this callback. You do **not** need to deallocate the object's storage, and you do **not** need to release any references that you old to other objects. Both of these steps will be taken care of for you by the garbage collector. If your class doesn't need any additional finalization steps, this entry in the callback interface can be ``NULL``. .. member:: void (\*recurse)(struct cork_gc \*gc, void \*obj, cork_gc_recurser recurse, void \*ud) This callback is how you inform the garbage collector of your references to other garbage-collected objects. The garbage collector will call this function whenever it needs to traverse through a graph of object references. Your implementation of this callback should just call *recurse* with each garbage-collected object that you hold a reference to. You must pass in *gc* as the first parameter to each call to *recurse*, and *ud* as the third parameter. Note that it's fine to call *recurse* with a ``NULL`` object pointer, which makes it slightly easier to write implementations of this callback. If instances of your class can never contain references to other garbage-collected objects, this entry in the callback interface can be ``NULL``. .. type:: void (\*cork_gc_recurser)(struct cork_gc \*gc, void \*obj, void \*ud) An opaque callback provided by the garbage collector when it calls an object's :c:member:`~cork_gc_obj_iface.recurse` method. .. type:: struct cork_gc An opaque type representing the current thread's garbage-collection context. You'll only need to use this type when implementing a :c:member:`~cork_gc_obj_iface.recurse` function. .. _gc-macros: Helper macros ~~~~~~~~~~~~~ There are several macros declared in :file:`libcork/helpers/gc.h` that make it easier to define the garbage-collection interface for a new class. .. note:: Unlike most libcork modules, these macros are **not** automatically defined when you include the ``libcork/core.h`` header file, since they don't include a ``cork_`` prefix. Because of this, we don't want to pollute your namespace unless you ask for the macros. To do so, you must explicitly include their header file:: #include .. macro:: _free_(SYMBOL name) _recurse_(SYMBOL name) These macros declare the *free* and *recurse* methods for a new class. The functions will be declared with exactly the signatures and parameter names shown in the entries for the :c:member:`~cork_gc_obj_iface.free` and :c:member:`~cork_gc_obj_iface.recurse` methods. You will almost certainly not need to refer to the method implementations directly, since you can use the :c:macro:`_gc_*_ <_gc_>` macros below to declare the interface struct. But if you do, they'll be called :samp:`{[name]}__free` and :samp:`{[name]}__recurse`. (Note the double underscore.) .. macro:: _gc_(SYMBOL name) _gc_no_free_(SYMBOL name) _gc_no_recurse_(SYMBOL name) _gc_leaf_(SYMBOL name) Define the garbage-collection interface struct for a new class. If you defined both ``free`` and ``recurse`` methods, you should use the ``_gc_`` variant. If you only defined one of the methods, you should use ``_gc_no_free_`` or ``_gc_no_recurse_``. If you didn't define either method, you should use ``_gc_free_``. Like the method definitions, you probably won't need to refer to the interface struct directly, since you can use the :c:func:`cork_gc_new` macro to allocate new instances of the new class. But if you do, it will be called :samp:`{[name]}__gc`. (Note the double underscore.) As an example, we can use these macros to define a new tree class:: #include struct tree { const char *name; struct tree *left; struct tree *right; }; _free_(tree) { struct tree *self = obj; cork_strfree(self->name); } _recurse_(tree) { struct tree *self = obj; recurse(self->left, ud); recurse(self->right, ud); } _gc_(tree); Allocating new garbage-collected objects ---------------------------------------- In your garbage-collected class's constructor, you must use one of the following functions to allocate the object's storage. (The garbage collector hides some additional state in the object's memory region, so you can't allocate the storage using ``malloc`` or :c:func:`cork_new()` directly.) .. function:: void \*cork_gc_alloc(size_t instance_size, struct cork_gc_obj_iface \*iface) Allocates a new garbage-collected object that is *instance_size* bytes large. *iface* should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort. .. function:: type \*cork_gc_new_iface(TYPE type, struct cork_gc_obj_iface \*iface) Allocates a new garbage-collected instance of *type*. The size of the memory region to allocate is calculated using the ``sizeof`` operator, and the result will be automatically cast to ``type *``. *iface* should be a pointer to a callback interface for the object. If there are any problems allocating the new instance, the program will abort. .. function:: struct name \*cork_gc_new(SYMBOL name) Allocates a new garbage-collected instance of :samp:`struct {[name]}`. (Note that you don't pass in the ``struct`` part of the type name.) We assume that the garbage collection interface was created using one of the :c:macro:`_gc_*_ <_gc_>` macros, using the same *name* parameter. Using these functions, could instantiate our example tree class as follows:: struct tree * tree_new(const char *name) { struct tree *self = cork_gc_new(tree); self->name = cork_strdup(name); self->left = NULL; self->right = NULL; return self; }