Introduction to Libds
libds is a simple, small data structure library with a number of commonly used data structures that I have found useful when writing code in the past. It includes:
- Hash tables
- Bit sets
- Linked lists
- A custom slab allocator
- Memory allocation utilities
- And not too much else, at the moment, although I'd be happy to grow it as I find myself needing more data structures.
While it is not a thread safe library, all APIs are reentrant, which means that if as long as accesses to the data structures are not shared across threads, it is safe for multiple threads to access the APIs concurrently.
Getting The Source
The code lives in Eigenstate's Git repository
git clone git://git.eigenstate.org/git/ori/libds.git
Which is browsable online at http://git.eigenstate.org/ori/libds.git
In order to install, you use the same configure scripts as Automake would want, although this code uses a simple custom makefile.
cd libds/
./configure
make
make install
It uses the standard pkgconfig machinery to make it easy to integrate with most build systems. The pkg-config name is libds-1.0. In the most basic example, building from the command line should be as simple as this:
cc -o my-binary my-source.c `pkg-config --cflags --libs libds-1.0`
Slab Allocator
Source: ds-alloc.c
typedef struct _DSAlloc DSAlloc;
DSAlloc *ds_allocatornew (size_t sz);
void *ds_alloc (DSAlloc *a);
void *ds_zalloc (DSAlloc *a);
void ds_free (DSAlloc *a, void *dat);
DSAlloc *ds_allocatornew(size_t sz)
will create a new allocator, that hands
out objects of size sz. All objects allocated from within an allocator are of
the same size, which allows the allocations to happen with very little waste,
very little per object overhead, and a high resistance to fragmentation. This
means, however, that you need a separate allocator for each object size.
Benchmarking I did in 2010 showed that the performance for this allocator was roughly double that of glibc's malloc() for small object sizes, with far less tracking overhead. However, this is due to the specificity of the API, and not due to any magical optimization that is done in this library.
void *ds_alloc(DSAlloc *a)
will create a new value from the allocator a,
much like malloc. The size is determined by the allocator that is passed to
the function.
void *ds_zalloc(DSAlloc *a)
is similar to dsalloc, except that it returns a
zeroed structure. void dsfree(DSAlloc *a, void *dat) frees a value that was
allocated from a. The allocator that you free on must be the one that the
allocation initially came from.
Example
#include <ds.h>
struct Stuff {
int value;
int another_value;
};
int main(int int argc, char **argv)
{
DSAlloc *listnode_alloc;
ListNode *head, *n;
size_t i;
head = NULL;
listnode_alloc = ds_allocatornew(sizeof(ListNode));
for (i = 0; i < 1000; i++) {
n = ds_alloc(listnode_alloc);
ds_free(listnode_alloc);
}
}
Bitsets
Source: ds-bitset.c
typedef struct DSBitset DSBitset;
/* creation */
DSBitset *ds_bsnew (void);
void ds_bsfree (DSBitset *bs);
DSBitset *ds_bsdup (DSBitset *bs);
/* manipulation */
DSBitset *ds_bsclear (DSBitset *bs);
size_t ds_bscount (DSBitset *bs);
int ds_bsiter (DSBitset *bs, size_t *elt);
/* insertion */
void ds_bsput (DSBitset *bs, size_t elt)
void ds_bsdel (DSBitset *bs, size_t elt)
int ds_bshas (DSBitset *bs, size_t elt);k
/* set operations */
void ds_bsdiff (DSBitset *a, DSBitset *b);
void ds_bsintersect(DSBitset *a, DSBitset *b)
void ds_bsunion (DSBitset *a, DSBitset *b);
/* predicates */
int ds_bseq (DSBitset *a, DSBitset *b)
int ds_bsissubset (DSBitset *set, DSBitset *sub)
ds_bsnew
, ds_bsdup
, and ds_bsfree
will respectively create a new bit set, duplicate an existing bit set, and free the storage associated with a bitset
ds_bsclear
will take an existing bitset, and clear it's contents.
ds_bscount
will count the number of elements that exist in the bitset. This is an O(n) operation.
ds_bsiter
is a slightly tricky function to iterate over the contents of a
bitset. It returns true immediately if 'elt' is in the bitset, and does not
modify the value. Otherwise it seeks forward to the next value held in the
bitset and stores it in elt. If there are no more values, it returns false to
stop iteration. This means that you need to increment elt every time you
iterate through the loop. Example usage is below:
for (size_t i = 0; bsiter(set, &i); i++)
use(i);
ds_bsput
and ds_bsdel
respectively insert or remove elt from the set bs
`ds_bshas returns 1 if the element elt is in the set bs, or 0 if it is not.
dsbsdiff, dsbsintersect, and ds_bsunion respectively find the set difference, intersection, or union between the two bitsets a and b
dsbseq and dsbsissubset return 1 if the sets are equal or subsets, respectively, and 0 if they are not.
Example
#include <ds.h>
#include <assert.h>
int main(int argc, char **argv)
{
DSBitset *bs, *other;
bs = ds_bsnew()
ds_bsput(bs, 123)
assert(ds_bshas(bs, 123))
other = ds_bsdup(bs)
ds_bsput(bs, 234)
ds_bsdiff(bs, other);
}
Hash Tables
source: ds-htab.c
typedef struct _DSHtab DSHtab;
DSHtab *ds_hnew (DSHashFn hash, DSCmpFn cmp_key);
DSHtab *ds_hnewhold (DSHashFn hash, DSCmpFn cmp_key,
DSHoldFn holdk, DSFreeFn freek,
DSHoldFn holdv, DSFreeFn freev);
void ds_hfree (DSHtab *ht);
void ds_hput (DSHtab *ht, void *key, void *data);
void ds_hdel (DSHtab *ht, void *key);
void ds_hset (DSHtab *ht, void *key, void *data);
void *ds_hget (DSHtab *ht, void *key);
int ds_hhas (DSHtab *ht, void *key);
DSHtab *ds_hclone (DSHtab *ht);
void **ds_hkeys (DSHtab *ht);
void **ds_hvals (DSHtab *ht);
int ds_hcount (DSHtab *ht);
Documentation for each function
DSHtab *ds_hnew (DSHashFn hash, DSCmpFn cmp_key);
DSHtab *ds_hnewhold (DSHashFn hash, DSCmpFn cmp_key,
DSHoldFn holdk, DSFreeFn freek,
DSHoldFn holdv, DSFreeFn freev);
void ds_hfree (DSHtab *ht);
These functions create a new hash table. Both variants take both a hash function 'hash' and a comparison function 'cmpkey'. These functions are called to hash and compare the functions. A number of useful hash functions such as dsstrhash, ds_ptrhash, and so on are included in the library.
The ds_hnewhold() hash table constructor takes an additional four function pointers: holdk, freek, holdv, freev. These function pointers are used for memory management, holding and releasing a value. They may be called multiple times per value, so they should be implemented with something along the lines of refcounting.
dshnew() will not call any retain or release functions, and is equivalent to dshnewhold(cmp, eq, NULL, NULL, NULL, NULL)
ds_hfree() releases the storage associated with the hash table, and will call the release functions on each data element still in the hash table, if provided.
ds_hput
, ds_hdel
, ds_hset
, ds_hget
, and ds_hhas
insert, remove, and
query the hash table for elements. If hold and free functions are defined,
these will be called on insertion and removal of elements from the hash table.
ds_hput
assumes that the entry being inserted is not already in the table.
use ds_hset if you want to insert duplicate elements.
ds_hclone
will duplicate a hash table, calling the hold functions on the
data elements if provided.
ds_hkeys
and ds_hvals
will return a copy of the keys and values,
respectively, in the hash table. The array returned is allocated with
'malloc()' and must be freed.
ds_hcount
counts the number of entries in the hash table.
Utility Hash Functions
uint32_t ds_strhash (void *data);
int ds_streq (void *a, void *b);
uint32_t ds_ptrhash (void *ptr);
int ds_ptreq (void *a, void *b);
uint32_t ds_directhash (void *p);
int ds_directeq (void *a, void *b);
These functions implement hashes for common data types, suitable for use in the hash table implementation above.
Linked Lists
typedef struct _DSList DSList;
struct _DSList {
DSList *prev;
DSList *next;
void *data;
};
typedef void (*DSListMapFn)(void *cont, void *data);
typedef void *(*DSListEachFn)(void *cont, void *data);
DSList *ds_lmap (DSList *head, DSListMapFn fn, void *data);
void ds_lforeach (DSList *head, DSListIterFn fn, void *data);
DSList *ds_linsert (DSList *head, void *data);
DSList *ds_lpreinsert (DSList *head, void *data);
DSList *ds_lprepend (DSList *head, void *data);
DSList *ds_lappend (DSList *head, void *data);
DSList *ds_ldup (DSList *head);
DSList *ds_ljoin (DSList *a, DSList *b);
DSList *ds_ltappend (DSList *head, DSList **tail, void *data);
DSList *ds_lcmpsearch (DSList *haystack, void *needle, DSCmpFn cmp);
DSList *ds_lsearch (DSList *haystack, void *needle);
int ds_llen (DSList *l);
DSList *ds_ltail (DSList *l);
DSList *ds_ldel (DSList *l, void *dat);
DSList *ds_lndel (DSList *l, DSList *n);
DSList *ds_lndel_full (DSList *l, DSList *n, DSFreeFn fnc);
void ds_lfree (DSList *l);
void ds_lfree_full (DSList *l, DSFreeFn fnc);
The data structure 'DSList' is publically exposed, and it's members may be touched directly for convenience.
The empty list is represented by NULL, and any place a list is passed, NULL is a valid value.
ds_lmap
and ds_lforeach
and the associated function pointer types are used
for iterating and mapping over the list elements. For the most part, I find
that it's more idiomatic in C to just loop over the lists, though.
ds_linsert
and ds_lpreinsert
insert an element before and after the node
'head', respectively. Note that the node 'elt' is not necessarily the head of
the list, and you may insert into the list at any point with these functions.
These functions return the new head of the sublist produced.
ds_lprepend
and ds_lappend
are similar in concept to their friends
dslinsert() and dslprepend() above, however, they will seek to the start or
end of the list respectively before doing the insert. They return the new head
of the list.
ds_ltappend
appends a value into the list 'head'. If 'tail' is passed and
is non-null, this function assumes it is the actual tail of the list, and
inserts after it. After inserting, it updates the tail pointer to the newly
appended node. This allows for O(1) list appends. If tail is null, then this
function seeks to the end of the list in O(n) time, and appends as normal.
ds_ldup
will duplicate a list.
ds_ljoin
will append list 'b' to the end of list 'a'. It does not copy the
nodes in 'b', not does it seek to the head of it. It does, however, seek to
the end of list 'a'.
ds_lcmpsearch
and ds_lsearch
iterate through the entire list searching for
a value. dslsearch() searches for the node containing a pointer by value.
dslcmpsearch() takes a function pointer which it uses to compare the values.
ds_llen
counts the number of nodes in the list in O(n) time.
ds_ltail
finds the tail of the list 'l' in O(n) time.
ds_ldel
, ds_lndel
, and ds_lndel_full
remove links from lists. dslndel()
will simply unlink the node, splicing the list back around it. dsldel() will
search the list by pointer value, similar to dslsearch(), and when it finds
the node it will unlink it, as in dslndel.
. ds_lndel_full
is similar to ds_lndel, however, it takes a function to
call over the data element in order to free it.
ds_lfree
and ds_lfree_full
will free a full list. Similar to dslndel and
dslndelfull, dslfree will free only the node data, while dslfreefull will
take a free function to call over each node in turn.
Malloc utility functions
void *ds_zmalloc(size_t sz);
void *ds_xmalloc(size_t sz);
void *ds_zrealloc(void *mem, size_t oldsz, size_t sz);
void *ds_xrealloc(void *mem, size_t sz);
These are utility wrappers for malloc(), calloc(), and realloc() that will abort on out of memory. Since most programs can't actually do anything sane on out of memory, and modern OSes overcommit anyways, making testing the failure paths nearly impossible, this is usually the right thing to do.
Memory allocated with these functions comes from malloc internally, and should be freed with free()
dsxmalloc() is similar to malloc: you give it a size, it gives you a chunk of memory. dszmalloc is similar to ds_malloc, only it clears the memory.
dsxrealloc is similar to realloc: you give it a valid pointer (or NULL), and it will return a resized chunk of memory, or NULL if the size of memory is zero. In other words, dsxrealloc(NULL, size) is equivalent to dsxalloc(size), and dsxrealloc(ptr, NULL) is equivalent to free(ptr).
dszrealloc() is similar to dsxrealloc(), only the values between oldsz and sz are all set to zero.