Herin, I document Myrddin's standard library. It fits on one page. With time, hopefully the problematic parsimony will be solved.
Memory Management
/* single item allocation */
generic alloc : (-> @a#)
generic zalloc : (-> @a#)
generic free : (v : @a# -> void)
/* slice allocation */
generic slalloc : (len : size -> @a[:])
generic slzalloc : (len : size -> @a[:])
generic slfree : (sl : @a[:], len : size -> @a[:])
/* slice resizing */
generic slgrow : (sl : @a[:], len : size -> @a[:])
generic slzgrow : (sl : @a[:], len : size -> @a[:])
/* byte allocation */
const bytealloc : (sz : size -> byte#)
const bytefree : (m : byte#, sz : size -> void)
The functions in this section deal with memory. By convention, they will raise
an error on failure to allocate memory, instead of trying to deal with the out
of memory condition. On modern operating systems with overcommit, trying to
correctly handle out of memory is futile. By convention, a zero slice [][:]
is always safe to pass to the slice free or grow functions.
alloc
allocates or free a single element of type @a, respectively. zalloc
does the same, but zeros the memory allocated before returning it. free
is
used to return the memory allocated by these functions to the system.
slalloc
allocates or frees a slice of len
items of type @a. slzalloc
does the same, but zeros the memory allocated before returning it. slfree
is used to return the memory to the system.
slgrow
resizes the slize sl
to length len
, allocating the appropriate
amount of memory. slzgrow
does the same, but any elements between the old
and new length are zeroed, as in slzalloc.
bytealloc
and bytefree
are the low level raw-byte allocation interface,
returning blobs of bytes. Since the only way to use more than one of these
bytes is to cast to a different type, it is strongly recommended to avoid
using these functions in user code.
String And Unicode Utils
const strcmp : (a : byte[:], b : byte[:] -> order)
const strncmp : (a : byte[:], b : byte[:], n : size -> order)
const streq : (a : byte[:], b : byte[:] -> bool)
const strfind : (haystack : byte[:], needle : byte[:] -> option(size))
const strcat : (a : byte[:], b : byte[:] -> byte[:])
const strjoin : (strings : byte[:][:], delim : byte[:] -> byte[:])
const strsplit : (s : byte[:], delim : byte[:] -> byte[:][:])
const strstrip : (str : byte[:] -> byte[:])
const strfstrip : (str : byte[:] -> byte[:])
const strrstrip : (str : byte[:] -> byte[:])
These functions are documented twice because, although they are useful utility functions suitable for hash tables and such, they are also string utilities.
strcmp
returns an order on two strings. strcmp
does not handle
normalization or canonicalization. strncmp
is similar, although it only
compares at most the first n
bytes of the strings.
streq
returns true
if the strings would compare equal.
strfind
finds the substring needle
in the string haystack
, and returns
\
Some indexif it finds it, or
`None` if the string is not present.
strcat
will join two strings, allocating a return buffer. It is equivalent
to strjoin([a, b][:], "")
.
strjoin
will take a slice of strings, and join them with the provided delimiter
interjected between each element. So, for example, strjoin(["a", "b", "c"][:], ",")
will return a new heap allocated string containing "a,b,c"
.
strsplit
will split a string on a delimiter. It will allocate a slice to
return the split segments in, however, the split segments will point into the
original string. Duplicated delimiters will return an empty field.
std.strsplit("a,b,,c", ",")
will return the slice ["a","b","","c"][:]
.
strstrip
and friends will strip spaces from a string. strstrip
will remove both leading and trailing spaces. strfstrip
will remove
spaces scanning forward from the start of the string, and strrstrip
will remove spaces scanning in reverse from the end of the string.
These functions return a subslice of the original input buffer, and do not allocate or modify their input.
Unicode types
const Badchar : char = -1 castto(char)
const Maxcharlen : size = 4
const Maxcharval : char = 0x10FFFF
const charlen : (chr : char -> size)
const encode : (buf : byte[:], chr : char -> size)
const decode : (buf : byte[:] -> char)
const striter : (str : byte[:] -> [char, byte[:]])
Badchar
is a value returned by libstd when it cannot decode a character.
Maxcharlen
is the maximum size of a utf8 sequence. Maxcharval
is the
maximum codepoint that is available in Unicode.
charlen
returns the length of the character buffer that would be needed
to encode a character chr
.
encode
and decode
respectively convert a single character to a utf8
encoded byte buffer, and vice versa.
striter
is a slightly tricky to use interface for iterating over unicode
encoded strings. A better API is needed here.
Example
const walkstring = {str
var c
while str.len > 0
(c, str) = std.striter(str)
consume(c)
;;
}
Character Types
const isalpha : (c : char -> bool)
const isnum : (c : char -> bool)
const isalnum : (c : char -> bool)
const isspace : (c : char -> bool)
const islower : (c : char -> bool)
const isupper : (c : char -> bool)
const istitle : (c : char -> bool)
All of the above functions take a char (ie, utf codepoint) and returns a boolean representing whether the predicate holds.
const tolower : (c : char -> char)
const toupper : (c : char -> char)
const totitle : (c : char -> char)
These functions convert characters to upper, lower, or title case respectively.
Because they work on a character by character basis, they are not always correct
for unicode, where, for example, toupper(ß)
should return SS
. However,
because SS
is two codepoints, this API does not work correctly.
System Calls
type pid = int64
type scno = int64 /* syscall */
type fdopt = int64 /* fd options */
type fd = int64 /* fd */
type mprot = int64 /* memory protection */
type mopt = int64 /* memory mapping options */
type socktype = int64 /* socket type */
type sockproto = int64 /* socket protocol */
type sockfam = uint16 /* socket family */
type whence = uint64
type clock = union
`Clockrealtime
`Clockmonotonic
`Clockproccpu
`Clockthreadcpu
`Clockmonotonicraw
`Clockrealtimecoarse
`Clockmonotoniccoarse
`Clockboottime
`Clockrealtimealarm
`Clockboottimealarm
;;
type timespec = struct
sec : uint64
nsec : uint64
;;
type timeval = struct
sec : uint64
usec : uint64
;;
type rusage = struct
utime : timeval /* user time */
stime : timeval /* system time */
;;
type statbuf = struct
dev : uint
ino : uint
mode : uint16
nlink : uint16
uid : uint16
gid : uint16
rdev : uint
size : uint
blksize : uint
blocks : uint
atime : uint
atimens : uint
mtime : uint
mtimens : uint
ctime : uint
ctimens : uint
_unused1: uint
_unused2: uint
;;
type utsname = struct
system : byte[65]
node : byte[65]
release : byte[65]
version : byte[65]
machine : byte[65]
domain : byte[65]
;;
type sockaddr = struct
fam : sockfam
data : byte[14]
;;
type sockaddr_in = struct
fam : sockfam
port : uint16
addr : byte[4]
zero : byte[8]
;;
type sockaddr_storage = struct
fam : sockfam
__align : uint32
__pad : byte[112]
;;
These types represent the arguments and return values for a large number of system calls. They are used in the functions listed below.
extern const syscall : (sc:scno, args:... -> int64)
Syscall() takes a system call number, arguments, returning -errno. It's up to the caller to match the correct ABI.
/* process management */
const exit : (status:int -> void)
const getpid : ( -> int64)
const kill : (pid:int64, sig:int64 -> int64)
const fork : (-> int64)
const wait4 : (pid:int64, loc:int32#, opt : int64, usage:rusage# -> int64)
const waitpid : (pid:int64, loc:int32#, opt : int64 -> int64)
const execv : (cmd : byte[:], args : byte[:][:] -> int64)
const execve : (cmd : byte[:], args : byte[:][:], env : byte[:][:] -> int64)
/* fd manipulation */
const open : (path:byte[:], opts:fdopt, mode:int64 -> fd)
const close : (fd:fd -> int64)
const creat : (path:byte[:], mode:int64 -> fd)
const read : (fd:fd, buf:byte[:] -> size)
const write : (fd:fd, buf:byte[:] -> size)
const lseek : (fd:fd, off:uint64, whence:int64 -> int64)
const stat : (path:byte[:], sb:statbuf# -> int64)
const fstat : (fd:fd, sb:statbuf# -> int64)
const mkdir : (path : byte[:], mode : int64 -> int64)
/* networking */
const socket : (dom : sockfam, stype : socktype, proto : sockproto -> fd)
const connect : (sock : fd, addr : sockaddr#, len : size -> int)
const accept : (sock : fd, addr : sockaddr#, len : size# -> fd)
const listen : (sock : fd, backlog : int -> int)
const bind : (sock : fd, addr : sockaddr#, len : size -> int)
/* memory mapping */
const munmap : (addr:byte#, len:size -> int64)
const mmap : (addr:byte#, len:size, prot:mprot, flags:mopt, fd:fd, off:off -> byte#)
/* time */
const clock_getres : (clk : clock, ts : timespec# -> int32)
const clock_gettime : (clk : clock, ts : timespec# -> int32)
const clock_settime : (clk : clock, ts : timespec# -> int32)
const sleep : (time : uint64 -> int32)
const nanosleep : (req : timespec#, rem : timespec# -> int32)
/* system information */
const uname : (buf : utsname# -> int)
All of these functions match the standard system calls documented in syscall-name(2). These are merely Myrddin wrappers for them, which will call the 'syscall()' function with the appropriate arguments. They may do some transformation to the arguments, or in some cases, wrap other system calls for cross platform compatibility. For example, Darwin (ie, OSX) does not have a uname() system call, so this code makes the appropriate sysctl requests to fill the data structure.
const Sys$(syscall_name) = ...
All the system call numbers are exposed from libstd. These system calls are platform specific, and probably should not be used directly.
Networking
DNS Resolution
type rectype = uint16
const DnsA : rectype = 1 /* host address */
const DnsNS : rectype = 2 /* authoritative name server */
const DnsMD : rectype = 3 /* mail destination (Obsolete - use MX) */
const DnsMF : rectype = 4 /* mail forwarder (Obsolete - use MX) */
const DnsCNAME : rectype = 5 /* canonical name for an alias */
const DnsSOA : rectype = 6 /* marks the start of a zone of authority */
const DnsMB : rectype = 7 /* mailbox domain name (EXPERIMENTAL) */
const DnsMG : rectype = 8 /* mail group member (EXPERIMENTAL) */
const DnsMR : rectype = 9 /* mail rename domain name (EXPERIMENTAL) */
const DnsNULL : rectype = 10 /* null RR (EXPERIMENTAL) */
const DnsWKS : rectype = 11 /* well known service description */
const DnsPTR : rectype = 12 /* domain name pointer */
const DnsHINFO : rectype = 13 /* host information */
const DnsMINFO : rectype = 14 /* mailbox or mail list information */
const DnsMX : rectype = 15 /* mail exchange */
const DnsTXT : rectype = 16 /* text strings */
const DnsAAAA : rectype = 28 /* ipv6 host address */
type resolveerr = union
`Badhost
`Badsrv
`Badquery
`Badresp
;;
type hostinfo = struct
fam : sockfam
stype : socktype
ttl : uint32
addr : netaddr
/*
flags : uint32
addr : sockaddr[:]
canon : byte[:]
*/
;;
const resolve : (host : byte[:] -> error(hostinfo[:], resolveerr))
const resolvemx : (host : byte[:] -> error(hostinfo[:], resolveerr))
const resolverec : (host : byte[:], t : rectype -> error(hostinfo[:], resolveerr))
DNS is implemented in libstd through the resolve() family of functions. 'resolve()' defaults to looking up A records, although when IPv6 support lands fully, it will return both A and AAAA records.
resolvemx() will look up an MX record, and resolverec() will resolve any record type passed to it. This API will need to change in order to return all useful and relevant information.
Dialing Hosts
const dial : (dialstr : byte[:] -> error(fd, byte[:]))
Dial wraps up the ugliness of raw BSD sockets, allowing you to connect to a host with a Plan 9 style dial string, of the form::
"proto!host-or-ip!port"
Protocol is one of tcp
, udp
, unix
. No other protocols are currently supported, although adding
support would be a good thing. Host-or-ip currently only supports DNS A records or IPv4 IP addresses. Ports
may be any name in /etc/services, or a numeric port number.
Extrema
generic min : (a : @a::numeric, b : @a::numeric -> @a::numeric)
generic max : (a : @a::numeric, b : @a::numeric -> @a::numeric)
As one would expect, min() returns the minimum of it's two arguments, and max() returns the maximum of the two arguments, as determined by the '<' and > operators respectively. If the two values are equal, the first argument will be returned.
Slice Utilities
generic slcp : (dst : @a[:], src : @a[:] -> void)
generic sldup : (sl : @a[:] -> @a[:])
generic sleq : (a : @a[:], b : @a[:] -> bool)
generic slfill : (sl : @a[:], v : @a -> @a[:])
generic sljoin : (dst : @a[:], src : @a[:] -> @a[:])
generic slpush : (sl : @a[:], elt : @a -> @a[:])
generic slinsert : (sl : @a[:], idx : size, elt : @a -> @a[:])
These utility functions are used for manipulating slices of any type. By convention, if they return a value, that value is allocated on the heap. Any exceptions will be noted.
slcp
copies elements from slice src
to slice dst
with a shallow
copy. Both dst and src must be the same length, or the program will abort.
sldup
duplicates a slice, allocating it on the heap with slalloc(), and
then copying the elements over with
slcp()`.
sleq
compares two slices for equality, using the ==
operator elementwise.
slfill
fills all elements in a slice with the value provided. This is useful
to, for example, fill an array with default elements.
sljoin
will take a slice dst
, and append src
to it destructively.
slpush
will take a slice sl
, and append elt
to it destructively, growing
if needed.
slinsert
will take an element elt
and insert it at an arbitrary index
idx
, shifting the other elements downwards in the array.
Misc Utility Functions
type order = union
`Before
`Equal
`After
;;
/* comparison functions */
generic numcmp : (a : @a, b : @a -> order)
const strcmp : (a : byte[:], b : byte[:] -> order)
const strncmp : (a : byte[:], b : byte[:], n : size -> order)
/* hash functions */
const strhash : (s : byte[:] -> uint32)
generic ptrhash : (p : @a# -> uint32)
generic inthash : (v : @a::(integral,numeric) -> uint32)
/* equality functions */
generic inteq : (a : @a::(integral,numeric), b : @a::(integral,numeric) -> bool)
const streq : (a : byte[:], b : byte[:] -> bool)
generic ptreq : (a : @a#, b : @a# -> bool)
These utility functions are designed to be used as arguments to a variety of other functions, for example, generic algorithms, hash tables, and similar. They are implemented in a suitable way for some more common data types that a Myrddin program will use.
The comparison functions define an order on their two arguments, and will
return \
Beforeif
a < b,
`Equalif
a == b, and
`Afterif
a > b`.
The hash functions will return a 32 bit, non cryptographic hash usable for hash tables, sets, and other program-internal hashing operations.
The equality functions will return true
if a == b
, and false
in all other cases.
Generic Algorithms
Sorting
generic sort : (sl:@a[:], cmp:(a:@a, b:@a -> order) -> @a[:])
sort()
is a generic sort function that orders based on a user provided
comparison function cmp
. This function must return an order, where for
any two elements, cmp(a, b)
will return \
Beforeif
a < b,
`Equalif
a == b, and
`Afterif
a > b`. This ordering must
be transitive, or algorithms will fail.
Error Handling
type error(@a, @b) = union
`Success @a
`Failure @b
;;
type option(@a) = union
`Some @a
`None
;;
Error handling in Myrddin is limited, and library support is merely down to
providing two types for it. error(@a, @b)
is used for where you want to
indicate failure with a value. option(@a)
is used for where you want
to indicate that a value may or may not be present.
Option Parsing
type optctx = struct
/* public variables */
args : byte[:][:]
;;
const optinit : (optstr: byte[:], optargs : byte[:][:] -> optctx#)
const optnext : (ctx : optctx# -> [char, byte[:]])
const optdone : (ctx : optctx# -> bool)
const optfin : (ctx : optctx# -> byte[:][:])
optctx
functions are used for parsing command line arguments. They take an
option string in the style of getopt(3). In other words, they take a string
of characters abcde:f?g:
. Each character represents a valid option, except
for the two metacharacters :
and ?
. These represent a mandatory or
optional argument, respectively.
Option parsing stops at the first --
but other than that, all options are
order independent, may be repeated, and may be seen at any point in the program.
Arguments that do not take arguments may be glommed together into a single argument.
If an optional argument is not present, it is represented by an empty string. This
should be fixed so that an option type is used.
For example, with the option string "abc:d?
, the following parses will
result: ::
-ab => -a -b
-a -b => -a -b
-abc => Error, 's' has no argument
-abcd => -a -b -c "d"
-abc str => -a -b -c "str"
-ab -c str => -a -b -c "str"
-ab -c str -d => -a -b -c "str"
-ab -c str -d => -a -b -c "str" -d
-ab -c str -d xyz => -a -b -c "str" -d "xyz"
An example of the usage of these functions is below:
use std
const main = {args
var optctx
optctx = std.optbegin("abc:d?", args)
while !std.optdone(optctx)
match std.optnext(optctx)
| ('a', _): std.put("Got option 'a'\n")
| ('b', _): std.put("Got option 'b'\n")
| ('c', str): std.put("Got option 'c' with arg %s\n", str)
| ('d', str): std.put("Got option 'c' with arg %s\n", str)
;;
;;
}
Variadic Functions
const vastart : (args : ...* -> valist)
generic vanext : (ap : valist -> [@a, valist])
These functions are used to iterate through a variadic argument list. They have no type safety, and no good error handling. They can crash your program or silently pull out incorrect values if used incorrectly, and will be a pain to debug. There are a number of ideas to fix it so that they are at least runtime type safe, but for now, this is what you got.
vastart
converts a variadic argument list type (...
) to a valist
type,
which can be iterated for arguments. vanext
will iterate through this, pulling
one value out of the argument list, and returning it along side the rest of the list.
A usage example is below:
use std
const main = {
/* easy as... */
f(1, 2, 3, 'a', 'b', 'c')
}
const f = {args : ...
var ival : int
var cval : char
var ap
ap = std.vastart(&args)
/* 3 integers */
(ival, ap) = vanext(ap)
std.put("first int: %i\n", ival)
(ival, ap) = vanext(ap)
std.put("second int: %i\n", ival)
(ival, ap) = vanext(ap)
std.put("third int: %i\n", ival)
/* 3 chars */
(cval, ap) = vanext(ap)
std.put("first char: %c\n", cval)
(cval, ap) = vanext(ap)
std.put("second char: %c\n", cval)
(cval, ap) = vanext(ap)
std.put("third char: %c\n", cval)
}
Data Structures: Hash Tables
type htab(@k, @v)
generic mkht : (h : (k : @k -> uint32), eq : (a : @k, b : @k -> bool) -> htab(@k, @v)#)
generic htfree : (ht : htab(@k, @v)# -> void)
generic htput : (ht : htab(@k, @v)#, k : @k, v : @v -> void)
generic htdel : (ht : htab(@k, @v)#, k : @k -> void)
generic htget : (ht : htab(@k, @v)#, k : @k -> option(@v))
generic htgetv : (ht : htab(@k, @v)#, k : @k, fallback : @v-> @v)
generic hthas : (ht : htab(@k, @v)#, k : @k -> bool)
generic htkeys : (ht : htab(@k, @v)# -> @k[:])
mkht
and htfree
respectively allocate and free a hash table. The hash
table will use the hash and comparsion functions provided. It does not
allocate or free nodes.
htput
, htdel
, hthas
, htget
, and htgetv
, respectively insert, delete,
query the presence of, and retrieve keys from the hash table. htgetv
allows
retrieval from the hash table that will never fail, always returning a
fallback.
htkeys
will return a list of all keys currently in the hash table, allowing
iteration over them.
Example
use std
const main = {
var ht : std.htab(byte[:], int)#
var keys
ht = std.mkht(std.strhash, std.streq)
std.htput(ht, "abc", 123)
std.htput(ht, "xyz", 987)
if !std.hthas(ht, "abc")
std.die("our hash table is broken")
;;
match std.htget(ht, "xyz")
| `std.Some v: std.put("Got val %i\n", val)
| `std.None: std.put("No value present\n")
;;
std.put("Defaulted value: %i\n", std.htgetv(ht, "nope", 42))
keys = std.htkeys(ht)
for k in keys
std.put("%s -> %i\n", k, std.htgetv(ht, k, -1))
;;
}
Data Structures: Bit Sets
type bitset
const mkbs : (-> bitset#)
const bsfree : (bs : bitset# -> void)
const bsdup : (bs : bitset# -> bitset#)
generic bsput : (bs : bitset#, v : @a::(integral,numeric) -> void)
generic bsdel : (bs : bitset#, v : @a::(integral,numeric) -> void)
generic bshas : (bs : bitset#, v : @a::(integral,numeric) -> bool)
const bsdiff : (a : bitset#, b : bitset# -> void)
const bsintersect : (a : bitset#, b : bitset# -> void)
const bsunion : (a : bitset#, b : bitset# -> void)
const bseq : (a : bitset#, b : bitset# -> bool)
const bsissub : (set : bitset#, sub : bitset# -> bool)
const bsclear : (bs : bitset# -> bitset#)
const bscount : (bs : bitset# -> bitset#)
Libstd ships with an implementation of simple bitsets. These bitsets perform best on relatively low valued, densely packed sets of integers, and take space proportional to the highest value inserted into them. They do no compression.
mkbs
, and bsfree
, respectively, allocate and free bitsets. bsdup
duplicates a bitset, which must be bsfree
ed after use.
bsput
, bsdel
, and bshas
respectively insert, remove, and query
for the presence of a value in the bitset.
bsdiff
, bsintersect
, and bsunion
perform the set operations that
you would expect them to, in O(n) time.
bseq
checks to see if a
is equal to b, while
bsissubchecks
if
subis a subset of
set`.
bsclear
clears all set values in bs
, returning an empty set.
bscount
iterates over all the items in set
, returning the
total count of items.