++The string buffer library allows high-performance manipulation of ++string-like data. ++
++++Unlike Lua strings, which are constants, string buffers are ++mutable sequences of 8-bit (binary-transparent) characters. Data ++can be stored, formatted and encoded into a string buffer and later ++converted, extracted or decoded. ++
++++The convenient string buffer API simplifies common string manipulation ++tasks that would otherwise require creating many intermediate strings. ++String buffers improve performance by eliminating redundant memory ++copies, object creation, string interning and garbage collection ++overhead. In conjunction with the FFI library, they allow zero-copy ++operations. ++
++++The string buffer library also includes a high-performance ++serializer for Lua objects. ++
++ ++Using the String Buffer Library
++++The string buffer library is built into LuaJIT by default, but it's not ++loaded by default. Add this to the start of every Lua file that needs ++one of its functions: ++
++
++local buffer = require("string.buffer")
++
++++The convention for the syntax shown on this page is that buffer ++refers to the buffer library and buf refers to an individual ++buffer object. ++
++++Please note the difference between a Lua function call, e.g. ++buffer.new() (with a dot) and a Lua method call, e.g. ++buf:reset() (with a colon). ++
++ ++Buffer Objects
++++A buffer object is a garbage-collected Lua object. After creation with ++buffer.new(), it can (and should) be reused for many operations. ++When the last reference to a buffer object is gone, it will eventually ++be freed by the garbage collector, along with the allocated buffer ++space. ++
++++Buffers operate like a FIFO (first-in first-out) data structure. Data ++can be appended (written) to the end of the buffer and consumed (read) ++from the front of the buffer. These operations may be freely mixed. ++
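The FIFO behavior can be seen in a short sketch. The variable names and sample data are illustrative only:

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("hello", " ", "world")  -- Append (write) at the end.
local head = buf:get(5)         -- Consume (read) 5 bytes from the front.
buf:put("!")                    -- Writes and reads may be freely mixed.
local rest = buf:get()          -- Consume whatever is left.
```

Here `head` receives the first five bytes, `rest` receives everything after them (including the later append), and the buffer is empty again afterwards.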
++++The buffer space that holds the characters is managed automatically: ++it grows as needed and already consumed space is recycled. Use ++buffer.new(size) and buf:free() if you need more ++control. ++
++++The maximum size of a single buffer is the same as the maximum size of a ++Lua string, which is slightly below two gigabytes. For huge data sizes, ++neither strings nor buffers are the right data structure — use the ++FFI library to directly map memory or files up to the virtual memory ++limit of your OS. ++
++ ++Buffer Method Overview
++
- ++The buf:put*()-like methods append (write) characters to the ++end of the buffer. ++ ++
- ++The buf:get*()-like methods consume (read) characters from the ++front of the buffer. ++ ++
- ++Other methods, like buf:tostring(), only read the buffer ++contents and don't change the buffer. ++ ++
- ++The buf:set() method allows zero-copy consumption of a string ++or an FFI cdata object as a buffer. ++ ++
- ++The FFI-specific methods allow zero-copy read/write-style operations or ++modifying the buffer contents in-place. Please check the ++FFI caveats below, too. ++ ++
- ++Methods that don't need to return anything specific return the buffer ++object itself as a convenience. This allows method chaining, e.g.: ++buf:reset():encode(obj) or buf:skip(len):get() ++ ++
++ ++Buffer Creation and Management
++ ++local buf = buffer.new([size [,options]])
++local buf = buffer.new([options])
++++Creates a new buffer object. ++
++++The optional size argument ensures a minimum initial buffer ++size. This is strictly an optimization when the required buffer size is ++known beforehand. The buffer space will grow as needed, in any case. ++
++++The optional table options sets various ++serialization options. ++
++ ++buf = buf:reset()
++++Reset (empty) the buffer. The allocated buffer space is not freed and ++may be reused. ++
++ ++buf = buf:free()
++++The buffer space of the buffer object is freed. The object itself ++remains intact, empty and may be reused. ++
++++Note: you normally don't need to use this method. The garbage collector ++automatically frees the buffer space when the buffer object is ++collected. Use this method if you need to free the associated memory ++immediately. ++
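As a rough illustration of reuse, the sketch below keeps one buffer around and resets it for each call. The helper name `join_csv` and the initial size of 256 are assumptions made up for this example:

```lua
local buffer = require("string.buffer")

-- One buffer, created once with a size hint and reused for every call.
local buf = buffer.new(256)

-- Hypothetical helper: joins the array elements with commas.
local function join_csv(fields)
  buf:reset()  -- Empty the buffer, but keep its allocated space.
  for i, field in ipairs(fields) do
    if i > 1 then buf:put(",") end
    buf:put(field)
  end
  return buf:tostring()
end

local row1 = join_csv({ "a", "b", "c" })
local row2 = join_csv({ "x", 1 })

-- Only needed if the memory should be released right away.
buf:free()
```

The buffer object stays usable after `buf:free()`; it is simply empty and will allocate space again on the next write.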
++ ++Buffer Writers
++ ++buf = buf:put([str|num|obj] [,…])
++++Appends a string str, a number num or any object ++obj with a __tostring metamethod to the buffer. ++Multiple arguments are appended in the given order. ++
++++Appending a buffer to a buffer is possible and short-circuited ++internally. But it still involves a copy. It's better to combine the buffer ++writes so that they use a single buffer. ++
++ ++buf = buf:putf(format, …)
++++Appends the formatted arguments to the buffer. The format ++string supports the same options as string.format(). ++
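A small sketch of both writers follows. The `point` object and the field names are invented for the example; only `put` and `putf` come from the library:

```lua
local buffer = require("string.buffer")

-- Hypothetical object with a __tostring metamethod.
local point = setmetatable({ x = 1, y = 2 }, {
  __tostring = function(p) return "(" .. p.x .. "," .. p.y .. ")" end,
})

local buf = buffer.new()
buf:put("id=", 42, " at=", point, ";")  -- String, number and object.
buf:putf("%s=%05.1f;", "temp", 21.5)    -- Same options as string.format().
local line = buf:tostring()
```

All pieces land in the same buffer, so no intermediate strings are created for the individual arguments.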
++ ++buf = buf:putcdata(cdata, len) (FFI)
++++Appends the given len number of bytes from the memory pointed ++to by the FFI cdata object to the buffer. The object needs to ++be convertible to a (constant) pointer. ++
++ ++buf = buf:set(str)
++buf = buf:set(cdata, len) (FFI)
++++This method allows zero-copy consumption of a string or an FFI cdata ++object as a buffer. It stores a reference to the passed string ++str or the FFI cdata object in the buffer. Any buffer ++space originally allocated is freed. This is not an append ++operation, unlike the buf:put*() methods. ++
++++After calling this method, the buffer behaves as if ++buf:free():put(str) or buf:free():putcdata(cdata, len) ++had been called. However, the data is only referenced and not copied, as ++long as the buffer is only consumed. ++
++++In case the buffer is written to later on, the referenced data is copied ++and the object reference is removed (copy-on-write semantics). ++
++++The stored reference is an anchor for the garbage collector and keeps the ++originally passed string or FFI cdata object alive. ++
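The following sketch shows `buf:set()` replacing the buffer contents and a later write triggering the copy described above. The sample strings are made up:

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("stale data")
buf:set("key=value")       -- Not an append: the old contents are gone.
local key = buf:get(3)     -- Consuming only reads the referenced string.
buf:skip(1)                -- Skip the "=" separator.
local value = buf:get()

buf:set("abc"):put("def")  -- Writing after set() copies the referenced data.
local combined = buf:tostring()
```

Whether a copy happens is invisible from Lua; the observable behavior is the same as with a regular buffer.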
++ ++ptr, len = buf:reserve(size) (FFI)
++buf = buf:commit(used) (FFI)
++++The reserve method reserves at least size bytes of ++write space in the buffer. It returns an uint8_t * FFI ++cdata pointer ptr that points to this space. ++
++++The available length in bytes is returned in len. This is at ++least size bytes, but may be more to facilitate efficient ++buffer growth. You can either make use of the additional space or ignore ++len and only use size bytes. ++
++++The commit method appends the used bytes of the ++previously returned write space to the buffer data. ++
++++This pair of methods allows zero-copy use of C read-style APIs: ++
++
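++-- Assumes: local C = require("ffi").C, read() declared via ffi.cdef,
++-- and fd is an open file descriptor.
++local MIN_SIZE = 65536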
++repeat
++ local ptr, len = buf:reserve(MIN_SIZE)
++ local n = C.read(fd, ptr, len)
++ if n == 0 then break end -- EOF.
++ if n < 0 then error("read error") end
++ buf:commit(n)
++until false
++
++++The reserved write space is not initialized. At least the ++used bytes must be written to before calling the ++commit method. There's no need to call the commit ++method if nothing is added to the buffer (e.g. on error). ++
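For a version that runs without a file descriptor, here is a minimal sketch that fills the reserved space with `ffi.copy()` instead of a C read call. The source string is an arbitrary stand-in for data produced by C code:

```lua
local buffer = require("string.buffer")
local ffi = require("ffi")

local buf = buffer.new()
local src = "zero-copy"

-- Ask for at least #src bytes; len may be larger than requested.
local ptr, len = buf:reserve(#src)
ffi.copy(ptr, src, #src)  -- Write into the uninitialized space.
buf:commit(#src)          -- Only the bytes actually written are appended.

local text = buf:tostring()
```

Only the committed bytes become part of the buffer data; any extra reserved space is ignored.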
++ ++Buffer Readers
++ ++len = #buf
++++Returns the current length of the buffer data in bytes. ++
++ ++res = str|num|buf .. str|num|buf […]
++++The Lua concatenation operator .. also accepts buffers, just ++like strings or numbers. It always returns a string and not a buffer. ++
++++Note that although this is supported for convenience, this thwarts one ++of the main reasons to use buffers, which is to avoid string ++allocations. Rewrite it with buf:put() and buf:get(). ++
++++Mixing this with unrelated objects that have a __concat ++metamethod may not work, since these probably only expect strings. ++
++ ++buf = buf:skip(len)
++++Skips (consumes) len bytes from the buffer up to the current ++length of the buffer data. ++
++ ++str, … = buf:get([len|nil] [,…])
++++Consumes the buffer data and returns one or more strings. If called ++without arguments, the whole buffer data is consumed. If called with a ++number, up to len bytes are consumed. A nil argument ++consumes the remaining buffer data (this only makes sense as the last ++argument). Multiple arguments consume the buffer data in the given ++order. ++
++++Note: a zero length or no remaining buffer data returns an empty string ++and not nil. ++
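The argument forms of `buf:get()` can be sketched as follows, using an arbitrary date string as input:

```lua
local buffer = require("string.buffer")

local buf = buffer.new():put("2024-01-15")

-- Several lengths in one call; the trailing nil takes the remainder.
local year, sep, month, rest = buf:get(4, 1, 2, nil)

-- The buffer is empty now, so this yields an empty string, not nil.
local empty = buf:get()

-- Asking for more than is available returns what is there.
local short = buffer.new():put("ab"):get(10)
```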
++ ++str = buf:tostring()
++str = tostring(buf)
++++Creates a string from the buffer data, but doesn't consume it. The ++buffer remains unchanged. ++
++++Buffer objects also define a __tostring metamethod. This means ++buffers can be passed to the global tostring() function and ++many other functions that accept this in place of strings. The important ++internal uses in functions like io.write() are short-circuited ++to avoid the creation of an intermediate string object. ++
++ ++ptr, len = buf:ref() (FFI)
++++Returns an uint8_t * FFI cdata pointer ptr that ++points to the buffer data. The length of the buffer data in bytes is ++returned in len. ++
++++The returned pointer can be directly passed to C functions that expect a ++buffer and a length. You can also do bytewise reads ++(local x = ptr[i]) or writes ++(ptr[i] = 0x40) of the buffer data. ++
++++In conjunction with the skip method, this allows zero-copy use ++of C write-style APIs: ++
++
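++-- Assumes: local C = require("ffi").C, write() declared via ffi.cdef,
++-- and fd is an open file descriptor.
++repeat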
++ local ptr, len = buf:ref()
++ if len == 0 then break end
++ local n = C.write(fd, ptr, len)
++ if n < 0 then error("write error") end
++ buf:skip(n)
++until n >= len
++
++++Unlike Lua strings, buffer data is not implicitly ++zero-terminated. It's not safe to pass ptr to C functions that ++expect zero-terminated strings. If you're not using len, then ++you're doing something wrong. ++
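A minimal sketch of bytewise access through `buf:ref()`, always paired with the returned length. The sample data is arbitrary:

```lua
local buffer = require("string.buffer")
local ffi = require("ffi")

local buf = buffer.new():put("abc")
local ptr, len = buf:ref()

local first = ptr[0]  -- Bytewise read (zero-indexed).
ptr[0] = 0x41         -- Bytewise write, modifies the buffer in place.

-- Always pass len: the data is not zero-terminated.
local text = ffi.string(ptr, len)
```

The pointer is only used before the buffer is modified through any other method, in line with the FFI caveats.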
++ ++Serialization of Lua Objects
++++The following functions and methods allow high-speed serialization ++(encoding) of a Lua object into a string and decoding it back to a Lua ++object. This allows convenient storage and transport of structured ++data. ++
++++The encoded data is in an internal binary ++format. The data can be stored in files, binary-transparent ++databases or transmitted to other LuaJIT instances across threads, ++processes or networks. ++
++++Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or ++server-class system, even when serializing many small objects. Decoding ++speed is mostly constrained by object creation cost. ++
++++The serializer handles most Lua types, common FFI number types and ++nested structures. Functions, thread objects, other FFI cdata and full ++userdata cannot be serialized (yet). ++
++++The encoder serializes nested structures as trees. Multiple references ++to a single object will be stored separately and create distinct objects ++after decoding. Circular references cause an error. ++
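The tree semantics can be illustrated with a short sketch. The table names are invented for the example:

```lua
local buffer = require("string.buffer")

-- Two references to the same table ...
local shared = { 1, 2, 3 }
local obj = { a = shared, b = shared }
local copy = buffer.decode(buffer.encode(obj))
-- ... come back as two distinct tables with equal contents.
local distinct = copy.a ~= copy.b

-- A table that refers to itself cannot be encoded.
local cyclic = {}
cyclic.self = cyclic
local ok, err = pcall(buffer.encode, cyclic)
```

If object identity matters to an application, it has to be restored by the application itself after decoding.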
++ ++Serialization Functions and Methods
++ ++str = buffer.encode(obj)
++buf = buf:encode(obj)
++++Serializes (encodes) the Lua object obj. The stand-alone ++function returns a string str. The buffer method appends the ++encoding to the buffer. ++
++++obj can be any of the supported Lua types — it doesn't ++need to be a Lua table. ++
++++This function may throw an error when attempting to serialize ++unsupported object types, circular references or deeply nested tables. ++
++ ++obj = buffer.decode(str)
++obj = buf:decode()
++++The stand-alone function deserializes (decodes) the string ++str, the buffer method deserializes one object from the ++buffer. Both return a Lua object obj. ++
++++The returned object may be any of the supported Lua types — ++even nil. ++
++++This function may throw an error when fed with malformed or incomplete ++encoded data. The stand-alone function throws when there's left-over ++data after decoding a single top-level object. The buffer method leaves ++any left-over data in the buffer. ++
++++Attempting to deserialize an FFI type will throw an error if the FFI ++library is not built in or has not been loaded yet. ++
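The difference between the stand-alone function and the buffer method when more than one object is present might look like this. The sample objects are arbitrary:

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
buf:encode("first"):encode({ n = 2 })
local data = buf:tostring()

-- Stand-alone decode: left-over or incomplete data is an error.
local ok_all = pcall(buffer.decode, data)
local ok_cut = pcall(buffer.decode, data:sub(1, 3))

-- Buffer method: decodes one object and leaves the rest in the buffer.
local first = buf:decode()
local remaining = #buf
local second = buf:decode()
```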
++ ++Serialization Options
++++The options table passed to buffer.new() may contain ++the following members (all optional): ++
++
- ++dict is a Lua table holding a dictionary of strings that ++commonly occur as table keys of objects you are serializing. These keys ++are compactly encoded as indexes during serialization. A well-chosen ++dictionary saves space and improves serialization performance. ++ ++
- ++metatable is a Lua table holding a dictionary of metatables ++for the table objects you are serializing. ++ ++
++dict needs to be an array of strings and metatable needs ++to be an array of tables, both starting at index 1 and without holes (no ++nil in between). The tables are anchored in the buffer object and ++internally modified into a two-way index (don't do this yourself, just pass ++a plain array). The tables must not be modified after they have been passed ++to buffer.new(). ++
++++The dict and metatable tables used by the encoder and ++decoder must be the same. Put the most common entries at the front. Extend ++at the end to ensure backwards-compatibility — older encodings can ++then still be read. You may also set some indexes to false to ++explicitly drop backwards-compatibility. Old encodings that use these ++indexes will throw an error when decoded. ++
++++Metatables that are not found in the metatable dictionary are ++ignored when encoding. Decoding returns a table with a nil ++metatable. ++
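A sketch of the metatable option follows. The `Point` class and its `norm2` method are hypothetical and exist only for this example:

```lua
local buffer = require("string.buffer")

-- Hypothetical class used as a metatable.
local Point = {}
Point.__index = Point
function Point:norm2() return self.x * self.x + self.y * self.y end

local options = {
  dict = { "x", "y" },    -- Common table keys.
  metatable = { Point },  -- Metatables to preserve.
}
local buf = buffer.new(options)

local p = setmetatable({ x = 3, y = 4 }, Point)
local copy = buf:encode(p):decode()  -- Metatable is restored.

-- Without the dictionary, the metatable is ignored when encoding.
local plain = buffer.decode(buffer.encode(p))
```

The same buffer is used for encoding and decoding here only to keep the sketch short; separate encoder and decoder buffers created from the same options work the same way.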
++++Note: parsing and preparation of the options table is somewhat ++expensive. Create a buffer object only once and recycle it for multiple ++uses. Avoid mixing encoder and decoder buffers, since the ++buf:set() method frees the already allocated buffer space: ++
++
++local options = {
++ dict = { "commonly", "used", "string", "keys" },
++}
++local buf_enc = buffer.new(options)
++local buf_dec = buffer.new(options)
++
++local function encode(obj)
++ return buf_enc:reset():encode(obj):get()
++end
++
++local function decode(str)
++ return buf_dec:set(str):decode()
++end
++
++
++Streaming Serialization
++++In some contexts, it's desirable to do piecewise serialization of large ++datasets, also known as streaming. ++
++++This serialization format can be safely concatenated and supports streaming. ++Multiple encodings can simply be appended to a buffer and later decoded ++individually: ++
++
++local buf = buffer.new()
++buf:encode(obj1)
++buf:encode(obj2)
++local copy1 = buf:decode()
++local copy2 = buf:decode()
++
++Here's how to iterate over a stream: ++
++
++while #buf ~= 0 do
++  local obj = buf:decode()
++  -- Do something with obj.
++end
++
++Since the serialization format doesn't prepend a length to its encoding, ++network applications may need to transmit the length, too. ++
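One possible way to transmit the length is sketched below. The framing (eight hexadecimal digits in front of each encoding) and the helper names `frame` and `unframe` are assumptions for illustration, not part of the library or of the serialization format. The sketch also assumes that complete frames are already in the receiving buffer:

```lua
local buffer = require("string.buffer")

-- Hypothetical framing: 8 hex digits of length, then the encoding.
local function frame(obj)
  local payload = buffer.encode(obj)
  return string.format("%08x", #payload) .. payload
end

-- Reads one frame from the front of a buffer and decodes it.
local function unframe(stream)
  local len = tonumber(stream:get(8), 16)
  return buffer.decode(stream:get(len))
end

local stream = buffer.new():put(frame({ 1, 2 }), frame("x"))
local first = unframe(stream)
local second = unframe(stream)
```

A real network reader would additionally have to wait until the full header and payload have arrived before decoding.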
++ ++Serialization Format Specification
++++This serialization format is designed for internal use by LuaJIT ++applications. Serialized data is upwards-compatible and portable across ++all supported LuaJIT platforms. ++
++++It's an 8-bit binary format and not human-readable. It uses e.g. ++embedded zeroes and stores embedded Lua string objects unmodified, which ++are 8-bit-clean, too. Encoded data can be safely concatenated for ++streaming and later decoded one top-level object at a time. ++
++++The encoding is reasonably compact, but tuned for maximum performance, ++not for minimum space usage. It compresses well with any of the common ++byte-oriented data compression algorithms. ++
++++Although documented here for reference, this format is explicitly ++not intended to be a 'public standard' for structured data ++interchange across computer languages (like JSON or MessagePack). Please ++do not use it as such. ++
++++The specification is given below as a context-free grammar with a ++top-level object as the starting point. Alternatives are ++separated by the | symbol and * indicates repeats. ++Grouping is implicit or indicated by {…}. Terminals are ++either plain hex numbers, encoded as bytes, or have a .format ++suffix. ++
++
++object → nil | false | true
++ | null | lightud32 | lightud64
++ | int | num | tab | tab_mt
++ | int64 | uint64 | complex
++ | string
++
++nil → 0x00
++false → 0x01
++true → 0x02
++
++null → 0x03 // NULL lightuserdata
++lightud32 → 0x04 data.I // 32 bit lightuserdata
++lightud64 → 0x05 data.L // 64 bit lightuserdata
++
++int → 0x06 int.I // int32_t
++num → 0x07 double.L
++
++tab → 0x08 // Empty table
++ | 0x09 h.U h*{object object} // Key/value hash
++ | 0x0a a.U a*object // 0-based array
++ | 0x0b a.U a*object h.U h*{object object} // Mixed
++ | 0x0c a.U (a-1)*object // 1-based array
++ | 0x0d a.U (a-1)*object h.U h*{object object} // Mixed
++tab_mt → 0x0e (index-1).U tab // Metatable dict entry
++
++int64 → 0x10 int.L // FFI int64_t
++uint64 → 0x11 uint.L // FFI uint64_t
++complex → 0x12 re.L im.L // FFI complex
++
++string → (0x20+len).U len*char.B
++ | 0x0f (index-1).U // String dict entry
++
++.B = 8 bit
++.I = 32 bit little-endian
++.L = 64 bit little-endian
++.U = prefix-encoded 32 bit unsigned number n:
++ 0x00..0xdf → n.B
++ 0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
++ 0x1fe0.. → 0xff n.I
++
++
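To make the `.U` prefix encoding more concrete, here is a sketch that rebuilds the `string` production of the grammar in plain Lua and compares it with the output of `buffer.encode()`. The helpers `encode_u` and `encode_string` are hypothetical names, written only from the grammar above:

```lua
local buffer = require("string.buffer")
local bit = require("bit")

-- Hypothetical helper: encodes n in the .U prefix format.
local function encode_u(n)
  if n < 0xe0 then
    return string.char(n)  -- One byte.
  elseif n < 0x1fe0 then
    local v = n - 0xe0     -- Two bytes.
    return string.char(bit.bor(0xe0, bit.band(bit.rshift(v, 8), 0x1f)),
                       bit.band(v, 0xff))
  else
    -- 0xff followed by n as a 32 bit little-endian number.
    return string.char(0xff,
      bit.band(n, 0xff),
      bit.band(bit.rshift(n, 8), 0xff),
      bit.band(bit.rshift(n, 16), 0xff),
      bit.band(bit.rshift(n, 24), 0xff))
  end
end

-- string → (0x20+len).U len*char.B (no string dictionary in use).
local function encode_string(s)
  return encode_u(0x20 + #s) .. s
end

local same_short = encode_string("abc") == buffer.encode("abc")
local long = string.rep("x", 300)
local same_long = encode_string(long) == buffer.encode(long)
```

This is meant as a reading aid for the grammar, not as a replacement for the built-in encoder.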
++Error handling
++++Many of the buffer methods can throw an error. Out-of-memory or usage ++errors are best caught with an outer wrapper for larger parts of code. ++There's not much one can do after that, anyway. ++
++++On the other hand, you may want to catch some errors individually. Buffer methods need ++to receive the buffer object as the first argument. The Lua colon-syntax ++obj:method() does that implicitly. But to wrap a method with ++pcall(), the arguments need to be passed like this: ++
++
++local ok, err = pcall(buf.encode, buf, obj)
++if not ok then
++  -- Handle error in err.
++end
++
++ ++FFI caveats
++++The string buffer library has been designed to work well together with ++the FFI library. But due to the low-level nature of the FFI library, ++some care needs to be taken: ++
++++First, please remember that FFI pointers are zero-indexed. The space ++returned by buf:reserve() and buf:ref() starts at the ++returned pointer and ends before len bytes after that. ++
++++I.e. the first valid index is ptr[0] and the last valid index ++is ptr[len-1]. If the returned length is zero, there's no valid ++index at all. The returned pointer may even be NULL. ++
++++The space pointed to by the returned pointer is only valid as long as ++the buffer is not modified in any way (neither append, nor consume, nor ++reset, etc.). The pointer is also not a GC anchor for the buffer object ++itself. ++
++++Buffer data is only guaranteed to be byte-aligned. Casting the returned ++pointer to a data type with higher alignment may cause unaligned ++accesses. It depends on the CPU architecture whether this is allowed or ++not (it's always OK on x86/x64 and mostly OK on other modern ++architectures). ++
++++FFI pointers or references do not count as GC anchors for an underlying ++object. E.g. an array allocated with ffi.new() is ++anchored by buf:set(array, len), but not by ++buf:set(array+offset, len). The addition of the offset ++creates a new pointer, even when the offset is zero. In this case, you ++need to make sure there's still a reference to the original array as ++long as its contents are in use by the buffer. ++
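The anchoring rule can be sketched like this. The array contents and the `holder` table are made up; the point is only that the original array stays referenced while the buffer uses it:

```lua
local buffer = require("string.buffer")
local ffi = require("ffi")

local array = ffi.new("uint8_t[?]", 8)
ffi.copy(array, "ABCDEFGH", 8)

local buf = buffer.new()
-- array + 2 creates a new pointer, which does not anchor the array.
buf:set(array + 2, 4)

-- Hypothetical holder: keeps the array alive as long as the buffer is.
local holder = { buf = buf, array = array }

local window = holder.buf:tostring()
```

Any other way of keeping a reference to `array` works as well, as long as it lives at least as long as the buffer uses the memory.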
++++Even though each LuaJIT VM instance is single-threaded (but you can ++create multiple VMs), FFI data structures can be accessed concurrently. ++Be careful when reading/writing FFI cdata from/to buffers to avoid ++concurrent accesses or modifications. In particular, the memory ++referenced by buf:set(cdata, len) must not be modified ++while buffer readers are working on it. Shared, but read-only memory ++mappings of files are OK, but only if the file does not change. ++