tasl instances can be serialized to binary files, canonically given the `.instance` file extension.

Instances and schemas are always represented separately: a `.instance` file does not contain the schema for the instance. In other words, given in-memory representation types `Schema` and `Instance`, serialization is always a function of two arguments:
declare function serialize<S extends Schema>( schema: S, instance: Instance<S> ): Buffer
... and parsing is also always a function of two arguments:
declare function parse<S extends Schema>(schema: S, data: Buffer): Instance<S>
Schemas can be stored in and parsed from text schema language files (`.tasl`), or stored as serialized instances of the canonical schema schema. Instances of the schema schema are canonically given the file extension `.schema`, and all tasl library implementations should be able to parse them.

`.instance` files begin with a single unsigned varint indicating the serialization version number; the serialization version described on this page is version `1` (`0x01`).
After the version number, the contents of each class are serialized in lexicographic order of the class key URI. The class key URIs, since they are known from the schema, do not appear in the `.instance` file. For example, given this schema:

    namespace ex http://example.com/

    class ex:Foo <>
    class ex:Bar <>

... the elements in class `ex:Bar` would be serialized first, followed by the elements in `ex:Foo`.
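The ordering rule can be sketched in a few lines. This is only an illustration: `classOrder` is a hypothetical helper, and it assumes that "lexicographic order" means plain string comparison of the full class key URIs.

```typescript
// Hypothetical helper: returns class keys in the order their elements
// would appear in a .instance file. Assumes plain string comparison.
function classOrder(classKeys: string[]): string[] {
  return [...classKeys].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0))
}

const order = classOrder(["http://example.com/Foo", "http://example.com/Bar"])
console.log(order) // [ 'http://example.com/Bar', 'http://example.com/Foo' ]
```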
Each serialized class begins with an unsigned varint encoding the total number of elements in the class (which may be zero). After this, each element is serialized consecutively; there are no delimiters between elements or between classes.
Elements have no explicit identity or header; they are simply represented by serializing their value. Since the type of every value is known in advance from the schema, types are never written to the instance serialization.
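As a rough sketch of this framing, a class can be written as a count followed by its elements back to back. The names `encodeUvarint` and `encodeClass` are hypothetical, and the number-based varint here is only suitable for small values such as counts.

```typescript
// Unsigned varint for small non-negative integers such as element counts.
function encodeUvarint(n: number): number[] {
  const bytes: number[] = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80) // low 7 bits with the continuation bit set
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return bytes
}

// Hypothetical helper: element count first, then each element with no delimiters.
function encodeClass<T>(elements: T[], encodeElement: (element: T) => number[]): number[] {
  const bytes = encodeUvarint(elements.length)
  for (const element of elements) bytes.push(...encodeElement(element))
  return bytes
}

// A class whose element type is xsd:unsignedByte (one byte per element).
const encodedClass = encodeClass([7, 9, 255], (value) => [value])
console.log(encodedClass) // [ 3, 7, 9, 255 ]

// An empty class still writes its count.
const emptyClass = encodeClass([], (value: number) => [value])
console.log(emptyClass) // [ 0 ]
```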
URI values begin with an unsigned varint encoding the length in bytes of the URI, followed by the bytes of the URI itself.
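A minimal sketch of this length-prefixed layout follows. `encodeUriValue` is a hypothetical name, and the example assumes the URI bytes are its UTF-8 encoding.

```typescript
// Unsigned varint for small non-negative integers such as byte lengths.
function encodeUvarint(n: number): number[] {
  const bytes: number[] = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return bytes
}

// Hypothetical helper: uvarint byte length, then the URI bytes (UTF-8 assumed).
function encodeUriValue(uri: string): number[] {
  const uriBytes = Array.from(new TextEncoder().encode(uri))
  return [...encodeUvarint(uriBytes.length), ...uriBytes]
}

const uriValue = encodeUriValue("http://example.com/Foo")
console.log(uriValue[0]) // 22, the byte length of the URI
console.log(uriValue.length) // 23, one prefix byte plus 22 URI bytes
```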
Since the datatype of the literal type is known from the schema, only the value needs to be serialized.

The tasl `.instance` format can serialize values of arbitrary datatypes, since all values are represented (by definition) as UTF-8 strings. Values are serialized with a uvarint length prefix followed by the raw bytes of the value.

However, the tasl binary format has special cases for the following XSD datatypes:
| XSD datatype             | width (bytes) | serialization                                     |
| ------------------------ | ------------- | ------------------------------------------------- |
| `xsd:boolean`            | 1             | `1` for true or `0` for false                     |
| `xsd:double`             | 8             | IEEE float64                                      |
| `xsd:float`              | 4             | IEEE float32                                      |
| `xsd:integer`            | variable      | signed varint                                     |
| `xsd:nonNegativeInteger` | variable      | unsigned varint                                   |
| `xsd:long`               | 8             | int64                                             |
| `xsd:int`                | 4             | int32                                             |
| `xsd:short`              | 2             | int16                                             |
| `xsd:byte`               | 1             | int8                                              |
| `xsd:unsignedLong`       | 8             | uint64                                            |
| `xsd:unsignedInt`        | 4             | uint32                                            |
| `xsd:unsignedShort`      | 2             | uint16                                            |
| `xsd:unsignedByte`       | 1             | uint8                                             |
| `xsd:hexBinary`          | variable      | a uvarint length prefix followed by the raw bytes |
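The fixed-width cases map directly onto `DataView` setters. The sketch below is illustrative only: the helper names are hypothetical, and since this page does not state a byte order, big-endian is used here purely as an assumption (it can be overridden through the `littleEndian` parameter).

```typescript
// ASSUMPTION: byte order is not specified on this page; big-endian is a guess.
function encodeDouble(value: number, littleEndian = false): Uint8Array {
  const buffer = new ArrayBuffer(8) // IEEE float64 occupies eight bytes
  new DataView(buffer).setFloat64(0, value, littleEndian)
  return new Uint8Array(buffer)
}

function encodeFloat(value: number, littleEndian = false): Uint8Array {
  const buffer = new ArrayBuffer(4) // IEEE float32 occupies four bytes
  new DataView(buffer).setFloat32(0, value, littleEndian)
  return new Uint8Array(buffer)
}

function encodeShort(value: number, littleEndian = false): Uint8Array {
  const buffer = new ArrayBuffer(2) // int16 occupies two bytes
  new DataView(buffer).setInt16(0, value, littleEndian)
  return new Uint8Array(buffer)
}

function encodeBoolean(value: boolean): Uint8Array {
  return Uint8Array.of(value ? 1 : 0)
}

const doubleBytes = encodeDouble(1.5)
const floatBytes = encodeFloat(1.5)
const shortBytes = encodeShort(-2)
const booleanBytes = encodeBoolean(true)
console.log(doubleBytes.length, floatBytes.length, shortBytes.length, booleanBytes.length) // 8 4 2 1
```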
As per the XSD spec, `xsd:integer` and `xsd:nonNegativeInteger` values can be arbitrarily large. Unsigned varints use the Protobuf / Golang `encoding/binary` encoding scheme, only without the 10-byte maximum limit. A signed varint `n` is represented as an unsigned varint: `2n` if `0 <= n` and `-2n - 1` if `n < 0`.
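The two varint encodings can be sketched with `bigint`, which has no size limit. The function names are hypothetical; the signed mapping follows the `2n` / `-2n - 1` rule given above.

```typescript
// Unsigned varint: 7 bits per byte, least significant group first,
// with the high bit set on every byte except the last.
function encodeUnsignedVarint(n: bigint): Uint8Array {
  if (n < 0n) throw new RangeError("expected a non-negative integer")
  const bytes: number[] = []
  while (n >= 0x80n) {
    bytes.push(Number(n & 0x7fn) | 0x80)
    n >>= 7n
  }
  bytes.push(Number(n))
  return Uint8Array.from(bytes)
}

// Signed varint: map n to 2n (n >= 0) or -2n - 1 (n < 0), then encode unsigned.
function encodeSignedVarint(n: bigint): Uint8Array {
  return encodeUnsignedVarint(n >= 0n ? 2n * n : -2n * n - 1n)
}

console.log(encodeSignedVarint(25n)) // a single byte, 0x32
console.log(encodeSignedVarint(-1n)) // a single byte, 0x01
console.log(encodeUnsignedVarint(300n)) // two bytes, 0xAC 0x02
console.log(encodeUnsignedVarint(1n << 70n).length) // 11, past the usual 10-byte limit
```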
Note that `xsd:positiveInteger`, `xsd:nonPositiveInteger`, and `xsd:negativeInteger` do not have special encodings.
Since the product components, their keys, and their values are all known in advance from the schema, a product value is serialized by serializing its component values in lexicographic order of their keys.

For example, in this schema

    namespace ex http://example.com/

    class ex:Widget {
      ex:spinniness -> float64
      ex:deluxe -> boolean
    }

each `ex:Widget` element would be serialized as nine bytes: first the `ex:deluxe` component, whose value (`boolean`) takes a fixed single byte, and then the `ex:spinniness` component, whose value (`float64`) takes a fixed eight bytes.
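A sketch of the `ex:Widget` layout is shown here. The `Widget` type and `encodeWidget` function are hypothetical, and the float64 byte order is an assumption (big-endian), since this page does not specify one.

```typescript
type Widget = { spinniness: number; deluxe: boolean }

// Components are written in lexicographic key order:
// http://example.com/deluxe comes before http://example.com/spinniness.
function encodeWidget(widget: Widget): Uint8Array {
  const bytes = new Uint8Array(9)
  const view = new DataView(bytes.buffer)
  view.setUint8(0, widget.deluxe ? 1 : 0) // boolean: one byte
  view.setFloat64(1, widget.spinniness) // float64: eight bytes (big-endian ASSUMED)
  return bytes
}

const widgetBytes = encodeWidget({ spinniness: 1.5, deluxe: true })
console.log(widgetBytes.length) // 9
```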
Again, the coproduct options, their keys, and their values are all known in advance from the schema. A serialized coproduct value begins with an unsigned varint encoding the index of the value's option, as sorted lexicographically by key, followed by the serialization of the option value.
In other words, when decoding a coproduct value, the coproduct options are first sorted lexicographically, and then an unsigned varint is read to index into the sorted array. A value of the corresponding option's type immediately follows.
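The decoding steps above can be sketched as follows. The option keys, `decodeUvarint`, and `decodeOption` are all hypothetical, and the sort assumes plain string comparison of the option key URIs.

```typescript
// Reads an unsigned varint starting at `offset`; suitable for small values.
function decodeUvarint(data: Uint8Array, offset: number): { value: number; offset: number } {
  let value = 0
  let scale = 1
  while (true) {
    if (offset >= data.length) throw new RangeError("unexpected end of data")
    const byte = data[offset++]
    value += (byte & 0x7f) * scale
    if ((byte & 0x80) === 0) return { value, offset }
    scale *= 128
  }
}

// Hypothetical helper: sort the option keys, then read the index of the chosen option.
// The option's value starts at the returned offset.
function decodeOption(optionKeys: string[], data: Uint8Array, offset: number): { key: string; offset: number } {
  const sorted = [...optionKeys].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0))
  const { value: index, offset: next } = decodeUvarint(data, offset)
  if (index >= sorted.length) throw new RangeError("option index out of range")
  return { key: sorted[index], offset: next }
}

const optionKeys = ["http://example.com/square", "http://example.com/circle"]
const selected = decodeOption(optionKeys, Uint8Array.of(0x01, 0x2a), 0)
console.log(selected) // { key: 'http://example.com/square', offset: 1 }
```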
Element references are represented by unsigned varints encoding indices of elements as they appear in the serialization.
These indices are only for use within the logic of a serialization; it is an anti-pattern to use indices to externally identify elements within an instance. Elements, by design, cannot be referenced "from the outside". If you need to reference elements, you should add a UUID component to their type, or some other kind of identification scheme that fits your needs.
Given this schema

    namespace ex http://example.com/

    class ex:Person {
      ex:age -> int
    }

    class ex:Person/name {
      ex:person -> * ex:Person
      ex:name -> string
    }

we would encode this instance (given here in an informal JSON format)

    {
      "http://example.com/Person": [
        { "http://example.com/age": 25 },
        { "http://example.com/age": 44 }
      ],
      "http://example.com/Person/name": [
        { "http://example.com/person": 0, "http://example.com/name": "Jim Halpert" },
        { "http://example.com/person": 1, "http://example.com/name": "Pam Beesly" },
        { "http://example.com/person": 1, "http://example.com/name": "Pamela Morgan Halpert" }
      ]
    }

as the byte array

    01023258030B4A696D2048616C70657274000A50616D20426565736C79011550616D656C61204D6F7267616E2048616C7065727401

... which is the concatenation of these parts:

- `01` is the serialization version number
- `02` is the number of elements in the `ex:Person` class
- `32` is the signed varint for 25 (the product has only one component)
- `58` is the signed varint for 44
- `03` is the number of elements in the `ex:Person/name` class
- `0B` the length of the string `Jim Halpert` (`http://example.com/name` lexicographically precedes `http://example.com/person`)
- `4A696D2048616C70657274` the content of the string `Jim Halpert`
- `00` the index of the referenced `ex:Person` element
- `0A` the length of the string `Pam Beesly`
- `50616D20426565736C79` the content of the string `Pam Beesly`
- `01` the index of the referenced `ex:Person` element
- `15` the length of the string `Pamela Morgan Halpert`
- `50616D656C61204D6F7267616E2048616C70657274` the content of the string `Pamela Morgan Halpert`
- `01` the index of the referenced `ex:Person` element

This serialized instance is 53 bytes (compared to 423 bytes for the indented JSON version and 352 bytes for the JSON with all whitespace stripped).
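To make the element layout concrete, here is a sketch that encodes a single `ex:Person/name` element: the `ex:name` string first (length prefix, then UTF-8 bytes), followed by the `ex:person` reference index. The names `encodePersonName` and `toHex` are hypothetical.

```typescript
// Unsigned varint for small non-negative integers such as lengths and indices.
function encodeUvarint(n: number): number[] {
  const bytes: number[] = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return bytes
}

// Hypothetical helper: ex:name sorts before ex:person, so the string comes first.
function encodePersonName(element: { person: number; name: string }): number[] {
  const nameBytes = Array.from(new TextEncoder().encode(element.name))
  return [...encodeUvarint(nameBytes.length), ...nameBytes, ...encodeUvarint(element.person)]
}

function toHex(bytes: number[]): string {
  return bytes.map((byte) => byte.toString(16).padStart(2, "0")).join("").toUpperCase()
}

const jim = toHex(encodePersonName({ person: 0, name: "Jim Halpert" }))
console.log(jim) // 0B4A696D2048616C7065727400
```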