tasl instances can be serialized to binary files, canonically given the .instance file extension.
Instances and schemas are always represented separately: a .instance file does not contain the schema for the instance. In other words, given in-memory representation types Schema and Instance, serialization is always a function of two arguments (a Schema and an Instance), and parsing is also always a function of two arguments (a Schema and the serialized bytes).
Schemas can be stored in and parsed from text schema language files (.tasl), or stored as serialized instances of the canonical schema schema. Instances of the schema schema are canonically given the file extension .schema, and all tasl library implementations should be able to parse them.
.instance files begin with a single unsigned varint indicating the serialization version number; the serialized version described on this page is version 1 (0x01).
After the version number, the contents of each class are serialized in lexicographic order of the class key URI. The class key URIs do not appear in the .instance file, since they are known from the schema. For example, given a schema with two classes ex:Foo and ex:Bar, the elements in class ex:Bar would be serialized first, followed by the elements in ex:Foo, regardless of the order in which the schema declares them.
Each serialized class begins with an unsigned varint encoding the total number of elements in the class (which may be zero). After this, each element is serialized consecutively; there are no delimiters between elements or between classes.
Elements have no explicit identity or header; they are simply represented by serializing their value. There is no need to represent the type of each value in the instance serialization, since the types are known from the schema.
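Putting these rules together, a decoder reads the version number and then, for each class in sorted key order, a count followed by that many element values. The following is a minimal TypeScript sketch under stated assumptions, not the tasl library's API: the schema is reduced to a hypothetical map from class key URI to an `ElementDecoder` function, and class keys are sorted with JavaScript's default string comparison, which agrees with byte-wise lexicographic order for ASCII URIs.

```typescript
// Hypothetical: decodes one element's value starting at `offset` and
// returns it together with the offset just past it.
type ElementDecoder<T> = (data: Uint8Array, offset: number) => [T, number];

// Reads an unsigned varint as a JS number (enough for versions and counts).
function readUvarint(data: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let scale = 1;
  while (true) {
    if (offset >= data.length) throw new RangeError("truncated varint");
    const byte = data[offset++];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return [value, offset];
    scale *= 0x80;
  }
}

function decodeInstance<T>(
  data: Uint8Array,
  classes: Map<string, ElementDecoder<T>>,
): Map<string, T[]> {
  const [version, start] = readUvarint(data, 0);
  if (version !== 1) throw new Error(`unsupported version ${version}`);
  let offset = start;

  const result = new Map<string, T[]>();
  // Class keys never appear in the file; their order comes from the schema.
  // Assumption: default JS string sort stands in for lexicographic URI order.
  for (const key of [...classes.keys()].sort()) {
    const decode = classes.get(key)!;
    let count: number;
    [count, offset] = readUvarint(data, offset); // element count (may be 0)
    const elements: T[] = [];
    for (let i = 0; i < count; i++) {
      let element: T;
      [element, offset] = decode(data, offset); // no per-element header
      elements.push(element);
    }
    result.set(key, elements);
  }
  return result;
}

// Usage: two hypothetical classes whose elements are single raw bytes.
const byteDecoder: ElementDecoder<number> = (d, o) => [d[o], o + 1];
const decoded = decodeInstance(Uint8Array.of(0x01, 0x02, 0x07, 0x08, 0x00), new Map([
  ["http://example.com/Foo", byteDecoder],
  ["http://example.com/Bar", byteDecoder],
]));
console.log(decoded.get("http://example.com/Bar")); // [ 7, 8 ]
```

Here ex:Bar is read first (count 2, elements 7 and 8), then ex:Foo (count 0).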
URI values begin with an unsigned varint encoding the length in bytes of the URI, followed by the bytes of the URI itself.
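A minimal TypeScript sketch of this length-prefixed layout follows. `uvarint` and `encodeURIValue` are hypothetical helper names, and the sketch assumes the URI's bytes are its UTF-8 encoding, which this paragraph does not state explicitly.

```typescript
// Minimal unsigned varint encoder for lengths (a JS number is enough here).
function uvarint(n: number): number[] {
  const out: number[] = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80); // low 7 bits with the continuation bit set
    n = Math.floor(n / 0x80);
  }
  out.push(n); // last byte, continuation bit clear
  return out;
}

// Hypothetical helper: <uvarint byte length><URI bytes>.
// Assumption: the URI bytes are its UTF-8 encoding.
function encodeURIValue(uri: string): Uint8Array {
  const bytes = new TextEncoder().encode(uri);
  return Uint8Array.from([...uvarint(bytes.length), ...bytes]);
}

console.log(encodeURIValue("http://example.com/a").length); // 21
```

URIs of 128 bytes or more get a multi-byte prefix; for example, a 219-byte URI is prefixed with `DB 01`.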
Since the datatype of the literal type is known from the schema, only the value needs to be serialized.
The tasl .instance format can serialize values of arbitrary datatypes, since all values are represented (by definition) as UTF-8 strings. Values are serialized with a uvarint length prefix followed by the raw bytes of the value.
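Decoding a generic literal value mirrors this: read the uvarint length, then that many bytes. A minimal TypeScript sketch, with `decodeLiteral` as a hypothetical helper that returns the decoded string together with the offset just past the value:

```typescript
// Hypothetical decoder for a generic literal: a uvarint byte length,
// then that many bytes, decoded as UTF-8.
function decodeLiteral(data: Uint8Array, offset: number): [string, number] {
  let length = 0;
  let scale = 1; // 128^k; avoids 32-bit overflow of bitwise shifts
  while (true) {
    if (offset >= data.length) throw new RangeError("truncated length prefix");
    const byte = data[offset++];
    length += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) break;
    scale *= 0x80;
  }
  const end = offset + length;
  if (end > data.length) throw new RangeError("truncated value");
  return [new TextDecoder().decode(data.subarray(offset, end)), end];
}

const example = Uint8Array.from([0x0b, ...new TextEncoder().encode("Jim Halpert")]);
console.log(decodeLiteral(example, 0)); // [ 'Jim Halpert', 12 ]
```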
However, the tasl binary format has special encodings for the following XSD datatypes:
| XSD datatype             | width (bytes) | serialization                                     |
| ------------------------ | ------------- | ------------------------------------------------- |
| `xsd:boolean`            | 1             | `1` for true or `0` for false                     |
| `xsd:double`             | 8             | IEEE 754 float64                                  |
| `xsd:float`              | 4             | IEEE 754 float32                                  |
| `xsd:integer`            | variable      | signed varint                                     |
| `xsd:nonNegativeInteger` | variable      | unsigned varint                                   |
| `xsd:long`               | 8             | int64                                             |
| `xsd:int`                | 4             | int32                                             |
| `xsd:short`              | 2             | int16                                             |
| `xsd:byte`               | 1             | int8                                              |
| `xsd:unsignedLong`       | 8             | uint64                                            |
| `xsd:unsignedInt`        | 4             | uint32                                            |
| `xsd:unsignedShort`      | 2             | uint16                                            |
| `xsd:unsignedByte`       | 1             | uint8                                             |
| `xsd:hexBinary`          | variable      | a uvarint length prefix followed by the raw bytes |
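The fixed-width rows of the table map directly onto JavaScript's `DataView` setters. This TypeScript sketch covers a few of them. `encodeFixed` and `FixedType` are hypothetical names, and the byte order is an assumption: the table does not specify endianness, so the sketch uses big-endian (`DataView`'s default), which should be checked against a reference implementation before relying on it.

```typescript
// Hypothetical encoder for a subset of the fixed-width XSD datatypes.
// ASSUMPTION: multi-byte values are big-endian (DataView's default);
// the byte order is not stated in the table above.
type FixedType = "boolean" | "double" | "float" | "int" | "unsignedShort" | "byte";

const WIDTHS: Record<FixedType, number> = {
  boolean: 1, double: 8, float: 4, int: 4, unsignedShort: 2, byte: 1,
};

function encodeFixed(type: FixedType, value: number | boolean): Uint8Array {
  const buf = new Uint8Array(WIDTHS[type]);
  const view = new DataView(buf.buffer);
  switch (type) {
    case "boolean": view.setUint8(0, value ? 1 : 0); break;
    case "double": view.setFloat64(0, Number(value)); break; // IEEE 754 float64
    case "float": view.setFloat32(0, Number(value)); break;   // IEEE 754 float32
    case "int": view.setInt32(0, Number(value)); break;
    case "unsignedShort": view.setUint16(0, Number(value)); break;
    case "byte": view.setInt8(0, Number(value)); break;
  }
  return buf;
}

console.log(Array.from(encodeFixed("double", 1.5))); // [ 63, 248, 0, 0, 0, 0, 0, 0 ]
```

The remaining rows follow the same pattern with `DataView`'s other setters (`setInt16`, `setUint32`, `setBigInt64`, and so on).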
As per the XSD spec, xsd:integer and xsd:nonNegativeInteger values can be arbitrarily large. Unsigned varints use the same encoding scheme as Protobuf and Go's encoding/binary package, but without the 10-byte maximum length. A signed integer n is encoded as an unsigned varint: 2n if 0 <= n, and -2n - 1 if n < 0 (the "zigzag" mapping).
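Because these integers can be arbitrarily large, a TypeScript implementation would reasonably use `bigint` rather than `number`. The following minimal sketch (hypothetical function names) implements unsigned varints with no length cap, plus the signed mapping described above:

```typescript
// Unsigned varint: 7 bits per byte, low bits first, high bit set on every
// byte except the last. bigint allows arbitrarily large values (no 10-byte cap).
function encodeUvarint(n: bigint): number[] {
  if (n < 0n) throw new RangeError("uvarint must be non-negative");
  const bytes: number[] = [];
  while (n >= 0x80n) {
    bytes.push(Number(n & 0x7fn) | 0x80);
    n >>= 7n;
  }
  bytes.push(Number(n));
  return bytes;
}

// Returns the decoded value and the offset just past the varint.
function decodeUvarint(data: Uint8Array, offset: number): [bigint, number] {
  let value = 0n;
  let shift = 0n;
  while (true) {
    if (offset >= data.length) throw new RangeError("truncated varint");
    const byte = data[offset++];
    value |= BigInt(byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return [value, offset];
    shift += 7n;
  }
}

// Signed varint: n >= 0 maps to 2n, n < 0 maps to -2n - 1.
function encodeSvarint(n: bigint): number[] {
  return encodeUvarint(n >= 0n ? 2n * n : -2n * n - 1n);
}

function decodeSvarint(data: Uint8Array, offset: number): [bigint, number] {
  const [u, next] = decodeUvarint(data, offset);
  // Even values come from non-negative n, odd values from negative n.
  return [u % 2n === 0n ? u / 2n : -(u + 1n) / 2n, next];
}

console.log(encodeSvarint(25n)); // [ 50 ]
console.log(encodeUvarint(2n ** 100n).length); // 15
```

`encodeSvarint(25n)` yields 50 (`0x32`), the first age byte in the worked example at the end of this page. The 15-byte encoding of 2^100 would be rejected by a decoder that enforces the 10-byte Protobuf/Go limit.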
Note that xsd:positiveInteger, xsd:nonPositiveInteger, and xsd:negativeInteger do not have special encodings.
Since the product components, their keys, and their types are all known in advance from the schema, a product value is serialized by serializing its component values in lexicographic order of their keys.
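For example, if a schema's ex:Widget class is a product of an ex:deluxe component of type xsd:boolean and a second component of type xsd:double whose key sorts after ex:deluxe, each ex:Widget element would be serialized as exactly nine bytes: first the ex:deluxe component, whose value (a boolean) takes a fixed single byte, and then the other component, whose value (a float64) takes a fixed eight bytes.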
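A product encoder therefore only needs to sort the component keys and concatenate the component encodings. A minimal TypeScript sketch: `encodeProduct` is hypothetical, `ex:price` is a hypothetical name for the second component (the text does not give it), and the big-endian float64 layout is an assumption.

```typescript
// Hypothetical helper: concatenate component encodings in key order.
// Keys are compared with JS's default string ordering (UTF-16 code units),
// which matches byte order for ASCII URIs.
function encodeProduct(components: Map<string, Uint8Array>): Uint8Array {
  const keys = [...components.keys()].sort();
  const parts = keys.map((k) => components.get(k)!);
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// ex:deluxe (boolean) plus a hypothetical ex:price (double) component.
const price = new Uint8Array(8);
new DataView(price.buffer).setFloat64(0, 9.99); // big-endian: an assumption
const widget = encodeProduct(new Map([
  ["http://example.com/price", price],
  ["http://example.com/deluxe", Uint8Array.of(1)],
]));
console.log(widget.length); // 9
```

The insertion order of the map does not matter: `ex:deluxe` still comes first because its key sorts before `ex:price`.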
Again, the coproduct options, their keys, and their types are all known in advance from the schema. A serialized coproduct value begins with an unsigned varint encoding the index of the value's option among the options sorted lexicographically by key, followed by the serialization of the option value.
In other words, when decoding a coproduct value, the coproduct options are first sorted lexicographically by key, and then an unsigned varint is read to index into the sorted array. A value of the corresponding option's type immediately follows.
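Both directions can be sketched in a few lines of TypeScript. `encodeCoproduct` and `decodeCoproductTag` are hypothetical helpers, the `ex:circle`/`ex:square` option keys are made up for illustration, and keys are sorted with JavaScript's default string comparison (byte order for ASCII URIs).

```typescript
// Minimal uvarint encoder (a JS number is enough for option indices).
function uvarint(n: number): number[] {
  const out: number[] = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}

// Hypothetical: <uvarint index of the option in sorted key order><option value>.
function encodeCoproduct(optionKeys: string[], key: string, value: Uint8Array): Uint8Array {
  const index = [...optionKeys].sort().indexOf(key);
  if (index === -1) throw new Error(`unknown option ${key}`);
  return Uint8Array.from([...uvarint(index), ...value]);
}

// Hypothetical: reads the option index and returns the option key plus the
// offset where the option's value begins.
function decodeCoproductTag(
  optionKeys: string[],
  data: Uint8Array,
  offset: number,
): [string, number] {
  let index = 0;
  let scale = 1;
  let byte: number;
  do {
    if (offset >= data.length) throw new RangeError("truncated option index");
    byte = data[offset++];
    index += (byte & 0x7f) * scale;
    scale *= 0x80;
  } while (byte & 0x80);
  const sorted = [...optionKeys].sort();
  if (index >= sorted.length) throw new RangeError(`option index ${index} out of range`);
  return [sorted[index], offset];
}

// Sorted order is circle (0), square (1), whatever order the schema lists them in.
const options = ["http://example.com/square", "http://example.com/circle"];
const encoded = encodeCoproduct(options, "http://example.com/square", Uint8Array.of(42));
console.log(Array.from(encoded)); // [ 1, 42 ]
console.log(decodeCoproductTag(options, encoded, 0)[0]); // http://example.com/square
```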
Element references are represented by unsigned varints encoding the zero-based index of the referenced element within its class, in the order the elements appear in the serialization.
These indices are only for use within the logic of a serialization; it is an anti-pattern to use indices to externally identify elements within an instance. Elements, by design, cannot be referenced "from the outside". If you need to reference elements, you should add a UUID component to their type, or some other kind of identification scheme that fits your needs.
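Since the indices are just positions in each class's serialized element list, an encoder can compute them with one pass over that list. A minimal TypeScript sketch, assuming a hypothetical in-memory representation in which each class's elements are JavaScript objects in an array; `indexElements` and `encodeReference` are illustrative names, not the tasl library's API.

```typescript
// Hypothetical: map each element object to its zero-based position in the
// class's serialized element list.
function indexElements<T>(elements: T[]): Map<T, number> {
  return new Map(elements.map((e, i) => [e, i] as [T, number]));
}

// Encode a reference as the uvarint of the target element's index.
function encodeReference<T>(index: Map<T, number>, target: T): number[] {
  const found = index.get(target);
  if (found === undefined) throw new Error("reference to an element outside the target class");
  let n = found;
  const out: number[] = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}

const people = [{ age: 25 }, { age: 44 }];
const personIndex = indexElements(people);
console.log(encodeReference(personIndex, people[1])); // [ 1 ]
```

Building the map once keeps each reference lookup constant-time instead of scanning the element array for every reference. The indices stay internal to the encoder, in line with the advice above.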
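Given a schema in which each ex:Person element is a product with a single xsd:integer component ex:age, and each ex:Person/name element is a product of a string literal component ex:name and a component ex:person that references an ex:Person element,
we would encode this instance (given here in an informal JSON format)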
{ "http://example.com/Person": [ { "http://example.com/age": 25 }, { "http://example.com/age": 44 } ], "http://example.com/Person/name": [ { "http://example.com/person": 0, "http://example.com/name": "Jim Halpert" }, { "http://example.com/person": 1, "http://example.com/name": "Pam Beesly" }, { "http://example.com/person": 1, "http://example.com/name": "Pamela Morgan Halpert" } ] }
as the byte array
01023258030B4A696D2048616C70657274000A50616D20426565736C79011550616D656C61204D6F7267616E2048616C7065727401
... which is the concatenation of these parts:

- `01` is the serialization version number
- `02` is the number of elements in the ex:Person class
- `32` is the signed varint for 25 (the product has only one component)
- `58` is the signed varint for 44
- `03` is the number of elements in the ex:Person/name class
- `0B` is the length of the string "Jim Halpert" (http://example.com/name lexicographically precedes http://example.com/person, so the name comes first)
- `4A696D2048616C70657274` is the content of the string "Jim Halpert"
- `00` is the index of the referenced ex:Person element
- `0A` is the length of the string "Pam Beesly"
- `50616D20426565736C79` is the content of the string "Pam Beesly"
- `01` is the index of the referenced ex:Person element
- `15` is the length of the string "Pamela Morgan Halpert"
- `50616D656C61204D6F7267616E2048616C70657274` is the content of the string "Pamela Morgan Halpert"
- `01` is the index of the referenced ex:Person element

This serialized instance is 53 bytes (compared to 423 bytes for the indented JSON version and 352 bytes for the JSON with all whitespace stripped).
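The byte layout can be cross-checked by rebuilding it from the rules on this page. The sketch below is a minimal TypeScript illustration, not the tasl library's encoder. It hard-codes the example's structure instead of deriving it from a schema: ex:Person elements have one signed-varint ex:age component, and ex:Person/name elements have an ex:name string followed by an ex:person reference. It uses the ages 25 and 44 from the byte breakdown, and it writes each class's element count before its elements, so the ex:Person/name elements are preceded by `03`.

```typescript
// Minimal encoders for this example (JS numbers are enough here).
function uvarint(n: number): number[] {
  const out: number[] = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}
const svarint = (n: number): number[] => uvarint(n >= 0 ? 2 * n : -2 * n - 1);
// A string literal: uvarint byte length, then the UTF-8 bytes.
const str = (s: string): number[] => {
  const b = new TextEncoder().encode(s);
  return [...uvarint(b.length), ...b];
};

const ages = [25, 44]; // ex:age of each ex:Person element, in order
// [ex:name, index of the referenced ex:Person]; ex:name sorts before ex:person
const names: [string, number][] = [
  ["Jim Halpert", 0],
  ["Pam Beesly", 1],
  ["Pamela Morgan Halpert", 1],
];

const bytes: number[] = [
  ...uvarint(1), // version number
  ...uvarint(ages.length), // ex:Person sorts before ex:Person/name
  ...ages.flatMap((age) => svarint(age)),
  ...uvarint(names.length), // element count of ex:Person/name
  ...names.flatMap(([name, person]) => [...str(name), ...uvarint(person)]),
];

const hex = bytes.map((b) => b.toString(16).padStart(2, "0")).join("").toUpperCase();
console.log(bytes.length); // 53
console.log(hex);
```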