Literal types in tasl are similar to literals in RDF. tasl's underlying data model doesn't have a fixed set of primitive types in the way that most languages do - instead, just like RDF, you can use any URI as a datatype.
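In RDF, literals are defined as a tuple of three elements: a lexical form (a Unicode string), a datatype URI, and a language tag that is present if and only if the datatype is rdf:langString.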
Since RDF has no concept of "types", all three of these comprise a single RDF term. However, in tasl, we lift the datatype URI to the type level. A literal type in tasl is parametrized with a datatype URI, and a value of that type is a Unicode string. (tasl has no concept of language tags.)
Literal types are special in tasl in that they don't have an inline syntax - they can only be defined using literal keyword statements.
    namespace ex http://example.com/
    namespace xsd http://www.w3.org/2001/XMLSchema#

    literal myCustomLiteralName ex:hello/world
    literal integer xsd:integer

    class ex:Thing {
      ex:foo -> myCustomLiteralName
      ex:bar -> integer
    }
Here, we use the literal keyword statement to define local variables myCustomLiteralName and integer as literal types with datatypes http://example.com/hello/world and http://www.w3.org/2001/XMLSchema#integer. Once we define them, we can use the bare identifiers myCustomLiteralName and integer as type expressions later in the schema.
But how do we know what URIs to use as datatypes? And how does tasl know what they all mean?
We can actually use any URI that we want. tasl doesn't know anything about the URI http://www.w3.org/2001/XMLSchema#integer, and it doesn't need to.
From tasl's perspective, any value of any literal type (regardless of its datatype) is always a Unicode string. The datatype URI is an opaque tag; when we write mappings between schemas, tasl will check that datatypes are preserved (it won't let us map a literal with one datatype onto a literal with a different datatype), but it won't really use the datatype URI for anything else beyond that.
What datatypes are for is interfacing with the outside world. Just like class URIs, datatypes are a social contract. In this case, there was a specification published in 2004 by the W3C that defined a big collection of datatypes under the http://www.w3.org/2001/XMLSchema# namespace, with very precise specs for their lexical forms (i.e. how to represent them all as strings). By using the datatype xsd:integer, we're promising that all of the values of that type will follow that specification ("42", "0", "-5", ...). This lets other people make tools that interface with instances on that assumption: for example, someone could make a tool for importing an instance into a relational database that maps every literal with datatype xsd:integer to a native INTEGER NOT NULL column, parsing an integer out of each string value based on the XSD spec. For datatypes that it doesn't recognize, it can always fall back to treating them as Unicode strings, since that's the baseline representation for all literal values.
Datatypes are another example of using URIs for decentralized coordination.
This sounds like a lot of complexity, especially just for primitives! What if we just want a regular type like boolean - do we always have to come up with a URI and declare it with a literal statement?
Fortunately not; tasl has some affordances to make common cases easy. The XSD namespace http://www.w3.org/2001/XMLSchema# is the (somewhat) canonical default namespace for literals in RDF, and it includes definitions for all the basic common datatypes like strings, booleans, and various sizes of numbers. In tasl, some of the datatypes from the XSD namespace are defined as global variables, meaning you can just use them as bare identifiers without declaring them, and without declaring the XSD namespace itself. These datatypes are:
    namespace xsd http://www.w3.org/2001/XMLSchema#

    literal string xsd:string
    literal boolean xsd:boolean
    literal float32 xsd:float
    literal float64 xsd:double
    literal int xsd:integer
    literal uint xsd:nonNegativeInteger
    literal int8 xsd:byte
    literal int16 xsd:short
    literal int32 xsd:int
    literal int64 xsd:long
    literal uint8 xsd:unsignedByte
    literal uint16 xsd:unsignedShort
    literal uint32 xsd:unsignedInt
    literal uint64 xsd:unsignedLong
    literal binary xsd:hexBinary
The XSD spec defines some of these as "derivations" of others, which you can see in this diagram, but tasl doesn't know or care about that part. To tasl, these are all just opaque URIs.
You don't have to remember to include the XSD namespace in every schema, and you generally don't even have to remember the literal statement syntax. You can just use the global variables as type expressions:
    namespace ex http://example.com/

    class ex:Person {
      ex:name -> string
      ex:age -> int
    }
As a general rule, try to use the most specific XSD datatype available. If you know that all of your ages will be greater than or equal to zero, you should feel free to say so:
    namespace ex http://example.com/

    class ex:Person {
      ex:name -> string
      ex:age -> uint
    }
Note that XSD defines many additional datatypes (e.g. yearMonthDuration) that are not given global variable names in tasl. You can still use these wherever you find them useful, but you will have to declare them with a literal statement yourself. Be aware that there are diminishing returns on using more specific datatypes, since fewer tools will be able to recognize them.
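As a minimal sketch of declaring a non-global XSD datatype, the schema below gives xsd:yearMonthDuration a local name and then uses it as a property type. The class ex:Subscription and its properties are hypothetical names chosen for illustration, and the example assumes that # starts a comment in tasl.

```tasl
namespace ex http://example.com/
namespace xsd http://www.w3.org/2001/XMLSchema#

# yearMonthDuration is not a global variable, so it needs an explicit declaration
literal yearMonthDuration xsd:yearMonthDuration

# ex:Subscription, ex:plan and ex:term are hypothetical names
class ex:Subscription {
  ex:plan -> string
  ex:term -> yearMonthDuration
}
```

Values of ex:term are still plain Unicode strings as far as tasl is concerned; the datatype only promises that they follow the XSD lexical form for year-month durations, such as "P1Y6M".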
In addition to the XSD datatypes, tasl defines the rdf:JSON datatype (specified in JSON-LD 1.1) as a global variable JSON.
    namespace rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#

    literal JSON rdf:JSON
The JSON datatype is a particularly useful escape hatch for semi-structured, heterogeneous, or miscellaneous data.
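For instance, a class might keep a few well-typed properties and push everything else into a single JSON-valued property. This is only a sketch: ex:Event, ex:name and ex:payload are hypothetical names, and the example assumes that # starts a comment in tasl.

```tasl
namespace ex http://example.com/

# ex:Event and its properties are hypothetical names
class ex:Event {
  ex:name -> string
  # free-form data; each value is a Unicode string holding JSON text
  ex:payload -> JSON
}
```

Because JSON is a global variable, no namespace or literal declaration is needed for it here.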
The literals in the global namespace should cover most use cases, but sometimes you'll need to model a value that is best treated as a literal but doesn't have a good pre-existing datatype. In that case, the best thing to do is to create your own custom datatype URI.
You should only try to do this for things that meet all of the following conditions:
For example, here are some bad candidates for custom datatypes:
... and here are some good candidates for custom datatypes: regular expressions (e.g. "^[a-z][a-zA-Z0-9]+$") and version strings (e.g. "0.15.2-rc.1").
Good candidates for custom datatypes generally follow a strict mini-language of their own that can't itself be naturally modeled in tasl for some reason. But if that's what you have, go for it! It's your way to signal to the world that the values are a specific format, and that people shouldn't try to mess with them unless they understand what that format is.
You don't need to do anything to start using your own datatype. Just be sure to pick a nice stable URI that you have authority over, and if you want other people to be able to interface with it, you should definitely publish documentation somewhere.
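Putting this together, a custom datatype is declared with the same literal statement used for any other datatype. The sketch below assumes you control http://example.com/ and have picked the hypothetical URI http://example.com/datatypes/semver for version strings; the local name semver and the class ex:Release are likewise illustrative, and the example assumes that # starts a comment in tasl.

```tasl
namespace ex http://example.com/

# hypothetical custom datatype for version strings such as "0.15.2-rc.1"
literal semver ex:datatypes/semver

# ex:Release and its properties are hypothetical names
class ex:Release {
  ex:name -> string
  ex:version -> semver
}
```

tasl will treat values of ex:version as opaque Unicode strings; what counts as a valid version string is defined by whatever documentation you publish for the datatype URI.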