Compiler · Jul 28, 2020

Introducing string literal types in BuckleScript version 8.2

Highlights of our newest changes to the internal representation and how they will benefit our users.
Hongbo Zhang
Compiler Team

Important: This is an archived blog post, kept for historic reasons. Please note that this information might be terribly outdated.

String literal types in BuckleScript

String literal types were introduced by TypeScript to model JavaScript behavior. They are a relatively new concept, since most type systems are agnostic about how values are encoded at runtime. However, to make writing bindings to existing JS APIs smoother, we are introducing string literal types that behave differently from TypeScript's in several ways: they support type inference and pattern matching, and they can have data attached to them.

Vanilla string literal types

In Reason, a string literal type is written as `hello, which compiles to the JS string "hello". The difference from a plain string is that `hello has its own type, so you cannot mix it up with arbitrary strings.

Take the following code snippet as an example:

RE
let encoding = (enc) =>
  switch (enc) {
  | `utf8 => 0
  | `ascii => 1
  | `utf16 => 2
  };

It will be compiled into

JS
function encoding(en) {
  if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 0;
  }
}

If you pass an encoding that is not listed, e.g. encoding(`ucs32), you get a type error:

This expression has type [> `ucs32 ]
but an expression was expected of type [< `ascii | `utf16 | `utf8 ]
The second variant type does not allow tag(s) `ucs32

Notice also that, because the compiler can guarantee the input is one of `utf8, `ascii or `utf16, the generated JS skips the comparison with "utf8": once "ascii" and "utf16" have been ruled out, the only remaining possibility is "utf8".

If we add a wildcard case to match any other encoding:

RE
let encoding = (enc) =>
  switch (enc) {
  | `utf8 => 0
  | `ascii => 1
  | `utf16 => 2
  | _ => 3
  };

It will generate JS as below:

JS
function encoding(en) {
  if (en === "utf8") {
    return 0;
  } else if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 3;
  }
}
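These types exist only at compile time. That means the guarantee that justifies skipping the "utf8" comparison holds only for callers that have gone through the Reason type checker. The minimal sketch below copies the two generated functions, renamed to the hypothetical `encodingExhaustive` and `encodingWithWildcard` so they can coexist, and calls both from plain JS, which performs no such check.

```javascript
// Copy of the first generated function: there is no "utf8" check because the
// type checker guarantees only "utf8", "ascii" or "utf16" can reach it.
function encodingExhaustive(en) {
  if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 0;
  }
}

// Copy of the second generated function: the wildcard case forces an
// explicit comparison for every listed literal before falling through to 3.
function encodingWithWildcard(en) {
  if (en === "utf8") {
    return 0;
  } else if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 3;
  }
}

// On every input the type checker allows, both versions agree.
for (const enc of ["utf8", "ascii", "utf16"]) {
  console.log(enc, encodingExhaustive(enc), encodingWithWildcard(enc));
}

// An untyped JS caller bypasses the type checker: the optimized version
// silently treats "ucs32" like "utf8", while the wildcard version returns 3.
console.log(encodingExhaustive("ucs32")); // 0
console.log(encodingWithWildcard("ucs32")); // 3
```

Within well-typed Reason code, the two functions behave identically. They differ only when JS code passes a string outside the declared set.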

Declaring types for string literal types

All string literal types can be inferred, which is convenient during development. Once your code stabilizes, it is good practice to give string literal types a name:

RE
type utf = [ | `utf8 | `utf16 ];
type ascii = [ | `ascii ];

You can also embed string literal types directly inside other types without declaring them first:

RE
type t = { encodings : list([ | `utf8 | `ascii ]) }

The cool thing is that you can create union types by simply putting the types together:

RE
type encoding = [ | utf | ascii ]
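At runtime each of these types is just a set of JS strings, and the union `encoding` is the combination of the members of `utf` and `ascii`. The minimal sketch below assumes you receive a value from untyped JS and want to check it before treating it as an `encoding`. The names `UTF`, `ASCII`, `ENCODING` and `isEncoding` are hypothetical and are not generated by the compiler. The sketch also assumes `utf` contains `utf8 and `utf16.

```javascript
// Hypothetical runtime mirrors of the Reason types: each string literal
// type compiles to plain strings, so here a type is just a list of strings.
const UTF = ["utf8", "utf16"]; // type utf = [ | `utf8 | `utf16 ]
const ASCII = ["ascii"]; // type ascii = [ | `ascii ]
const ENCODING = [...UTF, ...ASCII]; // type encoding = [ | utf | ascii ]

// Hypothetical guard for values arriving from untyped JS code.
function isEncoding(value) {
  return typeof value === "string" && ENCODING.includes(value);
}

console.log(isEncoding("utf16")); // true
console.log(isEncoding("ucs32")); // false
console.log(isEncoding(42)); // false
```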

The compiler even supports syntactic sugar for matching on named string literal types:

RE
let classify = (enc) =>
  switch (enc) {
  | #utf => "utf" // string literals belonging to the utf type
  | #ascii => "ascii" // string literals belonging to the ascii type
  };

The compiler generates well-optimized code for this:

JS
function classify(enc) {
  if (enc === "ascii") {
    return "ascii";
  } else {
    return "utf";
  }
}
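The `#utf` pattern is shorthand for matching every member of `utf`. It works like an or-pattern over `utf8 and `utf16, assuming `utf` is declared with those two literals. The minimal sketch below compares a hand-written, unoptimized JS expansion (the hypothetical `classifyExpanded`) with a copy of the generated `classify`. They agree on every valid input, which is why the compiler can drop the comparisons against "utf8" and "utf16".

```javascript
// Copy of the generated code above.
function classify(enc) {
  if (enc === "ascii") {
    return "ascii";
  } else {
    return "utf";
  }
}

// Hypothetical direct expansion of #utf and #ascii that tests every literal.
function classifyExpanded(enc) {
  if (enc === "utf8" || enc === "utf16") {
    return "utf";
  } else if (enc === "ascii") {
    return "ascii";
  }
  throw new Error("not an encoding: " + enc);
}

for (const enc of ["utf8", "utf16", "ascii"]) {
  console.log(enc, classify(enc), classifyExpanded(enc));
}
```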

String literal types in bindings

Since string literal types are just strings after type checking, you can use them to bind to JS libraries directly, without any conversion:

RE
type encoding = [
  | `hex
  | `utf8
  | `ascii
  | `latin1
  | `ucs2
  | `base64
  | `binary
  | `utf16le
];

[@bs.val] [@bs.module "fs"]
external readFileSync: (string, encoding) => string = "readFileSync";
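An `encoding` value is already a JS string, so a call through this binding with `utf8 should reach Node's `fs.readFileSync` with the plain string "utf8" as its second argument, with no wrapper in between. The sketch below writes that call directly in Node.js to show what the binding ends up invoking. The temporary file name is an arbitrary assumption, and the exact compiled output of the binding may differ.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Arbitrary temporary file used only for this demonstration.
const file = path.join(os.tmpdir(), "string-literal-demo.txt");
fs.writeFileSync(file, "hello");

// `utf8 compiles to the string "utf8", which is exactly the encoding
// argument Node expects, so the value passes through unchanged.
const contents = fs.readFileSync(file, "utf8");
console.log(contents); // hello

// "base64" is another member of the encoding type above.
const encoded = fs.readFileSync(file, "base64");
console.log(encoded); // aGVsbG8=

fs.unlinkSync(file);
```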

String literal types attached to data

Since Reason is a statically typed language, you cannot mix data of different types in a collection.

For example, you will get a type error when writing code like this: [ 3, "3" ].

The underlying reason is that if the compiler let you put values of different types into a single collection, it would be hard to give that collection a type and to process its elements safely later.

With string literal types, you can do things like this:

RE
[ 3 -> `Int , "3" -> `String ]

Note that the generated code for 3 -> `Int and "3" -> `String would be:

JS
{ NAME: "Int", VAL: 3 }
{ NAME: "String", VAL: "3" }
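The tagged representation consists of plain JS objects, so JS code can also build values that the Reason side will treat as `Int or `String. The minimal sketch below uses the hypothetical helpers `intTag`, `stringTag` and `tagValue`. It assumes the { NAME, VAL } layout described in this post for version 8.2, which is a compiler-internal detail rather than a documented API.

```javascript
// Hypothetical helpers producing the layout shown above for `Int and `String.
const intTag = (n) => ({ NAME: "Int", VAL: n });
const stringTag = (s) => ({ NAME: "String", VAL: s });

// Hypothetical: tag an arbitrary JS value, mirroring 3 -> `Int and
// "3" -> `String. Only integral numbers are tagged as `Int, because the
// Reason payload type is int.
function tagValue(v) {
  if (Number.isInteger(v)) return intTag(v);
  if (typeof v === "string") return stringTag(v);
  throw new TypeError("expected an integer or a string");
}

const mixed = [3, "3"].map(tagValue);
console.log(JSON.stringify(mixed));
// [{"NAME":"Int","VAL":3},{"NAME":"String","VAL":"3"}]
```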

And you can also write code to process such collections:

RE
let handle = (xs) =>
  Belt.List.map(
    xs,
    (param) =>
      switch (param) {
      | `Int(n) => n
      | `String(s) => String.length(s)
      },
  );

The generated code would be:

JS
function handle(xs) {
  return Belt_List.map(xs, function (param) {
    if (param.NAME === "Int") {
      return param.VAL;
    } else {
      return param.VAL.length;
    }
  });
}
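The generated `handle` only inspects `NAME` and then reads `VAL`. The minimal sketch below applies the same branching to a plain JS array holding the two values from the example above. It uses the hypothetical `handleArray` and `Array.prototype.map` instead of `Belt_List.map`. This is an assumption: the real code operates on a Reason list, whose runtime layout this post does not show.

```javascript
// Same branching as the generated handle, but over a plain JS array.
function handleArray(xs) {
  return xs.map(function (param) {
    if (param.NAME === "Int") {
      return param.VAL;
    } else {
      // Only `String remains, so VAL is a JS string here.
      return param.VAL.length;
    }
  });
}

const xs = [
  { NAME: "Int", VAL: 3 },
  { NAME: "String", VAL: "3" },
];
console.log(handleArray(xs)); // [ 3, 1 ]
```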

To conclude, string literal types give users a convenient way to mix data of different types and to process it later via pattern matching.

Declaring types for string literal types attached to data

Type inference is convenient during development, but you can also write down explicit types for string literal types attached to data:

RE
type number_or_string = [ | `Int(int) | `String(string) ];
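Once the type is written down, the set of valid runtime shapes is explicit. A valid value is either an object whose `NAME` is "Int" with an integer `VAL`, or one whose `NAME` is "String" with a string `VAL`. The minimal sketch below defines a hypothetical `isNumberOrString` check that JS code could run before handing data to Reason, assuming the { NAME, VAL } layout described above.

```javascript
// Hypothetical runtime check mirroring
// type number_or_string = [ | `Int(int) | `String(string) ];
// It does not check that `Int payloads fit in a 32-bit int.
function isNumberOrString(v) {
  if (v === null || typeof v !== "object") return false;
  switch (v.NAME) {
    case "Int":
      return Number.isInteger(v.VAL);
    case "String":
      return typeof v.VAL === "string";
    default:
      return false;
  }
}

console.log(isNumberOrString({ NAME: "Int", VAL: 3 })); // true
console.log(isNumberOrString({ NAME: "String", VAL: "3" })); // true
console.log(isNumberOrString({ NAME: "Int", VAL: "3" })); // false
console.log(isNumberOrString("3")); // false
```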

Further reading

Here we only cover the everyday uses of string literal types. For more advanced usage, see here. The underlying type theory is almost the same; however, we adapt it so that values are compiled into string literals that match the JS runtime.
