Annotating datasets

It is possible to annotate a dataset with so-called key/value pairs. These key/value annotations are intended to make it easy to add and access specific metadata at the per-dataset level.

The difference between annotations and the descriptive metadata is that annotations are easier to work with programmatically. The descriptive metadata, stored in the dataset’s README content, is more free-form, so extracting a specific piece of information from it is non-trivial. A dtool annotation, by contrast, can be accessed directly by its name (key).

To create an annotation using the dtool CLI, one would use the dtool annotation set command. For example, to annotate a dataset with a “project”, one would use the command:

$ dtool annotation set <DS_URI> project world-peace

To access the “project” annotation, one would use the dtool annotation get command:

$ dtool annotation get <DS_URI> project
world-peace
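
When the same annotation has to be applied to several datasets, the set command can be wrapped in a small shell function. The following is a minimal sketch; the function name annotate_all and its argument order are illustrative assumptions and not part of dtool:

```shell
# Hypothetical helper (not part of dtool): set one key/value annotation
# on every dataset URI given after the key and the value.
annotate_all() {
    key="$1"
    value="$2"
    shift 2
    for uri in "$@"; do
        dtool annotation set "$uri" "$key" "$value"
    done
}
```

It could then be called with a key, a value, and any number of dataset URIs:

$ annotate_all project world-peace <DS_URI_1> <DS_URI_2>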

Annotations set using dtool annotation set are strings by default. The --type option can be used to set the type to int, float, or bool instead. For example, to annotate a dataset with a “stars” rating, one could use the command:

$ dtool annotation set --type int <DS_URI> stars 3

For more complex data structures, one can set the type to json. For example:

$ dtool annotation set --type json <DS_URI> params '{"x": 3.4, "y": 5.6}'
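
A json annotation is most useful when individual fields can be pulled out again. The sketch below shows one way this might be done from the shell. It assumes that dtool annotation get writes the annotation to standard output as valid JSON and that python3 is available on the PATH; the function name annotation_field is hypothetical:

```shell
# Hypothetical helper (not part of dtool): print a single field of a
# JSON annotation. Assumes "dtool annotation get" writes the annotation
# to standard output as valid JSON and that python3 is on the PATH.
annotation_field() {
    uri="$1"
    key="$2"
    field="$3"
    dtool annotation get "$uri" "$key" |
        python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$field"
}
```

With the “params” annotation set as above, the “x” field could then be read with:

$ annotation_field <DS_URI> params x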

It is possible to list all the annotations of a dataset using the dtool annotation ls command:

$ dtool annotation ls <DS_URI>
params  {"x": 3.4, "y": 5.6}
project world-peace
stars   3

To update an annotation, one can use the dtool annotation set command again. For example, to show that a dataset is really fantastic, one could increase its star rating to 5:

$ dtool annotation set <DS_URI> stars 5 --type int
$ dtool annotation get <DS_URI> stars
5
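
Because updating is just another set, a read-modify-write step can be scripted by combining get and set. The following is a minimal sketch of incrementing an integer annotation. The function name bump_annotation is hypothetical, and the sketch assumes the annotation already exists and that dtool annotation get prints only the integer value:

```shell
# Hypothetical helper (not part of dtool): add one to an integer
# annotation by reading the current value and writing it back.
# Assumes the annotation already exists and holds an integer.
bump_annotation() {
    uri="$1"
    key="$2"
    current="$(dtool annotation get "$uri" "$key")"
    dtool annotation set --type int "$uri" "$key" "$((current + 1))"
}
```

Passing --type int on the write keeps the annotation an integer rather than letting it fall back to the default string type.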

Warning

There are restrictions on the characters and the length of annotation keys. A key has to match the regular expression ^[a-zA-Z.-_]*$ and must be 80 characters or fewer.
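
A script that generates keys may want to check them before calling dtool. The sketch below applies the two rules exactly as written in this warning; the function name is_valid_annotation_key is hypothetical, and the check that dtool itself performs remains authoritative:

```shell
# Hypothetical pre-check (not part of dtool): succeed only if the key
# is at most 80 characters long and matches the pattern quoted in the
# warning. LC_ALL=C makes grep interpret bracket ranges by byte value.
is_valid_annotation_key() {
    key="$1"
    [ "${#key}" -le 80 ] || return 1
    printf '%s\n' "$key" | LC_ALL=C grep -Eq '^[a-zA-Z.-_]*$'
}
```

The function returns a zero exit status for an acceptable key, so it can be used directly in a condition such as: is_valid_annotation_key "$key" || echo "invalid key".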