Versioning
The version ID must adhere to URI, Maven and filename standards, so the characters \/:"<>|?* are forbidden. Furthermore, it needs to be at least three characters long.
Apart from these rules, a version ID can contain any alphanumeric character (regardless of case) and any of these separator characters: -._
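The rules above can be sketched as a small check. This is only an illustration, not the validation the Databus itself performs: the function name is_valid_version_id is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
# Allowed: letters, digits and the separators - . _ ; minimum length 3.
# The forbidden characters \ / : " < > | ? * are excluded implicitly
# because they are not part of the allowed character class.
VERSION_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]{3,}")


def is_valid_version_id(version_id: str) -> bool:
    """Return True if version_id satisfies the rules described above."""
    return VERSION_ID_PATTERN.fullmatch(version_id) is not None
```

For example, is_valid_version_id("2021.06.01-120000") is accepted, while "v1" (too short) and "2021/06" (contains a forbidden character) are rejected.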
Sortable Timestamps
Although the format of the version ID is largely left to the user, there is a good practice: setting the version in the form YYYY.MM.DD-hhmmss, YYYY.MM.DD-hh.mm.ss or YYYY.MM.DDThhmmss (modeled on ISO 8601) has multiple advantages:
Sorting the version strings (alphanumerically) orders them from oldest to latest, which can be used in multiple ways in SPARQL. For example, adding ORDER BY ?version at the end of a query is an easy way to sort versions of data chronologically. Furthermore, you can use a filter like FILTER(str(?version) > "2020.01.01") to find all versions deployed in 2020 and later.
You can set it according to your deploy schedule, e.g. if you deploy monthly you can just use YYYY.MM. You can also switch the versioning later (e.g. to YYYY.MM.DD) and the sorting still stays intact.
On the Databus this can be used, for example, to find DBpedia long abstracts later than 2021 and order them chronologically.
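The sorting and filtering behaviour described above can be tried out without a SPARQL endpoint. The following is a minimal sketch, assuming that plain string comparison in Python orders these ASCII-only version IDs the same way as ORDER BY and FILTER on str(?version); the helper timestamp_version and the sample version list are hypothetical.

```python
from datetime import datetime


def timestamp_version(moment: datetime) -> str:
    """Format a datetime as a sortable version ID (YYYY.MM.DD-hhmmss)."""
    return moment.strftime("%Y.%m.%d-%H%M%S")


# Hypothetical version IDs with mixed granularity (monthly, daily, per second).
versions = [
    "2021.03.15-101500",
    "2019.12.31-235959",
    "2020.06",
    "2020.01.01-000000",
    "2022.01.02",
]

# Equivalent in spirit to ORDER BY ?version: plain string order is chronological.
chronological = sorted(versions)

# Equivalent in spirit to FILTER(str(?version) > "2020.01.01").
since_2020 = [v for v in chronological if v > "2020.01.01"]
```

Note that the monthly ID 2020.06 and the daily ID 2022.01.02 sort correctly alongside the full timestamps, which is why switching the granularity does not break the ordering.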
General Notes about Versioning
Generally, on the Databus the user has complete control over their data. It is therefore possible to resubmit a version under the same version ID, for example in the case of link rot or migrated data. In this case databus:version and dct:hasVersion usually stay the same, but dct:issued should change (it defaults to the current time if not explicitly set) to make it transparent that the dataset has been modified.
If you plan to keep modifying a specific version of a dataset (e.g. the first one), it can be helpful to append -snapshot or -dev to the version ID to make this clear to users. This also helps when searching for such datasets with SPARQL.
Timestamping
If dct:issued is given when the data is posted, that value is used.
If not, the Databus inserts the current time (%now%).
dct:modified is always set by the Databus.
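The three rules above amount to a simple defaulting scheme. The sketch below illustrates it under stated assumptions: the function resolve_timestamps is hypothetical and not part of the Databus, and dct:modified is assumed to be set to the current time, which the rules above do not state explicitly.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_timestamps(posted_issued: Optional[str], now: datetime) -> dict:
    """Apply the defaulting rules: keep a posted dct:issued, else use now.

    Assumption: dct:modified is always overwritten with the current time.
    """
    now_iso = now.isoformat()
    return {
        "dct:issued": posted_issued if posted_issued is not None else now_iso,
        "dct:modified": now_iso,
    }


# Example: a fixed "now" makes the behaviour reproducible.
now = datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
with_issued = resolve_timestamps("2020-01-01T00:00:00Z", now)
without_issued = resolve_timestamps(None, now)
```

In the first case the posted dct:issued value is kept, in the second it falls back to the current time; dct:modified is the current time in both.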