Databus Gitbook

Content Variants

Content variants are a tool to distinguish the databus:Part(s) of a databus:Version. The parts of a dataset (i.e. a Version) may be files of different formats or compression types. Sometimes, however, they differ in other aspects, e.g. the language or a data-specific subtopic. Such distinctions can be expressed with content variants to allow a more meaningful selection of files.

In fact, features such as the faceted browsing interface or the Databus Collections rely on a proper setup of content variants.

The main rule for content variant setup is the following:

All databus:Part(s) of a databus:Version have to be distinguishable by format, compression type, or at least one content variant.

This ensures that each file in the databus:Version can be selected individually by querying for its unique tuple of format, compression type and content variants.
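
The rule can be checked mechanically before publishing. The following is a minimal sketch, not part of the Databus tooling: it assumes the parts are available as plain Python dictionaries shaped like the `distribution` entries in the examples on this page, and that content variant keys can be recognised by a `dcv:` prefix. The function names `selection_key` and `find_indistinguishable_parts` are hypothetical.

```python
from collections import defaultdict

# Assumption: content variant keys carry the "dcv:" prefix, as in the
# examples on this page. Adjust this if your context uses another prefix.
CONTENT_VARIANT_PREFIX = "dcv:"


def selection_key(part):
    """Return the (format, compression, content variants) tuple of a part."""
    variants = tuple(sorted(
        (key, value) for key, value in part.items()
        if key.startswith(CONTENT_VARIANT_PREFIX)
    ))
    return (part.get("format"), part.get("compression"), variants)


def find_indistinguishable_parts(parts):
    """Group part ids by selection key and return only the groups that collide."""
    groups = defaultdict(list)
    for part in parts:
        groups[selection_key(part)].append(part["@id"])
    return [ids for ids in groups.values() if len(ids) > 1]


parts = [
    {
        "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats.nt",
        "format": "nt",
        "compression": "none",
    },
    {
        "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats.ttl",
        "format": "ttl",
        "compression": "none",
    },
]

# An empty result means every part has a unique tuple and the rule holds.
collisions = find_indistinguishable_parts(parts)
```

If the returned list is non-empty, each inner list names parts that share the same tuple and therefore need an additional content variant.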

A content variant is a key-value pair. The key is a sub-property of databus:contentVariant and the value is a (preferably short) string that can be chosen freely. A content variant can describe either a property of the file or a property of its content.
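
As a rough illustration of the key-value structure, the sketch below builds both halves of a content variant: the property declaration (the key as a sub-property of databus:contentVariant) and the value attached to a part. It assumes the `https://dataid.dbpedia.org/databus-cv#` namespace and the `dcv:` prefix used in the examples on this page; the helper names are hypothetical and not part of any Databus library.

```python
# Assumption: namespace and prefix are taken from the examples on this page.
DCV_NAMESPACE = "https://dataid.dbpedia.org/databus-cv#"
DCV_PREFIX = "dcv:"


def content_variant_property(name):
    """Build the declaration of a content variant key as a plain dictionary."""
    return {
        "@type": "rdf:Property",
        "@id": DCV_NAMESPACE + name,
        "rdfs:subPropertyOf": "databus:contentVariant",
    }


def with_content_variant(part, name, value):
    """Return a copy of the part with the content variant value attached."""
    updated = dict(part)  # shallow copy, the input part is left untouched
    updated[DCV_PREFIX + name] = value
    return updated


size_property = content_variant_property("size")
small_part = with_content_variant(
    {"@type": "Part", "format": "ttl", "compression": "none"},
    "size",
    "small",
)
```

Keeping the declaration and the value assignment in separate helpers mirrors the structure of the metadata: the key is declared once, while each part carries its own value.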

Examples:

"@context" : "https://downloads.dbpedia.org/databus/context.jsonld",
...
"distribution": [
  {
    "@type": "Part",
    "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats.nt",
    "format": "nt",
    "compression": "none",
    ...
  },
  {
    "@type": "Part",
    "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats.ttl",
    "format": "ttl",
    "compression": "none",
    ...
  }
],
...

The above example shows two databus:Part(s) of a databus:Version. The two parts are distinguishable by format (nt and ttl). Hence, no content variant is required.

"@context" : "https://downloads.dbpedia.org/databus/context.jsonld",
...
"distribution": [
  {
    "@type": "Part",
    "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats_size=small.ttl",
    "format": "ttl",
    "compression": "none",
    "dcv:size": "small",
    ...
  },
  {
    "@type": "Part",
    "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats_size=big.ttl",
    "format": "ttl",
    "compression": "none",
    "dcv:size": "big",
    ...
  }
],
...
{
  "@type": "rdf:Property",
  "@id": "https://dataid.dbpedia.org/databus-cv#size",
  "rdfs:subPropertyOf": "databus:contentVariant"
}

The above example shows two databus:Part(s) of a databus:Version. Both parts have a format of ttl and a compression type of none, so they cannot be told apart by format or compression alone. To make them distinguishable, an additional content variant has to be used. The publisher of the databus:Version chose the property dcv:size as the content variant dimension and assigned each part a different value (small and big).
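
To illustrate how such a tuple selects a single file, here is a minimal sketch that filters parts held as plain Python dictionaries. It is an in-memory illustration only and does not show how the Databus itself is queried; the function name `select_parts` is hypothetical, and the part entries are trimmed to the fields shown in the example above.

```python
def select_parts(parts, fmt, compression, variants=None):
    """Return the parts matching format, compression and all given content variants."""
    wanted = variants or {}
    return [
        part for part in parts
        if part.get("format") == fmt
        and part.get("compression") == compression
        and all(part.get(key) == value for key, value in wanted.items())
    ]


parts = [
    {
        "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats_size=small.ttl",
        "format": "ttl",
        "compression": "none",
        "dcv:size": "small",
    },
    {
        "@id": "https://databus.example.org/john/animals/cats/2022-02-02#cats_size=big.ttl",
        "format": "ttl",
        "compression": "none",
        "dcv:size": "big",
    },
]

# Format and compression alone match both parts; adding the content
# variant narrows the selection to exactly one file.
ambiguous = select_parts(parts, "ttl", "none")
small_only = select_parts(parts, "ttl", "none", {"dcv:size": "small"})
```

Here `ambiguous` holds both parts, while `small_only` holds only the part whose dcv:size is small.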

