Features


Last updated 1 year ago

Concept

The client is a modular system designed for high reusability: components such as the downloader and the compression converter can be used separately and interchangeably. It is organized into four functional layers.

Download-Layer: This layer downloads data assets from the DBpedia Databus, preserving their provenance through stable file identifiers and additional metadata. It allows fine-grained selection of data assets through an interoperable data dependency specification and compiling configurations.

Compression-Layer: If conversion is needed, this layer detects the input compression format, decompresses the file, and passes it to the File-Format-Layer when the file format must change as well. It then compresses the converted file into the desired output compression format and returns it to the Download-Layer.
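
The detect, decompress, convert, recompress cycle of the Compression-Layer can be sketched in a few lines. This is a minimal illustration in Python, not the client's own implementation: it covers only gz, bz2 and xz through the Python standard library, and the names `detect_compression` and `recompress` as well as the `convert` hook (a stand-in for the File-Format-Layer) are hypothetical.

```python
import bz2
import gzip
import lzma
from typing import Callable, Optional

# Magic-byte prefixes for three of the compression formats listed on this page.
MAGIC = {
    "gz": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}

# (compress, decompress) pairs from the standard library.
CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the compression name guessed from the leading bytes, or None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def recompress(
    data: bytes,
    target: Optional[str],
    convert: Callable[[bytes], bytes] = lambda raw: raw,
) -> bytes:
    """Decompress `data`, run the (hypothetical) format conversion hook,
    then compress the result into `target` (None means uncompressed)."""
    source = detect_compression(data)
    raw = CODECS[source][1](data) if source else data
    converted = convert(raw)
    return CODECS[target][0](converted) if target else converted
```

For example, `recompress(gzip.compress(b"<s> <p> <o> .\n"), "bz2")` returns the same triple compressed as bz2. Detecting by magic bytes rather than by file extension is a design choice of this sketch only.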

File-Format-Layer: This layer handles data format conversion, calling on the Mapping-Layer when required. It parses the uncompressed file into a unified internal data structure for the format's equivalence class, serializes this data structure into the desired output format, and sends the result back to the Compression-Layer.

Mapping-Layer: This layer is used when the input and output formats belong to different equivalence classes or when the data must be manipulated. Mapping configurations transform the data from the input equivalence class into the internal data structure of the target format. The transformed data is then passed back to the File-Format-Layer.
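
The decision of whether the Mapping-Layer is involved can be illustrated with a small lookup. The sketch below is written in Python and takes the three equivalence classes from the Features list on this page. The function names `equivalence_class` and `plan_conversion` are hypothetical, and the sketch only covers the "different equivalence classes" case, not mappings requested for data manipulation.

```python
from typing import List

# Format equivalence classes as listed for the File-Format-Layer on this page.
EQUIVALENCE_CLASSES = {
    "RDF-Triples": {"nt", "ttl", "rdfxml", "hdt", "owl", "omn", "owx"},
    "RDF-Quads": {"nq", "trix", "trig", "json-ld"},
    "TSD": {"tsv", "csv"},
}


def equivalence_class(fmt: str) -> str:
    """Return the name of the equivalence class that contains `fmt`."""
    for name, formats in EQUIVALENCE_CLASSES.items():
        if fmt in formats:
            return name
    raise ValueError(f"unsupported format: {fmt}")


def plan_conversion(in_fmt: str, out_fmt: str) -> List[str]:
    """List the conversion steps; a mapping step is added only when the
    input and output formats fall into different equivalence classes."""
    src = equivalence_class(in_fmt)
    dst = equivalence_class(out_fmt)
    steps = [f"parse {in_fmt}"]
    if src != dst:
        steps.append(f"map {src} -> {dst}")
    steps.append(f"serialize {out_fmt}")
    return steps
```

With these definitions, `plan_conversion("nt", "ttl")` needs no mapping step, while `plan_conversion("ttl", "csv")` includes `map RDF-Triples -> TSD`.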

Databus Client's modular design enables efficient data processing, conversion, and manipulation, enhancing reusability and flexibility in data management.

Features

Download-Layer

  • Implemented formats: All files on the DBpedia Databus

  • Future work: All features finished

Compression-Layer

  • Implemented formats: bz2, gz, br, lzma, xz, zstd, snappy-framed, deflate

  • Future work: Additional formats will be included upon demand.

File-Format-Layer

  • Implemented formats:

    • RDF-Triples: {nt, ttl, rdfxml, hdt, owl, omn, owx}

    • RDF-Quads: {nq, trix, trig, json-ld}

    • TSD: {tsv, csv}

  • Future work:

    • Scalable RDF libraries from SANSA-Stack and Databus Derive

    • Step-by-step extension for all (quasi-)isomorphic IANA mediatypes

Mapping-Layer

  • Implemented formats:

    • RDF-Triples <--> RDF-Quads

    • RDF-Triples <--> TSD

    • RDF-Quads --> TSD

    • TSD --> RDF-Quads

  • Future work: Provide a plugin mechanism to incorporate more sophisticated format mapping engines such as RML, R2RML, R2R (for owl:equivalence translation) and XSLT.

Used frameworks

  • Compression-Layer:

    • Apache Compress library covers most of the compression formats.

  • File-Format-Layer:

    • Apache Jena Framework -> nt, ttl, rdfxml, nq, trix, trig, json-ld

    • RDF HDT Framework -> hdt

    • OWL API -> owl, omn, owx

  • Mapping-Layer:

    • Tarql has been implemented for mapping from TSD to RDF Triples.

    • Apache Jena and Apache Spark are used to achieve the RDF Triples to TSD mapping.

    • Apache Jena is used for RDF Triples <-> RDF Quads mappings.

Limitations of mappings

  • TSD -> RDF Quads: Due to the limitations of Tarql, no mapping from TSD to RDF Quads is possible at the moment.

  • RDF Triples -> TSD: The mapping results in a wide table; a more precise mapping is not yet possible.
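
To picture the wide-table result of the RDF Triples -> TSD mapping, the sketch below pivots a handful of triples into one row per subject and one column per predicate. It is written in plain Python for illustration only. The client itself uses Apache Jena and Apache Spark for this mapping, and its actual column layout may differ. The function names, the `subject` column header, and the choice to join multiple values with a space are all assumptions of this example.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def triples_to_wide_rows(triples: Iterable[Triple]) -> List[List[str]]:
    """Pivot triples into a wide table: one row per subject, one column
    per predicate. Missing values become empty cells; repeated values
    for the same subject and predicate are joined with a space
    (an assumption of this sketch)."""
    triples = list(triples)
    predicates = sorted({p for _, p, _ in triples})
    by_subject: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)
    rows = [["subject"] + predicates]
    for s in sorted(by_subject):
        cells = [" ".join(by_subject[s].get(p, [])) for p in predicates]
        rows.append([s] + cells)
    return rows


def to_tsv(rows: List[List[str]]) -> str:
    """Serialize the rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

The table grows by one column for every distinct predicate, so sparse data yields many empty cells. This is the sense in which the result is "wide".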
Figure: Data flow of DBpedia's Databus Client