Features

Concept

The client is a modular system designed for reuse: components such as the downloader and the compression converter can be used independently of each other. It operates across four functional layers.

Download-Layer: This layer downloads data assets from the DBpedia Databus, preserving their provenance through stable file identifiers and additional metadata. It allows fine-grained selection of data assets through an interoperable data dependency specification and compiling configurations.

Compression-Layer: If conversion is needed, this layer detects the input compression format, decompresses the file, and passes it to the File-Format-Layer if the file format also has to change. It then compresses the converted file into the desired output compression format and returns it to the Download-Layer.
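The detect, decompress, and recompress steps can be illustrated with a minimal Python sketch. It is not the client's implementation: it assumes detection by leading magic bytes, covers only the three codecs available in Python's standard library (gz, bz2, xz), and the names `detect_compression` and `recompress` are hypothetical.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of three of the supported codecs.
# Illustrative only: limited to codecs shipped with Python's standard library.
MAGIC_BYTES = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}

# Codec name -> (decompress function, compress function).
CODECS = {
    "gz": (gzip.decompress, gzip.compress),
    "bz2": (bz2.decompress, bz2.compress),
    "xz": (lzma.decompress, lzma.compress),
}


def detect_compression(data: bytes):
    """Return the codec whose magic bytes prefix the data, or None if no known codec matches."""
    for magic, name in MAGIC_BYTES.items():
        if data.startswith(magic):
            return name
    return None


def recompress(data: bytes, target, convert=lambda raw: raw) -> bytes:
    """Decompress, hand the raw bytes to a format-conversion callback, then compress to `target`.

    `convert` stands in for the hand-off to the File-Format-Layer (hypothetical hook).
    `target=None` means the output stays uncompressed.
    """
    source = detect_compression(data)
    raw = CODECS[source][0](data) if source else data
    converted = convert(raw)
    return CODECS[target][1](converted) if target else converted
```

For example, `recompress(gzip.compress(b"..."), "bz2")` yields bz2-compressed bytes with the same payload, because the default `convert` callback leaves the content unchanged.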

File-Format-Layer: This layer handles data format conversion, calling on the Mapping-Layer as required. It parses the uncompressed file into a unified internal data structure for the corresponding format equivalence class, serializes this data structure into the desired output format, and sends the result back to the Compression-Layer.

Mapping-Layer: This layer is used when the input and output formats belong to different equivalence classes or when the data requires manipulation. Mapping configurations transform the data from the input equivalence class into the internal data structure of the target format. The transformed data is then passed back to the File-Format-Layer.
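How the four layers hand work to each other can be sketched as a small planning function. This is an illustration under stated assumptions, not the client's code: the equivalence classes are taken from the feature overview on this page, while `equivalence_class`, `plan_conversion`, and the step labels are hypothetical names. The sketch only covers mappings triggered by differing equivalence classes, not mappings requested for data manipulation.

```python
# Format equivalence classes as listed in the feature overview on this page.
EQUIVALENCE_CLASSES = {
    "RDF-Triples": {"nt", "ttl", "rdfxml", "hdt", "owl", "omn", "owx"},
    "RDF-Quads": {"nq", "trix", "trig", "json-ld"},
    "TSD": {"tsv", "csv"},
}


def equivalence_class(fmt: str) -> str:
    """Return the name of the equivalence class that contains the given format."""
    for name, formats in EQUIVALENCE_CLASSES.items():
        if fmt in formats:
            return name
    raise ValueError(f"unsupported format: {fmt}")


def plan_conversion(in_format, in_compression, out_format, out_compression):
    """List the layer steps for one file in execution order.

    Compression arguments are codec names such as "bz2", or None for uncompressed.
    """
    steps = ["download"]
    if (in_format, in_compression) == (out_format, out_compression):
        return steps  # nothing to convert, only the Download-Layer is involved
    if in_compression:
        steps.append(f"decompress {in_compression}")
    if in_format != out_format:
        steps.append(f"parse {in_format}")
        source_class = equivalence_class(in_format)
        target_class = equivalence_class(out_format)
        if source_class != target_class:
            # Only here does the Mapping-Layer come into play.
            steps.append(f"map {source_class} -> {target_class}")
        steps.append(f"serialize {out_format}")
    if out_compression:
        steps.append(f"compress {out_compression}")
    return steps
```

Under these assumptions, converting `ttl` in `bz2` to `nt` in `gz` stays inside the RDF-Triples class and skips the mapping step, whereas converting `nt` to `tsv` adds a `map RDF-Triples -> TSD` step between parsing and serializing.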

The Databus Client's modular design enables efficient data processing, conversion, and manipulation, and makes its components easier to reuse and combine.

Features

Layer | Implemented formats | Future Work

Download-Layer

  • All features finished

Compression-Layer

  • bz2, gz, br, lzma, xz, zstd, snappy-framed, deflate

  • Additional formats will be included upon demand.

File-Format-Layer

  • RDF-Triples: {nt, ttl, rdfxml, hdt, owl, omn, owx}

  • RDF-Quads: {nq, trix, trig, json-ld}

  • TSD: {tsv, csv}

Mapping-Layer

  • RDF-Triples <--> RDF-Quads

  • RDF-Triples <--> TSD

  • RDF-Quads --> TSD

  • TSD --> RDF-Quads

  • Provide a plugin mechanism to incorporate more sophisticated format mapping engines such as RML, R2RML, R2R (for owl:equivalence translation), and XSLT.

Used frameworks

  • Compression-Layer:

  • File-Format-Layer:

  • Mapping-Layer:

    • Tarql is used for mapping from TSD to RDF Triples.

    • Apache Jena and Apache Spark are used to achieve the RDF Triples to TSD mapping.

    • Apache Jena is used for RDF Triples <-> RDF Quads mappings.
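To make the RDF Triples <-> RDF Quads direction concrete, here is a minimal sketch in plain Python. It does not use Apache Jena and does not claim to mirror how the client configures it: statements are modelled as tuples, `graph=None` stands for the default graph, and both function names are hypothetical.

```python
def triples_to_quads(triples, graph=None):
    """Attach one graph name to every triple; None stands for the default graph."""
    return [(s, p, o, graph) for s, p, o in triples]


def quads_to_triples(quads):
    """Drop the graph component and remove duplicates, keeping first-seen order.

    This direction is lossy: the information about which graph a statement
    came from cannot be recovered from the resulting triples.
    """
    seen = set()
    triples = []
    for s, p, o, _graph in quads:
        triple = (s, p, o)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples
```

The sketch shows why the two directions are not symmetric: adding a graph name requires a choice that the input does not contain, and removing it discards information.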

Limitations of mappings

  • TSD -> RDF Quads: Due to limitations of Tarql, mapping from TSD to RDF Quads is not possible at the moment.

  • RDF Triples -> TSD: The mapping produces a wide table; a more precise mapping is not possible yet.
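What a wide table means for the RDF Triples -> TSD mapping can be shown with a small Python sketch. The client performs this mapping with Apache Jena and Apache Spark; the code below is a plain-Python illustration only, and the exact layout it produces (a `subject` column, one column per predicate, multiple values joined with `|`) is an assumption made for this example.

```python
import csv
import io


def triples_to_wide_table(triples):
    """Pivot (subject, predicate, object) triples into one row per subject
    and one column per predicate."""
    predicates = []  # column order follows first appearance
    rows = {}        # subject -> {predicate: [objects]}
    for s, p, o in triples:
        if p not in predicates:
            predicates.append(p)
        rows.setdefault(s, {}).setdefault(p, []).append(o)
    header = ["subject"] + predicates
    # Multi-valued cells are joined with "|"; this separator is an arbitrary choice.
    body = [
        [s] + ["|".join(cells.get(p, [])) for p in predicates]
        for s, cells in rows.items()
    ]
    return header, body


def to_tsv(header, body) -> str:
    """Serialize the table as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()
```

Every predicate that occurs anywhere in the input becomes a column, so subjects that lack a predicate get an empty cell. On heterogeneous data this leads to many sparse columns, which is the sense in which the table is wide.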
