Features
The Databus Client is a modular system designed for reuse: components such as the downloader and the compression converter can be used on their own. It operates across four functional layers.
Download-Layer: This layer downloads data assets from the DBpedia Databus, preserving their provenance through stable file identifiers and additional metadata. It allows fine-grained selection of data assets through an interoperable data dependency specification and compiling configurations.
Compression-Layer: If conversion is needed, this layer detects the input compression format, decompresses the file, and passes it to the File-Format-Layer when the file format also has to change. It then compresses the converted file into the desired output compression format and returns it to the Download-Layer.
File-Format-Layer: This layer handles data format conversion and calls on the Mapping-Layer when required. It parses the uncompressed file into a unified internal data structure for the corresponding format equivalence class, then serializes this data structure into the desired output format and sends it back to the Compression-Layer.
Mapping-Layer: This layer is used when the input and output formats belong to different equivalence classes or when the data needs to be manipulated. Mapping configurations describe how to transform the data from the input equivalence class into the internal data structure of the target format. The transformed data is then passed back to the File-Format-Layer.
Because each layer can be used independently, the Databus Client's modular design keeps data processing, conversion, and manipulation reusable and flexible.
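The hand-off between the Compression-Layer and the File-Format-Layer described above can be sketched in a few lines. This is a minimal illustration, not the client's actual implementation: it uses only Python's standard library codecs (gz, bz2, xz), and the names `detect_compression`, `convert_file`, and the `convert_format` callback are hypothetical.

```python
import bz2
import gzip
import lzma

# Magic-byte prefixes for the three formats covered by this sketch.
MAGIC = {
    "gz": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}

# (decompress, compress) function pairs per format.
CODECS = {
    "gz": (gzip.decompress, gzip.compress),
    "bz2": (bz2.decompress, bz2.compress),
    "xz": (lzma.decompress, lzma.compress),
}


def detect_compression(data: bytes):
    """Return the detected compression name, or None for uncompressed input."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def convert_file(data: bytes, convert_format, out_compression=None) -> bytes:
    """Decompress, delegate to a format converter, then recompress."""
    in_compression = detect_compression(data)
    if in_compression is not None:
        data = CODECS[in_compression][0](data)
    # Hypothetical callback standing in for the File-Format-Layer.
    data = convert_format(data)
    if out_compression is not None:
        data = CODECS[out_compression][1](data)
    return data
```

Passing an identity function as `convert_format` turns the sketch into a pure compression converter, which mirrors the case where only the compression format has to change.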
Compression-Layer:
The Apache Commons Compress library covers most of the compression formats.
File-Format-Layer:
Apache Jena Framework -> nt, ttl, rdfxml, nq, trix, trig, json-ld
RDF HDT Framework -> hdt
OWL API -> owl, omn, owx
Mapping-Layer:
Tarql is used for mapping from TSD to RDF Triples.
Apache Jena and Apache Spark are used to achieve the RDF Triples to TSD mapping.
Apache Jena is used for RDF Triples <-> RDF Quads mappings.
TSD -> RDF Quads: Due to the limitations of Tarql, no mapping from TSD to RDF Quads is possible at the moment.
RDF Triples -> TSD: The mapping results in a wide table; a more precise mapping is not yet possible.
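To make the "wide table" result more concrete, the following sketch pivots a handful of triples into one row per subject and one column per predicate. It is an illustration only and does not reproduce the Apache Jena and Apache Spark based implementation; the function names, the `subject` column header, and the choice to join multiple values with `|` are assumptions made for this example.

```python
import csv
import io


def triples_to_wide_table(triples):
    """Pivot (subject, predicate, object) tuples into one row per subject."""
    predicates = []  # column order follows first appearance
    rows = {}        # subject -> {predicate: [objects]}
    for s, p, o in triples:
        if p not in predicates:
            predicates.append(p)
        rows.setdefault(s, {}).setdefault(p, []).append(o)

    table = [["subject"] + predicates]
    for s, values in rows.items():
        # Assumption: multiple objects for one predicate share a cell.
        table.append([s] + ["|".join(values.get(p, [])) for p in predicates])
    return table


def to_tsv(table):
    """Serialize the table as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(table)
    return buffer.getvalue()
```

Subjects that lack a predicate get an empty cell, which is why the table grows wide and sparse when the input mixes resources with very different properties.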
Layer | Implemented formats | Future Work |
---|---|---|
Download-Layer | All files on the DBpedia Databus | All features finished |
Compression-Layer | bz2, gz, br, lzma, xz, zstd, snappy-framed, deflate | Additional formats will be included upon demand. |
File-Format-Layer | RDF-Triples: {nt, ttl, rdfxml, hdt, owl, omn, owx}; RDF-Quads: {nq, trix, trig, json-ld}; TSD: {tsv, csv} | Scalable RDF libraries from SANSA-Stack and Databus Derive; step by step, extension for all (quasi-)isomorphic IANA mediatypes |
Mapping-Layer | RDF-Triples <--> RDF-Quads; RDF-Triples <--> TSD; RDF-Quads --> TSD | TSD --> RDF-Quads |
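The equivalence classes and mappings listed in the table determine which layers a conversion has to pass through. The sketch below encodes the table as plain data and decides, for a given input and output format, whether no conversion, a format conversion within one class, or a mapping between classes is needed. The function names and the returned labels are hypothetical and serve only to illustrate the routing logic.

```python
# Format equivalence classes as listed in the table.
EQUIVALENCE_CLASSES = {
    "RDF-Triples": {"nt", "ttl", "rdfxml", "hdt", "owl", "omn", "owx"},
    "RDF-Quads": {"nq", "trix", "trig", "json-ld"},
    "TSD": {"tsv", "csv"},
}

# Implemented mappings between classes; TSD -> RDF-Quads is future work.
SUPPORTED_MAPPINGS = {
    ("RDF-Triples", "RDF-Quads"),
    ("RDF-Quads", "RDF-Triples"),
    ("RDF-Triples", "TSD"),
    ("TSD", "RDF-Triples"),
    ("RDF-Quads", "TSD"),
}


def equivalence_class(fmt: str) -> str:
    """Return the name of the equivalence class a format belongs to."""
    for name, formats in EQUIVALENCE_CLASSES.items():
        if fmt in formats:
            return name
    raise ValueError(f"unknown format: {fmt}")


def plan_conversion(src: str, dst: str) -> str:
    """Return 'none', 'format', or 'mapping' for a conversion request."""
    if src == dst:
        return "none"
    src_class, dst_class = equivalence_class(src), equivalence_class(dst)
    if src_class == dst_class:
        return "format"  # File-Format-Layer alone is enough
    if (src_class, dst_class) in SUPPORTED_MAPPINGS:
        return "mapping"  # Mapping-Layer is required as well
    raise ValueError(f"no mapping from {src_class} to {dst_class}")
```

For example, `plan_conversion("nt", "ttl")` stays inside RDF-Triples, while `plan_conversion("csv", "nq")` raises an error because the table lists TSD to RDF-Quads as future work.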