Overview
Download data from the DBpedia Databus using SPARQL and make it fit for your applications.
The DBpedia Databus Client simplifies data consumption and compilation from the DBpedia Databus, addressing challenges in using data from different publishers and domains.
Data is often released in various serialization and compression formats and must be converted before it can be used. Additionally, tabular data, such as data from relational databases, and data in community-specific formats need to be mapped before they can be integrated with knowledge graphs. Currently, mapping efforts are dispersed, which reduces reusability and obscures provenance.
To address these issues, we propose a client that can automatically convert and compile data assets registered on any DBpedia Databus distribution into formats supported by the target infrastructure. It enables seamless consumption of compiled data, similar to traditional software dependency management systems. By shifting the burden of format conversion from data providers to the client, we reduce the publishing effort, enhance data consumption with fewer conversion problems, enable data-driven applications with automatically updated dependencies and enhance the findability and reuse of mapping definitions.
The client brings us closer to realizing a unified and efficient data ecosystem, promoting reusability and maintaining clear provenance.
Status
Beta: The Databus Client produces expected results for compression conversion and file format conversion. Errors may occur in the mapping process. Please expect some code refactoring and fluctuation.
Important Links
Discord: Don't hesitate to ask if you have any questions.
Quickstart
Requirements
You have multiple options to run the client (shown in Usage). For the standalone approach (.jar file) you only need Java installed on your machine.
Java: JDK 8 or JDK 11
Installation
Download databus-client.jar of the latest Databus Client release.
Choose Data for your application
First, we need to select the Databus from which we want to get our data. We take this databus. Its SPARQL endpoint is located here: https://dev.databus.dbpedia.org/sparql.
To select data from a DBpedia Databus, you can perform queries. Databus provides two mechanisms for this, which are described in detail here.
We use the following query as the selection for this example and write it to the file test.sparql:
Download and convert selected data
To download the data, we need to pass the query as the -s argument. Additionally, we need to specify the endpoint the query is sent to, using the -e argument. Furthermore, if we want to convert the files to .nt, we need to specify it in the -f parameter, and finally we need to tell the client the desired compression. More options are described in #cli-options.
By default, the resulting files are saved to ./files/.
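The invocation described above can be sketched as follows. The sketch uses only the -s, -e and -f options named in this section. The value nt for -f is an assumption derived from the ".nt" target mentioned above, and the option for the desired compression is not named in this section, so it is left out here (see #cli-options). The sketch also assumes that databus-client.jar and test.sparql sit in the current directory, and it only starts the client when both files are present.

```shell
# Sketch only: assembles the client call from the options named in this section.
QUERY_FILE="test.sparql"                            # file holding the selection query
ENDPOINT="https://dev.databus.dbpedia.org/sparql"   # SPARQL endpoint the query is sent to
FORMAT="nt"                                         # assumed value for the -f option

# Store the command as positional parameters so each argument stays one word.
set -- java -jar databus-client.jar -s "$QUERY_FILE" -e "$ENDPOINT" -f "$FORMAT"

# Show the assembled command so it can be checked before it runs.
CMD_PREVIEW="$*"
echo "$CMD_PREVIEW"

# Start the client only if the jar and the query file exist in the current directory.
if [ -f databus-client.jar ] && [ -f "$QUERY_FILE" ]; then
  "$@"
else
  echo "databus-client.jar or $QUERY_FILE not found; nothing was run." >&2
fi
```

Printing the command first makes it easy to verify the endpoint and query file before any download starts.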
Contributing
Please report issues in our GitHub repository.
If you would like to submit a non-trivial patch or pull request, we will need you to sign the Contributor License Agreement; we will send it to you in that case.
License
The source code of this repo is published under the Apache License Version 2.0.
Databus is configured so that the default license of all metadata is CC-0, which is relevant for all data of the Model, i.e. who published which data, when and under which license.
The individual datasets are referenced via links (dcat:downloadURL) and can have any license.
Citation
If you use the DBpedia Databus Client in your research, please cite the following paper: