
Data Crawling


Crawling RDF data from various sources is a crucial task for organizations seeking to leverage linked data. This article explores how the DBpedia Databus can be used for RDF data crawling.

Using the DBpedia Databus for RDF data crawling typically involves the following procedure (a code sketch follows the list):

  1. Publishing seed data for crawling on the Databus.

  2. Crawling the links contained in RDF data retrieved from the Databus.

  3. Using the Databus to store new data found during crawling.

  4. Repeating steps 2 and 3 in a loop.
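The loop in steps 2-4 can be sketched in Python against the public Databus SPARQL endpoint. This is a minimal sketch, assuming the endpoint https://databus.dbpedia.org/sparql, the SPARQLWrapper and rdflib libraries, and DataId property names (databus:group, databus:file) that may differ between Databus versions; following owl:imports links is an illustrative choice, and re-publishing the discovered data (step 3) is left out.

```python
# Minimal crawl-loop sketch (steps 2-4). Endpoint and property names are
# assumptions about the Databus setup; adapt them to your instance.
from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph
from rdflib.namespace import OWL

DATABUS_SPARQL = "https://databus.dbpedia.org/sparql"

def seed_download_urls(group_uri: str) -> list[str]:
    """Step 1 counterpart: fetch download URLs of all files below a group."""
    sparql = SPARQLWrapper(DATABUS_SPARQL)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dcat:    <http://www.w3.org/ns/dcat#>
        PREFIX databus: <https://dataid.dbpedia.org/databus#>
        SELECT ?file WHERE {{
          ?dataset databus:group <{group_uri}> ;
                   dcat:distribution/databus:file ?file .
        }}""")
    results = sparql.query().convert()
    return [b["file"]["value"] for b in results["results"]["bindings"]]

def extract_links(rdf_url: str) -> set[str]:
    """Step 2: parse one RDF file and collect candidate links to crawl next."""
    g = Graph()
    g.parse(rdf_url)  # rdflib guesses the serialization from the response
    return {str(o) for _, _, o in g.triples((None, OWL.imports, None))}

def crawl(seeds: set[str], max_rounds: int = 3) -> set[str]:
    """Steps 2-4: breadth-first crawl; new data would then be re-published
    to the Databus (step 3, not shown here)."""
    seen: set[str] = set()
    frontier = set(seeds)
    for _ in range(max_rounds):
        next_frontier: set[str] = set()
        for url in frontier - seen:
            seen.add(url)
            next_frontier |= extract_links(url)
        frontier = next_frontier
    return seen

# Example usage (the group URI is hypothetical):
# seeds = set(seed_download_urls("https://databus.dbpedia.org/ontologies/example"))
# discovered = crawl(seeds)
```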

DBpedia Archivo is an example of a service that uses the Databus for data harvesting. It is an online ontology interface and augmented archive that discovers, crawls, versions, and archives ontologies on the DBpedia Databus. Each Databus artifact represents one ontology, and each version of the artifact corresponds to a new version of that ontology. Archivo also issues SPARQL queries against the Databus to obtain links for crawling.
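A harvester like Archivo needs to know, for instance, the latest version of every artifact it tracks. The following Python sketch shows the kind of discovery query this implies; the property names (databus:artifact, dct:hasVersion) are assumptions based on the Databus DataId vocabulary, not a verbatim Archivo query.

```python
# Hedged sketch: ask the Databus for the latest version of each artifact.
# Property names are assumptions about the Databus DataId vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://databus.dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dct:     <http://purl.org/dc/terms/>
    PREFIX databus: <https://dataid.dbpedia.org/databus#>
    SELECT ?artifact (MAX(?version) AS ?latest) WHERE {
      ?dataset databus:artifact ?artifact ;
               dct:hasVersion ?version .
    } GROUP BY ?artifact""")

# MAX compares version strings lexicographically, which suits date-based versions.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["artifact"]["value"], row["latest"]["value"])
```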

Pros of Using DBpedia Databus for RDF Data Crawling:

  1. Automation and Efficiency: DBpedia Databus allows for continuous data discovery and integration, ensuring up-to-date and comprehensive RDF data.

  2. Data Quality Control: DBpedia Databus supports data quality control mechanisms, allowing you to validate and enhance the crawled RDF data before integration. This ensures the integrity and accuracy of the integrated data (a validation sketch follows this list).
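One way to realize such a quality gate is to run a SHACL validation over the crawled data before re-publishing it. Below is a minimal sketch using the pySHACL library; the source does not prescribe a particular tool, and both file names are hypothetical.

```python
# Minimal quality-gate sketch using pySHACL (an assumption: the source does
# not prescribe a specific validation tool). shapes.ttl is a hypothetical
# shapes graph encoding your quality requirements.
from pyshacl import validate
from rdflib import Graph

data = Graph().parse("crawled.ttl")    # RDF gathered during crawling
shapes = Graph().parse("shapes.ttl")   # your SHACL quality constraints

conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
if conforms:
    print("Data passes quality checks; safe to publish to the Databus.")
else:
    print(report_text)                 # human-readable violation report
```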

By adopting the DBpedia Databus, organizations can streamline their RDF data crawling processes, enhance data discovery, and integrate comprehensive linked data into their knowledge graphs or linked data repositories, gaining automation, dataset management, and data quality control along the way.
