Structured Data in Commons and wikibase software tools

Published: Apr 13, 2023 by Steve Baskauf

In February, I gave a presentation at the Wikibase Working Hour about how my VanderBot tool for uploading data to Wikidata could be used more broadly with any kind of wikibase. One example I gave was using it to upload Structured Data on Commons (SDoC) statements that describe what is depicted in a media file, since SDoC is just another wikibase instance.

I recently created a blog post that briefly describes the issues related to using VanderBot in this way and the steps necessary to do an upload to SDoC. SDoC plays a crucial role in making media items in Wikimedia Commons more discoverable, since it lets potential users search based on what a media item depicts. SDoC “depicts” statements can be made manually, but that is very labor-intensive, so there is a lot of interest in developing tools that could make the process faster and easier.

The Wikimedia Commons Query Service (WCQS, an analog of the popular Wikidata Query Service) leverages the fact that wikibase content can be queried using the SPARQL query language. However, the WCQS is more difficult to use programmatically because it requires authentication for HTTP access. The blog post also describes some code that I wrote to make it easy to download SDoC data via the WCQS using Python. This is a key piece for projects that add “depicts” statements, since it allows one to determine which “depicts” statements have already been made about a media file and so avoid creating duplicates.
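As a rough illustration of the duplicate check described above (this is not the code from the blog post), the sketch below builds a SPARQL query for existing “depicts” (P180) statements and filters a list of candidate statements against the results. The function names are hypothetical, the `sdc:` and `wdt:` prefixes are the ones I understand the WCQS to use, and the M and Q identifiers are placeholders. The sketch makes no network request and does not handle WCQS authentication; it assumes the query results have already been retrieved in the standard SPARQL JSON results format.

```python
def build_depicts_query(mids):
    """Return a SPARQL query for existing depicts (P180) values of the given M-IDs.

    The prefixes below are assumptions about how the WCQS names entities.
    """
    values = " ".join(f"sdc:{mid}" for mid in mids)
    return (
        "PREFIX sdc: <https://commons.wikimedia.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?file ?depicted WHERE {\n"
        f"  VALUES ?file {{ {values} }}\n"
        "  ?file wdt:P180 ?depicted .\n"
        "}"
    )


def existing_pairs(results_json):
    """Extract (M-ID, Q-ID) pairs from SPARQL JSON results."""
    pairs = set()
    for binding in results_json["results"]["bindings"]:
        # Each value is a full IRI; the local identifier follows the last slash.
        mid = binding["file"]["value"].rsplit("/", 1)[-1]
        qid = binding["depicted"]["value"].rsplit("/", 1)[-1]
        pairs.add((mid, qid))
    return pairs


def new_statements(candidates, existing):
    """Keep only candidate (M-ID, Q-ID) pairs that are not already present."""
    return [pair for pair in candidates if pair not in existing]


# Placeholder data standing in for a real WCQS response.
sample_results = {
    "head": {"vars": ["file", "depicted"]},
    "results": {
        "bindings": [
            {
                "file": {"type": "uri", "value": "https://commons.wikimedia.org/entity/M1"},
                "depicted": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5"},
            }
        ]
    },
}

candidates = [("M1", "Q5"), ("M1", "Q146")]
to_upload = new_statements(candidates, existing_pairs(sample_results))
```

With the placeholder data, only the second candidate remains in `to_upload`, because the first pair already appears in the query results.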


Latest Posts

Humboldt Extension for Ecological Inventories Published

The Humboldt Extension for Ecological Inventories is a new metadata vocabulary that extends the Darwin Core Standard to make it possible to describe the inventories and sampling events that are used to collect organism occurrence data. This is the largest extension to Darwin Core since the original vocabulary was ratified in 2009 and it represents over three years of work by the Humboldt Extension Task Group. This group of international experts met weekly over that time period to develop the vocabulary, carry out implementation testing, and publish the vocabulary and associated documentation.

Camtrap DP paper published

Camera trapping is an increasingly important method used by ecologists for monitoring animals in the wild. Camera trap data has previously been difficult to publish by conventional means, since the data includes many related images or videos that must be associated with the occurrence data. The new Camtrap DP standard provides a way to package camera trap data based on the open Frictionless Data Package specification. Camtrap DP datasets can be easily exchanged or published to the Global Biodiversity Information Facility (GBIF) where the included occurrence data will be integrated with biodiversity data collected by other means.

Nine hundred images added to Wikimedia Commons from ACT

Charlotte Lew and I have been working for some time to improve access to images in the Art in the Christian Tradition database by linking descriptive metadata in Wikidata to the corresponding artwork images in Wikimedia Commons. In the first part of the project, we were primarily cleaning up and linking Wikidata metadata to images that were already in Commons.