Structured Data in Commons and wikibase software tools

Published: Apr 13, 2023 by Steve Baskauf

In February, I gave a presentation at the Wikibase Working Hour about how my VanderBot tool for uploading data to Wikidata could be used more broadly with any kind of wikibase. One example I gave was using it to upload Structured Data on Commons (SDoC) statements that describe what is depicted in a media file, since SDoC is just another wikibase instance.

I recently created a blog post that briefly describes the issues related to using VanderBot in this way and the steps necessary to do an upload to SDoC. SDoC plays a crucial role in making media items in Wikimedia Commons more discoverable, since it lets potential users search based on what a media item depicts. SDoC “depicts” statements can be made manually, but that is very labor-intensive, so there is a lot of interest in developing tools that could make the process faster and easier.

The Wikimedia Commons Query Service (WCQS, an analog of the popular Wikidata Query Service) leverages the fact that wikibase content can be queried using the SPARQL query language. However, the WCQS is more difficult to use programmatically than the Wikidata Query Service because it requires authentication for HTTP access. The blog post also describes some code that I wrote to make it easy to download SDoC data via the WCQS using Python. This is a key piece for projects that add “depicts” statements, since it allows one to determine which “depicts” statements have already been made about a media file and avoid creating duplicates.
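
As a rough illustration of the duplicate check described above (this is not the code from the blog post), the sketch below builds a SPARQL query for existing “depicts” values and filters a list of candidate statements against the result. It assumes that media entities are identified by M IDs under the https://commons.wikimedia.org/entity/ namespace, that “depicts” is property P180, and that results arrive in the standard SPARQL JSON format. The function names are hypothetical, and the authenticated HTTP request to the WCQS is deliberately left out.

```python
# Sketch only: query construction and duplicate filtering, with no network access.
# The namespaces and the use of P180 ("depicts") are assumptions about the WCQS data.

SDC_NS = "https://commons.wikimedia.org/entity/"  # assumed namespace for M IDs
WDT_NS = "http://www.wikidata.org/prop/direct/"   # Wikidata direct-property namespace


def build_depicts_query(mids):
    """Return a SPARQL query for the existing depicts values of the given M IDs."""
    values = " ".join(f"sdc:{mid}" for mid in mids)
    return (
        f"PREFIX sdc: <{SDC_NS}>\n"
        f"PREFIX wdt: <{WDT_NS}>\n"
        "SELECT ?file ?depicted WHERE {\n"
        f"  VALUES ?file {{ {values} }}\n"
        "  ?file wdt:P180 ?depicted .\n"
        "}"
    )


def existing_pairs(results):
    """Extract (M ID, Q ID) pairs from a SPARQL JSON results dictionary."""
    pairs = set()
    for row in results["results"]["bindings"]:
        # Keep only the local identifier at the end of each IRI.
        mid = row["file"]["value"].rsplit("/", 1)[-1]
        qid = row["depicted"]["value"].rsplit("/", 1)[-1]
        pairs.add((mid, qid))
    return pairs


def statements_to_upload(candidates, existing):
    """Return the candidate (M ID, Q ID) pairs that are not already present."""
    return [pair for pair in candidates if pair not in existing]
```

The query string would be sent to the WCQS through whatever authenticated session is available, and the parsed JSON response passed to existing_pairs(). Only the pairs returned by statements_to_upload() would then need to be written to Commons.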

Latest Posts

Enabling ecological survey data integration

Our paper describing the Humboldt Extension to Darwin Core has been published in Ecography. The Humboldt Extension adds 55 terms that enrich the Darwin Core, providing the terms needed to capture and share multiple types of biodiversity survey data. The paper illustrates the benefits of implementing the Humboldt Extension with three case studies and demonstrates how richer data can be used in research and modelling and to inform decision-making.

Biological survey and monitoring data publishing guide

My coauthors and I have published a guide to help people understand how to use the new Humboldt Extension for Biological Inventories of the Darwin Core standard. The guide includes diagrams and detailed information about how to structure the data to capture the hierarchical structure typically found in monitoring projects.

Open Science recipes published

My colleague from the Vanderbilt Libraries’ Digital Lab, Shenmeng Xu, and I have published two chapters in the ACRL’s 2025 Open Science Cookbook. The Cookbook is a lighthearted take on technical topics where instructions are given in “recipe” format to teach beginners new tech skills.