Building a micro-service for custom queries to wikidata

This vital left-over has actually become a three-course menu now, for which developers and users alike are asked to take a bit of time and, hopefully, enjoy the info. From my #wikidata engagement a “data transformer” has emerged, allowing me to think about guiding a data expedition for people interested in learning more about and with wikidata. This post will act as an intro to the application.

The elementary work on the dm4-wikidata-toolkit module is now done and allows for easy adaptation and variation. Variation and revision are actually two of many good reasons for posting about this early and releasing stuff. This gives me the time to hold back for a moment, re-focus and reflect on what has been done and learnt. After this infokitchen dinner i would like to think and talk with you about: What if many different parties were to expose varying aspects of the full and public wikidata as a service (API)? What data covering which aspects of the world is in wikidata, and which do we want to cultivate there? And ultimately, does this look like a good approach to be investigated any deeper?

Fresh and homemade pea soup with mint-citrus pesto.

Technical background

The dm4-wikidata-toolkit module got started as a branch of the wikidata search module for deepamehta4 when i started to work with the so-called “wikidata dumps” (= one-point-in-time, complete database exports) instead of performing GET requests against the HTTP API endpoint of wikidata.org. After that switch of methods last December i decided to spin that branch off into a dedicated plugin which builds solely on top of the work done by Markus Kroetzsch (and others), who developed the Wikidata Toolkit. So this plugin integrates the WDTK into the OSGi environment of DeepaMehta 4. For developers it may be noteworthy that DeepaMehta 4 uses Neo4j as its current storage layer and allows for the integration of other Neo4j modules. Thus our storage layer already has built-in basic support for spatial and time-range queries over the transformed data. Once the data is within deepamehta4, we can very easily write custom queries (through JAX-RS) and expose a REST API query endpoint, if we know a bit of Java and know how to traverse a graph.
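To give an idea of what such a custom query endpoint looks like, here is a minimal JAX-RS sketch. The class, path and helper names are illustrative assumptions and not the actual dm4-wikidata-toolkit sources; in a real DeepaMehta 4 plugin the storage access would go through the injected core service rather than the stand-in class below.

package org.example.wikidata;                 // illustrative package name

import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/wdtk/list")
@Produces(MediaType.APPLICATION_JSON)
public class WikidataQueryResource {

    // Stand-in for the graph-backed storage layer; in a DeepaMehta 4 plugin
    // this would be the injected core service traversing the Neo4j graph.
    private final TopicStore store = new TopicStore();

    // Lists all imported topics of one kind, e.g. persons, cities, countries.
    @GET
    @Path("/{kind}")
    public List<String> listTopics(@PathParam("kind") String kind) {
        return store.findByKind(kind);
    }

    // Hypothetical in-memory stand-in so the sketch is self-contained.
    static class TopicStore {
        List<String> findByKind(String kind) {
            return List.of(); // a real implementation would traverse the graph
        }
    }
}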

In the example running live here i focused on analyzing and exposing certain aspects of wikidata to answer (among others) the question: Who was employed by the BBC while being a citizen of Germany? The custom endpoint at http://wikidata-topics.beta.wmflabs.org/wdtk/list/ is able to serve JSON topics and inform your application about Cities, Countries, Persons and Institutions stored, described and related to each other in wikidata. Adapting this and writing another custom endpoint is, with dm4-wikidata-toolkit, a pretty straightforward job. The complete transformation process (analysis, import and mapping) is outlined at the end of this post. The implementation and documentation for the four currently supported custom queries can be found here from line 160 onwards.
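If you just want to consume the endpoint from your own application, a plain HTTP GET is all it takes. The sketch below uses Java’s built-in HTTP client; only the base URL is taken from above, while the concrete query path and the response handling are assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WikidataTopicsClient {

    public static void main(String[] args) throws Exception {
        // Base URL as given above; append the concrete query path of your
        // choice according to the plugin documentation.
        String endpoint = "http://wikidata-topics.beta.wmflabs.org/wdtk/list/";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Accept", "application/json")
                .build();

        // Print the raw JSON body; a real application would parse it into
        // topic objects (persons, cities, countries, institutions).
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}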

Usage scenarios and research on use-cases

When surfing around the topic of “wikidata query service” i found a few pages proposing “user stories”, and such examples (in my experience) always prove very insightful for developers trying to meet real needs. As described by Phase 3 of the wikidata project, which is the current stage, the latest and widest goal is the deployment and roll-out of listings on wikipedia pages. The advantage of this for Wikipedians seems to be that these future lists are not going to be manually edited and maintained in n-hundred versions anymore but in just one place, which then gets integrated into all (language-specific) Wikipedia pages. To further inform yourself about some questions which a wikidata query endpoint should be able to answer, you might want to jump to reading this page.

In plain terms, for the respectively transformed wikidata, our endpoint can currently answer simple queries (all items using a certain property, all items using a certain property involving a specific item) and a query “walking along” two properties (employee of, citizen of) involving two specific items (BBC, Germany) – called a traversal query. A special feature of our approach is that we can also respond with a list of all claims for a given wikidata propertyId while naming both involved players (items) for the respective claim. One disadvantage of the latter (“claim listing query”) is that the result set will definitely be very large and that we (currently) cannot skim or page through it.
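To make the “walking along two properties” idea concrete, here is a small, self-contained sketch in plain Java. It models claims as simple (subject, property, object) triples in memory instead of the actual DeepaMehta/Neo4j graph, so all names and types here are illustrative only.

import java.util.List;
import java.util.stream.Collectors;

public class TraversalQuerySketch {

    // A claim reduced to a simple triple for the purpose of this sketch.
    record Claim(String subject, String property, String object) {}

    // Who was employed by {employer} while being a citizen of {country}?
    // Finds subjects that appear with both properties and the given objects.
    static List<String> employedCitizens(List<Claim> claims,
                                         String employer, String country) {
        var employedBy = claims.stream()
                .filter(c -> c.property().equals("employer") && c.object().equals(employer))
                .map(Claim::subject)
                .collect(Collectors.toSet());
        return claims.stream()
                .filter(c -> c.property().equals("citizen of") && c.object().equals(country))
                .map(Claim::subject)
                .filter(employedBy::contains)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var claims = List.of(
                new Claim("Alice", "employer", "BBC"),
                new Claim("Alice", "citizen of", "Germany"),
                new Claim("Bob", "employer", "BBC"));
        System.out.println(employedCitizens(claims, "BBC", "Germany")); // [Alice]
    }
}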

After searching a bit more i came across the following page, created as research documentation by the developers (Wikibase Indexing), and was a bit struck by how much of its content i could relate to. It was on that page that i heard about WikiGrok for the first time, a few weeks ago. There it is written that a query service could be valuable for detecting, and thus filling up, niche areas in wikidata by engaging mobile users through a mobile frontend/web-app called WikiGrok.

After having connected the dots from the contents of these wiki pages, with the next-but-one release of this plugin i might very well aim at the goals set by that page. Read on to learn more about what this effort covers.


Search and visualize wikidata with the Wikidata Topicmaps UI (“WTUI”)

For this three-course menu i took notes on how to search and visualize current items in #wikidata through the Wikidata Topicmaps UI. You’ll find out how to compose personal views after querying this open data-set in your language of choice, and how to store these views as interactive presentations you can pull off, for example, in an intermediate class on politics.

Over the last twelve months i invested about two full and good weeks of time to get to know wikidata and bring the Wikidata Topicmaps UI (WTUI) live. The last development days were mostly spent on getting images and graphics related to wikidata items displayed directly in the Webclient. I could do this by additionally connecting to the Commons Media API by Magnus Manske. Today was the day to upgrade the service with the developments of the second half of 2014, and now there is this ready-to-use web-app i wanted to tell you about. It is hosted at Wikimedia Labs, provided under the Terms of Service of the Wikimedia Labs Projects and currently maintained by (a researching/volunteering) me. If you don’t want to rely on me you can simply get your personal installation running after downloading the DeepaMehta 4.4 Standard Distribution and the dm44-wikidata search plugin, both to be found at http://downloads.deepamehta.de. The application works well in most browsers, though there is currently no support for Internet Explorer users. Presenting your Topicmaps to an audience requires either an internet connection or a personal installation on your PC, but “readers” of your maps do not need an account.

The look & feel of the WTUI is still a bit rugged at the moment, but as i wrote in the post on “situational apps”, it is good and simple enough to be used and to be developed further by or with others. If we find an interested user base, why not get together and bring this a few steps further?

As a kind of foreword i want to link here to a post in which Javiera Atenas, Leo Havemann and Ernesto Priego wrote about making use of open data sets (like wikidata) in educational contexts. As they note, from their perspective “research is currently lacking on open data as educational resources, and about how students can develop critical skills by using open datasets”. Supporting and connecting researchers and users of both communities, deepamehta4 and wikidata, was the main motivation for starting this project. Using the Wikidata Topicmaps UI (built into deepamehta4) may at some point very well turn every one of us (deepamehta4 users) into a critical and contributing user of wikidata.org.

The stage is set and we will see who steps up. In the future, building on the results presented here i would like to improve on the current relationship between being an editor of wikidata and being a user of deepamehta4. At best these two “roles” will become one.

The data at hand here ranges from languages, linguistics, culture and politics to openness, and with this web-app we can visualize items from various domains in their complexity. Furthermore we can also take a deep look under the hood and see how we (as the wikidata community) structure our most common information. Intriguing thoughts, no? At the end of this post you’ll find my first screen-recording/video in years, but i have to disappoint you: it comes without sound.


DeepaMehta 4 Plugin Development: Apps in a network

A vital left-over: I don’t see that many reasons why anyone who can understand basic English, has access to a PC and the web, and has time for some trial & error could not design, install and ship a situational app right now. Here are four thoughts for conceiving new networked apps.

Four thoughts on building networked, situational apps based on the free software platform DeepaMehta 4. These are drawn from some hints i sent out to the developers mailing list, since i had the opportunity to do a lot of plugin development with deepamehta4 over the last two years:

1. Think of your application and its functionality as lots of small services integrated with many others in a network. Integrating other plugins’ services is as easy as injecting them into your Plugin implementation (see the sketch after this list). Thus, try to build upon (and, if necessary, challenge and contribute to) existing semantics within the network of plugins wherever possible.

2. Likewise, split up your application into more than one plugin, divided by feature set or architecture level (client, server). Distinct plugins mean variability, and thus we actively increase the possibilities for everyone in the community.

3. Start thinking agile when writing new applications with type definitions. It’s a “data tree” ready to grow and ready to change in the future, but to be planted now. Structures and data will most probably evolve and change, be it through the users themselves or through your new conception of the application. The migration machinery of deepamehta4 supports this at any time, so don’t worry too much in advance.

4. The standard dm4-webclient allows you and your users to interactively draft such “data trees”. You can use the dm4-webclient to share, edit and compose the terms your application is made of. Make sure your users have heard of this type-building language, as it is the same for plugin developers and users and may facilitate communication in your project.
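As an illustration of thought 1, here is a minimal sketch of one plugin consuming another plugin’s service. The package and annotation names follow common DeepaMehta 4 plugin conventions but are assumptions here and should be checked against the current plugin API; the TaggingService interface is purely hypothetical.

package org.example.myapp;                    // illustrative package name

import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import de.deepamehta.core.osgi.PluginActivator;   // assumed DM4 base class
import de.deepamehta.core.service.Inject;         // assumed DM4 annotation

@Path("/myapp")
@Produces(MediaType.APPLICATION_JSON)
public class MyAppPlugin extends PluginActivator {

    // The service of another plugin is injected by the framework instead of
    // being instantiated here; TaggingService is a hypothetical example.
    @Inject
    private TaggingService taggingService;

    @GET
    @Path("/tags/count")
    public int countTags() {
        // Build upon the existing semantics of the injected service.
        return taggingService.getAllTags().size();
    }
}

// Hypothetical service interface that another plugin would provide.
interface TaggingService {
    List<String> getAllTags();
}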

It is fair to say that the type-building vocabulary in deepamehta4, along with its basic terms, graphics and UI, was conceived to bridge and challenge the gap between software developers (as designers) and users (as mainly design addressees). I know that many have tried this before, but that doesn’t mean it’s not worth trying to go further here in 2015. Lots of people work with the web and may have already built an app. So who knows, maybe with deepamehta4 things and ideas get together more easily. As of today, except maybe for software security, i don’t see many reasons left why users could not start trying to design, install and ship their own situational apps based on DeepaMehta 4 or dmx. Read on to learn how to make use of existing plugins and type definitions in your software application.
