Caching and Metadata Enrichment Interfaces (Part 1 – Alpha)

You are likely here after seeing Eben’s and my presentation on RDF and are interested in some of the more advanced coding topics. Sadly, I’m still scrambling to get things up, so this is going to be done in parts over the next few days here at code4lib. In addition, there is no denying that this is written from a Rails / Hydra ecosystem viewpoint… but the design being outlined could be duplicated in other systems. This first part goes over the following two items:

  1. Some initial Rails Linked Data Fragments setup with caching.
  2. A live demo site of one of the Metadata Enrichment Interfaces to play around with.

Why run a Rails Linked Data Fragments Instance?

What advantage does this have over just accessing those URIs directly? Why run this application at all? The following are the “wins” for this approach:

  1. You can pre-cache a resource, as in the Blazegraph steps below. If a resource doesn’t exist when requested (such as if you resolved “Berlin” from dbpedia and it wasn’t cached in Blazegraph), then it will be fetched live and its triples cached automatically. This all equates to faster performance and more reliability.
  2. While one could use the Blazegraph, Marmotta, etc. APIs directly, it becomes a nightmare to then share code that interfaces with Linked Data. I may be using Blazegraph, another person Marmotta, someone else Apache Stanbol, and others any of a bunch more caching solutions. Requiring other gems to account for all of those implementations would be painful. This centralizes that negotiation layer into a single place, so neat stuff like Metadata Enrichment Interfaces or Sidecar Indexers (to keep your labels from URIs up to date) only has to worry about understanding one API.
  3. Not all Linked Data sources are easy to parse. This centralizes any special negotiation cases rather than having such exceptions spread out elsewhere.
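
To make the first two points a bit more concrete, here is a rough Ruby sketch of the “fetch live on a miss, then cache” idea sitting behind a single interface. To be clear, this is not the actual Linked Data Fragments code: `MemoryBackend`, `ReadThroughCache` and the fake fetcher are all names I made up for illustration, and the in-memory hash just stands in for a real backend like Blazegraph or Marmotta.

```ruby
# Hypothetical in-memory stand-in for a caching backend (Blazegraph, Marmotta, ...).
class MemoryBackend
  def initialize
    @store = {}
  end

  # Returns the cached triples for a URI, or nil if nothing is cached.
  def get(uri)
    @store[uri]
  end

  def put(uri, triples)
    @store[uri] = triples
  end
end

# Hypothetical read-through cache: callers only ever talk to #resolve,
# no matter which backend or live source sits behind it.
class ReadThroughCache
  def initialize(backend, fetcher)
    @backend = backend
    @fetcher = fetcher # anything responding to #call(uri)
  end

  def resolve(uri)
    cached = @backend.get(uri)
    return cached unless cached.nil?

    # Cache miss: fetch live, store the result, then return it.
    triples = @fetcher.call(uri)
    @backend.put(uri, triples)
    triples
  end
end

# Fake fetcher standing in for a live HTTP request; it counts its calls.
live_calls = 0
fetcher = lambda do |uri|
  live_calls += 1
  [[uri, "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"]]
end

cache = ReadThroughCache.new(MemoryBackend.new, fetcher)
berlin = "http://dbpedia.org/resource/Berlin"
cache.resolve(berlin) # miss: goes to the fetcher
cache.resolve(berlin) # hit: served from the backend
puts "live fetches: #{live_calls}"
```

Swapping in a different backend would only mean providing another object with `get` and `put`, which is the kind of centralization point 2 is getting at.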

Setting up a Rails Linked Data Fragments instance

Of the options for a caching backend, I personally recommend Blazegraph. Its performance and ease of use seem to be much better than Marmotta’s.

git clone https://github.com/ActiveTriples/linked-data-fragments.git
cd linked-data-fragments
git fetch # Temporary step
git checkout repository_from_rdf_rb # Temporary step
bundle install

# To install a dev blazegraph / marmotta
rake ldfjetty:install
rake ldfjetty:start

# It should now be available at: http://localhost:8988. To stop, do:
rake ldfjetty:stop

# It is recommended to prepopulate your data with LCSH. To do this:
1. Download the latest subjects vocab from: http://id.loc.gov/download/ (the nt version of “LC Subject Headings (SKOS/RDF only)”)

2. Extract the above download into a directory.

3. Run the following command from within that directory: 

  curl -H 'Content-Type: text/turtle' --upload-file subjects-skos-20140306.nt -X POST "http://localhost:8988/blazegraph/sparql?context-uri=http://id.loc.gov/static/data/authoritiessubjects.nt.skos.zip"
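
If you want a quick sanity check that the load worked, something like the following Ruby sketch should do. It only relies on the standard SPARQL protocol (a GET request with a `query` parameter) against the same endpoint as the curl command above. The helper names `sparql_query_uri` and `run_sparql` are mine, and I’m assuming the loaded triples are visible to a plain pattern without a GRAPH clause; depending on how your Blazegraph is configured, that may need adjusting.

```ruby
require "uri"
require "net/http"

# Endpoint taken from the curl command above.
SPARQL_ENDPOINT = "http://localhost:8988/blazegraph/sparql"

# Peek at a handful of triples rather than counting millions of them.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

# Builds the GET URL with the query in the standard `query` parameter.
def sparql_query_uri(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri
end

# Sends the query and returns the raw response body (JSON results requested).
def run_sparql(endpoint, query)
  uri = sparql_query_uri(endpoint, query)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/sparql-results+json"
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.body
end

# With the dev server from `rake ldfjetty:start` running, uncomment to try it:
# puts run_sparql(SPARQL_ENDPOINT, SAMPLE_QUERY)
```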

If you used the ldfjetty scripts, you should be able to access both blazegraph and marmotta at http://localhost:8988. The next step is to configure which caching backend your linked data fragments server will use (among other settings). There are sample configs in your config directory… pick the desired one and copy it as “ldf.yml”. Make sure the settings are satisfactory (the defaults shouldn’t need adjustment for testing). Once done, you can run the normal command to start a test server (albeit likely on a custom port like below):

rails s -p 3005

Congrats! You have a running Linked Data Fragments server. Want to try some commands? The following are some examples with default settings:

Other Caching Options

The Linked Data Fragments interface is technically optional if you are only concerned with caching and not with sharing code using a standard API. In addition, there are caching layers we have yet to implement. For a lively discussion on these and some notes, see the following two sources:

Metadata Enrichment Interface Live Demo

While a better test bed interface needs to be provided, I’ve updated an old test server for people to test the actual form itself. It is located via the following procedure:

While you won’t be able to submit anything, you can try out the Metadata Enrichment Interface at that url. The alternative interface from Villanova University has yet to be implemented as an option. Speeds are likely to be orders of magnitude slower than normal, as this server space lacks the resources to host a caching layer and I am using one located elsewhere on the web.

Next Time

I’m hoping to wrap this up as I bring code to a more stable place. So a part two should follow by Friday. Hope this was interesting or helpful!