Content Negotiation with Linked Data

You would think that content negotiation on the Linked Data web is a well-understood and well-supported mechanism. Not so.

What is my problem?

Many organisations publish so-called "Linked Data" that is not actually Linked Data.

A typical response I get when pointing out their mistake is: "But look, I can type the URI into the browser, and lo and behold, here's an HTML table with the triples. What are you complaining about?"

Well, the goal is not to gaze at triples in an HTML table, but to process the triples in software.

Where does the problem occur?

Let's say you have a URI that identifies a thing.

The country of Germany:

The painter John Singer Sargent:

All these URIs are supposed to be Linked Data URIs. That is, the URIs identify things, not documents. Once a user agent dereferences such a thing URI, the server sends back RDF and the returned RDF can be further processed (in software).

Close, but no cigar for some of those URIs. Figuring out which ones are proper Linked Data URIs is left as an exercise.

What can you do?

To check whether one can process the data behind Linked Data URIs in software, do the following.

Log on to a machine that has rapper and roqet installed. On Debian you can install the programs with:

# apt-get install raptor2-utils rasqal-utils

Now, if you can run rapper:

$ rapper http://dbpedia.org/resource/Germany

and the triples flow across the screen, Matrix-style, congratulations, your data seems to be ok.

Also try to run a query with roqet:

$ roqet -e "SELECT * FROM <http://dbpedia.org/resource/Germany> WHERE { ?s ?p ?o . }"

If both work, congratulations, the chances are good that you have actually published your data as proper Linked Data, including content negotiation that works. Thanks!
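The same kind of check can be scripted. The following is a minimal Python sketch and not part of rapper or roqet: the Accept value, the RDF_TYPES set and the function names build_request, is_rdf and check are all assumptions made for illustration. It asks for RDF first and then looks at the Content-Type of the response.

```python
import urllib.request

# Assumption: an illustrative, non-exhaustive set of RDF media types.
RDF_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
}


def build_request(uri):
    """Build a GET request that prefers RDF, roughly like rapper does."""
    accept = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"
    return urllib.request.Request(uri, headers={"Accept": accept})


def is_rdf(content_type):
    """True if a Content-Type value names one of the RDF media types above."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in RDF_TYPES


def check(uri):
    """Dereference the URI (redirects are followed) and report whether RDF came back."""
    with urllib.request.urlopen(build_request(uri)) as response:
        return is_rdf(response.headers.get("Content-Type", ""))
```

Calling check("http://dbpedia.org/resource/Germany") performs a real HTTP request, so the result depends on the server at the time you run it. A False result means the server answered with something other than the RDF types listed in the sketch, most likely HTML.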

But the server supports content negotiation!?

There are different degrees of "working", depending on how you parse the Accept header of requests coming in at the server side.

These are the Accept header values that some user agents send in their requests:

Mozilla Firefox: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
rapper: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1
roqet: application/rdf+xml, text/rdf;q=0.6, application/n-triples, text/plain;q=0.1, text/turtle, application/x-turtle, application/turtle, text/n3;q=0.3, text/rdf+n3;q=0.3, application/rdf+n3;q=0.3, application/x-trig, application/rss;q=0.8, application/rss+xml;q=0.8, text/rss;q=0.8, application/xml;q=0.3, text/xml;q=0.3, application/atom+xml;q=0.3, text/html;q=0.2, application/xhtml+xml;q=0.4, text/html;q=0.6, application/xhtml+xml;q=0.8, application/json;q=0.1, text/json;q=0.1, text/x-nquads, */*;q=0.1
ldfu: application/n-triples, text/turtle, application/rdf+xml, */*;q=0.1

On the server side, assume you do a simple string equality check for a particular content type. That is, you send RDF/XML if the Accept header value equals application/rdf+xml. Otherwise, you send HTML.

With that check, you do not serve the right content type to Linked Data user agents. None of the Accept values listed above is exactly application/rdf+xml, so you will send HTML to rapper, roqet, ldfu and other Linked Data user agents, where you should send some form of RDF.
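To see the failure concretely, here is a small Python sketch of such an equality check. The function name negotiate_equals is hypothetical; the header values are copied from the list above (roqet is left out only because of its length).

```python
# Accept header values as sent by the user agents listed above.
ACCEPT_HEADERS = {
    "firefox": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "rapper": "application/rdf+xml, text/rdf;q=0.6, */*;q=0.1",
    "ldfu": "application/n-triples, text/turtle, application/rdf+xml, */*;q=0.1",
}


def negotiate_equals(accept):
    """Naive negotiation: RDF/XML only on an exact string match, else HTML."""
    if accept == "application/rdf+xml":
        return "application/rdf+xml"
    return "text/html"


# Map each user agent to the content type the naive server would send.
results = {agent: negotiate_equals(value) for agent, value in ACCEPT_HEADERS.items()}
```

Every entry in results is text/html, including the ones for rapper and ldfu, because each real-world header carries more than the single media type the check compares against.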

The most comprehensive solution would be to use a library to parse the Accept header (e.g., see "Which Java libraries do HTTP Accept Header Parsing" on Stack Overflow).
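To give an idea of what such a library does, here is a deliberately simplified Python sketch. It is not any particular library's API, and the names parse_accept, matches and choose are hypothetical. It reads the q values, orders the entries by preference, and picks the first type the server can offer. It ignores finer points of the HTTP specification, such as ranking specific types above wildcards with equal q, or letting a q=0 entry override a wildcard.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest q first; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q values stay in header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(pattern, offered):
    """Match an Accept entry such as */* or text/* against a concrete type."""
    if pattern == "*/*":
        return True
    if pattern.endswith("/*"):
        return offered.startswith(pattern[:-1])
    return pattern == offered


def choose(header, offered, default="text/html"):
    """Pick the first offered type matching the most preferred Accept entry."""
    for pattern, q in parse_accept(header):
        if q <= 0:
            continue
        for candidate in offered:
            if matches(pattern, candidate):
                return candidate
    return default
```

Assuming a server that offers text/turtle, application/rdf+xml and text/html, choose returns application/rdf+xml for rapper's header, text/turtle for ldfu's header and text/html for Firefox's header. For production use, a maintained library remains the safer choice.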

A simpler solution would be to do a cascade of contains checks on the Accept header, starting with RDF content types. That solution might not be fully correct, but it is preferable to doing equals checks. Content type matching with contains checks ensures that Linked Data user agents get some RDF, albeit not necessarily in their preferred formats.
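A cascade of contains checks could look like the following Python sketch. The function name negotiate_contains is hypothetical, and the three RDF types and their order are assumptions about what a server is able to serialise. The sketch ignores q values entirely, so it would even serve a type that the client marked with q=0.

```python
def negotiate_contains(accept):
    """Cascade of substring checks, RDF types first, HTML as the fallback."""
    accept = accept.lower()
    # Assumption: the server can serialise to these three RDF formats.
    for media_type in ("text/turtle", "application/n-triples", "application/rdf+xml"):
        if media_type in accept:
            return media_type
    return "text/html"
```

With the headers listed earlier, rapper gets application/rdf+xml and Firefox still gets text/html. ldfu gets text/turtle although its header names application/n-triples first, which is the "not necessarily in their preferred formats" trade-off.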

Who can help?

A while ago, together with fellow Linked Data consumers, I started the Pedantic Web Group. People on the mailing list are happy to help with questions regarding web server configuration.

Andreas Harth, Nov 2016 (grumpy), Feb 2017