Archive for the 'foaf' Category

Ego Linking Part I

Monday, November 17th, 2008

Now’s the time for a major update of your FOAF file! (If you don’t have one already, use FOAF-a-matic to create an initial version.)

URIs for things are much more common now than they were in the pre-httpRange-14 days, when blank nodes were en vogue. So you may now find the following URIs denoting people, or containing information about them:

  • http://dblp.l3s.de/d2r/resource/authors/Firstname_Lastname
  • http://semanticweb.org/id/Firstname_Lastname
  • http://data.semanticweb.org/person/firstname-lastname
  • http://tools.opiumfield.com/twitter/nick/rdf exports data about http://twitter.com/nick (but confuses the Person with the Document and following a person with knowing one)
  • http://friendfeed.com/nick has direct FOAF export (but uses bnodes for people)
  • http://dbpedia.org/resource/Firstname_Lastname (if you are really really famous)

The problem: each URI is separate, and information about the same real-world entity may be connected to multiple identifiers.
OWL provides a number of mechanisms for inferring equality: inverse functional properties (to establish equality on the same values for properties, e.g. SSN, passport number), owl:sameAs (direct equality), and a few more (functional properties and cardinality constraints for example, but that’s a story for another day).

Inverse functional property reasoning doesn’t work too well currently because the data is too noisy: a lot of “unique” property values are “n/a”, “”, “yes”, “mbox:”, and so on, which are not unique at all and lead to many bogus inferences.
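To make the noise problem concrete, here is a minimal sketch of smushing on an inverse functional property while skipping junk values. The function name `smush`, the `JUNK_VALUES` set (taken from the examples above), the prefixed property name and the example.org URIs are all assumptions for illustration, not part of any particular crawler or reasoner.

```python
from collections import defaultdict

# Placeholder values that are clearly not unique keys (assumed list,
# based on the examples mentioned in the post).
JUNK_VALUES = {"", "n/a", "yes", "mbox:"}


def smush(triples, ifps):
    """Group subjects sharing a non-junk value for an inverse functional property.

    triples: iterable of (subject, predicate, value) strings.
    ifps: set of predicates treated as inverse functional.
    Returns a list of sorted subject lists, one per shared key.
    Simplification: groups are built per key and not merged transitively.
    """
    by_key = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate not in ifps:
            continue
        if value.strip().lower() in JUNK_VALUES:
            continue  # would otherwise fuse unrelated people
        by_key[(predicate, value)].add(subject)
    return [sorted(subjects) for subjects in by_key.values() if len(subjects) > 1]


# Hypothetical data: a and b share a real hash, c and d only share "n/a".
triples = [
    ("http://example.org/a#me", "foaf:mbox_sha1sum", "abc123"),
    ("http://example.org/b#me", "foaf:mbox_sha1sum", "abc123"),
    ("http://example.org/c#me", "foaf:mbox_sha1sum", "n/a"),
    ("http://example.org/d#me", "foaf:mbox_sha1sum", "n/a"),
]
groups = smush(triples, {"foaf:mbox_sha1sum"})
```

Without the junk filter, c and d would be inferred to be the same person, which is exactly the kind of bogus inference described above.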

So for now, I suggest adding the respective person URIs to your FOAF URI via owl:sameAs predicates, which enables data aggregators to fuse all information about a person into a single view.
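How an aggregator might use such statements can be sketched with a small union-find over owl:sameAs pairs. This is only an illustration under assumptions: the function name `fuse` is made up, the FOAF URI is an example.org placeholder, and the other URIs are the placeholder patterns from the list above, not real identifiers.

```python
def fuse(same_as_pairs):
    """Cluster URIs connected (directly or transitively) by owl:sameAs.

    same_as_pairs: iterable of (uri, uri) tuples.
    Returns a sorted list of sorted URI lists, one per real-world entity.
    """
    parent = {}

    def find(uri):
        # Follow parent links to the cluster representative, halving paths.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical statements, using placeholder URIs.
me = "http://example.org/foaf#me"
pairs = [
    (me, "http://semanticweb.org/id/Firstname_Lastname"),
    (me, "http://data.semanticweb.org/person/firstname-lastname"),
    # Stated by another source; still lands in the same cluster.
    ("http://dblp.l3s.de/d2r/resource/authors/Firstname_Lastname",
     "http://data.semanticweb.org/person/firstname-lastname"),
]
clusters = fuse(pairs)
```

All four URIs end up in one cluster, so information attached to any of them can be shown in a single view.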

Even before you publish data about something, it might be a good idea to check if there’s already a URI for that thing. A quick search on SWSE can help.

OpenID’s teething problems

Wednesday, September 10th, 2008

The idea behind OpenID sounds great. Create one account and re-use that account wherever you need to log in on the Web. Excellent idea, and very enticing because the system is completely decentralised and relies on basic Web technologies. Using Web tech is a great plus when you think about how tedious it can be inside an organisation to get access to an LDAP server because of firewall and external user policies and such.

Combine OpenID with FOAF and you get a completely decentralised social networking platform. Good stuff.

While I like the idea, the implementation side is still sketchy. There are two Java implementations. One is openid4java, which requires you to include more than a dozen jars just to provide a login; its source archive is a whopping 74M, so I didn’t touch it. Luckily there’s joid, which is much smaller.

So joid is the jar of choice. Joid works fine with myopenid.com and Verisign ids (it’s from the Verisign guys after all), but fails on LiveJournal ids. And, more annoyingly, the library doesn’t seem to support delegated ids. Since the mailing list moderator is apparently dead, that means waiting another year or two until the kinks are ironed out.

I like being an early adopter. Really.

The timbl number

Thursday, September 4th, 2008

Mathematicians, boring as they are, have the cool Erdős number, which measures how far away they are in the co-author graph from Paul Erdős, famous hobo mathematician. Actors have the Kevin Bacon number, which tells them how many steps away they are in the co-actor graph from Kevin Bacon, mediocre but apparently workaholic actor.

In contrast, Web Science researchers have nothing more than the dubious honour of working in a field which needs to include “science” in its name, and on top of that have to struggle with the scruffy, chaotic, erroneous Web. Nothing too exciting here.

To make our dull work slightly more glamorous, I propose to introduce the “timbl number”, which tells people how many hops they are away in the foaf:knows graph from Web inventor and Semantic Web evangelist Tim Berners-Lee.
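Computing such a number is a plain breadth-first search over foaf:knows statements. The sketch below is a toy under assumptions: the function name `timbl_number`, the example.org URIs and the graph itself are invented for illustration, and the search follows foaf:knows outward from Tim, so a number of one means Tim states that he knows you.

```python
from collections import deque


def timbl_number(knows, tim, person):
    """Return the number of foaf:knows hops from tim to person, or None.

    knows: dict mapping a person URI to the URIs that person claims to know.
    """
    if person == tim:
        return 0
    seen = {tim}
    queue = deque([(tim, 0)])
    while queue:
        current, hops = queue.popleft()
        for friend in knows.get(current, ()):
            if friend == person:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # not reachable from Tim in this graph


# Hypothetical foaf:knows graph with placeholder URIs.
TIM = "http://example.org/tim#i"
knows = {
    TIM: ["http://example.org/alice#me", "http://example.org/carol#me"],
    "http://example.org/alice#me": ["http://example.org/bob#me"],
    "http://example.org/carol#me": ["http://example.org/bob#me"],
}
bob_number = timbl_number(knows, TIM, "http://example.org/bob#me")
dave_number = timbl_number(knows, TIM, "http://example.org/dave#me")
```

In this toy graph bob is two hops away, while dave, whom nobody mentions, has no timbl number at all.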

My timbl number recently dropped from three (via Richard Cyganiak) to two (via Christoph Bussler); I might be able to get another two-hop connection soon. My goal is to get a timbl number of one someday, i.e. Tim would state that he knows me in his FOAF file. Learn about the progress exclusively here on this blog!

State of the FOAF-sphere

Tuesday, September 2nd, 2008

The data quality on the Semantic Web improves. I’ve been crawling FOAF and RDF for a few years now, and the data available today is better, by leaps and bounds, than what it used to be. However, if the improvement continues at the current pace, it’ll be years before we get to something useful.

Building nice applications on top of real-world data requires more or less connected data, i.e. shared use of URIs. Whilst schema-level URIs (in vocabularies such as FOAF and SIOC) are being used across many sources, instance-level agreement on URIs has yet to happen.

While I’d prefer that different sources reuse common URIs to denote the same instance (a person, say), we’re smushing things based on OWL’s inverse functional properties. However, a lot of sources currently don’t even provide property values to smush on (e.g. friendfeed doesn’t provide a homepage, an email address or an email hash sum), which renders the current Semantic Web pretty much useless for real-world applications. Loads of islands and duplication of data, a grand mess.

Let’s hope that over time sources provide keys that allow instance data from multiple sources to be fused, and that people converge in their use of URIs. My URI, btw, is http://harth.org/andreas/foaf#ah if you want to add me to your FOAF file.