Friday, 24 September 2010

Linked Data: the future of the Web?

Earlier this month I attended a one-day conference organised by the UK arm of the International Society for Knowledge Organization. The subject was Linked Data. I knew very little about it but suspected it might be important, not least because it's related to Sir Tim Berners-Lee's Semantic Web concept, so I decided to pay the fee and see what I could find out.

There was a very international group of speakers, including two from France, one from the Netherlands, one from Germany and one from Austria. The degree to which they engaged me varied, but most had something interesting to say. Presentations that I particularly liked were Nigel Shadbolt on what the government is up to with its data, and John Goodwin on Ordnance Survey.

The basic idea behind linked data is to extend the hyperlink concept of the Web from the current norm of linking web pages and other (largely) unstructured media like documents and video clips, to the linking of datasets. Provided the data is marked up in a standard way, using RDF (Resource Description Framework), it can be made sense of by machines and re-used in, for example, mashups.

There's a lot of quite complex stuff related to ontologies, with a nice new collection of acronyms to learn, such as OWL and SPARQL. There was also a fair bit of quasi-philosophical talk about the difference between the name of a thing and the thing itself. One of the problems with this conference was that it was billed as being for beginners (albeit KM-savvy beginners), yet it assumed an understanding of some of these concepts. I don't think anyone explained what a 'triple' was, for example.
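For the record, an RDF triple is a single statement in three parts: subject, predicate and object. Here is a minimal sketch of the idea in plain Python rather than real RDF tooling. The example.org URIs, the property names and the helper functions match and county_name are all made up for illustration, and the pattern matching is only a rough stand-in for what a SPARQL query does.

```python
# Hypothetical example data: every URI below is made up for illustration.
EX = "http://example.org/"

# A triple is one statement in three parts: (subject, predicate, object).
dataset_a = {
    (EX + "place/Southampton", EX + "prop/name", "Southampton"),
    (EX + "place/Southampton", EX + "prop/withinCounty", EX + "place/Hampshire"),
}

# A second, separately published dataset that happens to describe Hampshire.
dataset_b = {
    (EX + "place/Hampshire", EX + "prop/name", "Hampshire"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Both datasets use the same identifier for Hampshire, so a plain set union links them.
merged = dataset_a | dataset_b


def county_name(triples, place):
    """Follow the withinCounty link from a place, then look up the county's name."""
    for _, _, county in match(triples, subject=place, predicate=EX + "prop/withinCounty"):
        for _, _, name in match(triples, subject=county, predicate=EX + "prop/name"):
            return name
    return None


print(county_name(merged, EX + "place/Southampton"))  # prints "Hampshire"
```

The point of the sketch is that neither dataset can answer the question alone: the first knows Southampton is in a county, and only the second knows what that county is called. The shared identifier is what does the linking.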

The process of putting linked data out there on the Web, and using other people's linked data, is not one I can see non-technical people getting to grips with, at least in the short term. Maybe the process will become easier just as putting up a web page has. Remember when that required knowledge of HTML? Now it's all WYSIWYG. I can, however, see how linked data could work nicely where a data provider (such as government - www.data.gov.uk) wants to see its data used and presented, but doesn't want to build the UI for that. Given that there are loads of developers out there willing to do it for them, why should they? Equally, if you build apps you'd be grateful for lots of free linked data to power them.

There are of course risks. I do wonder whether the government has really thought through the implications of licensing its data for any purpose, allowing even that it be modified. When this thought was put to Professor Shadbolt, he rightly pointed out that the media routinely (mis?)present data the way they need to for the 'angle' they are taking, so this is no worse. Good point, but I think the jury's out. It feels like a welcome move towards transparency, though, and could save the taxpayer money.

Where's this all going? I have two opposing thoughts on this, both related to the structured versus unstructured information debate. The first is that one of the primary reasons for the success of Tim Berners-Lee's brilliant invention, the World Wide Web, was its focus on unstructured information (web pages with text and some images, mainly), because that's what non-technical people relate to most naturally. A move away from this would therefore be retrograde, it could be argued. The counter-argument is that the Web only really got going properly when databases started being used behind the scenes to drive websites. Search engines and ecommerce sites both depend on them, and there are countless other examples. So linking datasets across the Web is just an extension of this and could bring even greater benefits, perhaps.

I think the second argument's possibly true, but only if the structure of the data is kept well hidden from the layman, and he doesn't have to learn a second meaning for that nocturnal bird of prey with the round face.

What do you think?
