Libraries, Media and the Semantic Web Event at the BBC

The concept of a semantic web, or at least of a semantically structured web of knowledge, has been around since the late 1960s. It was brought to the forefront of the layman’s consciousness by Tim Berners-Lee, inventor of the World Wide Web, in his speech at TED 2009, in which he got the audience to chant the rallying cry of the Linked Data evangelist: “Raw Data Now”.

It is a reflection of the renewed interest in this technology that the BBC was able to host a sell-out event on Libraries, Media and the Semantic Web with less than two weeks’ notice. Last night, journalists, archivists and software developers crammed into a room at the BBC Academy in West London to hear what some of the major players in the field had to say about the emerging “Next Web”.

After a short introduction, Jon Voss, a key figure behind the LODLAM meet-ups, took to the stage to introduce HistoryPin. HistoryPin is one of the most exciting projects to leverage Linked Data technologies, creating apps that help people interact with historical artifacts in new and interesting ways.

Jon likened the role of the Linked Data consumer to that of a DJ, mixing and mashing up a whole variety of content sources to create new and interesting perspectives on the underlying data. Linked Data makes it possible in HistoryPin, for instance, to lay historical images over present-day Google Street View scenes, creating an augmented reality in which you can dig into your historical environment by bringing up information about your surroundings, such as notable residents who lived in the area or works of art created in a nearby apartment.
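
To give a flavour of the kind of “mixing” Jon described, here is a minimal sketch in Python that asks the public DBpedia SPARQL endpoint for people associated with a place, the kind of raw material an app could layer over a street view. The query and the choice of properties are illustrative assumptions, not a description of how HistoryPin itself is built.

    import requests

    # Ask DBpedia for people whose recorded birth place is Manchester.
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {
      ?person dbo:birthPlace dbr:Manchester ;
              rdfs:label ?name .
      FILTER (lang(?name) = "en")
    } LIMIT 10
    """

    response = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
    )
    response.raise_for_status()

    # Print each person's name alongside their DBpedia URI.
    for binding in response.json()["results"]["bindings"]:
        print(binding["name"]["value"], binding["person"]["value"])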

Following on from Jon Voss was Adrian Stevenson from Mimas at the University of Manchester. We were given a whistle-stop tour of Linked Data projects in the UK cultural heritage sector, including the Linking Lives project, which hooks up with DBpedia, and Bricolage, which draws on the vast Penguin Books archive.

Some of the major obstacles to the development of a web of Linked Data were highlighted in Adrian’s talk, preventing the session from becoming the mutual group hug that Linked Data events can often turn into. The “tricky stuff” he highlighted included the difficulties around URI persistence, dirty data and, of course, licensing.

Evan Sandhaus from The New York Times Research and Development Labs was the next to take up the microphone. We were introduced to the rNews standard for semantically annotating news articles on the web. While that might sound like a mouthful to the uninitiated, it is basically the standard that enables the useful breakdown of data we find in Google’s Rich Snippets, which many of you will be familiar with (see example below).

Adopting standards for semantically annotating your content is what allows machines to distinguish between the different kinds of content on a web page: what is an article title, what is a byline, and what denotes an author. It was, as Evan explained, an attempt to give computers a chance to catch up with the billions of years of evolution that enable us to recognise at a glance the different kinds of content displayed in a document.
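
As a rough illustration of what such annotation amounts to, here is a minimal sketch in Python that builds a schema.org-style NewsArticle description and serialises it as JSON-LD, one of the syntaxes in which this kind of markup can be expressed. The property names (headline, author, datePublished) are standard schema.org terms, but the values are made up and the snippet is a sketch of the general idea rather than the exact rNews markup used by The New York Times.

    import json

    # Explicit labels tell a machine which string is the headline,
    # which is the author, and so on.
    article = {
        "@context": "http://schema.org",
        "@type": "NewsArticle",
        "headline": "Libraries, Media and the Semantic Web Event at the BBC",
        "author": {"@type": "Person", "name": "Jane Doe"},  # hypothetical byline
        "datePublished": "2012-06-29",  # hypothetical publication date
    }

    # Embedded in a page (for example inside a script tag of type
    # application/ld+json), this is the sort of annotation a search
    # engine can read directly and turn into a Rich Snippet.
    print(json.dumps(article, indent=2))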

But the man who stole the show on the night was Dan Brickley from Schema.org. The folks at Schema.org, a collaboration between the major search engines Google, Yahoo, Yandex and Bing, are attempting to develop a general set of standards for the semantic web, of which rNews is just one specific part. Dan explained that the driving force behind Schema.org is to improve the quality of search engine results: better practice in labelling the data you put on the web leads to much better search results, something everyone can understand the value of.

Dan took the audience back 100 years, to the year of Alan Turing’s birth and long before the internet was invented. He pointed to the work of Paul Otlet and his development of a compositional semantics for organising knowledge, albeit with index cards and some very skillful librarians rather than computer hardware. There was a lot, he argued, that semantic web standards organisations could learn from early pioneers such as Otlet.

The job of a semantic web standards organisation is, of course, not easy. By their very nature standards bodies are centralised, yet they want to retain and protect the decentralised nature of the web without, in Dan’s words, “creating chaos”. It was highlighted repeatedly, both in Dan’s talk and in the Q&A, that the job of Schema.org is not to set the vocabularies for specific areas but rather to plug in vocabularies from places like DBpedia or domain-specific sites like Geonames.
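
In practice, that plugging in often comes down to pointing at identifiers minted elsewhere rather than redefining them. The sketch below, again in Python and using the rdflib library, describes a place locally and then links it to the corresponding DBpedia entry with owl:sameAs; the local URI is hypothetical and the matching Geonames identifier is left as a placeholder.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import OWL, RDF, RDFS

    SDO = Namespace("http://schema.org/")

    g = Graph()
    place = URIRef("http://example.org/places/manchester")  # hypothetical local URI

    # Describe the place locally, then defer to an external vocabulary
    # for its identity instead of minting a brand new definition.
    g.add((place, RDF.type, SDO.Place))
    g.add((place, RDFS.label, Literal("Manchester", lang="en")))
    g.add((place, OWL.sameAs, URIRef("http://dbpedia.org/resource/Manchester")))
    # The matching Geonames URI would be added with another owl:sameAs triple.

    print(g.serialize(format="turtle"))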

The evening was appropriately wrapped up by members of the technical team at the BBC, who are now using Linked Data widely, both for archiving vast amounts of content and for displaying data from diverse sources on the BBC website. Anyone who keeps up to date with this year’s Olympic Games via the BBC website will be, whether they know it or not, a consumer of Linked Data.