Speakerthon: Sharing Voice Samples

It was quite a strange sight. Over 30 people sat in silence, all with headphones on and all with big smiles on their faces! Welcome to Speakerthon!

BBC Voice Project

On Saturday 18th January 2014 I was lucky enough to represent the Open Education Working Group at the first BBC Speakerthon held in the Media Cafe, New Broadcasting House, London. The crux of the event, co-organised by the BBC, the Open Knowledge Foundation’s OpenGLAM initiative, Creative Commons UK and the Wikimedia Community, was that a group of people would create voice samples from the Radio 4 archive and upload them to Wikipedia.

To explain… Wikipedia is a fantastic resource and a great way to find out about famous individuals. While we can read about people and usually see an image of them, we can rarely hear what they sound like. For most, the audio dimension is missing. Back in 2012 Andy Mabbett (technology enthusiast and Wikipedian) pointed out this unfortunate omission in his call for open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons. The idea was picked up by the BBC, who have a fantastic sound archive, and the end result was the BBC Voice Project initiative and this Speakerthon.

Participants hard at work, photo courtesy of Andy Mabbett, Wikimedia Commons


During the day we were given editing access to the Radio 4 permanent audio archive (from 2007 – present, a set of programmes that includes Bookclub, Saturday Review and Woman’s Hour). We were also supplied with Snippets, an open source tool which would let us take samples of voice recordings.

After an introduction to the tools we would be using from Rob Cooper and Zillah Watson of the BBC R&D team, we physically clustered around 6 topic areas: arts and culture, BBC presenters, politics, sport, science and other. For each topic there was a list of suggested programmes and a Google spreadsheet to help us record our activity. I chose to concentrate on arts and culture, looking primarily at interviewees on Desert Island Discs.

We were instructed to look for 20-40 second clips on Wikipedia subjects that met the following criteria:

  • Clips should be clean, i.e. have no music in the background or people interrupting and only contain the voice of the individual chosen
  • Clips should ideally cover the field of endeavour the person was/is best known for – so a writer talking about their books or a scientist talking about research
  • The clip should be representative of the Wikipedia subject, so here we were avoiding contentious statements

Finding appropriate clips from the sound archive wasn’t always easy. Radio 4 presenters interrupt continuously (or so it seems!) and some individuals were unnecessarily brief in their responses. Getting 20+ seconds that were cough- and interruption-free was a definite challenge! After some experimentation I settled on a technique that worked. I opted for well-known arts and cultural people and started off by listening from 20 minutes into their interview on Desert Island Discs. This seemed to be the optimum time for individuals to talk about their work, and when the longest answers occurred. During the day I managed to capture 15 snippets from a variety of people including artist Damien Hirst, actor Dustin Hoffman and singer Debbie Harry.

After each snippet we added metadata to a Google form (Name, Wikimedia Page, Programme Title, BBC Programme Page, Broadcast Date, Gender and Snip URL). The snippets were then edited by the BBC team using Audacity, open source software for recording and editing sounds, to ensure a clean start and finish. Each FLAC file was then uploaded to Wikimedia Commons under a Creative Commons license. Once the FLAC files were ready we could add them to Wikipedia ourselves. Here is my finished embedded voice file on the Wikipedia page for American author and activist Alice Walker (see the right-hand information box) and the file on Wikimedia Commons.


Open content

Creating the voice samples was great fun and I think everyone there felt a very real sense of doing something useful and constructive.

For me the day ticked so many different boxes. Firstly, we were helping to enhance what is effectively the world’s biggest Open Educational Resource (OER). Adding sound files for Wikipedia subjects has great implications for teaching and learning, especially when you consider that those files can be reused in any way people choose. The Voice Project will be looking at uploading files from the BBC Archive for 1000 Wikipedia subjects; fingers crossed the project carries on beyond this initial sample. Secondly, the day suggested a real turning of the tide for open content. It is the first time the BBC has allowed its archive to be used in this way and hopefully the start of more activities in this space. For example, there was already discussion about adding animal sound files from the British Library. In November last year the BBC signed memorandums of understanding with the Open Knowledge Foundation, the Europeana Foundation, the Open Data Institute and the Mozilla Foundation. The MOUs mark a new commitment on the part of the BBC to embrace open data and open standards. And thirdly, the day was a positive one for collaboration and crowdsourcing. As Sam Leon puts it in this post from the OpenGLAM blog, Speakerthon allows the BBC to develop “new and innovative ways to harness the power of their audiences to improve their own digital assets.”

I also hope that the day was a step in the right direction for the BBC R&D team, who are looking to investigate whether voice samples on Wikimedia/pedia could be used to generate a voice box “fingerprint” which could then be used to identify speakers across a large archive. The process will enable them to create linked open data from open content, plans that are explained better in their original blog post.

Note that alongside the BBC Voice Project initiative, Andy Mabbett is also running the Voice Intro Project, which makes and solicits audio recordings in which Wikipedia subjects speak their name and introduce themselves.


So a fantastic day was had by all. Over 300 audio clips were prepared for upload to Wikipedia and over 50 Wikipedia articles now have sound files. The icing on the cake was being taken on a tour of New Broadcasting House, where we got to see the new state-of-the-art newsroom and the recently restored Radio Theatre. I even got to try my hand at reading the news!