Let’s Map Open Correspondence Data!

At the Open Knowledge Foundation we seek to empower people to use open data and open content in ways that improve the world.

In part this is about the provision of tools, such as our world-renowned CKAN open data portal. But it’s also about bringing together people who are passionate about making a change and giving them a space, whether online or face-to-face, to wrangle open data, write code and take action together.

At the recent Open Interests hack, participants developed a suite of apps that help us understand lobbying in the EU and how money is spent. A couple of weeks ago at Open Data Maker Night in London, people wrangled data from local authority websites to find out which companies receive the lion’s share of the Greater London Authority’s resources. Across our various Working Group mailing lists, people from all over the world are debating, sharing data and experimenting with code in a huge variety of domains, from open science to open government data.

At bottom, this is about bringing people with bright ideas together to collaborate around open content and open data and build things that have transformative potential.

##The Open Humanities Hangout

Over the past few months a group of people interested in open culture, including myself, have been getting together on Google Hangout in order to build stuff with the vast amount of open cultural data and content that’s out there.

In the cultural sphere, much of the transformative potential of open lies in widening access to our treasured cultural heritage, whether that’s classic literary texts or the paintings of the great masters. But as ever, it’s not only about opening up huge amounts of data and content; there’s already a hell of a lot of that on the Internet Archive and Wikimedia Commons. It’s also about empowering people to actually use this material in ways that they deem valuable.

So on the Open Humanities Hangout we’ve tried to do things that address both these challenges:

In order to address the problem of access, we’ve held hangouts on how to run a book scanning workshop and how to share the works we’ve digitised online. On another occasion, we collectively reflected on how to evangelise about opening up cultural resources and distilled the results into a set of principles, which we then shared and discussed on a public mailing list.

In terms of building stuff to help re-use, we’ve built an app called Bardomatic that helps you get to know Shakespeare better. We’ve hacked on an annotation tool for public domain texts called TEXTUS, trying to make it easier to use and deploy on WordPress. We’ve created interactive timelines of the great Western medieval philosophers, helping to improve and debug the Timeliner tool in the process.

##The Challenge: Mapping Networks of Correspondence

I want more people to join the Open Humanities hangouts: more JavaScript coders, more designers, more literature students, more bloggers… anyone who loves the humanities and wants to see the great works of our past accessible and re-usable by everyone, regardless of their background or location.

I’m putting forward a challenge for our next set of monthly Hangouts, based on the great work some of the Open Humanities Working Group members have been doing around open correspondence data and open book scanning.

I’m challenging the Open Humanities Hangout crew to construct a workflow that will enable *anyone to take a published set of letters and turn it into a visualisation of a network of correspondence.*

One of the great success stories of the so-called Digital Humanities is the wonderful Mapping the Republic of Letters project, a collaboration between Stanford and Oxford Universities that visualises the networks of correspondence of early modern scholars. The beautiful and insightful visualisations created in the process have captured the imaginations of technologists and humanists worldwide.

I want to see a million Mapping the Republic of Letters projects. I want it to be as easy as possible to map the correspondence of historical figures, so that anyone can do this. This includes the first year school student wanting some beautiful images for their coursework and the scholar who will use much richer data to tell a more thorough, in-depth and academic visual story for a research paper.
I want the underlying tools to be open source and well documented and, perhaps most importantly, I want the underlying data, that collection of metadata about who sent what and when, to be open for everyone to use and add to.

This effort doesn’t require the existence of a huge repository of data about letters that we tap into (although one might emerge in the process). This is about small sets of open data, sourced and formatted in appropriate ways by passionate groups of people all around the world, that can be combined and connected easily using open source web-based components.

##How do we begin?

To my eyes, this effort will involve the documentation of at least 4 steps:

1. Scan in a published collection of letters
2. Turn these scans into structured data that contains relevant information on respondent, date, location
3. Geo-code all those locations
4. Visualise the results on a map

We’ve already made some progress on steps 1 and 2, and there’s a wealth of information already available on how to do your own scanning and OCRing, including manuals on how to build your own scanner. For steps 3 and 4 there’s already some brilliant information over on the School of Data. However, I want to see this information synthesised in a single place, so that any student, teacher or researcher can find out how to go from that collected volume of letters of so-and-so on their shelf to a beautiful visualisation.
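
To make the later steps a little more concrete, here is a rough Python sketch of how structured letter data might be geo-coded and turned into something a map can display. It is only an illustration under stated assumptions, not the workflow the Hangout will settle on: the column names, the correspondents, the dates and the tiny hand-made gazetteer (with approximate coordinates) are all invented for the example, and a real project would look places up with a proper geo-coding service.

```python
import csv
import io
import json

# Hypothetical output of step 2: one row per letter.
# The column names and every row below are invented for illustration.
LETTERS_CSV = """sender,recipient,date,origin,destination
Correspondent A,Correspondent B,1650-03-01,London,Paris
Correspondent B,Correspondent A,1650-04-12,Paris,London
Correspondent A,Correspondent C,1651-01-20,London,Amsterdam
Correspondent C,Correspondent A,1651-02-02,Nowhereville,London
"""

# Stand-in for step 3: a tiny hand-made gazetteer of approximate
# (latitude, longitude) pairs. A real workflow would use a geo-coder.
GAZETTEER = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Amsterdam": (52.3676, 4.9041),
}


def read_letters(text):
    """Parse CSV text into a list of dicts, one per letter."""
    return list(csv.DictReader(io.StringIO(text)))


def geocode(place):
    """Return (lat, lon) for a place name, or None if it is unknown."""
    return GAZETTEER.get(place)


def letters_to_geojson(letters):
    """Build a GeoJSON FeatureCollection with one line per letter.

    Letters whose origin or destination cannot be geo-coded are
    returned separately so they can be checked by hand.
    """
    features = []
    skipped = []
    for letter in letters:
        origin = geocode(letter["origin"])
        destination = geocode(letter["destination"])
        if origin is None or destination is None:
            skipped.append(letter)
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON expects [longitude, latitude] order.
                "coordinates": [
                    [origin[1], origin[0]],
                    [destination[1], destination[0]],
                ],
            },
            "properties": {
                "sender": letter["sender"],
                "recipient": letter["recipient"],
                "date": letter["date"],
            },
        })
    return {"type": "FeatureCollection", "features": features}, skipped


if __name__ == "__main__":
    collection, skipped = letters_to_geojson(read_letters(LETTERS_CSV))
    print(json.dumps(collection, indent=2))
    print("Could not geo-code:", [row["origin"] for row in skipped])
```

GeoJSON is used here because it is an open format that many web mapping tools can read, which would cover step 4. Keeping the letters that could not be geo-coded in a separate list matters too: historical place names are often ambiguous or have changed, so someone will need to check those by hand.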

##What might result if we’re successful?

Well, for one, I hope that a beautiful and insightful set of visualisations of the correspondence of a number of important figures might emerge all over the web. But perhaps a longer term goal is to stimulate the creation of databases of correspondence that are open for everyone to use and add to. To begin with we’ll be constrained to the published volumes of correspondence in print, but if we get enough people contributing we can re-combine these published volumes in all sorts of interesting ways, filling in gaps and ultimately creating datasets that might enable us to map whole networks of correspondence for a given period.

##Get involved

So the challenge is on. The next Open Humanities Hangout will take place at 5pm BST on Tuesday May 28th. If you’re thinking of joining, drop me a quick message at sam.leon@okfn.org!