Big Data vs. Small Data: What about GLAMs?

Last week, Rufus Pollock, co-founder of the Open Knowledge Foundation, published the first blog post in a series on small data. In his post ‘Forget Big Data, Small Data is the real revolution’, Pollock writes:

“Meanwhile we risk overlooking the much more important story here, the real revolution, which is the mass democratisation of the means of access, storage and processing of data. This story isn’t about large organisations running parallel software on tens of thousand of servers, but about more people than ever being able to collaborate effectively around a distributed ecosystem of information, an ecosystem of small data.

[…] Size in itself doesn’t matter – what matters is having the data, of whatever size, that helps us solve a problem or address the question we have.

And when we want to scale up the way to do that is through componentized small data: by creating and integrating small data “packages” not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos.

This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.”

How does this relate to the cultural sector? Europeana now offers access to more than 27 million metadata records, Wikimedia Commons has 16 million media files available, the Internet Archive holds 9 million objects, and last week the Digital Public Library of America launched with 2.5 million metadata records and is quickly expanding. This is a fantastic achievement, but no single person can comprehend this amount of material, and it is still just a fraction of all the digitised material, which in turn is only a fraction of what could be digitised. How do we make sense of that?

As Pollock describes, it is not about the size of your database; the real revolution is the mass democratisation of the means of accessing, storing and processing data. For public institutions, this means it is possible to create packages with the complete works of Shakespeare, beautiful paintings by Van Gogh or a set of medieval maps. These packages are ready for re-use and can be linked to other sets of content for further exploration.

One question that arises is: who should create these packages of data? Who decides what content should be put together? Should we leave this to the traditional ‘experts’, the curators and archivists, or do we need to let the community do this? The most logical answer to this question is: both, or better, together. The dialogue between public institutions and their users has traditionally been very important, and when users have access to such vast amounts of content and metadata, guidance and curation become perhaps even more necessary. At the same time, these experts get the chance to work with thousands of contributors who can give feedback, enrich their data, link it, and work with it in ways the institution could not have imagined.

For this reason, besides releasing content and data under an open license and providing a standardised, open technical infrastructure as described in the OpenGLAM principles, the Open GLAM should be prepared to engage in the discussion and build value together with the community. Opening up data is not about dumping it online and never looking at it again. It is about a dialogue in which the public institution tries as much as possible to send users on their way, only to see them wander off and explore paths and directions never seen before.

We would love to hear your opinion on this topic. Please subscribe to the OpenGLAM mailing list to join the discussion.