The Crusade for Curious Images

Marieke Guy - December 19, 2014 in Digital Humanities, Events/Workshops, Featured, Front Page, Open Humanities

In December last year the British Library released over a million images onto Flickr Commons. The images were taken from the pages of 17th-, 18th- and 19th-century books digitised by Microsoft and gifted to the British Library.
One year on, it seemed pertinent to mark the anniversary with an event at the British Library Conference Centre in London, looking at what researchers and artists have been doing with the released images and other open content, and considering the next phase for British Library Labs.

Image taken from page 17 of ‘A Christmas Carol … With illustrations [from drawings by S. Eytinge.]’, British Library on Flickr, public domain

Background

The British Library Flickr images received over 5 million views on their first day online, and by October 2014 they had passed 200 million image views, with every image viewed at least 20 times. There are also now 150,000 tags on the images. Despite these fantastic results, there remains a massive disparity between what has been digitised and what the British Library physically holds, and there are plans in the pipeline to open up more content.

The Curious Images event held yesterday offered a whirlwind tour of the reuse of the images by artists, researchers and other institutions and of the challenges that tracking use and finding appropriate images continue to pose.

Ben O’Steen introduces the Mechanical Curator

Ben O’Steen, technical lead at British Library Labs, kicked off the day by giving an overview of the Mechanical Curator, a tool which randomly selects small illustrations and ornamentations and posts them on the hour to Tumblr and Twitter. He also highlighted other great digital scholarship work, like the map tagathon, an effort to tag over 25,000 maps, and the book metadata and community tags now on Figshare.
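
The selection logic behind such a bot is conceptually simple. Here is a minimal sketch of an hourly random-selection loop, assuming a local JSON list of image records; the file name and the post_image stub are hypothetical stand-ins, not the Labs team's actual code:

```python
import json
import random
import time

def load_illustrations(path):
    """Load illustration records from a JSON file (hypothetical format:
    a list of dicts, each with at least 'title' and 'url' keys)."""
    with open(path) as fh:
        return json.load(fh)

def post_image(record):
    """Stand-in for posting to Tumblr/Twitter; a real bot would call
    the platform APIs here."""
    print("Posting: {title} ({url})".format(**record))

illustrations = load_illustrations("illustrations.json")
while True:
    post_image(random.choice(illustrations))  # pick one image at random
    time.sleep(60 * 60)                       # then wait an hour
```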

Grouping and Organising

If we can’t search, group and organise images then we fail to use them on a grand scale. The Lost Visions project, which is retrieving the visual element of printed books from the nineteenth century, was presented by Ian Harvey of Cardiff University. The project addresses the challenges of working with big data and of making the information more accessible and easier to interpret for a lay audience. The team have taken 65,000 volumes of literature, containing around a million illustrations, and are looking at organising them by considering what humans do well and what computers do well (machine learning). Much of their work is around tagging, through crowdsourcing and reusing Flickr tags; they have also been creating metadata linkages to external sources, for example between an illustration and the original drawing. Through their work they have been able to identify various groupings, e.g. women in trousers, the Indian Mutiny, Shakespeare.

Software developer Peter Balman has been looking at what exactly has happened to the 1 million images, gauging their impact using various techniques (online detective work!) for the ‘Visibility’ project funded by the Technology Strategy Board. Methods include Googling with image search, using TinEye reverse image search, looking at the taxonomy of a website using DMOZ, searching for information on the domain an image sits on using Whois.net, detecting the language of a page using the Alchemy language API, and querying DBpedia for more information about the URL. You can learn more about Peter Balman’s Visibility project in this video.
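
To give a flavour of this kind of detective work, here is a minimal sketch of the domain-inspection steps for a page that reuses an image. It substitutes the open-source langdetect package for the Alchemy language API and uses the third-party python-whois library; treat it as an illustration of the general approach, not Peter's actual pipeline:

```python
from urllib.parse import urlparse

import requests                # pip install requests
import whois                   # pip install python-whois
from langdetect import detect  # pip install langdetect

def inspect_page(url):
    """Gather basic facts about a page that reuses an image:
    who registered the domain, and what language the page is in."""
    domain = urlparse(url).netloc
    registration = whois.whois(domain)   # WHOIS record for the domain
    text = requests.get(url, timeout=10).text
    language = detect(text)              # rough guess, e.g. 'en', 'fr'
    return {"domain": domain,
            "registrar": registration.registrar,
            "language": language}

# Hypothetical URL of a page found to be reusing a BL image.
print(inspect_page("https://example.com/page-using-a-bl-image"))
```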

Creating new work

It was fantastic to hear from two artists who have been reusing the British Library images. David Normal, a San Francisco-based artist, has created four collages representing ‘a collection of dynamic human dramas’ using the images. His Crossroads of Curiosity project was showcased as four large illuminated light boxes (2.4 metres by 6 metres) at the Burning Man festival held in Nevada’s Black Rock Desert earlier this year. Personally I was blown away by David’s fantastic tour of photophilia (the embedding of icons within icons within icons, and recognising faces in things that don’t have facial form) and crazyology. His process of taking an image, identifying patterns and replacing elements with things that have similar patterns is just brilliant!

Crossroads of Curiosity project by David Normal at Burning Man, photo from British Library blog

Mario Klingemann, a code artist from Germany, works on teaching computers to create art. He sees himself as an ‘Obsessive Compulsive Orderer’, captivated by trying to find clusters among images. His approach is to let humans sort, then let machines scale up the sorting, and finally have humans QA the machines’ output. Mario reflected on how the British Library images had inspired his work, and on finding a Victorian image with one view and being the first person to (digitally) see it. Mario has created his sorting tool on his laptop in Adobe AIR – it would be great to see some of his work online and available for partners.

44 Gentlemen who look like 44, by Mario Klingemann, Flickr, CC-NC

Researching

Later in the day we heard from researchers who are using the British Library images as a test bed. Joon Son Chung from the University of Oxford has been looking at woodblocks, checking for wear and tear across 1,000 images. Elliot Crowley, also from the University of Oxford, demoed ImageMatch, a tool which uses computer-vision machine learning to aid image search. It employs Google image search to fetch around 100 example images and ‘learn’ what an item looks like, e.g. what does a dog look like? It then applies these newly learned classifiers to other images. Tests so far have been run on the Your Paintings database, and they hope to test soon on the British Library images. All tag information is fed back into the database. A beta version of the software will be out in the new year.
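
The learn-from-examples idea can be sketched generically: collect positive images (e.g. results of an image search for ‘dog’) and negative images, extract features and train a linear classifier, then score unseen images. The sketch below uses scikit-image HOG features and a scikit-learn SVM; the folder and file names are hypothetical, and this is not the ImageMatch code itself:

```python
from pathlib import Path

import numpy as np
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize
from sklearn.svm import LinearSVC

def features(path):
    """Load an image, resize to a fixed shape, compute HOG features."""
    image = resize(imread(path, as_gray=True), (128, 128))
    return hog(image)

# Hypothetical folders: 'dogs/' holds image-search results for the query,
# 'other/' holds random non-matching images.
positives = [features(p) for p in Path("dogs").glob("*.jpg")]
negatives = [features(p) for p in Path("other").glob("*.jpg")]

X = np.array(positives + negatives)
y = np.array([1] * len(positives) + [0] * len(negatives))

classifier = LinearSVC().fit(X, y)  # learn what a 'dog' looks like

# Score an unseen image: higher values mean more dog-like.
print(classifier.decision_function([features("unknown.jpg")]))
```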

Image taken from page 139 of ‘[Sing-Song. A nursery rhyme book. … With … illustrations by A. Hughes, etc.]’, British Library on Flickr, public domain

Researchers have also been using a mixture of computational techniques to analyse digitised handwritten manuscripts (some from the British Library’s collections), either connecting them to their corresponding transcriptions or trying to ‘read’ the handwriting and create transcripts using machine learning approaches. Enrique Vidal from the Universitat Politècnica de València spoke about the EU-funded Transcriptorium Project, which is developing HTR (handwritten text recognition) technology and has been recognising handwriting in the Bentham collection.

Desmond Schmidt from Queensland University also shared his Text and Image Linking Tool (TILT), which links transcripts with images of texts. He explained that we need to bring historical texts to a modern audience who can search and analyse them. The TILT software links at word level but, unlike other tools, takes an automated approach that doesn’t embed markup; instead it uses a GeoJSON overlay of polygons.
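
To illustrate the standoff-markup idea, here is what one word-level link in such a GeoJSON overlay might look like: a polygon in page-image pixel coordinates, with properties pointing into the plain-text transcript, so nothing needs to be embedded in the text itself. The property names are invented for illustration and are not TILT's actual schema:

```python
import json

# One word-level link: a polygon over the page image (pixel coordinates)
# plus an offset into the plain-text transcript. Property names here are
# made up for illustration; TILT's real schema may differ.
word_link = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[120, 45], [210, 45], [210, 80],
                         [120, 80], [120, 45]]],
    },
    "properties": {"transcript_offset": 57, "length": 8},
}

overlay = {"type": "FeatureCollection", "features": [word_link]}
print(json.dumps(overlay, indent=2))
```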

Reusing

The GLAM world is not quite there yet when it comes to offering quality open content that can be easily used by others. Conrad Taylor, illustrator and cartoonist, considered the complexities of digital imaging and reprinting. He covered the challenges of processing pre-1870 images, from before Fox Talbot invented half-toning, and lamented that people “can’t tell their ppi’s from their dpi’s” (dpi doesn’t specify the number of pixels in an image). He has been working on a book entitled Contracts Textures and Hues by Anita Jeni McKenzie and shared the issues of reusing low-quality items from Flickr.

Tim Weyrich from University College London explained how digital acquisition is now prominent in cultural heritage applications, but that digital humanities expectations probably need to be lowered: we are often over-optimistic about what is possible, while engineers are good at making trade-offs. He also offered a reminder that, for researchers, solving complex real-world problems should come before just writing papers. He delivered two case studies. The first was reassembling the Theran wall paintings (3D heritage ‘jigsaws’!), where he asked us to consider how you computationally replicate the real-world feel of an exact match. The second was digitally reproducing the Great Parchment Book, a Domesday-type book that describes property relations in Ireland. The book was damaged by fire in 1786; parchment doesn’t burn easily, but it does warp. The team made a decision to treat the book as a 3D object, photographing it from many angles and rendering it as a ‘globe-type structure’. This allowed them to later flatten the images and transcribe the text.

Crossroads of Curiosity postcards by David Normal

Mixing with Science

In the final session of the day we were given a perspective on image analysis from health and science. Liangxiu Han from Manchester Metropolitan University talked about large-scale data processing and analysis of images: MMU took climate data from NASA and combined it with other data so it could be applied to the life sciences, using pattern recognition and annotation approaches. The last talk of the day was from Ros Sandler of the Missouri Botanical Garden, on the Biodiversity Heritage Library’s Art of Life project (biodivlib.wikispaces.com/Art+of+Life), which crowdsources the identification of natural history images. They currently have 93,000 pages uploaded to their Flickr stream and to Wikimedia Commons, including images of extinct species. Their workflow is to build algorithms to find images, have volunteers classify them, push them to description platforms for metadata, then bring the results back and share them more widely. Their Macaw classifying tool allows volunteers to put pictures into broad groupings – including false positives.

The event ended with Adam Farquhar, head of Digital Scholarship at the British Library, sharing insight into the next phase of the British Library Labs project: more data, more events and more images! The project has been funded by the Andrew W. Mellon Foundation for another two years from January 2015 – congratulations to Mahendra Mahey, the British Library Labs project manager, and the rest of the BL Labs team!

We were then all treated to mulled wine and mince pies! Season’s Greetings everyone!

The British Library images are available on Flickr at: https://www.flickr.com/photos/britishlibrary/. You can also browse British Library images on Wikimedia Commons.

Thanks to James Baker for sharing his notes, which proved useful for filling in the gaps.

Image taken from page 16 of ‘The Coming of Father Christmas’, British Library on Flickr, public domain

Let’s bring the Public Domain Calculators Worldwide!

Joris Pekel - July 10, 2014 in Featured, Front Page, Public Domain, Workshops


This blogpost was written by Pierre Chrzanowski, Open Knowledge Foundation France

Our cultural heritage is immense, but it has been dispersed across countries, public institutions, private collections, and so forth. Hence, for a long time, there was unequal access to culture and knowledge: those who were close to cultural institutions or knowledge centres had lots of cultural content at their disposal, while others could not access it unless they had the opportunity to travel.

Today, digital technology could support the creation of a global public archive of knowledge, where every cultural artefact in the public domain would be free for everyone connected to the Internet to access, use and share. Yet there is still uncertainty as to what our rights to access that knowledge are. The line between the public domain and protected content is still fuzzy and unclear.

This is how the idea of the public domain calculators came about.

The public domain calculators aim to make it easier for everyone to establish whether or not a given work is in the public domain in a given jurisdiction. Public domain calculation can provide value to memory institutions, lower the costs of clearing rights and give the assurance that works can be reused without permission from copyright holders. It can massively unlock our cultural heritage for reuse.
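
At its core, a public domain calculation encodes a jurisdiction's copyright term as a rule over known facts about a work. Real calculators must handle many more cases (anonymous and posthumous works, wartime extensions, neighbouring rights and so on); the sketch below is a deliberately naive ‘life plus N years’ check with illustrative term lengths:

```python
from datetime import date

# Illustrative term lengths only: real rules have many exceptions
# and special cases per jurisdiction.
TERMS_YEARS_AFTER_DEATH = {"FR": 70, "DE": 70, "CA": 50}

def in_public_domain(author_death_year, jurisdiction, today=None):
    """Naive 'life + N years' check. Terms typically run to the end of
    the calendar year, so the work enters the public domain on 1 January
    of the year after death_year + term; hence the strict inequality."""
    today = today or date.today()
    term = TERMS_YEARS_AFTER_DEATH[jurisdiction]
    return today.year > author_death_year + term

print(in_public_domain(1870, "FR"))  # author died 1870 -> True today
```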

Since 2006 a loose network around the Open Knowledge Foundation, the Institute for Information Law and Kennisland has attempted to break through this barrier to reuse with products like outofcopyright.eu, publicdomain.okfn.org and, most recently, calculateurdomainepublic.fr in France. We’ve learned a lot about public domain calculation and foresee some of the next challenges.
But we need you – developers, cultural institutions, and all those interested in making a global public archive of knowledge real.
On July 16th at 12:00, during a one-hour session at OKFest, we will present the state of the art on public domain calculators, and discuss and work on the next challenges and how to bring the public domain calculators worldwide.

More on the Public Domain Calculators:

  • EU Flowcharts, Out of Copyright
  • Public Domain Calculators, Open Knowledge
  • French Public Domain Calculator
  • Mapping the Cultural Commons, Jonathan Gray

The session will be facilitated by Pierre Chrzanowski (Open Knowledge Foundation France), Marco Montanari (Open Knowledge Foundation Italy) and Maarten Zeinstra (Kennisland).

New Topic Report: Public Sector Information in Cultural Heritage Institutions

Joris Pekel - July 8, 2014 in Documentation, Featured, Front Page

This week a new Topic Report has been published on the ePSI Platform about public sector information in cultural heritage institutions. The report discusses the current state of the digital cultural heritage landscape in Europe and looks at what the recently accepted amendments to the PSI Directive mean for the sector.

What is Public Sector Information?

Public Sector Information (PSI) is the single largest source of information in Europe. It is produced and collected by public bodies and includes legal data, economic/financial data, digital maps, geographic information, meteorological data, digitised books, statistics, and data from research projects.

Most of this Open Data can be re-used or integrated into new products and services that we use on a daily basis, such as car navigation systems, weather forecasts, and financial and insurance services.

What does this mean for the cultural sector?

The original Directive dates from 2003 and excluded information produced by libraries, archives and museums. The amended Directive, accepted one year ago, changes this: these cultural heritage institutions will now also be asked to make their information available to the public. This includes both the metadata (the descriptive information) and the digitised content itself. The Directive does, however, allow institutions to make the data available under certain conditions, such as charging marginal costs.

Implementation of the Directive by the Member States

At the moment the new amended Directive is not fully in place yet: Member States have until July 2015 to implement it in their national law. How effective the new Directive will be depends on how the different Member States decide to implement it. The report researches the current state of implementation in each Member State, and the results can be found in this map.

It becomes quite clear that implementation of the amended Directive is not on track at the moment. One year after the acceptance of the new text, only a few countries are working on implementing it. This is a concern, especially considering that some Member States took as long as seven years to implement the 2003 Directive.

The full report also addresses other topics like maintaining a healthy Public Domain (also addressed next week during a workshop at the OKFestival), the potential of cultural heritage data for other sectors such as education and creative industries, and the more general issues memory institutions run into when digitising their collections. The full report can be found here.

Open Data in Cultural Heritage – OpenGLAM in Germany

Lieke Ploeger - June 10, 2014 in Events/Workshops, Featured, Front Page, Workshops

Are you working in a cultural heritage institution, or interested in opening up cultural heritage data for wider reuse? On the morning prior to the start of the Open Knowledge Festival, the OpenGLAM initiative, the DM2E project, Open Knowledge Germany and Wikimedia Deutschland are organising a half-day workshop on open cultural data, with a special focus on German cultural heritage institutions.

During the OpenGLAM workshop, we will investigate and discuss the possibilities of, and obstacles to, opening up your cultural data as an institution. After a round of inspiring presentations from initiatives like Europeana, Wikidata, the German Digital Library and Coding da Vinci, we will continue the discussion on how to overcome the barriers to opening up data in the cultural heritage sector.

Finally, we will hear from the successful local OpenGLAM groups currently active in Switzerland and Finland, and kickstart a local OpenGLAM network for German memory institutions interested in open cultural content and open access. We invite everyone to join and help think about the focus points for such a German OpenGLAM group, and we look forward to starting up a fruitful collaboration!

Programme

  • 9.30: Welcome & introduction to OpenGLAM – Lieke Ploeger, Open Knowledge
  • 9.40: Lightning talks on the value of open data for cultural heritage institutions.
    • We opened up – now what? An analysis of the open data policy of the Rijksmuseum – Joris Pekel, Europeana
    • 1 year in digital cultural heritage – what were the walls I ran into most often & how to tear them down – Stephan Bartholmei, Deutsche Digitale Bibliothek
    • Wikidata – Making your data available and useful for everyone – Lydia Pintscher, Wikimedia Deutschland
    • How to use cultural heritage data: Coding Da Vinci results – Helene Hahn, Open Knowledge Foundation Germany
    • Experiences from German GLAM projects / GLAM-Wiki-Kollaborationen in der Wissenschaft – Daniel Mietchen, Museum für Naturkunde, Berlin
  • 10.30: Coffee break
  • 10.45: Debate on the current situation around openness in Germany
  • 11.30: Forming a local German OpenGLAM group
    • With inspiring presentations of the OpenGLAM local groups from Switzerland & Finland
  • 13.00: End

Registration and details

      • Date: Tuesday 15 July, 9.30 – 13.00
      • Location: Wikimedia Deutschland, Tempelhofer Ufer 23-24
      • Attendance is free but places are limited: please register here
      • If you want to read more about OpenGLAM and open cultural data (in German), check out http://okfn.de/openglam/

OpenGLAM structure and Working Group

Joris Pekel - February 27, 2014 in Featured, Front Page, News, Working Group

The OpenGLAM initiative has been around for more than two years now. In this period we have advocated for more open data in the cultural heritage sector in a variety of ways. In this blog post I want to give you an overview of the structure of OpenGLAM and of the different activities we organise with the network.

OpenGLAM was set up as an initiative of the Open Knowledge Foundation. The aim was to bring together people from a variety of organisations, institutions and networks that share a similar set of principles and aims. We therefore work together with representatives of the Wikimedia Foundation, Creative Commons and the Internet Archive, people working at museums and libraries, open data advocates and many more.

The public side of OpenGLAM mainly lives on our blog and Twitter account; more in-depth discussions take place on the public mailing list. On the website we try to provide as much information as possible around the topic of open data in the cultural sector: showing best practices of open institutions in the Open Collections section, explaining what it means to be an ‘Open GLAM’ with the OpenGLAM Principles, and providing Documentation for further reading.

Besides this more public side, we also have an Advisory Board of high-profile thinkers in the open cultural heritage domain. They help us decide on our course and address important issues, such as the recent discussion about adding open Creative Commons licenses to public domain material.

Finally, we have a Working Group which consists of active volunteers from a variety of domains. The Working Group has become the core of the OpenGLAM initiative and meets virtually on a monthly basis to discuss relevant topics, share news and updates, and take actions that benefit the adoption of open data in the cultural sector. This has, for example, resulted in two successful cultural heritage topic streams at the annual conferences organised by the Open Knowledge Foundation, active discussions with a variety of cultural institutions on why and how to open up their data, and, recently, the launch of the OpenGLAM Benchmark Survey, which aims to gather more data about the adoption of open data principles in the heritage sector around the world.

The Working Group has established itself in several local Open Knowledge Foundation groups, and OpenGLAM ambassadors serve as the local points of contact in their areas. The full list of local groups and ambassadors can be found here. All our meeting notes are made available to the public, so if you want to get an idea about what is being discussed, have a look here.

We always invite new people to join the Working Group and help spread the word. Don’t worry if you are new to the field: we will gladly bring you up to speed. More information about joining the OpenGLAM Working Group can be found in this document. If you want more information about the OpenGLAM initiative or the Working Group, or want to join us, please get in touch!

Obstacles to Opening Up Content and Data in the Cultural Heritage Sector

Sam Leon - May 15, 2012 in Case Studies, Front Page, Updates

Over the last few months we’ve run a series of workshops with representatives from cultural heritage institutions in Paris, London and Berlin.

Across these sessions we’ve gathered a large amount of feedback on the problems – legal, technical and economic – faced by institutions trying to open up cultural content and data.

Below we’ve listed some of the most prominent difficulties that have surfaced during the course of the workshops. In doing so we are building on the work already undertaken in this area by JISC, UK Discovery and Europeana.

The list is, of course, very much a work in progress and we strongly encourage people to add to it either by commenting on this post or responding on our Open GLAM mailing list. Building and enriching this list will help inform the shape of events we run in the future and the documentation we can write to help demystify some of the issues highlighted.

Legal Uncertainty

Uncertainty concerning the legal status of digital reproductions, and of the originals themselves, is one of the greatest obstacles to a more open cultural heritage.

A set of key issues has been identified in this field:

  • The status of digital reproductions of objects. Can new rights be applied to digital copies of works that belong to the public domain?
  • Rights clearance issues
  • Orphan works

Economic issues

There is a plethora of economic issues that prevent GLAMs from opening up more of their collections. The cost of digitisation is perhaps the greatest obstacle to more freely available digitised cultural content, but there are also costs associated with sorting, hosting, formatting and exposing data, as well as the costs of clearing rights.

On top of the costs associated with digitising and opening data, there is also the concern over the loss of existing revenue streams. A minority of GLAMs have made significant income from selling the data they hold about their collections. This issue is compounded by the fact that there is sometimes an expectation from local and national government that GLAMs turn a profit with their data.

However, for the majority of GLAMs the fear seems to be not that they will lose an existing revenue stream, but that, by opening their data and “letting go” of it, they will miss future, as yet unrealised, business opportunities.

Control problems

The truth is that open data and the web involves a radical rethinking of the role of a GLAM institution and the traditional dichotomy between curator and visitor. A cultural heritage open data ecosystem is one in which non-professionals can contribute to the process of curation and data enrichment.

This often generates a concern that something disreputable might be done with data and content or that authoritative data might be degraded by the activities of non-experts.

This relates to the further fear, often expressed by those working within cultural heritage institutions, that opening up data will lead to a loss of attribution to the agency that created it.

But beyond this, there is also a discomfort many feel within the cultural heritage sector about opening up data because it will enable others to make money from it. This has proven to be something many within GLAMs are uncomfortable with.

Technical constraints

There is a plethora of technical obstacles to opening up cultural heritage data. More needs to be done to clarify the practices and standards that make cultural heritage datasets more open, more easily reused and interoperable. Questions such as which formats (RDF, MARC, etc.), vocabularies (standard or ad hoc) and serialisations (XML, JSON, etc.) to use could be more effectively addressed.


Some of the obstacles to a more open culture are based on misunderstandings about the nature of open data. This point was made very forcefully by a number of participants at the legal workshop we held in Berlin. For instance, institutions are sometimes unwilling to open up their metadata in the belief that this will necessarily commit them to a waiver of the rights over the content itself.

What is needed here are better, more visible explanations and justifications of the key concepts within open data – both legal and technical. In the coming weeks we will be addressing precisely this problem with the team over at UK Discovery, by writing blog posts on key concepts and continuing to develop freely available documentation on this topic, such as the Open Metadata Handbook.

Stay tuned!

The Digital Public Library of America

Joris Pekel - May 8, 2012 in Case Studies, Front Page

The Digital Public Library of America (DPLA) is an initiative with the goal of making the cultural and scientific heritage of humanity available, free of charge, to all. While Google Books is caught up in a seemingly everlasting legal battle, a group of Harvard-led scholars has decided to launch its own project to put all of history online.

When Google launched its Google Books project in 2004 with the goal of scanning all the world’s books into its database, it was both praised and criticised heavily: praised for its bold attempt to make it technically possible to digitise books on a scale never seen before, criticised over the fact that a private company would control all of the world’s knowledge. In 2008, after being sued for copyright infringement for years, Google agreed to pay large sums to authors and publishers in return for permission to develop a commercial database of books. Under the terms of the deal, Google would be able to sell subscriptions to the database to libraries and other institutions, while also using the service as a means of selling e-books and displaying advertisements. This led to even more controversy, and several authors and libraries demanded to be excluded from Google’s database.

In response to this, Robert Darnton, one of the biggest critics of Google Books, proposed to build a true ‘digital public library of America’ which would be ‘truly free and democratic’. Here, libraries and universities would work together to establish a distributed system aggregating collections from many institutions. Harvard’s Berkman Center for Internet and Society took up Darnton’s ideas and is now incubating the project. The project has several similarities with another initiative that arose in response to Google Books – Europeana – and the two giants have already forged partnerships. Google has yet to decide what its next steps are.

The vision of the DPLA is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full-text access to books in the public domain, e.g. from the HathiTrust, the Internet Archive, and U.S. and international research libraries. Most of its board members, including Brewster Kahle of the Internet Archive, favour a decentralised network of different public libraries instead of a centralised organisation responsible for all of its content, but this is still being discussed.

In April 2013 the Harvard-funded research program ends and the digital library has to be operational. A lot of progress has been made in the last year through several meetings and workshops, and many volunteers have been recruited. Still, there are a lot of obstacles to overcome.

As Google has also noticed, the technical implementation is not the hardest part; the copyright is. Today, copyright in a work extends for 70 years after the death of the author and applies by default to any created work. This means that it is now almost impossible to publish a work from the last century. Even when the copyright holders are unknown or can’t be found – so-called ‘orphan works’ – the work cannot be published online, because copyright was applied to all works retroactively, without the copyright holder having to register anything.
Many copyright experts argue that without a proper revision of the current copyright act it will be very hard to include these orphan works in a digital database. Robert Darnton, however, believes that Congress might grant a non-commercial public library the right to digitise orphan books, which would make thousands of books available and be an enormous step forward in the copyright debate.

The Digital Public Library is an ambitious project with great promise. In the next year the team will continue to address the challenges that lie before them: a daunting task, but one with a potentially great outcome, where everybody with an internet connection can enjoy millions of books from America’s history.

Open GLAM workshop in Paris

Primavera De Filippi - May 8, 2012 in Events/Workshops, Front Page

On Friday, 27th of April 2012, the first Open GLAM workshop was organized in Paris by the Open Knowledge Foundation and Wikimedia France. The aim of this first workshop was to gather together a variety of people interested in discussing the problems of Open Cultural Data, with the objective of creating a list of recommendations to be later incorporated into a white paper.

We had the pleasure of welcoming an eclectic audience composed of lawyers, researchers, representatives of major cultural institutions – such as the National Library of France, the Centre Pompidou and the Cité des Sciences – as well as various spokespeople from the Ministry of Culture.

The workshop began with a keynote by Lionel Maurel (scinfolex.wordpress.com), who provided a detailed analysis of the state of the art for Open Cultural Data in France.

He began by describing how the distinction between creative works (protected by copyright law) and data (protected by sui generis rights) is getting increasingly blurry in the digital world. He then explained the importance of distinguishing between public data and cultural data, which has a special status under French law. While Article 10 of the 1978 Act introduces a generic right to the reuse of public data, Article 11 introduces an exception for educational, research and cultural institutions.

Although the cultural exception is officially justified by the need to protect sensitive data or copyrighted information, Lionel criticized it on two grounds:

  • Article 13 of the 1978 Act already stipulates that personal or sensitive data cannot be made available to the public without the consent of the data subject, or without having been anonymized.
  • Article 10 of the same Act explicitly stipulates that any work protected by copyright is not subject to the public data regime.

According to Lionel Maurel, the cultural exception is therefore an exception devoid of a proper foundation, but not devoid of consequences. While Open Cultural Data is progressing abroad (cf. Europeana, the British Library, and the Harvard Library, to name a few), in France cultural institutions may, but are not obliged to, make their data freely available on the national portal (data.gouv.fr).

The workshop continued with a panel discussion:

Remi Mathis (Wikimedia France) started the discussion by presenting the public domain as a motor of creativity. Culture in all ages has been based on the reuse of prior works, so the public domain is essential for the development of culture in Europe and in the wider world.

Remi noted that, in the context of Wikimedia, the public domain affects several projects – such as Wikisource, for instance, which collects public domain texts. Since copyright owners could potentially oppose a particular use of their content, all texts and historical photographs available on Wikisource are – and must be – in the public domain.

According to Wikimedia’s policies, a work is considered to be in the public domain when it is so both in the United States (where the servers are located) and in the country where it was created. The problem is that the public domain is a very ambiguous concept under French law, as it can only be defined negatively, as including (a) anything that is not considered a work of authorship, or (b) any work of authorship that is no longer protected by copyright law (except for moral rights, which last forever).

The discussion continued with an intervention by Melanie Dulong de Rosnay (Communia), who presented the idea of introducing an Open Access obligation for public domain works.

The basic assumption is that the digitization of a work could give rise to a new right over the digital copy, allowing the producer to impose limitations on the use of that copy. To date, this right is only a speculation, and has not yet been recognized by any French court.

At the European level, the Comité des Sages asked that any content digitized with public funding be made freely available, with potential restrictions on commercial uses. The Communia association went further, requiring that digital reproductions of public domain works should also belong to the public domain and should therefore be freely available and usable without any kind of legal or technical restriction.

Given that this issue goes beyond the scope of the PSI Directive of 2003, Member States and institutions are however free to determine their own policies regarding the digital reproduction of public domain works.

Melanie therefore suggested introducing a requirement that all digital reproductions of public domain works remain in the public domain, and so be freely accessible and reusable, to the extent that they have been digitized with public funds. She brought to the discussion the question of whether this obligation should be limited to public funding, or whether it should also apply to private funding. Although the latter would be preferable, the private sector might be more reluctant to digitize works if it can no longer control the use of the digital copies generated. It should also be noted that civil society actually belongs to the private sector, and that such a requirement could therefore prevent the use of free licenses, insofar as they would apply restrictions onto the public domain (a free license being less free than the absence of rights).

Benjamin Jean (lawyer) followed up by presenting the relationship between copyright law and free licenses – defined as contracts whereby an author licenses his rights on a free and nonexclusive basis, for the whole duration of those rights. Originally designed for works protected by copyright law, free licenses began to be used on data after the introduction of a sui generis right for databases. Specialized licenses have been developed, such as the ODbL by the Open Knowledge Foundation and the Open Licence from Etalab, specifically designed for France.

The problem is that not many people properly understand these licenses and their use. If the goal is Open Data, it is first necessary to train and to educate people who create and disseminate data. Free licenses can enable the sharing and the reuse of data, but they are not sufficient as such. In order to achieve true Open Data, it must also be practically possible to access and reuse the data.

The main drawback is that only the rights holder may license data under a free license. In France, in the case of most works for hire (with the exception of software), it is thus necessary to first obtain permission from the employees, who hold the copyright to their creations. An exception exists for administrative staff, whose creations can be distributed freely, but only in a non-commercial context. This could be a problem to the extent that free licenses do not differentiate between commercial and non-commercial uses.

Finally, Benjamin suggested that even though free licenses represent a possible solution to promote Open Data, we should perhaps look back to the original intent of copyright law and modify the rules of the law so that the use of free licenses would no longer be necessary.

The panel discussion ended with the intervention of Agnès Simon (National Library of France), presenting an overview of the approach taken at the BNF for opening up their data.

After recalling that the cultural exception of the 1978 Act does not require libraries to freely disseminate their data, Agnès noted that, since the mission of the BNF is “to encourage the dissemination of knowledge”, Open Data was an obvious step to take.

Specifically, the BNF could choose between (a) entirely opening up the data, at the risk of providing incomplete or duplicate data, or (b) partially opening up the data to ensure its reliability and cleanliness. The BNF decided to make available to the public only part of its cultural data (see gallica.bnf.fr) – which can be freely re-used for non-commercial purposes, but whose commercial exploitation is subject to a fee.

Alongside this first phase, the BNF has also developed a series of technical means to permit and facilitate the non-commercial reuse of data, most of which is available in RDF format and licensed under the Open License of Etalab. The aim is to make this data more useful, by enabling users to access it directly through search engines, without passing through the BNF portal.

To conclude, Agnes pointed out that opening up the data was also seen as an opportunity by the BNF to achieve better internal and external alignment of their catalog, and, of course, to permit the largest dissemination and the widest reuse of their data.

The workshop continued with a brainstorming session aimed at identifying the various obstacles that GLAM institutions may encounter with regard to Open Cultural Data. The following points were identified:

Lack of knowledge concerning
  • the concept of “Open Data”
  • the ecosystem in which GLAM institutions operate
  • the related legal issues

Legal uncertainty concerning
  • the legal status of digital reproductions
  • copyright law, i.e. the rights of administrative staff and of the creators of / contributors to databases (e.g. descriptions of inventory)
  • the legal status of the content: fear of making available sensitive content protected by copyright law or privacy law

Economic issues concerning
  • financing: how to finance the digitization of data?
  • compensation: how to offset the costs of the services against the benefits obtained?
  • data reuse: the risk of conflicts with commercial exploitation by third parties

Control problems
  • lack of control over the data
  • the issue of data integrity
  • control over the type of reuse, e.g. to avoid derogatory uses

Technical barriers
  • the costs of making data available
  • lack of resource pooling

Policy issues
  • lack of political will at the Ministry of Culture: a cultural system based on the maximization of intellectual property rights – which represents important economic stakes

A follow-up event will be organised at the end of May in order to address those issues and eventually come up with a series of guidelines on how to resolve them in the French context.

Harvard Releases 12 Million Library Records

Sam Leon - April 25, 2012 in Front Page, News

Harvard Library, Cambridge, H. P. Kendrick, Public Domain

Big news came in yesterday: Harvard is releasing the entirety of its library metadata online under a CC0 license, in accordance with its Open Metadata Policy. The collection includes information about books, videos, audio recordings, manuscripts and maps held within Harvard’s 73 libraries.

The 12 million records are in MARC21 format and are available to download from the Harvard servers; they can also be accessed through the Data Hub, and will be accessible through the APIs of the Digital Public Library of America.
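
MARC21 is a machine-readable binary format rather than something you would open in a text editor, but parsing libraries are widely available. A minimal sketch using the third-party pymarc package, with a placeholder filename standing in for one of the downloaded record files:

```python
from pymarc import MARCReader  # pip install pymarc

# 'harvard_records.mrc' is a placeholder filename for one of the
# downloadable MARC21 files, not the actual file name on the servers.
with open("harvard_records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        title_field = record["245"]  # MARC field 245 holds the title
        if title_field is not None:
            print(title_field["a"])  # subfield 'a': the main title
```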

Stuart Shieber, director of Harvard’s Office for Scholarly Communication, acknowledged the likelihood that things will be done with the data that they never expected. “This data serves to link things together in ways that are difficult to predict,” he said. “The more information you release, the more you see people doing innovative things.”

See the original announcement post on the Harvard website here.

Hacks Up from Hack On the Record at the UK’s National Archives

Sam Leon - April 24, 2012 in Front Page, Hack days

Photo of the National Archives by Nick Cooper (CC-by-SA)

A few weeks back I posted here about an upcoming hackathon at the UK’s National Archives. Jo Pugh, who organised the event, has now posted all the hacks that came out of the weekend, a summary of which can be found below:

At the close of the event, 11 teams opted to present. They were:

Additionally,