The Crusade for Curious Images
In December last year the British Library released over a million images on to Flickr Commons. The images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft and gifted to the British Library.
One year on, it seems pertinent to mark the anniversary with an event at the British Library Conference Centre in London, looking at what researchers and artists have been doing with the released images and other open content, and considering what the next phase is for British Library Labs.
The British Library Flickr images received over 5 million views on their first day online, and by October 2014 they had passed 200 million image views, with every image viewed at least 20 times. There are also now 150,000 tags on the images. Despite these fantastic results there remains a massive disparity between what has been digitised and what the British Library physically holds, and there are always plans in the pipeline for opening up more content.
The Curious Images event held yesterday offered a whirlwind tour of the reuse of the images by artists, researchers and other institutions and of the challenges that tracking use and finding appropriate images continue to pose.
Ben O’Steen, technical lead at British Library Labs, kicked off the day by giving an overview of the Mechanical Curator, a tool which randomly selects small illustrations and ornamentations and posts them on the hour on Tumblr and Twitter. He also highlighted other great digital scholarship work like the map tagathon, an effort to tag over 25,000 maps, and the book metadata and community tags now on Figshare.
Grouping and Organising
If we can’t search, group and organise images then we fail to use them on a grand scale. The Lost Visions project (retrieving the visual element of printed books from the nineteenth century) was presented by Ian Harvey of Cardiff University. The project addresses the challenges of working with big data and of making the information more accessible and easier for a lay audience to interpret. The team have taken 65,000 volumes of literature, containing 1 million illustrations, and are looking at ‘organising them’ by considering what humans do well and what computers do well (machine learning). Much of their work is around tagging through crowdsourcing and reusing Flickr tags. They have also been creating metadata linkages to external sources, for example between an illustration and its original drawing. Through their work they have been able to identify various groupings, e.g. women in trousers, the Indian Mutiny and Shakespeare.
Software developer Peter Balman has been looking at exactly what has happened to the 1 million images and gauging their impact using various techniques (online detective work!) for the ‘Visibility’ project funded by the Technology Strategy Board. His methods include Google image search, TinEye reverse image search, looking up the taxonomy of a website using dmoz, finding information on the domain the images are on using Whois.net, detecting the language using the Alchemy language API, and querying DBpedia for more information about the URL. You can learn more about Peter Balman’s Visibility project in this video.
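One small step in this kind of detective work is grouping the pages where an image turns up by the domain that hosts them, before looking each domain up elsewhere. The sketch below is only an illustration of that idea using Python's standard library. The URLs and the function names `domain_of` and `tally_domains` are made up for the example and are not taken from the Visibility project.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sightings: pages where a reverse image search found a copy.
sightings = [
    "https://example.org/blog/victorian-maps",
    "https://www.example.org/gallery/12",
    "https://example.com/post?id=7",
]


def domain_of(url: str) -> str:
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def tally_domains(urls):
    """Count how many sightings fall on each domain."""
    return Counter(domain_of(u) for u in urls)


counts = tally_domains(sightings)
for domain, n in counts.most_common():
    print(domain, n)
```

Each distinct domain could then be passed on to a directory, Whois or language lookup once, rather than once per sighting.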
Creating new work
It was fantastic to hear from two artists who have been reusing the British Library images. David Normal, a San Francisco based artist, has created four collages representing ‘a collection of dynamic human dramas’ using the images. His Crossroads of Curiosity project was showcased as four large illuminated light boxes (2.4 metres by 6 metres) at the Burning Man festival held in Nevada’s Black Rock Desert earlier this year. Personally I was blown away by David’s fantastic tour of photophilia (embedding of icons within icons within icons and recognising icons of faces in things that don’t have facial form) and crazyology. His process of taking an image, identifying patterns, and replacing elements with things that have similar patterns is just brilliant!
Mario Klingemann, a code artist from Germany, works on teaching computers to create art. He sees himself as an ‘Obsessive Compulsive Orderer’ captivated by trying to find clusters between images. His approach is to let humans sort, then let machines scale the sort, and finally have humans QA the machine. Mario reflected on how the British Library images had inspired his work, and on finding a Victorian image with one view and being the first person to (digitally) see it. Mario has created his sorting app on his laptop in Adobe AIR. It would be great to see some of his work online and available for partners.
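The ‘humans sort, machines scale, humans QA’ loop can be sketched in a few lines. This is not Mario’s code: it swaps in a simple nearest-centroid rule, assumes each image has already been reduced to a short list of numbers, and uses made-up labels, a made-up `margin` threshold and a hypothetical function name `scale_sort`. It needs at least two human-sorted groups and Python 3.8 or later.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scale_sort(seed, unlabelled, margin=0.5):
    """Extend a human sort to unlabelled items.

    seed: dict mapping a label to the feature vectors a human put in it.
    Returns (auto, review): items the machine labelled with a clear margin,
    and items it was unsure about, which go back to a human for QA.
    """
    centres = {label: centroid(vs) for label, vs in seed.items()}
    auto, review = [], []
    for v in unlabelled:
        ranked = sorted(centres, key=lambda lab: math.dist(v, centres[lab]))
        best, second = ranked[0], ranked[1]
        gap = math.dist(v, centres[second]) - math.dist(v, centres[best])
        if gap >= margin:
            auto.append((v, best))
        else:
            review.append(v)
    return auto, review


# Hypothetical two-number features for a handful of images.
seed = {
    "portrait": [[0.0, 0.0], [0.2, 0.0]],
    "map": [[1.0, 1.0], [1.0, 0.8]],
}
auto, review = scale_sort(seed, [[0.1, 0.1], [0.55, 0.45]])
print(auto)    # items labelled by the machine
print(review)  # items left for a human to check
```

Raising `margin` sends more items back to the human reviewer, lowering it trusts the machine more.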
Later in the day we heard from researchers who are using the British Library images as a test bed. Joon Son Chung from the University of Oxford has been looking at woodblocks and checking for wear and tear in 1,000 images. Elliot Crowley, also from the University of Oxford, demoed ImageMatch, a tool which uses computer vision and machine learning to aid image search. It uses Google image search to fetch 100 images and ‘learn’ what an item looks like, e.g. what does a dog look like? It then applies what it has learned to other images. Tests so far have been run on the Your Paintings database and they hope to test on the British Library images soon. All tag information is fed back into the database. A beta version of the software will be out in the new year.
Researchers have also been using a mixture of computational techniques to analyse digitised handwritten manuscripts (some from the British Library’s collections) and either connect them to their corresponding transcriptions or try to ‘read’ the handwriting and create transcripts using machine learning. Enrique Vidal from the Universitat Politècnica de València spoke about the EU-funded Transcriptorium project, which is developing handwritten text recognition (HTR) technology and has been recognising handwriting using the Bentham collection.
Desmond Schmidt from Queensland University also shared his Text and Image Linking Tool (TILT), which links transcripts with images of texts. He explained that we need to bring historical texts to a modern audience who can search and analyse them. The TILT software links at word level but, unlike other tools, takes an automated approach that doesn’t embed markup. Instead it uses a GeoJSON overlay of polygons.
The GLAM world is not quite there when it comes to offering quality open content that can be easily used by others. Conrad Taylor, illustrator and cartoonist, considered the complexities of digital imaging and reprinting. He covered the challenges of processing pre-1870 images before Fox Talbot invented half-toning and lamented that people “can’t tell their ppi’s from their dpi’s (doesn’t specify no of pixels in images)”. He has been working on a book entitled Contracts Textures and Hues by Anita Jeni McKenzie and shared the issues with reusing low-quality items from Flickr.
Tim Weyrich from University College London explained how digital acquisition is now prominent in cultural heritage applications, but digital humanities expectations probably need to be lowered. We are often over-optimistic about what is possible, while engineers are good at making tradeoffs. He also offered a reminder that, for researchers, solving complex real-world problems should come before just writing papers. He delivered two case studies. The first was reassembling the Theran wall paintings (3D heritage ‘jigsaws’!), where he asked us to consider how you computationally replicate the real-world feel of an exact match. The second was digitally reproducing the Great Parchment Book, a Domesday-type book that describes property relations in Ireland. The book was damaged by fire in 1786; parchment doesn’t burn so easily but it does warp. The team made a decision to treat the book as a 3D object, photographing it from many angles and rendering it as a ‘globe type structure’. This allowed them to later flatten the book and transcribe it.
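To give a feel for what a polygon overlay that sits outside the text might look like, here is a minimal sketch. It is not TILT’s actual schema: the `offset` and `length` properties, the use of image pixel coordinates and the helper names are all assumptions made for illustration. Only the basic GeoJSON shapes (FeatureCollection, Feature, Polygon with a closed ring) follow the GeoJSON format itself.

```python
import json


def word_feature(points, offset, length):
    """Build a GeoJSON Feature outlining one word on the page image.

    points: corner coordinates of the word (assumed here to be image pixels).
    offset, length: position of the word in the plain transcript
    (hypothetical property names, not taken from TILT).
    """
    ring = [list(p) for p in points]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"offset": offset, "length": length},
    }


transcript = "Dear Sir"  # the transcript stays plain text, with no markup in it
overlay = {
    "type": "FeatureCollection",
    "features": [
        word_feature([(10, 10), (60, 10), (60, 30), (10, 30)], 0, 4),
        word_feature([(70, 10), (110, 10), (110, 30), (70, 30)], 5, 3),
    ],
}


def linked_words(overlay, text):
    """Recover the text each polygon points at."""
    spans = (f["properties"] for f in overlay["features"])
    return [text[p["offset"]:p["offset"] + p["length"]] for p in spans]


print(linked_words(overlay, transcript))
print(json.dumps(overlay)[:60])
```

Because the link lives in the overlay, the transcript can be searched and analysed as ordinary text while a viewer uses the polygons to highlight the matching word on the image.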
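The ppi point is easy to check with a little arithmetic: a ppi figure only means something alongside either the pixel dimensions or the intended print size. A rough sketch, with function names invented for the example:

```python
import math


def print_size_inches(pixels: int, ppi: float) -> float:
    """Printed length, in inches, of an image side at a given pixels per inch."""
    return pixels / ppi


def pixels_needed(inches: float, ppi: float) -> int:
    """Pixels required along one side to print at a given size and ppi."""
    return math.ceil(inches * ppi)


# The same 1200-pixel-wide scan at two different ppi settings.
print(print_size_inches(1200, 300))
print(print_size_inches(1200, 72))

# Pixels needed for a 6 inch wide reproduction at 300 ppi.
print(pixels_needed(6, 300))
```

Printer dpi is a separate figure again: it describes the dots of ink the device lays down, not the pixels in the file.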
Mixing with Science
In the final session of the day we were given a perspective on image analysis from health and science. Liangxiu Han from Manchester Metropolitan University talked about large-scale data processing and analysis on images. MMU used climate data from NASA and combined it with other data so it could be applied to life sciences using pattern recognition and annotation approaches. The last talk of the day was from Ros Sandler of the Missouri Botanical Garden on the Biodiversity Heritage Library, who have instigated the Art of Life project (biodivlib.wikispaces.com/Art+of+Life), which crowdsources the identification of natural history images. They currently have 93,000 pages uploaded to their Flickr stream and to Wikimedia Commons, including images of extinct species. Their workflow is to build algorithms that find images, have volunteers classify them, push them to description platforms for metadata, then bring the results back and share them more widely. They have a Macaw classifying tool which allows volunteers to put pictures into broad groupings, including false positives.
The event ended with Adam Farquhar, head of Digital Scholarship at the British Library, sharing insight into the next phase of the British Library Labs project: more data, more events and more images! The project has been funded by the Andrew W. Mellon Foundation for another two years from January 2015. Congratulations to Mahendra Mahey, the British Library Labs project manager, and the rest of the BL Labs team!
We were then all treated to mulled wine and mince pies! Season’s Greetings everyone!
Thanks to James Baker for sharing his notes, which proved useful for filling in the gaps.