Using Crowdcrafting for transcribing cultural works

Crowdcrafting is a free, open-source crowd-sourcing and micro-tasking platform. It is a joint effort between the Open Knowledge Foundation and the Citizen Cyberscience Centre. It enables people to create and run projects that rely on online volunteers to
perform tasks that require human cognition, such as image classification, transcription and geocoding. Crowdcrafting is there to help researchers, civic
hackers and developers create projects to which anyone around the world with some time, interest and an internet connection can contribute.

Crowdcrafting is different to existing efforts:

  • It’s 100% open source – everybody can use it or fork the code for their own purposes.
  • Unlike, say, “Mechanical Turk” style projects, Crowdcrafting is not designed to handle payment or money; it is designed to support volunteer-driven
    projects.
  • It’s designed as a platform and framework for developing and deploying crowd-sourcing and microtasking apps rather than being a crowd-sourcing
    application itself. Individual crowd-sourcing apps are written as simple snippets of JavaScript and HTML which are then deployed on a PyBossa
    instance (such as PyBossa.com). This way you can easily develop custom apps while using the Crowdcrafting platform to store your data, manage users, and
    handle workflow.

The team has been very busy in the last couple of weeks and has created a few basic templates that help new users and developers to create their own applications for the platform.

One of them is PDF Transcribe: a PDF transcription template that can be used for transcribing full PDF
documents, including scanned images, one page at a time. This application uses the Mozilla PDF.js library to load an external PDF file and render it directly in the web browser
without using any third-party plugin.

With PDF.js, the application can render almost any PDF that is hosted on an HTTP server and then use a customized form to capture the data you want to extract from it.
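
As a rough illustration of the page-at-a-time approach, the sketch below splits a PDF into one task per page and renders a single page with PDF.js. It is not the template's actual code: `buildPageTasks`, `renderTaskPage`, the task fields `pdf_url` and `page`, and the example URL are hypothetical names chosen for this example, and the PDF.js calls follow the library's current promise-based API, which may differ from the version the template ships with.

```javascript
// Hypothetical helper: build one transcription task per PDF page.
// The field names (pdf_url, page) are illustrative assumptions,
// not a documented PyBossa task schema.
function buildPageTasks(pdfUrl, pageCount) {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  const tasks = [];
  for (let page = 1; page <= pageCount; page += 1) {
    tasks.push({ pdf_url: pdfUrl, page });
  }
  return tasks;
}

// Hypothetical helper: render the page named by a task into a <canvas>.
// pdfjsLib is passed in so the function does not depend on how the
// library was loaded (script tag, bundler, and so on).
async function renderTaskPage(pdfjsLib, task, canvas, scale = 1.5) {
  const pdf = await pdfjsLib.getDocument(task.pdf_url).promise;
  const page = await pdf.getPage(task.page); // PDF.js pages are 1-based
  const viewport = page.getViewport({ scale });
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({
    canvasContext: canvas.getContext("2d"),
    viewport,
  }).promise;
}

// Example: a three-page document becomes three independent tasks.
const tasks = buildPageTasks("http://example.org/manuscript.pdf", 3);
console.log(tasks.map((task) => task.page));
```

Keeping each page as its own task means several volunteers can work on the same document at once, and a page can be shown to more than one person if you want to compare transcriptions.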

In this simple demo application, a PDF file is loaded on one side of the page, and on the other side is a form where the volunteer can transcribe
the PDF page by typing the text into the input field. While this example is really simple, the template can easily be adapted to extract specific bits of
information from the PDF. For example, you could ask volunteers to extract specific items from the documents, such as captions, tabular data,
authorship or institutions.
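
To give an idea of what such an adaptation might look like, here is a minimal sketch that turns the values typed into a custom form into a structured answer. Everything in it is an assumption made for illustration: the field list, the `buildAnswer` function and the shape of the answer object are hypothetical and are not taken from the PDF Transcribe template.

```javascript
// Hypothetical list of the items we want volunteers to extract.
const FIELDS = ["caption", "authorship", "institution"];

// Hypothetical helper: turn raw form values into a clean answer object.
// Whitespace is collapsed and fields left empty by the volunteer are dropped.
function buildAnswer(task, formValues) {
  const answer = { page: task.page };
  for (const name of FIELDS) {
    const raw = formValues[name];
    const value =
      typeof raw === "string" ? raw.replace(/\s+/g, " ").trim() : "";
    if (value !== "") {
      answer[name] = value;
    }
  }
  return answer;
}

// Example: the volunteer filled in two of the three fields.
const answer = buildAnswer(
  { pdf_url: "http://example.org/manuscript.pdf", page: 2 },
  { caption: "  View of the   harbour ", authorship: "", institution: "City Archive" }
);
console.log(answer);
```

In a real application the `formValues` object would be read from the form inputs in the page, and the resulting answer would be saved through the platform rather than printed.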

The application can be used for transcribing a variety of digitised objects such as manuscripts, archives, postcards, shopping lists and so on. This opens up great possibilities for GLAM institutions, large and small, to work with the community to get more of their digitised material transcribed.
The creators of the tool are very willing to help you to develop your application, so if you have questions, do get in touch!

You can read more about the architecture in the PyBossa Documentation and follow the step-by-step tutorial to create your own apps.

For more open source tools for working with cultural data, see our CultureLabs page.