Preparing our editions for online publication

November 22, 2011

Authored by: Rupert Mann, Editorial Digital Development, Oxford University Press

With the launch of Oxford Scholarly Editions Online (OSEO) scheduled for 2012, members of our editorial board, publishing staff at OUP, and our software developers are all focused on turning the plans for the site into reality. Each group, in its own way, is tackling the same problem: how to make a single digital product from what were originally many separate print books.

That print content comprises, for launch, over 150 editions of works written between 1485 and 1660, ranging from classic editions of the early twentieth century, such as Herford and Simpson’s complete Ben Jonson, to significant modern editions, such as Taylor and Lavagnino’s prize-winning Thomas Middleton. (Subsequent releases will include works from other time periods.) All this print content has been digitized so that it can appear on the site.

But what does “digitized” actually mean? Two things. First, the words and spaces from the print editions are input into a computer, chiefly through scanning and optical character recognition. Second, that content is coded with XML tags. These tags identify what kind of content each part is, and they can apply to whole sections of an edition (the introduction, the play, the appendices), to parts of sections (Act III scene ii), or to elements within those parts (a speaker’s name, or a stage direction). Our digitization service has applied these tags according to the rule book we wrote when we began the project.

Those rules embodied two principles. First, we did not want to lose any of the information in the print edition, even information that was only implicit. Any reader of a play text, for example, knows that an italic phrase in brackets in the middle of a speech is a stage direction, though it is nowhere explicitly labelled as such. Scholarly editions are shot through with such conventions, and we have tried to capture the information they carry in the tagging. The approach is most conspicuous in the highly codified critical apparatus, where different tags identify different elements at a very fine level: variant readings, for example.
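To make this concrete, here is a minimal Python sketch of what explicit tagging buys us. The tag names (`sp`, `speaker`, `stage`, `app`, `lem`, `rdg`) are borrowed loosely from TEI-style conventions and are assumptions for illustration only, not OSEO's actual scheme, and the verse inside them is placeholder text. Once a stage direction or a variant reading carries its own tag, software can pick it out directly instead of having to infer it from italics and brackets.

```python
import xml.etree.ElementTree as ET

# Hypothetical, TEI-inspired fragment. Tag names and text are illustrative
# placeholders, not the real OSEO tagging or any real edition's text.
SAMPLE = """
<div type="scene" n="3.2">
  <sp>
    <speaker>FIRST SPEAKER</speaker>
    <p>Placeholder line of verse.
      <stage>Aside</stage>
      Another placeholder line.</p>
  </sp>
  <sp>
    <speaker>SECOND SPEAKER</speaker>
    <p>A reply with a disputed
      <app><lem>word</lem><rdg wit="Q">ward</rdg></app>
      in it.</p>
  </sp>
  <stage>Exeunt</stage>
</div>
"""

root = ET.fromstring(SAMPLE.strip())


def speakers(tree):
    """Return every tagged speaker name, in document order."""
    return [s.text for s in tree.iter("speaker")]


def stage_directions(tree):
    """Return every tagged stage direction, whitespace-normalized."""
    return [" ".join((s.text or "").split()) for s in tree.iter("stage")]


def variants(tree):
    """Return (accepted reading, witness, variant reading) for each apparatus entry."""
    out = []
    for app in tree.iter("app"):
        lemma = app.findtext("lem")
        for rdg in app.findall("rdg"):
            out.append((lemma, rdg.get("wit"), rdg.text))
    return out


if __name__ == "__main__":
    print("Scene:", root.get("n"))
    print("Speakers:", speakers(root))
    print("Stage directions:", stage_directions(root))
    print("Variants:", variants(root))
```

In print, the same information is carried only by typography and position on the page; in the tagged version each kind of content can be found, counted, or displayed on its own.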

Our second principle was to make sure that our tagging would make the online editions as easy as possible to use. That raises the question of how exactly we imagine the site being used. Consultation with our editorial board, market research, and our own principles of digital publishing suggested a range of use cases, from the journalist in a hurry who just needs to check one quotation to the scholar who will sit with the same text for many days. Different tags serve these different users: the journalist needs tags that support precise, easy navigation; the scholar needs tags that bring together all the material relevant to a text and categorize it neatly.

At the moment we are in the thick of checking that all this content has been captured correctly. We refer hard cases to our board, who are also vetting the overall arrangement of the content. How do we build a single digital site from a selection of individual print editions, compiled by different editors at different times? To solve this, we are compiling a “map” of the works on the site, and the editorial board are making sure that the important works stand out in high relief, so that a user can easily find a particular poem or sermon without needing to know which edition contains it.

The relation between that “map” of plays, poems, and other works in OSEO and the editions that contain them has been a challenge for our software developers. The conventions for putting books online are now broadly well defined, but putting editions online is a different matter. The fundamental problem is that an edition is a book within a book: a work by an author, concealed in an edition by an editor. Or, indeed, multiple works, as in a collection of poetry. This creates two problems. The first is finding what you are looking for: in which of our several editions of John Donne is that poem I can half-remember? The second is presentation: how do we keep the content the editor wrote distinct from the original work?

These may be challenges, but they are also some of the great opportunities of publishing this content digitally. With the right tagging and a well-made site, search will make it easy to go straight to that half-remembered poem by Donne. A user will also be able to see the text of the work alongside the notes. We often think of the student in the research library, trying to absorb the text while constantly flipping to the back of the book in case there is an important note there. There is a good argument that the multiple layers of a scholarly edition are actually better suited to digital than to print publication, for just such reasons.
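The “map” of works described above can be thought of as a many-to-many relation: one edition holds several works, and one work may appear in several editions. The Python sketch below is a hypothetical illustration of that idea. The identifiers, titles, and the two-edition arrangement are invented placeholders, not OSEO data, and a real site would use a proper search index rather than a substring scan. It shows how a lookup keyed on works, rather than on editions, lets a reader reach a half-remembered poem without first knowing which edition holds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    work_id: str
    author: str
    title: str


# Hypothetical map of works (placeholder titles, not real OSEO records).
WORKS = {
    "w1": Work("w1", "John Donne", "Placeholder Poem on Absence"),
    "w2": Work("w2", "John Donne", "Placeholder Poem on Morning"),
    "w3": Work("w3", "Ben Jonson", "Placeholder Masque"),
}

# Hypothetical editions: edition id -> ids of the works it contains.
# The same work may appear in more than one edition.
EDITIONS = {
    "donne-edition-A": ["w1", "w2"],
    "donne-edition-B": ["w2"],
    "jonson-edition": ["w3"],
}


def editions_for(work_id):
    """Return the ids of every edition containing the given work, sorted."""
    return sorted(ed for ed, works in EDITIONS.items() if work_id in works)


def find_work(author, fragment):
    """Find works by an author whose title contains a remembered fragment,
    together with the editions in which each one appears."""
    fragment = fragment.lower()
    hits = []
    for work in WORKS.values():
        if work.author == author and fragment in work.title.lower():
            hits.append((work.title, editions_for(work.work_id)))
    return hits


if __name__ == "__main__":
    print(find_work("John Donne", "morning"))
```

Keeping works and editions as separate records, linked by identifiers, is what allows the same poem to be found once but displayed in the context of each edition that contains it.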

It is one more example of how we’re hoping to bring the opportunities of digital to editions that were produced for print. (Certainly, in the future, we’ll be increasingly including content that was compiled in the digital age — but that’s another story.) As we approach launch, there will be regular posts on this blog that unpack some of the issues sketched out here, written from a number of different perspectives — by our board members, with an eye to the intellectual integrity of the site, and by technical and publishing staff on the project, about the tagging structure, the indexing metadata, and the site design. Do visit the site regularly to follow our progress to launch.