Gary Taylor and Gabriel Egan (eds), The New Oxford Shakespeare: Authorship Companion

Chapter 2 A History of Shakespearean Authorship Attribution

Gabriel Egan

On 7 May 1996, Dorothy Woods, a retired health worker, was found dead in her home in Huddersfield in the north of England. She had been smothered by a pillow, and signs of a break-in made local police pursue the theory of a burglary gone wrong. A window at the point of entry was found to hold the oily impression of a human ear pressed against it. Unfortunately for local burglar Mark Dallagher, Huddersfield police consulted a Dutch police officer, Cornelis van der Lugt, who although he had no forensics training had become convinced that ear-prints are as incriminating as fingerprints. Comparison of Dallagher's ear with the print left at the crime scene led to his conviction for murder, followed six years later by his retrial and exoneration. The Court of Appeal found that the first trial judge misdirected the jury regarding the value of expert testimony and failed to identify fallacious reasoning about statistical probability.

The Dallagher case contains several lessons for the study of authorship attribution. At the time of Woods's murder, what was then described as the new forensic science of ear-print evidence was in its infancy with few experts. Since then, ear-printing has not established itself as a reputable branch of forensics and remains of doubtful value in identifying criminals. Shakespearean authorship attribution by computational stylistics too is a new field with relatively few experts and has already had spectacular failures because the value of evidence was wrongly weighed. At the trial of Dallagher, the judge instructed the jury that 'If you are sure that Mr Van Der Lugt's evidence is correct and you accept it then you would be entitled to convict on his evidence alone' (Kennedy, Curtis, and Pitchford 2002). In fact, the ear-print evidence should have been considered only where it might corroborate other evidence.

As with fingerprint and DNA evidence, the strongest kinds of argument in such cases are those used to exclude suspects rather than include them. If we find a partial human genome or fingerprint at a crime scene, we might with certainty declare that it matches no part of the DNA or the fingers of a given suspect. The suspect cannot have left this evidence. But finding that the fragment matches part of a suspect's DNA or finger is not itself proof of guilt since, being only a fragment, it might also match others' DNA or fingers. When evaluating so-called partial matches, we are forced to make statistical speculations about the likelihood that a fragment of a given size might match more than one person. Human beings, including experts, have not always made the correct judgements about such likelihoods.

The history of Shakespearean authorship attribution has parallels with the Dallagher case in its measurement of features that were wrongly thought to be distinctive, in scholarly overestimation of the value of evidence, and in faulty calculations of likelihood. The history offered here is nonetheless intended to persuade the reader that authorship of early modern writing can be ascertained using empirical techniques that draw solely on internal evidence. This is not a comprehensive survey of all the scholarship, but an outline of how the methods for authorship attribution in connection with Shakespeare have developed over the past 150 years, paying special attention to contributions that introduced new techniques and attempting to indicate the strengths and weaknesses of the various approaches to the problem.

By internal evidence we mean the writing itself as opposed to accounts of the writing, which we consider to be external evidence and which includes the presence or absence of authors' names printed on editions of their works. The earliest editions of Shakespeare's plays in the early 1590s did not routinely print his name on the title page, but this was true of English printed drama generally so it has no special significance. By the end of Shakespeare's career, editions of plays routinely printed the dramatist's name on the title page, but of course this is only evidence, not proof, of authorship. Shakespeare's name appeared on the title pages of the plays The London Prodigal in 1605, A Yorkshire Tragedy in 1608, Sir John Oldcastle in 1619, and 1, 2 The Troublesome Reign in 1622, but almost no one takes these attributions seriously. Similar misattributions dogged the early publication history of Shakespeare's poems.

In 1623 the First Folio gave Shakespeare sole credit for thirty-six plays that have since then formed the core of his accepted canon. Only one play that was already in print but left out of the First Folio has been universally accepted as part of the Shakespeare canon since the late eighteenth century: Pericles, which was published in 1609 with his name on the title page. In 1634 an edition of The Two Noble Kinsmen appeared with the names of Shakespeare and John Fletcher on its title page, and by the late twentieth century this had become widely accepted as an accurate attribution. One seemingly conservative way to define Shakespeare's dramatic canon, then, is to include the thirty-six First Folio plays plus Pericles and The Two Noble Kinsmen. These thirty-eight plays are the ones offered in the Royal Shakespeare Company edition Complete Works (Bate and Rasmussen 2007). Unfortunately this conservative definition is certainly wrong: there are undoubtedly more plays to which Shakespeare contributed parts, and substantial parts of plays in the 1623 First Folio are not his.

The former fact is newly discovered, but the latter realization came early in the history of attribution scholarship. In his Complete Works edition of 1725, the poet Alexander Pope gave the opinion that the First Folio was less authentic than the preceding quartos on account of 'additions of trifling and bombast passages' added 'by the actors, or … stolen from their mouths into the written parts' (Pope 1723–5a, xvii). Pope not only considered Pericles as inauthentic as The London Prodigal, A Yorkshire Tragedy, and Sir John Oldcastle, but he also dismissed the Folio's Love's Labour's Lost, The Winter's Tale, and Titus Andronicus as having 'only some characters, single scenes, or perhaps a few particular passages' by Shakespeare (Pope 1723–5a, xxi). Pope believed this because Ben Jonson made apparently disparaging remarks about Titus Andronicus and because his own finely tuned poetical judgement told him so.

The example set by Pope of attributing plays using little more than personal poetical taste was followed by others across the eighteenth and nineteenth centuries. In 1744, Thomas Hanmer described The Two Gentlemen of Verona as someone else's play to which Shakespeare added only 'some speeches and lines thrown in here and there, which are easily distinguish'd, as being of a different stamp from the rest' (Hanmer 1744, 143n.). In his Critical Observations, John Upton dismissed the play as self-evidently not Shakespeare's work, 'if any proof can be formed from manner and style' (Upton 1746, 274). It was to be another 100 years before the problem of identifying Shakespeare's writing was tackled with anything more objective.

In 1847 Samuel Hickson reviewed three books about The Two Noble Kinsmen, the only Shakespeare play whose first edition proclaimed it to be co-authored, and explored the problems of attributing particular parts to Shakespeare and Fletcher. After some loosely phrased comparisons of character development, 'sentiment', and 'boldness of metaphor', Hickson turned to 'redundant syllables', meaning feminine endings of verse lines, and noted that Shakespeare wrote them far less often than Fletcher (Hickson 1847, 63, 65, 66). Hickson assigned a speech to Fletcher on the grounds that it does something Fletcher favours, and Shakespeare does not: using 'in the plural certain nouns of quality or circumstance commonly used in the singular', such as honours and banishments (Hickson 1847, 68). Another long scene Hickson gave to Shakespeare because it is in prose, and he thought Fletcher virtually incapable of long prose scenes (Hickson 1847, 69). We see in these comments the first signs of a rigorous comparison of writing styles based on objective and countable features, but without the extensive listing of evidence that is needed to prove stylistic difference.

The first time that a play with strong external evidence of Shakespeare's sole authorship was seriously investigated for possible co-authorship was when Hickson and James Spedding independently considered the First Folio text of All Is True/Henry VIII (Hickson 1850; Spedding 1850). As with Hickson's previous work, the studies mixed subjectivity—'the same life, and reality, and freshness' (Spedding 1850, 118)—with objective counting of the phenomena of writing style. Spedding suggested that a reader with 'a practised ear' would perceive unaided the distinctions in style by which he assigned parts of All Is True/Henry VIII to Fletcher and Shakespeare. Those 'less quick in perceiving the finer rhythmical effects' might be more readily convinced by some counts of 'lines with a redundant syllable at the end', meaning feminine endings (Spedding 1850, 121).

Across Shakespeare's late plays Spedding found that about 28–38 per cent of lines have feminine endings; tabulating the figure for each scene of All Is True/Henry VIII, he found a marked difference between the scenes his practised ear already told him were Shakespeare's, in which the feminine endings ranged from 28 per cent to 40 per cent, and those he already thought were Fletcher's, in which the rate was 50 per cent to 77 per cent. There is no overlap in this stylistic feature: Shakespeare's maximum is 10 percentage points lower than Fletcher's minimum. Moreover, the range for the scenes in All Is True/Henry VIII already subjectively attributed to Shakespeare matched the range in his other late plays The Winter's Tale and Cymbeline. Spedding had found a strong marker of the difference between the two men's styles. If it was true that Spedding first made his division on subjective grounds, then the agreement of the numbers with his impression is all the more convincing. And since Hickson independently arrived at the same conclusion about the same play, the claim is stronger still. The professional application of objective measures of style to the problem of Shakespearean authorship attribution had begun.
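The separation that Spedding's figures show can be checked mechanically. The following is only a minimal sketch using the percentage ranges reported above; the function and variable names are illustrative and are not drawn from Spedding or from any later study.

```python
# Feminine-ending rates (per cent of lines) reported above for the scenes
# of All Is True/Henry VIII that Spedding assigned to each author.
shakespeare_range = (28, 40)
fletcher_range = (50, 77)


def ranges_overlap(a, b):
    """Return True if the closed intervals a and b share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def gap_between(lower, upper):
    """Distance from the top of the lower range to the bottom of the upper."""
    return upper[0] - lower[1]


overlap = ranges_overlap(shakespeare_range, fletcher_range)
gap = gap_between(shakespeare_range, fletcher_range)
print(overlap, gap)  # prints: False 10
```

A scene whose rate fell inside the gap between 40 and 50 per cent would be left undecided by this feature alone, which is one reason later investigators combined several tests.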

The first editors of Shakespeare to be university-employed professionals were W. G. Clark and W. Aldis Wright, whose Cambridge–Macmillan Complete Works of 1863–6 was by far the most scrupulous investigation of the texts to date. Clark and Wright's edition of Macbeth for Oxford University's Clarendon Press in 1869 revisited the previously observed likenesses between parts of this play and parts of Thomas Middleton's play The Witch. Their conclusion about Macbeth was that 'the play was interpolated after Shakespeare's death … The interpolator was, not improbably, Thomas Middleton; who … expanded the parts originally assigned by Shakespeare to the weird sisters, and also introduced a new character, Hecate' (Clark and Wright 1869, xii). Not until the late twentieth century would the full implications of this insight be widely accepted in Shakespeare scholarship.

The hero of the Victorian breakthrough on authorship problems is Frederick G. Fleay. In 1873 F. J. Furnivall, who had already founded the Early English Texts Society in 1864 and the Chaucer Society in 1868, founded the New Shakspere Society, whose purpose was to study 'the metrical and phraseological peculiarities of Shakspere' (Furnivall 1874b, vi). The point was to ascertain the order in which the plays were written and so track the progress of Shakespeare's mind across his career, but looking closely at Shakespeare's versification and phrasing meant counting certain features. Comparisons with other writers' counts were inevitable. The New Shakspere Society did not set out to alter the attribution of plays amongst Shakespeare and his contemporaries, but its philologically influenced focus on countable features, of which its member Fleay was the leading exponent, necessarily led that way.

The year 1874 was the annus mirabilis for authorship attribution by analysis of internal evidence. In his first paper addressed to the New Shakspere Society, Fleay acknowledged Furnivall's point that metrical tests can help determine the order in which Shakespeare's plays were written, but he saw a 'far more important end' in determining the genuineness of the plays traditionally assigned to Shakespeare (Fleay 1874a, 6). It was the act of making his counts that first led Fleay to suspect that The Taming of the Shrew and parts of Timon of Athens, Pericles, All Is True/Henry VIII, and the Henry VI plays are not by Shakespeare, and as he observed this was largely a new development in the field. Fleay's tests mentioned in this first paper were the rates of rhyming, 'double endings' (that is, feminine endings), 'incomplete lines' (presumably those with fewer than ten syllables), and 'Alexandrines' (that is, lines of iambic hexameters) (Fleay 1874a, 7).

From these rates, Fleay found reason to suspect the above plays and also, because their rates of these metrical phenomena put them at odds with the chronological order established by other means, he found reason to suppose that Troilus and Cressida and All's Well that Ends Well are Shakespeare's revisions of his earlier works. Importantly for our purposes, Fleay acknowledged that subjectivity entered the problem because the 'laws of metre' are not 'definitely laid down' (Fleay 1874a, 15). That is, there remains room for experts to disagree about how close in sound two words must be to count as a rhyme, about the permissible relineation of verse to regularize metre, and about how tightly to define a term such as alexandrine (does the caesura have to appear after the third iamb?). The lack of shared definitions of metrical features was to prove an obstacle to the corroboration of findings based on counting them, and in the discussion of Fleay's paper reported in the Society's Transactions the problem was extensively debated.

In a subsequent paper that attempted to show from metrical tests that The Taming of the Shrew is not Shakespeare's work, Fleay extended his tests to include rates of various categories of metrical deficiency including headless lines (lacking a first unstressed syllable) and broken-backed lines (lacking a syllable somewhere in the middle). He also introduced a new class of evidence: words occurring in the work under examination that do not occur elsewhere in the author's accepted canon (Fleay 1874b). This test was made possible by the publication of concordances: alphabetized lists of all the words in Shakespeare with, for each word, the play, act, scene, and line number where it appears. Fleay used the concordance created by Mary Cowden Clarke and acknowledged that its errors produced errors in his work.
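The logic of this vocabulary test, which Fleay carried out by hand with a printed concordance, can be sketched in a few lines. This is an illustration under stated assumptions and not a reconstruction of Fleay's procedure: the tokenizer is deliberately crude, the function names are invented, and the strings are toy stand-ins rather than play texts.

```python
import re


def words(text):
    """Set of lower-cased word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def words_unique_to(target, canon_texts):
    """Words in `target` that occur in none of the `canon_texts`."""
    canon_vocab = set()
    for text in canon_texts:
        canon_vocab |= words(text)
    return words(target) - canon_vocab


# Toy stand-ins, not real play texts.
canon = ["the king is dead", "long live the queen"]
suspect = "the king and the pedant live"

unique = words_unique_to(suspect, canon)
print(sorted(unique))  # prints: ['and', 'pedant']
```

The toy result shows why concordances that omit common words matter: a raw count like this one is only meaningful once spelling variants, inflected forms, and very frequent words are handled consistently.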

Fleay was aware that every play would have a certain number of words that appear nowhere else in the canon (called hapax legomena) but found the number in The Taming of the Shrew to be disconcertingly large. In counting the hapax legomena, Fleay treated all three Henry VI plays, Titus Andronicus, Pericles, and All Is True/Henry VIII as 'plays wrongly ascribed to Shakespeare' (Fleay 1874b, 92) without regard to where in those plays the sought-for words appear. This mistake should alert us to the recurrent danger in authorship attribution studies that the evidence may be self-confirming. Once some plays are entirely removed from the accepted canon of Shakespeare, the ranges within which various phenomena must fall in order to be typical of Shakespeare are likely to become narrower simply because we are generating them from a smaller sample. This will make numerical counts that are merely outliers within the Shakespeare canon (that is, extreme values near the edge of Shakespeare's full range) appear to be outside his range. On the other hand, there may be good reasons to restrict a canon for the purpose of comparison. If we suspect that genre affects a particular feature we are counting (occurrences of the word death, for example, being demonstrably less frequent in comedies than elsewhere) then we may wish to compare only plays belonging to a particular genre.

The opposite phenomenon must also be guarded against. Once we admit a new play to a canon we might well thereby broaden the range of values that we will accept as typical of this author's writing, and this will make plays with values that were previously outside the accepted range for this author begin to look like mere outliers within his accepted range. All methods that depend on defining an author's range and that adjust this range as new plays are admitted to or excluded from the accepted canon suffer this weakness. The mitigation for this, which is practicable for Shakespeare but not for writers whose canons are much smaller, is to define a set of sole-authored well-attributed plays that is significantly smaller than the likely full canon and to test other plays only against that secure subset. For authors with especially small dramatic canons, such as Thomas Kyd whose only securely attributed play is The Spanish Tragedy, this may be impossible and the inclusion of marginally attributed plays may be the only way to create a testable canon.
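The sensitivity of an author's range to the make-up of the comparison canon, in both directions, can be illustrated with invented figures. Everything in the sketch below is hypothetical: the play labels, the rates, and the function names are made up purely to show the mechanism.

```python
# Invented feature rates (per cent) for six plays of one hypothetical author.
rates = {"A": 12.0, "B": 15.0, "C": 18.0, "D": 21.0, "E": 9.0, "F": 25.0}


def value_range(plays):
    """Minimum and maximum rate across the named plays."""
    values = [rates[p] for p in plays]
    return min(values), max(values)


def within(value, rng):
    """True if value lies inside the closed range rng."""
    return rng[0] <= value <= rng[1]


full_canon = ["A", "B", "C", "D", "E", "F"]
reduced_canon = ["A", "B", "C", "D"]  # E and F set aside as doubtful

full = value_range(full_canon)        # (9.0, 25.0)
reduced = value_range(reduced_canon)  # (12.0, 21.0)

test_rate = 23.0  # rate in a hypothetical play under test
print(within(test_rate, full), within(test_rate, reduced))  # prints: True False
```

The same test play is an unremarkable outlier against the full canon and an apparent intruder against the reduced one, so the choice of comparison set needs to be fixed and justified before the test is run.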

Fleay's next paper for the new Society divided Timon of Athens between Shakespeare and an unknown author using the same metrical tests as his first paper (Fleay 1874c). His division is strikingly similar to the modern generally accepted division, in particular in giving scenes 1.2 and 3.1 to 3.5 to the other writer (Jowett 2004a, 202). At the fourth meeting of the Society, Fleay presented his evidence confirming the already widespread suspicion that Acts 1 and 2 of Pericles are by someone other than Shakespeare (Fleay 1874d). The starkest difference is in the number of rhyming lines: Acts 1 and 2 come to about the same length as Acts 3, 4, and 5, but have 195 rhymes to the latter's 14.

In the discussions of these early papers on authorship attribution, only one new test was added to those devised by Fleay. Spedding proposed what he called the Pause Test, building on what others had called the phenomenon of the stopped line (Spedding 1874). This measures what is now usually called enjambment, which is where the grammatical clauses of the verse run across multiple lines rather than ending at the ends of lines. As Spedding remarked, in early Shakespeare the ends of lines tended also to be the ends of grammatical clauses, while in late Shakespeare—and he rightly identified Cymbeline as an extreme case—enjambment predominates so that clauses run over the ends of lines, and in spoken delivery an actor pausing at the line ending would disrupt the sense.

For his last paper, delivered in the first year of the meetings of the New Shakspere Society, Fleay picked up the suggestion by Clark and Wright that Macbeth contains material added to the play by Middleton after Shakespeare's death (Fleay 1875). Unfortunately, he also saw such adaptation at work in Julius Caesar, which opinion found no followers. After a lengthy tour of what he considered the parts of Macbeth too poorly written to be Shakespeare's—a recurrent attitude in early authorship studies—Fleay provided the stylistic evidence for his division of the play.

The first piece of evidence was that Macbeth is abnormally short, the only comparable plays being The Comedy of Errors, The Two Gentlemen of Verona, and A Midsummer Night's Dream (all of which might be short simply because they are early interlude-style comedies not mature tragedies) and Julius Caesar, Pericles, and Timon of Athens that Fleay had already bracketed off as 'finished or altered by some other poet' (Fleay 1875, 355). Here we see the self-confirmation principle at work in the exclusion of the short Julius Caesar making the short Macbeth seem all the more anomalous. Fleay was willing to put a figure on the significance, claiming that the odds of the altered plays also being by chance the shortest plays are '1 in 101,120½' (Fleay 1875, 355). Fleay gave no account of the calculation leading to this number and was apparently unaware that such wild claims are apt to convince non-specialists that investigators' figures have no serious bearing on literary art.

Much more persuasive was Fleay's second piece of evidence: more scenes end with rhyming couplets in Macbeth than in any other Shakespeare play, and there are many more such couplets overall, and yet by the middle of the first decade of the seventeenth century (around the time Macbeth was written) Shakespeare had largely given up using rhyme. Turning to his startling claim that the Julius Caesar we have in the First Folio is Jonson's adaptation of Shakespeare's play, it is reassuring that Fleay offered no tabulated metrical evidence in support of this idea. If the methods that gave Fleay what we now think are the correct views of The Two Noble Kinsmen, All Is True/Henry VIII, Timon of Athens, Pericles, and Macbeth had also pointed to Jonson's hand in Julius Caesar, then those methods would fall under suspicion. Instead, Fleay eschewed his usual metrical tables, remarking 'I have not had time to count them' (Fleay 1875, 358), and relied on verbal parallels (words and phrases in common) that he noticed were shared between Julius Caesar and plays by Jonson. Parallel hunting was to become the principal early twentieth-century attribution technique, and as we shall see certain strict rules must be applied if it is to have any value.

The last paper of interest to us that was read in the New Shakspere Society's first year made a subtle distinction between different kinds of line ending (Ingram 1875). Ordinarily the tenth syllable of a regular iambic pentameter line is stressed, and John K. Ingram was concerned to distinguish two kinds of deviation from this norm by use of a weak monosyllable in this position. In the first kind, which Ingram called a light ending, 'the voice can to a certain small extent dwell' at that point, while the other, which he thought properly deserved the name of a weak ending, is 'so essentially proclitic' that 'we are forced to run' it 'into the closest connection with the opening words of the succeeding line' (Ingram 1875, 447).

Most usefully, Ingram listed the particular monosyllabic words that in his scheme usually fall into each category, and detailed the circumstances—such as emphatic use or being followed by a parenthetical clause—that might on occasion put it in a different category. This was an important development in the formulating of precise rules for metrical analysis since even scholars who disagreed about the validity of the categories could nonetheless check that certain counts were being made according to the stated rules. Indeed, so long as the rules were being followed rigorously the validity of the categories need not be agreed upon if an investigator's purpose were merely to find verifiable discriminators of one writing style from another.

Ingram was alert to the problem that as more people started counting verse features, the freedom to interpret certain rules and to understand certain phonetic features in different ways might result in scholars' raw counts failing to agree. To forestall this he had an idea: 'I would strongly advise the appointment by the New Shakspere Society of a "Counting Committee", to fix beyond doubt the numbers of lines of different sorts in the several plays, and to verify all the figures brought out by the application of the different verse-tests' (Ingram 1875, 449 n. 2). We in the early twenty-first century are no nearer this ideal situation than Ingram was.

No subsequent papers for the New Shakspere Society quite matched those read in the first year. Jane Lee spoke about the authorship of 2 Henry VI and 3 Henry VI and the relationships between the 1590s editions and the Folio texts, but did not count or tabulate the metrical features on which her argument rested (Lee 1876). Instead, much like the famous proof of Pythagoras's Theorem reproduced by Bhaskara of India in the twelfth century with no working out and just the caption 'Behold!', Lee merely quoted passages that she thought would 'serve to illustrate what these metrical differences are' and left them unanalysed (Lee 1876, 222). Lee thought that although the Folio versions were Shakespeare's, the 1590s editions of the Henry VI plays were not his, and for her attribution of them to Robert Greene and Christopher Marlowe she relied upon verbal parallels and the likeness of dramatic characters (Lee 1876, 241–50, 251–7). By the same means, Lee attempted to establish Shakespeare's authorship of the Folio versions, although Marlowe possibly helped out (Lee 1876, 263–7). By more of the same aimless parallel hunting, Robert Boyle 'detected' Philip Massinger's hand (instead of Shakespeare's) alongside Fletcher's in All Is True/Henry VIII (Boyle 1886).

The methods for authorship attribution by the analysis of internal features of the plays remained essentially unchanged for the next 100 years. There were just two methods: counting the frequencies of certain verse features (new studies introduced new countable features) and finding parallel passages showing that a work of known authorship contains the same words and/or phrases and/or sequences of ideas as the work for which the authorship is sought. Of all the things one might count in literary writing, habits of versification had the attraction that they could be counted fairly quickly and recorded quite easily, the key metric generally being expressed as the average number of lines per occurrence (or its inverse, occurrences per line), and they demonstrably distinguished different writers. A complication was that writers might drift in their habits over time, so that on many tests the loose versification of late Shakespeare scores significantly differently from the metrically more regular writing of his early career. For Shakespeare, we can, to some extent at least, factor this into the calculations since the chronological order of his works is in large part well agreed upon.

Other than habits of verse, the obvious features of writing that may in principle be counted are the choices of words and the various frequencies of their occurrence. Until the 1960s this was virtually impossible on any substantial scale because without machine-readable texts the counting had to be done by hand and it is extraordinarily laborious. As we have seen, the existence of printed concordances to Shakespeare made it possible to locate his use of interesting lexical words, but concordances typically omit the high-frequency function words: the articles, prepositions, and others that serve primarily grammatical rather than lexical purposes. Because they occur at high frequencies that are demonstrably distinctive of authorship, function words are of special interest to attribution investigators. However, without concordances to all the other dramatists of Shakespeare's time, the comparison of Shakespeare's use of language with that of other writers could not be systematic, and where it was attempted it relied on scholars' happening upon or recalling parallel passages.

For the first fifty years after the formation of the New Shakspere Society, authorship attribution studies continued to appear using counts of the two kinds we have seen: verse features and parallel passages. In 1924, E. K. Chambers gave a talk entitled 'The Disintegration of Shakespeare' that retarded the progression of the field almost until the end of the twentieth century (Chambers 1924–5). Chambers's talk was more reasonable in tone and argument than one would expect from the reputation it has acquired. He objected to inaccuracies in the counting of metrical features by Fleay and insisted that short samples cannot be expected to show the same averages as whole plays: 'If a play has twenty-five per cent of double endings, they are not spread evenly at the rate of one double ending in every four lines' (Chambers 1924–5, 98–9). Rather, they cluster, and this means that a smallish section that has none or many should not for that reason alone be suspected as interpolation.
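Chambers's caution about short samples can be given a rough numerical form. The sketch below makes the simplifying assumption that each line independently has a double ending at the play's overall rate; that assumption is not Chambers's, and clustering of the kind he describes would tend to make empty stretches more common than these figures suggest. The function name is illustrative only.

```python
def prob_no_occurrences(rate, n_lines):
    """Chance that n_lines contain no occurrence, assuming independent lines."""
    return (1.0 - rate) ** n_lines


rate = 0.25  # one double ending in every four lines on average
for n in (4, 10, 20):
    # Probability that a passage of n lines has no double endings at all.
    print(n, round(prob_no_occurrences(rate, n), 4))
```

On this assumption roughly a third of four-line passages, and about one in eighteen ten-line passages, would contain no double endings in a play whose overall rate is twenty-five per cent, so their absence from a short passage is weak evidence of another hand.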

Chambers criticized the wilder theories of Fleay's successor J. M. Robertson for lacking evidential bases, and this too was reasonable. As John Jowett observed, while Chambers's critiques were valid, his talk's title and his reputation 'made undisintegrated Shakespeare an article of faith' for decades to come (Jowett 2014, 171). Indeed, over half a century later the vehemence of Steven Urkowitz's condemnation of the disintegrators' 'noxious voices' is witness to Jowett's point (Urkowitz 1988, 232). Chambers himself was sufficiently in sympathy with the disintegrators' aims that he printed the metrical tables of Hickson, Spedding, and Fleay as appendices to his magisterial William Shakespeare: A Study of Facts and Problems shortly after giving his talk (Chambers 1930, 2:397–408). On the basis of metrical and other evidence, he declared plausible the claims that parts of the plays Edward III and Sir Thomas More are by Shakespeare (Chambers 1930, 1:499–518). Chambers was by no means simply a textual conservative opposed to alteration of the Shakespeare canon by specialist analyses.

The parallels between Hand D's contribution to Sir Thomas More and scenes of riot elsewhere in Shakespeare consist mainly of likenesses of expression that do not alone prove shared authorship. Considering this problem in practical terms, Muriel St Clare Byrne put together what she considered a series of 'golden rules' for parallel hunting, the most important of which was that 'we must prove exhaustively that we cannot parallel words, images, and phrases as a body from other acknowledged plays of the period; in other words, the negative check must always be applied' (Byrne 1932–3, 24). This was in practice virtually impossible to achieve when she was writing since printed concordances for most early modern writers did not exist and manual checking by reading all the materials is beyond anyone's endurance. Not until electronic texts of early modern drama became widely available in the 1990s was rigorous parallel hunting possible, and even then key contributors to the field—such as Brian Vickers, who approvingly cited Byrne's negative-check rule (Vickers 2002b, 58–9)—failed to consistently apply it. The failings of Vickers in this regard are detailed in Chapter 4 and elsewhere in this volume.
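With machine-readable texts, Byrne's negative check can be expressed as a filter: a shared phrase counts as evidence only if it is absent from the comparison writers. The sketch below is a minimal illustration, assuming whitespace tokenization and three-word phrases; the function names and the sample strings are invented, and a real check would need a comprehensive corpus of the period's drama.

```python
def ngrams(text, n=3):
    """Set of n-word sequences in text, lower-cased, split on whitespace."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def parallels_passing_negative_check(disputed, candidate, others, n=3):
    """Phrases shared by disputed and candidate texts but found in no other."""
    shared = ngrams(disputed, n) & ngrams(candidate, n)
    commonplace = set()
    for text in others:
        commonplace |= ngrams(text, n)
    return shared - commonplace


# Invented strings standing in for a disputed text, a candidate author's
# text, and the works of other writers of the period.
disputed = "by my troth i will go hence to the green wood"
candidate = "i will go hence and seek the green wood by my troth"
others = ["by my troth he lies"]

rare = parallels_passing_negative_check(disputed, candidate, others)
print(sorted(rare))  # prints: ['i will go', 'the green wood', 'will go hence']
```

The phrase 'by my troth' is shared by the disputed and candidate texts but is discarded because another writer also uses it, which is the step that unchecked parallel hunting omits.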

The first to apply the established metrical tests to the systematic study of the whole of a single problem of authorship attribution was E. H. C. Oliphant. In a series of articles that he revised into a book, Oliphant sought to establish who wrote which plays in the 1647 and 1679 folios of Francis Beaumont and John Fletcher (Oliphant 1927). To supplement the meagre external evidence, Oliphant relied on the fact that Fletcher 'is distinguished by his excessive use of double endings' running at around 70 per cent of lines, and on his habit of stressing the eleventh syllable of a verse line (Oliphant 1927, 32). Oliphant also found that Fletcher favoured the end-stopped line over the enjambed line (90 per cent of lines) but to relieve 'the monotonous succession of iambic after iambic' he threw in trisyllabic feet (Oliphant 1927, 35).

Having likewise characterized the other candidate authors' styles—bringing in also vocabulary and phrasing—Oliphant presented samples from each play under consideration and invited the reader to agree that it sounded like the style of the author he favoured. This appears, then, to be the Victorian 'Behold!' approach, although at a few key points across his 553-page argument Oliphant included tables of figures showing numerically the contrasting habits of authorship (Oliphant 1927, 69, 80, 89, 348, 367, 482). Oliphant's seemingly subjective evaluations have in almost all cases been confirmed by subsequent investigators using entirely objective means. In fact, underlying the sound judgements were large bodies of quantitative evidence—counts of feminine endings, enjambment, and other metrical features—that are not in Oliphant's published works but have been found in his private research notebooks (Jackson 2003b). Importantly for our purposes, Oliphant attributed to Fletcher on internal and external evidence most of the second half of Lewis Theobald's revived play Double Falsehood, and this indirectly bolstered the case for the other half being Shakespeare's (Oliphant 1927, 282–302).

Just as the lack of concordances to the works of all the other dramatists of Shakespeare's time hindered the task of distinguishing truly significant parallel passages from the merely commonplace, so in versification the lack of tables of frequency rates for all the dramatists hindered the extensive comparisons that would make studies exhaustive. Philip W. Timberlake's Ph.D. thesis accepted by Princeton University in 1926 went some way towards remedying this deficiency, and despite covering only the drama up to 1595 it remains the most complete tabulation of the frequencies of feminine endings in existence (Timberlake 1931). Timberlake addressed head-on the problem alluded to by Ingram in his suggestion that a committee might standardize the counting of verse features: 'there has been no general agreement as to what constitutes a feminine ending' (Timberlake 1931, 1–2). Without standardized definitions, comparisons can be made only within individual studies—in which the investigator was, we hope, at least self-consistent—but not between studies.

A frequent point of disagreement between investigators was how to count lines ending in the words heaven, even, hour, bower, flower, tower, power, and friar, all of which may be pronounced monosyllabically to give a masculine ending or disyllabically to give a feminine one. Timberlake's solution was to count both ways, keeping separate tallies based on the assumption that they are all monosyllabic to give his 'strict' count and on the alternative assumption that they are all disyllabic to give his 'loose' count (Timberlake 1931, 5). Likewise, Timberlake separated out—and excluded from his 'strict' count—all feminine endings caused by proper nouns appearing at the ends of lines, which he thought might compel a poet to use feminine endings more often than he was otherwise wont; the 'loose' count included them.
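Timberlake's double tally can be illustrated with a minimal sketch. It assumes, purely for illustration, that each verse line has already been classified by hand into one of four categories; the category labels and the function name are hypothetical and are not Timberlake's own terms.

```python
from collections import Counter

# Hypothetical labels for hand-classified line endings (not Timberlake's terms):
#   "masculine"   - an unambiguous masculine ending
#   "feminine"    - an unambiguous feminine ending
#   "ambiguous"   - ends in heaven, even, hour, power, etc.
#   "proper_noun" - feminine only because a proper noun ends the line
CATEGORIES = {"masculine", "feminine", "ambiguous", "proper_noun"}

def feminine_ending_rates(line_categories):
    """Return (strict, loose) feminine-ending rates as percentages of all lines."""
    counts = Counter(line_categories)
    unknown = set(counts) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lines supplied")
    # Strict count: ambiguous words are treated as monosyllabic and
    # proper-noun endings are excluded.
    strict = counts["feminine"]
    # Loose count: ambiguous words are treated as disyllabic and
    # proper-noun endings are included.
    loose = strict + counts["ambiguous"] + counts["proper_noun"]
    return 100.0 * strict / total, 100.0 * loose / total

# Invented figures for an imaginary scene of 100 verse lines.
scene = ["masculine"] * 90 + ["feminine"] * 5 + ["ambiguous"] * 3 + ["proper_noun"] * 2
print(feminine_ending_rates(scene))  # → (5.0, 10.0)
```

Keeping both figures lets a later investigator compare like with like, whichever convention an earlier study happened to follow.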

Towards the end of his study, Timberlake applied his findings to various problems of Shakespearean authorship. In the anonymous play Edward III the Countess of Salisbury's scenes show a sharp rise in the rate of feminine endings from well below Shakespeare's norm at 2.1 per cent for the rest of the play to well within his norm of 4–16 per cent for these scenes; Timberlake concluded that it is distinctly possible that Shakespeare contributed them (Timberlake 1931, 78–80, 124). Regarding Sir Thomas More Timberlake could find no clear evidence since it consistently uses feminine endings in more than 18 per cent of its lines, and on that basis alone was probably written after 1596 when all writers began to use this feature more frequently (Timberlake 1931, 80). Dividing by scenes, Timberlake found significant variations in the rates of feminine endings in Titus Andronicus, with rates of 1.9 per cent in 1.1, 2.4 per cent in 2.1, and 1.5 per cent in 4.1. No other scenes fell below 4.1 per cent and most tested significantly higher still, leading Timberlake to suspect that Shakespeare's co-author was George Peele or Robert Greene (Timberlake 1931, 114–18).

Taking an innovative approach to metrical tests, Karl P. Wentersdorf tackled the problem of Shakespeare's chronology by counting four features across the plays: extra syllables (beyond the normal ten) occurring anywhere in the line; enjambment; heavy pauses within the line (marked by punctuation in a modern edition); and single verse lines split between two or more speakers (Wentersdorf 1951). Wentersdorf calculated what percentage of the verse lines in each play contained each feature and then averaged the four numbers to derive a metrical index that encapsulated the total deviation from the norm of end-stopped iambic pentameter verse. Wentersdorf found that genre mattered, with histories scoring consistently low on his metrical index. When the plays are put in their widely agreed chronological order, the general trend over time is towards more deviation from the metrical norm (larger indices), but there are distinct reversals where the figure dips for a particular play (Wentersdorf 1951, 186–7). The temptation would be to reorder the plays so as to achieve a smoothly rising metrical index, but this would be a mistake because we cannot assume that Shakespeare's poetical preferences drifted steadily over time rather than changing fitfully as he tried out new possibilities.
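The arithmetic of the metrical index as described here is simple enough to sketch. The function and parameter names below are hypothetical, and the figures are invented for illustration rather than taken from Wentersdorf's tables.

```python
def metrical_index(total_verse_lines, extra_syllable, enjambed,
                   heavy_pause, split_between_speakers):
    """Average the four feature rates, each a percentage of all verse lines."""
    if total_verse_lines <= 0:
        raise ValueError("total_verse_lines must be positive")
    feature_counts = (extra_syllable, enjambed, heavy_pause, split_between_speakers)
    percentages = [100.0 * count / total_verse_lines for count in feature_counts]
    return sum(percentages) / len(percentages)

# Invented figures for an imaginary play of 2,000 verse lines:
# rates of 10, 15, 20, and 5 per cent average to 12.5.
print(metrical_index(2000, 200, 300, 400, 100))  # → 12.5
```

Because the four rates are averaged without weighting, a large change in one feature can be masked by opposite movement in another, which is one reason a dip in the index for a single play need not signal a misplaced date.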

The problem of differentiating Shakespeare's writing from that of his co-author Fletcher was revisited by Cyrus Hoy as part of a series of seven articles on the purported Fifty Comedies and Tragedies by Beaumont and Fletcher, as their second folio of 1679 styles itself. In the first of these articles, Hoy laid out his chief means for detecting Fletcher's writing: 'use of such a pronominal form as ye for you, of third person singular verb forms in -th (such as the auxiliaries hath and doth), of contractions like 'em for them, i'th' for in the, o'th' for on/of the, h'as for he has, and 's for his (as in in's, on's, and the like)' (Hoy 1962, 130–1). Hoy acknowledged that such tests had been used before—most innovatively by W. E. Farnham and A. C. Partridge—and claimed only that his was the first study to apply them all systematically to the whole of a substantial body of writing. In large part, Hoy's method confirmed earlier divisions of All Is True/Henry VIII and The Two Noble Kinsmen between Shakespeare and Fletcher. The kinds of tests employed by Hoy have widely varying success with different authors, being particularly effective for distinguishing Massinger from Fletcher but less good for others.

A new kind of verse test was introduced by Ants Oras who counted the writers' preferences for where in an iambic pentameter line to place pauses (Oras 1960). The ten syllabic positions in a line give nine locations between those positions where a pause might fall, and for Oras the key metric was not the total number of pauses used in a work but the favoured places for them. Expressed as percentages showing how often each location is preferred, this enabled meaningful comparison between works with many and few pauses overall. Oras believed that these preferences are largely unconscious, which, if true, confers on this phenomenon the merit for authorship attribution of being immune to distortion by cases where one writer is imitating another. But just what did Oras mean by a pause? The punctuation of modern editions is not useful, he decided, because it imposes modern norms (largely derived from rules of grammar) in place of early modern ones, so he used early editions on the grounds that at least 'They keep with the rhythmical climate of the time' (Oras 1960, 3).

Oras counted three kinds of pause. The weakest he called A-patterns, all those indicated by internal punctuation. Next in strength were B-patterns, all those indicated by all punctuation marks except the comma. Strongest of all were C-patterns in which a verse line is split between two speakers (Oras 1960, 3). Oras called the graph he made from the data for a play a 'physiognomy' and was apt to assert individuality by using the phrase 'a physiognomy of their own' (Oras 1960, 23, 27). This was misleading, since he had not established that pause patterns are utterly individualized and distinctive of authorship. Indeed, his study of influence tended to show that trends across time are more strongly marked than authorship. Most visibly, for all writers the dominant pause drifted from the first half of the line (especially after the fourth syllable) to the second half of the line (especially after the sixth syllable) across the course of Shakespeare's career.
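The percentage profile that Oras plotted can be sketched as follows. This is an illustration only: it assumes the pauses have already been located by hand and recorded as the number (1 to 9) of the syllable after which each falls, and the function name and sample data are hypothetical.

```python
from collections import Counter

def pause_profile(pause_positions):
    """Return, for positions 1-9, the percentage of all pauses falling there."""
    counts = Counter(pause_positions)
    out_of_range = [p for p in counts if p not in range(1, 10)]
    if out_of_range:
        raise ValueError(f"positions must be 1-9, got {sorted(out_of_range)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pauses supplied")
    return {p: 100.0 * counts[p] / total for p in range(1, 10)}

# Two invented works: one with few pauses, one with many, in the same proportions.
few = [4, 6, 6, 7]
many = few * 25

print(pause_profile(few)[6])                      # → 50.0
print(pause_profile(few) == pause_profile(many))  # → True
```

Dividing by the total number of pauses, rather than by the number of lines, is what makes the two invented works directly comparable despite their different totals.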

Comparisons between Oras's graphs for the Shakespeare and Fletcher portions of All Is True/Henry VIII and The Two Noble Kinsmen show some small differences in profile, although they agree on a pause after the sixth or seventh syllables occurring more often than pauses anywhere else in the line (Oras 1960, 49). Complicating the picture, though, is a distinct visual difference between the graphs for Fletcher's contributions to All Is True/Henry VIII and The Two Noble Kinsmen: the seventh-syllable pause dominates the former while in the latter the sixth- and seventh-syllable pauses are about equally frequent. These are not quite fingerprints, and Oras's findings were less firmly grounded than his generalizations and claims about them implied.

Hoy's success in establishing a series of linguistic-preference tests and applying them to a substantial body of drama was inspirational to others in the field. Essentially the same kind of analysis—counting preferences for different ways of saying the same thing—was applied in the 1970s by David J. Lake and MacDonald P. Jackson to the problem of identifying Middleton's work, in the course of which the case for his hand in Shakespeare's Timon of Athens emerged most clearly (Lake 1975; Jackson 1979). Lake made no claims for innovation in the kinds of internal evidence he collected; indeed, quite the opposite: 'the general methods or particular tests I employ', he wrote, have 'all been used over the past fifty years in authorship investigations' (Lake 1975, 10).

One of Jackson's methods had not previously been applied to Shakespearean authorship attribution: the counting of the frequency of occurrence of so-called function words that express grammatical relationships between other words while carrying little or none of their own lexical value. Their role is to bring together the nouns, verbs, and adjectives in order to give a sentence its foundational structure. Typical function words in the English language are prepositions, conjunctions, articles, particles, auxiliary verbs, and pronouns, although linguists differ on just which words have so little lexical value that they properly belong in this category. The problem of identifying function words is thoroughly explored by Alexis Antonia (2009, 57–69).

The foundational work in this area was Frederick Mosteller and David L. Wallace's attribution of the authorship of the various newspaper essays published anonymously under the heading The Federalist in 1787–8 (Mosteller and Wallace 1963). For these essays the field of candidates was small—just Alexander Hamilton, John Jay, and James Madison—and discrimination between them was aided by the discovery that the word upon was used eighteen times more often by Hamilton than by Madison (Mosteller and Wallace 1963, 278). In Shakespeare studies, of course, the field of candidates is, in principle, always larger than this, comprising all playwrights working around the time of the play's composition. In practice, however, it is often possible to bring in external evidence to narrow the field to only a handful of candidates, as the chapters in this volume show.

Jackson prefaced his book with the observation that the fact that he and Lake had 'independently reached virtually identical conclusions about every disputed and collaborate play associated with Middleton surely constitutes … a vindication of these methods' (Jackson 1979, n. pag.). Jackson devoted a chapter to Timon of Athens (1979, 54–66). Charles Knight long ago suggested that the play is not wholly Shakespeare's but rather represents a hybrid created from an existing play by 'an artist very inferior to Shakespeare' into which Shakespeare grafted various scenes showing the character of Timon (Knight 1840, 333). This opinion was by the middle of the twentieth century much less popular than the theory that the play's unevenness is due to its being essentially experimental and unfinished. Yet being unfinished would not account for the oft-noted variations in certain linguistic forms, and these could not have been introduced during printing since, Jackson noted, Charlton Hinman had demonstrated that the entire play was typeset by one man, compositor B. Jackson showed that the most plausible explanation of the unevenness is that Middleton wrote the parts of the play containing the unShakespearean contractions and the frequent uses of has and does, in contrast to Shakespeare's preferred hath and doth (although over his career he began to adopt the more modern forms).

Jackson was particularly adept at making arguments that depend on the frequency of occurrence of certain features in a printed book because he had long worked on compositor identification and differentiation by habits of spelling and the setting of incidental features. This field of study was inaugurated with high hopes in the 1950s by the Virginian School of New Bibliography, but by the 1970s it was clear that many studies were vitiated by inexpert calculations of likelihood (Egan 2010, 81–99). Where the evidence shows that one or more sets of distinctive features coincide on certain pages or formes of a printed book, the key question is how unlikely it is that one man acting somewhat randomly might produce them. Jackson's work on this problem was unique in bringing statistical rigour to the analysis (Jackson 2001b).

In his adaptation of Mosteller and Wallace's method, Jackson counted the frequency of occurrence of each of thirteen function words—a/an, and, but, by, for, from, in, it, of, that, the, to, and with—in sample writings by Middleton, Cyril Tourneur, George Wilkins, Shakespeare, Thomas Dekker, George Chapman, John Marston, Jonson, Massinger, John Webster, Thomas Heywood, John Ford, Beaumont, Fletcher, James Shirley, Nathan Field, Henry Chettle, William Rowley, John Day, and Thomas Goffe. Jackson described the laborious process of manually counting the occurrences and the various shortcuts he devised to make the endeavour manageable.

Because he wanted to compare the relative use-preferences among his thirteen words rather than their absolute rates of usage across a play, Jackson started at the beginning of each play and counted the occurrences of each word until the total occurrences reached 1,000. Tabulating the results, he showed how many of the 1,000 words were a/an, how many and, and so on, and for the Middleton canon he gave the mean and the standard deviation. This enabled comparison of the Middleton and non-Middleton plays:

A sample in which the figure for any one of the function words falls outside the limits of three standard deviations above or below the Middleton mean can be regarded as unMiddletonian. A sample in which the figures for three or more of the function words fall outside the limits of two standard deviations above or below the Middleton mean can also be regarded as unMiddletonian (Jackson 1979, 85).

Necessarily, this procedure defines a norm for an author and attributes writing that lies far from the norm as someone else's writing rather than being merely an outlier by that author. The required check upon such arbitrary reduction of an author's accepted style to a few arithmetical norms is that we take into account as many kinds of norm as possible and we require a candidate text to fail on several of them before it is excluded from a particular canon. By Jackson's method used here, 22 per cent of the samples of writing by dramatists other than Middleton passed his test for Middletonian authorship (Jackson 1979, 86). To do much better than this would require more data, and processing that data would require automated counting by computers.
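Jackson's counting procedure and his two-part exclusion rule can be sketched in a few lines. This is an illustration under stated assumptions rather than a reconstruction of his working: the function names are hypothetical, the sample standard deviation is assumed (the account above does not say which form Jackson used), and the test figures are invented, use only four of the thirteen words, and do not sum to 1,000.

```python
from statistics import mean, stdev

# The thirteen function words, with a/an counted together as "a".
FUNCTION_WORDS = ("a", "and", "but", "by", "for", "from", "in",
                  "it", "of", "that", "the", "to", "with")

def first_n_profile(tokens, n=1000):
    """Count each function word from the start until n occurrences in total."""
    profile = dict.fromkeys(FUNCTION_WORDS, 0)
    seen = 0
    for token in tokens:
        word = token.lower()
        if word == "an":
            word = "a"
        if word in profile:
            profile[word] += 1
            seen += 1
            if seen == n:
                break
    return profile

def is_unmiddletonian(sample, middleton_samples):
    """Apply the rule: one word beyond 3 SDs, or three or more beyond 2 SDs."""
    beyond_two = 0
    for word, count in sample.items():
        values = [s[word] for s in middleton_samples]
        centre = mean(values)
        spread = stdev(values)  # assumption: sample standard deviation
        deviation = abs(count - centre)
        if deviation > 3 * spread:
            return True
        if deviation > 2 * spread:
            beyond_two += 1
    return beyond_two >= 3

# Invented Middleton norms: each word has a mean as shown and an SD of 10.
canon = [
    {"and": 100, "the": 80, "of": 50, "to": 60},
    {"and": 110, "the": 90, "of": 60, "to": 70},
    {"and": 90, "the": 70, "of": 40, "to": 50},
]
print(is_unmiddletonian({"and": 105, "the": 85, "of": 45, "to": 65}, canon))  # → False
print(is_unmiddletonian({"and": 135, "the": 80, "of": 50, "to": 60}, canon))  # → True
print(is_unmiddletonian({"and": 125, "the": 55, "of": 75, "to": 60}, canon))  # → True
```

The second invented sample fails on a single word lying more than three standard deviations from the mean; the third fails because three words each lie more than two standard deviations away.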

In his Ph.D. thesis on distinguishing Middleton's and Shakespeare's writing, and especially apportioning their shares of Timon of Athens, Roger V. Holdsworth put himself squarely in the tradition of Hoy, Lake, and Jackson (Holdsworth 1982; 2012). Like them, he counted various linguistic features such as contractions and the preference for modern (and urban) you over archaic (and rural) thou, but Holdsworth also introduced the innovation of counting the various formulaic phrasings used in stage directions to find author-specific idiosyncrasies (Holdsworth 1982, 181–235). His comprehensive study of the form 'Enter A and B, meeting', in which the placing of meeting makes clear that neither character is already on stage, was the first systematic proof that a recurrent form of stage direction could usefully distinguish authorship.

Without computer automation, the counting of linguistic features was always likely to be incomplete and error prone. The Textual Companion to the Oxford Complete Works was published in the late 1980s when such manual methods had taken the subject about as far as it could go, and its survey of the Canon and Chronology of Shakespeare's writing was a synthesis of the scholarship up to that point (Taylor 1987c). A chief innovation of the Oxford Complete Works (Wells et al. 1986) was its printing of the works in their chronological order of composition, and the merit of this was most tangibly apparent in the various tables and graphs presented in the Textual Companion, since certain literary features were readily explainable in relation to the sequence of writing.

A clear example of this is that a graph of the proportion of rhymed lines to verse lines in each work showed that after writing Venus and Adonis and The Rape of Lucrece in 1592–4 entirely in rhyme—which form he had little used before—Shakespeare's rate of rhyme in his next plays was substantially greater than hitherto, with large spikes for Love's Labour's Lost and A Midsummer Night's Dream in particular (Taylor 1987c, 98). This amply demonstrates that arguments about authorship are inseparable from arguments about chronology, since Shakespeare's norm for a particular feature may well vary over time, and indeed we know that certain verse features such as feminine endings rose in general popularity amongst writers during Shakespeare's career.

The first systematic and extensive application of computer counting methods to the authorship problems in Shakespeare was undertaken by Ward E. Y. Elliott and Robert J. Valenza in response, initially, to the unscholarly question of whether William Shakespeare of Stratford-upon-Avon was an author at all. Elliott and Valenza addressed themselves to the problem of just how far and in what ways an author might reasonably be expected in a particular work to deviate from his norms on the various features counted by the methods described above. Their approach, applied first to Shakespeare's poetry, was notable for its comparatively high-level mathematical analysis of the numbers thrown up by their counting, including the calculation of such recondite phenomena as co-variance and eigenvalues.

The selection criteria for the words counted by Elliott and Valenza were barely sketched: they were 'chosen mostly from among the more common, but not most common, words of Shakespeare's poetry' (Elliott and Valenza 1991, 204). Towards the end of their article, Elliott and Valenza briefly described other counting they did, and it included features that we might well object were likely to be imposed by scribes, compositors, and editors—'average sentence … length' and 'hyphenated compound words'—as well as other more certainly authorial features such as feminine endings (Elliott and Valenza 1991, 206).

Elliott and Valenza's research arose from an annual undergraduate event called the Claremont McKenna Shakespeare Clinic, which over the years since 1987 has grown its battery of tests and applied them to an increasing number of electronic texts of early modern drama (Elliott and Valenza 1996; 1997; 2004a; 2010a; 2010b). Because they use computers to count phenomena in electronic transcriptions of early modern plays, rather than counting them by hand, Elliott and Valenza's work is subject to certain important constraints. Most electronic texts of early modern plays do not record for each word what part of speech it is, so there is no simple way for the machine to distinguish the various meanings of the three-character string r-o-w, which among other things can be a verb for propelling a boat or a noun for an argument.

Similarly, it is not easy, when using unlemmatized electronic texts, to group all word conjugations and inflections under their dictionary headwords, so that by simple pattern matching the character strings ran, run, runs, and running would be counted by a computer as four words while to a lexicographer they all belong under the single headword run (vb.). This limitation is common to almost all mechanized word-counting techniques, and so long as all the texts are treated the same way it is reasonable to assume that all authors' counts will be equally affected and the effects will cancel one another out. It is in any case not clear whether this limitation is necessarily a disadvantage for authorship attribution, since a writer may well favour certain forms (ran and run but not runs and running), which information is lost if only dictionary headwords are counted. Moreover, the homonyms row (vb.) and row (n.) are not necessarily unconnected in the poetic mind and we do not know enough about linguistic creativity to say that we should treat them as entirely distinct in the way a dictionary does.

Because they were interested in the rates of occurrence of certain linguistic phenomena, Elliott and Valenza found it convenient to divide their dramatic materials into equally sized blocks, with 3,000 words being a typical unit in their early work. This enabled them to perform various kinds of validation of their tests by controlled substitution. For example, by extracting a block of known authorship and treating it as if it came from a text of unknown authorship they could see how often their tests correctly pointed to the known author.

Increasingly, Elliott and Valenza abandoned their tests based on manual counting and relied upon computerized counting, but they also introduced as an innovation Marina Tarlinskaja's classifications of proclitic and enclitic microphrases, described in full in Chapter 23 in this volume (Elliott and Valenza 1996, 201–2). Their methods of choosing which words to count also got more sophisticated. Having 'subtracted wordlists from a 120,000-word sample of six middle [period] Shakespeare plays from a wordlist from 120,000 words of [earlier] plays by Marlowe, Greene, Kyd, and Munday' they were able to derive a list of words that Shakespeare most preferred, which they called his 'badges', and words he liked least, which they called his 'flukes' (Elliott and Valenza 1996, 196).

It is not entirely clear what Elliott and Valenza meant by subtracting word lists from one another. Presumably the lists had counts for each word's frequency of occurrence so that the badges are those with negative results when Shakespeare's frequencies are subtracted from the frequencies for Marlowe, Greene, Kyd, and Munday (being the words he uses more often than they do) and flukes are those with positive results (used less often by him than them). Amongst the problems of this approach is that six middle-period plays (around one-sixth of his canon) stood for all of Shakespeare's work and only four dramatists stood for all non-Shakespearean writing.
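The reading conjectured above can be made concrete in a short sketch. It implements only that presumed interpretation, not Elliott and Valenza's documented procedure, which remains unclear; the function name, the use of rates per 1,000 words, and the toy word lists are all assumptions made for illustration.

```python
from collections import Counter

def badges_and_flukes(shakespeare_tokens, other_tokens):
    """Subtract Shakespeare's word rates from the other dramatists' rates.

    Negative differences (words Shakespeare uses more) are 'badges';
    positive differences (words he uses less) are 'flukes'.
    """
    if not shakespeare_tokens or not other_tokens:
        raise ValueError("both samples must be non-empty")
    shakespeare = Counter(shakespeare_tokens)
    others = Counter(other_tokens)
    # Rates per 1,000 words, so samples of unequal size remain comparable.
    difference = {
        word: 1000.0 * others[word] / len(other_tokens)
              - 1000.0 * shakespeare[word] / len(shakespeare_tokens)
        for word in set(shakespeare) | set(others)
    }
    badges = sorted((w for w, d in difference.items() if d < 0),
                    key=lambda w: difference[w])
    flukes = sorted((w for w, d in difference.items() if d > 0),
                    key=lambda w: -difference[w])
    return badges, flukes

# Toy samples, invented for illustration.
shakespeare_sample = "gentle gentle gentle love love thee".split()
others_sample = "thee thee thee love love gentle".split()
print(badges_and_flukes(shakespeare_sample, others_sample))
# → (['gentle'], ['thee'])
```

A word used at the same rate in both samples, such as love here, is neither a badge nor a fluke. In practice some threshold on the size of the difference would presumably be needed, but none is specified in the account above and so none is imposed here.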

In the event, Elliott and Valenza's testing excluded all the usual rival candidates who have over the years been supposed the true author of Shakespeare's works, and it confirmed that plays already thought likely to be collaborations really are. Although their tests could not have been performed manually—there were just too many tests and too many texts—Elliott and Valenza made no substantial advances in the science of computational stylistics. Theirs were the old methods, speeded up. Theirs was at least a contribution to the field, which cannot be said of the work of Donald Foster whose attribution to Shakespeare of the poem A Funeral Elegy for William Peter created much media interest in the 1990s (Foster 1989; 1996) before being conclusively disproved (Monsarrat 2002; Vickers 2002a).

Foster claimed to have a new technique based on a new database called SHAXICON that 'focuses on Shakespeare's "rare vocabulary"—words used in the canonical plays twelve times or less—and maps it by date, text, and speaking character' (Foster 1996, 1088). Foster sought a chronological correlation between the works in which Shakespeare used rare words—defined as those occurring no more than twelve times in his canon—and the parts he was playing as an actor. After Shakespeare learnt a new role to perform, in his own play or someone else's, the rare words in that role would, according to Foster, appear disproportionately more often in whatever he wrote next, simply because those words were now in the forefront of Shakespeare's mind.

This sounds plausible, but of course we do not know which roles Shakespeare performed so the hypothesis could not be tested. More than twenty years after Foster's announcement that SHAXICON would soon be published on the World Wide Web, it has not appeared, although what might be the first step—a list of roles for each play—is at the time of writing (May 2016) available on a website using that name. Exactly how SHAXICON might help in authorship attribution was never clearly described by Foster, other than that it might help in finding shared rare words. He claimed that it showed that the vocabulary of A Funeral Elegy for William Peter convincingly 'matches Shakespeare's as it stood in 1612' (Foster 1996, 1089). This may well be so, but, as Byrne insisted, one must do the negative check—asking if it matches anybody else's too—before treating this match as significant. If Foster conducted negative checks it is impossible to evaluate them because the contents of SHAXICON have never been revealed or even closely described. Foster subsequently withdrew this claim when Gilles Monsarrat showed that A Funeral Elegy was written by John Ford.

Also using his own, small collection of electronic texts, M. W. A. Smith performed a series of counting tests that confirmed the long-suspected role of Wilkins as Shakespeare's co-author of Pericles (Smith 1988a; 1989a; 1989b; 1990). Smith's arbitrary phenomenon for his first counts was simply the first words of speeches, finding those most frequently chosen across a range of plays by candidate authors and then comparing those frequencies with the ones for Acts 1–2 and 3–5 of Pericles. Then Smith turned to all two-word phrases, again finding them all in his sample of electronic texts and using the most commonly occurring as the feature he would count in the texts to be attributed. Next Smith counted function words. Smith validated his method by showing that when he treated texts of known authorship as if their authorship was unknown his technique reliably identified their true authors. A notable innovation of Smith's that was to become important was his use of the power of the computer itself to find the words that are most discriminating between pairs of authors, which approach John Burrows was later to refine. A limiting factor to Smith's work was the paucity of early modern plays available in electronic texts at the time.

In 1994 Jonathan Hope published what he called 'a new method for determining the authorship of renaissance plays' (Hope 1994, xv). In fact, it was not quite new, as the method relied upon counting a number of linguistic choices that earlier investigators had counted. But because of Hope's expertise in sociolinguistics, the field now for the first time had a securely grounded theory of why particular writers made particular choices, and one that subtly took in differences of class and geographical origins and distinguished the habits that drifted over the lifetime of a dramatist from those that did not. The habits that Hope put on a firm linguistic footing are, however, difficult to count by automated means.

The use of auxiliary do in the choice to say 'Did you go home?' (the modern, regulated form) instead of 'Went you home?' (the early modern, unregulated form) cannot be automated by a simple string-search within an electronic text since do, does, and did have non-auxiliary uses too. The poetic language must be parsed to find the auxiliary uses, and computerized techniques for this have not been perfected. The same is true of the relative markers counted by Hope, and largely true of the you/thou distinction, which is further complicated by frequent elision (as in Th'art for Thou art) and by the choice being meaningful to the characters in the play. That is, the dramatic situation rather than authorial preference might cause a character to address another as thou instead of you. Hope's study appeared just before large databases of early modern play texts became widely available, and his techniques have not been taken up by those using computers to do their counting, primarily because the problem of parsing the text to identify how each term is being used remains too difficult.

Using the principles of A. Q. Morton, Jill Farringdon sought to prove what was, by then, already suspected: that Shakespeare did not write A Funeral Elegy for William Peter (Farringdon 2001). Farringdon's essay is worth considering because it usefully illustrates that a correct conclusion can be reached by invalid methods and that we must always reject invalid methods even when their results are attractive. Farringdon began by explaining that function words form a large part of everybody's writing and that the same ones dominate the lists of words most frequent in Henry Fielding's novel Joseph Andrews and, more than 200 years later, in Dylan Thomas's Collected Poems. 'This surely confirms the usefulness of using these vocabulary items for recognizing authorship' (Farringdon 2001, 161). Of course, it does no such thing, since distinctiveness not ubiquity is the quality we seek. We all have ears and fingers, but as Mark Dallagher's prosecutors were finally forced to admit, only the latter leave prints that are distinctive of their owner.

Farringdon's method, called cusum analysis, was a way of processing the counts of any linguistic phenomena; we may demonstrate it using the simple metric of sentence length measured in words. For each sentence in a block being examined, the method subtracts that sentence's length from the mean sentence length for the block, giving a positive number for short sentences and a negative number for long ones. This produces a series of positive and negative numbers (S1 to Sn), where n is the number of sentences in the block, and for which the cusum series is (S1), (S1+S2), (S1+S2+S3), up to (S1 … + Sn). Suppose a block of seven sentences has sentence lengths of, in turn, 8 words, 8 words, 9 words, 5 words, 6 words, 7 words, and 6 words. There are 49 words in total, so the mean sentence-length is 7 words (49 words divided into 7 sentences). The differences from this mean are, in turn, –1, –1, –2, 2, 1, 0, and 1. Adding these cumulatively gives –1 (= the first number), –2 (= the first two numbers added together), –4 (= the first three numbers added together), –2 (= the first four numbers added together), –1 (= the first five numbers added together), –1 (= the first six numbers added together), and 0 (= all seven numbers added together). A cusum series always ends with zero because the total of differences from the mean must add up to zero, since that is how a mean is defined.
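The worked example above can be checked with a few lines of code. This is a minimal sketch of the cusum calculation only, following the sign convention described here (mean minus sentence length); the function name is hypothetical, and nothing in it bears on the validity of the inferences Farringdon drew from such series.

```python
from itertools import accumulate

def cusum(values):
    """Return the running total of (mean - value) across the series."""
    values = list(values)
    if not values:
        raise ValueError("no values supplied")
    average = sum(values) / len(values)
    # Short sentences give positive differences, long ones negative.
    differences = [average - v for v in values]
    return list(accumulate(differences))

# Sentence lengths from the worked example: 49 words in 7 sentences, mean 7.
sentence_lengths = [8, 8, 9, 5, 6, 7, 6]
print(cusum(sentence_lengths))
# → [-1.0, -2.0, -4.0, -2.0, -1.0, -1.0, 0.0]
```

With real data the mean will seldom be a whole number, so the final term may differ from zero by a tiny floating-point rounding error even though it is zero in exact arithmetic.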

A cusum graph is a trace showing how much variation there is in a particular writing habit (here, sentence length) across the text, and presented so that at any one point the total variation so far from the block's eventual norm is visible. This is not, it should be noted, a new stylometric method—it depends on the old technique of counting sentence length, word length, and so on—only a new way of processing the numbers that the counts produce. The same counting can be repeated for any habit, such as use of two-, three-, and four-letter words or frequencies of function word use. Farringdon's claim (based on Morton's) was that for a single writer the plot of total variation so far for one habit (say, sentence length) should be the same shape as the plot of total variation so far for another habit (say, use of two-, three-, and four-letter words), allowing necessary rescaling of the y-axis between the two plots. In other words, a writer's pattern of deviation from her own norm in one feature should be the same as her pattern of deviation from her own norm in the other.

The test, then, is to combine a piece of known authorship with the piece to be tested. If the resulting composite text is homogeneous in the way that Farringdon defined it—departing from the measured norms by the same amounts across the text—then the known author wrote the text being tested. In fact, as Giuliano Pascucci shows in Chapter 24 in this volume, the technique of creating a composite text and then measuring its homogeneity really can be a good test for authorship so long as one defines homogeneity properly, as he does using entropy. But there is no reason to suppose, as Farringdon did, that all the variations from the norm in a text must develop at the same rates across a text. In her essay, the graphs offered simply do not have the features that she ascribed to them in her prose analysis—the various lines separate earlier or later than she claimed—even with her credulity-straining rescaling of the y-axis.

In the 1980s and 1990s the Chadwyck-Healey company began to pay for the keyboarding of large quantities of out-of-copyright English literary texts in order to sell them as searchable electronic collections on CD-ROM—under the titles English Poetry, Early English Prose Fiction, English Verse Drama, and English Prose Drama—that were later combined to form a unified web-hosted database called Literature Online (LION). Having effectively all of English literature in one searchable database transformed the field of Shakespearean authorship attribution because it was at last possible to perform rapidly, and more or less definitively, the negative check demanded by Byrne. An investigator could now assert with some confidence just whose writing did and did not contain a particular feature by which she was attributing authorship. The first to put this potential into practice was Jackson in a series of articles (Jackson 1998; 1999b; 2001a; 2001c; 2001d) and then a ground-breaking book that established the co-authorship of Pericles beyond any reasonable doubt (Jackson 2003a).

Jackson's method has been refined over the years, and forms the basis for many of the studies in the present volume. The key feature that characterizes the approach is that words from the text to be attributed are searched for in the LION database, either as complete phrases (say, 'purple mantle torn') or as collocations ('purple NEAR mantle NEAR torn'). So long as LION contains works by all the possible candidates for authorship, every author has, as it were, a chance of using the same phrase or something like it, and the foundational assumption is that the true author of the text to be ascribed is likely to do this more often than other authors because, consciously or unconsciously, he favours that phrase.

Unfortunately, if one author has rather more writing in LION than the others then, all other things being equal, those writings have a disproportionately greater chance (or rather pseudo-chance) of turning up the same phrase or collocation, so steps must be taken to adjust for canon size. (The writings in the LION database are a determinate, known quantity, so we are using the notion of chance somewhat elastically here, which is a point we will return to shortly.) The necessary compensation for differing canon sizes can be accomplished either by restricting the searching—excluding from the searching some writings by some writers in order to equalize for each writer the body of writing being searched—or by giving a low weighting to (that is, treating as less significant) the hits for an author who has a large body of writing in LION and giving a high weighting to (treating as more significant) the hits for an author whose representation in LION is small. All the chapters in the present volume explicitly address this problem of differing canon sizes where it affects their method.
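One very simple form that such a weighting might take is to express each author's hits as a rate per 100,000 words of that author's searchable canon. The sketch below illustrates only that naive idea. It is not the method of any chapter in this volume, the author labels and all the figures are invented for the purpose, and the function name `hits_per_100k_words` is our own.

```python
def hits_per_100k_words(raw_hits, canon_words):
    """Scale each author's raw hit count by the size of that author's canon."""
    return {
        author: 100_000 * raw_hits[author] / canon_words[author]
        for author in raw_hits
    }

# Invented figures, for illustration only.
raw_hits = {"large_canon_author": 12, "small_canon_author": 3}
canon_words = {"large_canon_author": 800_000, "small_canon_author": 20_000}

rates = hits_per_100k_words(raw_hits, canon_words)
print(rates)  # {'large_canon_author': 1.5, 'small_canon_author': 15.0}
```

On these made-up numbers the author with fewer raw hits has the higher rate, which is the kind of reversal that an adjustment for canon size is meant to expose. Whether a simple linear scaling is the right adjustment is a separate question.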

LION contains searchable texts of virtually all that we call English Literature, but this is far from being all published writing of the period. Around the time that Chadwyck-Healey launched LION, a partnership, the Text Creation Partnership (TCP), was formed between the university libraries of Michigan and Oxford, the corporation ProQuest (which bought Chadwyck-Healey and hence LION in 1999), and the US non-profit organization called the Council on Library and Information Resources (CLIR), with the goal of keyboarding all first editions of books published in Britain up to the year 1700. Virtually all these books were already available as digital images taken from the microfilms of ProQuest's Early English Books collection. The TCP transcriptions of these books are sold as an additional service for ProQuest's Early English Books Online (EEBO) database, making a composite called EEBO–TCP. To date (May 2016) the TCP has released transcriptions of around 44,000 of the roughly 130,000 books in EEBO, which latter figure is supposed to be all those published in Britain up to the year 1700. In negative checks for authorship attribution claims, searches of LION are now typically supplemented by searches of EEBO–TCP.

We have seen that a grave demerit of Foster's publications reporting alleged findings from his SHAXICON database was that no one could check his work. SHAXICON was and is not published anywhere. Without the possibility of replication, an investigator's claims are effectively worthless. The rigour of having other investigators trying to reproduce someone's results is the best tool we have for rooting out investigator error or bias. Just as Foster was withdrawing from the field with the demolition of his claim that Shakespeare wrote A Funeral Elegy for William Peter, Brian Vickers, who prior to this had been a long-time antagonist to authorship attribution studies, was entering it. Just as Foster had drawn on his own collection of texts, Vickers in turn used his own collection of early modern literary electronic texts. As shown in my chapter elsewhere in this volume (Chapter 4), Vickers's database either lacks many of the texts he needs to search to substantiate his claims, or if they are there his searching method is failing to find them. (This illustrates the important difference between telling other investigators that you have included in your database what you think is the whole of an author's canon, as Vickers does, and showing that you have the whole canon by making the database available electronically for others to inspect and, if necessary, point out what they think are the omissions, or be able to interrogate different results from attempts to replicate the same work.)

When the Oxford Complete Works appeared in 1986–7 it was the first major edition to take seriously the nineteenth- and twentieth-century scholarly discoveries about Shakespeare's habits of co-authorship. Reviewing the edition's Textual Companion, and particularly Taylor's 1987 survey of 'The Canon and Chronology of Shakespeare's Plays', Vickers was scathing of it on precisely this head, finding that it relied on the work of 'a very miscellaneous group of scholars who tried, over the last century, to quantify Shakespeare's style' (Vickers 1989, 410) including Chambers, Wentersdorf, and Oras. These scholars' studies were, according to Vickers, utterly vitiated by their use of nineteenth-century editions of the plays so that 'whatever advances have been made in textual studies since then go for nothing' (Vickers 1989, 410). At this point in his career Vickers was sceptical of co-authorship—'so often bruited in the past and so often discredited for inadequate evidence' (Vickers 1989, 405).

Thirteen years later in Shakespeare, Co-Author Vickers revised his position and championed the same scholarship that he had earlier dismissed in his review, commiserating with the early pioneers—singling out Chambers, Wentersdorf, and Oras especially—and lamenting 'the ingrained resistance that still exists whenever the question of Shakespeare's co-authorship arises' (Vickers 2002b, 138). Vickers's book masterfully synthesized previous scholarship and brought to a general Shakespearean readership the simple conclusion supported by copious evidence that Titus Andronicus, Timon of Athens, Pericles, All Is True/Henry VIII, and The Two Noble Kinsmen are co-authored plays.

Although Vickers did not acknowledge that his former scepticism about co-authorship had been overcome, in empirical studies, unlike literary criticism where Vickers first established his reputation, investigators do readily change their minds when new evidence emerges. Asked if anything could shake his faith in the theory of evolution, the biologist J. B. S. Haldane is supposed to have replied 'Oh yes: finding rabbit fossils in Precambrian rocks'. Shakespearean authorship specialists need to adopt the scientific approach to knowledge.

In the same year that Vickers entered the field, John Burrows announced a new way of processing the rates of frequently occurring features such as function words, called Delta (Burrows 2002a; 2003), and, even more importantly, he went on to develop a new way of selecting just which words to count, called Zeta (Burrows 2007). The Delta method is described in detail in Jack Elliott and Brett Greatley-Hirsch's contribution to this volume (Chapter 9), and its key innovation is that it discounts the importance of words for which a set of authors is demonstrably variable in their rates of usage and weighs more heavily the evidence from words that they use at consistent rates. Moreover, Delta puts on an even footing words that are used at different rates of frequency, as it measures variations in rates of usage, not the absolute numbers of occurrences. When comparing a text to be attributed to the texts in the comparison set, Delta finds where the unknown text uses certain words more and other words less often than the average for the comparison set and finds where a particular author's contributions to the comparison set also show the same pattern of favouring the same words and disfavouring the same other words.
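For readers who find code easier to follow than prose, the core of Delta as it is commonly formulated can be sketched briefly. The rate of each word is converted to a z-score against the mean and standard deviation of the comparison set, which is what discounts words whose usage varies widely. The distance between two texts is then the mean absolute difference of their z-scores. The sketch below rests on assumptions: the rates are invented, each author is reduced to a single profile, the function name `delta` is ours, and the population standard deviation is our choice. Chapter 9 should be consulted for the method as used in this volume.

```python
from statistics import mean, pstdev

def delta(test, candidate, reference_texts):
    """Mean absolute difference between the z-scores of two word-rate profiles."""
    words = list(test)
    total = 0.0
    for w in words:
        rates = [text[w] for text in reference_texts]
        mu, sd = mean(rates), pstdev(rates)
        # Assumes sd > 0, i.e. the reference texts do not all use w at one rate.
        z_test = (test[w] - mu) / sd
        z_candidate = (candidate[w] - mu) / sd
        total += abs(z_test - z_candidate)
    return total / len(words)

# Invented rates per 1,000 words, for illustration only.
profiles = {
    "A": {"the": 30, "and": 20, "to": 10},
    "B": {"the": 40, "and": 25, "to": 20},
    "C": {"the": 50, "and": 30, "to": 30},
}
unknown = {"the": 31, "and": 21, "to": 11}

reference = list(profiles.values())
distances = {name: delta(unknown, p, reference) for name, p in profiles.items()}
closest = min(distances, key=distances.get)
print(closest)  # A
```

The candidate with the smallest Delta is the one whose pattern of favoured and disfavoured words most resembles that of the unknown text.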

This principle of identifying on a case-by-case basis the words that are most discriminating between various authors, rather than relying on predetermined lists of words, also underlies Burrows's second innovation, the Zeta test. As a first step, the investigator establishes two sets of texts, each being the securely attributed works of a single candidate author or a group of authors. The software of Zeta, built into the Intelligent Archive software developed by Hugh Craig and others at the University of Newcastle in Australia, finds for itself the words that most distinguish these two sets, being especially common in the first set and especially uncommon in the second, and vice versa. The vice versa step means that the investigator has two lists of words, both of which are good discriminators between the two sets of texts.

When the numbers of occurrences of the discriminating words in each of the texts in the two text sets are plotted on an x/y graph—x for counts of words favoured by the first set and disfavoured by the second, and y for counts of words disfavoured by the first set and favoured by the second—the texts' scores fall into two distinct clusters: high-x/low-y for texts in the first set and high-y/low-x for texts in the second set. This is just as we would expect since Zeta was made to find the words that would produce this outcome. Then the investigator has Zeta count the occurrences of the discriminating words in the text to be attributed and plot this on the same x/y graph. If the text to be attributed shares the word-preferences of one of the two text sets, its x and y values will place it near or within that set's cluster on the graph.
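The selection and plotting steps can be illustrated with a toy sketch. It assumes one common formulation of Zeta, in which each text is divided into segments and a word is scored by the proportion of first-set segments that contain it plus the proportion of second-set segments that lack it. This is not the code of the Intelligent Archive. The function names, the tiny word lists, and the use of token proportions for the x and y values are all our assumptions.

```python
def zeta_scores(set_a, set_b):
    """Score each word from 0 to 2: high means typical of set_a and rare in set_b.
    set_a and set_b are lists of segments; each segment is a list of word tokens."""
    types_a = [set(seg) for seg in set_a]
    types_b = [set(seg) for seg in set_b]
    vocabulary = set().union(*types_a, *types_b)
    scores = {}
    for w in vocabulary:
        share_a_with = sum(w in seg for seg in types_a) / len(types_a)
        share_b_without = sum(w not in seg for seg in types_b) / len(types_b)
        scores[w] = share_a_with + share_b_without
    return scores

def marker_lists(set_a, set_b, k):
    """Return the k best markers of set_a and the k best markers of set_b."""
    scores = zeta_scores(set_a, set_b)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken by spelling
    return ranked[:k], ranked[-k:]

def xy(segment, a_markers, b_markers):
    """Proportion of a segment's tokens that are set_a markers (x) and set_b markers (y)."""
    x = sum(tok in a_markers for tok in segment) / len(segment)
    y = sum(tok in b_markers for tok in segment) / len(segment)
    return x, y

# Toy segments, invented for illustration only.
set_a = [["thou", "art", "kind"], ["thou", "dost", "well"]]
set_b = [["you", "are", "kind"], ["you", "do", "well"]]

a_markers, b_markers = marker_lists(set_a, set_b, k=1)
x, y = xy(["thou", "art", "well"], a_markers, b_markers)
print(a_markers, b_markers, x > y)  # ['thou'] ['you'] True
```

Here the test segment scores high on x and zero on y, so it would be plotted with the first set's cluster.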

If the sets are chosen to be, say, Shakespeare plays on the one hand and Marlowe plays on the other, the Zeta method becomes for that application a good discriminator of these two writers' styles. One of the sets may be a multi-writer collective, so that the test may be, say, Shakespeare versus Marlowe + Greene + Peele + Nashe. As Burrows showed, and Craig confirmed with a great many validation runs for this technique (Craig and Kinney 2009d), when the investigator takes a text of known authorship out of one of the sets and reruns the experiment as if this text were of unknown authorship—without letting this text help choose the discriminating word lists—the correct author is identified with reliability that typically (depending on who is being tested) exceeds 95 per cent accuracy. Zeta is by some way the most powerful general-purpose authorship tool currently available. The Intelligent Archive software containing it is freely downloadable together with its source code so that any investigator can see exactly how it works.

Conclusion: Probability and Authorship

Contrary to popular belief, human beings are excellent at estimating probabilities. So long as the probabilities are ones we would have encountered on the plains of Africa 200,000 years ago and excellence is measured by speed of computation, human beings are consummate approximators of complex, multi-component risks. The risks of modern life, however, are poorly judged by our mental apparatus until it is supplemented by written symbols and the rules given us by the culture of mathematics. Popularizations of the mathematics of probability often begin by demonstrating how bad our innate capacities in this area are. Most of us wildly overestimate the likelihood that we have a rare disease if given a positive reading by a moderately reliable test for it, and we wildly underestimate the capacity for mere chance to produce apparently unlikely results. We are astounded to find that two people at a party share a birthday when in fact nine times out of ten any group of forty persons will contain one or more shared birthdays. The collective inability of judges, lawyers, and juries to evaluate probabilities has repeatedly incarcerated the innocent.
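The birthday figure is easily verified. The sketch below makes the usual simplifying assumptions that there are 365 equally likely birthdays and that the guests' birthdays are independent of one another; the function name is ours.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(round(p_shared_birthday(40), 2))  # 0.89
```

For forty people the result is about 0.89, or roughly nine times out of ten.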

Educated people generally know that probabilities can be multiplied and divided. If I want a 3 from this six-sided die and a 2 from that one, the likelihood that I will get them both in one throw of the two dice is 1/6 × 1/6 = 1/36. If I do not care which die shows which number so long as I get a 3 and a 2 then I have doubled from one to two the number of outcomes I will accept—a 3 and a 2 or a 2 and a 3—and the probability of satisfaction correspondingly doubles to 1/18. This principle can easily be misapplied. In 1998 the British solicitor Sally Clark's second child died of cot death (Sudden Infant Death Syndrome), just as her first child had done two years earlier. The country's leading paediatric expert, Professor Sir Roy Meadow, convinced a jury that the likelihood of two such tragedies occurring by chance was the likelihood of one such tragedy (1/8500) multiplied by the same likelihood for the second (1/8500), making a combined likelihood of just 1 in 73 million.

Meadow had his licence to practise medicine revoked in 2005, but not before Clark had served three years for murder, which trauma on top of the loss of her children contributed to her early death in 2007. Meadow wrongly assumed that cot-death events are statistically independent, so that after one cot death the likelihood of a second remains unchanged. Certainly, after flipping a fair coin and getting heads ten times in a row, the likelihood of getting an eleventh head remains 1/2 despite the preceding run; these events really are statistically independent. But the cause of cot death is unknown and an innocent common factor—genetic disposition, home environment, sleeping position—might underlie two deaths. More simply, double child-murders are exceedingly rare, so that on likelihoods alone another explanation besides murder was more plausible.

A key problem for us is deciding which phenomena in authorship attribution are statistically independent (like coin-tossing) and which may be linked (like cot deaths). In particular, if we can show that certain tests for authorship are independent then we may with confidence multiply their accuracy rates when they point the same way. Responding to scepticism that Fletcher co-wrote All Is True/Henry VIII—especially from R. A. Foakes in his Arden2 edition of the play (Foakes 1957)—Marco Mincoff made the point that multiple independent tests (metrical, lexical, grammatical) indicate Fletcher's presence and 'each new test, no matter how little decisive by itself, increases the probabilities in a steady geometrical progression, resulting very soon in almost astronomical figures' (Mincoff 1961, 253).

Mincoff's point about multiple independent tests needs to be made afresh now because it has not been widely appreciated in Shakespeare studies. James Purkis recently cited with approval Alois Brandl's claim that 'a hundred unreliable arguments … do not together make a reliable one' (Purkis 2014, 153). Strictly speaking this is true, but Purkis ought to have observed that it takes only a moderate reliability in the individual tests upon which arguments are built for the power of multiplication to make the accumulated reliability quickly mount up. That is why highly reliable computer systems can be built out of relatively cheap and unreliable components, for example in a Redundant Array of Inexpensive Disks (RAID) that uses data-redundancy techniques to turn individually fallible hard disks into a collection that is virtually infallible. Each part might be so likely to fail that we would not rely on it for anything important, but string them together so that they all have to fail at once for disaster to strike and we have something we can bet our lives upon. And we do, every time we fly.

To be specific, let us suppose that three independent tests for authorship are 65 per cent, 75 per cent, and 80 per cent reliable. That is, the first will give the wrong answer more than one time in three that it is used, the second will be wrong one time in four, and the third one time in five. These are unreliable tests when used on their own, so how reliable are they when used together? Perhaps surprisingly, if they all point to the same author for a particular text then the likelihood that this person is not the author is lower than one chance in 57, or (1 – 0.65) × (1 – 0.75) × (1 – 0.8).
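The multiplication can be carried out exactly with Python's standard `fractions` module. The sketch assumes, as the paragraph above does, that the three tests err independently of one another. It computes only the chance that all three are wrong at once, and the variable names are ours.

```python
from fractions import Fraction
from math import prod

# Reliabilities of the three hypothetical tests: 65, 75, and 80 per cent.
reliabilities = [Fraction(65, 100), Fraction(75, 100), Fraction(80, 100)]

# Chance that every test errs together, assuming their errors are independent.
p_all_wrong = prod(1 - r for r in reliabilities)

print(p_all_wrong)             # 7/400
print(float(p_all_wrong))      # 0.0175
print(float(1 / p_all_wrong))  # about 57.14, i.e. a little under one chance in 57
```

If the tests were not independent, for instance because they all counted the same underlying feature, this product would understate the chance of a shared error.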

This is not to say that we have achieved perfection in authorship attribution tests. Far from it. Considerable problems remain in the practical implementations of tests and in our statistical analyses of the results. Importantly, though, all of the fresh attribution claims made with confidence in the New Oxford Shakespeare are based on multiple, independent tests pointing the same way. This is how the field progresses. When Hope came up with his new sociolinguistic tests for authorship, he applied them to the problems of Fletcher and Shakespeare's shares in All Is True/Henry VIII and The Two Noble Kinsmen and found what everyone else had found by other means. Being manual tests, Hope's can be replicated by anybody willing to take the trouble to learn the distinctions being made and laboriously count their occurrences.

There remain two especially significant obstacles in computer-aided authorship attribution, one practical and one philosophical, and they are related. The practical problem is that the canons of various authors we wish to test are not of an equal size. An extreme case is Kyd's dramatic canon containing only one universally agreed upon play, The Spanish Tragedy. When we aggregate the scores for phrases or collocations that are shared between a play whose author we seek and the plays in LION, we can see right away that any random set of phrases we might look for are more likely to have a match within the dozens of plays by Shakespeare than within the single play by Kyd. We feel compelled to make some adjustment to put authors on an equal footing, but of course reducing every candidate author's canon to one play is hardly the right response. We could in principle try adjusting the weighting of the matches so that we assign more significance to a match with Kyd than a match with Shakespeare, but just what the correct weighting would be is hard to say and investigations in this area are in their infancy.

One reason why it is not obvious how to weight the differences in canon size is that we are using probability in a somewhat metaphorical sense. There clearly is a meaningful sense in which, prior to throwing a die, I may assert that each particular number between 1 and 6 will have a 1/6 chance of coming up. But in what sense can I say that Kyd will have a smaller chance than Shakespeare of matching a phrase from a particular text? There is no random event, no throw of the die, in such a case: the phrase either does or does not feature in Kyd's writing even before I look for it. At best we may say that prior to looking for the phrase in the Kyd canon there is something of a pseudo-chance of finding it based on a likelihood derived from the canon's size (in relation to other canons tested) and treating the phrase we are looking for as if it were a random phrase.

If we find a phrase in a place where we do not expect it, say in the Kyd canon, any likelihood we apportion to that outcome needs to be carefully explained since there was no random process at work, no die was thrown. The figure given for such a probability, say 0.01, might be intended to convey what we would expect to happen if a process were repeated many times. That is, 0.01 might mean that if we searched one hundred times for different random phrases we would expect that just once would one of those phrases turn up in Kyd's writing. On the other hand, if we say that there is a 0.01 chance that Kyd wrote Edward III we clearly do not mean that some process might be repeated 100 times with the result that on one occasion Kyd wrote Edward III. In this case, 0.01 is something more like a statement of plausibility than probability in the die-throwing sense.

Probability is a slippery philosophical concept that brings in complex problems from epistemology and information theory, and its full implications are beyond this historical survey. However, it is clear that aside from all other obstacles to its widespread acceptance—not the least being the unashamed near innumeracy of many humanists—the field of authorship attribution will need in future to ground its uses of probability theory in well-explained philosophical principles that enjoy general approval.
