The Princeton Prosody Archive
Amanda Henrichs, Amherst College
Meredith Martin, Mary Naydan, and Rebecca Sutton Koeser
The Princeton Prosody Archive (PPA) is a full-text searchable database of English-language works on prosody (pronunciation and versification) published between 1570 and 1923, and it is the only resource of its kind. The PPA includes historical documents about poetry, grammar, literary history, phonetics, and phonology, divided into seven scholar-curated Collections that highlight these overlapping discourses and offer pathways into the archive. The PPA is a springboard for new scholarship in historical poetics, linguistics, and eighteenth- and nineteenth-century poetry. We feature essays about the PPA’s use in research and pedagogy in the Editorial section of the site, as well as essays by project team members that further detail the technical process of creating the site, its many unique data problems, and new directions for scholarship. As a curated corpus with unusual text and images, the PPA poses interesting challenges for scholars working with computational methods. Efforts are currently underway to make a token-based corpus available for computational use.
An ongoing project since 2009, the PPA publicly launched the third and most recent version of its site in March 2019. PPA 3.x is implemented as a custom Python/Django web application. It was created following rigorous software development best practices and underwent extensive usability testing; the open source, documented, and citable code base is available on GitHub. A curated set of HathiTrust materials is made locally accessible to the application as a dataset retrieved via rsync, and minimal records are imported into a relational database to enable data curation and management. Metadata and full-text page content for HathiTrust materials are indexed in Apache Solr, which powers the archive search. Unlike many other digital library interfaces, the PPA allows users to search metadata and full text at the same time, with results immediately contextualized by matching thumbnail images and text snippets. PPA users can further refine their searches by including or excluding items based on the scholar-curated Collections. The current version supports only HathiTrust content and manually entered, metadata-only records, but it was designed to support full-text content from other sources. Nothing in the site’s functionality is specific to prosody, and the project was built to be extensible, so that it could be used to create other curated collections from HathiTrust materials on any subject.
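The combined metadata and full-text search with Collection filters described above could be expressed as a Solr query along the following lines. This is a minimal sketch, not the PPA’s actual code: the core URL, the field names (`title`, `author`, `content`, `collections`), and the choice of the edismax query parser are all assumptions made for illustration. Included Collections become a single OR-ed filter query (`fq`), and each excluded Collection becomes a negated `fq`.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the PPA's real core name and schema may differ.
SOLR_SELECT = "http://localhost:8983/solr/ppa/select"


def quote_term(value: str) -> str:
    """Wrap a value in double quotes for a Solr phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search_params(keywords, include=(), exclude=(), rows=20):
    """Build Solr request parameters for one query over metadata and page text.

    Field names below are assumptions for illustration only.
    """
    params = [
        ("q", keywords),
        ("defType", "edismax"),
        # Search metadata fields and full-text page content in a single query.
        ("qf", "title author content"),
        # Ask Solr for highlighted snippets from the page text.
        ("hl", "true"),
        ("hl.fl", "content"),
        ("rows", str(rows)),
    ]
    if include:
        # Keep works that belong to at least one of the chosen Collections.
        joined = " OR ".join(quote_term(name) for name in include)
        params.append(("fq", f"collections:({joined})"))
    for name in exclude:
        # Drop works that belong to an excluded Collection.
        params.append(("fq", f"-collections:{quote_term(name)}"))
    return params


def build_search_url(keywords, include=(), exclude=(), rows=20):
    """Encode the parameters into a Solr select URL (no request is sent)."""
    return SOLR_SELECT + "?" + urlencode(build_search_params(keywords, include, exclude, rows))


if __name__ == "__main__":
    print(build_search_url("iambic", include=["Literary"], exclude=["Dictionaries"]))
```

Keeping Collection membership in filter queries rather than in the main query means Solr can cache each filter independently, and relevance ranking depends only on the search terms.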
The PPA is spearheaded by Meredith Martin, Associate Professor of English and Director of the Center for Digital Humanities (CDH) at Princeton University. Developed as a sponsored project of the CDH beginning in 2016, PPA 3.x is the result of a truly collaborative effort by a nine-member project team, including Technical Lead Rebecca Sutton Koeser; Developers Nick Budak and Benjamin Hicks; User Experience Designers Xinyi Li and Gissoo Doroudian; Project Coordinator Rebecca Munson; and Project Managers Meagan Wilson and Mary Naydan. A full account of the project’s history and contributors, including an advisory board of leading scholars in related fields, is available on the site. The PPA has been supported by several grants, and Martin has presented work related to the PPA across the globe. The project is cited in several recent scholarly books, articles, and digital commons; a partial list is available on the PPA site. The PPA project team is currently working to expand the core collection of public domain HathiTrust materials by partnering with Gale-owned databases such as Eighteenth-Century Collections Online and by including bibliographic records for non-English prosodic resources.
The Princeton Prosody Archive collects works on English prosody published between 1570 and 1923. All works are in the public domain and are drawn from HathiTrust, a digital library of works digitized by libraries and institutions worldwide. The PPA is intended most broadly for scholars of historical poetics: it collates a remarkable number and variety of writings about prosody published in English. The PPA is an outstanding project marked by a coherence of design, vision, and scholarly rigor, evident in the detailed documentation of every level of the project. The site is implemented as a custom Python/Django web application; the backend code is open source and citable on GitHub; the search function is built on Apache Solr; and the texts drawn from HathiTrust are kept up to date via rsync. The frontend design is accessible, attractive, and easily navigable. I would especially like to point to the coherence and simplicity of the user interface, built in version 3.4.2 by Gissoo Doroudian. This is a truly custom-built project, designed from the ground up and supported by funding sources such as Princeton’s Center for Digital Humanities and the Mellon Foundation. Earlier iterations of the PPA have been cited in books and edited collections, and its director, Meredith Martin, has presented the project both nationally and internationally.
The most useful feature of the PPA is the division of its resources into Collections, including such categories as “Typographically Unique” (for works including musical notation or unusual diacritics), “Dictionaries,” and “Word Lists.” Users can search within a single Collection or across all of them; each work is marked as belonging to one or more Collections, enabling cross-navigation. This cross-categorization provokes interesting humanistic questions about the fuzzy borders between rhetoric, poetry, and oratory; Samuel Johnson’s Dictionary, for example, belongs to both the “Dictionaries” and “Literary” Collections. Most broadly, the PPA provides scholars with an invaluable resource for understanding the history (and historical contingency) of concepts like meter and rhyme, which have become fundamental to literary studies as a discipline.
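Because each work can carry several Collection labels, membership is a many-to-many relation. The sketch below models it as a mapping from works to sets of Collection names, which supports both searching within one Collection and listing every Collection a work belongs to for cross-navigation. The data structure and function names are hypothetical; only Johnson’s Dictionary and its two Collections come from the review, and the second record is a placeholder.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory index: work title -> set of Collection names.
# Only the Johnson entry reflects the review; the second record is a placeholder.
COLLECTIONS_BY_WORK: Dict[str, Set[str]] = {
    "Samuel Johnson, A Dictionary of the English Language": {"Dictionaries", "Literary"},
    "Placeholder work with musical notation": {"Typographically Unique"},
}


def filter_works(index: Dict[str, Set[str]], collection: Optional[str] = None) -> List[str]:
    """Return works in one Collection, or every work when no Collection is given."""
    return sorted(
        work for work, names in index.items()
        if collection is None or collection in names
    )


def collections_of(index: Dict[str, Set[str]], work: str) -> List[str]:
    """List every Collection a work belongs to, e.g. to render cross-navigation links."""
    return sorted(index.get(work, set()))


if __name__ == "__main__":
    johnson = "Samuel Johnson, A Dictionary of the English Language"
    print(filter_works(COLLECTIONS_BY_WORK, "Literary"))
    print(collections_of(COLLECTIONS_BY_WORK, johnson))
```

Storing a set per work, rather than assigning each work to a single category, is what lets one record appear under both “Dictionaries” and “Literary” without duplication.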
Searching the archive is simple and rewarding. The team has spent years de-duplicating HathiTrust’s records, and the result is a clear list of texts containing the search terms, without some of the frustrating redundancy of a search in Early English Books Online or Eighteenth-Century Collections Online. Returned citation records can be exported to Zotero. Results also link to the digitized text in HathiTrust, and the researcher can view high-quality page images as well as OCR-generated text. The PPA’s next step is apparently to work with HathiTrust to create token-based corpora, which will be an exciting development. As it stands, however, the plain-text view of a result is almost always illegible due to OCR errors, so a researcher can read the digitized book but cannot rely on any kind of transcription. This is probably the biggest weakness of an all-around excellent project. It will most affect researchers interested in earlier periods, since OCR quality degrades progressively the earlier the text (often because of special characters such as the long s), as well as researchers who wish to perform computational analyses. For scholars of English prosody in the eighteenth and nineteenth centuries in particular, who do not need token-based versions of the texts, the PPA is an invaluable resource for considering the histories of English-language versification.