A review of Linked Archives, a digital tool for linked data in archives, led by Jason Ensor and Helen Bones
Jason Ensor, Western Sydney University
Helen Bones, Western Sydney University
Spencer D. C. Keralis, Southern Methodist University
Jason Ensor and Helen Bones
Linked Archives is a digital tool developed at Western Sydney University as part of the ARCHIVER project led by Jason Ensor (2016-19). Linked Archives is a workflow and model for exploiting the explanatory power of archives in a way that allows other researchers to use the resultant annotated data set, build on it, and, if desired, link different datasets together to form a digital mega-repository. In this way, big questions can be asked of multiple datasets.
The aim of the project is to revamp an archive curation tool and explore ways we can use it to harness the untapped potential of large manuscript collections. In particular, we are interested in the potential of Linked Archives to unravel the complexities of international systems that elude traditional research structures, such as the international book trade. The initial iteration houses and displays over 19,000 digital images of documents from the business archive of iconic Australian publisher Angus & Robertson from the State Library of New South Wales manuscript collection.
The final product is a tool that enables reflective humanistic practice by using metadata to think through thousands of historical documents. It also allows all images and data to be downloaded in formats (XML, RDFa, Turtle, N-Triples) well suited to external digital environments that exploit linked open data concepts, as well as in CSV. Like many historical sources, documentary metadata can be incomplete, but Linked Archives supports scholars in building an understanding of formerly print-based Australian historical materials by working with, and adding to, their metadata.
More widely, the project’s methodology around metadata curation is already seeing national and international impact. An important gap that Linked Archives is filling is the provision of a digital framework that merges four data sets — digitized data of unrestricted archival records (scanned analog materials), bibliometric data from linked resources, institutional data (which includes information about a collection’s or document’s accession and arrangement), and qualitative data (information like annotations and tagging which describe an item) — in ways that respond to user expectations of access, delivery online, and free-text search of analogue materials. Several invited presentations have been made by Jason Ensor and Helen Bones at the Angus & Robertson Symposium, State Library of New South Wales (an event inspired by the ARCHIVER project), the Australian Library and Information Association (ALIA) Information Online Conference, and the Division of Literatures, Cultures and Languages at Stanford University (United States of America). These talks have catalyzed further talks and interest from other institutions. Linked Archives has been funded by the Australian National Data Service (now the Australian Research Data Commons).
Spencer D. C. Keralis
The relationship between digital preservation and digital humanities is one of the many contested borderlands between the humanities and the information sciences. Digital humanities has typically referred to the development and use of digital tools and computational methods to complement and expand humanistic inquiry, whereas digital preservation, including the large-scale digitization projects on which many digital humanities projects rely, falls firmly in the province of digital librarianship, where practitioners generally self-identify as information scientists. This boundary has been troubled recently with the advent of the Collections as Data movement (Padilla et al. 2017), and there have been interventions seeking to expand the definition of digital humanities to include the “curation, preservation, and dissemination of archival materials” (Brannock et al. 2018, 165). Linked Archives is a project that constructively bridges this divide, finding an interesting middle ground between digital scholarship and digital preservation.
Funded by the Australian National Data Service (ANDS), Linked Archives evolved out of Jason Ensor’s dissertation work and includes digitized copies of “noteworthy parts” of the Angus & Robertson archive of materials related to publishing and book selling in 20th-century Australia. While the collection of 18,000 images represents “only a small fraction of the total State Library holdings,” as the project information page notes, it is a rich trove of correspondence and manuscript primary sources for anyone interested in the history of the book and the business of publishing in Australia. Complementing the digitized curation, preservation, and dissemination of archival materials, each item includes robust metadata derived from the holding institutions’ records. The metadata is downloadable in multiple formats to enable analysis and exploration, and the project does include some online visualization tools.
The humanistic value of the project lies in the ways it reveals social networks across institutions, collections, and manuscript materials through the semantic links within the metadata. The visualizations derived from the metadata enable snapshots of correspondence networks that reveal relationships among businesses and individuals, which would be challenging and incredibly time-consuming to derive from physical archives. The site provides a particularly rich environment for exploring connections between documents in these collections. This is one of the best implementations of linked data on a project scale that I’ve seen, and it allows for implementation of the broad principles of collections-as-data in constructive and interesting ways.
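As a minimal illustration of the kind of analysis the downloadable metadata makes possible, the following Python sketch counts how often each pair of correspondents appears together in a CSV export and how many distinct correspondents each name is linked to. The sample rows and the column names `sender` and `recipient` are hypothetical and stand in for whatever fields the actual export provides. This is a sketch under those assumptions, not a description of how the project's own visualizations are built.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata export. The column names "sender" and "recipient"
# and the rows below are assumptions for illustration only; they are not
# the project's actual schema or data.
SAMPLE_CSV = """item_id,sender,recipient
1,Publisher A,Author B
2,Author B,Publisher A
3,Publisher A,Bookseller C
"""


def correspondence_edges(csv_text):
    """Count documents exchanged between each unordered pair of names."""
    edges = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        sender = row["sender"].strip()
        recipient = row["recipient"].strip()
        if sender and recipient:  # skip rows with incomplete metadata
            edges[frozenset((sender, recipient))] += 1
    return edges


def degree(edges):
    """Count the distinct correspondence links attached to each name."""
    counts = Counter()
    for pair in edges:
        for name in pair:
            counts[name] += 1
    return counts


edges = correspondence_edges(SAMPLE_CSV)
for pair, n in edges.items():
    print(sorted(pair), n)
```

Treating each pair as unordered means a letter and its reply strengthen the same link; a study concerned with who initiated correspondence would instead keep the direction by using an ordered tuple as the key.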
The project organizers make a generalized claim that “big questions can be asked of multiple datasets,” but as implemented those questions must be asked exclusively of the metadata, not the materials themselves. While generally digitization is not preservation, in this instance providing access to digital copies of these fragile materials does contribute to their long-term viability. However, the absence of optical character recognition (OCR) eliminates full-text searching and text data mining as options for exploring the collection. While the aim of the project is primarily technological (the implementation of linked data to explore relationships among the documents), some humanist users will likely find this omission frustrating.
There are further challenges in the site in terms of both usability and accessibility. Some of the ledgers are photographed at an angle, which makes the tabular information difficult to read and would also make OCR very challenging. I don’t find a rationale for this practice in the project documentation but speculate that it may be because of shine on the original pages or other readability challenges in the original manuscripts. The project may benefit from the addition of a description of the digitization process and the decisions involved in formatting the images as they appear. Further, the display images lack descriptive alt-text, though one struggles to imagine what alt-text for a manuscript ledger might look like. The descriptive metadata on the display page likely provides enough context for users of adaptive technology, but without OCR their engagement with the materials would be limited to the metadata alone. These usability and accessibility issues ultimately point to the challenges of digital projects that span institutions and projects that work with legacy, digitized materials. Without shared commitments to accessibility that go beyond user experience (UX) testing or accessibility compliance, subsequent projects will have to address these oversights. These are emerging challenges for digital humanities and digital preservation broadly as fields, not just for this project.
Overall, this is an excellent digital preservation project that provides access to a rich archive of materials documenting the business of publishing in Australia. As a collections-as-data project, it is quite successful and provides an entrée for the innovative use of metadata to explore the networks documented in the materials. The project is ripe for further development to fulfill its potential, including providing text and tabular data derived from the materials, and offers opportunities to experiment with improvements to accessibility for archival and manuscript materials to increase access to a wider range of users. While the distinction between digital humanities and digital preservation may not be as hard and bright as has traditionally been thought in both the humanities and the information sciences, Linked Archives adumbrates the challenges of bridging these distinct fields effectively, and of meeting the demands of the eclectic interdisciplinary audiences that make up the broad digital scholarship community.
Brannock, J., C. Carey, and J. O. Inman. 2018. “Starting from the Archives: Digital Humanities Partnerships, Projects, and Pedagogies.” In Digital Humanities, Libraries, and Partnerships, edited by Robin Kear and Kate Joranson, 163-176. Oxford: Chandos Publishing. DOI: 10.1016/B978-0-08-102023-4.00012-4.
Padilla, T., L. Allen, H. Frost, S. Potvin, E. Russey Roke, and S. Varner. 2018. Always Already Computational: Collections as Data. DOI: 10.17605/OSF.IO/MX6UK.