The Distant Viewing Toolkit
Patrick Sui, University of Texas at Austin
Taylor Arnold & Lauren Tilton
The Distant Viewing Toolkit (DVT) is a software library designed to facilitate the computational analysis of visual culture. The toolkit applies state-of-the-art computer vision algorithms to digitized collections of still and moving images in order to automatically create structured data. Features represented in the structured data include the dominant colors found in the images, cut detection, object detection and recognition, face detection and recognition, and image embeddings. The features map onto concepts from fields such as film, media, and visual culture studies. For example, face detection can be used to analyze shot type. The structured features can be aggregated and analyzed to find patterns across a corpus or used to improve search and discovery within a large archive. The toolkit is provided as a module for the open-source programming language Python, a popular language for research in both computer vision and machine learning.
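As a rough illustration of how a face-detection feature might map onto shot type, the following Python sketch labels each frame by the height of its largest detected face relative to the frame and then aggregates the labels across frames. It does not use the toolkit's own API: the `FaceBox` structure, the function names, and the 0.5 and 0.2 cut-offs are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FaceBox:
    """Bounding box of one detected face, in pixels (hypothetical structure)."""
    top: int
    bottom: int
    left: int
    right: int


def classify_shot(faces, frame_height):
    """Label a frame using the tallest face's share of the frame height."""
    if not faces:
        return "no face"
    tallest = max(face.bottom - face.top for face in faces)
    ratio = tallest / frame_height
    # Illustrative cut-offs only (assumptions); a real study would calibrate them.
    if ratio >= 0.5:
        return "close-up"
    if ratio >= 0.2:
        return "medium shot"
    return "long shot"


def shot_type_counts(frames, frame_height):
    """Aggregate per-frame labels into counts across a sequence of frames."""
    return Counter(classify_shot(faces, frame_height) for faces in frames)


# Each inner list holds the faces detected in one frame of a 480-pixel-high video.
frames = [
    [FaceBox(top=100, bottom=400, left=0, right=200)],
    [FaceBox(top=0, bottom=100, left=0, right=50)],
    [FaceBox(top=0, bottom=40, left=0, right=20)],
    [],
]
counts = shot_type_counts(frames, frame_height=480)
```

Counts of this kind could then be compared across episodes, films, or collections, which is the sort of corpus-level aggregation the overview describes.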
The design and motivation for the toolkit also come from the authors’ theoretical work on “distant viewing,” the concept that the challenges of a computational analysis of large visual corpora can be understood in terms of visual semiotics and communication theory.1 In conversation with critical work on computer vision, the toolkit provides access to and control over how and what a user decides to view through computer vision algorithms. The toolkit allows users to apply a sequence of algorithms to a corpus with various degrees of customization. The default pipeline works well for many analyses. For more control, and to improve performance on some input types, the toolkit makes it straightforward to modify additional tuning parameters; it is even possible to create entirely new annotation algorithms within the framework of the software.2 The toolkit also includes a module for visualizing and checking the automated annotations through an interactive website. The toolkit, therefore, is designed to give users the flexibility to customize their analyses and to pose the questions that animate scholars who work with digital humanities methods, while understanding the assumptions being made by the software.
The primary audience for the toolkit is people working with collections of still and/or moving images who have some experience writing code but no specialized knowledge of computer vision or machine learning algorithms. In order to make the toolkit accessible, we have developed numerous resources and workshops for introducing the method to audiences from different backgrounds. The current project website has a direct link to a Google Colaboratory environment, which makes it easy for users from any background to get started with the toolkit.3 Another link provides a demo of the possibilities of visualizing extracted data from several feature-length films. In June 2019, we ran a week-long workshop through the Humanities Intensive Learning and Teaching (HILT) program. The workshop notes, including Python notebooks, are available for a more extensive introduction to computer vision and its applications in the humanities.
DVT was developed with attention to best practices for software development in the digital humanities and open-source software communities. We follow the guidelines established by the Journal of Open Source Software, include an open-source license (GPL-2), have extensive unit testing, and use the Contributor Covenant Code of Conduct (v1.4). The project has numerous forms of documentation, including in-line Python comments, full API documentation, and several online tutorials. Additionally, a peer-reviewed technical paper outlines the major design decisions of the toolkit.4 In order to be consistently mindful of both our technical and humanities-oriented audiences, every stage of development was carried out in direct collaboration between the project directors, who have backgrounds in statistics and data science (Arnold) and visual culture studies (Tilton).
More information about the toolkit is available on the project website, in our technical design paper,5 and in published research papers outlining our applications of the toolkit to the large-scale analysis of television series6 and historic photography.7 The toolkit is built and maintained by the Distant Viewing Lab (www.distantviewing.org). Development of the toolkit has been supported by the National Endowment for the Humanities Office of Digital Humanities (HAA-261239-18).
The Distant Viewing Toolkit (DVT) is an open-source project that gives humanities scholars access to the latest computer vision and machine learning algorithms to interrogate visual culture. Driven by the theory of distant viewing, it aims to lay bare “the interpretive nature of extracting semantic metadata from images” by unpacking “a representation of elements contained within the visual material” with digital methods.8 The project engages in meaning making for large visual corpora by using algorithms that interrogate how raw materials are converted into semiotic symbols, a cognitive process that happens unconsciously while we view pictures and videos. While the title of the tool might imply objectivity, DVT actually asserts that all visual content, and the technologies used to produce and analyze it, are socially and culturally constructed. This echoes much of what American studies, visual studies, and cultural studies more generally have been arguing for decades about photographs, moving images, and other forms of visual representation. From detecting narrative arcs within television and film to identifying stylistic features in photographs, DVT offers humanists another avenue to analyze the plethora of visual evidence in digitized and born-digital forms.
While DVT provides technical tutorials and an API to users in addition to the GitHub repository, future developments for the project might consider broadening its appeal to teachers. The development of pedagogical materials and sample datasets might extend the reach of the tool into digital humanities classrooms.