A review of In Search of the Drowned, a corpus development and data mining project that explores how to give voice to voiceless Holocaust victims, directed by Gabor Mihaly Toth
In Search of the Drowned: Testimonies and Testimonial Fragments of the Holocaust
Gabor Mihaly Toth, Yale University
Jan Burzlaff, Harvard University
Gabor Mihaly Toth
During the Holocaust, 5.8 million Jewish people were killed. Most of the victims did not leave behind any records that could help reconstruct their experiences. While survivor history has been well studied in the last decades, how millions of voiceless victims experienced their persecution has remained terra incognita. Generally, while perpetrator history is well documented, the voiceless victims’ perspectives have resisted any form of documentation. Their emotional and mental experiences conveyed through novels and memoirs have remained fragmented, and these experiences have often been dismissed as subjective and unreliable.
Today, digital history and digital humanities offer new forms of inquiry and representation. They can unlock the emotional, mental, and physical realities in which voiceless victims of the Holocaust or other genocides were forced to live. To address the experience of the voiceless, In Search of the Drowned brings together theoretical considerations underlying genocide and Holocaust studies with new practices of digital scholarship. Most importantly, it elaborates on and features new digital representations that symbolically give voice to the voiceless victims. The audience of the project consists of four key groups: educators, students, researchers, and general audiences interested in the Holocaust.
The project was developed by Gabor M. Toth (project conceptualization, data processing, backend development, scholarly argumentation, content development), in collaboration with the Yale Digital Humanities Lab (frontend development, user experience and web design) and the Yale Fortunoff Archive (funding and project facilitation).
This project consists of three core sections. The first section is a data edition of 2,700 English language testimonies (including original videos) from three major U.S. collections. The project is empowered by the corpus engine BlackLab, which makes testimony transcripts searchable as a linguistic corpus. This section also includes a tutorial on how users can take advantage of corpus linguistic approaches to study Holocaust experiences.
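As a rough illustration of what it means to treat transcripts as a searchable linguistic corpus, the following minimal Python sketch builds a keyword-in-context (KWIC) view over plain text. It does not use BlackLab or its query language; the function name `kwic`, the document identifiers, and the sample sentences are all assumptions made for this example, and the sentences are invented placeholders rather than quotations from any testimony.

```python
import re


def kwic(transcripts, keyword, width=3):
    """Return keyword-in-context hits as (doc_id, left, match, right) tuples."""
    hits = []
    for doc_id, text in transcripts.items():
        # Lowercase and keep word tokens only; punctuation is dropped.
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((doc_id, left, tok, right))
    return hits


# Invented placeholder text (hypothetical IDs), not taken from the corpus.
transcripts = {
    "doc-001": "We waited at the station until the train came.",
    "doc-002": "The train was late, so we walked home.",
}

hits = kwic(transcripts, "train")
for doc_id, left, match, right in hits:
    print(f"{doc_id}: {left} [{match}] {right}")
```

A dedicated corpus engine adds indexing, annotation-aware queries, and links back to the source documents, none of which this sketch attempts.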
The second section features a digital representation aiming to give voice to the voiceless. It is an inventory presenting some aspects of the collective experience of persecutions. The inventory is the visualization of those mental, emotional, and physical experiences that any victim, including the voiceless, must have gone through. In practice, the digital inventory presents certain episodes of persecutions that are recurrent in the 2,700 English language survivor testimonies. Elements of the inventory — named testimonial fragments — have been retrieved by means of data mining techniques and visualized with the help of a hierarchical tree. Since the experiences that the inventory features are directly connected to complete and unabridged testimony transcripts, readers can also study and read these experiences in their original contexts.
The digital inventory memorializes the voiceless by presenting the possible episodes of persecution they must have faced. At the same time, it opens reading paths along the mental, physical, and emotional experiences of persecution in thousands of testimonies. The digital inventory is an innovative exploratory tool that presents various aspects of persecution from victims’ perspectives. From a scholarly point of view, the inventory draws on a key argument developed throughout a collection of essays that this project features: through the collective experience of persecution it is possible to reconstruct a mosaic of possible physical, emotional, and mental episodes that murdered victims must have gone through.
The third section is the collection of scholarly essays that develop the theoretical and methodological underpinning of the digital representation rendering the experience of the voiceless. It focuses on three scholarly themes (experience of murdered victims, recovery of collective experience from testimonies, representation of the collective experience) in three respective parts, preceded by a prologue and followed by an epilogue. In addition to the discussion of scholarly ideas, the collection of essays also involves the ideas and experiences of victims themselves. Indeed, some arguments throughout the essays are developed by reflecting on leitmotifs in testimonies. The collection of scholarly essays actively uses digital storytelling to connect its arguments with victims’ experience. The third section also explains the digital methodology used as part of this project and gives a detailed description of the 2,700 testimonies that the project features.
The significance of this publication is twofold. Until recently, survivors could keep up a living memory of murdered victims. Following the death of the last survivors, it is an open question who will carry the memory of the voiceless. There are approximately 100,000 Holocaust testimonies dispersed in the archives of the world. It has often been assumed that the voiceless are implicitly present in these testimonies. In the words of Geoffrey Hartman, a Holocaust survivor and scholar, as well as one of the founders of the Yale Fortunoff Archive, in their testimonies survivors "speak for the dead and in the name of the dead." Today, with the imminent death of the last survivors, it is crucial to reflect on how the implicit presence of the dead in thousands of testimonies can be made explicit to the next generations. This is an important challenge for historians, archivists, and curators.
The digital humanities have arrived in Holocaust and genocide studies. Gabor Mihaly Toth’s In Search of the Drowned belongs to this welcome trend. This pathbreaking publication, in collaboration with the Yale Digital Humanities Lab and the Fortunoff Video Archive, is several things at once: a database of digitized testimonies, a method of representing key experiences of the “voiceless” 5.8 million Jews killed during the Holocaust, and an invitation to (finally) explore their experiences with new tools.
Until recently, personal experiences of Jewish people have been left out of histories of the Nazi genocide, particularly the experiences of those who left no trace. Inspired by his computer-assisted research on Renaissance literacy and his family’s personal history between 1939 and 1945, Toth convincingly moves us one step further. He argues that we should think about the experiences of those who perished with and through those of people who survived. Two central arguments emerge. The first is the world of possibilities: shared experiences gathered through recurrent mentions, such as forced nudity, consolation in a shared fate, or a slow, emotional death. The second argument rests on the notion of the testimonial fragment, one piece of collective suffering that grants “always incomplete but still sufficient” access to the most intimate yet collective experience of persecution.
The impressive methods used in the project include natural language processing using the BlackLab corpus engine, which makes transcripts searchable, and Stanford Parser, which helps annotate them. Scholars who start on this journey will particularly appreciate the useful tutorial offered in the separate “Methodology” section. It details the divergent data sets (notably regarding length and the institutional protocols used to record interviews) and their processing to form a database, including the annotation in three steps with the Stanford Parser: sentence splitting, tokenization, and part-of-speech tagging. Having worked with video testimonies on a similar scale, I applaud Toth for his painstaking work of processing and harmonizing the metadata.
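The three annotation steps named above (sentence splitting, tokenization, and part-of-speech tagging) can be sketched in a few lines of Python. This is a toy stand-in under stated assumptions, not the Stanford Parser and not the project's pipeline: the regular expressions, the hand-made `LEXICON` with Penn Treebank-style tags, the `UNK` fallback, and the sample sentences are all invented for illustration.

```python
import re

# Tiny hand-made lexicon (an assumption for this sketch). A real tagger
# uses a trained statistical model rather than a lookup table.
LEXICON = {
    "the": "DT", "we": "PRP", "at": "IN",
    "train": "NN", "station": "NN",
    "waited": "VBD", "left": "VBD", ".": ".",
}


def split_sentences(text):
    # Step 1: split after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Step 2: separate words from individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag(tokens):
    # Step 3: look each token up in the lexicon; unknown words get "UNK".
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def annotate(text):
    # Run the three steps in order, one tagged token list per sentence.
    return [tag(tokenize(s)) for s in split_sentences(text)]


# Invented placeholder text, not taken from any testimony.
sample = "We waited at the station. The train left."
annotated = annotate(sample)
for sentence in annotated:
    print(sentence)
```

Keeping the steps as separate functions mirrors the staged design the section describes: each stage consumes the previous stage's output, so any one of them could be swapped for a more capable tool without touching the others.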
Anyone working on genocides and mass violence must reckon with the two propositions: worlds of possibilities and testimonial fragments. Though I agree with them, historians like me will need more consideration of specific, local contexts to benefit from what Toth offers. How would places of recording other than the U.S., which dominates the 2,700 testimonies he selected for the corpus, affect his findings? Recent studies have shown the importance of cultural contexts for highlighting or neglecting certain leitmotifs. Yet the biggest challenge is also the most promising. To bring my discipline (fully) on board and get historians used to the idea of probability in Big Data, digital projects should center on specific towns, regions, and occupation zones to encourage a transnational focus beyond the camps and ghettos offered here.
To me, areas for growth relate to this set of collective experiences: in a world of possibilities, powerfully visualized in the “Fragments” section, there are also evident differences prior to mass murder, rooted in prewar biographies and local relations. Data mining and processing of the kind Toth undertakes will help shift us further away from the (merely) empirical reconstructions that have dominated Holocaust studies. That these questions can be asked at all shows the impact that Toth’s excellent project will have on all of us in the coming decades.