
Review: War History on the Semantic Web

A review of War History on the Semantic Web, three interconnected projects that create a rich and comprehensive picture of WWII data from Finland, directed by Eero Hyvönen

Published on May 28, 2024

Project
War History on the Semantic Web

Project Director
Eero Hyvönen, Aalto University and University of Helsinki

Project URLs
WarSampo: https://seco.cs.aalto.fi/projects/sotasampo/en/ 
WarVictimSampo 1914-1922: https://seco.cs.aalto.fi/projects/sotasurmat-1914-1922/en/ 
WarMemoirSampo: https://seco.cs.aalto.fi/projects/war-memoirs/en/

Project Reviewers
Michelle Meagher, University of Alberta
Jana Smith Elford, Medicine Hat College
Sebastian Heath, New York University


Project Overview

Eero Hyvönen

This project concerns developing three in-use systems for publishing and studying Finnish 20th century war history on the Semantic Web: WarSampo, WarVictimSampo 1914-1922, and WarMemoirSampo. These systems include a Linked Open Data (LOD) service and a SPARQL endpoint on the Linked Data Finland platform (https://ldf.fi), and an in-use semantic portal on top of it. The systems are based on the so-called Sampo Model and are part of the larger Sampo series of systems.

In Sampo systems the idea is to aggregate and enrich heterogeneous, distributed datasets into harmonized knowledge graphs, based on a shared ontology infrastructure. The services are then used for data analyses in digital humanities with tools such as Google Colab and Jupyter Notebooks, and for developing ready-to-use applications, such as the Sampo portals, where faceted search and browsing are integrated seamlessly with data analytic tools.
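To make the idea of harmonization concrete, here is a minimal Python sketch, assuming two hypothetical sources that name the same fields differently. The mapping table stands in for a shared ontology; the source names, field names, and values are illustrative only and are not drawn from the Sampo data model.

```python
# Hypothetical mapping from each source's field names to shared
# property names; a stand-in for a shared ontology, not the real one.
SHARED_PROPERTIES = {
    "source_a": {"nimi": "name", "syntymaaika": "birth_date"},
    "source_b": {"full_name": "name", "born": "birth_date"},
}


def to_triples(source: str, record_id: str, record: dict) -> set[tuple[str, str, str]]:
    """Convert one source record into (subject, property, value) triples,
    renaming fields via the shared mapping and dropping unmapped fields."""
    mapping = SHARED_PROPERTIES[source]
    subject = f"{source}:{record_id}"
    return {
        (subject, mapping[key], value)
        for key, value in record.items()
        if key in mapping
    }


# Two differently shaped records end up in one graph with shared properties.
graph = to_triples("source_a", "1", {"nimi": "Example Person", "syntymaaika": "1900-01-01"})
graph |= to_triples("source_b", "x7", {"full_name": "Example Person", "born": "1900-01-01", "note": "unmapped"})
```

Note that the two subjects stay distinct here: harmonizing vocabulary is a separate step from deciding that two records describe the same person.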

WarSampo[1] aggregates data about the Second World War (WW2) in Finland from some 20 data sources and several collaborating organizations. The core dataset includes all 95,000 death records of fallen Finnish soldiers from the National Archives. A key innovation of WarSampo is its attempt to automatically reassemble the life stories of the soldiers by linking data. The portal has had over a million distinct users, typically people trying to find information about their lost relatives. The data has also been used for data analyses. WarSampo received the international LODLAM Open Data Prize in 2017.
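The linking step can be illustrated with a deliberately simple deterministic matcher: records from two sources are linked when their normalized name and birth date agree exactly. This is a sketch under that assumption, not WarSampo's actual linking method; the record fields (`id`, `name`, `birth_date`) are hypothetical, and real historical data usually calls for fuzzier, probabilistic matching.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so spelling
    variants such as 'Hyvönen' and 'Hyvonen' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def link_records(death_records, other_records):
    """Return (death_id, other_id) pairs whose normalized name and
    birth date both match exactly."""
    # Index the death records by their comparison key.
    index = {}
    for rec in death_records:
        key = (normalize(rec["name"]), rec["birth_date"])
        index.setdefault(key, []).append(rec["id"])
    # Look up each record from the other source in that index.
    links = []
    for rec in other_records:
        key = (normalize(rec["name"]), rec["birth_date"])
        for death_id in index.get(key, []):
            links.append((death_id, rec["id"]))
    return links
```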

WarVictimSampo is a smaller related system based on the death records and battles of the Civil War in Finland and Kindred Wars during 1914-1922.[2]

WarMemoirSampo[3] demonstrates the idea of publishing and watching videos on the Semantic Web, with a focus on memoirs of WW2 veterans. The system enables scene segments in videos to be searched by their semantic content. While watching a video, additional contextual information is provided dynamically. The system is based on the WarSampo infrastructure and a knowledge graph that has been extracted automatically from time-stamped textual natural language descriptions of the video contents.
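A minimal sketch of how time-stamped segments can support both semantic search and playback-time context, assuming each segment carries a set of concept identifiers. The `Segment` type and the concept IDs are hypothetical and do not reflect WarMemoirSampo's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A time-stamped scene segment of a video, annotated with concept IDs."""
    video_id: str
    start_s: float
    end_s: float
    concepts: set[str] = field(default_factory=set)


def find_segments(segments, concept):
    """Semantic search: all segments annotated with the given concept,
    ordered by video and start time."""
    hits = [s for s in segments if concept in s.concepts]
    return sorted(hits, key=lambda s: (s.video_id, s.start_s))


def context_at(segments, video_id, t):
    """Concepts to show as context while video_id is playing at time t
    (a segment is active on the half-open interval [start_s, end_s))."""
    active = set()
    for s in segments:
        if s.video_id == video_id and s.start_s <= t < s.end_s:
            active |= s.concepts
    return active
```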

We believe that the more we know about the costs of war, the fewer wars there will be in the future.


Project Review #1

Michelle Meagher and Jana Smith Elford

As an LOD project, WarSampo is an excellent example of how data from multiple sources (often described by LOD researchers as “silos”) can be drawn together to provide a richer view of an abundance of information. It is one of several related LOD projects, including WarMemoirSampo and WarVictimSampo, conducted by the Semantic Computing Research Group, or SeCo, run jointly at Aalto University and the University of Helsinki. These interconnected projects are all called sampo, a Finnish term that refers to a mythical and even magical machine; it is, as project leader Eero Hyvönen points out, “a kind of ancient metaphor of technology.”[4] The sites are innovative and expansive experiments in LOD computing that are also deeply committed to drawing attention to the human cost of war. WarSampo urges researchers to think about how LOD modeling of historical data can contribute to peace activism, a contribution that is particularly important given the ongoing impact of the past on current global conflicts.

WarSampo’s platform is straightforward and generally easy to navigate, a remarkable feat given how much data it contains: over 9 million RDF triples with information about the 95,000 deaths of Finnish soldiers killed in action, the activities of more than 500 military units, 23,000 war diaries, thousands of magazine articles, 160,000 photos, and various historical maps, alongside a rich knowledge graph of WW2 events. The sampo keeps this volume from overwhelming users by providing numerous ways into the data, which the team describes as “perspectives.” The site currently accommodates visitors interested in nine different areas, including the recently added category of prisoners of war. Each perspective brings a visitor to a slightly different landing page that offers faceted search over data modeled in CIDOC-CRM’s event-based ontology, drawn from some 20 autonomously constructed sources including the National Archives of Finland, the National Land Survey of Finland, the Finnish Literature Society, the Wikimedia Foundation, and Aalto University, among others.
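Faceted search of the kind each perspective offers typically works by filtering on the facets a user has already selected and then counting the remaining values of another facet. The sketch below assumes items are plain dictionaries with hypothetical field names; it illustrates the general technique, not the Sampo portals' implementation.

```python
from collections import Counter


def facet_counts(items, selections, facet):
    """Count the values of `facet` among items that satisfy every
    currently selected (facet, value) pair."""
    matching = [
        item for item in items
        if all(item.get(f) == v for f, v in selections.items())
    ]
    # Items lacking the facet are skipped rather than counted as None.
    return Counter(item[facet] for item in matching if facet in item)
```

Recomputing the counts after each selection is what lets a faceted interface show users, before they click, how many results each remaining choice would yield.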

WarVictimSampo applies the same logic to earlier conflicts between 1914 and 1922: WWI, the Finnish Civil War, and the Kindred Wars. Visitors to this site can explore two perspectives: war victims and battles. WarMemoirSampo, launched in 2021 and still almost entirely in Finnish, provides access to video interviews with WWII veterans and employs LOD technologies to let users search the content of the videos, which have been semantically enriched using natural language processing. The sampos are conceived as interconnected LOD projects, and work continues toward their full integration.

The sites are particularly valuable for users seeking to enrich their knowledge about a specific topic. For instance, searching for the name of a relative readily returns details such as their rank, unit, age, death, cause of death, and place of burial. Searching for village names returns maps, photographs, or magazine articles detailing war-related activities at that site. A user less familiar with Finnish history can browse a search page and uncover the political and historical facts of a range of battles. Among the highlights of WarSampo’s integration of Finnish war data are the innovative, interactive timeline visualizations that locate wartime activities (battles, deaths, political activities) both spatially and temporally. Notably, maps throughout the site include hotspots that represent Finnish war dead. The persistence of this data on each map attests to the team’s commitment to rendering visible the human cost of war.

WarSampo is conceived as a tool for understanding the extensive impact of war on Finland, and it provides a model for expansive, interconnected future projects centered on different war experiences. Among the features we most appreciate are the site’s easy navigation tools and sophisticated data visualizations, as well as its responsible, ethical approach to the data represented in machine-readable form. On the technical front, we appreciate the Sampo Model’s commitment to a shared, open ontology structure: ontologies, datasets, and knowledge graphs are all freely available for researchers who want to learn about the processes employed by the SeCo research group. Moreover, the SeCo project more generally provides valuable resources to researchers, as team members have written extensively about the project in the decade since it was first launched; in this way it makes a vital contribution to LOD research.[5]

There has been an effort to provide English translations for most records and pages, and we were able to use the translation tools in our web browsers to get reasonably clear translations of the Finnish-only pages. We note that translating the oral histories and video interviews included in WarMemoirSampo would expand the reach and impact of the project. Finally, the creators might consider providing additional context about the war activities the sites represent. Though these details are embedded in the timelines, as the audience of the sampos grows, an introduction to the political context of Finnish history and military engagements would help orient users to the sampos’ content. We would also recommend that the design approach and general mission of these interrelated sampos be highlighted on the main sites. LOD digital humanities projects enable researchers to manipulate enormous amounts of data and to connect data from multiple sites. WarSampo is exemplary in this regard. As a historical project that links data related to war casualties, WarSampo has the additional capacity, and responsibility, to enable users to contextualize data in ways that further our shared understanding of the costs of war.


Project Review #2

Sebastian Heath

The work presented under the rubric War History on the Semantic Web is easy to praise as a very well implemented and compelling site that makes good use of the principles of linked open data (LOD). The project encompasses a series of interconnected sites that use LOD and the related practices of the Semantic Web to encode and to make available a richly connected resource that has at its core the digitization of the personal experience of Finnish and other participants in the wars on Finnish and nearby territory from the period 1914-1922 — which of course overlaps with World War I — and from World War II. 

I do note that the particular links in the overview take readers to descriptions of the sites, not the sites themselves. The direct links that enable user-initiated browsing are https://www.sotasampo.fi/en/ for WarSampo (WW2), https://sotasurmat.narc.fi/fi/ for WarVictimSampo 1914-1922, and https://sotamuistot.arkisto.fi/ for WarMemoirSampo (interviews of WW2 veterans and related videos). Readers may be interested to click through on those. For their part, the three links in the overview direct readers to technically inflected descriptions of these websites. Taken together, these descriptions emphasize that the sites use a shared ontology and a shared approach to faceted search, and that they draw from (if not directly share) a Resource Description Framework (RDF) graph. This is all to be praised, in part because the sites very much demonstrate the implementation of best practices in the creation and presentation of LOD.

An example will illustrate this. Clicking on "Prisoners of War" on the WarSampo website takes one to a list of names accessed by the semantically clear URL: https://www.sotasampo.fi/en/prisoners/. The resulting page is essentially a list, and clicking on the first item — the name Aapeli Alarik Aalto — takes one to a table-oriented page of information about an unmarried (at the time of his service) Finnish Lance Corporal, born in 1920, captured in June of 1944, and released not too long afterwards, who lived until 2014. The WarSampo pages indicate that the greatest use of the site is by people looking up individuals — perhaps relatives — and I suspect that it can be a meaningful experience to do so.

From the perspective of LOD best practices, the page of information about Mr. Aalto exhibits many of them. The page itself has a straightforward URL: https://www.sotasampo.fi/en/persons/person_wp1 (the URLs for people that I saw followed the same pattern, with a longer serial number after “_wp”, meaning that the pattern is consistent and the links are easily shareable). The page links to other named entities, such as sites, nationalities, and occupations, that are identified by similarly straightforward URLs. The identifiers are URL-escaped so that the dereferenceable information can be displayed within the framework of the WarSampo site; for example, the URL for “farmer” (http://ldf.fi/ammo/maanviljelija) is embedded in the WarSampo site as the messier URL https://www.sotasampo.fi/en/page/?uri=http:%2F%2Fldf.fi%2Fammo%2Fmaanviljelija.
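The embedding pattern described above can be reproduced with Python's standard library: percent-encoding the concept URI while leaving `:` unescaped yields exactly the `?uri=` form seen on the site. The helper names are hypothetical, and nothing here claims to be how WarSampo itself builds these links.

```python
from urllib.parse import quote, unquote

# Site-local page prefix, as observed in the embedded "farmer" URL.
SITE_PAGE = "https://www.sotasampo.fi/en/page/?uri="


def embed_uri(uri: str) -> str:
    """Wrap a linked-data URI in a site-local page URL, escaping '/'
    as %2F while keeping ':' unescaped."""
    return SITE_PAGE + quote(uri, safe=":")


def extract_uri(page_url: str) -> str:
    """Recover the original linked-data URI from an embedded page URL."""
    return unquote(page_url.removeprefix(SITE_PAGE))
```

Because the escaping is reversible, the "messier" site URL and the clean `ldf.fi` identifier carry the same information; a reader (or script) can always get back to the canonical URI.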

The same is true for the geographic entities, whose site-specific URLs lead to maps. In the case of “farmers” (the English translation of the Finnish maanviljelijä), it is compelling to see that WarSampo has information for over 46,000 individuals associated with that concept. One can go to the unembedded URL http://ldf.fi/ammo/maanviljelija, which gives a definition of the concept that is linked into an ontology. It is worth hovering over the objects of the implied RDF triples to see that this ontology can be explored.
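Dereferencing a concept URI programmatically usually relies on HTTP content negotiation: the client sends an `Accept` header requesting an RDF serialization instead of HTML. A minimal sketch follows, assuming (unverified) that ldf.fi honours `Accept: text/turtle`; if it does not, the server will simply return its default representation. The function names are hypothetical.

```python
from urllib.request import Request, urlopen


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build an HTTP GET request that asks the server for an RDF
    serialization of the resource rather than an HTML page."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, timeout: float = 10.0) -> str:
    """Dereference the URI and return the response body as text.
    Assumes the server honours the Accept header; it may not."""
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `fetch_rdf("http://ldf.fi/ammo/maanviljelija")` would attempt to retrieve the "farmer" concept as Turtle; inspecting the response's `Content-Type` is the quickest way to confirm whether negotiation actually took effect.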

This is all a very solid implementation of linked open data and a user can explore the other two sites — WarVictimSampo and WarMemoirSampo — by similar patterns of navigation, with an interesting focus on tagged videos in WarMemoirSampo. 

The overview suggests there is a SPARQL endpoint that can be queried via Google Colab or locally hosted Jupyter Notebooks. I did find some documentation via the https://www.ldf.fi/ site but not enough to actually query the triples that make up the three sites that are the focus here. Can that be done via the form at https://www.ldf.fi/sparql-services.html? I couldn't quite tell, and I write this as a person who can compose SPARQL queries. I don't doubt that better documentation would mean that third parties could access and download the data with great freedom. That, however, may not be the case on a practical basis right now for users such as me who don't happen to find the exact right endpoint and RDF identifiers to use. Nonetheless, the work described in the overview comprises a successful approach to presenting War History on the Semantic Web — one that rewards exploration and contemplation from both a technical and humanistic perspective.
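For readers in the reviewer's position, the SPARQL 1.1 Protocol itself is standard: a query can be sent as a `query` parameter in an HTTP GET request, with JSON results requested through the `Accept` header. The sketch below uses a placeholder endpoint because the correct address for these datasets is not confirmed here; the query simply lists up to ten subject/predicate pairs pointing at the “farmer” concept URI, without assuming any project-specific vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder: the actual endpoint for these datasets is unconfirmed,
# so substitute the address given in the project's documentation.
ENDPOINT = "https://example.org/sparql"

QUERY = """
SELECT ?subject ?predicate WHERE {
  ?subject ?predicate <http://ldf.fi/ammo/maanviljelija> .
} LIMIT 10
"""


def build_query_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten the standard SPARQL JSON results format into plain dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over the network and return flattened bindings."""
    with urlopen(build_query_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

The same code runs unchanged in Google Colab or a local Jupyter Notebook; only `ENDPOINT` needs to point at a real service.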
