Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora
By Harri Kiiskinen, Asko Nivala, Jasmine Westerlund & Juhana Saarelainen
Harri Kiiskinen, Asko Nivala, Jasmine Westerlund, and Juhana Saarelainen (2023). “Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora”. Journal of Computational Literary Studies 2 (1), doi: https://doi.org/10.48694/jcls.3584.
Abstract
In the Atlas of Finnish Literature 1870-1940 project, we extract geographical information from a corpus of Finnish-language literary texts published between 1870 and 1940. The texts are converted from plain text to TEI/XML and then processed with named entity recognition and linking tools. The results are presented in a web-based environment. This article describes the technical structure of the analysis chain, the tools used, and the metaprocesses for managing the research dataset.
License: CC BY 4.0.