Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora
Harri Kiiskinen, Asko Nivala, Jasmine Westerlund & Juhana Saarelainen: “Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora.” CCLS 2023 Würzburg – 2nd Annual Conference of Computational Literary Studies.
Abstract
In the Atlas of Finnish Literature 1870–1940 project, we extract geographical information from a corpus of Finnish-language literary texts published between 1870 and 1940. The texts are converted from plain text to TEI/XML and then processed with named entity recognition and linking tools. The results are presented in a web-based environment. This article describes the technical structure of the analysis chain, the tools used, and the metaprocesses used to manage the research dataset.