Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771–1910


We present the results of text reuse detection on a corpus of scanned and OCR-recognized Finnish newspapers and journals published from 1771 to 1910. Our study draws on BLAST, software created for comparing and aligning biological sequences. We show the different types of text reuse found in this corpus and compare our approach with Passim, software for text reuse detection developed at Northeastern University in Boston.

Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language