Publications

Jauhiainen, Heidi, Tommi Jauhiainen, Krister Lindén
Wanca in Korp: Text corpora for underresourced Uralic languages. Proceedings of the Research data and humanities (RDHUM) 2019 conference: data, methods and tools. Jantunen, J. H., Brunni, S., Kunnas, N., Palviainen, S. & Västi, K. (eds.). Oulu: University of Oulu, pp. 21-40 (Studia Humaniora Ouluensia; no. 17). 2019.

Jauhiainen, Tommi, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Automatic Language Identification in Texts: A Survey. 2018. 97 pages.

Jauhiainen, Tommi, Heidi Jauhiainen, Krister Lindén
HeLI-based Experiments in Swiss German Dialect Identification. Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2018, 254-262. 2018.

Jauhiainen, Tommi, Heidi Jauhiainen, Krister Lindén
HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles. Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2018, 137-144. 2018.

Jauhiainen, Tommi, Heidi Jauhiainen, Krister Lindén
Iterative Language Model Adaptation for Indo-Aryan Language Identification. Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2018, 66-75. 2018.

Jauhiainen, Tommi, Krister Lindén, Heidi Jauhiainen
Evaluation of language identification methods using 285 languages. Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden, 2017.

Jauhiainen, Tommi, Krister Lindén, Heidi Jauhiainen
Evaluating HeLI with Non-linear Mappings. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2017, 102-108. 2017.

Jauhiainen, Tommi, Krister Lindén, Heidi Jauhiainen
HeLI, a Word-Based Backoff Method for Language Identification. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2016, 153-162. 2016.

Jauhiainen, Tommi, Heidi Jauhiainen, Krister Lindén
Discriminating similar languages with token-based backoff. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, LT4VarDial ’15. 2015.

Jauhiainen, Tommi, Krister Lindén, Heidi Jauhiainen
Language Set Identification in Noisy Synthetic Multilingual Documents. In A. Gelbukh (Ed.), Proceedings of CICLing 2015, Part I. LNCS 9041, pp. 633-643. 2015.

Jauhiainen, Heidi, Tommi Jauhiainen, Krister Lindén
The Finno-Ugric Languages and The Internet project. Proceedings of the First International Workshop on Computational Linguistics for Uralic Languages, January 16th, 2015, Tromsø, Norway. Septentrio Conference Series, 2. 2015.

Jauhiainen, Heidi
Verkkoharavoinnin menetelmiä [Web crawling methods]. Bachelor's thesis. University of Helsinki, 2013.

Jauhiainen, Tommi
Tekstin kielen automaattinen tunnistaminen [Automatic identification of the language of a text]. Master's thesis. University of Helsinki, 2010.