Tomsk Journal of Linguistics and Anthropology

Morphological Guesser as a Tool for Analyzing Field Data: Experiences with The Naukan Yupik Language

Budyanskaya E.M., Buzanov A.O., Zhornik D.O., Pikhtin A.A.

DOI: 10.23951/2307-6119-2025-2-9-19

Information About Authors:

Elena M. Budyanskaya, Junior researcher. Institute of Linguistics of the RAS. Bolshoy Kislovsky lane, 1, bld. 1, Moscow, Russia, 125009. E-mail: budyanskaya.lena@gmail.com; ORCID ID: 0000-0002-6306-6280; SPIN-code: 7314-4254; Scopus Author ID: 57223128144

Anton O. Buzanov, Junior researcher. Institute of Linguistics of the RAS. Bolshoy Kislovsky lane, 1, bld. 1, Moscow, Russia, 125009. Junior researcher. Higher School of Economics. Pokrovskiy bul’var, 11, Moscow, Russia, 109028. E-mail: anton.buzanov.00@gmail.com; ORCID ID: 0000-0002-3069-1223; SPIN-code: 5993-1550; Scopus Author ID: 57424562500

Daria O. Zhornik, Researcher. Institute of Linguistics of the RAS. Bolshoy Kislovsky lane, 1, bld. 1, Moscow, Russia, 125009. E-mail: daria.zhornik@yandex.ru; ORCID ID: 0000-0002-6463-2547; SPIN-code: 4302-5986; Researcher ID: V-6283-2018; Scopus Author ID: 57203316879

Andrey A. Pikhtin, Junior researcher. Institute of Linguistics of the RAS. Bolshoy Kislovsky lane, 1, bld. 1, Moscow, Russia. Junior researcher. Higher School of Economics. Pokrovskiy bul’var, 11, Moscow, Russia, 109028. E-mail: p_nafanyka@gmail.com

The paper presents the development and evaluation of two automated morphological analysis tools for Naukan Yupik (Yupik, Eskimo, Eskimo-Aleut): a dictionary-based morphological analyzer and a dictionary-free morphological guesser. Both tools are implemented with a two-stage approach to morphological modeling based on finite-state automata. The study examines in detail the morphological features of Naukan Yupik that affect the development of automated analysis tools, including rich inflection and derivation, homonymy of morphological markers, and complex morphophonological processes. The effectiveness of both tools is evaluated on a corpus of oral texts from 2022–2023. Particular attention is paid to overgeneration in the output of the morphological guesser and to reducing it by separating analyses by part of speech. The results show that, when working with field data, a guesser can be more effective than a dictionary-based analyzer despite its known limitations.
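The contrast between the two tools can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain suffix stripping in place of finite-state automata, and the stems, suffixes, and glosses are invented placeholders, not Naukan Yupik forms. The `pos` parameter shows one possible reading of part-of-speech-based separation (running the guesser with one suffix set at a time); the function name `analyze` and all data are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy data, NOT real Naukan Yupik morphemes.
# The suffix "ta" is deliberately homonymous across parts of speech.
SUFFIXES: Dict[str, Dict[str, str]] = {
    "N": {"ta": "PL", "mi": "LOC"},
    "V": {"ta": "3SG", "ku": "PST"},
}
LEXICON: Dict[str, str] = {"kana": "N", "suli": "V"}  # stem -> part of speech


def analyze(
    word: str,
    lexicon: Optional[Dict[str, str]] = None,
    pos: Optional[str] = None,
) -> List[str]:
    """Return every stem+suffix split of `word`.

    With a lexicon, the stem must be listed with a matching part of
    speech (dictionary-based analyzer). Without one, any stem is
    accepted (guesser). `pos` restricts the suffix set to one part
    of speech.
    """
    analyses: List[str] = []
    for suffix_pos, table in SUFFIXES.items():
        if pos is not None and suffix_pos != pos:
            continue
        for suffix, gloss in table.items():
            if not word.endswith(suffix) or len(word) <= len(suffix):
                continue
            stem = word[: -len(suffix)]
            if lexicon is not None and lexicon.get(stem) != suffix_pos:
                continue  # analyzer rejects unknown or mismatched stems
            analyses.append(f"{stem}<{suffix_pos}>+{suffix}<{gloss}>")
    return analyses


if __name__ == "__main__":
    print(analyze("kanata", LEXICON))  # known stem: one analysis
    print(analyze("mirata", LEXICON))  # unknown stem: analyzer returns nothing
    print(analyze("mirata"))           # guesser: two competing analyses
    print(analyze("mirata", pos="V"))  # guesser restricted to verbs
```

In this toy setting the analyzer is precise but silent on any stem missing from the lexicon, which is the common case in fresh field data, while the guesser always answers but returns one analysis per homonymous suffix. Restricting the guesser to a single part of speech removes the competing readings in this example.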

Keywords: Naukan Yupik, morphological analyzer, language documentation, natural language processing



Issue: 2, 2025

Series of issue: Issue 2

Rubric: LINGUISTICS

Pages: 9–19
