Authors

  • Djamila Begjanova
    PhD student researcher at TSUULL
  • Ziyodakhon Teshaboyeva
    Supervisor: DSc.,

DOI:

https://doi.org/10.71337/inlibrary.uz.arims.61503


ACADEMIC RESEARCH IN MODERN SCIENCE

International scientific-online conference

150

EVALUATING AI APPROACHES TO TRANSLATE UZBEK AND KARAKALPAK PAREMIAS INTO ENGLISH

Begjanova Djamila Arislanbayevna

PhD student researcher at TSUULL

begjanova1990@gmail.com

ORCID: 0009-0001-4958-5404

Teshaboyeva Ziyodakhon Kodirovna

Supervisor: DSc.,

teshaboyevaziyodaxon@navoiy-uni.uz

https://doi.org/10.5281/zenodo.14375895

ABSTRACT.

The translation of culturally dense units such as paremiological units presents significant challenges for machine translation (MT) systems, especially for under-resourced languages like Uzbek and Karakalpak. Paremiological units, including proverbs, idiomatic expressions, and metaphorical phrases, are deeply embedded in the cultural fabric of their source communities. Current AI-driven translation tools often struggle to capture the subtle cultural, historical, and contextual meanings inherent in these expressions. This article evaluates the effectiveness of contemporary MT approaches in rendering Uzbek and Karakalpak paremiological units into English, highlighting the shortcomings that lead to culturally inadequate or misleading output. Drawing on regionally sourced corpora and scholarly works, it also proposes a hybrid workflow that combines machine-driven efficiency with human cultural and linguistic expertise. By examining diverse strategies and methodologies, this study aims to contribute to the ongoing discourse on improving translation quality and cultural fidelity for under-resourced languages.

Keywords:

Machine translation, paremiological units, Uzbek, Karakalpak, hybrid workflows, cultural context, AI approaches.

INTRODUCTION

The translation of culturally embedded lexical units, particularly paremiological units, poses unique and persistent challenges for machine translation. While neural machine translation (NMT) and other AI-based techniques have markedly improved the fluency of translations for major world languages, they often fall short when handling culturally rich and context-specific material from less-resourced languages such as Uzbek and Karakalpak. These languages possess extensive inventories of proverbs and idioms (maqollar and matallar in Uzbek, naqil-maqallar in Karakalpak) that encapsulate
societal norms, moral principles, and collective memories (Qarshiboyev, 2016;
Erkinov, 2018).

Existing MT systems frequently prioritize form over cultural meaning, producing output that may be linguistically coherent but contextually impoverished or even misleading (Kontonatsios, 2013; Зиганшина, 2021). For example, a direct, word-for-word translation of the Karakalpak proverb “Miynetsiz ómir - qara kómir” yields “A life without effort is black coal,” whereas a culturally equivalent rendering would be “A life without hard work is empty and dark.” The proverb compares a life devoid of hard work and perseverance to black coal: something dark, of low worth, or unrefined. In this metaphor, “miynet” (effort, labor, or diligence) is the force that brings value, meaning, and brightness to one’s existence. Without it, life remains like “qara kómir”: unshaped, dull, and lacking intrinsic worth.


In English there may be no one-line proverb with the exact imagery, but the sentiment aligns with sayings like “No pain, no gain” or “Without effort, there is no reward.” The challenge for an MT system is to recognize that the proverb is not about coal at all but uses it as a symbol of worthlessness or unfulfilled potential.
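One way to make this recognition step concrete is a lookup of known paremias run before or alongside MT, so that a matched proverb can be routed to an expert-approved rendering instead of being translated word for word. The Python sketch below is a hypothetical illustration, not a method taken from the cited works: `normalize`, `find_paremias`, and the lookup table are assumed names, and the table merely restates this article's examples. It folds the apostrophe and dash variants common in Uzbek and Karakalpak Latin script before matching. Exact substring matching is a simplification that would miss inflected or paraphrased forms.

```python
import re
import unicodedata

# Apostrophe-like characters written interchangeably in Uzbek and Karakalpak
# Latin script (o‘, oʻ, o’, oʼ, o`); all are folded to a plain ASCII apostrophe.
_APOSTROPHES = "\u2018\u2019\u02bb\u02bc`"
# Hyphen and dash variants folded to an ASCII hyphen-minus.
_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize(text: str) -> str:
    """Fold Unicode composition, apostrophe, dash, case and spacing variants."""
    text = unicodedata.normalize("NFC", text)  # compose ó etc. consistently
    for ch in _APOSTROPHES:
        text = text.replace(ch, "'")
    for ch in _DASHES:
        text = text.replace(ch, "-")
    return re.sub(r"\s+", " ", text.lower()).strip()


# Hypothetical lookup table: source paremia -> (literal gloss, adequate rendering).
# Entries restate this article's examples; a real table would be compiled by
# native-speaker experts. The second rendering paraphrases the article's gloss.
_RAW_TABLE = {
    "Miynetsiz ómir - qara kómir": (
        "A life without effort is black coal.",
        "A life without hard work is empty and dark.",
    ),
    "O‘z aravangni o‘zing tort": (
        "Pull your own cart.",
        "Handle your own responsibilities without depending on others.",
    ),
}
PAREMIA_TABLE = {normalize(k): v for k, v in _RAW_TABLE.items()}


def find_paremias(text: str) -> list[tuple[str, str]]:
    """Return (normalized paremia, adequate rendering) for each known unit in text.

    Exact substring matching after normalization: a simplification that misses
    inflected, truncated, or paraphrased variants.
    """
    norm = normalize(text)
    return [(key, adequate)
            for key, (_literal, adequate) in PAREMIA_TABLE.items()
            if key in norm]


print(find_paremias("Miynetsiz ómir – qara kómir!"))
```

The en dash and the exclamation mark in the sample input do not prevent the match, because both the table keys and the input pass through the same `normalize` step. A production table would also have to tolerate morphological variation, which this sketch does not attempt.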

This article evaluates current AI approaches to translating Uzbek and Karakalpak paremiological units into English, considering their linguistic complexity, contextual depth, and cultural significance. It then proposes a hybrid human-machine workflow that integrates cultural expertise to improve translation outcomes.

THE MAIN PART

Paremiological units (proverbs, idioms, riddles, and other set expressions) are inseparable from the cultural identities they represent. Uzbek and Karakalpak paremiological units often reflect pastoral traditions, nomadic heritage, agrarian lifestyles, and strong communal values developed over centuries (Qarshiboyev, 2016; Qurbanbaeva, 2019). For instance, the Uzbek proverb “Qush uyasida ko‘rganini qiladi” (“A bird does what it sees in the nest”) is not merely an observation about avian behavior; it communicates the cultural belief that children emulate their parents’ actions and that moral instruction starts at home. Similarly, the Karakalpak expression “Birlik bar jerde, tirilik bar” (“Where there is unity, there is life”) highlights the communal ethos valued among the Karakalpak people. Translating these expressions into English demands
more than lexical substitution; it requires an awareness of the cultural and
historical contexts that shaped these sayings.

MT systems, however, often rely on statistically derived patterns or neural associations learned from limited parallel corpora, and they lack the cultural cognition essential for interpreting metaphorical language. While research on the complexity of paremiological translation is ongoing (Farahani, 2020; Turner, 2015), much less attention has been paid to languages with limited digital resources, such as Uzbek and Karakalpak.

Modern MT systems, including NMT, have significantly improved fluency and grammatical accuracy. However, they still struggle with the cultural depth of paremiological units. Consider the Uzbek idiomatic phrase “O‘z aravangni o‘zing tort” (literally, “Pull your own cart”), which urges individuals to handle their own responsibilities without depending on others. A literal translation into English may simply mention pulling a cart without conveying the underlying cultural message of self-reliance and personal accountability.

Similarly, the Karakalpak expression “Kóp sóz eshekke júk” (literally, “Many words are a burden to a donkey”) criticizes excessive speech as pointless and unwieldy. An MT system might provide a literal rendition but would likely fail to communicate the phrase’s pragmatic function: too much talk is no more useful than an unnecessary load on a beast of burden. This inability to capture underlying values and contextual layers underscores the gap between current machine-generated translations and those that truly reflect the paremiological richness of the Uzbek and Karakalpak languages.


The shortcomings of MT in translating Uzbek and Karakalpak paremias stem from several factors:

Data scarcity. Parallel corpora for Uzbek and Karakalpak are limited. MT models trained primarily on English-centric corpora lack exposure to cultural patterns unique to Central Asian languages (Zubair, 2020; Esplà-Gomis, 2022).

Cultural embeddedness. Paremiological units rely on cultural knowledge, historical events, moral codes, and geographic references. Without explicit cultural modeling, MT systems misinterpret or omit these layers of meaning (Ebrahimi, 2015; Zhang & Jaamour, 2022).

Lack of adequacy evaluation metrics. Automatic metrics such as BLEU or TER cannot fully capture the cultural adequacy of translations. Evaluating
paremiological translations demands qualitative, human-involved assessments that judge cultural fidelity and interpretive success.
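The metric problem can be illustrated with a simplified BLEU-style score: clipped unigram and bigram precision combined by a geometric mean and multiplied by the usual brevity penalty. This is an assumption-laden sketch (BLEU-2 on single sentences, no smoothing), not the standard corpus-level BLEU-4 or any particular toolkit's implementation. Scored against the culturally equivalent reference from the introduction, the word-for-word rendering outscores the idiomatic English counterpart, even though the article argues the latter is closer in sentiment.

```python
import math
import re
from collections import Counter


def tokens(sentence: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())


def ngram_counts(toks: list[str], n: int) -> Counter:
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def clipped_precision(cand: list[str], ref: list[str], n: int) -> float:
    """Share of candidate n-grams found in the reference, clipped by reference counts."""
    cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return matched / total


def bleu2(candidate: str, reference: str) -> float:
    """Simplified sentence-level BLEU-2 without smoothing (illustration only)."""
    cand, ref = tokens(candidate), tokens(reference)
    precisions = [clipped_precision(cand, ref, n) for n in (1, 2)]
    if min(precisions) == 0.0:
        return 0.0  # any zero precision makes the geometric mean zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    # Brevity penalty: candidates not longer than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean


reference = "A life without hard work is empty and dark."
for candidate in ("A life without effort is black coal.", "No pain, no gain."):
    print(f"{bleu2(candidate, reference):.3f}  {candidate}")
```

This prints 0.328 for the literal rendering and 0.000 for “No pain, no gain.”, because the score counts only shared surface n-grams and has no notion of shared meaning.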

To overcome these challenges, a hybrid workflow is proposed. This approach uses MT systems for initial drafts and human experts for post-editing and cultural adaptation:

Initial MT pass. Use NMT to produce a first-pass translation that serves as a baseline target text.

Human post-editing. Engage bilingual experts who are fluent in Uzbek or Karakalpak and English and who have cultural literacy. They refine the machine output so that the proverb’s moral, metaphorical, and contextual meanings are preserved.

Cultural annotation. Add explanatory footnotes or parenthetical glosses where needed. This preserves cultural nuances for readers unfamiliar with the source culture (Pareti, 2016; Alkhawaja, 2024).

By combining the speed of MT with human interpretive skills, hybrid models can yield more culturally resonant translations. Over time, iterative feedback, in which post-edited and culturally annotated translations are added to the training corpora, can improve MT engines and gradually increase their sensitivity to paremiological subtleties (Wang, 2024; Cheng, 2024).
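The feedback loop could be supported by storing each post-edited unit as a record in a culturally annotated corpus. The sketch assumes a simple JSON Lines layout and a hypothetical `AnnotatedPair` schema with free-form cultural tags; neither is taken from the cited works, and how such records are used for retraining depends on the MT engine. It keeps only the pairs where the expert changed the draft, since those carry the correction signal.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnnotatedPair:
    """One post-edited unit; a hypothetical schema, not a standard format."""
    src_lang: str   # ISO 639 code, e.g. "uz" (Uzbek) or "kaa" (Karakalpak)
    source: str     # original paremia
    mt_draft: str   # raw engine output
    target: str     # expert post-edited rendering
    tags: list[str] = field(default_factory=list)  # annotator-chosen cultural tags


def corrections(pairs: list[AnnotatedPair]) -> list[AnnotatedPair]:
    """Keep only pairs the expert changed: these carry the correction signal."""
    return [p for p in pairs if p.target.strip() != p.mt_draft.strip()]


def to_jsonl(pairs: list[AnnotatedPair]) -> str:
    """One JSON object per line; ensure_ascii=False keeps ó, ‘ etc. readable."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


# Hypothetical drafts and tags, for illustration only.
pairs = [
    AnnotatedPair("kaa", "Miynetsiz ómir - qara kómir",
                  "A life without effort is black coal",
                  "A life without hard work is empty and dark",
                  ["metaphor", "work-ethic"]),
    AnnotatedPair("kaa", "Birlik bar jerde, tirilik bar",
                  "Where there is unity, there is life",
                  "Where there is unity, there is life",
                  ["communal-values"]),
]
print(to_jsonl(corrections(pairs)))
```

Here only the first record is emitted; the second is dropped because its hypothetical MT draft already matched the expert's rendering.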

Preliminary studies are encouraging. In pilot projects, hybrid models, in which machine output is refined by human post-editors, have significantly improved the accuracy and cultural relevance of translations from Uzbek and Karakalpak into English. Comparable approaches have also enhanced comprehension and cultural fidelity in the translation of intangible cultural heritage and literary texts (Li & Jiang, 2023; Makrynioti, 2019; Das, 2024).

CONCLUSION

Translating Uzbek and Karakalpak paremiological units into English exemplifies the persistent gap between technological prowess and cultural understanding in MT. While NMT systems can produce coherent sentences, they often fail to convey the cultural depth that defines paremiological units. This shortfall is particularly acute for under-resourced languages, where training data and cultural resources are limited.

A hybrid human-machine approach offers a promising solution. By harnessing the computational strength of AI and the cultural insight of human experts, we can move towards translations that respect linguistic nuance and cultural integrity. As MT research progresses, prioritizing cultural embedding, expanding language-specific corpora, and refining evaluation criteria can help ensure that the
richness of Uzbek and Karakalpak paremiological units is not lost in translation.

References:

1. Alkhawaja R. (2024). Unveiling the new frontier: ChatGPT-3 Powered Translation for Arabic-English language pairs. Theory and Practice in Language Studies. doi:10.17507/tpls.1402.05
2. Cheng L. (2024). Integrating neural MT with cultural tagging for improved literary translation. Journal of Translation Technology, 17(2), 45–63.
3. Cespedes Y. (2020). Beyond the margins of academic education: identifying translation industry training practices through action research. The International Journal of Translation and Interpreting Research. doi:10.12807/ti.112201.2020.a07
4. Das S. (2024). Statistical machine translation for Indic languages. doi:10.1017/nlp.2024.26
5. Ebrahimi E., Callison-Burch C., & Dou D. (2015). Bold: Biomedical open language dataset. Proceedings of the ACL Workshop on Biomedical NLP.
6. Erkinov A. (2018). Qoraqalpoq tilidagi maqol va matallarning leksik-struktur xususiyatlari [Lexical-structural features of proverbs and sayings in the Karakalpak language]. Qaraqapaq filologiyasi jurnali, 2(1), 56–72.
7. Esplà-Gomis M., Etchegoyhen T., & Forcada M.L. (2022). Addressing language scarcity in MT training: the promise of domain adaptation. Machine Translation Journal, 36(2), 123–142.
8. Farahani M.V. (2020). Cultural concepts in translation: a case study of proverbs. Journal of Pragmatics and Translation Studies, 5(4), 89–102.
9. Fitria T. (2021). Difficulties in translating idioms. Studies in English Language and Education, 8(1), 54–68.
10. Fu Y. (2021). Automatic classification of human translation and machine translation: A study from the perspective of lexical diversity. doi:10.48550/arxiv.2105.04616
11. Kenny D., & Doherty S. (2014). Statistical machine translation in the translation curriculum: overcoming obstacles and empowering translators. The Interpreter and Translator Trainer, 8(2), 276–294.
12. Kontonatsios G. (2013). A comparative evaluation of state-of-the-art machine translation for biomedical terminology. Journal of Biomedical Informatics, 46(5), 811–818.
13. Lv H., & Jiang S. (2023). A corpus and computer-assisted translation-based study on English translation of intangible cultural heritage terms. doi:10.3233/faia230197
14. Mahardika R. (2017). Translating cultural expressions. Indonesian Journal of Applied Linguistics, 6(2), 291–300.
15. Makrynioti M. (2019). Translating cultural metaphors in fiction: A cognitive perspective. Meta: Translators’ Journal, 64(3), 433–448.
16. Pareti S. (2016). Integrating MT and human post-editing in professional workflows. International Journal of Language and Communication, 22(3), 112–128.
17. Qarshiboyev B. (2016). O‘zbek maqollari lug‘ati [Dictionary of Uzbek proverbs]. Toshkent: O‘qituvchi.
18. Qurbanbaeva A. (2019). Karakalpak proverbial expressions: A cultural linguistic perspective. Nukus: Karakalpakstan State University Press.
19. Turner E., Moriarty M., & Kiely J. (2015). Intercultural encounters: translating culture-bound elements in texts. Perspectives in Translation, 23(4), 662–678.
20. Wang Y. (2024). Research on artificial intelligence machine translation based on BP neural algorithm. ICST Transactions on Scalable Information Systems, 4(1). doi:10.4108/eetsis.5075
21. Zhang M., & Jaamour A. (2022). Contextualizing cultural references: A survey on MT of cultural items. Machine Translation Review, 78(4), 45–61.
22. Zasiekin S., & Vakuliuk Y. (2020). Ethical issues of neural machine translation. Psycholinguistics in a Modern World, 15, 81–83.
23. Zubair M. (2020). A survey on resource-scarce language MT. Asia-Pacific Journal of Information Technology and Multimedia, 9(3), 22–35.
24. Зиганшина И.В. (2021). Проблемы автоматического перевода фразеологизмов [Problems of automatic translation of phraseological units]. Вестник Казанского Государственного Университета [Bulletin of Kazan State University], 3(2), 98–107.
