ACADEMIC RESEARCH IN MODERN SCIENCE
International scientific-online conference
150
EVALUATING AI APPROACHES TO TRANSLATE UZBEK AND
KARAKALPAK PAREMIAS INTO ENGLISH
Begjanova Djamila Arislanbayevna
PhD student researcher at TSUULL
ORCID: 0009-0001-4958-5404
Teshaboyeva Ziyodakhon Kodirovna
Supervisor: DSc.,
teshaboyevaziyodaxon@navoiy-uni.uz
https://doi.org/10.5281/zenodo.14375895
ABSTRACT.
The translation of culturally dense units, such as paremiological
units, presents significant challenges for machine translation (MT) systems,
especially when dealing with under-resourced languages like Uzbek and
Karakalpak. Paremiological units, including proverbs, idiomatic expressions, and
metaphorical phrases, are deeply embedded in the cultural fabric of their source
communities. Current AI-driven translation tools often struggle to capture the
subtle cultural, historical, and contextual meanings inherent in these
expressions. This article evaluates the effectiveness of contemporary MT
approaches in rendering Uzbek and Karakalpak paremiological units into
English, highlighting the shortcomings that lead to culturally inadequate or
misleading outputs. Drawing on regionally sourced corpora and scholarly works,
it also proposes a hybrid workflow that combines machine-driven efficiency
with human cultural and linguistic expertise. By examining diverse strategies
and methodologies, this study aims to contribute to the ongoing discourse on
improving translation quality and cultural fidelity in the context of
under-resourced languages.
Keywords:
Machine translation, paremiological units, Uzbek, Karakalpak,
hybrid workflows, cultural context, AI approaches.
INTRODUCTION
The translation of culturally embedded lexical units, particularly
paremiological units, poses unique and persistent challenges in the realm of
machine translation. While neural machine translation (NMT) and other AI-based
techniques have markedly improved the fluency of translations for major
world languages, they often fall short when handling culturally rich and
context-specific materials from less-resourced languages such as Uzbek and
Karakalpak. These languages possess extensive inventories of proverbs and idioms
(maqollar and matallar in Uzbek, naqil-maqallar in Karakalpak) that encapsulate
societal norms, moral principles, and collective memories (Qarshiboyev, 2016;
Erkinov, 2018).
Existing MT systems frequently prioritize form over cultural meaning,
producing output that may be linguistically coherent but contextually
impoverished or even misleading (Kontonatsios, 2013; Зиганшина, 2021). For
example, a direct, word-for-word translation of the Karakalpak proverb
“Miynetsiz ómir - qara kómir” yields “A life without effort is black coal,”
whereas a culturally equivalent rendering would be “A life without hard work is
empty and dark.” The proverb compares a life devoid of hard work and
perseverance to black coal: something dark, of low worth, or unrefined. In this
metaphor, “miynet” (effort, labor, or diligence) is presented as the force that
brings value, meaning, and brightness to one’s existence. Without it, life remains
like “qara kómir”: unshaped, dull, and lacking intrinsic worth.
In English, while there may not be a direct one-line proverb with the exact
imagery, the sentiment aligns with sayings like “No pain, no gain” or “Without
effort, there is no reward.” The challenge for a machine translation system is to
recognize that the proverb does not speak about coal literally but uses it as a
symbol of worthlessness or unfulfilled potential.
This article evaluates current AI approaches to translating Uzbek and
Karakalpak paremiological units into English, considering their linguistic
complexity, contextual depth, and cultural significance. It then proposes a hybrid
human-machine workflow that integrates cultural expertise to improve
translation outcomes.
THE MAIN PART
Paremiological units (proverbs, idioms, riddles, and other set
expressions) are inseparable from the cultural identities they represent. Uzbek
and Karakalpak paremiological units often reflect pastoral traditions, nomadic
heritage, agrarian lifestyles, and strong communal values developed over
centuries (Qarshiboyev, 2016; Qurbanbaeva, 2019). For instance, the Uzbek
proverb “Qush uyasida ko‘rganini qiladi” (“A bird does what it sees in the nest”)
is not merely an observation about avian behavior; it communicates the cultural
belief that children emulate their parents’ actions and that moral instruction
starts at home. Similarly, the Karakalpak expression “Birlik bar jerde, tirilik
bar” (“Where there is unity, there is life”) highlights the communal ethos valued
among the Karakalpak people. Translating these expressions into English demands
more than lexical substitution; it requires an awareness of the cultural and
historical contexts that shaped these sayings.
MT systems, however, often rely on statistically derived patterns or neural
associations learned from limited parallel corpora, lacking the cultural cognition
essential for interpreting metaphorical language. While research on the
complexity of paremiological translation is ongoing (Farahani, 2020; Turner,
2015), much less attention has been paid to languages with limited digital
resources, such as Uzbek and Karakalpak.
Modern MT systems, including NMT, have significantly improved fluency and
grammatical accuracy. However, they still struggle with the cultural depth of
paremiological units. Consider the Uzbek idiomatic phrase “O‘z aravangni o‘zing
tort” (literally, “Pull your own cart”), which urges individuals to handle their
own responsibilities without depending on others. A direct, literal translation
into English may simply mention “pulling a cart” without conveying the underlying
cultural message of self-reliance and personal accountability.
Similarly, the Karakalpak expression “Kóp sóz eshekke júk” (literally, “Many
words are a burden to a donkey”) criticizes excessive speech as pointless and
unwieldy. While an MT system might provide a literal rendition, it would likely
fail to communicate the phrase’s pragmatic function: the point that too
much talk is no more useful than an unnecessary load on a beast of burden. This
inability to capture underlying values and contextual layers underscores the gap
between current machine-generated translations and those that truly reflect the
paremiological richness of the Uzbek and Karakalpak languages.
The shortcomings of MT in translating Uzbek and Karakalpak paremias
stem from several factors:
Data scarcity. Parallel corpora for Uzbek and Karakalpak are limited.
MT models trained primarily on English-centric corpora lack exposure to
cultural patterns unique to Central Asian languages (Zubair, 2020;
Esplà-Gomis, 2022).
Cultural embeddedness. Paremiological units rely on cultural
knowledge, historical events, moral codes, and geographic references.
Without explicit cultural modeling, MT systems misinterpret or omit these
layers of meaning (Ebrahimi, 2015; Zhang & Jaamour, 2022).
Lack of adequacy evaluation metrics. Automatic metrics such as BLEU and TER
measure surface overlap with reference translations and therefore cannot fully
capture the cultural adequacy of translations. Evaluating
paremiological translations demands qualitative, human-involved
assessments that judge cultural fidelity and interpretive success.
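The limitation of overlap-based metrics can be illustrated with a minimal sketch. The Python snippet below computes clipped n-gram precision, which is only the core ingredient of BLEU (no brevity penalty, no averaging over n-gram orders). The test sentences come from the Karakalpak example in the Introduction; the function names are illustrative assumptions and do not belong to any evaluation toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against a single reference.

    Simplified on purpose: whitespace tokenization, one reference,
    no brevity penalty, no geometric mean over n-gram orders.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Reference: the culturally equivalent rendering given in the Introduction.
reference = "a life without hard work is empty and dark"
literal = "a life without effort is black coal"   # word-for-word rendering
idiomatic = "no pain no gain"                     # English saying with the same sentiment

literal_score = clipped_precision(literal, reference)
idiomatic_score = clipped_precision(idiomatic, reference)
print(f"literal: {literal_score:.2f}, idiomatic: {idiomatic_score:.2f}")
```

Under these assumptions the word-for-word rendering shares four of its seven words with the reference, while "No pain, no gain" shares none, so the overlap score ranks the literal output above an established English equivalent. This is the kind of mismatch that human assessment is meant to catch.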
To overcome these challenges, a hybrid workflow is proposed. This
approach uses MT systems for initial drafts and human experts for post-editing
and cultural adaptation:
Initial MT pass. Use NMT to produce a first-pass translation,
ensuring a baseline target text.
Human post-editing. Engage bilingual experts who are fluent in Uzbek or
Karakalpak and in English and who have cultural literacy. They refine the machine
output to ensure the proverb’s moral, metaphorical, and contextual
meanings are preserved.
Cultural annotation. Add explanatory footnotes or parenthetical
glosses where needed. This preserves cultural nuances for readers
unfamiliar with the source culture (Pareti, 2016; Alkhawaja, 2024).
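The three stages above can be sketched as a small pipeline. The Python example below is a minimal illustration, not a description of any existing tool: the record layout and the names `ParemiaRecord`, `run_workflow`, and the three `stub_*` functions are hypothetical. The stubs return fixed strings taken from the Karakalpak example in the Introduction; in practice the first stage would call an MT engine and the other two would collect input from a bilingual expert.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ParemiaRecord:
    """One proverb as it moves through the three workflow stages."""
    source: str
    language: str
    machine_draft: str = ""
    post_edited: str = ""
    notes: List[str] = field(default_factory=list)


def run_workflow(
    record: ParemiaRecord,
    translate: Callable[[str], str],
    post_edit: Callable[[ParemiaRecord], str],
    annotate: Callable[[ParemiaRecord], List[str]],
) -> ParemiaRecord:
    """Apply the three stages in order and return the updated record."""
    record.machine_draft = translate(record.source)  # 1. initial MT pass
    record.post_edited = post_edit(record)           # 2. human post-editing
    record.notes.extend(annotate(record))            # 3. cultural annotation
    return record


# Hypothetical stand-ins for an MT engine and a human expert.
def stub_translate(text: str) -> str:
    return "A life without effort is black coal."


def stub_post_edit(record: ParemiaRecord) -> str:
    return "A life without hard work is empty and dark."


def stub_annotate(record: ParemiaRecord) -> List[str]:
    return ["'Black coal' symbolizes worthlessness or unfulfilled potential."]


record = run_workflow(
    ParemiaRecord(source="Miynetsiz ómir - qara kómir", language="Karakalpak"),
    stub_translate,
    stub_post_edit,
    stub_annotate,
)
print(record.machine_draft)
print(record.post_edited)
print(record.notes[0])
```

Keeping the machine draft next to the post-edited text, rather than overwriting it, leaves a record of what the expert changed and why.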
By combining the speed of MT with human interpretive skills, hybrid
models can yield more culturally resonant translations. Over time, iterative
feedback can improve MT engines trained on culturally annotated corpora,
gradually increasing their sensitivity to paremiological subtleties (Wang, 2024;
Cheng, 2024).
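One way to make this feedback loop concrete is to store each post-edited proverb together with its machine draft and cultural notes, so that the pairs can later serve as training or evaluation data. The Python sketch below writes one such record in JSON Lines form. The field names and the function `to_training_line` are assumptions made for illustration, and the post-edited string is a hypothetical edit based on the Uzbek example discussed earlier; the article does not prescribe a corpus format.

```python
import json


def to_training_line(source, machine_draft, post_edited, notes):
    """Serialize one post-edited proverb as a single JSON Lines record."""
    return json.dumps(
        {
            "source": source,
            "machine_draft": machine_draft,
            "target": post_edited,
            "cultural_notes": notes,
            # Flag whether the editor changed the machine draft at all.
            "edited": machine_draft.strip() != post_edited.strip(),
        },
        ensure_ascii=False,  # keep Uzbek and Karakalpak characters readable
    )


line = to_training_line(
    "O‘z aravangni o‘zing tort",
    "Pull your own cart",
    "Handle your own responsibilities",  # hypothetical post-edit
    ["Urges self-reliance and personal accountability."],
)
print(line)
```

The `edited` flag gives a rough count of how often the machine draft needed correction, which can be tracked across retraining rounds.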
Preliminary studies are encouraging. In pilot projects, the use of hybrid
models—where machine output is refined by human post-editors—has
significantly improved the accuracy and cultural relevance of translations from
Uzbek and Karakalpak into English. Comparable approaches have also enhanced
comprehension and cultural fidelity when applied to intangible cultural heritage
and literary texts (Lv & Jiang, 2023; Makrynioti, 2019; Das, 2024).
CONCLUSION
Translating Uzbek and Karakalpak paremiological units into English
exemplifies the persistent gap between technological prowess and cultural
understanding in MT. While NMT systems can produce coherent sentences, they
often fail to convey the cultural depth that defines paremiological units. This
shortfall is particularly acute for under-resourced languages where training data
and cultural resources are limited.
A hybrid human-machine approach offers a promising solution. By harnessing
the computational strength of AI and the cultural insight of human experts, we
can move towards translations that respect linguistic nuance and cultural
integrity. As MT research progresses, prioritizing cultural embedding, expanding
language-specific corpora, and refining evaluation criteria can ensure that the
richness of Uzbek and Karakalpak paremiological units is not lost in
translation.
References:
1. Alkhawaja R. (2024). Unveiling the new frontier: ChatGPT-3 Powered
Translation for Arabic-English language pairs. Theory and practice in language
studies. doi:10.17507/tpls.1402.05
2. Cheng L. (2024). Integrating neural MT with cultural tagging for improved
literary translation. Journal of translation technology, 17(2), 45–63.
3. Cespedes Y. (2020). Beyond the margins of academic education:
identifying translation industry training practices through action research. The
International Journal of Translation and Interpreting Research.
doi:10.12807/ti.112201.2020.a07
4. Das S. (2024). Statistical machine translation for Indic languages.
doi:10.1017/nlp.2024.26
5. Ebrahimi E., Callison-Burch C., & Dou D. (2015). Bold: Biomedical open
language dataset. Proceedings of the ACL workshop on biomedical NLP.
6. Erkinov A. (2018). Qoraqalpoq tilidagi maqol va matallarning
leksik-struktur xususiyatlari. Qaraqapaq filologiyasi jurnali, 2(1), 56–72.
7. Esplà-Gomis M., Etchegoyhen T., & Forcada M.L. (2022). Addressing
language scarcity in MT training: the promise of domain adaptation. Machine
Translation Journal, 36(2), 123–142.
8. Farahani M.V. (2020). Cultural concepts in translation: a case study of
proverbs. Journal of Pragmatics and Translation Studies, 5(4), 89–102.
9. Fitria T. (2021). Difficulties in translating idioms. Studies in English
language and education, 8(1), 54–68.
10. Fu Y. (2021). Automatic classification of human translation and machine
translation: A Study from the perspective of lexical diversity.
doi:10.48550/arxiv.2105.04616
11. Kenny D., & Doherty S. (2014). Statistical machine translation in the
translation curriculum: overcoming obstacles and empowering translators. The
interpreter and translator trainer, 8(2), 276–294.
12. Kontonatsios G. (2013). A comparative evaluation of state-of-the-art
machine translation for biomedical terminology. Journal of Biomedical
Informatics, 46(5), 811–818.
13. Lv H., & Jiang S. (2023). A corpus and computer-assisted translation-based
study on English translation of intangible cultural heritage terms.
doi:10.3233/faia230197
14. Mahardika R. (2017). Translating cultural expressions. Indonesian Journal
of Applied Linguistics, 6(2), 291–300.
15. Makrynioti M. (2019). Translating cultural metaphors in fiction: A
cognitive perspective. Meta: Translators’ Journal, 64(3), 433–448.
16. Pareti S. (2016). Integrating MT and human post-editing in professional
workflows. International Journal of Language and Communication, 22(3), 112–128.
17. Qarshiboyev B. (2016). O‘zbek maqollari lug‘ati. Toshkent: O‘qituvchi.
18. Qurbanbaeva A. (2019). Karakalpak proverbial expressions: A cultural
linguistic perspective. Nukus: Karakalpakstan State University Press.
19. Turner E., Moriarty M., & Kiely J. (2015). Intercultural encounters:
translating culture-bound elements in texts. Perspectives in Translation, 23(4),
662–678.
20. Wang Y. (2024). Research on artificial intelligence machine translation
based on BP neural algorithm. ICST Transactions on scalable information
systems, 4(1). doi:10.4108/eetsis.5075
21. Zhang M., & Jaamour A. (2022). Contextualizing cultural references: A
survey on MT of cultural items. Machine translation review, 78(4), 45–61.
22. Zasiekin S., & Vakuliuk Y. (2020). Ethical issues of neural machine
translation. Psycholinguistics in a modern world, 15, 81–83.
23. Zubair M. (2020). A Survey on resource-scarce language MT. Asia-Pacific
Journal of Information Technology and Multimedia, 9(3), 22–35.
24. Зиганшина И.В. (2021). Проблемы автоматического перевода
фразеологизмов [Problems of the automatic translation of phraseological units].
Вестник Казанского Государственного Университета [Bulletin of Kazan State
University], 3(2), 98–107.