USING ENGLISH-UZBEK PARALLEL CORPORA IN EFL DATA-DRIVEN LEARNING

Makhbuba Amonova
English language teacher at General Education School No. 43, Bukhara

DOI: https://doi.org/10.71337/inlibrary.uz.arims.134965

Keywords: parallel corpus; English–Uzbek; data-driven learning; concordancing; collocation; translation shifts; OPUS; JW300; Sketch Engine

Abstract.

Parallel corpora create a bridge between form and meaning across languages and, when embedded in data-driven learning (DDL), enable learners to interrogate authentic evidence rather than rely on prescriptive rule lists. Building on well-established DDL principles and recent meta-analytic evidence, the article maps readily accessible English–Uzbek resources (OPUS, JW300, the Uzbek National Corpus’ parallel section) and workable tools (ParaConc, AntPConc, Sketch Engine’s parallel concordancer).

ACADEMIC RESEARCH IN MODERN SCIENCE

International scientific-online conference


Amonova Makhbuba Olimovna
English language teacher at General Education School No. 43, Bukhara
https://doi.org/10.5281/zenodo.16938818

Introduction.

Data-driven learning positions learners as investigators who query concordances and infer patterns from usage. Tim Johns captured the stance crisply: “the language-learner is also, essentially, a research worker” and “the most important computing tool… is the concordancer” [1, p. 6]. In parallel settings, the same investigative loop extends across aligned translations, letting EFL learners contrast English phraseology with Uzbek renderings and observe real translation choices. Large open collections such as OPUS, now listing 747 languages and tens of billions of aligned sentence pairs, supply the raw material; classroom-friendly concordancers and clear task design supply the pedagogy.

Methods and literature review.

We adopt a two-stage protocol. Stage 1 is a resource audit: identify English–Uzbek sources in OPUS, JW300, and the Uzbek National Corpus, and record licensing, genres, and access modes. Stage 2 is implementation: short DDL cycles in which learners use a parallel concordancer (ParaConc, AntPConc, or Sketch Engine) to query English targets (e.g., “take into account”, “according to”), observe Uzbek correspondences, generalize distributional constraints, and produce micro-outputs (gap-fills, micro-translations, paraphrases). Tools were selected for parallel search, alignment inspection, and frequency/collocation support.
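The query step of a Stage 2 cycle can be sketched in a few lines of Python. This is only an illustration of what a parallel concordancer does, not the interface of ParaConc, AntPConc, or Sketch Engine. The function name `parallel_kwic`, the `width` parameter, and the three sentence pairs are assumptions made up for the example; the pairs are not taken from any of the corpora discussed.

```python
import re
from typing import List, Tuple

# Toy sentence pairs invented for illustration; they are NOT drawn from
# OPUS, JW300, or the Uzbek National Corpus.
PAIRS: List[Tuple[str, str]] = [
    ("According to the report, prices rose.", "Hisobotga ko'ra, narxlar oshdi."),
    ("We must take the weather into account.", "Biz ob-havoni hisobga olishimiz kerak."),
    ("She arrived early.", "U erta keldi."),
]


def parallel_kwic(pairs: List[Tuple[str, str]], pattern: str,
                  width: int = 25) -> List[Tuple[str, str, str, str]]:
    """Return (left context, keyword, right context, aligned Uzbek sentence)
    for every match of `pattern` on the English side."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for en, uz in pairs:
        for m in rx.finditer(en):
            left = en[max(0, m.start() - width):m.start()]
            right = en[m.end():m.end() + width]
            hits.append((left, m.group(0), right, uz))
    return hits


if __name__ == "__main__":
    # Key word in context on the left, aligned translation on the right.
    for left, kw, right, uz in parallel_kwic(PAIRS, r"\baccording to\b"):
        print(f"{left:>25} [{kw}] {right:<25} | {uz}")
```

A discontinuous target such as “take ... into account” can be queried with a pattern like `r"\btake\b.*?\binto account\b"`, which is the kind of regex search listed for ParaConc in Table 2.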

DDL’s rationale reaches back to Johns and early classroom concordancing; recent syntheses show consistent learning gains when learners consult corpora directly. Johns’ argument for learner-as-researcher remains the governing idea, while a multilevel meta-analysis reports medium positive effects for vocabulary learning with corpus use, moderated by proficiency, training, and duration. As a source of parallel data, OPUS provides the largest open ecosystem; JW300 adds wide coverage across 300+ languages, including Uzbek, with sentence-aligned articles.
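Before a downloaded subcorpus can be queried in class, its two sides have to be read as pairs. The sketch below assumes a plain-text export in which line i of the English file corresponds to line i of the Uzbek file, a layout commonly used for sentence-aligned downloads. The function name `read_aligned` and any file names passed to it are hypothetical, and other formats (for example TMX or XML) would need a different reader.

```python
from typing import List, Tuple


def read_aligned(en_path: str, uz_path: str) -> List[Tuple[str, str]]:
    """Read two line-aligned UTF-8 files and return (english, uzbek) pairs.

    Assumption: line i of the English file translates line i of the Uzbek
    file. Pairs in which either side is empty are skipped.
    """
    with open(en_path, encoding="utf-8") as f_en:
        en_lines = [line.rstrip("\n") for line in f_en]
    with open(uz_path, encoding="utf-8") as f_uz:
        uz_lines = [line.rstrip("\n") for line in f_uz]

    # A length mismatch means the files cannot be trusted as aligned.
    if len(en_lines) != len(uz_lines):
        raise ValueError(
            f"files are not line-aligned: {len(en_lines)} vs {len(uz_lines)} lines"
        )

    pairs = []
    for en, uz in zip(en_lines, uz_lines):
        en, uz = en.strip(), uz.strip()
        if en and uz:
            pairs.append((en, uz))
    return pairs
```

Failing loudly on a length mismatch is a deliberate choice here: silently truncating to the shorter file would shift every later pair and produce misleading concordance lines.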

Results.

Open repositories expose multiple English–Uzbek entry points. OPUS offers umbrella access (OpenSubtitles, Tatoeba, CCMatrix, etc.); JW300 aggregates Watchtower/“Awake!” translations across 300+ languages; the Uzbek National Corpus hosts a dedicated parallel section.

Table 1. Public English–Uzbek parallel resources for EFL DDL

| Resource | EN–UZ presence | Scale signal | License / access | Typical genres | Citation |
|---|---|---|---|---|---|
| OPUS collection | Yes (multiple subcorpora) | 58.85B sentence pairs across 747 languages (collection-level) | Open downloads / web query | subtitles, web, institutional | opus.nlpl.eu |
| JW300 | Yes | 300+ languages; 109M sentences overall | Research use; paper with links | religious/educational expository | aclanthology.org |
| Uzbek National Corpus – Parallel | Yes | developing; institutional | Web interface | mixed; curated | uzbekcorpus.uz |

Parallel concordancers differ in depth of alignment control and analytics.

Table 2. Parallel concordance tools and pedagogical affordances

| Tool | Key functions for parallel DDL | EFL outcomes it supports |
|---|---|---|
| ParaConc | KWIC in two panes; alignment utilities; collocate spans; regex | noticing translation options; form–function mapping; contrastive collocation |
| AntPConc | Freeware; UTF-8; simple parallel search for beginners | low-barrier DDL tasks; quick verification of equivalents |
| Sketch Engine (Parallel concordance) | Alignment view; lemma/POS display; frequency and filters; user corpora | pattern generalization; lexico-grammatical profiling across languages |


We operationalized three repeatable templates for English–Uzbek DDL sessions, each built around a parallel query plus a constrained output. Meta-analytic results indicate medium gains for vocabulary learning with corpus consultation, with stronger effects for in-depth knowledge dimensions and when training and time are adequate. Representative reported figures: posttest Hedges’ g ≈ 0.74; delayed posttest g ≈ 0.64.
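For readers less familiar with the effect-size metric, the sketch below shows one common way Hedges’ g is computed for two independent groups: a pooled-standard-deviation mean difference multiplied by a small-sample correction. It is an illustration of the statistic only. The cited meta-analysis may have used a different estimator or design adjustment, and the function name and arguments are assumptions introduced for the example.

```python
import math


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Standardized mean difference between a treatment group (t) and a
    control group (c), with the usual small-sample correction."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Common approximation to the correction factor J.
    j = 1 - 3 / (4 * df - 1)
    return j * d
```

Read this way, a value around 0.74 means the corpus-consultation groups scored roughly three quarters of a pooled standard deviation above their comparison groups at posttest.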

Discussion.

Parallel concordances externalize contrasts that textbooks often flatten. Observing how “according to” is rendered by the postposition ko‘ra with dative marking on the noun (-ga ko‘ra), or how light-verb constructions map onto Uzbek verbal morphology, encourages hypothesis formation grounded in real usage, in line with the original DDL principle of learner inquiry.

For entry-level courses, AntPConc reduces friction; for advanced cohorts, Sketch Engine’s alignment view and filtering accelerate deeper generalizations. ParaConc sits comfortably in the middle for stable lab use. Mixed-tool sequences work well: quick confirmation in AntPConc, broader sampling in Sketch Engine, portfolio export in ParaConc.

Web-crawled parallel data vary in cleanliness and domain balance; audits warn about mislabeling and noisy alignments. A simple classroom safeguard is triangulation: cross-check a finding in two independent sources (for example, JW300 and a curated subcorpus in OPUS) before treating it as a pattern. When possible, rely on curated institutional corpora such as the Uzbek National Corpus’ parallel section.
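The triangulation safeguard can be made concrete with a small check: count how often an English target co-occurs with a candidate Uzbek equivalent in each source, and accept the finding only if every source supports it. The sketch below is one possible operationalization, not a procedure prescribed by any of the tools above. The names `support` and `triangulate`, the thresholds `min_hits` and `min_share`, and the sentence pairs are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]

# Toy data invented for illustration; not taken from any real corpus.
SOURCES: Dict[str, Pairs] = {
    "source_a": [
        ("According to the law, everyone is equal.", "Qonunga ko'ra, hamma teng."),
        ("According to the plan, we start tomorrow.", "Rejaga ko'ra, biz ertaga boshlaymiz."),
    ],
    "source_b": [
        ("According to the report, prices rose.", "Hisobotga ko'ra, narxlar oshdi."),
        ("According to the teacher, the test is easy.", "O'qituvchining aytishicha, test oson."),
    ],
}

EN_PATTERN = r"\baccording to\b"
# The character class accepts the apostrophe variants found in Uzbek Latin text.
UZ_PATTERN = r"ga ko['‘ʻ]ra\b"


def support(pairs: Pairs, en_pattern: str, uz_pattern: str) -> Tuple[int, int]:
    """Return (hits, matches): pairs with the English target, and how many
    of those also contain the candidate Uzbek equivalent."""
    en_rx = re.compile(en_pattern, re.IGNORECASE)
    uz_rx = re.compile(uz_pattern, re.IGNORECASE)
    hits = matches = 0
    for en, uz in pairs:
        if en_rx.search(en):
            hits += 1
            if uz_rx.search(uz):
                matches += 1
    return hits, matches


def triangulate(sources: Dict[str, Pairs], en_pattern: str, uz_pattern: str,
                min_hits: int = 2, min_share: float = 0.5):
    """A finding is corroborated only if every source has enough hits and a
    large enough share of them show the candidate equivalent."""
    report = {}
    for name, pairs in sources.items():
        hits, matches = support(pairs, en_pattern, uz_pattern)
        ok = hits > 0 and hits >= min_hits and matches / hits >= min_share
        report[name] = (hits, matches, ok)
    corroborated = all(ok for _, _, ok in report.values())
    return corroborated, report
```

The non-matching pairs are as useful as the matching ones: in the toy data, the one English hit without -ga ko‘ra shows a different rendering, which is exactly the kind of translation choice learners can be asked to explain.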

Conclusion.

Parallel corpora make contrast visible and negotiable for learners; with modest training and carefully chosen tools, English–Uzbek DDL can deliver robust vocabulary and phraseological gains while cultivating analytic habits that persist beyond a single course. The resource landscape is mature enough for immediate adoption, provided teachers curate datasets and model evidence-based choices at each step.

References:

1. Johns T. Should you be persuaded: Two samples of data-driven learning materials. – 1991. – Vol. 4. – pp. 1-16.
2. Gaskell D., Cobb T. Can learners use concordance feedback for writing errors? // System. – 2004. – Vol. 32. – No. 3. – pp. 301-319.
3. Tiedemann J. Parallel data, tools and interfaces in OPUS // LREC. – 2012. – pp. 2214-2218.
4. Nigmatova L., Avezov S. Применение методов NLP в корпусных исследованиях: особенности и ограничения [Application of NLP methods in corpus research: features and limitations] // «Узбекские национальные образовательные здания теоретическое и практическое создание вопросы» International scientific-practical conference. – 2023. – Vol. 2. – No. 2.
5. Sobirovich S. A. A pragmatically oriented approach to generative linguistics // Current Research Journal of Philological Sciences. – 2024. – Vol. 5. – No. 04. – pp. 69-75.
6. Avezov S. Корпусная лингвистика: новые подходы к анализу языка и их приложения в обучении иностранным языкам [Corpus linguistics: new approaches to language analysis and their applications in foreign language teaching] // International Bulletin of Applied Science and Technology. – 2023. – Vol. 3. – No. 7. – pp. 177-181.
7. Sobirovich S. A. Corpus linguistics: a historical overview // Educator Insights: Journal of Teaching Theory and Practice. – 2025. – Vol. 1. – No. 6. – pp. 396-403.
