ACADEMIC RESEARCH IN MODERN SCIENCE
International scientific-online conference
172
USING ENGLISH-UZBEK PARALLEL CORPORA IN EFL DATA-DRIVEN LEARNING
Amonova Makhbuba Olimovna
English language teacher at General Education School No. 43, Bukhara
https://doi.org/10.5281/zenodo.16938818
Abstract.
Parallel corpora create a bridge between form and meaning
across languages and, when embedded in data-driven learning (DDL), enable
learners to interrogate authentic evidence rather than rely on prescriptive rule
lists. Building on well-established DDL principles and recent meta-analytic
evidence, the article maps readily accessible English–Uzbek resources (OPUS,
JW300, the Uzbek National Corpus’ parallel section) and workable tools
(ParaConc, AntPConc, Sketch Engine’s parallel concordancer).
Keywords:
parallel corpus; English–Uzbek; data-driven learning;
concordancing; collocation; translation shifts; OPUS/JW300; Sketch Engine
Introduction.
Data-driven learning positions learners as investigators who
query concordances and infer patterns from usage. Tim Johns captured the
stance crisply: “the language-learner is also, essentially, a research worker” and
“the most important computing tool… is the concordancer” [1, p. 6]. In parallel
settings, the same investigative loop extends across aligned translations, letting
EFL learners contrast English phraseology with Uzbek renderings and observe
real translation choices. Large open collections such as OPUS, now listing 747
languages and tens of billions of aligned sentence pairs, supply the raw material;
classroom-friendly concordancers and clear task design supply the pedagogy.
Methods and literature review.
We adopt a two-stage protocol. Stage 1 is a resource audit: identify
English–Uzbek sources in OPUS, JW300, and the Uzbek National Corpus, and
record licensing, genres, and access modes. Stage 2 is implementation: short
DDL cycles in which learners use a parallel concordancer (ParaConc, AntPConc,
or Sketch Engine) to query English targets (e.g., “take into account”,
“according to”), observe Uzbek correspondences, generalize distributional
constraints, and produce micro-outputs (gap-fills, micro-translations,
paraphrases). Tools were selected for parallel search, alignment inspection,
and frequency/collocation support.
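The Stage 2 query step can be illustrated with a minimal sketch of a parallel keyword-in-context search. It is not the implementation used by ParaConc, AntPConc, or Sketch Engine; the function name `parallel_kwic` and the three sentence pairs are invented for illustration and are not drawn from any of the corpora named above.

```python
import re
from typing import Iterator, List, Tuple

# Toy sentence pairs invented for illustration only; they are NOT taken from
# OPUS, JW300, or the Uzbek National Corpus.
PAIRS: List[Tuple[str, str]] = [
    ("According to the report, prices rose.", "Hisobotga ko'ra, narxlar oshdi."),
    ("We must take into account the weather.", "Biz ob-havoni hisobga olishimiz kerak."),
    ("The shop is closed today.", "Do'kon bugun yopiq."),
]


def parallel_kwic(
    pairs: List[Tuple[str, str]], target: str, width: int = 25
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (left context, hit, right context, aligned Uzbek sentence)
    for every case-insensitive whole-phrase match of `target` in the English side."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    for en, uz in pairs:
        for m in pattern.finditer(en):
            left = en[max(0, m.start() - width):m.start()]
            right = en[m.end():m.end() + width]
            yield left, m.group(0), right, uz


if __name__ == "__main__":
    # Print each hit in a two-pane style: English KWIC line, then the Uzbek rendering.
    for left, hit, right, uz in parallel_kwic(PAIRS, "according to"):
        print(f"{left:>25} [{hit}] {right:<25} || {uz}")
```

Learners would read the right-hand pane to observe how the English target is rendered, then move on to the generalization and micro-output steps.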
DDL’s rationale reaches back to Johns and early classroom concordancing;
recent syntheses show consistent learning gains when learners consult corpora
directly. Johns’ argument for learner-as-researcher remains the governing idea,
while a multilevel meta-analysis reports medium positive effects for vocabulary
learning with corpus use, moderated by proficiency, training, and duration. For
parallel data supply, OPUS provides the largest open ecosystem; JW300 adds
wide coverage across 300+ languages, including Uzbek, with sentence-aligned
articles.
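Sentence-aligned data of this kind is often distributed as two plain-text files in which line i of the English file corresponds to line i of the Uzbek file. The sketch below loads such a pair under that assumption; the layout, the function name `load_aligned`, and any file names are assumptions for illustration, not a description of a specific OPUS or JW300 release.

```python
from typing import List, Tuple


def load_aligned(en_path: str, uz_path: str) -> List[Tuple[str, str]]:
    """Read two line-aligned UTF-8 files and return (English, Uzbek) pairs.

    Assumes one sentence per line and identical line counts in both files.
    Pairs in which either side is empty are dropped.
    """
    with open(en_path, encoding="utf-8") as f_en, open(uz_path, encoding="utf-8") as f_uz:
        en_lines = [line.rstrip("\n") for line in f_en]
        uz_lines = [line.rstrip("\n") for line in f_uz]

    # A length mismatch means the alignment cannot be trusted line by line.
    if len(en_lines) != len(uz_lines):
        raise ValueError(
            f"Line counts differ: {len(en_lines)} English vs {len(uz_lines)} Uzbek"
        )

    return [
        (en.strip(), uz.strip())
        for en, uz in zip(en_lines, uz_lines)
        if en.strip() and uz.strip()
    ]
```

Failing loudly on a line-count mismatch is a deliberate choice here: a silently shifted alignment would give learners misleading correspondences.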
Results.
Open repositories expose multiple English–Uzbek entry points.
OPUS offers umbrella access (OpenSubtitles, Tatoeba, CCMatrix, etc.); JW300
aggregates Watchtower/“Awake!” translations across 300+ languages; the
Uzbek National Corpus hosts a dedicated parallel section.
Table 1. Public English–Uzbek parallel resources for EFL DDL
| Resource | EN–UZ presence | Scale signal | License / access | Typical genres | Citation |
|---|---|---|---|---|---|
| OPUS collection | Yes (multiple subcorpora) | 58.85B sentence pairs across 747 languages (collection-level) | Open downloads / web query | subtitles, web, institutional | opus.nlpl.eu |
| JW300 | Yes | 300+ languages; 109M sentences overall | Research use; paper with links | religious/educational expository | aclanthology.org |
| Uzbek National Corpus – Parallel | Yes | developing; institutional | Web interface | mixed; curated | uzbekcorpus.uz |
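For teachers who want to keep the Stage 1 audit in a reusable form, the entries of Table 1 can be stored as simple records. The following is a minimal sketch; the class name `ParallelResource`, its field names, and the helper `by_access` are illustrative choices, and the values are copied from Table 1 rather than independently verified.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelResource:
    """One row of the resource audit (field names are illustrative)."""
    name: str
    en_uz: bool
    scale: str
    access: str
    genres: str
    source: str


# Values transcribed from Table 1.
AUDIT: List[ParallelResource] = [
    ParallelResource(
        "OPUS collection", True,
        "58.85B sentence pairs across 747 languages (collection-level)",
        "Open downloads / web query", "subtitles, web, institutional",
        "opus.nlpl.eu",
    ),
    ParallelResource(
        "JW300", True, "300+ languages; 109M sentences overall",
        "Research use; paper with links", "religious/educational expository",
        "aclanthology.org",
    ),
    ParallelResource(
        "Uzbek National Corpus (parallel section)", True,
        "developing; institutional", "Web interface", "mixed; curated",
        "uzbekcorpus.uz",
    ),
]


def by_access(resources: List[ParallelResource], keyword: str) -> List[ParallelResource]:
    """Return resources whose access description contains `keyword` (case-insensitive)."""
    return [r for r in resources if keyword.lower() in r.access.lower()]


if __name__ == "__main__":
    for r in by_access(AUDIT, "web"):
        print(r.name, "->", r.access)
```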
Parallel concordancers differ in depth of alignment control and analytics.
Table 2. Parallel concordance tools and pedagogical affordances
| Tool | Key functions for parallel DDL | EFL outcomes it supports |
|---|---|---|
| ParaConc | KWIC in two panes; alignment utilities; collocate spans; regex | noticing translation options; form–function mapping; contrastive collocation |
| AntPConc | Freeware; UTF-8; simple parallel search for beginners | low-barrier DDL tasks; quick verification of equivalents |
| Sketch Engine (Parallel concordance) | Alignment view; lemma/POS display; frequency and filters; user corpora | pattern generalization; lexico-grammatical profiling across languages |
We operationalized three repeatable templates for English–Uzbek DDL
sessions, each built around a parallel query plus a constrained output.
Meta-analytic results indicate medium gains for vocabulary learning with corpus
consultation, with stronger effects for in-depth knowledge dimensions and when
training and time are adequate. Representative effect sizes reported: posttest
Hedges’ g ≈ 0.74; delayed posttest g ≈ 0.64.
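For readers less familiar with the effect size named above, the sketch below computes Hedges’ g for a single two-group comparison from summary statistics, using the standard pooled standard deviation and the usual small-sample correction factor 1 − 3/(4(n₁ + n₂) − 9). It does not reproduce the meta-analysis; the function name `hedges_g` and the numbers in the usage example are invented for illustration.

```python
import math


def hedges_g(mean_t: float, mean_c: float,
             sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Hedges' g for treatment vs. control groups from means, SDs, and sizes."""
    # Pooled variance, weighting each group's variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    # Cohen's d: standardized mean difference.
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Small-sample bias correction (approximate form).
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


if __name__ == "__main__":
    # Invented scores: a DDL group vs. a comparison group on a vocabulary posttest.
    print(round(hedges_g(10.0, 8.0, 2.0, 2.0, 20, 20), 3))
```

By the conventional benchmarks, values around 0.5 are read as medium and values around 0.8 as large, which is why the reported figures are described as medium gains.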
Discussion.
Parallel concordances externalize contrasts that textbooks often flatten.
Observing how “according to” resolves into -ga ko‘ra (the postposition ko‘ra
plus dative case marking), or how light-verb constructions map onto Uzbek
verbal morphology, encourages hypothesis formation grounded in real usage, in
line with the original DDL principle of learner inquiry.
For entry-level courses, AntPConc reduces friction; for advanced cohorts,
Sketch Engine’s alignment view and filtering accelerate deeper generalizations.
ParaConc sits comfortably in the middle for stable lab use. Mixed-tool sequences
work well: quick confirmation in AntPConc, broader sampling in Sketch Engine,
portfolio export in ParaConc.
Web-crawled parallel data vary in cleanliness and domain balance; audits
warn about mislabeling and noisy alignments. A simple classroom safeguard is
triangulation: cross-check each finding in two independent sources (for
example, JW300 and a curated subcorpus in OPUS). When possible, rely on curated
institutional corpora such as the Uzbek National Corpus’ parallel section.
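The triangulation safeguard can be sketched as a small check that a proposed English–Uzbek correspondence is attested in at least two sources. This is a minimal illustration using plain substring matching; the function names `attested` and `triangulate`, the source labels, and the sentence pairs are all hypothetical and not taken from JW300 or OPUS.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]


def attested(pairs: Pairs, en_target: str, uz_candidate: str) -> int:
    """Count pairs where the English side contains the target and the
    Uzbek side contains the candidate equivalent (case-insensitive)."""
    en_t, uz_c = en_target.lower(), uz_candidate.lower()
    return sum(1 for en, uz in pairs if en_t in en.lower() and uz_c in uz.lower())


def triangulate(sources: Dict[str, Pairs], en_target: str, uz_candidate: str,
                min_sources: int = 2) -> Tuple[bool, Dict[str, int]]:
    """Return (confirmed, per-source counts); confirmed means the
    correspondence occurs in at least `min_sources` different sources."""
    counts = {name: attested(pairs, en_target, uz_candidate)
              for name, pairs in sources.items()}
    confirmed = sum(1 for c in counts.values() if c > 0) >= min_sources
    return confirmed, counts


if __name__ == "__main__":
    # Invented toy data standing in for two independent corpora.
    sources = {
        "source_a": [("According to the report, prices rose.",
                      "Hisobotga ko'ra, narxlar oshdi.")],
        "source_b": [("According to the law, this is forbidden.",
                      "Qonunga ko'ra, bu taqiqlangan.")],
    }
    print(triangulate(sources, "according to", "ko'ra"))
```

Substring matching is deliberately crude: real corpora may spell the apostrophe in ko‘ra in several ways, so a classroom version would need to normalize such variants before counting.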
Conclusion.
Parallel corpora make contrast visible and negotiable for
learners; with modest training and carefully chosen tools, English–Uzbek DDL
can deliver robust vocabulary and phraseological gains while cultivating analytic
habits that persist beyond a single course. The resource landscape is mature
enough for immediate adoption, provided teachers curate datasets and model
evidence-based choices at each step.
References:
1. Johns T. Should you be persuaded: Two samples of data-driven learning
materials // English Language Research Journal. – 1991. – Vol. 4. – P. 1-16.
2. Gaskell D., Cobb T. Can learners use concordance feedback for writing errors?
// System. – 2004. – Vol. 32. – No. 3. – P. 301-319.
3. Tiedemann J. Parallel data, tools and interfaces in OPUS // LREC. – 2012. –
Vol. 2012. – P. 2214-2218.
4. Nigmatova L., Avezov S. Application of NLP methods in corpus research:
features and limitations (in Russian) // «Узбекские национальные
образовательные здания теоретическое и практическое создание вопросы»
International scientific-practical conference. – 2023. – Vol. 2. – No. 2.
5. Sobirovich S. A. A pragmatically oriented approach to generative linguistics
// Current Research Journal of Philological Sciences. – 2024. – Vol. 5. –
No. 04. – P. 69-75.
6. Avezov S. Corpus linguistics: new approaches to language analysis and their
applications in foreign language teaching (in Russian) // International
Bulletin of Applied Science and Technology. – 2023. – Vol. 3. – No. 7. –
P. 177-181.
7. Sobirovich S. A. Corpus linguistics: a historical overview // Educator
Insights: Journal of Teaching Theory and Practice. – 2025. – Vol. 1. – No. 6. –
P. 396-403.