Xorijiy lingvistika va lingvodidaktika – Зарубежная лингвистика и лингводидактика – Foreign Linguistics and Linguodidactics
Journal home page: https://inscience.uz/index.php/foreign-linguistics
The role of newspaper in the formation of corpus data
Maqsud KAKHOROV¹
Uzbekistan State World Languages University
ARTICLE INFO
ABSTRACT
Article history:
Received March 2025
Received in revised form 10 April 2025
Accepted 2 April 2025
Available online 25 May 2025
Newspapers have historically played a pivotal role in the
development and enrichment of linguistic corpora, serving as a
key source of contemporary language use, socio-political
discourse, and journalistic style. This paper explores the
significance of newspaper texts in corpus linguistics, examining
their contribution to lexical diversity, syntactic structure, and
language change over time. The study highlights how
newspaper content provides valuable insights into public
communication, ideological framing, and linguistic trends.
Furthermore, it discusses the limitations and biases inherent in
relying on newspapers as corpus data, including issues of
editorial influence, regional imbalance, and lack of
conversational language. The paper concludes by advocating for
a balanced approach to corpus design that integrates
newspaper material with other spoken and written sources for
a more comprehensive linguistic analysis.
2181-3701/© 2025 in Science LLC.
DOI: https://doi.org/10.47689/2181-3701-vol3-iss5
This is an open-access article under the Attribution 4.0 International
(CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/deed.ru).
Keywords:
Grammar,
phonology,
morphology,
ESP,
ESL,
EFL,
General Service List (GSL),
Academic Word List (AWL).
The role of newspapers in the formation of corpus data (Uzbek version)
ANNOTATION
Keywords:
Grammar,
phonology,
morphology,
English for specific purposes (ESP),
English as a second language (ESL),
English as a foreign language (EFL),
General Service List (GSL),
Academic Word List (AWL).
Newspapers have historically played an important role in the development and
enrichment of linguistic corpora, serving as a major source of contemporary language
use, socio-political discourse, and journalistic style. This article examines the
significance of newspaper texts in corpus linguistics and analyzes their contribution
to lexical richness, syntactic structure, and language change over time. The study
emphasizes that newspaper content provides valuable information about public
communication, ideological framing, and linguistic trends. In addition, the article
discusses the limitations and problems of relying on newspapers as corpus data,
including editorial influence, regional imbalance, and the lack of conversational
language. The article concludes by recommending a balanced approach to corpus
building that integrates newspaper material with other spoken and written sources
for a more comprehensive linguistic analysis.
¹ Teacher, Uzbekistan State World Languages University. E-mail: maqsudqahhorov19@gmail.com
Xorijiy lingvistika va lingvodidaktika – Зарубежная лингвистика и лингводидактика – Foreign Linguistics and Linguodidactics
Special Issue – 5 (2025) / ISSN 2181-3701
The role of newspapers in the formation of corpus data (Russian version)
ANNOTATION
Keywords:
Grammar,
phonology,
morphology,
English for specific purposes (ESP),
English as a second language (ESL),
English as a foreign language (EFL),
General Service List (GSL),
Academic Word List (AWL).
Historically, newspapers have played a key role in the development and enrichment
of linguistic corpora, serving as a primary source of contemporary language use,
socio-political discourse, and journalistic style. This paper examines the
significance of newspaper texts in corpus linguistics and their contribution to
lexical diversity, syntactic structure, and language change over time. The study
highlights how newspaper content provides valuable information about public
communication, ideological framing, and language trends. The paper also discusses
the limitations and biases inherent in using newspapers as corpus data, including
editorial influence, regional imbalance, and the absence of conversational speech.
It concludes with a call for a balanced approach to corpus design that combines
newspaper material with other spoken and written sources for a more comprehensive
linguistic analysis.
Reading is one of the most common and important ways of learning foreign
languages. Nagy (1988) describes vocabulary as a major prerequisite for, and causative
factor in, reading comprehension. Vocabulary is widely regarded as perhaps the most
important type of knowledge for both first-language (L1) and second-language (L2) readers,
“because if words to express concepts are not known, all syntactic and discourse
knowledge is of little use”. Coxhead further argues that knowledge of grammar, phonology,
and morphology can emerge through the study of vocabulary. Newspapers play an influential
role in vocabulary learning and are “widely used in a range of education contexts”; by
reading a variety of newspapers and magazines, for example, learners can work through
their comprehension difficulties. Chung (2009) identifies the newspaper as one of the most
important sources of reading material, which makes a specialized word list for English
newspapers worth developing. Accordingly, several efforts have been made to create a
newspaper word list with practical applications for readers and writers of English
newspapers. Such a list can benefit English-language newspapers produced both in countries
where English is the dominant or official language and in countries where it is not.
Furthermore, frequent words are more familiar, and this familiarity can motivate
learners to read and understand the related texts. This matters most in EFL/ESL contexts,
and particularly in EFL contexts, where readers rely more heavily on familiar words to
understand texts.
Throughout the history of word list development, many attempts have focused on
core word lists for general purposes (CWLs) and on lists for English for specific purposes
(ESP), including general academic purposes and discipline-specific academic and
non-academic purposes (DSA/N-AWLs).
One of the most influential core vocabulary lists for general use is the General
Service List (GSL), established by West in 1953. The GSL comprises 2,000 of the most
frequently used word families and provides approximately 85% coverage of typical English
texts. Widely recognized as the most significant word list, the GSL has influenced the
development of Coxhead’s Academic Word List (AWL) and several contemporary word lists.
The GSL has nevertheless been criticized on several grounds: the principles by which its
words were compiled, its age, and its comprehensiveness. These objections have prompted the
creation of alternative word lists for general use. Coverage matters because, as noted by
Hu and Nation (2000), readers need to know approximately 98% of the running words (tokens)
in a text for adequate reading comprehension, well above the coverage the GSL alone provides.
Such criticisms have inspired researchers to formulate new word lists tailored for ESP.
Coxhead’s Academic Word List (AWL), a general academic word list (GAWL), was the first
significant computerized list derived from a corpus of 3.5 million running words. Selected
on the basis of range and frequency, the AWL comprises 570 word families that are not
included in the GSL. Despite its pioneering impact on word list development, the AWL has been
criticized for its broad corpus, its applicability to ESP courses, and the semantic and
grammatical variation its words exhibit across disciplines. These critiques have led to new
word lists for general academic use. General academic word lists, however, do not cater to
the specific needs of all learners: students of applied linguistics or chemistry, for
instance, each encounter their own distinct vocabulary. This limitation of GAWLs has prompted
scholars to create word lists tailored to specific purposes.
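The coverage figures above (about 85% for the GSL, and 98% as Hu and Nation’s comprehension threshold) are proportions of running words in a text. The short Python sketch below illustrates that arithmetic under simplifying assumptions: tokens are lowercased word forms rather than word families, and the function names, the sample sentence, and the tiny word list are hypothetical toy data, not taken from the GSL or any real corpus.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its running words (tokens).

    Simplification (assumption): tokens are alphabetic strings, optionally
    with one internal apostrophe; numbers and punctuation are ignored.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens: list[str], word_list: set[str]) -> float:
    """Proportion of tokens whose form appears in word_list."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in word_list)
    return known / len(tokens)


def words_per_unknown(cov: float) -> float:
    """Average number of running words per unknown word at a given coverage."""
    if cov >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - cov)


if __name__ == "__main__":
    # Hypothetical toy data: one "newspaper" sentence and a tiny word list.
    text = "The minister said the new budget will raise taxes on imported goods."
    toy_list = {"the", "said", "new", "will", "raise", "on", "goods"}

    tokens = tokenize(text)
    cov = coverage(tokens, toy_list)
    print(f"{len(tokens)} tokens, coverage = {cov:.1%}")
    # A 98% threshold corresponds to about one unknown word in every 50 tokens.
    print(f"at 98% coverage: 1 unknown word per {words_per_unknown(0.98):.0f} tokens")
    print("meets 98% threshold:", cov >= 0.98)
```

Run as a script, this prints `12 tokens, coverage = 66.7%`, then `at 98% coverage: 1 unknown word per 50 tokens`, then `meets 98% threshold: False`. A real study would first map each form to its word family (for example, counting *taxes* under *tax*) before computing coverage.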
Over the course of word list development, numerous researchers have conducted studies
focused on specific purposes. However, only two studies relate to a newspaper word list.
Chung (2009) developed a specialized word list for reading newspapers. The electronic
newspaper texts in Chung’s corpus were published between February 23 and May 23, 2006, and
were drawn from three newspapers: The Dominion Post (New Zealand), The Independent (United
Kingdom), and The New York Times (United States), totaling 579,849 running words. The corpus
was further divided into 12 sections. Using range and frequency as criteria, Chung produced a
list of 588 word families that are not part of the GSL. Chung’s study is open to several
criticisms. First, the corpus is too small for lower-frequency words to be included, because
larger corpora give less frequent words more opportunities to occur. Second, Chung examined
only four news divisions (Business, National, Sports, and International), which do not
represent all sections of a newspaper. Both factors weaken the representativeness and
generalizability of the study, which are crucial in research of this kind. In a separate
investigation, Zhu (2017) developed a word list for English newspapers from a
seven-million-word corpus derived from The New York Times. Zhu’s list comprises 405 technical
word families specific to newspapers. Although Zhu’s corpus was sufficiently large, it was
limited to the sections of a single newspaper, The New York Times, so the corpus and its
findings may be specific to that source. To fill the gaps identified in Chung’s and Zhu’s
research, the present researchers sought to create a new English newspaper word list. They
assembled a corpus twice the size of Chung’s and revised the earlier lists. Because newspaper
vocabulary is shaped by ongoing global events, lists derived from newspapers require regular
updating. Chung’s corpus was compiled in 2006, whereas the corpus for the present study was
gathered in 2018, so the present results reflect more recent usage than Chung’s. The corpus
for the present study also covers a greater variety of sections, sub-sections, and newspaper
types (specifically, 20 sections and four newspaper types) than those of Chung (2009) and
Zhu (2017).
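Both Coxhead’s AWL and Chung’s newspaper list were built with range and frequency as selection criteria and excluded words already in the GSL. The Python sketch below shows how such a filter can work on a corpus split into sections. It is a simplified illustration under stated assumptions: it counts word forms instead of word families, and the function name `select_words`, the thresholds `min_range` and `min_freq`, the section texts, and the small exclusion set standing in for the GSL are all hypothetical, not the values Chung or Coxhead actually used.

```python
from collections import Counter


def select_words(
    sections: dict[str, list[str]],
    exclude: set[str],
    min_range: int,
    min_freq: int,
) -> list[tuple[str, int, int]]:
    """Return (word, frequency, range) for words passing both criteria.

    sections:  mapping of section name -> list of tokens in that section
    exclude:   words to leave out (here a hypothetical stand-in for the GSL)
    min_range: minimum number of sections a word must occur in
    min_freq:  minimum total frequency across the whole corpus
    """
    frequency = Counter()   # total occurrences of each word
    word_range = Counter()  # number of sections containing each word
    for tokens in sections.values():
        counts = Counter(tokens)
        frequency.update(counts)           # add this section's counts
        word_range.update(counts.keys())   # +1 for each section the word occurs in
    selected = [
        (word, frequency[word], word_range[word])
        for word in frequency
        if word not in exclude
        and word_range[word] >= min_range
        and frequency[word] >= min_freq
    ]
    # Most frequent first; ties broken alphabetically.
    selected.sort(key=lambda item: (-item[1], item[0]))
    return selected


if __name__ == "__main__":
    # Hypothetical toy corpus with three named sections.
    sections = {
        "Business": "the market rose as investors bought shares".split(),
        "Sports": "the coach said investors bought the club".split(),
        "National": "the minister said the market will recover".split(),
    }
    # Hypothetical stand-in for the GSL (NOT the real list).
    gsl_stand_in = {"the", "as", "said", "will", "bought", "rose"}
    for word, freq, rng in select_words(sections, gsl_stand_in, min_range=2, min_freq=2):
        print(f"{word}: frequency {freq}, range {rng}")
```

On this toy corpus the script prints `investors: frequency 2, range 2` and `market: frequency 2, range 2`: both occur in two sections and are absent from the exclusion set, whereas single-section words such as *coach* fail the range criterion. The range requirement is what keeps words concentrated in one section (or one newspaper) off a list meant to serve newspaper reading in general.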
REFERENCES:
1. Nagy, W. E. (1988). Teaching vocabulary to improve reading comprehension. International Reading Association.
2. Coxhead, A., & Hirsch, D. (2007). A pilot science-specific word list. Revue Française de Linguistique Appliquée, 12(2), 65-78.
3. Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education. John Benjamins Publishing Company.
4. Chung, M. (2009). The newspaper word list: A specialised vocabulary for reading newspapers. JALT Journal, 31(2), 159-182.
5. Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36(1), 1-22. https://doi.org/10.1093/applin/amt018
6. Todd, R. W. (2017). An opaque engineering word list: Which words should a teacher focus on? English for Specific Purposes, 45, 31-39. http://dx.doi.org/10.1016/j.esp.2016.08.003
7. Paquot, M. (2007). Towards a productively-oriented academic word list. In J. Walinski, K. Kredens, & S. Gozdz-Roszkowski (Eds.), Practical applications in language and computers 2005 (pp. 127-140). Peter Lang.
8. Xodabande, I., & Xodabande, N. (2020). Academic vocabulary in psychology research articles: A corpus-based study. MEXTESOL Journal, 44(3), 1-21.
9. Hu, M., & Nation, P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403-430. https://nflrc.hawaii.edu/rfl/item-detail/43
10. Chen, Q., & Ge, G. C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles (RAs). English for Specific Purposes, 26(4), 502-514. https://doi.org/10.1016/j.esp.2007.04.003
11. Zhu, J. (2017). The technical vocabulary of newspapers [Master’s thesis, University of Western Ontario]. Scholarship@Western: Electronic Thesis and Dissertation Repository. https://ir.lib.uwo.ca/etd/4872
