Authors

  • Maqsud Kakhorov
    Teacher, Uzbekistan State World Languages University

DOI:

https://doi.org/10.71337/inlibrary.uz.foreign-linguistics.133399

Keywords:

Grammar, phonology, morphology, ESP, ESL, EFL, General Service List (GSL), Academic Word List (AWL).




Xorijiy lingvistika va lingvodidaktika – Зарубежная лингвистика и лингводидактика – Foreign Linguistics and Linguodidactics

Journal home page: https://inscience.uz/index.php/foreign-linguistics

The role of newspapers in the formation of corpus data

Maqsud KAKHOROV¹
Uzbekistan State World Languages University

ARTICLE INFO

Article history:
Received March 2025
Received in revised form 10 April 2025
Accepted 2 April 2025
Available online 25 May 2025

Keywords: Grammar, phonology, morphology, ESP, ESL, EFL, General Service List (GSL), Academic Word List (AWL).

ABSTRACT

Newspapers have historically played a pivotal role in the development and enrichment of linguistic corpora, serving as a key source of contemporary language use, socio-political discourse, and journalistic style. This paper explores the significance of newspaper texts in corpus linguistics, examining their contribution to lexical diversity, syntactic structure, and language change over time. The study highlights how newspaper content provides valuable insights into public communication, ideological framing, and linguistic trends. Furthermore, it discusses the limitations and biases inherent in relying on newspapers as corpus data, including issues of editorial influence, regional imbalance, and lack of conversational language. The paper concludes by advocating for a balanced approach to corpus design that integrates newspaper material with other spoken and written sources for a more comprehensive linguistic analysis.

2181-3701/© 2025 in Science LLC.
DOI: https://doi.org/10.47689/2181-3701-vol3-iss5/S-pp565-568
This is an open-access article under the Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/deed.ru)

¹ Teacher, Uzbekistan State World Languages University. E-mail: maqsudqahhorov19@gmail.com


Reading is one of the most common and important ways of learning foreign languages. Nagy noted that vocabulary is a major prerequisite for, and causative factor in, reading comprehension. Vocabulary is arguably the key type of knowledge in both first languages (L1) and second languages (L2), “because if words to express concepts are not known, all syntactic and discourse knowledge is of little use”. Coxhead likewise asserted that grammar, phonology, and morphology can emerge from the study of vocabulary. Newspapers play an influential role in vocabulary learning and are “widely used in a range of education contexts”. For example, by reading a variety of newspapers and magazines, learners can work through their comprehension difficulties. Chung asserted that the newspaper is one of the most important sources of reading material. Consequently, it is necessary to develop a specialized word list for English newspapers. Accordingly, there have been several efforts to create a newspaper word list with practical applications for readers and writers of English newspapers. The results can benefit both English-language newspapers produced in countries where English is the dominant or official language and English-language newspapers produced in countries where English is not the official language.



Furthermore, frequent words are more familiar, and this familiarity can motivate learners to read and understand the related texts. This matters in EFL/ESL contexts, and particularly in EFL contexts, where readers need more familiar words to understand the texts.

Throughout the history of word list development, a large number of attempts have focused on core word lists for general purposes (CWLs) and on word lists for English for specific purposes (ESP), including general academic purposes and discipline-specific academic/non-academic purposes (DSA/N-AWLs).

One of the most influential core vocabulary lists created for general use is the General Service List (GSL), established by West in 1953. The GSL comprises 2,000 of the most frequently used word families and provides approximately 85% coverage of typical English texts. Widely acknowledged as the most significant word list, the GSL has influenced the development of Coxhead’s Academic Word List (AWL) and several contemporary word lists. The GSL has nevertheless faced criticism regarding its principles for compiling words, its age, and its comprehensiveness. Its coverage is also limited: as noted by Hu and Nation, understanding approximately 98% of the total words in a text (tokens, or overall occurrences of running words) is essential for sufficient reading comprehension, well above the roughly 85% that the GSL provides. These criticisms have prompted the creation of alternative word lists for general use and have inspired researchers to formulate new word lists tailored for ESP. Coxhead’s AWL, which is a General Academic Word List (GAWL), was the first significant computerized list, derived from a corpus of 3.5 million running words. Using range and frequency as selection criteria, the AWL comprised 570 word families that were not included in the GSL. Despite its pioneering impact on the development of word lists, the AWL has been critiqued for its expansive corpus, its applicability to ESP courses, and the semantic and grammatical variation its items exhibit across different fields. These critiques have resulted in the creation of new word lists aimed at general academic use. Furthermore, general academic word lists do not cater to the specific needs of all learners. For instance, students of applied linguistics and students of chemistry each have their own specialized vocabulary. This limitation of GAWLs has prompted scholars to create word lists tailored for specific purposes.
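The coverage figures cited above (about 85% for the GSL, and the 98% threshold of Hu and Nation) are proportions of running words. As a minimal sketch, assuming a simple regular-expression tokenizer and matching on word forms rather than full word families, coverage could be computed as follows. The function names, the mini word list, and the sample sentence are hypothetical and serve only as illustration; they are not taken from any of the cited studies.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(text, word_list):
    """Return the share of running words (tokens) that appear in word_list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in word_list)
    return known / len(tokens)


# Hypothetical mini word list and sample sentence, for illustration only.
sample_list = {"the", "market", "rose", "after", "a", "of", "report"}
sample_text = "The market rose after the report of a merger."

print(f"{coverage(sample_text, sample_list):.2%}")
```

In this toy example, 8 of the 9 tokens are on the list, so coverage falls short of the 98% threshold. A real study would first group inflected and derived forms into word families before counting.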

Over the course of word list development, numerous researchers have conducted studies focused on specific purposes. However, only two studies concern a newspaper word list. Chung developed a specialized word list intended for reading newspapers. The electronic newspaper texts were collected over a publication period from February 23 to May 23, 2006. Chung’s corpus was drawn from three newspapers: The Dominion Post from New Zealand, The Independent from the United Kingdom, and The New York Times from the United States, totaling 579,849 running words. The corpus was divided into 12 sections. Using range and frequency as criteria, Chung’s research produced a list of 588 word families that were not part of the GSL. Chung’s study is open to two criticisms. First, the corpus is too small for lower-frequency words to be included, since larger corpora give less frequent words more opportunities to occur. Second, Chung’s research examined only four news divisions (Business, National, Sports, and International), which do not represent all sections of newspaper publications. These factors bear on the representativeness and generalizability of a study, which are seen as crucial in such research. In a separate investigation, Zhu developed a word list for English newspapers using a seven-million-word corpus derived from The New York Times. Zhu’s findings resulted in a word list of 405 technical word families related to the newspaper. While Zhu’s corpus was sufficiently large, it was limited to sections of The New York Times. As a result, the corpus and its findings may be specific to that source.

To fill the gaps identified in Chung’s and Zhu’s research, the present researchers sought to create a new English newspaper word list. They assembled a corpus twice the size of Chung’s and revised the prior lists. Because the vocabulary of newspapers is shaped by ongoing global events, lists derived from these publications require regular updates. The corpus for Chung’s study was compiled in 2006, while the corpus for the present study was gathered in 2018. The outcomes of the present research are therefore more up to date than those of Chung’s study. Additionally, the corpus for the present study encompasses a greater variety of sections, sub-sections, and newspaper types (20 sections and 4 types) than those found in Chung’s (2009) and Zhu’s research.
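The range and frequency criteria mentioned above for Chung’s list can be illustrated with a short sketch. It assumes the corpus is already tokenized and split into sections, and it works on word forms rather than word families. The function name, the cut-off values, and the toy data are hypothetical; they are not the thresholds or materials used by Chung or Zhu.

```python
from collections import Counter


def select_words(sections, baseline, min_range, min_freq):
    """Pick words outside `baseline` that meet range and frequency cut-offs.

    sections: list of token lists, one list per corpus section.
    baseline: set of words to exclude (standing in for a list such as the GSL).
    """
    freq = Counter()   # total occurrences across the whole corpus
    rng = Counter()    # number of sections in which each word occurs
    for tokens in sections:
        freq.update(tokens)
        rng.update(set(tokens))
    return sorted(
        w for w in freq
        if w not in baseline and rng[w] >= min_range and freq[w] >= min_freq
    )


# Toy corpus with three sections; illustrative data only.
sections = [
    ["the", "election", "poll", "election"],
    ["the", "election", "striker"],
    ["the", "poll", "election"],
]
baseline = {"the"}

print(select_words(sections, baseline, min_range=3, min_freq=4))
```

Requiring a minimum range keeps words that are spread across sections rather than concentrated in one, which is why the number and variety of sections in a corpus affects the resulting list.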

REFERENCES:

1. Nagy, W. E. (1988). Teaching vocabulary to improve reading comprehension. International Reading Association.
2. Coxhead, A., & Hirsch, D. (2007). A pilot science-specific word list. Revue Française de Linguistique Appliquée, 12(2), 65-78.
3. Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language. John Benjamins Publishing Company.
4. Chung, M. (2009). The newspaper word list: A specialised vocabulary for reading newspapers. JALT Journal, 31(2), 159-182.
5. Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36(1), 1-22. https://doi.org/10.1093/applin/amt018
6. Todd, R. W. (2017). An opaque engineering word list: Which words should a teacher focus on? English for Specific Purposes, 45, 31-39. http://dx.doi.org/10.1016/j.esp.2016.08.003
7. Paquot, M. (2007). Towards a productively-oriented academic word list. In J. Walinski, K. Kredens & S. Goźdź-Roszkowski (Eds.), Practical applications in language and computers 2005 (pp. 127-140). Peter Lang.
8. Xodabande, I., & Xodabande, N. (2020). Academic vocabulary in psychology research articles: A corpus-based study. MEXTESOL Journal, 44(3), 1-21.
9. Hu, M., & Nation, P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403-430. https://nflrc.hawaii.edu/rfl/item-detail/43
10. Chen, Q., & Ge, G. C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles (RAs). English for Specific Purposes, 26(4), 502-514. https://doi.org/10.1016/j.esp.2007.04.003
11. Zhu, J. (2017). The technical vocabulary of newspapers [Master’s thesis, University of Western Ontario]. Scholarship@Western: Electronic Thesis and Dissertation Repository. https://ir.lib.uwo.ca/etd/4872
