CORPUS LINGUISTICS AND THE BRITISH NATIONAL CORPUS (BNC)

Lola Nazarova

doi:10.71337/inlibrary.uz.science-research.34742


Abstract

This article introduces corpus linguistics, one of the fastest-growing methodologies in contemporary linguistics, and the British National Corpus (BNC). In a conversational format, it answers a few questions that corpus linguists regularly face from linguists who have not yet used corpus-based methods, and it outlines the history, evaluation and features of the BNC.


Nazarova Lola Maqsadilla qizi

Graduate student of Termiz University of Economics and Service

+99890 608 70 56

https://doi.org/10.5281/zenodo.11643499


Keywords: corpus, corpus linguistics, methodology, the context of the classroom, the methodology of corpus linguistics, quantitative and qualitative analyses, the British National Corpus (BNC), orthographic transcriptions, and others.


Corpus linguistics is the study of language based on large collections of "real life" language use stored in corpora (or corpuses), which are computerized databases created for linguistic research. It is also known as corpus-based studies.

Corpus linguistics is viewed by some linguists as a research tool or methodology and by others as a discipline or theory in its own right. Sandra Kübler and Heike Zinsmeister state in their book, "Corpus Linguistics and Linguistically Annotated Corpora," that "the answer to the question whether corpus linguistics is a theory or a tool is simply that it can be both. It depends on how corpus linguistics is applied." Although the methods used in corpus linguistics were first adopted in the early 1960s, the term itself didn't appear until the 1980s.

"Corpus linguistics is...a methodology, comprising a large number of related methods which can be used by scholars of many different theoretical leanings. On the other hand, it cannot be denied that corpus linguistics is also frequently associated with a certain outlook on language. At the centre of this outlook is that the rules of language are usage-based and that changes occur when speakers use language to communicate with each other. The argument is that if you are interested in the workings of a particular language, like English, it is a good idea to study language in use. One efficient way of doing this is to use corpus methodology...."

"In the context of the classroom the methodology of corpus linguistics is congenial for students of all levels because it is a 'bottoms-up' study of the language requiring very little learned expertise to start with. Even the students that come to linguistic enquiry without a theoretical apparatus learn very quickly to advance their hypotheses on the basis of their observations rather than received knowledge, and test them against the evidence provided by the corpus."

"To make good use of corpus resources a teacher needs a modest orientation to the routines involved in retrieving information from the corpus, and, most importantly, training and experience in how to evaluate that information."

"Quantitative techniques are essential for corpus-based studies. For example, if you wanted to compare the language use of patterns for the words big and large, you would need to know how many times each word occurs in the corpus, how many different words co-occur with each of these adjectives (the collocations), and how common each of those collocations is. These are all quantitative measurements....

"A crucial part of the corpus-based approach is going beyond the quantitative patterns to propose functional interpretations explaining why the patterns exist. As a result, a large amount of effort in corpus-based studies is devoted to explaining and exemplifying quantitative patterns."
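The three measurements named in the quotation (how often each word occurs, how many different words co-occur with it, and how common each collocation is) can be illustrated with a minimal Python sketch. The sample text, the function names, and the choice to treat only the immediately following word as a collocate are assumptions made for illustration; a real study would run over a large corpus such as the BNC and would normally use a wider collocation window.

```python
import re
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
SAMPLE_TEXT = (
    "A large number of people saw the big dog. "
    "The big house had a large garden. "
    "A large number of big problems remain."
)

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequency(tokens, word):
    """Count how many times `word` occurs in the token list."""
    return sum(1 for t in tokens if t == word)

def right_collocates(tokens, word):
    """Count the words that immediately follow `word`.

    Simplifying assumption: a collocate is the next token only,
    and sentence boundaries are ignored.
    """
    return Counter(
        tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i] == word
    )

tokens = tokenize(SAMPLE_TEXT)
for adjective in ("big", "large"):
    collocates = right_collocates(tokens, adjective)
    print(
        adjective,
        "| occurrences:", word_frequency(tokens, adjective),
        "| distinct collocates:", len(collocates),
        "| collocations:", collocates.most_common(),
    )
```

Numbers like these are only the starting point: as the quotation stresses, the analyst still has to explain why, for example, one adjective favours quantity nouns while the other favours concrete objects.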

"In corpus linguistics quantitative and qualitative methods are extensively used in combination. It is also characteristic of corpus linguistics to begin with quantitative findings, and work toward qualitative ones. But...the procedure may have cyclic elements. Generally it is desirable to subject quantitative results to qualitative scrutiny, attempting to explain why a particular frequency pattern occurs, for example. But on the other hand, qualitative analysis (making use of the investigator's ability to interpret samples of language in context) may be the means for classifying examples in a particular corpus by their meanings; and this qualitative analysis may then be the input to a further quantitative analysis, one based on meaning...."
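The cyclic step described above, in which a qualitative classification becomes the input to a further quantitative analysis, might be sketched as follows. The concordance lines, the sense labels, and the function name are all hypothetical; the labels stand in for judgments that a human analyst would make by reading each example in context.

```python
from collections import Counter

# Hypothetical concordance lines for "bank", each paired with a sense label
# assigned by a human analyst (the qualitative step).
labelled_lines = [
    ("she paid the cheque into the bank on Friday", "financial institution"),
    ("they walked along the bank of the river", "river edge"),
    ("the bank raised its interest rates", "financial institution"),
    ("the boat drifted towards the far bank", "river edge"),
    ("he works for a bank in the City", "financial institution"),
]

def sense_distribution(lines):
    """Return, for each sense, its count and its share of all labelled lines
    (the quantitative step, now based on meaning)."""
    counts = Counter(label for _, label in lines)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}

for label, (n, share) in sense_distribution(labelled_lines).items():
    print(f"{label}: {n} lines ({share:.0%})")
```

The resulting distribution is itself a frequency pattern, so it can be subjected to qualitative scrutiny again, which is what makes the procedure cyclic.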

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is widely used in corpus linguistics as a source of data for analysis.

The project to create the BNC involved the collaboration of three publishers (Oxford University Press as the lead collaborator, Longman, and W. & R. Chambers), two universities (the University of Oxford and Lancaster University), and the British Library. The creation of the BNC started in 1991 under the management of the BNC consortium, and the project was finished by 1994. No new samples have been added since 1994, but the BNC underwent slight revisions before the release of the second edition, BNC World (2001), and the third edition, BNC XML Edition (2007).

The BNC was the vision of computational linguists whose goal was a corpus of modern (at the time of building the corpus), naturally occurring language in the form of speech and text or writing that could be analyzed by a computer. Hence, it was compiled as a general corpus to pave the way for automatic search and processing in the field of corpus linguistics. One of the ways the BNC was to be differentiated from existing corpora at that time was to open up the data not just to academic research, but also to commercial and educational uses.

The corpus was restricted to just British English, and was not extended to cover World Englishes. This was partly because a significant portion of the cost of the project was being funded by the British government, which was logically interested in supporting documentation of its own linguistic variety. Because of its potentially unprecedented size, the BNC required funds from commercial and academic institutions as well. In turn, BNC data then became available for commercial and academic research.

The BNC is a monolingual corpus, as it records samples of language use in British English only, although occasionally words and phrases from other languages may also be present. It is a synchronic corpus, as only language use from the late 20th century is represented; the BNC is not meant to be a historical record of the development of British English over the ages. From the beginning, those involved in the gathering of written data sought to make the BNC a balanced corpus, and hence looked for data in various media.

Ninety percent of the BNC consists of samples of written language use. These samples were extracted from regional and national newspapers, published research journals or periodicals from various academic fields, fiction and non-fiction books, other published material, and unpublished material such as leaflets, brochures, letters, essays written by students of differing academic levels, speeches, scripts, and many other types of texts.


The remaining 10% of the BNC consists of samples of spoken language use. These are presented and recorded in the form of orthographic transcriptions. The spoken corpus consists of two parts. One part is demographic, containing the transcriptions of spontaneous natural conversations produced by volunteers of various age groups and social classes, originating from different regions. The spoken samples were produced in different situations, ranging from formal business or government meetings to conversations on radio shows and phone-ins. These were meant to account for both the demographic distribution of spoken language and the linguistically significant variation due to context.

The other part involves context-governed samples such as transcriptions of recordings made at specific types of meeting and event. All the original recordings transcribed for inclusion in the BNC have been deposited at the British Library Sound Archive. The majority of the recordings are freely available from the Oxford University Phonetics Laboratory.

The nature of the BNC as a large mixed corpus renders it unsuitable for the study of highly specific text-types or genres, as any one of them is likely to be inadequately represented and may not be recognisable from the encoding. For example, there are very few business letters and service encounters in the BNC, and those wishing to explore their specific conventions would do better to compile a small corpus including only texts of those types.

There are two general ways in which corpus material can be used in language teaching. Firstly, publishers and researchers could use corpus samples to create language-learning references, syllabuses and other related tools or materials. For example, the BNC was used by a group of Japanese researchers as a tool in their creation of an English language learning website for learners of English for specific purposes (ESP). The website enabled English-language learners to download frequently heard and used sentence patterns, and then base their own usage of the English language on these sentence patterns. The BNC served as the source from which the frequently used expressions were extracted. In using this website, users thus relied on reference samples from the BNC to guide them in their learning of the English language. Such creation of materials that facilitate language learning typically involves the use of very large corpora (comparable to the size of the BNC), as well as advanced software and technology. A large amount of money, time, and expertise in the field of computational linguistics is invested in the development of such language-learning material.

Secondly, the analysis of the corpus can be incorporated directly into the language teaching and learning environment. With this method, language learners are given the opportunity to categorize language data from the corpus and subsequently form conclusions about the patterns and features of their target language from their categorizations. This method involves a greater amount of work on the part of the language learner and is referred to as “data-driven learning” by Tim Johns. The corpus data used for data-driven learning is relatively small, and consequently the generalisations made about the target language may be of limited value.

In general, the BNC is useful as a reference source for the purposes of producing and perceiving text. The BNC can be used as a reference source when studying the use of individual words in various contexts, so that learners become familiar with the different ways to use particular words in suitable contexts. Other than language-related information, encyclopedic information is also found in the BNC. Learners perusing data from the BNC are also introduced to British cultural features and stereotypes.
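For the classroom uses described above, both data-driven learning and looking up how an individual word is used in context, learners are commonly given keyword-in-context (concordance) lines to sort and compare. The sketch below shows one simple way such lines could be produced. The function name `kwic`, the sample sentences, and the three-word context window are assumptions for illustration, not a description of any particular BNC interface.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines for exact matches of `keyword`.

    Each line shows up to `width` words on either side of the hit.
    Simplifying assumption: the context window ignores sentence boundaries.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample text, invented for illustration only.
SAMPLE_TEXT = (
    "We depend on the weather. It depends on the price. "
    "Children depend upon their parents."
)

for line in kwic(SAMPLE_TEXT, "depend"):
    print(line)
```

A learner comparing the resulting lines could notice, for instance, which prepositions follow the verb, and then check that observation against more examples.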

REFERENCES

1. Bednarek, Monika. 2008. Semantic preference and semantic prosody re-examined. Corpus Linguistics and Linguistic Theory 4(2). 119–40.

2. Behrens, Heike (ed.). 2008. Corpora in language acquisition research: history, methods, perspectives. Amsterdam, Philadelphia: John Benjamins.

3. Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. John Benjamins.

4. Leech, Geoffrey, Marianne Hundt, Christian Mair, and Nicholas Smith. 2012. Change in Contemporary English: A Grammatical Study. Cambridge University Press.

5. Sinclair, John McHardy. 2004. How to Use Corpora in Language Teaching. John Benjamins.

6. Kübler, Sandra, and Heike Zinsmeister. 2015. Corpus Linguistics and Linguistically Annotated Corpora. Bloomsbury.

7. Language and Linguistics Compass 3 (2009): 1–17. doi:10.1111/j.1749-818x.2009.00149.x.
