The importance of linguistic  module forms in the national corpus

Guli Toirova

doi:10.71337/inlibrary.uz.archive.30001

Авторы

Гули Тоирова
Бухарский Государственный Университет

DOI:

https://doi.org/10.71337/inlibrary.uz.archive.30001

Ключевые слова:

корпус орфографический модуль морфологический модуль лингвистический модуль словосочетательные модули алгоритм слова алгоритм формулы табличный алгоритм графический алгоритм.

Аннотация

В статье анализируется лингвистический модуль и алгоритм и его типы из независимых компонентов лингвистических программ. Необходимость в алгоритме фонологических, морфологических и орфографических правил для формирования лексико-грамматического кода научно обоснована. Подчеркивается важность таких лингвистических модулей, как фонология, морфология и орфография, в формировании лингвистической базы национального корпуса узбекского языка.

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

155

Uyingda cho’rilar bo’lay, alla.[1]

A mother who is ready to do anything to support her child is also ready to be a seat

for him. Acting as a chimney and even being a slave for her child is nothing for a
mother. All of these sacrifices show that the mother is ready for anything for the
pleasure of her child.

Conclusion.

Both language lullabies show that respect for mother and father is

highly valued. Lullaby examples are illustrated with linguoculturological elements.
These elements show English and Uzbek mentality, national identity and values,
culture and way of thinking. Ideas expressed in lullabies no matter which country they
belong to are innermost feelings and desires of mother. They teach children the culture
of his country with the help of lullabies.

The list of used literature

[1].

Abdumurod Tilavov. Eng sara allalar. Toshkent. Adabiyot uchqunlari. 2017. 36-

bet
[2]. Andrew Lang. The nursery rhyme book. William Clowes and songs. 1897.p148
[3]. Clarence Wesley Sumner. The birthright of babyhood. Chicago. Albert
Whitman&CO.1940.p30
[4]. Daiken, Leslie Herbert

, The Lullaby Book.

London.1959

[5]. Hasan Gunes, Nadide Gunes. The Effects of Lullabies on Children. International
Journal of Business and Social Science. April 2012. Volume 3 No. 7.p3
[6]. Irene Watt. An Ethnological Study of the Text, Performance, and Function of
Lullabies, PhD thesis, 2012. p256
[7]. Odette Chatham-Baker. Baby lore ceremonies, myths and traditions to celebrate a
baby birth. Macmillan Publishing Company. 1991.p34
[8]. Ohunjon Safarov. El suyarim, alla. Toshkent. O’zbekiston. 2009.72-bet
[9]. Ohunjon Safarov.O’zbek xalq allalari. ‘Alla-yo alla’.O’qituvchi.1999
[10].Yaqubbekova M.M.Жанровая специфика и поетические особенности
узбекских народных колыбелных песен “АЛЛА”. Aвтореферат.Ташкент 1990.8-
bet

UDC: 81`1:004=512.133

THE IMPORTANCE OF LINGUISTIC MODULE FORMS IN THE

NATIONAL CORPUS

Guli Toirova Ibragimovna

Associate Professor of the Department of

Uzbek Linguistics, Bukhara State University

PhD in рhilosophy, аssociate Professor

E-mail:

tugulijon@mail.ru

Abstract:

The state analyzes the linguistic module and the algorithm and its

types from independent components of the linguistic program code. The need for
algorithms for phonological, morphological and spelling rules for the formation of the

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

156

lexical and grammatical code is scientifically substantiated. The importance of such
linguistic modules as phonology, morphology and spelling in the formation of the
linguistic base of the national corpus of the Uzbek language is emphasized.

Keywords:

corpus, spelling module, morphological module, linguistic module,

word-combination modules, word algorithm, formula algorithm, tabular algorithm,
graphical algorithm.

Аннотация:

Мақолада лингвистик дастурларнинг мустақил таркибий

қисмларидан бир лингвистик модул ва алгоритм ҳамда унинг турлари таҳлил
қилинган. Лексик-грамматик кодни шакллантириш мақсадида фонологик,
морфонологик ва орфографик қоидалар алгоритми зарурияти илмий асосланган.
Ўзбек тили миллий корпусининг лингвистик базасини тузишда фонологик,
морфонологик ва орфографик каби лингвистик модулларнинг аҳамияти
ёритилган.

Калит сўзлар:

корпус, орфографик модул, морфологик модул, лингвистик

модул, сўз бирикмасининг модуллари, сўзли алгоритм, формулали алгоритм,
жадвалли алгоритм, графикли алгоритм.

Аннотация:

В статье анализируется лингвистический модуль и алгоритм и

его типы из независимых компонентов лингвистических программ.
Необходимость

в

алгоритме

фонологических,

морфологических

и

орфографических правил для формирования лексико-грамматического кода
научно обоснована. Подчеркивается важность таких лингвистических модулей,
как фонология, морфология и орфография, в формировании лингвистической
базы национального корпуса узбекского языка.

Ключевые слова:

корпус, орфографический модуль, морфологический

модуль, лингвистический модуль, словосочетательные модули, алгоритм слова,
алгоритм формулы, табличный алгоритм, графический алгоритм.

Introduction:

It is no secret that today's growth in developing countries is due to

many factors, including the process in innovation-advanced innovations, commitment
to timely implementation of technologies. Innovation is, in fact, the key to growth. As
a consequence of the event of new developments in research, the adoption of recent
words in language at the expense of external sources, the scale of their use is increasing
on a daily basis. In particular, we can see that Uzbek 's computer linguistics is getting
richer thanks to the words learned from international computer linguistics. Let's
observe the term "module" as an example.

This term is used in the field of informatics: “1) module - program file; 2) module

- an object that makes up the code; 3) module - a set of computer cooling systems; 4)
MOD is used in such senses as music file format ”[13], in mathematics:“ 1) absolute
height; 2) vector modules; 3) modulus of automorphism; 4) the coefficient of
conversion of a logarithm in one system to a logarithm in another system, as well as
the absolute value of the magnitude ”[9]. In the field of mechanics: “1) Young's
module; 2) modulus of elasticity; 3) displacement module ”[14]. Today, “a module is
a complete functional part of a program; modular teaching is modern education, ie step-
by-step teaching according to the level of knowledge ”[3,12].

The term "linguistic module" plays an important role in the field of computer

linguistics. For example, the conversion of natural language into a machine language ,

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

157

i.e. the development of ways to process text via a computer system. In this end,
linguistic programs in other languages are being created. The linguistic module is an
integral part of these linguistic programs. For example, if the lexical module is
surrounded by a dictionary layer (words), the grammatical module edits symbols,
punctuation, letters and other characters, the spelling rules of the spelling module, the
morphological module analyzes words (from word to lexeme analysis) and the
synthesis process (lexeme formation), the supersyntactic unit in the syntactic module-
the interconnecti phenomenon.

Literature review.

Analysis of the relevant literature. In her research, M.

Abzalova notes: "In order to obtain realistic results in the development of a linguistic
framework of word classes, first of all, the affixes that form them and their
combinations are attached to words and are the best way to reach the linguistic base."
We recommend using the following linguistic modules suggested by M. Abzalova in
the formation of the Uzbek Language National Corps:

“The affixes added to the key words in the modulation of the noun category are

defined as follows:

affix of affiliation: q_а= -

niki;

affix of place : u_j=

-dagi;

affix of limiting: ch_q[3]= {

-gacha, -kacha, -qacha

}

;

affix of plural: Pl_a=

-lar;

consonant affixes (with variants): k_a [7] = {-ning, -ni, -ga, -ka, -qa, -da, -dan};
possessive affixes: e_a [9] = {- m, -im, -ng, -ing, -lari, -miz, -imiz, -ngiz, -ingiz};

noun-forming affix: sh_y = -lik;
1st type affix of person-number category: sh_s1 [-man, -san, -miz, -siz; -simiz,

-sisiz]

affixes: -mi, -chi, -gina, -kina, -qina, -dir, -u, -yu, -da, -a, -ya.
The following examples can be given to the module of attaching the given

affixes to the core (A = base, N = derivative):1. N=A

q_а; боланики=

2. N=A

u_j; boladagi = bola

dagi

3. N=A

ch_а[1]; bolagacha= bola

gacha

4. N=A

Pl _a; bolalar= bola

lar

5. N=A

k_а[7]; bolaning= bola

ning

6. N=A

e_а[6]; bolam= bola m

7. N=A

k_a

e_а[6]; bolalarim= bola

lar

im

8. N=A

k_а[6]; bolamga = bola

m

ga

The modulation continues in this order” [2].

In the process of creating a national corpus in the Uzbek language, an optimum

version of M. Abzalova is being used. The algorithm of phonological , morphological
and orthographic rules shall be established in order to form a lexical-grammatical code
in the linguistic norms module of the Uzbek language phrases.

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

158

Methodology of research.

What's the [6] algorithm?

Algorithm, algorithm-a

clear rule (program) for the execution of actions in a certain order that are used to solve
problems of a particular type. One of the basic concepts for cybernetics and
mathematics. The rule that performed four arithmetic operations on a decimal number
system was called an algorithm in the Middle Ages. [15] The computer with its
computing power is fast, clean, accurate and at the same time "completely
incomprehensible"[7]. The idea that when we use it to solve a number of problems, the
computer invents something on its own is a mistake, and a clear and complete
instruction is needed for the computer to work. An algorithm is a rigidly set order that
performs the action needed to produce the final result. This may sound strange, but
we're always confronted with an algorithm in real life. An example of this is the use of
a payphone, which includes a sequence of actions required for a successful phone call.
The rules for the use of home appliances, etc., in a short, understandable way, tell us
what to do in one way or another, and determine the algorithm of our actions.
According to historians and mathematicians,[21] the word "algorithm" is derived from
the name of our great ancestor Abu Abdullah Muhammad ibn Musa al-Khwarizmi, and
his famous book "Kitab al-jabr wa al-muqabala" has given rise to another popular term
"algebra." It is fair to say that the basic algorithm for the production of instructions is
controlled in the process of computer-assisted activities. We can not, however, transfer
our records directly from the algorithm to the computer, because they are written in a
language that the computer does not understand, only people understand. For a
computer to understand an algorithm, it is translated into a machine language, just as
algorithms written in a machine language are called programs or computer programs.
Important features of the optional algorithm: the accuracy of the algorithm - the value
of each step, discreteness - the process of solving the problem can be divided into
several simple steps (execution steps) so as not to cause difficulties for the computer
or person, the publicity - usefulness of the algorithm - the end of the actions of the
algorithm, which allows to obtain the desired result with the initial data in the final
steps [20].

In practice, there are the following types of algorithms: linear-algorithm in which

actions are carried out sequentially, without any conditions being checked, branching-
algorithm in which instructions are predetermined by conditions change, cyclic-al-
algorithm in which individual processes or groups of processes are repeated. Methods
of writing algorithms are considered to be verbal, formulaic, tabular, graphical.

The information available serves as a raw material for the processing of

computers. In metallurgical production, that is, as metal ore is considered a raw
material. However, in order to be effective in processing, the optional raw material
must have an initial preparation. First, we collect information about the event we 're
interested in, then we systematize and classify this information. Next, we 're building
a module that represents a given event. The module represents an event using a special
mathematical device, graphics, diagrams. The module is structured to show the
characteristics and key aspects of the situation. Mathematical and simulation
modulation is also available. Mathematical modulation is the application of a
mathematical instrument to the study and expression of an event. The exact
mathematical module allows you to observe and analyze the status of an object.

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

159

Simulation modulation-mainly used in industry, allows you to perform a series of tests
on devices that do not exist in real time using computer equipment and special software.
The application of this modulation accelerates the production of raw materials, as the
construction and research process is reduced, the number of errors and their costs are
reduced. For example, Boeing declined to implement a long-standing plan for the
position of passenger seats, the development of natural cabin modules, and replacing
them with computer modules. This saved millions of dollars and reduced the time for
the production of new aircraft parts. Once the module is built, it moves to the step of
creating an algorithm that matches it. Problems that have been solved by algorithms.
In a computer language (machine code), the algorithm used to solve a problem in the
form of a series of commands is called a machine program. The command of a machine
program or machine is an elementary machine instruction that is executed
automatically without additional instructions and concepts. Programming is a
theoretical and practical program activity. The process of translating an algorithm into
a machine language is called compiling. The first step in "humanizing" machine
language was to create programs that convert symbolic names to machine code. Then
programs for converting arithmetic expressions were created, and finally, in 1958, the
Fortran translator, widely used in the programming language, came into being. Since
then, many programming languages have been developed. Computer processes
information by controlling machine program commands, using different data in the
process. The data used are divided into: 1. Incoming-inputs to the computer and is used
as a condition to solve the problem. 2. Current or internal-used to store and process
information in the program. 3. Output-data generated by the program as a result of the
processing of information : Text, graphics , video, etc. It could be visible. This means
that it is always important to create an algorithm for the creation of the national corpus
of the Uzbek language, as it is controlled in the process of computer work.

Analysis and results.

The national corpus of the Uzbek language is the lexical

unit that exists in the Uzbek language, such as synonyms , antonyms, homonyms,
assimilation words, hierarchies of words; it is necessary to be able to automatically
analyze the morphological structure of the word, the construction of the word, the
meaning of the word, its morphological features. In other words, in the process of
composing, lemming, marking the corpus, it is necessary, on the basis of individual
searches, to find and interpret those words which form part of the corpus in the texts.
In order to do this, the above-mentioned algorithm, linguistic modeling, must be carried
out. M. Abzalova 's research "Linguistic modules of the program for editing and
analyzing texts in the Uzbek language"[2], A. Eshmominov 's research" Synonymous
database of the Uzbek national corpus"[17], automatic analysis of the morphological
characteristics of words. It is necessary to use some parts of Sh. Khamroeva 's research
on "Linguistic bases for the creation of the author's corpus of the Uzbek language"[18],
N. Abdurahmanova 's research on" Linguistic support for the program for the
translation of English texts into Uzbek"[1].

“Dictionary of synonyms of Uzbek language”, “Explanatory dictionary of Uzbek

words”, “Dictionary of obsolete words of Uzbek language”, “Dictionary of synonyms
of Uzbek language”, “Dictionary of Uzbek words”, “Dictionary of synonyms of Uzbek
language” "Dictionary of contradictory words of the Uzbek language", "Dictionary of

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

160

word classification of the Uzbek language", "Educational etymological dictionary of
the Uzbek language", "Educational toponymic dictionary of the Uzbek language" can
serve as a linguistic support. Only such dictionaries are reworked, lemma words;
depending on the nature of the words, it is necessary to delimit their series and connect
the members of the lemma series with each other. Only then can the revised dictionary
form the basis of the software for the programmer.

In the final stage, texts prepared with meta-metric and morphological markings

undergo several more automatic transformations. The following programs written in
“Perl” language are used:

1)

The converter converts the working format of the socket to the final

format. The converter converts the morphological analysis in parentheses to the correct
format <w lex =… ..gr =….>. It also checks for some spelling errors in order to further
improve the quality of the search engine, translates the name into Latin, adds
insufficient characters, identifies different forms of the verb;

2)

Semantic markup program (Semmarkup).

The program adds basic

semantic characters to words using a special semantic dictionary. This method makes
semantic search in the corpus much easier. The semantic dictionary is formalized in
the form of a table, the first column contains a lexeme and a phrase, and the remaining
columns contain semantic symbols. After the program compares the morphological
characters of the word with the dictionary and finds similarities, it copies the semantic
characters in the sem attribute of the <w> tag. In multi-character words, however,
certain errors may occur in the semantic search;;

3)

Statistical programs (Gramstat, Metastat).

These programs are

designed to collect statistics on the distribution of grammatical and metamaterial
characters in texts. This method allows you to quickly find errors in the characters. The

gramstat

program allows distribution in morphological analysis (lexeme, word group,

lexeme, and grammatical features of word form) for individual parts.

The above technology helps automate complex processes for the preparation of

corpus texts. Some operations (cleansing of text, removing homonymy, metametric)
are not automated at all, but a number of service tools have been developed for these
operations, which makes it much easier. From the start the data was deliberately easy
to encode so that the additional marks did not interfere with the text edition. The
complex formatted output format takes place in the last stage automatically.

The Russian National Corps, the Modern American English Corps, Oxford English

Corps and Czech National Corps have been established worldwide. Uzbekistan has,
however, not created a linguistic foundation. Ziyonet does not work at the system to
process text automatically and perform searches based on different characteristics from
the text although it currently has an electronic library. It is not meant for vocabulary or
language learning. The text can not be heard aloud. A system of automatic processing
of texts and searches based on several characteristics is established in the national
corpus program, the database. Word, phrases and combinations that are rarely used are
very easy to find, use and spell (spell) from. This allows the learner to hear the text
aloud. This opens up the possibility for directional education. A key role for the div
is to mark or to identify (linguistic analysis). Marking means separating special tags
into texted and their components in linguistic and extra-linguistic terms. Currently,

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

161

there are the following types of markups: morphological, semantic, syntactic,
anaphoric, prosodic, discrete, and others [11]. An extralinguistic mark is distinguished
by the following features: a mark that reflects the specificity of the text format (chapter,
paragraph, section, etc.) and a mark that represents the information belonging to its
author.

Analysis and results.

Most modern layout languages are based on SGML / XML,

in which the defined text covers two parallel data layers: visible (text itself) and hidden
(tagged or marked) [11]. In this case, the hidden part of the information is placed inside
the text, but special markers <…> are included, which, in turn, separate it from the
visible text. Unlike external methods of annotation writing (e.g. comments), the
markup is always incorporated into the text and is an integral part of it. Subsequent
levels of structural analysis are used by some corporations. In particular, some small
corpuscles will be connected on the basis of a complete syntactic analysis. Such cases
are usually characterized by a profoundly interpreted or syntactic structure. For
example, a syntactic markup is like a large tree in itself. We know that manual analysis
of texts is a valuable and time-consuming task. Currently , various software analysis
tools are available on Russian and foreign sites, which are open (directly) accessible.
They are individual, i.e. independent and subdivided into websites. In this case, it
should be noted that in recent years, developers have focused on web applications.
These systems have several advantages: the ability to analyze (mark) a single document
by multiple users at once does not require the installation of additional software, but
with the exception of the browser, access rights are limited, and the marking process
can be monitored. In particular, let's pay attention to the process of analyzing the text
from the story "Speech" by A.Qahhor. Text goes as following: “

You don't love me, you

're not happy with our marriage, I've been waiting until this hour, this minute, you
haven't said a word, it's been a year since we put our heads on a pillow ...

The speaker really forgot about it, but he was talking.”

The text mentioned above is distinguished by the following features:

1-table

№

Type according to the sentence structure

1.

[simple sentence] <SS>, </SG>

2.

[organized speech]

3.

[complex sentence] <QG>, </QG>

№

The type of sentence used for the purpose of expression

1.

[darak gap]

<dg>

2.

[so’roq gap]

<sg>

3.

[buyruq gap]

<bg>

№

Depending on whether or not the owner is represented in the
linguistic construction of the speech

1.

[egali gap]

<Е+>

2.

[egasiz gap]

<E->

[shaxsi nomalum gap]

<sh.n.g>

[atov gap]

<a.g>

[semantic-funksional
gap]

<s.f.g>

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

162

№

According to the participation of the primary and

secondary segment

1.

[yigik gap]

<yg>

2.

[yoyiq gap]

<yg>

№

According to the presence of parts that do not make

grammatical connection with the sentence

1.

[undalma]

2.

[kiritma]

The morphological marking system includes word, lemma, and tag. A word form

is a morphological unit in a selected text. The first step in marking a word is to lemma
it, that is, to bring out the lexeme form of the word. The most difficult step in marking
inflected languages is lemmatization, that is, attaching the lexeme form of a word to a
word as a tag. Because we know that in inflected languages the grammatical meaning
of the word is mixed with the core of the word. Unlike inflected languages, the process
of lemma in agglutinative language is much easier [4]. Initially, the analysis options
for word forms are given in the form of a list, by selecting the correct option or editing
the existing option. The editor makes it easy to navigate the text and make global
changes and alterations. Thus, the marking application falls into a familiar environment
and makes effective use of all the features of this editor. For the purpose of visual
separation, different elements of the text are decorated in different colors and styles.
Particularly,

—Analysis of the layout and the command variant is formalized in the form of

hidden text and is usually not visible in normal mode;

—word forms are formalized in different colors depending on the number of

analysis options: zero, one or more.

The grammatically impersonal part of the word is the same as the stem or base

lemma. The mark is given in the character <*> of the lemma. If the lemma in all the
word categories is based on this principle, that is, the principle that "the root part of the
word is equal to the lemma," the verb lemma II in the verb group is given in the form
of an imperative mood. In dictionary articles, the verb is given in the form of an action
name: <go>. However, this form is not appropriate for the corpus because the text in
the corpus is searching for the <bar> form, not the <go> form of the word. The verb
lemma is therefore given as <taught>, not <be>, shown as <blind>, received as
<received> [17]. The marking process requires writing 5 to 10, sometimes even more,
morphological tags (comments) for each word.

The main advantage of SGML / XML compared to other layout languages (TEX,

RTF) is that it has strict syntax of markup commands, differentiating attributes and
elements, clear indication of element boundaries, self-documentation, automatic
verification of grammatically correct entry.

The most authoritative standards for corpus data encoding are: TEI (Text Encoding

Initiative)[5], CES (XML Corpus Encoding Standard)[8], EAGLES (European
Advisory Group on Language Engineering Standards)[10]. In particular, TEI is
recognized as a well-developed standard, defining the rules for the expression of
different types of texts and textual information elements, with particular emphasis on:

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

163

structure, title, style of speech (prose, poetry , drama), pages, quotations, footnotes or
links (footnotes, comments), corrections, tables, formulas, specific characters
(characters), linguistic annotations, etc. The special title of the standard shall be subject
to the rules for the coding of the case. Although TEI is not specifically tailored for
corpus applications, it often works in conjunction with similar standards. For example,
the British National Corpus (BNC), the Czech National Corps, the Hungarian National
Corps, etc. The XCES standard is an advanced application of TEI, designed solely for
the corpus and intended to identify specific labels specific to the corpus.

But when we studied the TEI and XCES universal standards in detail, we found

that they were too complex, unnecessary, and inconvenient for text mass marking. The
full provisions of the TEI are very broad and not always reasonable, and it is therefore
difficult enough to comply with all the requirements of this standard. The format is not
compact, and the size of the content is usually increased. The format loses its clarity
function, for example, it is suggested that meta-attributes be written in the form of text
in the tag, so that when the markup is removed, the original text returns to its original
state, error occurs.

You can also restrict yourself to TEI applications by rejecting "redundant" tags.

The minimum set of tags is selected from the TEI to represent the div: <text> -text,
<p> -header, <s> -word, <w> -word, and morphological analysis is written in the form
of <w ana = ...> attribute. However, such an appearance does not fully comply with the
standard of the housing layout. This view is reminiscent of a simplified HTML version.

The complexity of XML formats is not the main problem, but the complete lack of

popular programs such as preparation, processing, indexing and searching, which is a
major problem. Linguists have relatively simple programs available to them. Among
them: XML-analysts, editors, converters, linear search programs are widely used. It
turns out that such a set of programs is not enough for a corps with a volume of millions
of words. Of course, tasks such as preparing the internal problems and markings of the
case can be solved with the help of specially written converters, macros and other tools.

The data representation format in the case is developed based on existing coding

standards (TEI, XCES). HTML belongs to the SGML / XML family, is the most
common format, and can be used in many applications [19]. Today, search engines
have the ability to understand the semantics and structure of HTML tags.

HTML is a very simple format that provides minimum requirements in terms of

content and layout size, and is not able to use many commands in practice. It's a very
convenient and compact format for manual editing and visual perception. Typically,
when displaying language units, there are no tags in the standard itself, but HTML can
allow non-standard tags to be used, and this problem is resolved through a special setup
(correction) of the search server.

The corpus format has a number of HTML languages, with some special tags

attached for linguistic units. This format specifies the coding requirements for
important text information and includes:

1) meta text attributes;
2) text structure elements (title, paragraph, poems, footnote or link (footnotes,

comments) and tables at the bottom of the page);

3) linguistic units (sentences, words);

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

164

4) lexical information (grammatical, semantic signs);
5) text formatting parameters, special characters, etc [20].
Meta text attributes are written in texts in different situations, so that steps 2 and 3

can be done in parallel or arbitrarily. But the text must have the name of the file
identified and recorded. It does not perform any actions, such as renaming a single
connection or file, as such actions could disrupt the operation of the entire system. For
the purpose of storing metadata, simple Excel spreadsheets with a predefined structure
are used, with the first column containing the name of the file (clearly specified path)
and the other columns with metamata attributes and process information. This allows
you to use Excel's built-in tools effectively and makes the search engine much easier.
For example, search, filtering, analysis and data processing (to-do list, auto-filling,
statistics). In this case, the tables must be stored in a text format, and this format must
be understood by Excel. This allows the file stored in the spreadsheet view to accept
not only Excel but also other spreadsheet programs and increase the runtime efficiency.

Theoretically, metadata can be stored separately from each text, but according to

the HTML rules, the data must be stored in the file header so that the Yandex-server
can index the data. When storing metadata in separate memory, there is always a
problem of synchronization, meta-tables, and text interactions with each other.

Suggestions

. The following methods are used to store metadata in separate memory:

1)

The

metas

table creates meta-table headers by collecting meta-text attributes from the

file headers. In Excel, it can be modified manually. At the initial processing stage, some
metadata can be added to the text, such as the author's name, title and date of creation.
At the final stage, the Metas.bat program collects all attributes and completes the
verification phase.

2)

Meta.txt takes the meta text attributes from the modified meta-tables and transfers them
to the existing text. This program checks the availability of the file and updates the
title. In the tables, most attribute actions are separated by a" "symbol. When the text is
changed, each action will appear as a separate attribute. Metamata attributes can
therefore move freely between text and meta-tables. Meta-metric, on the other hand,
will need to be carried out interactively with several cycles of verification.

3)

MetaTest checks the accuracy of the meta-table. In this case, the actions of the attribute
in the normative table are compared with those shown in the templates. The program
identifies incorrect actions with a "#" character and can be checked and corrected
manually.
All the above programs are done in Perl.

At the final stage of processing, texts prepared with meta-metric and morphological

markings undergo several more automatic transformations. The converter checks for
some markup errors in order to further improve the quality of the search engine by
converting the morphological analysis in parentheses to the correct format <w lex =…
..gr =….>.

The semantic markup program adds basic semantic characters to words using a

special semantic dictionary. This method has the property of greatly facilitating
semantic search in the corpus. The semantic dictionary is formalized in the form of a
table, the first column contains a lexeme and a phrase, and the remaining columns
contain semantic symbols. After the program compares the morphological characters

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

165

of the word with the dictionary and finds similarities, it copies the semantic characters
in the sem attribute of the <w> tag. In multi-character words, however, various errors
can occur in semantic search.

The above technology helps to automate complex operations in the preparation of

texts for the corpus. Some operations are not automated at all (clearing texts, removing
homonymy, meta-metric), but a set of service tools has been developed for such
operations, which makes it much easier. From the very beginning, the data encoding
format is developed in a special simple form. As a result, a complex layout
development format occurs automatically at the final stage.

Conclusion:

In conclusion, it should be noted that the role of linguistic

modulation in the formation of the national div 's linguistic base is incomparable. It
is therefore necessary to create an algorithm as a basis for the production of controlled
instructions in the computer process. It is important to develop specific linguistic
module forms by marking each word group in the development of a morphological
marking algorithm.

Given that increasing the international status of the Uzbek language, raising it to

the level of a world language of communication, learning and teaching Uzbek abroad,
and expanding and polishing the capabilities of our national language directly through
the national div, the practical significance of the work will be a key factor for
development and survival.

References

1. Abduraxmonova N.Z. Linguistic support of the program for translating English texts
into Uzbek (on the example of simple sentences): Doctor of Philosophy (PhD) il dis.
aftoref. - Tashkent, 2018.
2. Abjalova M. Linguistic modules of the program of editing and analyzing texts in the
Uzbek language (for the program of editing texts in official and scientific style): Doctor
of Philosophy (PhD)… dis. – Fergana, 2019. – P.22.
3. Avliyokulov N.X. Technology of modular teaching of professional sciences. - T.:
Yangi asr avlodi, 2004. –106 p Stepanov A.N. 6.3. Archiving of file objects //
Informatics: basic course: for students of humanities specialties of universities. - Peter,
2010. - 719 p.
4. Vanyushkin A. S., Grashchenko L. A. Assessment of algorithms for the selection of
key words: tools and resources // New information technologies in automated systems.
- 2017. - № 20. - S.. 95–102.
5. Zakharov V.P. Corpus Linguistics: Uchebno-metod. posobie. - SPb., 2005. - 48 p.
6. Kasyanov V. N., Kasyanova E.V. Introduction to programming. -
http://pco.iis.nsk.su/ICP
7. Kasyanova E.V. Yazyk programming Zonnon for platforms .NET // Programmnye
sredstva i matematicheskie osnovy informatiki. - Novosibirsk: ISI SO RAN, 2004. -
P.189–205.
8. Kutuzov A.B. Corpus linguistics. - (Electronic resource): License Creative commons
Attribution Share-Alike 3.0 Unported (Electronic resource) - //lab314.brsu.by/kmp-
lite/kmp-video/CL/CorporeLingva.pdf
9. Manturov O.V. and dr. Explanatory dictionary of mathematical terms. –M .:
Prosveshchenie, 1965. - 509 p.

ELECTRONIC JOURNAL OF ACTUAL PROBLEMS OF MODERN SCIENCE, EDUCATION AND TRAINING. OCTOBER, 2020-V. ISSN 2181-9750

http://khorezmscience.uz

166

10. Melchuk I.A. Poryadok slov pri avtomaticheskom sinteze russkogo slova
(predvaritelnыe soobshcheniya) // Nauchno –texnicheskaya informatsiya. 1985, №12.
-S.12-36.
11. Nedoshivina E.V. Programs for working with corpus texts: a review of the main
corpus managers. Uchebno-metodicheskoe posobie. - St. Petersburg. - 2006. 26 p.
12. Safarova R.G. and b. Classification of pedagogical technologies used in the process
of modular teaching in general secondary schools. / Methodical manual. - T .: State
Scientific Publishing House "National Encyclopedia of Uzbekistan", 2016. –176 p.
13. Stepanov A.N. 6.3. Archiving of file objects // Informatics: basic course: for
students of humanities specialties of universities. - Peter, 2010. - 719 p.
14. Explanatory dictionary on theoretical mechanics. –M .: MFTI. 2007. – 68 p.
15. Toirova G. About the technological process of creating a national corps. // Foreign
languages in Uzbekistan. Electronic scientific-methodical journal. - Tashkent. 2020,
№ 2 (31), –B.57– 64. https://journal.fledu.uz/uz/ 2-31-2020
16. National encyclopedia of Uzbekistan. 5 volumes. Volume 1 - Tashkent: State
Scientific Publishing House of the National Encyclopedia of Uzbekistan, –2006. –
B.201.
17. Eshmo'minov A. Dictionary of synonyms of the National Corps of the Uzbek
language: Doctor of Philosophy (PhD) in Ph.D. aftoref. - Karshi, 2019.
18. Hamroeva Sh. Linguistic bases of creation of the author's corpus of the Uzbek
language: Doctor of Philosophy (PhD) in philology. aftoref. –Karshi, 2018. – 52 p.
19. Leech G. The State of Art in Corpus Linguistics // English Corpus Linguistics /
Aimer K., Altenberg K. (eds.) - London, 1991. - P. 8-29.
20. Fries Ch.C. The structure of English. An introduction to the construction of English
sentences. - L., 1969. – S.98.
21. Zemanek H. Lecture Notes in Computer Sciece 122 (1981), 1-81 [elek.res.]
Http://elganzua124.github.io/ taocp / OEBPS / Text / ch01.html

UDC: 811.44

PRAGMATIC BASIS OF ADVERTISING COMMUNICATION

Muminova Aziza Arslanovna,

Doctor of Philosophy (PhD) philology

Uzbekistan State World Language University

E-mail:

azikamadi@mail.ru

Abstract:

This article discusses the pragmatic foundations of advertising

communication, the main theoretical provisions related to the analysis of the pragmatic
foundations of advertising communication. It also talks about the theory of speech acts,
analyzes the pragmatic factors of advertising discourse, their influence on the content
of the advertising text.

Key words

: communication, information, pragmatics, communication, speech

act, advertising, addressee, destination.

Библиографические ссылки

Abduraxmonova N.Z. Linguistic support of the program for translating English texts into Uzbek (on the example of simple sentences): Doctor of Philosophy (PhD) il dis. aftoref. - Tashkent, 2018.

Abjalova M. Linguistic modules of the program of editing and analyzing texts in the Uzbek language (for the program of editing texts in official and scientific style): Doctor of Philosophy (PhD)… dis. – Fergana, 2019. – P.22.

Avliyokulov N.X. Technology of modular teaching of professional sciences. - T.: Yangi asr avlodi, 2004. –106 p Stepanov A.N. 6.3. Archiving of file objects // Informatics: basic course: for students of humanities specialties of universities. - Peter, 2010. - 719 p.

Vanyushkin A. S., Grashchenko L. A. Assessment of algorithms for the selection of key words: tools and resources // New information technologies in automated systems. - 2017. - № 20. - S.. 95–102.

Zakharov V.P. Corpus Linguistics: Uchebno-metod. posobie. - SPb., 2005. - 48 p.

Kasyanov V. N., Kasyanova E.V. Introduction to programming. - http://pco.iis.nsk.su/ICP

Kasyanova E.V. Yazyk programming Zonnon for platforms .NET // Programmnye sredstva i matematicheskie osnovy informatiki. - Novosibirsk: ISI SO RAN, 2004. - P.189–205.

Kutuzov A.B. Corpus linguistics. - (Electronic resource): License Creative commons Attribution Share-Alike 3.0 Unported (Electronic resource) - //lab314.brsu.by/kmp-lite/kmp-video/CL/CorporeLingva.pdf

Manturov O.V. and dr. Explanatory dictionary of mathematical terms. –M .: Prosveshchenie, 1965. - 509 p.

Melchuk I.A. Poryadok slov pri avtomaticheskom sinteze russkogo slova (predvaritelnыe soobshcheniya) // Nauchno –texnicheskaya informatsiya. 1985, №12. -S.12-36.

Nedoshivina E.V. Programs for working with corpus texts: a review of the main corpus managers. Uchebno-metodicheskoe posobie. - St. Petersburg. - 2006. 26 p.

Safarova R.G. and b. Classification of pedagogical technologies used in the process of modular teaching in general secondary schools. / Methodical manual. - T .: State Scientific Publishing House "National Encyclopedia of Uzbekistan", 2016. –176 p.

Stepanov A.N. 6.3. Archiving of file objects // Informatics: basic course: for students of humanities specialties of universities. - Peter, 2010. - 719 p.

Explanatory dictionary on theoretical mechanics. –M .: MFTI. 2007. – 68 p.

Toirova G. About the technological process of creating a national corps. // Foreign languages in Uzbekistan. Electronic scientific-methodical journal. - Tashkent. 2020, № 2 (31), –B.57– 64. https://journal.fledu.uz/uz/ 2-31-2020

National encyclopedia of Uzbekistan. 5 volumes. Volume 1 - Tashkent: State Scientific Publishing House of the National Encyclopedia of Uzbekistan, –2006. – B.201.

Eshmo'minov A. Dictionary of synonyms of the National Corps of the Uzbek language: Doctor of Philosophy (PhD) in Ph.D. aftoref. - Karshi, 2019.

Hamroeva Sh. Linguistic bases of creation of the author's corpus of the Uzbek language: Doctor of Philosophy (PhD) in philology. aftoref. –Karshi, 2018. – 52 p.

Leech G. The State of Art in Corpus Linguistics // English Corpus Linguistics / Aimer K., Altenberg K. (eds.) - London, 1991. - P. 8-29.

Fries Ch.C. The structure of English. An introduction to the construction of English sentences. - L., 1969. – S.98.

Zemanek H. Lecture Notes in Computer Sciece 122 (1981), 1-81 [elek.res.] Http://elganzua124.github.io/ taocp / OEBPS / Text / ch01.html