SYNTACTIC ANALYSIS IN COMPUTATIONAL LINGUISTICS: EXPLORING COLLOCATIONS AND VALENCY

Nurjaxon  Nurmuxammedova; Rustam  Asadov

doi:10.71337/inlibrary.uz.ijai.70616

Authors

Nurjaxon Nurmuxammedova
Samarkand State Institute of Foreign Languages
Rustam Asadov
Samarkand State Institute of Foreign Languages

DOI:

https://doi.org/10.71337/inlibrary.uz.ijai.70616

Abstract

This article examines the significance of syntactic analysis in computational linguistics, focusing specifically on the concepts of collocations and valence. It discusses how these linguistic phenomena contribute to the understanding of sentence structure and meaning, and highlights their applications in natural language processing (NLP) and artificial intelligence (AI). The article also outlines recent advancements in the field and suggests directions for future research.

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 925

SYNTACTIC ANALYSIS IN COMPUTATIONAL LINGUISTICS: EXPLORING

COLLOCATIONS AND VALENCY

Nurmuxammedova Nurjaxon Istam kizi

Master's student at Samarkand State Institute of Foreign Languages

nurmuxammedovanurjaxon@gmail.com

Asadov Rustam Muminovich

Samarkand State Institute of Foreign Languages PhD, dotsent

rustamasadov1972@gmail.com

Annotation:

This article examines the significance of syntactic analysis in computational

linguistics, focusing specifically on the concepts of collocations and valence. It discusses how

these linguistic phenomena contribute to the understanding of sentence structure and meaning,

and highlights their applications in natural language processing (NLP) and artificial intelligence

(AI). The article also outlines recent advancements in the field and suggests directions for future

research.

Keywords:

Syntactic analysis, computer linguistics, collocations, valiancy, natural language

processing, language models.

Introduction

Syntactic analysis is a fundamental component of computer linguistics, which seeks to

understand and model human language through computational means. The structure of

sentences—how words combine and interact—is essential for various applications, including

machine translation, information retrieval, and text mining. Among the critical aspects of

syntactic analysis are collocations and valence.

Collocations refer to the habitual co-occurrence of words that form phrases commonly

used by native speakers, while valence pertains to the capacity of verbs to combine with a

specific number and type of arguments. This article explores the roles of collocations and

valence in enhancing syntactic analysis, elucidating their importance in natural language

processing tasks and their implications for future research.

In December 2003, the World Summit dedicated to the problem of building an

information society was held in Geneva (Switzerland). The following slogan was introduced:

"Building an information society is the global challenge of the new millennium." Two important

documents were adopted at the summit: the Declaration on the principles of building an

information society and the work plan for building an information society. According to these

documents, the development of society will be closely related to computer technologies. It

should be noted here that computer linguistics is considered the most important direction that

defines the perspective of information technologies and it will be of decisive importance in the

development of the information society.

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 926

Main Part

One of the most promising areas of computer linguistics is artificial intelligence. On the

pages of Wikipedia, which is considered a virtual dictionary, artificial intelligence is defined as

follows: "Artificial intelligence is a feature of a computer or a robot that is mainly aimed at

solving issues related to human mental activity, in particular, thinking, understanding the content

of speech, and summarizing data. This term is also used for a branch of computer technology

related to the development of systems with these features."

Until the 70s of the 20th century, research on artificial intelligence was carried out within

the framework of cybernetics and informatics. Since the 80s and 90s of the 20th century,

artificial intelligence has become the object of study of many sciences. In particular, such

sciences as neurolinguistics, psychology, informatics, neurophysiology, epistemology (the

doctrine of knowledge in philosophy), cognitology, cognitive linguistics, and computer

linguistics deal with the problem of artificial intelligence.

Computational linguistics is a logical continuation of mathematical linguistics, which is

the most important part of applied linguistics. Computational linguistics began to form as a field

in 1954 at Georgetown University in the United States during the world's first experiment on

machine translation, and by 1960 it was formed as an independent science. Computer linguistics

is a copy of the English word "computational linguistics". Until the 80s of the 20th century, this

science was called by different names: computational linguistics, mathematical linguistics,

quantitative linguistics, and engineering linguistics. The main goal of this science is the

development of computer programs for solving linguistic problems, optimization of human-

machine (computer) communication, natural language processing. NLP involves computer

analysis and synthesis of natural languages in computational linguistics. In this case,

analysis refers to the computer understanding of natural language using morphological, syntactic

and semantic analysis, and synthesis means the grammatical formation and generation

(production) of text in a computer. Software developed for NLP are: AlchemyAPI, Expert

System S.p.A.. General Architecture for Text Engineering (GATE), Modular Audio Recognition

Framework, Monty Lingua, Natural Language Toolkit (NLTK). Computational linguistics has

many functions.

Collocations are combinations of words that frequently occur together in a language.

Word combinations are syntactic units formed from the combination of words based on certain

logical and grammatical rules. It is clear from this that the internal meaning, external

grammatical compatibility and their appropriateness are required for the words to enter into a

relationship. They can be classified into several types, including:

• Adjective + Noun: e.g., "strong coffee," "heavy rain."
• Verb + Noun: e.g., "make a decision," "give a presentation."
• Noun + Noun: e.g., "data analysis," "traffic jam."
Syntactic parsing is the automatic study of natural language syntactic structure, particularly

tagging spans of constituents (in constituency grammar) and syntactic interactions (in

dependency grammar). The challenge of structural ambiguity in natural language serves as its

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 927

impetus. A sentence may be assigned more than one grammatical parse, necessitating knowledge

beyond computational grammar rules to determine which parse is meant. One of the key jobs in

computational linguistics and natural language processing is syntactic parsing, which has been

studied since the middle of the 20th century when computers were invented.

Syntactic analysis in computational linguistics converts text into computer-process able

structured representations using algorithms. Computer science processes like code compilation

and natural language processing tasks like sentiment analysis depend on parsing algorithms.

Because it can adequately express the majority of computer language syntax, the context-free

grammar is frequently employed. For instance, the majority of contemporary programming

languages, including Python, Java, and C++, implement parsers using context-free grammars. A

collection of recursive rules or productions that specify which symbol strings make up

syntactically proper strings in the language are used to define these grammars.

Different formalisms are proposed by various grammar theories to describe the syntactic

structure of sentences. These formalisms can be categorized as constituency grammars or

dependency grammars for computational purposes. Different methods are needed for parsers for

each class, and different strategies have been used to tackle the two issues. Together with the

development of new parsing algorithms and techniques, the process of creating human-annotated

treebanks using different formalisms (such as Universal Dependencies) has continued.

Picture 1. Levels of syntactic analysis

Part-of-speech tagging (which resolves some semantic ambiguity) is a related problem,

and often a prerequisite for or a sub problem of syntactic parsing. Syntactic parses can be used

for information extraction (e.g. event parsing, semantic role labelling, entity labelling) and may

be further used to extract formal semantic representations.

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 928

Constituency parsing analyzes sentences by breaking them down into hierarchical

constituents (phrases), revealing the nested structure of a sentence. For example, a sentence

might be divided into a noun phrase and a verb phrase. Constituency parsing involves parsing in

accordance with constituency grammar formalisms, such as Minimalism or the formalism of the

Penn Treebank. This, at the very least, means telling which spans are constituents (e.g. [The man]

is here.) and what kind of constituent it is (e.g. [The man] is a noun phrase) on the basis of a

context-free grammar (CFG) which encodes rules for constituent formation and merging.

Dependency parsing focuses on the direct grammatical relationships (dependencies)

between individual words, representing these relationships as a directed graph. Parsing using a

dependency grammar formalism, such Universal Dependencies (a project that creates

multilingual dependency treebanks), is known as dependency parsing. By giving each token a

head (or several heads in some formalisms, such as Enhanced Dependencies, for example, in the

case of coordination) and a matching dependence relation for each edge, a tree or graph covering

the entire phrase is ultimately created. There are basically three recent paradigms for describing

dependency parsing: transition-based, grammar-based, and graph-based.

These techniques form the basis for extracting a deeper level of linguistic information,

such as identifying collocations and determining valiancy.

In natural language processing (NLP), collocations play a crucial role in:



Natural language generation. Algorithms that generate text must select words that

conform to native usage patterns. By incorporating collocation data, systems can produce

more fluent and contextually appropriate sentences.



Semantic understanding. Collocations often convey meanings that are not directly

inferred from individual words. For example, the phrase "kick the bucket" signifies death,

illustrating how collocations can alter semantic interpretation.



Language modeling. PModern language models utilize collocation data to enhance

predictions about word sequences. This is vital for applications such as speech

recognition and text completion, where accuracy in word choice significantly impacts

user experience.

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 929

Picture 2. The grammar structure of sentence

Picture 3. The syntactic analysis in computer linguistics

Valence refers to the capacity of verbs to combine with a specific number and type of arguments.

Understanding valence is essential for parsing sentences accurately.
Valence can be categorized into three main types:



Intransitive Verbs: Require only a subject (e.g., "He sleeps").



Transitive Verbs: Require both a subject and an object (e.g., "She reads a book").



Ditransitive Verbs: Require a subject, a direct object, and an indirect object (e.g., "He

gave her a gift").

Valence analysis aids in:

• Argument Structure Identification. Different verbs have unique argument structures that

dictate how they interact with other sentence elements. Recognizing these patterns allows for

more accurate parsing.

• Semantic Role Labeling. By understanding valence, systems can identify the roles

played by different constituents within a sentence (e.g., agent, patient), which is crucial for tasks

like information extraction and question answering.

• Grammar Induction. Knowledge of valence contributes to developing grammar models

capable of generalizing across languages and dialects, enhancing multilingual NLP systems.

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 930

Recent advancements in machine learning and deep learning have significantly improved

the ability to analyse collocations and valence. Techniques such as word embedding’s (e.g.,

Word2Vec, GloVe) capture semantic relationships between words, facilitating better

identification of collocations. Additionally, neural network architectures (e.g., Transformers)

have enhanced the modelling of syntactic structures by enabling context-aware representations.

Future research should focus on:
• Cross-Linguistic Studies. Investigating how collocations and valence differ across

languages can inform machine translation systems and improve their accuracy.

• Integration with AI Systems. As AI becomes more integrated into everyday applications

(e.g., chatbots), understanding syntactic nuances through collocation and valency analysis will be

critical for enhancing user interactions.

• Exploration of Contextual Variability. Examining how context influences collocation

usage and verb valency can lead to more sophisticated language models that adapt to varying

linguistic environments.

Conclusion

In conclusion, syntactic analysis is integral to the field of computational linguistics, with

collocations and valence serving as pivotal components in understanding sentence structure and

meaning. Their relevance spans various applications within natural language processing, from

improving text generation to enhancing semantic understanding. As research continues to

advance, further exploration of these concepts will lead to more robust language models and

improved AI systems capable of interacting with human language more naturally and effectively.

The ongoing study of collocations and valance will undoubtedly contribute to the evolution of

computational linguistics and its practical applications in technology and communication.

The scientific significance of the research results lies in clarifying the features of the

manifestation of valence theory at the linguistic levels and demonstrating that the role of

syntactic connections in determining the valence of linguistic units at the syntactic level is

important. Syntactic connections in sentences can be considered as the semantics of syntactic

units involved in the structure of a sentence, by highlighting the junctional models and

differential syntactic features of the components and ways of expression using componential

models in terms of functional syntax. As part of the analysis of simple sentences in English and

Uzbek, this approach to the analysis of simple sentences helps to solve some controversial

problems encountered in comparative typology and theoretical grammars.

References:

1.

Асадов Р.М. Синтаксическая валентность на примере синтаксемного анализа

трехвалентных элементов в позиции неядерного оппозитивного предицирующего

компонента (nap2) Вестник Челябинского государственного …, 2016. C. 25-35.

2. World summit on the information society. Plan of Action. Document WSIS-03/

GENEVA/DOC/5-E. 12 December 2003

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE

ISSN: 2692-5206, Impact Factor: 12,23

American Academic publishers, volume 05, issue 02,2025

Journal:

https://www.academicpublishers.org/journals/index.php/ijai

page 931

3. Рассел С., Норвиг П. Искусственный интеллект: современный подход /Artificial

Intelligence: a Modern Approach / Пер. с англ. и ред. К. А. Птицына. 2-е изд. М.:

Вильямс, 2006. - С.8.

4. Новое в зарубежной лингвистике. Вып. 24. Компьютерная лингвистика. М.: Прогресс,

1989. - С.10.

5. A.K. Zulpukarova

INFORMATION

TECHNOLOGIES IN

LINGUISTICS //

Международный журнал гуманитарных и естественных наук. 2024. №4-1 (91).

6. Bozorov S.M. INFORMATION

COMMUNICATION

TECHNOLOGY

FIELD

COMPUTER LINGUISTICS AND DATA MINING SYSTEM // Экономика и социум.

2022. №3-2 (94).

7. Abdullaeva Nargiza Erkinovna International classification of proverbs in computational

linguistics // International scientific review. 2019. №LV.

8. Nurmuxammedova, N. (2024). The Importance of Syntactic Analysis in Computer

Linguistics (in Research Examples of Word Combinations and Valency). Journal of

Language

Pedagogy

and

Innovative

Applied

Linguistics,

2(5),

74-77.

https://doi.org/10.1997/c8eqhc44

9. Karimova M., Asadov R., Aminova N., Proficient English users’ vocabulary Dictionary for

advanced and proficiency English learners 2020.

http://www.morebooks.shop/

10.

Rustam Asadov, Shohista Mardiyeva, Syntactic Valency of Predicative Components

Conference Proceedings: Fostering Your Research …, 2024 С. 159-162.

References

Асадов Р.М. Синтаксическая валентность на примере синтаксемного анализа трехвалентных элементов в позиции неядерного оппозитивного предицирующего компонента (nap2) Вестник Челябинского государственного …, 2016. C. 25-35.

World summit on the information society. Plan of Action. Document WSIS-03/ GENEVA/DOC/5-E. 12 December 2003

Рассел С., Норвиг П. Искусственный интеллект: современный подход /Artificial Intelligence: a Modern Approach / Пер. с англ. и ред. К. А. Птицына. 2-е изд. М.: Вильямс, 2006. - С.8.

Новое в зарубежной лингвистике. Вып. 24. Компьютерная лингвистика. М.: Прогресс, 1989. - С.10.

A.K. Zulpukarova INFORMATION TECHNOLOGIES IN LINGUISTICS // Международный журнал гуманитарных и естественных наук. 2024. №4-1 (91).

Bozorov S.M. INFORMATION COMMUNICATION TECHNOLOGY FIELD COMPUTER LINGUISTICS AND DATA MINING SYSTEM // Экономика и социум. 2022. №3-2 (94).

Abdullaeva Nargiza Erkinovna International classification of proverbs in computational linguistics // International scientific review. 2019. №LV.

Nurmuxammedova, N. (2024). The Importance of Syntactic Analysis in Computer Linguistics (in Research Examples of Word Combinations and Valency). Journal of Language Pedagogy and Innovative Applied Linguistics, 2(5), 74-77. https://doi.org/10.1997/c8eqhc44

Karimova M., Asadov R., Aminova N., Proficient English users’ vocabulary Dictionary for advanced and proficiency English learners 2020. http://www.morebooks.shop/

Rustam Asadov, Shohista Mardiyeva, Syntactic Valency of Predicative Components Conference Proceedings: Fostering Your Research …, 2024 С. 159-162.