Authors

  • Mahzuna Rahimova
    National University of Uzbekistan

DOI:

https://doi.org/10.71337/inlibrary.uz.jmsi.111609

Abstract

This article argues that the use of digitized dictionary texts for corpus analysis offers numerous advantages and opens up expanded perspectives in phraseology. The systematic use of such resources can provide deeper insights into the development and structure of phrasemes, which is invaluable for linguistic research.


https://ijmri.de/index.php/jmsi

volume 4, issue 4, 2025

722

INTRODUCTION TO THE CORPUS ANALYSIS OF PHRASEOLOGY

Rahimova Mahzuna Husan kizi

National University of Uzbekistan,

Faculty of Foreign Philology,

Linguistics (German)

Keywords:

Phraseology, corpus analysis, phrasemes, idioms and proverb lexicon, corpora.

A successful corpus analysis in the field of (historical) phraseology requires input that is independent of expert intuition. The goal of such an analysis is to search large corpora in a targeted manner for meaningful evidence of phrasemes.[3] This evidence is crucial for (a) establishing a chronological dating and (b) analyzing, verifying, and describing the semantic structure of the phraseme in context.

The procedure is similar to that of single-word lexicography, except that phraseography additionally requires a search for variants. A central problem is the lack of specific input: one needs to know in advance which phrasemes and variants existed historically. An example illustrates this problem: the phrase "jemandem die Leviten lesen" (to read someone the riot act) is common today, but no lexical variants of it are known.

An analysis of the DWDS core corpus shows that "Leviten" appears as a collocate only 139 times and reveals no further variants.[5] The historical German Dictionary of Proverbs, by contrast, provides at least some lexical variants, such as "Epistel" or "Vers." However, it remains unclear how common these variants actually were, which makes cross-checking against a text corpus essential.

Corpus analysis versus looking up dictionaries.

Looking up the phrase in the Duden dictionary of idiomatic expressions does not yield many new insights, as the entry lacks further references. The German Dictionary of Proverbs could not help much either, as its high hit rate is primarily due to explanations of the common variant.

However, a full-text search in the digitized German Dictionary of Proverbs makes it possible to discover both lexical variants and phraseological synonyms. For example, alternatives such as "Ich hab' ihm den Kümmel gerieben" (literally, "I rubbed the caraway for him") can be found.

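The full-text search for lexical variants described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample lines are invented and are not quotations from the dictionary, and the regular expression for the frame "jemandem die ... lesen" is one hypothetical way of leaving the noun slot open.

```python
import re
from collections import Counter

# Invented sample lines standing in for dictionary full text;
# they are NOT quotations from the German Dictionary of Proverbs.
SAMPLE_ENTRIES = [
    "Einem die Leviten lesen.",
    "Er hat ihm die Epistel gelesen.",
    "Man muss ihm den Vers lesen.",
    "Er liest gern die Zeitung.",
]

# Frame "<dative> <article> <NOUN> (ge)lesen" with the noun left open.
FRAME = re.compile(
    r"\b(?:[Ee]inem|[Jj]emandem|ihm|ihr)\s+"
    r"(?:die|den|das)\s+"
    r"(?P<noun>[A-ZÄÖÜ][a-zäöüß]+)\s+"
    r"(?:ge)?lesen\b"
)


def slot_fillers(entries):
    """Count the nouns that fill the open slot of the frame."""
    counts = Counter()
    for entry in entries:
        for match in FRAME.finditer(entry):
            counts[match.group("noun")] += 1
    return counts


if __name__ == "__main__":
    for noun, n in slot_fillers(SAMPLE_ENTRIES).most_common():
        print(noun, n)
```

A frame of this kind finds candidates only; whether a filler such as "Epistel" was really a common variant still has to be checked against a text corpus.
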
Digital corpus analysis.

This approach is far more than simple look-up; it is corpus research. Digital access to full texts transforms printed reference works into electronic corpora, opening up new possibilities for phraseological research.

Corpus analysis plays a central role in phraseological research. This applies in particular to the search for phrasemes, which is often treated as a special case. According to Rothkegel, the task is to determine under which conditions a word chain counts as a lexical-semantic unit and how this can be established. The most commonly used method for identifying multi-word units is collocation analysis, which evaluates large numbers of neighboring lexemes in a short time and thus provides valuable clues to phraseologized word combinations.
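
As a rough illustration of collocation analysis, the following sketch counts the words that occur within a fixed window around a node word and scores them with pointwise mutual information (PMI). The article does not name an association measure, so PMI, the window size, and the function names are assumptions made for this example, and the formula is a simplified variant that does not correct for window size.

```python
import math
from collections import Counter


def collocates(tokens, node, window=3):
    """Count tokens within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def pmi_scores(tokens, node, window=3):
    """Simplified PMI: log2(c * N / (f_node * f_word)) per collocate."""
    n = len(tokens)
    freq = Counter(tokens)
    co = collocates(tokens, node, window)
    return {
        word: math.log2((c * n) / (freq[node] * freq[word]))
        for word, c in co.items()
    }


if __name__ == "__main__":
    # Invented toy sentence, not corpus data.
    text = "er will ihm die leviten lesen und sie will ihm die zeitung bringen"
    scores = pmi_scores(text.split(), "leviten", window=2)
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.2f}")
```

In the toy sentence, "lesen" scores higher than "die" because "die" is frequent everywhere, which is the kind of clue that points to a phraseologized combination.
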
Despite these advances, Heid still sees a need for development, especially for longer and more variable phrases. Many idiomatic combinations are identified purely automatically, which is efficient but does not capture all facets of the language. The decision whether a word combination is in fact a phraseme is often left to experts. They must not only evaluate its semantic features but also determine the variants and boundaries of the phraseme, which means additional research effort.
A particular challenge is the historical dimension of phrasemes, which arises from their cultural embedding and the history of their motivation.[2] This leads to a competence problem, as experts cannot easily assess how phrasemes functioned at earlier stages of the language. The availability of contemporary sources also decreases the further back one looks in language history, which complicates the analysis further.
The automatic extraction of phraseological material, especially of more complex phrases, proves unrealistic. Collocation analysis depends heavily on sufficient data, and for earlier centuries, such as the 19th century, data are often sparse. Statistical conclusions are therefore often unreliable, because evidence for many variants is missing or scarce.
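
Why sparse data undermine statistical conclusions can be shown with a confidence interval for a relative frequency. The sketch below uses the Wilson score interval, which is an assumption of this example and not a method named in the article; the counts are invented. With only a handful of hits, the interval is wider than the estimated frequency itself.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for the proportion hits / n (default: about 95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Same relative frequency, very different amounts of data (invented counts).
    for hits, n in [(2, 100_000), (200, 10_000_000)]:
        lo, hi = wilson_interval(hits, n)
        print(f"{hits} hits in {n} tokens: {lo:.2e} to {hi:.2e}")
```

Both invented samples have the same relative frequency, but only the larger one constrains it tightly, which is why frequency-based claims about rare historical variants remain tentative.
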
Overall, it appears that the focus on idiomatic phrasemes and the diachronic perspective significantly worsen the cost-benefit ratio of corpus analysis.

Search for linguistically related noun forms.

In (historical) phraseology, attested evidence is of central importance, as it shows in which variants and morphosyntactic variations phrasemes existed.[5] Verifying the actual occurrence of these phrasemes in large historical corpora is therefore essential. However, a lack of evidence does not rule out that a phraseme existed.
Evidential dictionaries play a crucial role here, as they list attested examples for individual lemmas and phrasemes and thus confirm that the lemma actually occurred. Adelung's "Grammatisch-kritisches Wörterbuch der Hochdeutschen Mundart" (Grammatical-Critical Dictionary of the High German Dialect) offers relatively little evidence because of its lexicographical approach and often relies on self-formulated example sentences. The "Deutsches Wörterbuch" (German Dictionary, DWB), by contrast, provides an extensive collection of evidence texts. This collection is particularly valuable because it can be searched using corpus-analysis methods and can contain extensive semantic assignments.[1]
However, the marking of phraseological entries in the German Dictionary (DWB) is less consistent than in Adelung's work. The evidence comes predominantly from literary works and indicates that an expression existed in principle. To prove phraseological status, namely the stability or commonness of an expression, multiple attestations are required, ideally including a list of attested nominal forms.
An example illustrating this process is the idiom "das Herz abfressen" (to eat one's heart away), which is related to the still-common phrase "jemandem das Herz brechen" (to break someone's heart). The evidence shows that this phrase was used in historical contexts to express emotional states.[4] A comparison with the original sources could expand the textual context and thus provide additional insights into the semantics.
The digital version of the German Dictionary (DWB) not only provides access to literary sources but also opens up extensive historical collections of proverbs.[5] These collections are often accessible only with considerable effort, but the references in the DWB make phraseographic work considerably easier. A simple search of the DWB for the names of authors of historical proverb collections allows relevant phrases to be found quickly.
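
A search by collection author, as described above, might look like the following sketch. The record format, the placeholder author names, and the function name are all hypothetical; the sample citations are invented and do not reproduce actual DWB entries or source sigla.

```python
import re
from collections import defaultdict

# Invented records (headword, citation text). The author names are
# placeholders, not actual DWB source sigla.
SAMPLE_CITATIONS = [
    ("herz", "das herz abfressen. Mustermann sprichw. 12"),
    ("leviten", "einem die leviten lesen. Beispiel 3, 45"),
    ("herz", "von herzen gern. Dichtername 1, 20"),
]


def phrases_by_collection_author(citations, authors):
    """Group citations by the proverb-collection author they mention."""
    patterns = {a: re.compile(r"\b" + re.escape(a) + r"\b") for a in authors}
    hits = defaultdict(list)
    for headword, text in citations:
        for author, pattern in patterns.items():
            if pattern.search(text):
                hits[author].append((headword, text))
    return dict(hits)


if __name__ == "__main__":
    found = phrases_by_collection_author(
        SAMPLE_CITATIONS, ["Mustermann", "Beispiel"]
    )
    for author, records in found.items():
        for headword, text in records:
            print(author, headword, text, sep="\t")
```

Filtering by source in this way only narrows the material down to likely phraseological entries; each hit still needs to be read and classified by hand.
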
Overall, the DWB provides a valuable basis for the analysis and research of linguistic

expressions, their use and their development over the centuries.

Literature

1. Adelung, Johann Christoph (ed.) (1793–1801/1970): Grammatisch-kritisches Wörterbuch der hochdeutschen Mundart. Mit beständiger Vergleichung der übrigen Mundarten, besonders aber der Oberdeutschen. Reprint ed. by Helmut Henne. Hildesheim/New York.

2. Brückner, Dominik/Knoop, Ulrich (2003): "Das Klassikerwörterbuch. Begründung und Erläuterung eines digitalen Wörterbuchprojekts zum differenten Wortschatz in der klassischen Literatur". Zeitschrift für germanistische Linguistik 31: 62–86.

3. Burger, Harald (1989): "Phraseologismen im allgemeinen einsprachigen Wörterbuch". In: Hausmann, Franz Josef et al. (ed.): Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie. Vol. 1. Berlin/New York, de Gruyter: 593–599.

4. Čermák, František (2006): "Statistical Methods for Searching Idioms in Text Corpora". In: Burger, Harald/Häcki Buhofer, Annelies (ed.): Phraseology in Motion. Methoden und Kritik. Vol. 1. Baltmannsweiler, Schneider Verlag Hohengehren: 33–42.

5. Dräger, Marcel (2008): "Kurz angebunden. Historisch-lexikographische Betrachtungen einer Redewendung". To appear in: Földes, Csaba (ed.) (in press): Phraseologie disziplinär und interdisziplinär. Tübingen, Gunter Narr.
