NEURAL MACHINE TRANSLATION AND THE LOSS OF CONTEXT

Khilola Alimova

doi:10.71337/inlibrary.uz.canrms.82976

Authors

Khilola Alimova
Senior Lecturer, Interfaculty Department of English, Faculty of Foreign Philology, National University of Uzbekistan



CURRENT APPROACHES AND NEW RESEARCH IN

MODERN SCIENCES

International scientific-online conference

19


Alimova Khilola Rustamovna

Senior Lecturer, Interfaculty Department of English, Faculty of Foreign Philology, National University of Uzbekistan

https://doi.org/10.5281/zenodo.15302950

Abstract:

This paper discusses neural machine translation and context loss. Machine
translation is a method that uses computers to translate automatically between
human languages. Neural machine translation (NMT) has recently achieved major
breakthroughs in translation performance. This paper gives an overview of the
structure of NMT and discusses its main problems, with a focus on context loss.

Keywords:

machine translation, neural machine translation, problems, context loss.

The concept of machine translation (MT) was formally proposed in 1949 by
Weaver [1], who envisioned that modern computers could be used to translate
automatically between languages. Since then, MT has attracted much attention
from scientists and has become one of the most challenging problems in natural
language processing and artificial intelligence. More recently, it has drawn
further interest because of its versatility and speed. Much of the progress in
neural machine translation (NMT) is due to the invention and development of new
neural networks that serve as the underlying models for NMT. The main types
include recurrent neural networks equipped with an attention mechanism,
convolutional neural networks, and the more recently proposed Transformer.
Neural architecture search [1-3] has also attracted much attention recently,
because it can automatically discover neural architectures that outperform
hand-crafted networks on most tasks. It has been especially successful in
computer vision tasks such as image classification and object detection [1-3].
It has been proposed to treat architecture search as an optimization problem,
solved with one of its most representative methods: gradient-based
optimization. This neural architecture optimization proved effective for image
classification but not for neural machine translation. Applying it directly to
NMT is not the best choice, for the following reasons:

1. NMT training is sensitive to the choice of hyperparameters. According to
preliminary studies, changing even seemingly insignificant settings can lead to
significant changes in NMT results;


2. Architecture optimization searches for a single layer (cell) and stacks it
several times, which restricts the architecture space;

3. NMT models are usually much larger than image classification models, which
makes it impractical to follow the implementation details of standard
architecture optimization directly.

To make architecture optimization benefit NMT despite the problems above, it
should be improved in the following respects:

1. Design two search spaces, including a network operation space that consists
of architectural components commonly used in NMT, such as attention modules and
recurrent units.

2. Search all layers, so that each layer has its own customized architecture.
This design gives NMT architectures greater flexibility.

3. Because NMT models usually contain many parameters, use two techniques to
reduce the cost of the search: parameter sharing and successive halving [8],
which gradually shrinks the pool of candidate architectures during the search
by discarding poor ones.

From a methodological perspective, MT approaches fall mainly into two
categories: rule-based and data-driven. In the rule-based approach, bilingual
linguists design specific rules to analyze the source language, transfer it
into the target language, and generate the target-language text. Because this
approach is subjective and labor-intensive, it lost its appeal in the early 21st
century.

By contrast, the data-driven approach trains computers to translate from many
pairs of parallel, human-translated sentences. This approach has gone through
three main periods. In the mid-1980s, example-based MT was proposed, which
translates a sentence by retrieving similar examples from pairs of
human-translated sentences. From the early 1990s, statistical machine
translation (SMT) was developed, in which word- or phrase-level translation
rules are learned automatically from parallel corpora using probabilistic
models. Since 2014, neural machine translation (NMT) based on deep neural
networks has been actively developed [1]. In 2016, experiments on various
language pairs showed that NMT had made great progress, with significant
improvements over previous approaches [3]. Neural machine translation is an
end-to-end model that follows an encoder-decoder structure, typically involving
two neural networks [2]. As shown in Figure 1, the encoder network first maps
each input token of a source-language sentence to a low-dimensional real-valued
vector and then encodes the sequence of vectors into distributed


semantic representations, from which the decoder network generates the
target-language sentence one token at a time, from left to right.
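The encoder-decoder process described above is conventionally written as the following autoregressive factorization. The notation is introduced here only for illustration and is not taken from the paper: a source sentence x = (x_1, ..., x_m), a target sentence y = (y_1, ..., y_n), encoder states h, model parameters theta, and a parallel training corpus D.

```latex
\begin{aligned}
\mathbf{h} &= \operatorname{Enc}(x_1, \dots, x_m), \\
P(\mathbf{y} \mid \mathbf{x}; \theta) &= \prod_{i=1}^{n} P\left(y_i \mid y_{<i}, \mathbf{h}; \theta\right), \\
\hat{\theta} &= \arg\max_{\theta} \sum_{(\mathbf{x}, \mathbf{y}) \in D} \log P(\mathbf{y} \mid \mathbf{x}; \theta).
\end{aligned}
```

The conditioning on y_{<i} expresses left-to-right generation: token y_i can be produced only after y_1, ..., y_{i-1}. The input x is a single source sentence, so nothing outside that sentence enters the model.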

NMT is formally defined as a sequence-to-sequence prediction problem, a
formulation that carries several key assumptions. First, the input is a single
sentence rather than a paragraph or document. Second, the output sequence is
generated autoregressively from left to right. Third, the NMT model is optimized
on bilingual training data, which must contain large-scale parallel sentences
for good network parameters to be learned. Fourth, NMT processes plain text
rather than speech or video. Accordingly, we highlight four main issues:

1. Document-level translation. In the NMT formulation, a sentence is the basic
unit of modeling. However, some words in a sentence are ambiguous, and their
meaning can only be resolved from the surrounding sentences or paragraphs.
Moreover, when translating a document, the same term appearing in different
sentences should receive the same translation, which cannot be guaranteed when
sentences are translated independently. How to make full use of context beyond
the sentence is therefore a major challenge for neural machine translation.

2. Non-autoregressive decoding and bidirectional inference. Left-to-right,
token-by-token decoding follows an autoregressive style, which is consistent
with the way humans read and write. However, it has several drawbacks. On the
one hand, decoding efficiency is very limited, because the i-th token of a
translation can only be predicted after all i - 1 previous tokens have been
generated. On the other hand, the prediction of the i-th token can access only
the previously predicted tokens and cannot use future context, which limits
translation quality.

3. Resource-limited translation.
There are thousands of human languages in the world, but abundant bitexts are
available for only a few language pairs. Even for a resource-rich language
pair, parallel data are unbalanced, because most bitexts come from a few
domains. In other words, a lack of parallel training corpora is very common
across languages and domains. Neural network parameters are well optimized for
frequently observed events, so a standard NMT model is poorly trained on
low-resource language pairs. The question therefore arises of how to make full
use of parallel data in other languages (pivot-based and multilingual
translation) and how to make full use of non-parallel data.

4. Multimodal neural machine translation. Intuitively, human language is not
only text, and understanding its meaning may require the help


of other modalities, such as speech, images, and video. In many cases, we need
to translate speech or video. For example, simultaneous speech translation is
becoming increasingly popular at conferences and international live events.
How to implement multimodal translation within the encoder-decoder architecture
is therefore a major challenge for NMT. Two specific challenges are how to make
full use of different modalities in multimodal translation and how to balance
quality and latency in simultaneous speech translation.

Current NMT systems still have a number of shortcomings that lead to serious
translation errors, which are often seen when complex native-language
expressions are translated. However, NMT technology is developing rapidly and
is actively addressing the problems described above.
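The first issue discussed above is that each sentence is translated without its neighbours. A common remedy is to give the model the preceding source sentences as extra input. The sketch below covers only the input-construction step of such a concatenation approach. The separator token `<sep>`, the context window `k`, and the function name `concat_context_inputs` are hypothetical choices made for illustration. They are not part of any system discussed in this paper.

```python
from typing import List

# Hypothetical separator token marking sentence boundaries; a real system
# would choose its own special token and add it to the model vocabulary.
SEP = "<sep>"


def concat_context_inputs(sentences: List[str], k: int = 1) -> List[str]:
    """Build context-augmented source inputs for document-level NMT.

    Each sentence is prefixed with up to k preceding sentences from the same
    document, joined by SEP, so that a sentence-level model can see
    cross-sentence context (e.g. to disambiguate a word or to keep the
    translation of a repeated term consistent).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    inputs = []
    for i, sentence in enumerate(sentences):
        # Take at most k sentences that precede sentence i (none for i == 0).
        context = sentences[max(0, i - k):i]
        inputs.append(f" {SEP} ".join(context + [sentence]))
    return inputs


if __name__ == "__main__":
    document = ["The river rose after the storm.", "She waited on the bank."]
    for line in concat_context_inputs(document, k=1):
        print(line)
```

In the usage example, the ambiguous word "bank" in the second sentence reaches the model together with the preceding sentence about the river. A larger k supplies more context but produces longer inputs. This sketch does not cover how such a model would be trained on, or decode from, these inputs.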
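The fourth issue above mentions balancing quality and latency in simultaneous speech translation. One well-known policy from the simultaneous translation literature is wait-k. It is not described in this paper and is shown only as an illustration. Under wait-k, the system first reads k source tokens and then alternates between writing one target token and reading one more source token. The sketch below computes only the READ/WRITE schedule of such a policy. The function name `wait_k_schedule` and its parameters are hypothetical.

```python
from typing import List


def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> List[str]:
    """Return the READ/WRITE action sequence of a wait-k policy.

    Before emitting target token t (1-based), the policy must have read
    min(k + t - 1, src_len) source tokens; once the source is exhausted,
    the remaining target tokens are written without further reads.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[str] = []
    read = 0
    for t in range(1, tgt_len + 1):
        needed = min(k + t - 1, src_len)
        # Read source tokens until the lag required by wait-k is reached.
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(src_len=5, tgt_len=5, k=2))
```

A small k lowers latency because output starts earlier, but each target token is then predicted from less of the source sentence. A large k approaches ordinary full-sentence translation.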