Machine translation is the process of translating texts (written and, ideally, spoken) from one natural language into another by means of a dedicated computer program. The field of research concerned with building such systems goes by the same name.
Forms of Organization of Computer-Human Interaction in Machine Translation
- With post-editing: the source text is processed by the machine, and the human editor corrects the result.
- With pre-editing: a person adapts the text for machine processing (removes possible ambiguities, simplifies and marks up the text), after which machine processing begins.
- With inter-editing: a person intervenes in the work of the translation system, resolving difficult cases.
- Mixed systems (for example, simultaneously with pre- and post-editing).
Automated Translation
The word “automatic” is sometimes used instead of “machine”, with no change in meaning. The term “automated translation”, however, means something quite different: here the program merely helps a person translate texts.
Automated translation involves the following forms of interaction:
- Partially automated translation: for example, a human translator using computer dictionaries.
- Division-of-labor systems: the computer is set up to translate only phrases of a rigidly defined structure (but does so well enough that no correction is needed), and everything that does not fit the scheme is passed to a person.
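The division of labor described in the last item can be sketched in a few lines of Python. This is only an illustrative sketch: the two English-to-French templates, the `TEMPLATES` table and the `route` function are hypothetical and not taken from any real system. Sentences that match a rigid pattern are translated by template; everything else is queued for a human translator.

```python
import re

# Hypothetical rigid sentence patterns and their French templates (assumption).
TEMPLATES = [
    (re.compile(r"^Open the file (\S+)\.$"), "Ouvrez le fichier {0}."),
    (re.compile(r"^Close the file (\S+)\.$"), "Fermez le fichier {0}."),
]

def route(sentences):
    """Split sentences into machine-translated pairs and a queue for a human."""
    translated, for_human = [], []
    for sentence in sentences:
        for pattern, template in TEMPLATES:
            match = pattern.match(sentence)
            if match:
                # The sentence fits a known scheme: fill in the template.
                translated.append((sentence, template.format(*match.groups())))
                break
        else:
            # No pattern matched: hand the sentence to a person.
            for_human.append(sentence)
    return translated, for_human

done, queue = route([
    "Open the file report.txt.",
    "Please archive everything older than a week.",
])
```

Here `done` holds the sentence the templates could handle and `queue` holds the one left for the translator.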
English terminology likewise distinguishes machine translation, MT (fully automatic translation), from machine-aided or machine-assisted translation, MAT (automated translation); when both are meant, M(A)T is written.
There are two fundamentally different approaches to building machine translation algorithms: rule-based and statistical. The first approach is the traditional one and is used by most developers of machine translation systems (PROMT in Russia, SYSTRAN in France, Linguatec in Germany, and others) [1]. The second includes the popular Yandex.Translate and Google Translate services, as well as a new service from ABBYY [2].
Statistical Machine Translation
Statistical machine translation is a type of machine translation based on comparing large volumes of language pairs. Language pairs are texts that contain sentences in one language together with the corresponding sentences in a second language; they may be two versions of the same sentences written by a person fluent in both languages, or a set of sentences with human translations of them. Statistical machine translation is thus “self-learning”: the more language pairs are available and the more precisely they correspond to each other, the better the result. The term “statistical machine translation” denotes a general approach to the translation problem, based on searching for the most probable translation of a sentence using data obtained from a bilingual collection of texts. One example of such a bilingual corpus is parliamentary reports, that is, the minutes of parliamentary debates. Bilingual parliamentary reports are published in Canada, Hong Kong and other countries; official documents of the European Economic Community are published in 11 languages; and the United Nations publishes documents in several languages. These materials have turned out to be invaluable resources for statistical machine translation.
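To make the idea of learning from language pairs concrete, here is a minimal sketch of IBM Model 1, one classic word-based statistical model, trained by expectation-maximization on a toy three-sentence corpus. The choice of model, the corpus and the names `train_ibm_model1` and `best_translation` are assumptions made for illustration; the text above does not say which models the named services use, and real systems are far more elaborate.

```python
from collections import defaultdict

# Toy German-English parallel corpus (assumption, for illustration only).
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)   # expected counts per source word
        for src, tgt in pairs:
            for w in tgt:
                # E-step: share one unit of count among candidate source words.
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    delta = t[(w, s)] / norm
                    count[(w, s)] += delta
                    total[s] += delta
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return t

def best_translation(t, src_word):
    """Return the target word with the highest probability for src_word."""
    candidates = {w: p for (w, s), p in t.items() if s == src_word}
    return max(candidates, key=candidates.get)

model = train_ibm_model1(CORPUS)
```

No dictionary is supplied: the pairing of “haus” with “house” emerges only because “das” is better explained by “the”, which it accompanies in two sentences. This is the sense in which more, and better aligned, language pairs improve the result.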
Machine Translation History
The idea of using computers for translation was put forward in the USA in 1947, immediately after the first computers appeared. The first public demonstration of machine translation (the so-called Georgetown experiment) took place in 1954. Despite the primitive nature of that system (a dictionary of 250 words, a grammar of 6 rules, translation of a few simple phrases), the experiment attracted wide attention: research began in England, Bulgaria, East Germany, Italy, China, France, West Germany, Japan and other countries, and, in that same year of 1954, in the USSR.
By the mid-1960s, two Russian-English translation systems had been put into practical use in the United States:
- MARK (in the Department of Foreign Technology of the US Air Force);
- GAT (developed by Georgetown University and used at the National Atomic Energy Laboratory in Oak Ridge and at the Euratom center in Ispra, Italy).
However, the commission set up to evaluate such systems concluded that, because of the low quality of machine-translated texts, this activity was not cost-effective in the USA. Although the commission recommended continuing and deepening theoretical work, on the whole its conclusions led to growing pessimism, reduced funding, and often a complete halt to work on the subject.
Nevertheless, research continued in a number of countries, helped by steady progress in computer technology. A particularly significant factor was the emergence of mini- and personal computers, and with them increasingly sophisticated dictionary, search and similar systems oriented toward natural language data. The need for translation as such was also growing along with international relations. All this led to a new upsurge in the field, which began around the mid-1970s. The 1980s brought wide practical use of translation systems, and a market for commercial products in this area took shape.
Yet the dreams with which humanity took up the task of machine translation half a century ago largely remain dreams: high-quality translation of texts on a wide range of topics is still unattainable. What is beyond doubt is that machine translation systems speed up a translator's work: by up to five times, according to estimates from the late 1980s.
Currently, there are many commercial machine translation projects. One of the pioneers in the field of machine translation was [...]. In Russia, work in this area was carried out by a group led by Prof. R. G. Piotrovsky (Russian State Pedagogical University named after A. I. Herzen, St. Petersburg).
Philosophical Reasons
In the 1960s, Stanislav Lem summed up the debate about machine translation and its connection with the machine's own understanding of the text (a question also tied, for example, to the discussion of the “Chinese room” concept formulated in 1980):
| ... we insist on endowing machine translators with the “fullness of human life”; however, we simply do not know to what extent it is possible to “undermine the personality” of a machine that is designed to translate well. We do not know whether it is possible to “understand” without possessing a “personality” at least in its infancy. <...> It is not possible to effectively use the operational language to the end as a translation tool in the field of discursive-mental languages. Either the machines will act “knowingly,” or there will be no truly effective machine translators at all [3]. |
Translation Quality
The quality of a translation depends on the subject and style of the source text, as well as on the grammatical, syntactic and lexical closeness of the two languages involved. Machine translation of literary texts almost always turns out to be unsatisfactory. For technical documents, however, given specialized machine dictionaries and some tuning of the system to the features of a particular type of text, it is possible to obtain a translation of acceptable quality that needs only light editing. The more formalized the style of the source document, the higher the translation quality that can be expected. Machine translation gives its best results on texts written in a technical style (various descriptions and manuals) or a formal business style.
Machine translation used without tuning to the subject matter (or with deliberately wrong settings) is the butt of numerous jokes on the Internet. The best known of the oldest and most popular examples is a translation of mouse driver documentation known as “Mouse Covers”, billed as a “translation of computer documentation by the Poliglossum machine translation system based on medical, commercial and legal dictionaries” [comm. 1]. Among the short ones is the phrase “Our cat gave birth to three kittens - two whites and one black”, which the PROMT online translator (version 7.0, 2007) turned into “Our cat gave birth to three kittens - two white and one African American” [6]. While “African American” could still be turned into “black” by writing “black kitten”, the cat's sex could not be changed: a female cat, for example, was translated as “female cat”.
Such jokes most often arise because the program does not recognize the context of a phrase and translates terms word for word, without even distinguishing proper names from ordinary words. The same PROMT translator turned “Leo Tolstoy” into “Lion Thick” (“fat lion”), “bra-ket notation” into “note by Bra Keti”, “Lie algebra” into “algebra of lies”, “eccentricity vector” into “vector of originality”, “Shawnee Smith” into “Shawnee Smith Indian”, and so on. Google Translate, conversely, often mistook the word “rice” for the name of the US Secretary of State.
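The failure mode described above, word-for-word lookup that treats proper names as ordinary words, can be reproduced with a toy example. The sketch below is hypothetical throughout: the `GLOSSARY` and `PROPER_NAMES` tables are invented, English glosses stand in for the target-language words, and nothing here reflects how PROMT or any other product actually works.

```python
# Toy glossary (assumption): English glosses stand in for target-language words.
GLOSSARY = {"leo": "lion", "tolstoy": "thick", "wrote": "wrote", "novels": "novels"}

# Hypothetical list of names that must not be looked up word by word.
PROPER_NAMES = {"Leo Tolstoy"}

def translate_verbatim(text):
    """Look every word up in the glossary, ignoring context and capitalization."""
    return " ".join(GLOSSARY.get(word.lower(), word) for word in text.split())

def translate_protecting_names(text):
    """Shield known proper names with placeholders before the lookup."""
    placeholders = {}
    for i, name in enumerate(sorted(PROPER_NAMES)):
        if name in text:
            token = f"__NAME{i}__"
            text = text.replace(name, token)
            placeholders[token] = name
    result = translate_verbatim(text)
    # Restore the original names untouched.
    for token, name in placeholders.items():
        result = result.replace(token, name)
    return result

naive = translate_verbatim("Leo Tolstoy wrote novels")
guarded = translate_protecting_names("Leo Tolstoy wrote novels")
```

The naive version turns the name into ordinary words, as in the jokes quoted above, while the guarded version leaves it intact. A fixed list of names is of course the crudest possible remedy; it only illustrates why some notion of context is needed.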
See also
- Automated translation
- Automatic translation of spoken language
- Linguistic software
- Parsing
- Wikipedia: Online auto translators
Comments
- ↑ This is not the case, however: Polyglossum (sic) is an electronic dictionary [4], a program of the same class as Lingvo, and cannot translate on its own. At that time it existed in versions for DOS and Windows 3.x and, while inferior to Lingvo and Context in the quality of its general dictionary, had a record number of specialized dictionaries. In addition, certain translation errors give the fake away; the text was probably edited by hand after machine translation: “A wonderful example of the text obtained supposedly using the biomedical dictionary as a result of the translation of the mouse driver manual is called Mouse Covers ... I don’t believe in the purity of the experiment: for sure there were some corrections made to the text by the hand of a person [5].”
Notes
- ↑ Machine translation: rules versus statistics
- ↑ ABBYY's New Text Translation Approach
- ↑ Summa Technologiae, 1963 (or 2nd ed. 1967), chapter 4.
- ↑ Polyglossum on the official website
- ↑ K. Knop. Socrates is my friend, but truth is more expensive // Computerra. - 1999. - No. 47 (November 23).
- ↑ Our cat gave birth to three kittens - two white and one African American