
Simultaneous automatic translation

Simultaneous automatic translation (speech-to-speech real-time translation) is "instant" machine translation of speech from one natural language to another using special software and hardware. The term also denotes the direction of scientific research concerned with building such systems.

Unlike printed text or artificial signals, natural speech does not allow a simple and unambiguous division into elements (phonemes, words, phrases), since these have no obvious physical boundaries. Word boundaries in a speech stream can be determined automatically only during recognition, by selecting the optimal sequence of words that best matches the input speech stream by acoustic, linguistic, semantic and other criteria. [1]


History

June 2012 - a program for automatic simultaneous translation was presented by the Karlsruhe Institute of Technology (Baden-Württemberg, Germany). [1] The system translates the oral lectures of the institute's teachers from German into English and renders the translation as subtitles. [2]

October 2012 - automatic, almost simultaneous voice translation from English to Mandarin Chinese (Putonghua), developed by Microsoft. [2] A machine learning system based on deep neural networks reduces the error rate to roughly one word in seven or eight. Its biggest achievement, however, is generating speech while preserving the modulations of the speaker's voice. [3]

November 2012 - the Japanese mobile operator NTT Docomo opened a service that lets subscribers who speak different languages communicate in real time. [4] Languages supported by the service: Japanese <-> English, Japanese <-> Korean, Japanese <-> Chinese. [5]

May 2015 - Blabber Messenger appeared, which translates speech into 14 languages and chat messages into 88.

Principle of Operation

The process of electronic speech translation (S2S real-time translation) usually includes the following three stages: [6] [7]

  1. automatic speech recognition (ASR) - the conversion of speech into text;
  2. machine translation (MAT - Machine-Assisted Translation) - the automatic translation of the text from one language to another;
  3. speech synthesis (TTS - text-to-speech) - a technology that pronounces the text in a voice close to natural.

Speaker A speaks into a microphone, and the speech recognition module transcribes the utterance. The input is compared with phonological models built from large libraries of speech data. Filtered in this way, and using the dictionary and grammar of language A, it is converted into a string of words based on the corpus of phrases of language A. The machine translation module then converts this string: early systems replaced every word with the corresponding word in language B, while more advanced systems avoid literal translation and take the entire context of the phrase into account to produce an appropriate translation. The resulting translation is passed to the speech synthesis module, which estimates the pronunciation and intonation matching the word sequence from the speech database of language B. The data corresponding to the phrase are selected, combined, and output in the form required by the user in language B.
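The three-stage cascade described above can be sketched as a simple composition of functions. The three stage functions below are hypothetical stand-ins invented for illustration, not real ASR, MT or TTS engines:

```python
# Minimal sketch of the ASR -> MT -> TTS cascade described above.
# All three stage functions are toy placeholders, not real engines.

def recognize(audio: bytes) -> str:
    """ASR stage: convert speech audio into text (stub)."""
    return "hello world"  # a real system would decode the audio here

def translate(text: str) -> str:
    """MT stage: translate text from language A to language B (stub)."""
    lexicon = {"hello": "hallo", "world": "welt"}  # toy word-for-word table
    return " ".join(lexicon.get(w, w) for w in text.split())

def synthesize(text: str) -> bytes:
    """TTS stage: render text as audio (stub returning fake audio bytes)."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    """The full cascade: ASR -> MT -> TTS."""
    return synthesize(translate(recognize(audio)))
```

The key design point of the cascade is that each stage consumes exactly what the previous one produces, so any stage can be swapped out independently.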

Speech Translation Systems

Speech translation (ST) systems [8] consist of two main components, automatic speech recognition (ASR) and machine translation (MAT - Machine-Assisted Translation), and differ in architecture:

  • client-based - running entirely on the client device;
  • client-server - running as an online service.

Recognizing continuous spontaneous speech is the ultimate goal of all speech recognition efforts. Automatic speech recognition is also divided by whether or not it is tied to the voice of a particular speaker.

In the classical chain "science - technology - practical systems", the most serious problems for a practical speech recognition or understanding system arise under the following conditions: [9]

  • an arbitrary, naive user;
  • spontaneous speech, accompanied by agrammatisms and speech "garbage";
  • acoustic noise and distortion, including time-varying ones;
  • speech interference.

For a generalized classification of speech recognition systems, see [10].

Traditionally, machine translation systems are divided into categories: [11] [12] [13]

  • Rule-Based Machine Translation (RBMT) - systems based on rules describing language structures and their transformations.
  • Example-Based Machine Translation (EBMT) - systems based on examples: pairs of texts, one of which is a translation of the other.
  • Statistical Machine Translation (SMT) [14] - machine translation based on the comparison of large volumes of language pairs.
  • Hybrid Machine Translation (SMT + RBMT) - hybrid models, in which a breakthrough in translation quality is expected. [13]

The boundary between example-based and rule-based systems is not very sharp, since both rely on dictionaries.

Statistical Machine Translation

Statistical machine translation is based on finding the most probable translation of a sentence using data from a bilingual corpus (parallel corpora, bitext). When translating, the computer does not apply linguistic algorithms but computes the probability of using a particular word or expression: the word or word sequence with the optimal probability is considered the best match for the source text and is inserted into the resulting text. In statistical machine translation, the task is framed not as translating the text but as decoding it.

Typical architecture of a statistical MT system: [15] [16]

  • Monolingual corpus (in the target language).
  • Language model - a set of n-grams (sequences of word forms of length n) from the corpus of texts.
  • Parallel corpus.
  • Phrase table - a table matching phrases of the source corpus to phrases of the target corpus, with statistical coefficients.
  • Statistical decoder - selects the most probable option among all possible translations.
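The interplay of the last two components can be sketched in a few lines: a toy decoder that combines phrase-table probabilities with a language-model score and picks the highest-scoring candidate. The phrases and probabilities here are invented for the example:

```python
import math

# Invented phrase-table probabilities P(target | source).
phrase_table = {
    "guten tag": {"good day": 0.6, "hello": 0.4},
}
# Invented language-model probabilities P(target phrase).
language_model = {"good day": 0.2, "hello": 0.5}

def decode(source: str) -> str:
    """Pick the target phrase maximizing P(t|s) * P(t), in log-space."""
    candidates = phrase_table[source]
    return max(
        candidates,
        key=lambda t: math.log(candidates[t]) + math.log(language_model[t]),
    )
```

Note how the language model can outvote the phrase table: "hello" wins here even though "good day" has the higher translation probability, because its language-model score is much higher.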

As the language model, statistical translation systems mainly use various modifications of the n-gram model, which states that the choice of the next word when forming a text is determined only by the (n-1) words preceding it. [16]

  • n-grams.
    • Advantages: high-quality translation for phrases that fit entirely within the n-gram model.
    • Disadvantages: high-quality translation is possible only for phrases that fit entirely within the n-gram model.

SMT Benefits

  • Quick setup
  • Easy to add new translation directions
  • Smoothness of translation

SMT disadvantages

  • shortage of parallel corpora
  • Numerous grammar errors
  • Translation instability

Systems that do not require training are called "speaker-independent" systems; systems that require training are "speaker-dependent" systems.

Rule-Based MT Systems

Rule-Based Machine Translation systems are subdivided into: [13] [17]

  • word-for-word translation systems;
  • transfer systems - transform structures of the input language into grammatical constructions of the output language;
  • interlingua systems - use an intermediate language for describing meaning.

Components of a typical RBMT:

  • Linguistic databases: bilingual dictionaries; name and transliteration files; morphological tables.
  • Translation module: grammar rules; translation algorithms.

Features of RBMT systems:

  • Advantages: syntactic and morphological accuracy; stability and predictability of the result; the ability to tune the system to a subject area.
  • Disadvantages: complexity and duration of development; the need to maintain and update linguistic databases; a "machine accent" in the translation.
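A toy sketch in the spirit of a transfer system: a bilingual dictionary plus a single reordering rule (adjective-noun becomes noun-adjective). Both the dictionary and the rule are invented for illustration:

```python
# Toy transfer-style RBMT: dictionary lookup plus one syntactic rule.
# The dictionary maps each source word to (target word, part of speech).
dictionary = {"red": ("rouge", "ADJ"), "car": ("voiture", "NOUN")}

def rbmt_translate(words):
    """Translate word-by-word, then apply an ADJ+NOUN -> NOUN+ADJ rule."""
    tagged = [dictionary[w] for w in words]
    out = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN"):
            out += [tagged[i + 1][0], tagged[i][0]]  # reorder the pair
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return out
```

Even this tiny example shows both sides of the RBMT trade-off: the output is fully predictable, but every word and every rule must be entered by hand.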

SMT + RBMT Hybrid Models

Hybrid Technology Architecture: [13]

  • Training: a parallel corpus is used to train the language model, post-editing data, synthesis rules, and a terminology glossary.
  • Operation: hybrid translation.

Stages of Hybrid Technology:

  • training of the RBMT system on a parallel corpus using statistical technologies;
  • operation based on the trained system.

Speech Synthesis Systems

Typical text-to-speech system architecture: [18]

  • Text analysis: determination of the text structure; text normalization; linguistic analysis.
  • Phonetic analysis: grapheme-to-phoneme conversion.
  • Prosodic analysis: pitch and duration of phrases.
  • Speech synthesis: voice rendering.

In turn, speech synthesis methods are divided into the following groups: [19]

  • parametric synthesis;
  • concatenative (compilation) synthesis;
  • rule-based synthesis;
  • domain-oriented synthesis.

Noise Cleaning

Sources of noise in speech systems: [20] interference from microphones, wires and the analog-to-digital converter (ADC), and external noise arising in the speaker's environment.

Classification of noise relative to its characteristics:

  • periodic / non-periodic noise;
  • by the width of the frequency range over which the noise energy is distributed: broadband (bandwidth over 1 kHz) and narrowband (bandwidth under 1 kHz) noise;
  • speech noise, consisting of the voices of people around the speaker.

The most dangerous in its effect on the speech signal, and the hardest to remove, is white noise: non-periodic noise whose spectral density is distributed uniformly over the entire frequency range.

In the field of speech recognition systems in noise, the following approaches exist:

  • The developers ignore the noise.
  • The noise is removed first, and then the cleaned speech signal is recognized. This approach is usually taken when developing noise reduction systems as an add-on module for recognition systems.
  • The noisy signal is recognized without prior enhancement, drawing on the study of how humans recognize and understand noisy speech; the speech signal is not pre-filtered to remove noise.

Methods of achieving noise immunity:

  • They come down either to extracting noise-invariant features, to training under noise conditions, or to modifying recognition templates using noise-level estimates.

The weak point of such methods is that recognition systems tuned for noisy conditions work unreliably in the absence of noise, and depend strongly on the physical characteristics of the noise.

  • Computation of linear prediction coefficients. Probability distributions (mean, variance) are used as template elements instead of numerical values.
  • Digital signal processing: noise masking methods (numerical values comparable to the noise characteristics are ignored or used with lower weights) and noise reduction using several microphones (for example, removing low-frequency noise with a microphone on one side of the device and high-frequency noise with one on the other side).
  • Cleaning the useful signal of extraneous noise using microphone arrays that simulate a directional microphone with a steerable beam (the simplest "delay and sum" method, or more complex methods that modify the microphone weights).
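The "delay and sum" method mentioned above can be sketched with plain lists: each microphone channel is delayed by its steering delay (in samples) and the aligned channels are averaged, reinforcing the signal arriving from the steered direction. A minimal sketch, with edge samples simply dropped:

```python
def delay_and_sum(signals, delays):
    """Delay each microphone channel by its steering delay and average.

    signals: list of equal-length sample lists, one per microphone.
    delays:  per-microphone steering delay in samples (non-negative ints).
    """
    n = len(signals[0])
    out = []
    for t in range(n):
        acc = 0.0
        for sig, d in zip(signals, delays):
            shifted = t - d          # read the sample d steps back
            if 0 <= shifted < n:     # drop samples shifted past the edge
                acc += sig[shifted]
        out.append(acc / len(signals))
    return out
```

In the test below, the second channel carries the same pulse one sample earlier; delaying it by one sample aligns the pulses, so they add coherently while uncorrelated noise would average out.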

Optimization Models and Methods

Most existing metrics for the automatic evaluation of machine translation are based on comparison with a human reference translation. [15]

When training a speech translation system, the following methods are used to optimize translation quality and speed: [8] [21] [22] [23]

  • Cascading ASR / WER with MT / BLEU

Automatic speech recognition (ASR - automatic speech recognition)

  • ASR / WER (Word Error Rate) - the word error rate;
  • ASR / PER (Position-independent Word Error Rate) - the word error rate ignoring word positions;
  • ASR / CSR (Command Success Rate) - the rate of successful command execution.
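WER can be computed as the word-level edit distance (substitutions + insertions + deletions) between the recognizer output and the reference transcript, divided by the reference length. A minimal dynamic-programming implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "good morning everyone" as "good evening everyone" is one substitution out of three reference words, i.e. a WER of 1/3.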

Machine Translation (MAT - Machine-Assisted Translation)

  • MT / BLEU (Bilingual Evaluation Understudy) - a measure of how closely the translation matches a reference translation.
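Full BLEU combines modified n-gram precisions for several orders of n with a brevity penalty; the sketch below shows only the unigram-precision component with the brevity penalty, which conveys the idea:

```python
from collections import Counter
import math

def bleu_unigram(reference: str, candidate: str) -> float:
    """Simplified BLEU: modified unigram precision with brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    ref_counts, cand_counts = Counter(ref), Counter(cand)
    # Clip each candidate word's count by its reference count
    # ("modified" precision, so repeating a word is not rewarded).
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

The count clipping is what keeps a degenerate output like "the the" from scoring perfectly against a reference containing "the" once.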

Features

In addition to the problems inherent in text translation, simultaneous speech translation must deal with problems specific to speech, including the incoherence of spoken language, fewer grammatical restrictions, unclear word boundaries, and the correction of speech recognition errors. On the other hand, speech translation has advantages over text translation, including the less complex structure and smaller vocabulary of spoken language. [3]

As hardware capacity grows, one can expect machine translators with fewer translation errors, which remain the main problem of all electronic speech translators. The situation worsens when the speakers' languages belong to different language groups. For example, English belongs to the Germanic group of the Indo-European family, while Chinese belongs to the Sino-Tibetan family. The differences between them are very large, producing a correct translation is hard, and the same word may correspond to two or more translations with different meanings in the other language. For these reasons, the error rate when translating between distant languages is still high, in contrast to translation between related languages such as Russian and Ukrainian. [4]

Standards

As many countries research and develop voice translation, it becomes necessary to standardize interfaces and data formats so that the systems are mutually compatible.

International joint research is conducted by speech translation consortia:

  • C-STAR (Consortium for Speech Translation Advanced Research) - an international consortium for joint research on speech translation;
  • A-STAR - its counterpart for the Asia-Pacific region.

They were founded as international joint research organizations to design bilingual standard formats, which are essential for promoting research into this technology, and to standardize interfaces and data formats so that speech translation modules can be connected internationally. [5]

Assessment of translation quality

  • BLEU (Bilingual Evaluation Understudy) - an algorithm for evaluating and optimizing the quality of machine-translated text.
  • WER (Word Error Rate) - an algorithm for evaluating and optimizing the quality of machine-translated text.
  • The "speech / non-speech" classifier - determines the probability of correct speech detection; the trade-off lies between classifying voice as noise or noise as voice (type I and type II errors).

See also

  • Automatic speech recognition
  • Automated translation
  • Machine translation
    • Statistical machine translation
    • Rule-based machine translation
    • Hybrid machine translation
  • Speech synthesis
  • Artificial neural network (the ability to learn is one of the main advantages of neural networks over traditional algorithms)

Literature

  • Translation Technologies for Europe.-M.: MTsBS, 2008.
  • Patent RU 2419142: automatic speech-to-speech system
  • GOST R 52633.5-2011 "Information security. Information security technique. Automatic training of neural network converters biometrics-access code" - built on a learning algorithm with linear computational complexity and high stability (the world's first standard for the automatic training of artificial neural networks).
  • A. Waibel, “Speech Translation Enhanced Automatic Speech Recognition”, in Interactive Systems Laboratories, Universitat Karlsruhe (Germany), Carnegie Mellon University (USA), 2005.
  • Dong Yu, “Transcription of spoken language using a context-sensitive deep neural network,” Microsoft Research, 2011.
  • Dong Yu, Li Deng, “Deep Neural Network or Gaussian Mixture Model?”, Microsoft Research, 2012.
  • Xuedong Huang, “Spoken Language Processing: a guide to Theory, Algorithm, and System Development, page 1-980,” Microsoft Research, 2000.

Links

  • Simultaneous Translation: University without Language Barriers
  • A program for simultaneous translation of lectures has been developed in Germany
  • Speech Recognition Breakthrough for the Spoken, Translated Word // Microsoft Corporation, November 7, 2012
  • Microsoft shows almost instant translation from English into Chinese
  • NTT DOCOMO to Introduce Mobile Translation of Conversations and Signage
  • The Japanese presented a system for automatic translation of telephone conversations
  • Protocols of Network-based Speech-to-Speech Translation
  • “Forecast for the Research and Development of Speech Translation Technologies.” By Satoshi, Nakamura in Science & Technology Trends - Quarterly Review No.31 April 2009.
  • [6] (unavailable link from 18-05-2013 [2263 days]) "Architectural overview of speech-centric information processing systems"
  • [7] Automatic Speech-to-Speech Translator from IBM
  • [8] S2S Real-Time Translation from AT&T Labs
  • [9] S2S Real-Time Translation from Nokia Research Center
  • en: Speech Translation
  • en: Speech Recognition
  • en: Speech Synthesis
  • en: Machine translation
  • en: Mobile translation
  • en: Statistical machine translation
  • en: Parallel text
  • en: Type I and type II errors

Notes

  1. ↑ http://www.proceedings.spiiras.nw.ru/data/src/2010/12/00/spyproc-2010-12-00-01.pdf
  2. ↑ Speech Recognition Breakthrough for the Spoken, Translated Word - Microsoft Research. Retrieved February 17, 2013. Archived March 15, 2013.
  3. ↑ Microsoft shows an almost instant translation from English into Chinese / Habrahabr. Archived March 15, 2013.
  4. ↑ The Japanese presented a system for automatic translation of telephone conversations. Archived March 15, 2013.
  5. ↑ NTT DOCOMO to Introduce Mobile Translation of Conversations and Signage | Press Center | NTT DOCOMO Global. Retrieved February 13, 2013. Archived February 16, 2013.
  6. ↑ IBM Research | Speech-to-Speech Translation. Retrieved February 17, 2013. Archived March 15, 2013.
  7. ↑ http://csl.anthropomatik.kit.edu/downloads/PaulikSchultz_ASRU2005.pdf
  8. ↑ 1 2 People - Microsoft Research
  9. ↑ Modern problems in the field of speech recognition. - Auditech, Ltd. Retrieved March 3, 2013. Archived March 15, 2013.
  10. ↑ Account Suspended
  11. ↑ en: Machine translation
  12. ↑ http://www.cs.tut.fi/~puhtunn/lecture-01.pdf
  13. ↑ 1 2 3 4 http://www.promt.ru/images/deep_hybrid.pdf
  14. ↑ Speech Recognition, Machine Translation, and Speech Translation - A Unified Discriminative Learning Paradigm - Microsoft Research
  15. ↑ 1 2 http://www.promt.ru/images/ainl_molchanov_promt.pdf
  16. ↑ 1 2 Distributed statistical machine translation system | Ilya (w-495) Nikitin - Academia.edu. Retrieved March 19, 2013. Archived March 22, 2013.
  17. ↑ Distributed statistical machine translation system | Ilya (w-495) Nikitin - Academia.edu. Retrieved March 18, 2013. Archived March 22, 2013.
  18. ↑ http://www.library.wisc.edu/selectedtocs/bd025.pdf
  19. ↑ Sorokin V. N. Speech synthesis. - M.: Nauka, 1992. 392 p.
  20. ↑ http://www.sovmu.spbu.ru/main/sno/uzmf2/uzmf2_22.pdf
  21. ↑ http://www.lrec-conf.org/proceedings/lrec2008/pdf/785_paper.pdf
  22. ↑ Archived copy (inaccessible link). Retrieved February 25, 2013. Archived June 18, 2006.
  23. ↑ http://iosrjournals.org/iosr-jce/papers/Vol5-issue1/G0513136.pdf
Source - https://ru.wikipedia.org/w/index.php?title=Synchronous_automatic_translation&oldid=100288839

