Hybrid machine translation (HMT) is the integration of different machine translation (MT) approaches, chosen from the possible options: [1]
- Rule-based machine translation (RBMT): translation based on linguistic rules.
- Corpus-based machine translation (CBMT): translation based on a corpus of texts.
- Example-based machine translation (EBMT): translation by analogy with examples.
- Statistical machine translation (SMT): translation based on statistical models.
A hybrid architecture is expected to combine the strengths of these approaches. [1] Today, machine translation is represented by two main technologies: statistical machine translation (SMT) and rule-based machine translation (RBMT). [2]
Hybrid MT Software Developers
- AppTek [3]: HMT TranSphere®, a full integration of SMT and RBMT methodologies.
- Asia Online [4]: "SAIC's Omnifluent™ Human Language Technology".
- LinguaSys [5]: "Carabao Machine Translation engine".
- Systran [6] [7]: "SYSTRAN's hybrid engine".
- Polytechnic University of Valencia [8]
- PROMT [2]: "PROMT DeepHybrid" [9]
Hybrid SMT and RBMT Technology
Hybrid translation technology uses statistical methods to build dictionary databases automatically from parallel corpora; it generates several candidate translations, both at the lexical level and at the level of the syntactic structure of the target-language sentence; it applies automatic post-editing; and it selects the best (most probable) candidate using a language model trained on a target-language corpus. [2]
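The final step, choosing the most probable candidate with a language model trained on a target-language corpus, can be sketched with a bigram model. This is a minimal illustration, assuming add-one (Laplace) smoothing and whitespace-tokenised sentences; production systems use higher-order N-gram models with stronger smoothing. The class name `BigramLM`, the toy corpus, and the candidates are hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Toy target-language model: bigrams with add-one (Laplace) smoothing."""

    def __init__(self, corpus):
        self.history = Counter()   # how often each token precedes another token
        self.bigrams = Counter()   # counts of adjacent token pairs
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Possible next tokens: every word seen in training plus the end marker.
        self.vocab_size = len({w for s in corpus for w in s}) + 1

    def logprob(self, sentence):
        """Log-probability of a tokenised sentence under the smoothed model."""
        tokens = ["<s>"] + sentence + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.history[prev] + self.vocab_size))
            for prev, word in zip(tokens, tokens[1:])
        )

# Hypothetical target-language corpus and candidate translations.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
lm = BigramLM(corpus)
candidates = [["cat", "the", "sat"], ["the", "cat", "sat"]]
best = max(candidates, key=lm.logprob)
print(" ".join(best))  # the cat sat
```

Smoothing matters here because hybrid systems generate many candidates containing word pairs never seen in the training corpus; without it, every such candidate would receive probability zero.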
Hybrid (SMT + RBMT) systems differ in how the two approaches are combined (Section 2.4.3 of [4]):
- Rule-based MT with statistical post-processing.
- Statistical MT with rule-based preprocessing.
- Full integration of RBMT and SMT. [3]
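The three system types can be pictured as different ways of wiring the same components together. A minimal sketch, assuming each engine is simply a function from source text to candidate translations; all component functions are hypothetical toys. The third function is only a loose stand-in for full integration (both engines propose candidates and one scorer decides); real fully integrated systems interleave the components far more deeply.

```python
from typing import Callable, List

Engine = Callable[[str], List[str]]  # source text -> candidate translations

def rbmt_then_postedit(rbmt: Engine, postedit: Callable[[str], str]) -> Engine:
    """Type 1: rule-based MT followed by statistical post-processing."""
    return lambda src: [postedit(c) for c in rbmt(src)]

def preprocess_then_smt(preprocess: Callable[[str], str], smt: Engine) -> Engine:
    """Type 2: rule-based preprocessing followed by statistical MT."""
    return lambda src: smt(preprocess(src))

def candidate_pooling(rbmt: Engine, smt: Engine, score: Callable[[str], float]) -> Engine:
    """Loose stand-in for type 3: both engines propose, one scorer decides."""
    return lambda src: [max(rbmt(src) + smt(src), key=score)]

# Hypothetical toy components (English -> French), for illustration only.
LEXICON = {"hello": "bonjour", "world": "monde"}

def toy_rbmt(src: str) -> List[str]:
    return [" ".join(LEXICON.get(w, w) for w in src.split())]

def toy_postedit(text: str) -> str:
    return text.replace("bonjour monde", "bonjour le monde")

def toy_preprocess(src: str) -> str:
    return src.strip().lower()

def toy_smt(src: str) -> List[str]:
    return {"hello world": ["bonjour le monde", "salut monde"]}.get(src, [src])

def toy_score(text: str) -> float:
    # Stands in for a language model: prefers candidates containing an article.
    return 1.0 if " le " in f" {text} " else 0.0

print(rbmt_then_postedit(toy_rbmt, toy_postedit)("hello world"))      # ['bonjour le monde']
print(preprocess_then_smt(toy_preprocess, toy_smt)("  Hello World "))  # ['bonjour le monde', 'salut monde']
print(candidate_pooling(toy_rbmt, toy_smt, toy_score)("hello world"))  # ['bonjour le monde']
```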
Statistical MT developers seek to use linguistic data, while systems built on the "classical" rule-based approach use statistical methods. [2] Adding certain "end-to-end" rules, that is, building a hybrid system, somewhat [how much?] improves translation quality, especially when there is too little input data to build the index files that store the linguistic information of an N-gram-based machine translator. [10]
Combining RBMT and statistical machine translation involves the following steps:
- Linguistic analysis of the input sentence;
- Generation of translation options;
- Use of statistical technologies;
- Evaluation and selection of the best translation option using the language model. [11] [12] [13]
Stages of SMT and RBMT Hybrid Technology: [2]
- Training: the RBMT system is trained on a parallel corpus using statistical techniques;
- Operation: translation with the trained system.
SMT and RBMT Hybrid Technology Architecture
In hybrid machine translation, the RBMT system is supplemented by two components [14]: a statistical post-editing module and a language model module. Statistical post-editing makes the rule-based translation more fluent, bringing it closer to natural language while preserving the clear structure of the synthesized text. The language model is used to evaluate the fluency and grammatical correctness of the candidate translations generated by the hybrid system.
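In the research literature, statistical post-editing is often implemented by training a phrase-based SMT system to "translate" raw RBMT output into human-corrected text. The sketch below is a much simpler stand-in, not the method of any vendor named above: it learns phrase substitutions from the differences between RBMT outputs and their post-edited versions (using Python's `difflib`), keeps those seen at least `min_count` times, and applies them greedily. The function names and the example sentences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def learn_edits(pairs, min_count=2):
    """Learn phrase substitutions from (rbmt_tokens, post_edited_tokens) pairs."""
    counts = Counter()
    for mt, ref in pairs:
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, mt, ref).get_opcodes():
            if tag == "replace":  # a span of RBMT output rewritten by the editor
                counts[(tuple(mt[i1:i2]), tuple(ref[j1:j2]))] += 1
    edits = {}
    for (src, tgt), n in counts.most_common():  # most frequent rewrite wins
        if n >= min_count and src not in edits:
            edits[src] = tgt
    return edits

def apply_edits(tokens, edits):
    """Greedy left-to-right rewriting, trying the longest learned phrase first."""
    max_len = max((len(src) for src in edits), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in edits:
                out.extend(edits[phrase])
                i += n
                break
        else:  # no learned phrase starts here: keep the token unchanged
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical RBMT outputs with a "machine accent" and their post-edited versions.
pairs = [
    ("he made a decision quickly".split(), "he decided quickly".split()),
    ("she made a decision yesterday".split(), "she decided yesterday".split()),
]
edits = learn_edits(pairs)
print(edits)  # {('made', 'a', 'decision'): ('decided',)}
print(" ".join(apply_edits("they made a decision today".split(), edits)))  # they decided today
```

Because only the RBMT output is rewritten, the sentence structure produced by the rules is kept everywhere except in the learned spans, which mirrors the goal stated above of smoothing the text while keeping its structure.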
Typical HMT architecture: [14]
- Parallel corpus;
- Training:
- - Language model;
- - Post-editing data;
- - Synthesis rules;
- - Terminology glossary.
- Operation:
- - Hybrid translation.
HMT working principle
Seemingly incompatible translation methods, namely classic rule-based MT and statistical MT, can be combined in a hybrid translation technology. [15] The fundamental difference of this solution is that, instead of a single translation, the program generates many candidates; depending on the ambiguity of words and constructions and on the results of statistical processing, their number for one sentence can reach several hundred. A probabilistic language model then selects the most probable candidate.
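The number of candidates grows quickly because the options for each word combine freely: the total is the product of the per-word option counts. A toy sketch with hypothetical English-to-French lexical options for "the bank issued a statement"; it enumerates lexical choices only, and structural and post-editing variants would multiply the count further.

```python
from itertools import product
from math import prod

# Hypothetical lexical options for each word of "the bank issued a statement"
# (toy English -> French); a real system would take these from its dictionaries.
options = [
    ["la"],
    ["banque", "rive"],
    ["a publié", "a émis"],
    ["une"],
    ["déclaration", "annonce", "attestation"],
]

candidates = [" ".join(words) for words in product(*options)]
print(len(candidates))   # 12 = 1 * 2 * 2 * 1 * 3
print(candidates[0])     # la banque a publié une déclaration

# Growth: six ambiguous words with three options each already give 3**6 = 729.
print(prod([3] * 6))     # 729
```

Candidates such as "la rive a publié une déclaration" ("the riverbank issued a statement") are grammatical but implausible; filtering them out is exactly the job of the language model.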
Typical HMT operation algorithm: [2]
- Automatic creation of a terminology dictionary for the RBMT system from parallel texts.
- Generation of all possible translation options based on:
- - lexical options;
- - synthesis options for different constructions;
- - application of post-editing.
- Choice of the best option using the language model.
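The first step of the algorithm, building a terminology dictionary from parallel texts, can be illustrated with a classic co-occurrence measure. The source does not say which method the vendors use; the sketch below swaps in the Dice coefficient over sentence-aligned pairs. The function name `extract_dictionary`, its thresholds `min_freq` and `min_dice`, and the toy English-French corpus are all hypothetical.

```python
from collections import Counter
from itertools import product

def extract_dictionary(pairs, min_freq=2, min_dice=0.8):
    """Toy term extraction from sentence-aligned parallel text (Dice coefficient).

    dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where c(.) counts the sentence
    pairs in which the word (or both words) occur.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # every co-occurring word pair
    dictionary = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if c >= min_freq and dice >= min_dice:
            dictionary.setdefault(s, []).append(t)
    return dictionary

# Hypothetical English-French parallel corpus, one aligned sentence pair per tuple.
pairs = [
    ("contract expires today", "contrat expire aujourd'hui"),
    ("contract signed yesterday", "contrat signé hier"),
    ("invoice expires tomorrow", "facture expire demain"),
    ("invoice signed today", "facture signée aujourd'hui"),
]
terms = extract_dictionary(pairs)
print(terms["contract"], terms["invoice"])  # ['contrat'] ['facture']
print("signed" in terms)  # False: "signé"/"signée" split the evidence
```

The missing entry for "signed" shows why real extractors lemmatize both sides first: inflected forms otherwise dilute the co-occurrence counts below the threshold.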
Advantages and disadvantages
What hybrid translation technology provides:
- Fast automatic configuration based on the customer's translation memories;
- Terminological accuracy of the translation and consistency of style;
- Additional useful data: a bilingual terminology dictionary.
Advantages and disadvantages of rule-based machine translation
Benefits of RBMT: [16]
Retained in the hybrid system:
- - syntactic and morphological accuracy;
- - stability and predictability of the result;
- - the ability to adapt to a specific subject domain.
Disadvantages of RBMT:
- - the complexity and duration of development;
- - the need to maintain and update linguistic databases;
- - “machine accent” in translation.
In the hybrid system, these disadvantages are offset by the use of parallel corpora and statistical methods:
- - automatic configuration of linguistic databases (fast, high-quality terminology extraction);
- - the "machine accent" disappears from translations (thanks to synthesis options and post-editing).
Advantages and disadvantages of statistical translation systems
Benefits of SMT: [17]
- - fast setup;
- - easy addition of new translation directions;
- - fluency of the translation.
Disadvantages of SMT:
- - shortage of parallel corpora;
- - numerous grammatical errors;
- - instability of translations.
See also
- Automatic translation of spoken language
- Automated translation
- Machine translation
- Rule-based machine translation
- Statistical machine translation
- Speech recognition
- Speech synthesis
Notes
- [1] Archived copy (dead link). Retrieved March 27, 2013. Archived March 13, 2016.
- [2] Yu. Epifantseva. Hybrid translation technology. PROMT LLC, conference "Russian Internet Technologies", 2011. Archived April 8, 2013.
- [3] Request Rejected
- [4] http://nlp.amrita.edu:8080/project/mhrd/ms/Final_Thesis.pdf (dead link)
- [5] Archived copy (dead link). Retrieved March 29, 2013. Archived March 4, 2016.
- [6] SYSTRAN's machine translation technology. Retrieved April 1, 2013. Archived April 8, 2013.
- [7] SYSTRAN Hybrid Technology. Retrieved April 1, 2013. Archived April 8, 2013.
- [8] http://web.iti.upv.es/~fcn/Students/ta/Talk-ToniL-PRACT_ISSUES-13_4p.pdf (dead link)
- [9] http://www.statmt.org/wmt12/pdf/WMT43.pdf
- [10] http://poiskbook.kiev.ua/art/ml/lande.pdf
- [11] http://www.intsys.msu.ru/magazine/archive/v6(1-4)/kholod.pdf
- [12] http://vestnik.stavsu.ru/70-2010/06.pdf
- [13] On the automatic approximation of real languages (dissertation abstract, Discrete mathematics and mathematical cybernetics). Retrieved April 4, 2013. Archived April 8, 2013.
- [14] A. Molchanov. Why do we need a hybrid translation technology. PROMT LLC, conference "AINL", 2013. Archived April 8, 2013.
- [15] PROMT company: translators and dictionaries for translating text from English, Russian, German, French, Spanish, Portuguese and Italian (dead link). Retrieved March 23, 2013. Archived April 8, 2013.
- [16] Archived copy (dead link). Retrieved March 27, 2013. Archived November 9, 2012.
- [17] A. Molchanov. Why do we need a hybrid translation technology. PROMT LLC, conference "AINL", 2013. Archived April 8, 2013.