eSpeak

Type	Speech synthesizer
Author	Jonathan Duddington
Written in	C++
Operating system	Linux and other Unix-like systems, Windows
Initial release	2006
Latest version	1.48.04 (April 6, 2014)
Status	Inactive
License	GNU GPL
Website	espeak.sourceforge.net

eSpeak is a compact, free software speech synthesizer that supports Speech Synthesis Markup Language (SSML). The original project is currently inactive because of the disappearance of its author, Jonathan Duddington ^[2]. The community develops a fork of it, eSpeak NG.


Operating Systems

Versions of eSpeak exist for Microsoft Windows, Mac OS X, Linux and RISC OS, and its C++ source code is also available. The official documentation also explains how to compile it for Windows Mobile. The program has one significant limitation: speech can be generated only into a WAV file. ^[3]

eSpeak is also used on Android (starting with version 1.6) and on Maemo. These ports are not supervised by the developer himself, the official eSpeak website offers no packages for them, and the Android version has a number of significant errors in some languages, including Russian. ^[4]

The Windows and Linux versions are updated along with the source code, while the Mac OS X and RISC OS versions have not been maintained for a long time.

The Windows version of eSpeak works as a voice for the Microsoft Speech API 5.x platform and is also available as a console utility. The Mac OS X version is a standalone application that does not integrate with Apple's system speech service and must be configured manually. Installation can, however, be simplified with the eSpeak Macintosh Installer package. ^[5]

Supported Languages

eSpeak supports about fifty languages. During installation, the user selects which languages and dialects should be supported. ^[6]

Below are the languages supported by eSpeak, together with the codes used to select them in its settings.

Afrikaans - af
Albanian - sq
Ancient Greek - grc
Armenian (Eastern) - hy
Armenian (Western) - hy-west
Bosnian - bs
Catalan - ca
Chinese (Cantonese) - zh-yue
Chinese (Putonghua) - zh
Croatian - hr
Czech - cs
Dutch - nl
English (American) - en-us
English (British, standard) - en
English (British, northern accent) - en-n
English (British, Received Pronunciation) - en-rp
English (British, West Midlands accent) - en-wm
English (Scottish) - en-sc
Esperanto - eo
Finnish - fi
French - fr
German - de
Greek - el
Hindi - hi
Hungarian - hu
Icelandic - is
Indonesian - id
Italian - it
Kurdish - ku
Latin - la
Latvian - lv
Lojban - jbo
Macedonian - mk
Norwegian - no
Polish - pl
Portuguese (Brazilian) - pt
Portuguese (European) - pt-pt
Romanian - ro
Russian - ru
Serbian - sr
Slovak - sk
Spanish (Castilian) - es
Spanish (Latin American) - es-la
Swahili - sw
Swedish - sv
Tamil - ta
Turkish - tr
Vietnamese - vi
Welsh - cy
MBROLA voices (voice xxx) - mb-xxx

The list of supported languages can be extended further with MBROLA voice libraries connected to eSpeak.
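
As an illustration of how these codes are used, the small Python helper below builds a command line for the `espeak` program. It relies only on the `-v` (voice), `-s` (speed in words per minute) and `-w` (write a WAV file) options; the helper name and the `VOICES` table are hypothetical, and option behavior should be checked against the installed version's `espeak --help`.

```python
import shlex

# A few of the language codes from the list above (not exhaustive).
VOICES = {
    "Russian": "ru",
    "English (American)": "en-us",
    "German": "de",
    "Esperanto": "eo",
}


def espeak_command(text, voice, wav_path=None, speed=None):
    """Build an argument list for the espeak command-line program.

    voice    -- a language code such as "ru" or "en-us", or an MBROLA
                voice name of the form "mb-xxx"
    wav_path -- if given, write the audio to this WAV file (-w)
    speed    -- speaking rate in words per minute (-s)
    """
    cmd = ["espeak", "-v", voice]
    if speed is not None:
        cmd += ["-s", str(speed)]
    if wav_path is not None:
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    cmd = espeak_command("Привет, мир", VOICES["Russian"], wav_path="hello.wav")
    # Print a shell-quoted version; pass cmd to subprocess.run() to execute
    # it on a machine where espeak is installed.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems with non-ASCII or punctuated text when the command is passed to `subprocess.run`.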

eSpeak and MBROLA

MBROLA is a diphone-based speech synthesis algorithm on which many text-to-speech (TTS) software products have been built. Among speech synthesis technologies, the project holds the record for the number of languages it has been used for, although MBROLA voices still do not exist for some widely spoken languages, including Russian. ^[7]

eSpeak can work together with MBROLA, using MBROLA voice libraries as if they were part of eSpeak itself. This further extends the list of languages in which eSpeak can synthesize speech from text.
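
One way to picture the division of labor: MBROLA does not read text, only a list of phonemes with durations and pitch targets (its `.pho` format: one phoneme per line, giving the phoneme name, its duration in milliseconds, then optional pairs of position in percent and pitch in hertz). In an eSpeak and MBROLA setup, eSpeak's text-to-phoneme stage supplies this kind of data and MBROLA turns it into audio. The formatter below is a hypothetical helper; real phoneme names depend on the particular MBROLA voice, and the values in the example are invented.

```python
def to_pho(phonemes):
    """Format phonemes as lines of an MBROLA .pho file.

    phonemes -- iterable of (name, duration_ms, pitch_points), where
                pitch_points is a list of (position_percent, pitch_hz) pairs
    """
    lines = []
    for name, duration_ms, pitch_points in phonemes:
        fields = [name, str(duration_ms)]
        for position, pitch in pitch_points:
            fields += [str(position), str(pitch)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "_" is the MBROLA silence phoneme; the other names are illustrative.
    utterance = [
        ("_", 50, []),
        ("h", 60, []),
        ("a", 150, [(0, 120), (100, 100)]),  # pitch falls from 120 to 100 Hz
        ("_", 50, []),
    ]
    print(to_pho(utterance), end="")
```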

The combination of eSpeak and MBROLA can be used on Windows, Linux ^[7] and Mac OS X ^[5].

However, not all MBROLA voice libraries support integration with eSpeak.

Implementation Principles

Each word of the input text goes through two processing stages:

the written word is converted into a sequence of phonemes;
an audio signal is generated from the resulting sequence.

The rules for deriving the phoneme sequence are written in the form "A, B, C = D", where B is the letter being translated, A and C are the letters surrounding it in the word, and D is the phoneme it becomes. The surrounding context can be given either as specific letters or as special symbols that stand for groups of letters. Because several rules may match the same letters, the rules can be ambiguous. To resolve this, the synthesizer assigns each rule a priority, calculated from the number of letters the rule involves and how specifically its context is defined. Rules can also specify different translations depending on stress.
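
The matching idea can be sketched in Python. This is not eSpeak's code or its rule-file syntax: the group symbols `V` (vowel) and `C` (consonant), the scoring weights and the English-like sample rules are all hypothetical. The sketch only shows how rules of the form "A, B, C = D" can be tested against a word and how a score that rewards longer matches and more specific contexts selects one rule when several apply. Stress-dependent rules are left out.

```python
from dataclasses import dataclass

# Hypothetical group symbols usable in contexts (words are lowercase, so
# uppercase symbols never clash with literal letters).
GROUPS = {"V": set("aeiouy"), "C": set("bcdfghjklmnpqrstvwxz")}


@dataclass(frozen=True)
class Rule:
    left: str      # A: context immediately before the match ("" = any)
    match: str     # B: the letters being translated (non-empty)
    right: str     # C: context immediately after the match ("" = any)
    phonemes: str  # D: resulting phoneme(s)


def symbol_matches(symbol, letter):
    if symbol in GROUPS:
        return letter in GROUPS[symbol]
    return symbol == letter


def context_matches(pattern, segment):
    # The segment must be as long as the pattern, so a context that runs
    # past either end of the word does not match.
    return len(pattern) == len(segment) and all(
        symbol_matches(s, c) for s, c in zip(pattern, segment))


def score(rule):
    # Hypothetical weights: matched letters count most, then specific
    # letters in the context, then group symbols.
    points = 10 * len(rule.match)
    for symbol in rule.left + rule.right:
        points += 1 if symbol in GROUPS else 3
    return points


def translate(word, rules):
    phonemes, i = [], 0
    while i < len(word):
        best = None
        for rule in rules:
            end = i + len(rule.match)
            start = i - len(rule.left)
            if (word[i:end] == rule.match
                    and start >= 0
                    and context_matches(rule.left, word[start:i])
                    and context_matches(rule.right, word[end:end + len(rule.right)])
                    and (best is None or score(rule) > score(best))):
                best = rule
        if best is None:           # no rule: pass the letter through unchanged
            phonemes.append(word[i])
            i += 1
        else:
            phonemes.append(best.phonemes)
            i += len(best.match)
    return phonemes


RULES = [
    Rule("", "c", "", "k"),
    Rule("", "c", "e", "s"),     # c before e
    Rule("", "c", "i", "s"),     # c before i
    Rule("", "ch", "", "tS"),
    Rule("", "s", "", "s"),
    Rule("V", "s", "V", "z"),    # s between vowels
]
```

With these rules, `translate("chose", RULES)` returns `['tS', 'o', 'z', 'e']`: "ch" outranks "c" because it covers more letters, and the vowel-context rule for "s" outranks the context-free one.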

In eSpeak, vowels are always synthesized, voiced consonants are produced by mixing synthesized sound with pre-recorded noise, and all other sounds, for example [ʃ], are simply recordings.

Every sound except voiceless consonants is described by a sequence of formants. Besides the formant data, each phoneme stores its amplitude, its duration and the pause before the next phoneme. From these parameters, the synthesizer's algorithms generate the sound of a vowel. Phoneme and formant data are kept in separate files, which are then compiled into a binary format.
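
The phoneme parameters described above (formants, amplitude, duration, pause before the next phoneme) can be illustrated with a classic cascade formant synthesizer, in which a pulse train at the pitch frequency passes through one two-pole resonator per formant. This textbook technique (Klatt-style digital resonators) is swapped in for illustration and is not eSpeak's own waveform generator; the `Phoneme` fields and the formant values for the vowel are rough, hypothetical figures.

```python
import math
from dataclasses import dataclass


@dataclass
class Phoneme:
    name: str
    formants: list        # (frequency_hz, bandwidth_hz) pairs
    amplitude: float      # peak amplitude of the output, 0..1
    duration_ms: int
    pause_ms: int = 0     # silence before the next phoneme


def resonator(signal, freq, bandwidth, rate):
    """Two-pole digital resonator (Klatt-style), unity gain at 0 Hz."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(ph, f0=120.0, rate=16000):
    """Return samples for a voiced sound defined by its formants."""
    n = int(rate * ph.duration_ms / 1000)
    period = int(rate / f0)
    # Glottal source approximated by a unit impulse every pitch period.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in ph.formants:   # cascade: one resonator per formant
        signal = resonator(signal, freq, bandwidth, rate)
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    samples = [ph.amplitude * s / peak for s in signal]
    return samples + [0.0] * int(rate * ph.pause_ms / 1000)


# Rough formant values for an "ah"-like vowel (illustrative only).
AH = Phoneme("a", [(730, 90), (1090, 110), (2440, 170)],
             amplitude=0.8, duration_ms=200, pause_ms=50)
```

Scaling `synthesize(AH)` to 16-bit integers and writing it with Python's `wave` module gives a buzzy vowel; a real synthesizer also needs a smoother glottal source, transitions between phonemes and, as described above, recorded noise for consonants.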

The synthesizer comes with eSpeak Edit, a GUI application built with the wxWidgets library, which lets existing phonemes be edited visually. Each phoneme is shown as a curve on a graph, in which formants can be selected one by one and their values, such as frequency, height and width, changed. This makes it possible to derive new sounds from existing phonemes that match a particular language more faithfully. Some phonemes, however, cannot be obtained by modifying existing ones: when the Russian part of eSpeak was developed, for example, the sound [r] was recorded specially, because the other languages had no suitable equivalent. ^[8]

Projects Using eSpeak

Because eSpeak is an open source project, a number of developers have integrated it into their own products.

NVDA

eSpeak is the primary speech synthesizer of NVDA, a free, open source, non-profit screen reader. It voices the NVDA installation process and is the default voice when the program is first launched.

Speech Synthesizer “Captain”

Captain, another speech synthesizer, developed by Anatoly Kamynin and Gennady Nefedov, has an add-on package based on eSpeak for reading multilingual texts: Russian or Ukrainian text is read by the Captain synthesizer, while English, French or German text is read by eSpeak. This feature is available both in the version of Captain for MS Speech API 4 ^[9] and in the version for MS Speech API 5.x ^[10].
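
How such a split might work can be sketched by dividing the text into runs of Cyrillic and Latin script and sending each run to a different engine. The actual method used by the Captain add-on is not described here; detecting the language by Unicode script, which cannot tell English from French or German but does not need to since eSpeak reads all three, and the engine labels are assumptions made for illustration.

```python
import unicodedata

ENGINES = {"cyrillic": "Captain", "latin": "eSpeak"}


def script_of(ch):
    """Classify a character as Cyrillic, Latin, or neutral (None)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    if name.startswith("LATIN"):
        return "latin"
    return None  # spaces, digits, punctuation


def split_runs(text):
    """Split text into (script, chunk) runs; neutral characters join the current run."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] is None:
            # The text began with neutral characters: adopt this script.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


def route(text, default="Captain"):
    """Pair each run with the engine that should read it."""
    return [(ENGINES.get(script, default), chunk) for script, chunk in split_runs(text)]
```

For example, `route("Привет, world!")` pairs `"Привет, "` with Captain and `"world!"` with eSpeak. Ukrainian letters such as і, ї and є fall in the same Cyrillic block, so they are routed the same way as Russian.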

Third-Party Additions

Some languages lack simple, universal pronunciation rules, and eSpeak needs additional components to synthesize them well. To keep the main eSpeak package small, these components are distributed separately. Russian, for example, has no general rules that determine which syllable of a word is stressed. eSpeak tries to guess the stress, but the resulting pronunciation is often wrong. To solve this problem, there is an extended pronunciation dictionary that must be installed separately from the main eSpeak package.

Besides Russian, third-party components that improve eSpeak's pronunciation also exist for Chinese (Putonghua and Cantonese).

These dictionaries can be downloaded from the project's official website.
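
The role of the extended Russian dictionary can be sketched as a lookup that takes precedence over a rule-based guess. The dictionary entries, the stress notation (index of the stressed vowel) and the fallback heuristic below are hypothetical and far simpler than eSpeak's real stress rules; the point is only that a listed word gets the correct stress, while an unlisted word falls back to a guess that may be wrong.

```python
VOWELS = set("аеёиоуыэюя")

# Hypothetical extended-dictionary entries: word -> which vowel
# (counting from 0) carries the stress.
STRESS_DICT = {"молоко": 2, "дорога": 1}


def stressed_position(word):
    """Return the index in `word` of the stressed vowel, or None."""
    vowels = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowels:
        return None
    if word in STRESS_DICT:
        return vowels[STRESS_DICT[word]]
    # Fallback guess (hypothetical): stress the second-to-last vowel.
    return vowels[-2] if len(vowels) > 1 else vowels[0]


def mark_stress(word):
    """Show the stressed vowel in upper case."""
    pos = stressed_position(word)
    if pos is None:
        return word
    return word[:pos] + word[pos].upper() + word[pos + 1:]
```

`mark_stress("молоко")` gives `"молокО"` from the dictionary, while the unlisted `"вода"` gets the guess `"вОда"`. That guess is wrong, since the word is stressed on its last syllable, which is exactly the kind of error the separately installed dictionary is meant to prevent.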

Notes

[1] eSpeak repository.

[2] Taking ownership of the eSpeak project and its future.

[3] Manakhov P. Review of mobile Text-To-Speech engines.

[4] Tseikovets N. Overview of Russian-language speech synthesizers for Android OS.

[5] Tseikovets N. Installing the eSpeak synthesizer in Mac OS X using the eSpeak Macintosh Installer.

[6] eSpeak installation instructions.

[7] Tseikovets N. Use of MBROLA voices in MS Windows.

[8] Pozhidaeva R. Russification of the eSpeak speech synthesizer: Introduction.

[9] Speech Synthesizer “Captain” (version for MS SAPI 4).

[10] Speech Synthesizer “Captain” (version for MS SAPI 5.x).

Links

Official eSpeak website.
The eSpeak project at SourceForge.net.
Ukrainian dictionary for eSpeak.