Silent speech interfaces (SSI) are speech processing systems that capture and process speech signals at an early stage of articulation.
History
Silent speech interfaces have a short history, dating from the early 2000s. Over the past decade, automatic speech processing systems, including speech recognition, text recognition, translation, and speech synthesis, have improved significantly. This has led to the use of speech technologies in a wide range of services, such as information retrieval systems, call centers, voice control of mobile phones and car navigation systems, personal translators, and security applications. However, speech interfaces based on traditional acoustic speech signals still have a number of significant limitations.
Firstly, acoustic signals transmitted through the air are subject to distortion by noise. Despite considerable effort, reliable speech processing systems that work seamlessly in crowded restaurants, airports, and other public places are not yet in sight.
Secondly, traditional speech interfaces require clearly audible speech, which has two main drawbacks: in a public place it compromises the confidentiality of the message, and it disturbs bystanders. Services that require access to, retrieval of, or transmission of private or confidential information, such as PIN codes and passwords, are especially vulnerable.
In the early 2000s, silent speech interfaces were proposed to solve these problems. They allow users to communicate by speaking "silently," that is, without producing any sound. This is done by capturing speech signals at an early stage of human articulation, before the speech becomes airborne sound; the articulation signals are then passed to the system for further processing and interpretation. With this new approach, silent speech interfaces have the potential to overcome the main shortcomings of today's traditional voice interfaces:
- limited reliability of speech recognition in the presence of background noise,
- lack of secure transmission of private and confidential information,
- disturbance of bystanders.
In addition, silent speech interfaces could be an alternative for people with speech impairments (for example, after a laryngectomy), as well as for elderly or frail people who cannot speak loudly, clearly, and intelligibly.
Technology
Pak H. Chan et al. showed (2001, 2002) [1] that the myoelectric signal from the articulatory facial muscles contains enough information to accurately distinguish a small set of words. These words are recognized even when they are spoken silently, that is, in the absence of an acoustic signal (Jorgensen et al. 2003, Bradley et al. 2006). More recent work suggests that recognizing phoneme-level units from electromyographic (EMG) signals (Jou et al. 2006, Walliczek et al. 2006) opens the way to large-vocabulary recognition.
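As a rough illustration of how a small word set might be distinguished from multichannel muscle signals, the following minimal sketch computes one root-mean-square (RMS) value per channel and assigns a window to the nearest word centroid. It is not the method of any of the cited studies: the nearest-centroid classifier, the RMS feature, the two-word vocabulary, and the synthetic "recordings" (scaled Gaussian noise with made-up channel gains) are all assumptions chosen for brevity.

```python
import math
import random


def rms_features(window):
    """Return one RMS value per channel.

    `window` is a list of channels; each channel is a list of float samples.
    """
    return [math.sqrt(sum(s * s for s in ch) / len(ch)) for ch in window]


def train_centroids(examples):
    """Average the feature vectors of each word into one centroid per word.

    `examples` maps a word label to a list of feature vectors.
    """
    centroids = {}
    for word, vectors in examples.items():
        n = len(vectors)
        centroids[word] = [sum(col) / n for col in zip(*vectors)]
    return centroids


def classify(features, centroids):
    """Return the word whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda w: math.dist(features, centroids[w]))


def synthetic_window(gains, rng, n_samples=200):
    """Hypothetical stand-in for a recording: noise scaled per channel."""
    return [[g * rng.gauss(0.0, 1.0) for _ in range(n_samples)] for g in gains]


# Made-up per-channel activation levels for two words (not measured data).
WORD_GAINS = {"yes": [1.0, 0.2, 0.5], "no": [0.3, 1.0, 0.8]}

rng = random.Random(0)
training = {
    word: [rms_features(synthetic_window(gains, rng)) for _ in range(10)]
    for word, gains in WORD_GAINS.items()
}
centroids = train_centroids(training)

# Classify one fresh synthetic window per word.
predicted_no = classify(rms_features(synthetic_window(WORD_GAINS["no"], rng)), centroids)
predicted_yes = classify(rms_features(synthetic_window(WORD_GAINS["yes"], rng)), centroids)
```

Real EMG recordings would need filtering, segmentation, and richer features; the sketch only shows the overall shape of a small-vocabulary classifier.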
Recent studies have also used ultrasound and optical imaging to build silent speech interfaces based on the movements of the tongue and lips (Denby and Stone 2004, Denby et al. 2006, Hueber et al. 2007).
SSI systems that convert murmurs into speech are being developed primarily in Japan. In the United States, DARPA is sponsoring research on glottal activity sensors for use in noisy environments:
Under the Advanced Speech Encoding (ASE) program [2], technologies will be developed to enable communication in challenging military environments.
Over the past 50 years, great progress has been made in the development of voice coders (vocoders), but ultra-low bit rate (ULBR) voice coding at 300 bps remains a serious challenge. In particular, ULBR vocoders still lack a high-quality speech analyzer that can recognize the speaker's speech without interference; these shortcomings are exacerbated in acoustically difficult environments (for example, noisy or reverberant spaces).
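To give a sense of how tight a 300 bps budget is, the short calculation below works out the number of bits available per analysis frame and the compression ratio relative to uncompressed telephone-quality audio. The 20 ms frame length and the 64 kbps reference (8 kHz sampling at 8 bits per sample) are assumptions chosen for illustration; the source gives only the 300 bps figure.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bits available to describe one frame at the given bit rate."""
    return bitrate_bps * frame_ms / 1000


def compression_ratio(reference_bps, target_bps):
    """How many times smaller the target bit rate is than the reference."""
    return reference_bps / target_bps


ULBR_BPS = 300      # ultra-low bit rate from the text
FRAME_MS = 20       # assumed frame length (not stated in the source)
PCM_BPS = 8000 * 8  # assumed reference: 8 kHz, 8-bit samples = 64000 bps

budget = bits_per_frame(ULBR_BPS, FRAME_MS)
ratio = compression_ratio(PCM_BPS, ULBR_BPS)

print(f"{budget:.0f} bits per {FRAME_MS} ms frame")
print(f"about {ratio:.0f}x smaller than the reference")
```

Under these assumptions only 6 bits describe each frame, so the coder cannot afford to spend any of them on noise-corrupted parameters.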
The approach taken by the Advanced Speech Encoding (ASE) program is to use new noise-immune sensors to complement the processed acoustic signal (see fig.). These sensors will also be studied for their potential to enable sub-auditory or non-acoustic speech as an alternative means of communication in acoustically harsh environments and in conditions where stealth is required.
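One simple way a noise-immune sensor could complement an acoustic signal is by acting as a speech activity gate: acoustic frames are kept only while the sensor registers articulation. The sketch below illustrates that idea only. The source does not describe how the ASE program combines the signals, and the function names, the energy threshold, and the sample values are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def gate_by_sensor(acoustic_frames, sensor_frames, threshold):
    """Keep acoustic frames where the sensor is active; silence the rest.

    Both inputs are lists of equally long frames aligned in time.
    `threshold` is an assumed energy level separating activity from rest.
    """
    gated = []
    for acoustic, sensor in zip(acoustic_frames, sensor_frames):
        if frame_energy(sensor) >= threshold:
            gated.append(list(acoustic))
        else:
            gated.append([0.0] * len(acoustic))
    return gated


# Made-up data: the microphone picks up noise in every frame,
# but the sensor shows activity only in the middle frame.
acoustic = [
    [0.3, -0.2, 0.25, -0.3],
    [0.9, -0.8, 0.7, -0.9],
    [0.2, -0.3, 0.3, -0.2],
]
sensor = [
    [0.01, -0.01, 0.0, 0.01],
    [0.5, -0.6, 0.55, -0.5],
    [0.0, 0.01, -0.01, 0.0],
]

gated = gate_by_sensor(acoustic, sensor, threshold=0.05)
```

Because the decision depends only on the sensor, background noise in the microphone signal cannot trigger a false speech detection, however loud it is.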
See also
- Voice control
- Voice interface
- Laryngophone
- Gesture interface
Links
- Special Session on Silent Speech Interfaces
- Alexey Esaulenko. Bad good IVR // "Networks / Network World" No. 4, 2010
Notes
- [1] Pak H. Chan. Handbook of Neurochemistry and Molecular Neurobiology.
- [2] Advanced speech encoding. Virtual worldlets network.