CELP

Code-excited linear prediction (CELP) is a speech coding algorithm originally proposed by Manfred Schroeder and B. S. Atal in 1985. At that time, it provided significantly better quality than the existing low-bitrate algorithms, such as RELP and LPC vocoders (for example, FS-1015). Together with its variants, such as ACELP, RCELP, LD-CELP and VSELP, it is today the most widely used speech coding algorithm. CELP is now used as a general term for a class of algorithms rather than for a specific codec.

Introduction

The CELP algorithm is based on four main ideas:

Using the source-filter model of speech production through linear prediction (LP);
Using an adaptive and a fixed codebook as the input (excitation) of the LP model;
Performing the search in a closed loop in a “perceptually weighted domain”;
Applying vector quantization (VQ).

The original algorithm, as simulated by Schroeder and Atal in 1983, required 150 seconds to encode 1 second of speech when run on a Cray-1 supercomputer. Since then, more efficient ways of implementing the codebooks and improvements in computing power have made it possible to run the algorithm on embedded devices, such as mobile phones.

CELP Decoder

Before exploring the complex encoding process of CELP, we first consider the decoder. The figure (an external link to the diagram is given below) shows a generic CELP decoder. The excitation is produced by summing the contributions from the adaptive (also called pitch) codebook and the fixed (also called stochastic) codebook:

$e[n]=e_{a}[n]+e_{f}[n]$

where $e_{a}[n]$ is the adaptive (pitch) codebook contribution and $e_{f}[n]$ is the fixed (stochastic) codebook contribution. The fixed codebook is a vector quantization dictionary that is (implicitly or explicitly) hard-coded into the codec. This codebook can be algebraic (ACELP) or stored explicitly (e.g. Speex). The entries in the adaptive codebook consist of delayed versions of the excitation. This makes it possible to efficiently code periodic signals, such as voiced speech.
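The excitation sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_excitation`, its parameters, and the single gain per codebook are hypothetical, and real codecs differ in how they handle pitch lags shorter than the subframe.

```python
def build_excitation(past_excitation, pitch_lag, pitch_gain,
                     fixed_vector, fixed_gain):
    """Return one subframe of excitation e[n] = e_a[n] + e_f[n].

    All names here are illustrative; they are not taken from any codec.
    """
    if not 1 <= pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must lie within the stored history")
    history = list(past_excitation)
    excitation = []
    for n in range(len(fixed_vector)):
        # Adaptive contribution: the excitation from pitch_lag samples ago,
        # scaled by the pitch gain. Reading from the growing history means
        # samples produced in this subframe are reused when the lag is
        # shorter than the subframe (an assumption of this sketch).
        e_a = pitch_gain * history[len(history) - pitch_lag]
        # Fixed contribution: the selected codebook vector, scaled.
        e_f = fixed_gain * fixed_vector[n]
        sample = e_a + e_f
        excitation.append(sample)
        history.append(sample)
    return excitation
```

With a lag equal to the history length, the first output sample repeats the oldest stored sample scaled by the pitch gain, which is what makes periodic signals cheap to encode.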

The filter that shapes the excitation is an all-pole model of the form 1/A(z), where A(z) is called the prediction filter and is obtained by linear prediction (the Levinson-Durbin algorithm). An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.
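As a rough sketch of how such a filter might be obtained and applied, the following Python code runs the Levinson-Durbin recursion on an autocorrelation sequence and then passes an excitation through 1/A(z). The convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the function names are assumptions of this example, and the recursion has no safeguard against a zero prediction error.

```python
def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    r holds autocorrelation values r[0..order]; r[0] must be positive.
    Returns the coefficient list a (with a[0] == 1.0) and the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with lag i.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / error  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a, error


def synthesize(excitation, a):
    """Filter the excitation through the all-pole filter 1/A(z)."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k in range(1, len(a)):
            if n - k >= 0:
                s -= a[k] * out[n - k]
        out.append(s)
    return out
```

For the autocorrelation of a first-order process, `levinson_durbin([1.0, 0.5, 0.25], 2)` yields a first coefficient of -0.5 and a second coefficient of zero, so the second-order model collapses to the first-order one.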

CELP Encoder

The basic principle underlying CELP is called analysis-by-synthesis (AbS), which means that the encoding (analysis) is performed by perceptually optimizing the decoded (synthesized) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that yields the best-sounding decoded signal. This is obviously impossible in practice for two reasons: the required complexity is beyond any currently available hardware, and the “best-sounding” selection criterion implies a human listener.

To achieve real-time encoding with limited computing resources, the CELP search is broken down into smaller, more manageable, sequential searches using a simple perceptual weighting function. Typically, the encoding is performed in the following order:

The linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs);
The adaptive (pitch) codebook is searched and its contribution is removed;
The fixed (stochastic) codebook is searched.
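The sequential search described above can be sketched as follows. This is a simplified illustration, not any codec's actual procedure: the perceptual weighting is omitted, the filter starts from zero state, candidate lists stand in for real codebooks, and all function names are hypothetical. Each candidate is passed through the synthesis filter 1/A(z), given its optimal gain, and compared with the target by squared error.

```python
def filter_all_pole(x, a):
    """Filter x through 1/A(z), A(z) = 1 + a[1] z^-1 + ..., zero state."""
    out = []
    for n, v in enumerate(x):
        s = v
        for k in range(1, len(a)):
            if n - k >= 0:
                s -= a[k] * out[n - k]
        out.append(s)
    return out


def best_entry(target, filtered_candidates):
    """Return (index, gain, error) of the candidate closest to target."""
    target_energy = sum(t * t for t in target)
    best = (None, 0.0, target_energy)
    for index, y in enumerate(filtered_candidates):
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        # With the optimal gain corr / energy, the squared error is
        # target_energy - corr^2 / energy.
        error = target_energy - corr * corr / energy
        if error < best[2]:
            best = (index, corr / energy, error)
    return best


def search_subframe(target, adaptive_candidates, fixed_candidates, a):
    """Greedy two-stage search: adaptive codebook first, then fixed."""
    filtered_adaptive = [filter_all_pole(c, a) for c in adaptive_candidates]
    a_idx, a_gain, _ = best_entry(target, filtered_adaptive)
    # Remove the adaptive contribution before the fixed search.
    remaining = list(target)
    if a_idx is not None:
        chosen = filtered_adaptive[a_idx]
        remaining = [t - a_gain * y for t, y in zip(target, chosen)]
    filtered_fixed = [filter_all_pole(c, a) for c in fixed_candidates]
    f_idx, f_gain, error = best_entry(remaining, filtered_fixed)
    return a_idx, a_gain, f_idx, f_gain, error
```

Because the two searches run one after the other, the result is generally not the jointly optimal pair of entries; that loss is the price paid for a search that fits in real time.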

Noise distortion

Most (if not all) modern audio codecs attempt to shape the coding noise so that it appears mostly in the frequency regions where the human ear cannot detect it. For example, the ear is more tolerant of noise in parts of the spectrum that are louder, and less tolerant in quieter parts. That is why, instead of minimizing the simple quadratic error, CELP minimizes the error in the perceptually weighted domain. The weighting filter W(z) is typically derived from the LPC filter by bandwidth expansion:

$W(z)={\frac {A(z/\gamma _{1})}{A(z/\gamma _{2})}}$

where $\gamma _{1}>\gamma _{2}$.
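Bandwidth expansion has a simple form in terms of coefficients: replacing z by z/γ in A(z) multiplies the k-th coefficient by γ^k. The sketch below builds the numerator and denominator of W(z) this way and applies the filter to a signal. The function names and the default values 0.9 and 0.6 are illustrative assumptions, not values prescribed by the text, and the convention A(z) = 1 + a_1 z^-1 + ... is assumed.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z / gamma): a[k] scaled by gamma**k."""
    return [coeff * gamma ** k for k, coeff in enumerate(a)]


def apply_weighting(signal, a, gamma1=0.9, gamma2=0.6):
    """Filter signal through W(z) = A(z / gamma1) / A(z / gamma2).

    The gamma defaults are illustrative; gamma1 > gamma2 is assumed.
    """
    num = bandwidth_expand(a, gamma1)  # FIR part, A(z / gamma1)
    den = bandwidth_expand(a, gamma2)  # all-pole part, 1 / A(z / gamma2)
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(len(num)):
            if n - k >= 0:
                acc += num[k] * signal[n - k]
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * out[n - k]
        out.append(acc)
    return out
```

In an encoder, the error between the original and the synthesized speech would be passed through this filter before its energy is measured. When the two factors are equal, the numerator and denominator cancel and the filter leaves the signal unchanged.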