
Verbalization of Neural Networks

Verbalization is a minimized description of the operation of a synthesized, already trained neural network in the form of several interdependent algebraic or logical functions.


Verbalization Goals

One of the main disadvantages of trained neural networks, from the point of view of many users, is that it is difficult to extract an explicit, user-friendly algorithm for solving the problem from a trained network: the network itself is that algorithm, and if the network structure is complex, the algorithm is incomprehensible. However, a specially constructed procedure of simplification and verbalization often makes it possible to extract an explicit method of solution.

Verbalization is carried out, in particular, to prepare a trained and simplified neural network for implementation in program code or as a specialized electronic (optoelectronic) device, as well as to present the results in the form of explicit knowledge. [1] Here, symptoms are the input values of the neural network, syndromes are the values at the outputs of its neurons, and the final syndrome is the output of the network. Verbalization is usually performed with specialized software packages.

Particular tasks of simplification and verbalization

  1. Simplification of the neural network architecture
  2. Reduction of the number of input signals
  3. Reduction of the neural network parameters to a small number of selected values
  4. Relaxation of the accuracy requirements for the input signals
  5. Formulation of explicit knowledge in the form of a symptom-syndrome structure and of explicit formulas for computing syndromes from symptoms (an illustrative sketch follows this list)
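
As an illustration only (the structure, weights, and thresholds below are invented, not taken from any of the cited works), a verbalized two-input network might reduce to a pair of explicit syndrome formulas and a final decision rule:

    import math

    # Hypothetical verbalization of a tiny trained network: the input symptoms
    # x1 and x2 are combined into two intermediate syndromes, and the final
    # syndrome (the network output) is an explicit threshold rule over them.
    # All weights and thresholds here are invented for illustration.

    def syndrome_1(x1, x2):
        return math.tanh(0.8 * x1 - 1.2 * x2 + 0.3)

    def syndrome_2(x1, x2):
        return math.tanh(-0.5 * x1 + 0.9 * x2 - 0.1)

    def final_syndrome(x1, x2):
        return 1 if 1.5 * syndrome_1(x1, x2) - 0.7 * syndrome_2(x1, x2) > 0 else 0

A set of formulas of this kind is something a user can read, check, and re-implement without the original network.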

Thinning Algorithms

Before verbalizing a network, usually by means of production rules, it was proposed for some types of networks to simplify the network structure first, that is, to thin it out. The main idea of thinning (pruning) is that those elements of the model, or those neurons of the network, that have little effect on the approximation error can be excluded from the model without significant deterioration in the quality of the approximation [2]. It must be borne in mind, however, that this holds only for the problem being solved: if new training data appear, the thinned network will have lost the generalization ability it would have possessed had the connections not been removed (at least the opposite has not been proved). These are therefore lossy algorithms that can be used for particular tasks but not independently of the task; as their specialization increases, they lose flexibility.
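
A minimal sketch of this idea (an illustration under assumed interfaces, not an algorithm from the cited works): tentatively remove a candidate connection, measure how much the validation error grows, and keep the removal only if the quality does not deteriorate noticeably.

    import numpy as np

    def prune_by_error(weights, error_fn, tol=1e-3):
        # Greedy thinning sketch: tentatively zero out each weight, starting
        # from the smallest in magnitude, and keep the removal only while the
        # total increase of the (validation) error stays below `tol`.
        # `error_fn(w)` is assumed to return the error of the network whose
        # weights are given by the vector `w`.
        w = weights.copy()
        base = error_fn(w)
        for i in np.argsort(np.abs(w)):
            if w[i] == 0.0:
                continue
            saved = w[i]
            w[i] = 0.0
            if error_fn(w) - base > tol:  # removal hurts too much: restore
                w[i] = saved
        return w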

For the multilayer Rumelhart perceptron and networks based on it

A second-order method (sensitivity analysis based on the calculation of second derivatives) was proposed by LeCun in 1990 [3] and was called "optimal brain damage". It was then developed by Hassibi and received the name "optimal brain surgeon" [4].
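
Assuming a diagonal approximation of the Hessian (the second derivatives of the error with respect to each weight), the optimal-brain-damage saliency can be sketched as s_i = h_ii * w_i^2 / 2; connections with the smallest saliency are removed first. The function names and interfaces below are illustrative.

    import numpy as np

    def obd_saliency(weights, hessian_diag):
        # Optimal-brain-damage saliency under a diagonal Hessian approximation:
        # s_i = h_ii * w_i^2 / 2, the estimated growth of the error if the
        # weight w_i is set to zero.
        return 0.5 * hessian_diag * weights ** 2

    def prune_least_salient(weights, hessian_diag, n_remove):
        w = weights.copy()
        idx = np.argsort(obd_saliency(w, hessian_diag))[:n_remove]
        w[idx] = 0.0  # delete the connections that matter least
        return w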

Somewhat earlier, thinning [5] and skeletonization [6] of neural networks were proposed, based simply on the removal of the elements with the smallest weights (zero-order methods).
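
A zero-order step of this kind can be sketched as removing a fixed fraction of the smallest-magnitude weights, without using any derivatives (illustrative only):

    import numpy as np

    def magnitude_prune(weights, fraction=0.2):
        # Zero-order thinning sketch: zero out the given fraction of the
        # weights with the smallest absolute values.
        w = weights.copy()
        n_remove = int(fraction * w.size)
        idx = np.argsort(np.abs(w))[:n_remove]
        w[idx] = 0.0
        return w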

Finally, in the same year 1990, A. N. Gorban proposed an effective method based on the analysis of first derivatives in the course of training by gradient methods, requiring no separate differentiation. [7] In addition to the removal of elements, other simplification problems were also solved: reducing the precision (bit width) of weights and signals (coarsening), simplifying the activation functions of neurons, obtaining interpretable knowledge, etc. The whole set of approaches was also called "contrasting of neural networks". A description of the main sensitivity indicators is given in the review. [8]
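
One way such a first-order indicator can be sketched (an illustration in the spirit of the approach, not Gorban's exact formula) is to accumulate |w_i * dE/dw_i| over the training steps, reusing the gradients that back-propagation already computes:

    import numpy as np

    def accumulated_sensitivity(weight_history, grad_history):
        # First-order sensitivity sketch: for each weight, sum |w * dE/dw|
        # over the recorded training steps. Both arguments are assumed to be
        # arrays of shape (n_steps, n_weights) collected during gradient
        # training, so no separate differentiation pass is needed.
        return np.sum(np.abs(weight_history * grad_history), axis=0)

    # Weights with the smallest accumulated sensitivity are the first
    # candidates for removal (contrasting).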

E. M. Mirkes, in the "Ideal Neurocomputer" project, drawing on Gorban's approach and on experience in applied software development, introduced the "Contraster" element, built a library of its main functions, and developed a description language. [9]

To prepare a neural network for simplification, it proves useful to introduce penalty terms for complexity into the evaluation functional minimized during training. These algorithms were introduced in the book by A. N. Gorban [7]. This approach was subsequently rediscovered and laid the foundation for the theory of structural learning with forgetting of Ishikawa and Zurada. [10] [11]
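
In the spirit of such penalized training (and of structural learning with forgetting), a complexity penalty can be sketched as an L1 term added to the training error; the exact penalties used in [7], [10] and [11] differ in detail, so the form below is only illustrative:

    import numpy as np

    def penalized_loss(error, weights, lam=1e-3):
        # Training criterion with a complexity penalty: E + lam * sum(|w_i|).
        # The penalty drives weakly used weights toward zero, preparing the
        # network for subsequent thinning.
        return error + lam * np.sum(np.abs(weights))

    def penalized_gradient(error_grad, weights, lam=1e-3):
        # Gradient of the penalized criterion with respect to the weights.
        return error_grad + lam * np.sign(weights)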

For the Rosenblatt perceptron and networks based on it

A thinning algorithm is not required for the Rosenblatt perceptron. Unlike the multilayer Rumelhart perceptron, this perceptron is not a fully connected network: the number of connections from a middle element to the inputs can be controlled directly by the experimenter according to the complexity of the problem. Training with extra connections is therefore not needed, and one can immediately select the number of connections required for the task. The selection is done experimentally: if convergence is obtained during training, the number of connections can be reduced further; as soon as convergence starts to require a significantly larger number of iterations, this is a sign that the required number of connections has been reached.

Another controlled parameter, which affects the number of connections even more significantly, is the number of middle elements. The fewer middle elements the perceptron can be trained with, the more optimal the resulting structure.

Thus, by controlling these two parameters, thinning is obtained automatically, without any additional algorithms.
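
The experimental selection described above can be sketched as a loop that retrains the perceptron with progressively fewer connections per middle element and stops as soon as training stops converging within a reasonable iteration budget; the training routine, the budget, and the stopping heuristic are assumptions for illustration.

    def select_connection_count(train_fn, start, step=1, max_iters=10000):
        # `train_fn(n_connections, max_iters)` is assumed to train a perceptron
        # with `n_connections` inputs per middle element and to return the
        # number of iterations used, or None if it failed to converge within
        # `max_iters`. The same loop can be applied to the number of middle
        # elements instead of the number of connections.
        best = start
        n = start
        while n > 0:
            iters = train_fn(n, max_iters)
            if iters is None or iters > 0.9 * max_iters:
                break  # convergence became markedly harder: stop reducing
            best = n   # this smaller configuration still trains successfully
            n -= step
        return best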

Notes

  1. ↑ Mirkes E. M. Logically transparent neural networks and the production of explicit knowledge from data. In: Neuroinformatics / A. N. Gorban, V. L. Dunin-Barkovsky, A. N. Kirdin et al. Novosibirsk: Nauka, Siberian Enterprise of the Russian Academy of Sciences, 1998. 296 p. ISBN 5-02-031410-2.
  2. ↑ Optimal thinning of neural networks.
  3. ↑ LeCun Y., Denker J. S., Solla S. A. Optimal brain damage. In: Touretzky D. S. (ed.), Advances in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, CA, 1990, pp. 598-605.
  4. ↑ Hassibi B., Stork D. G. Second order derivatives for network pruning: Optimal brain surgeon. In: NIPS 5, 1993.
  5. ↑ Sietsma J., Dow R. J. F. Neural net pruning - why and how. In: Proc. IJCNN'88, San Diego, CA, IEEE, Vol. 1, pp. 325-333.
  6. ↑ Mozer M. C., Smolensky P. Skeletonization: a technique for trimming the fat from a network via relevance assessment. In: Advances in Neural Information Processing Systems, Morgan Kaufmann, 1989, Vol. 1, pp. 107-115.
  7. ↑ 1 2 Gorban A. N. Training of Neural Networks. Moscow: USSR-USA JV "Paragraph", 1990. 160 p.
  8. ↑ Gorban A. N., Mirkes Eu. M., Tsaregorodtsev V. G. Generation of explicit knowledge from empirical data through pruning of trainable neural networks. In: Proc. IJCNN'99, Washington DC, July 1999, IEEE, Vol. 6, pp. 4393-4398.
  9. ↑ Mirkes E. M. Neurocomputer. Draft Standard. Novosibirsk: Nauka, Siberian Publishing Company RAS, 1999. 337 p. ISBN 5-02-031409-9 (Chapter 9: "Contraster").
  10. ↑ Ishikawa M. Structural learning with forgetting. Neural Networks, 1996, Vol. 9, No. 3, pp. 509-521.
  11. ↑ Miller D. A., Zurada J. M. A dynamical system perspective of structural learning with forgetting. IEEE Transactions on Neural Networks, 1998, Vol. 9, No. 3, pp. 508-515.
Source: https://ru.wikipedia.org/w/index.php?title=Neural_Network_Verbalization&oldid=101403286

