The Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey Hinton and Terry Sejnowski in 1985 [1] . The Boltzmann machine can be regarded as a stochastic, generative counterpart of the Hopfield network .
Statisticians call such networks Markov random fields . The network is named in honor of the Austrian physicist Ludwig Boltzmann , one of the founders of statistical physics .
The network is trained with a simulated annealing algorithm and turned out to be the first neural network capable of learning internal representations and solving difficult combinatorial problems. Nevertheless, because of a number of problems, Boltzmann machines with unrestricted connectivity cannot be used to solve practical problems. If the connectivity is restricted, training can be made efficient enough for practical use. In particular, the so-called deep belief network is built as a cascade of restricted Boltzmann machines.
Model
Like the Hopfield network, the Boltzmann machine is a network of neurons with a notion of "energy" defined for it. The global energy is computed in a form identical to that of the Hopfield network: [2]

E = -\left( \sum_{i<j} w_{ij}\, s_i\, s_j + \sum_i \theta_i\, s_i \right)

Where:
- w_{ij} is the connection strength between neurons j and i;
- s_i is the state, s_i \in \{0, 1\}, of neuron i;
- \theta_i is the threshold of neuron i.
The connections are subject to the following restrictions:
- w_{ii} = 0 for all i (no neuron has a connection with itself);
- w_{ij} = w_{ji} (all connections are symmetric).
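As an illustration of the energy formula above, here is a minimal sketch in Python/NumPy (variable names and example values are hypothetical, not from the source):

```python
import numpy as np

def boltzmann_energy(w, s, theta):
    """Global energy E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i).

    w     -- symmetric weight matrix with zero diagonal (w_ii = 0, w_ij = w_ji)
    s     -- binary state vector (entries 0 or 1)
    theta -- threshold vector
    """
    # 0.5 * s^T w s counts each pair i < j exactly once because w is symmetric
    pair_term = 0.5 * s @ w @ s
    bias_term = theta @ s
    return -(pair_term + bias_term)

# Example with three neurons (illustrative numbers only)
w = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 0.5],
              [-2.0, 0.5, 0.0]])
s = np.array([1, 0, 1])
theta = np.array([0.1, -0.3, 0.2])
print(boltzmann_energy(w, s, theta))
```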
Thermal equilibrium
One of the main drawbacks of the Hopfield network is its tendency to "stabilize" in a local rather than a global energy minimum. In practice it is desirable that the network settle into deep energy minima more often than into shallow ones, and that the relative probability of ending up in one of two minima with different energies depend only on the ratio of their depths. This would make it possible to control the probabilities of obtaining particular output state vectors by changing the profile of the energy surface of the system through modification of the connection weights. The Boltzmann machine was built on these considerations.
The idea of using "thermal noise" to escape local minima and increase the probability of reaching deeper minima belongs to S. Kirkpatrick. The simulated annealing algorithm is based on this idea.
We introduce a parameter T, an analogue of the level of thermal noise. Then the probability that a given neuron k is active is determined by the Boltzmann probability function:

p_k = \frac{1}{1 + e^{-S_k / T}}

where T is the level of thermal noise in the network, and S_k is the sum of the connection weights of the k-th neuron with all neurons that are currently active.
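A minimal sketch (Python/NumPy, hypothetical names) of one stochastic update step using this activation probability, together with a simple geometric cooling schedule in the spirit of simulated annealing; the particular schedule and parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def update_neuron(w, s, theta, k, T):
    """Stochastically update neuron k at temperature T.

    S_k is taken, as in the text above, as the sum of connection weights of
    neuron k with all currently active neurons; some formulations also
    subtract the threshold theta[k] here.
    """
    s_k = w[k] @ s
    p_on = 1.0 / (1.0 + np.exp(-s_k / T))
    s[k] = 1 if rng.random() < p_on else 0
    return s

def anneal(w, s, theta, T0=10.0, cooling=0.95, sweeps=200):
    """Repeated sweeps over all neurons while gradually lowering T."""
    T = T0
    for _ in range(sweeps):
        for k in rng.permutation(len(s)):
            update_neuron(w, s, theta, k, T)
        T *= cooling  # geometric cooling schedule (an assumed choice)
    return s
```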
Restricted Boltzmann machine
Although training of the general Boltzmann machine is of limited use in practice, these problems can be overcome with the architecture of the restricted Boltzmann machine (RBM). In this architecture connections exist only between hidden and visible neurons and are absent between neurons of the same class. The architecture was originally introduced by Paul Smolensky in 1986 under the name Harmonium [3] , but gained popularity only after Hinton invented fast learning algorithms for it in the mid-2000s.
Restricted Boltzmann machines are used in deep learning networks. In particular, deep belief networks can be obtained by stacking RBMs and then fine-tuning them with the backpropagation algorithm.
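As a rough illustration, here is a sketch (Python/NumPy, hypothetical names) of a single contrastive-divergence (CD-1) update for a binary RBM, the kind of fast learning rule referred to above; details of the update used in practice vary:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr=0.1):
    """One CD-1 update of an RBM with binary visible and hidden units.

    W      -- weight matrix (visible x hidden); there are no visible-visible
              or hidden-hidden connections, matching the restricted architecture
    b_vis  -- visible biases; b_hid -- hidden biases
    v0     -- batch of binary training vectors, shape (batch, visible)
    """
    # Positive phase: hidden probabilities given the data
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: one step of Gibbs sampling back to the visible layer
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)

    # Approximate gradient of the log-likelihood
    batch = v0.shape[0]
    W     += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid
```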
Notes
- ↑ Ackley, David H.; Hinton, Geoffrey E.; Sejnowski, Terrence J. A Learning Algorithm for Boltzmann Machines. - Cognitive Science 9 (1), 1985. - P. 147–169.
- ↑ Loskutov A. Yu., Mikhailov A. S. Introduction to Synergetics. - M.: Nauka, 1990. - ISBN 5-02-014475-4. - pp. 233–237.
- ↑ Smolensky, Paul. Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory // Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. - MIT Press, 1986. - P. 194–281. - ISBN 0-262-68053-X. Archived June 13, 2013 at the Wayback Machine.