
Conjugate Gradient Method

The conjugate gradient method is a method for finding a local extremum of a function using information about its values and its gradient. For a quadratic function on $\mathbb{R}^n$ the minimum is found in at most $n$ steps.

Key Concepts

We introduce the following terminology:

Let $\vec{S}_1, \ldots, \vec{S}_n \in \mathbb{X} \subset \mathbb{R}^n$.

We introduce on $\mathbb{X}$ an objective function $f(\vec{x}) \in C^2(\mathbb{X})$.

Vectors $\vec{S}_1, \ldots, \vec{S}_n$ are called conjugate if:

  • $\vec{S}_i^T H \vec{S}_j = 0, \quad i \neq j, \quad i, j = 1, \ldots, n$
  • $\vec{S}_i^T H \vec{S}_i \geqslant 0, \quad i = 1, \ldots, n$

where $H$ is the Hessian matrix of $f(\vec{x})$.

Theorem (on existence).
There exists at least one system of $n$ conjugate directions for the matrix $H$, since the eigenvectors of $H$ form such a system.
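
This can be checked numerically. The following sketch (using an arbitrarily chosen 2×2 matrix, purely for illustration) verifies that the eigenvectors of a symmetric positive-definite matrix are mutually conjugate with respect to it:

    import numpy as np

    # Check that the eigenvectors of a symmetric positive-definite matrix H
    # form a system of H-conjugate directions.
    H = np.array([[4.0, 1.0],
                  [1.0, 3.0]])      # symmetric, positive definite
    _, V = np.linalg.eigh(H)        # columns of V are eigenvectors of H

    # S_i^T H S_j vanishes for i != j and is positive for i == j,
    # so V^T H V is (approximately) diagonal.
    print(np.round(V.T @ H @ V, 10))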

Justification of the method

Zero iteration

Illustration of successive approximations of the steepest descent method (green broken line) and the conjugate gradient method (red broken line) to the extremum point.

Let $\vec{S}_0 = -\nabla f(\vec{x}_0) \qquad (1)$

Then $\vec{x}_1 = \vec{x}_0 + \lambda_1 \vec{S}_0$.

Define the direction

$\vec{S}_1 = -\nabla f(\vec{x}_1) + \omega_1 \vec{S}_0 \qquad (2)$

so that it is conjugate to $\vec{S}_0$:

$\vec{S}_0^T H \vec{S}_1 = 0 \qquad (3)$

Expand $\nabla f(\vec{x})$ in a neighborhood of $\vec{x}_0$ and substitute $\vec{x} = \vec{x}_1$:

$\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0) = H(\vec{x}_1 - \vec{x}_0) = \lambda_1 H \vec{S}_0$

Transpose the resulting expression and multiply it by $H^{-1}$ on the right:

$(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1} = \lambda_1 \vec{S}_0^T H^T H^{-1}$

Due to the continuity of the second partial derivatives, $H^T = H$. Then:

$\vec{S}_0^T = \frac{(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1}}{\lambda_1}$

Substitute the resulting expression into (3):

$\frac{(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1} H \vec{S}_1}{\lambda_1} = 0$

Then, using (1) and (2):

$(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T (-\nabla f(\vec{x}_1) - \omega_1 \nabla f(\vec{x}_0)) = 0 \qquad (4)$

If $\lambda_1 = \arg\min_\lambda f(\vec{x}_0 + \lambda \vec{S}_0)$, then the gradient at the point $\vec{x}_1 = \vec{x}_0 + \lambda_1 \vec{S}_0$ is perpendicular to the gradient at the point $\vec{x}_0$, so by the properties of the scalar product of vectors:

$(\nabla f(\vec{x}_0), \nabla f(\vec{x}_1)) = 0$

Taking the latter into account, we obtain from expression (4) the final formula for computing $\omega_1$:

$\omega_1 = \frac{\|\nabla f(\vec{x}_1)\|^2}{\|\nabla f(\vec{x}_0)\|^2}$
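
As a numerical illustration of this zero iteration, the sketch below uses a hypothetical 2×2 quadratic, for which the exact line-search step has the closed form $\lambda_1 = \vec{S}_0^T \vec{S}_0 / \vec{S}_0^T H \vec{S}_0$; it computes $\lambda_1$, $\omega_1$, and $\vec{S}_1$, and checks that condition (3) holds:

    import numpy as np

    # Hypothetical quadratic f(x) = 0.5 x^T H x - b^T x, so grad f(x) = H x - b.
    H = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    grad = lambda x: H @ x - b

    x0 = np.zeros(2)
    S0 = -grad(x0)                              # (1)
    lam1 = (S0 @ S0) / (S0 @ H @ S0)            # exact line search for a quadratic
    x1 = x0 + lam1 * S0

    w1 = (grad(x1) @ grad(x1)) / (grad(x0) @ grad(x0))
    S1 = -grad(x1) + w1 * S0                    # (2)

    print(S0 @ H @ S1)                          # ~0, i.e. condition (3) holds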

k-th iteration

At the k-th iteration we have the set $\vec{S}_0, \ldots, \vec{S}_{k-1}$.

Then the next direction is calculated by the formula:

$\vec{S}_k = -\nabla f(\vec{x}_k) - \|\nabla f(\vec{x}_k)\|^2 \cdot \left( \frac{\nabla f(\vec{x}_{k-1})}{\|\nabla f(\vec{x}_{k-1})\|^2} + \ldots + \frac{\nabla f(\vec{x}_0)}{\|\nabla f(\vec{x}_0)\|^2} \right)$

This expression can be rewritten in a more convenient iterative form:

$\vec{S}_k = -\nabla f(\vec{x}_k) + \omega_k \vec{S}_{k-1}, \qquad \omega_k = \frac{\|\nabla f(\vec{x}_k)\|^2}{\|\nabla f(\vec{x}_{k-1})\|^2},$

where $\omega_k$ is computed directly at the k-th iteration.

Algorithm

  • Let $\vec{x}_0$ be the starting point, let $\vec{r}_0$ be the anti-gradient direction there, and suppose we seek the minimum of the function $f(\vec{x})$. Set $\vec{S}_0 = \vec{r}_0$ and find the minimum along the direction $\vec{S}_0$. Denote the minimum point by $\vec{x}_1$.
  • Suppose at some step we are at the point $\vec{x}_k$, and $\vec{r}_k$ is the anti-gradient direction there. Set $\vec{S}_k = \vec{r}_k + \omega_k \vec{S}_{k-1}$, where $\omega_k$ is chosen either as $\frac{(\vec{r}_k, \vec{r}_k)}{(\vec{r}_{k-1}, \vec{r}_{k-1})}$ (the standard Fletcher–Reeves algorithm, suitable for quadratic functions with $H > 0$) or as $\max\left(0, \frac{(\vec{r}_k, \vec{r}_k - \vec{r}_{k-1})}{(\vec{r}_{k-1}, \vec{r}_{k-1})}\right)$ (the Polak–Ribière algorithm). Then find the minimum in the direction $\vec{S}_k$ and denote the minimum point by $\vec{x}_{k+1}$. If the function does not decrease in the computed direction, discard the previous direction by setting $\omega_k = 0$ and repeat the step (see the sketch after this list).
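
A minimal Python sketch of this procedure is given below; the function and gradient interfaces, the use of SciPy's scalar minimizer for the one-dimensional search, and the Rosenbrock test function are illustrative assumptions rather than part of the method itself.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def conjugate_gradient(f, grad, x0, variant="FR", tol=1e-8, max_iter=500):
        """Nonlinear conjugate gradient sketch (Fletcher-Reeves or Polak-Ribiere)."""
        x = np.asarray(x0, dtype=float)
        r = -grad(x)                       # anti-gradient at the current point
        S = r.copy()                       # first direction: S_0 = r_0
        for _ in range(max_iter):
            # one-dimensional minimization along S (exact line search assumed)
            lam = minimize_scalar(lambda a: f(x + a * S)).x
            x_new = x + lam * S
            r_new = -grad(x_new)
            if np.linalg.norm(r_new) < tol:
                return x_new
            if variant == "FR":            # Fletcher-Reeves coefficient
                w = (r_new @ r_new) / (r @ r)
            else:                          # Polak-Ribiere coefficient, reset to 0 if negative
                w = max(0.0, (r_new @ (r_new - r)) / (r @ r))
            S = r_new + w * S
            x, r = x_new, r_new
        return x

    # Illustrative usage on the Rosenbrock function:
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    g = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                            200 * (x[1] - x[0]**2)])
    print(conjugate_gradient(f, g, [-1.0, 1.0], variant="PR"))   # should approach (1, 1)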

Formalization

  1. Specify the initial approximation and the tolerance: $\vec{x}_0, \quad \varepsilon, \quad k = 0$
  2. Compute the initial direction: $j = 0, \quad \vec{S}_k^j = -\nabla f(\vec{x}_k), \quad \vec{x}_k^j = \vec{x}_k$
  3. $\vec{x}_k^{j+1} = \vec{x}_k^j + \lambda \vec{S}_k^j, \quad \lambda = \arg\min_\lambda f(\vec{x}_k^j + \lambda \vec{S}_k^j), \quad \vec{S}_k^{j+1} = -\nabla f(\vec{x}_k^{j+1}) + \omega \vec{S}_k^j, \quad \omega = \frac{\|\nabla f(\vec{x}_k^{j+1})\|^2}{\|\nabla f(\vec{x}_k^j)\|^2}$
    • If $\|\vec{S}_k^{j+1}\| < \varepsilon$ or $\|\vec{x}_k^{j+1} - \vec{x}_k^j\| < \varepsilon$, then set $\vec{x} = \vec{x}_k^{j+1}$ and stop.
    • Otherwise,
      • if $(j + 1) < n$, set $j = j + 1$ and go to step 3;
      • otherwise set $\vec{x}_{k+1} = \vec{x}_k^{j+1}, \quad k = k + 1$ and go to step 2 (a sketch of this loop, including the restart every $n$ steps, follows the list).
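
The following sketch mirrors steps 1-3 above, restarting from the anti-gradient after every $n$ inner iterations; the scalar minimizer used for the line search is again an illustrative assumption.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def cg_with_restarts(f, grad, x0, eps=1e-8, max_outer=100):
        x = np.asarray(x0, dtype=float)
        n = x.size
        for _ in range(max_outer):
            S = -grad(x)                                 # step 2: (re)start from the anti-gradient
            for j in range(n):                           # step 3: at most n conjugate steps
                lam = minimize_scalar(lambda a: f(x + a * S)).x
                x_new = x + lam * S
                g_new, g_old = grad(x_new), grad(x)
                S_new = -g_new + (g_new @ g_new) / (g_old @ g_old) * S
                if np.linalg.norm(S_new) < eps or np.linalg.norm(x_new - x) < eps:
                    return x_new                         # stopping criterion from step 3
                x, S = x_new, S_new
        return x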

The case of a quadratic function

Theorem.
If conjugate directions are used to find the minimum of a quadratic function, then this function can be minimized in $n$ steps, one in each direction, and the order is not significant.
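
The n-step property can be observed numerically. The sketch below runs the method with exact line searches on a randomly generated 3×3 positive-definite quadratic (an illustrative setup) and checks that the gradient vanishes after $n = 3$ steps:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    H = A @ A.T + 3 * np.eye(3)          # symmetric positive-definite Hessian
    b = rng.standard_normal(3)           # f(x) = 0.5 x^T H x - b^T x

    x = np.zeros(3)
    r = b - H @ x                        # r = -grad f(x)
    S = r.copy()
    for k in range(3):                   # n = 3 conjugate steps
        lam = (r @ r) / (S @ H @ S)      # exact line search for a quadratic
        x = x + lam * S
        r_new = r - lam * (H @ S)
        S = r_new + (r_new @ r_new) / (r @ r) * S
        r = r_new

    print(np.linalg.norm(H @ x - b))     # ~0: the exact minimizer is reached in n steps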
