
Conjugate Gradient Method

The conjugate gradient method is a method for finding a local extremum of a function using information about its values and its gradient. For a quadratic function on $\mathbb{R}^n$ the minimum is found in at most $n$ steps.

Key Concepts

We introduce the following terminology:

Let $\vec{S}_1, \ldots, \vec{S}_n \in \mathbb{X} \subset \mathbb{R}^n$.

We introduce on $\mathbb{X}$ an objective function $f(\vec{x}) \in C^2(\mathbb{X})$.

Vectors $\vec{S}_1, \ldots, \vec{S}_n$ are called conjugate if:

  • $\vec{S}_i^T H \vec{S}_j = 0, \quad i \neq j, \quad i, j = 1, \ldots, n$
  • $\vec{S}_i^T H \vec{S}_i \geqslant 0, \quad i = 1, \ldots, n$

where $H$ is the Hessian matrix of $f(\vec{x})$.

Theorem (on existence).
There exists at least one system of $n$ conjugate directions for the matrix $H$, since the eigenvectors of $H$ form such a system.
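
This can be checked numerically. The following sketch (using an arbitrarily chosen 2×2 matrix, purely for illustration) verifies that the eigenvectors of a symmetric positive-definite matrix are mutually conjugate with respect to it:

    import numpy as np

    # Check that the eigenvectors of a symmetric positive-definite matrix H
    # form a system of H-conjugate directions.
    H = np.array([[4.0, 1.0],
                  [1.0, 3.0]])      # symmetric, positive definite
    _, V = np.linalg.eigh(H)        # columns of V are eigenvectors of H

    # S_i^T H S_j vanishes for i != j and is positive for i == j,
    # so V^T H V is (approximately) diagonal.
    print(np.round(V.T @ H @ V, 10))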

Justification of the method

Zero iteration

Illustration of successive approximations of the steepest descent method (green broken line) and the conjugate gradient method (red broken line) to the extremum point.

Let $\vec{S}_0 = -\nabla f(\vec{x}_0) \qquad (1)$

Then $\vec{x}_1 = \vec{x}_0 + \lambda_1 \vec{S}_0$.

Define the direction

$\vec{S}_1 = -\nabla f(\vec{x}_1) + \omega_1 \vec{S}_0 \qquad (2)$

so that it is conjugate to $\vec{S}_0$:

$\vec{S}_0^T H \vec{S}_1 = 0 \qquad (3)$

Expand $\nabla f(\vec{x})$ in a neighborhood of $\vec{x}_0$ and substitute $\vec{x} = \vec{x}_1$:

$\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0) = H(\vec{x}_1 - \vec{x}_0) = \lambda_1 H \vec{S}_0$

Transpose the resulting expression and multiply it by $H^{-1}$ on the right:

$(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1} = \lambda_1 \vec{S}_0^T H^T H^{-1}$

Due to the continuity of the second partial derivatives, $H^T = H$. Then:

$\vec{S}_0^T = \frac{(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1}}{\lambda_1}$

Substitute the resulting expression into (3):

$\frac{(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T H^{-1} H \vec{S}_1}{\lambda_1} = 0$

Then, using (1) and (2):

$(\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))^T (-\nabla f(\vec{x}_1) - \omega_1 \nabla f(\vec{x}_0)) = 0 \qquad (4)$

If $\lambda_1 = \arg\min_\lambda f(\vec{x}_0 + \lambda \vec{S}_0)$, then the gradient at the point $\vec{x}_1 = \vec{x}_0 + \lambda_1 \vec{S}_0$ is perpendicular to the gradient at the point $\vec{x}_0$, so by the properties of the scalar product of vectors:

$(\nabla f(\vec{x}_0), \nabla f(\vec{x}_1)) = 0$

Taking the latter into account, we obtain from expression (4) the final formula for computing $\omega_1$:

$\omega_1 = \frac{\|\nabla f(\vec{x}_1)\|^2}{\|\nabla f(\vec{x}_0)\|^2}$
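
As a numerical illustration of this zero iteration, the sketch below uses a hypothetical 2×2 quadratic, for which the exact line-search step has the closed form $\lambda_1 = \vec{S}_0^T \vec{S}_0 / \vec{S}_0^T H \vec{S}_0$; it computes $\lambda_1$, $\omega_1$, and $\vec{S}_1$, and checks that condition (3) holds:

    import numpy as np

    # Hypothetical quadratic f(x) = 0.5 x^T H x - b^T x, so grad f(x) = H x - b.
    H = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    grad = lambda x: H @ x - b

    x0 = np.zeros(2)
    S0 = -grad(x0)                              # (1)
    lam1 = (S0 @ S0) / (S0 @ H @ S0)            # exact line search for a quadratic
    x1 = x0 + lam1 * S0

    w1 = (grad(x1) @ grad(x1)) / (grad(x0) @ grad(x0))
    S1 = -grad(x1) + w1 * S0                    # (2)

    print(S0 @ H @ S1)                          # ~0, i.e. condition (3) holds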

k-th iteration

At the k-th iteration we have the set $\vec{S}_0, \ldots, \vec{S}_{k-1}$.

Then the next direction is calculated by the formula:

$\vec{S}_k = -\nabla f(\vec{x}_k) - \|\nabla f(\vec{x}_k)\|^2 \cdot \left( \frac{\nabla f(\vec{x}_{k-1})}{\|\nabla f(\vec{x}_{k-1})\|^2} + \ldots + \frac{\nabla f(\vec{x}_0)}{\|\nabla f(\vec{x}_0)\|^2} \right)$

This expression can be rewritten in a more convenient iterative form:

$\vec{S}_k = -\nabla f(\vec{x}_k) + \omega_k \vec{S}_{k-1}, \qquad \omega_k = \frac{\|\nabla f(\vec{x}_k)\|^2}{\|\nabla f(\vec{x}_{k-1})\|^2},$

where $\omega_k$ is computed directly at the k-th iteration.

Algorithm

  • Let $\vec{x}_0$ be the starting point, let $\vec{r}_0$ be the anti-gradient direction there, and suppose we seek the minimum of the function $f(\vec{x})$. Set $\vec{S}_0 = \vec{r}_0$ and find the minimum along the direction $\vec{S}_0$. Denote the minimum point by $\vec{x}_1$.
  • Suppose at some step we are at the point $\vec{x}_k$, and $\vec{r}_k$ is the anti-gradient direction there. Set $\vec{S}_k = \vec{r}_k + \omega_k \vec{S}_{k-1}$, where $\omega_k$ is chosen either as $\frac{(\vec{r}_k, \vec{r}_k)}{(\vec{r}_{k-1}, \vec{r}_{k-1})}$ (the standard Fletcher–Reeves algorithm, suitable for quadratic functions with $H > 0$) or as $\max\left(0, \frac{(\vec{r}_k, \vec{r}_k - \vec{r}_{k-1})}{(\vec{r}_{k-1}, \vec{r}_{k-1})}\right)$ (the Polak–Ribière algorithm). Then find the minimum in the direction $\vec{S}_k$ and denote the minimum point by $\vec{x}_{k+1}$. If the function does not decrease in the computed direction, discard the previous direction by setting $\omega_k = 0$ and repeat the step (see the sketch after this list).
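
A minimal Python sketch of this procedure is given below; the function and gradient interfaces, the use of SciPy's scalar minimizer for the one-dimensional search, and the Rosenbrock test function are illustrative assumptions rather than part of the method itself.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def conjugate_gradient(f, grad, x0, variant="FR", tol=1e-8, max_iter=500):
        """Nonlinear conjugate gradient sketch (Fletcher-Reeves or Polak-Ribiere)."""
        x = np.asarray(x0, dtype=float)
        r = -grad(x)                       # anti-gradient at the current point
        S = r.copy()                       # first direction: S_0 = r_0
        for _ in range(max_iter):
            # one-dimensional minimization along S (exact line search assumed)
            lam = minimize_scalar(lambda a: f(x + a * S)).x
            x_new = x + lam * S
            r_new = -grad(x_new)
            if np.linalg.norm(r_new) < tol:
                return x_new
            if variant == "FR":            # Fletcher-Reeves coefficient
                w = (r_new @ r_new) / (r @ r)
            else:                          # Polak-Ribiere coefficient, reset to 0 if negative
                w = max(0.0, (r_new @ (r_new - r)) / (r @ r))
            S = r_new + w * S
            x, r = x_new, r_new
        return x

    # Illustrative usage on the Rosenbrock function:
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    g = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                            200 * (x[1] - x[0]**2)])
    print(conjugate_gradient(f, g, [-1.0, 1.0], variant="PR"))   # should approach (1, 1)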

Formalization

  1. Specify the initial approximation and the tolerance: $\vec{x}_0, \quad \varepsilon, \quad k = 0$
  2. Compute the initial direction: $j = 0, \quad \vec{S}_k^j = -\nabla f(\vec{x}_k), \quad \vec{x}_k^j = \vec{x}_k$
  3. $\vec{x}_k^{j+1} = \vec{x}_k^j + \lambda \vec{S}_k^j, \quad \lambda = \arg\min_\lambda f(\vec{x}_k^j + \lambda \vec{S}_k^j), \quad \vec{S}_k^{j+1} = -\nabla f(\vec{x}_k^{j+1}) + \omega \vec{S}_k^j, \quad \omega = \frac{\|\nabla f(\vec{x}_k^{j+1})\|^2}{\|\nabla f(\vec{x}_k^j)\|^2}$
    • If $\|\vec{S}_k^{j+1}\| < \varepsilon$ or $\|\vec{x}_k^{j+1} - \vec{x}_k^j\| < \varepsilon$, then set $\vec{x} = \vec{x}_k^{j+1}$ and stop.
    • Otherwise,
      • if $(j + 1) < n$, set $j = j + 1$ and go to step 3;
      • otherwise set $\vec{x}_{k+1} = \vec{x}_k^{j+1}, \quad k = k + 1$ and go to step 2 (a sketch of this loop, including the restart every $n$ steps, follows the list).
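
The following sketch mirrors steps 1-3 above, restarting from the anti-gradient after every $n$ inner iterations; the scalar minimizer used for the line search is again an illustrative assumption.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def cg_with_restarts(f, grad, x0, eps=1e-8, max_outer=100):
        x = np.asarray(x0, dtype=float)
        n = x.size
        for _ in range(max_outer):
            S = -grad(x)                                 # step 2: (re)start from the anti-gradient
            for j in range(n):                           # step 3: at most n conjugate steps
                lam = minimize_scalar(lambda a: f(x + a * S)).x
                x_new = x + lam * S
                g_new, g_old = grad(x_new), grad(x)
                S_new = -g_new + (g_new @ g_new) / (g_old @ g_old) * S
                if np.linalg.norm(S_new) < eps or np.linalg.norm(x_new - x) < eps:
                    return x_new                         # stopping criterion from step 3
                x, S = x_new, S_new
        return x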

The case of a quadratic function

Theorem.
If conjugate directions are used to find the minimum of a quadratic function, then this function can be minimized in $n$ steps, one in each direction, and the order is not significant.
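
The n-step property can be observed numerically. The sketch below runs the method with exact line searches on a randomly generated 3×3 positive-definite quadratic (an illustrative setup) and checks that the gradient vanishes after $n = 3$ steps:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    H = A @ A.T + 3 * np.eye(3)          # symmetric positive-definite Hessian
    b = rng.standard_normal(3)           # f(x) = 0.5 x^T H x - b^T x

    x = np.zeros(3)
    r = b - H @ x                        # r = -grad f(x)
    S = r.copy()
    for k in range(3):                   # n = 3 conjugate steps
        lam = (r @ r) / (S @ H @ S)      # exact line search for a quadratic
        x = x + lam * S
        r_new = r - lam * (H @ S)
        S = r_new + (r_new @ r_new) / (r @ r) * S
        r = r_new

    print(np.linalg.norm(H @ x - b))     # ~0: the exact minimizer is reached in n steps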
