
Subgradient methods

Subgradient methods are iterative methods for solving convex minimization problems. Subgradient methods, developed by Naum Z. Shor and others in the 1960s and 1970s, converge even when applied to a nondifferentiable objective function. When the objective function is differentiable, subgradient methods for unconstrained problems use the same search direction as the method of steepest descent.

Subgradient methods are slower than Newton's method when used to minimize twice continuously differentiable convex functions. However, Newton's method fails to converge on problems that have nondifferentiable kinks.

In recent years, some interior-point methods have been suggested for convex minimization problems, but subgradient projection methods and related bundle methods of descent remain competitive. For convex minimization problems with a very large number of dimensions, subgradient projection methods are suitable, because they require little storage.

Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often admit a simple distributed implementation.

Contents

  • 1 Classical subgradient rules
    • 1.1 Step size rules
    • 1.2 Convergence
  • 2 Subgradient projections and bundle methods
  • 3 Constrained optimization
    • 3.1 Subgradient projection method
    • 3.2 General constraints
  • 4 Notes
  • 5 Literature
  • 6 Further reading
  • 7 External links

Classical Subgradient Rules

Let f : ℝ^n → ℝ be a convex function with domain ℝ^n. The classical subgradient method iterates

x^(k+1) = x^(k) − α_k g^(k)

where g^(k) denotes a subgradient of f at the point x^(k), and x^(k) is the k-th iterate of x. If f is differentiable, then its only subgradient is the gradient ∇f. It may happen that −g^(k) is not a descent direction for f at x^(k). We therefore maintain a list f_best that keeps track of the smallest objective function value found so far, that is,

f_best^(k) = min{ f_best^(k−1), f(x^(k)) }.
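As a concrete illustration, here is a minimal sketch of the iteration above in Python. The choice of objective f(x) = ‖x‖_1, its subgradient sign(x), the starting point, and the step sizes α_k = 1/k are all assumptions made for this example:

```python
import numpy as np

def f(x):
    return np.sum(np.abs(x))          # nondifferentiable convex objective ||x||_1

def subgradient(x):
    return np.sign(x)                 # a valid subgradient of ||x||_1 (0 at the kinks)

x = np.array([3.0, -2.0])             # starting point
f_best = f(x)                         # smallest objective value found so far
for k in range(1, 501):
    g = subgradient(x)
    x = x - (1.0 / k) * g             # classical subgradient step with alpha_k = 1/k
    f_best = min(f_best, f(x))        # f may go up, so track the best value seen
```

Because −g^(k) need not be a descent direction, the last line matters: the objective value along the iterates can increase even while f_best keeps improving toward the minimum value 0.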

Step Size Rules

Subgradient methods use many different rules for selecting the step size. Here we mention five classical rules for which convergence proofs are known:

  • Constant step size, α_k = α.
  • Constant step length, α_k = γ/‖g^(k)‖_2, which gives ‖x^(k+1) − x^(k)‖_2 = γ.
  • Square summable but not summable step sizes, i.e. any step sizes satisfying
    α_k ⩾ 0,   ∑_{k=1}^∞ α_k² < ∞,   ∑_{k=1}^∞ α_k = ∞.
  • Nonsummable diminishing step sizes, i.e. any step sizes satisfying
    α_k ⩾ 0,   lim_{k→∞} α_k = 0,   ∑_{k=1}^∞ α_k = ∞.
  • Nonsummable diminishing step lengths, i.e. α_k = γ_k/‖g^(k)‖_2, where
    γ_k ⩾ 0,   lim_{k→∞} γ_k = 0,   ∑_{k=1}^∞ γ_k = ∞.
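The five rules above can be written as small Python helpers; the constants α, γ and the 1/k and 1/√k schedules are illustrative choices, not prescribed values:

```python
import numpy as np

def constant_step(k, g, alpha=0.1):
    return alpha                                  # rule 1: alpha_k = alpha

def constant_length(k, g, gamma=0.1):
    return gamma / np.linalg.norm(g)              # rule 2: ||x^(k+1) - x^(k)||_2 = gamma

def square_summable(k, g, a=1.0):
    return a / k                                  # rule 3: sum alpha_k^2 < inf, sum alpha_k = inf

def diminishing(k, g, a=1.0):
    return a / np.sqrt(k)                         # rule 4: alpha_k -> 0, sum alpha_k = inf

def diminishing_length(k, g, a=1.0):
    return (a / np.sqrt(k)) / np.linalg.norm(g)   # rule 5: gamma_k = a / sqrt(k)
```

Each helper takes the (1-based) iteration index k and the current subgradient g, even when unused, so that any rule can be dropped into the same iteration loop.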

For all five rules, the step sizes are determined "off-line", before the method is run; they do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used in methods for differentiable functions: many methods for minimizing differentiable functions satisfy Wolfe's conditions for convergence, where step sizes depend on the current point and the current search direction. An extensive discussion of step-size rules for subgradient methods, including incremental versions, is given in the book by Bertsekas [1] and in the book by Bertsekas, Nedić, and Ozdaglar [2].

Convergence

For constant step size and for scaled subgradients having Euclidean norm equal to one, the subgradient method converges to an arbitrarily close approximation of the minimum value, that is,

lim_{k→∞} f_best^(k) − f* < ε

by a result of Shor [3].

Classical subgradient methods have poor convergence and are no longer recommended for general use [4] [5]. However, they are still used in specialized applications because they are simple and easily adapt to special problem structures in order to exploit them.

Subgradient Projections and Bundle Methods

During the 1970s, Claude Lemaréchal and Philip Wolfe proposed "bundle methods" of descent for problems of convex minimization [6]. The meaning of the term "bundle methods" has changed significantly since that time. Modern versions and a full convergence analysis were provided by Kiwiel [7]. Contemporary bundle methods often use "level control" rules for choosing step sizes, developing techniques from the "subgradient projection" method of Boris T. Polyak (1969). However, there are problems on which bundle methods offer little advantage over subgradient projection methods [4] [5].

Constrained Optimization

Subgradient Projection Method

One extension of subgradient methods is the subgradient projection method, which solves the constrained optimization problem

minimize f(x)
subject to x ∈ C

where C is a convex set. The subgradient projection method iterates

x^(k+1) = P( x^(k) − α_k g^(k) )

where P is projection onto C and g^(k) is any subgradient of f at the point x^(k).
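A minimal sketch of the subgradient projection iteration, on the illustrative problem of minimizing f(x) = |x_1 − 2| + |x_2| over the unit Euclidean ball C (chosen as an assumption for this example because the projection has the closed form P(x) = x / max(1, ‖x‖_2)):

```python
import numpy as np

def f(x):
    return abs(x[0] - 2.0) + abs(x[1])            # nondifferentiable convex objective

def subgradient(x):
    return np.array([np.sign(x[0] - 2.0), np.sign(x[1])])

def project(x):
    return x / max(1.0, np.linalg.norm(x))        # Euclidean projection onto the unit ball

x = np.zeros(2)
f_best = f(x)
for k in range(1, 1001):
    g = subgradient(x)
    x = project(x - (1.0 / k) * g)                # subgradient step, then projection
    f_best = min(f_best, f(x))
```

The iterates remain in C by construction; here they settle at the boundary point (1, 0), where f attains its constrained minimum value 1.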

General constraints

The subgradient method can be extended to solve the inequality-constrained problem

minimize f_0(x)
subject to f_i(x) ⩽ 0,  i = 1, …, m

where the functions f_i are convex. The algorithm takes the same form as the unconstrained case:

x^(k+1) = x^(k) − α_k g^(k)

where α_k > 0 is a step size and g^(k) is a subgradient of the objective function or of one of the constraint functions at the point x^(k). Here

g^(k) = { ∂f_0(x)   if f_i(x) ⩽ 0 for all i = 1, …, m
        { ∂f_j(x)   if there exists j such that f_j(x) > 0

where ∂f denotes the subdifferential of f. If the current point is feasible, the algorithm uses a subgradient of the objective; if the point is infeasible, the algorithm chooses a subgradient of any violated constraint.
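A sketch of this constrained variant on a small illustrative problem: minimize f_0(x) = x_1 + x_2 subject to f_1(x) = ‖x‖_2² − 1 ⩽ 0, whose optimum is f_0* = −√2 at x = (−1/√2, −1/√2). The specific problem and the 1/k step sizes are assumptions made for this example:

```python
import numpy as np

def f0(x):
    return x[0] + x[1]                    # objective

def f1(x):
    return x @ x - 1.0                    # constraint: feasible iff f1(x) <= 0

x = np.zeros(2)
f_best = np.inf                           # best objective value over feasible iterates
for k in range(1, 5001):
    if f1(x) <= 0:
        g = np.array([1.0, 1.0])          # feasible: step along the gradient of f0
        f_best = min(f_best, f0(x))       # record the objective only at feasible points
    else:
        g = 2.0 * x                       # infeasible: gradient of the violated f1
    x = x - (1.0 / k) * g                 # diminishing step size
```

Here f_best decreases toward the optimal value −√2 ≈ −1.414; note that only feasible iterates are recorded, since infeasible points can have an even smaller objective value.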

Notes

  1. ↑ Bertsekas, 2015 .
  2. ↑ Bertsekas, Nedic, Ozdaglar, 2003 .
  3. ↑ The approximate convergence of the subgradient method with constant (scaled) step size is stated as Exercise 6.3.14(a) in Bertsekas's book (page 636) ( Bertsekas 1999 ), who attributes this result to Shor ( Shor 1985 ).
  4. ↑ 1 2 Lemaréchal, 2001 , p. 112–156.
  5. ↑ 1 2 Kiwiel, Larsson, Lindberg, 2007 , p. 669–686.
  6. ↑ Bertsekas, 1999 .
  7. ↑ Kiwiel, 1985 , p. 362.

Literature

  • Dimitri P. Bertsekas. Convex Optimization Algorithms. 2nd ed. Belmont, MA: Athena Scientific, 2015. ISBN 978-1-886529-28-1.
  • Dimitri P. Bertsekas, Angelia Nedić, Asuman Ozdaglar. Convex Analysis and Optimization. 2nd ed. Belmont, MA: Athena Scientific, 2003. ISBN 1-886529-45-0.
  • Naum Z. Shor. Minimization Methods for Non-differentiable Functions. Springer-Verlag, 1985. ISBN 0-387-12763-1.
  • Dimitri P. Bertsekas. Nonlinear Programming. 2nd ed. Cambridge, MA: Athena Scientific, 1999. ISBN 1-886529-00-0.
  • Krzysztof Kiwiel. Methods of Descent for Nondifferentiable Optimization. Berlin: Springer-Verlag, 1985. ISBN 978-3540156420.
  • Claude Lemaréchal. Lagrangian relaxation // Computational Combinatorial Optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000. Berlin: Springer-Verlag, 2001. Vol. 2241 (Lecture Notes in Computer Science). P. 112–156. ISBN 3-540-42877-1. DOI: 10.1007/3-540-45586-8_4.
  • Krzysztof C. Kiwiel, Torbjörn Larsson, P. O. Lindberg. Lagrangian relaxation via ballstep subgradient methods // Mathematics of Operations Research. 2007. Vol. 32, no. 3. P. 669–686. DOI: 10.1287/moor.1070.0261.

Further Reading

  • Andrzej Piotr Ruszczyński. Nonlinear Optimization. Princeton, NJ: Princeton University Press, 2006. xii+454 pp. ISBN 978-0691119151.

External Links

  • EE364A and EE364B, Stanford's convex optimization course sequence.
Source - https://ru.wikipedia.org/w/index.php?title=Subgradient methods_old&oldid = 99405616

