
Subgradient methods

Subgradient methods are iterative methods for solving convex minimization problems. Subgradient methods, developed by Naum Z. Shor and others in the 1960s and 1970s, converge even when applied to a nondifferentiable objective function. When the objective function is differentiable, subgradient methods for unconstrained problems use the same search direction as the method of steepest descent.

Subgradient methods are slower than Newton's method when used to minimize twice continuously differentiable convex functions. However, Newton's method fails to converge on problems that have nondifferentiable kinks.

In recent years, some interior-point methods have been suggested for convex minimization problems, but subgradient projection methods and related bundle methods of descent remain competitive. For convex minimization problems with a very large number of dimensions, subgradient projection methods are suitable, because they require little storage.

Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often admit a simple distributed implementation.

Contents

  • 1 Classical subgradient rules
    • 1.1 Step size rules
    • 1.2 Convergence
  • 2 Subgradient projections and bundle methods
  • 3 Constrained optimization
    • 3.1 Subgradient projection method
    • 3.2 General constraints
  • 4 Notes
  • 5 Literature
  • 6 Further reading
  • 7 External links

Classical Subgradient Rules

Let f : ℝ^n → ℝ be a convex function with domain ℝ^n. The classical subgradient method iterates

x^(k+1) = x^(k) − α_k g^(k)

where g^(k) denotes a subgradient of f at the point x^(k), and x^(k) is the k-th iterate of x. If f is differentiable, then its only subgradient is the gradient ∇f. It may happen that −g^(k) is not a descent direction for f at x^(k). We therefore maintain a list f_best that keeps track of the smallest objective function value found so far, that is,

f_best^(k) = min{ f_best^(k−1), f(x^(k)) }.
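As a concrete illustration, here is a minimal sketch of the iteration above in Python. The choice of objective f(x) = ‖x‖_1, its subgradient sign(x), the starting point, and the step sizes α_k = 1/k are all assumptions made for this example:

```python
import numpy as np

def f(x):
    return np.sum(np.abs(x))          # nondifferentiable convex objective ||x||_1

def subgradient(x):
    return np.sign(x)                 # a valid subgradient of ||x||_1 (0 at the kinks)

x = np.array([3.0, -2.0])             # starting point
f_best = f(x)                         # smallest objective value found so far
for k in range(1, 501):
    g = subgradient(x)
    x = x - (1.0 / k) * g             # classical subgradient step with alpha_k = 1/k
    f_best = min(f_best, f(x))        # f may go up, so track the best value seen
```

Because −g^(k) need not be a descent direction, the last line matters: the objective value along the iterates can increase even while f_best keeps improving toward the minimum value 0.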

Step Size Rules

Subgradient methods use many different rules for selecting the step size. Here we mention five classical rules for which convergence proofs are known:

  • Constant step size, α_k = α.
  • Constant step length, α_k = γ/‖g^(k)‖_2, which gives ‖x^(k+1) − x^(k)‖_2 = γ.
  • Square summable but not summable step sizes, i.e. any step sizes satisfying
    α_k ⩾ 0,   ∑_{k=1}^∞ α_k² < ∞,   ∑_{k=1}^∞ α_k = ∞.
  • Nonsummable diminishing step sizes, i.e. any step sizes satisfying
    α_k ⩾ 0,   lim_{k→∞} α_k = 0,   ∑_{k=1}^∞ α_k = ∞.
  • Nonsummable diminishing step lengths, i.e. α_k = γ_k/‖g^(k)‖_2, where
    γ_k ⩾ 0,   lim_{k→∞} γ_k = 0,   ∑_{k=1}^∞ γ_k = ∞.
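The five rules above can be written as small Python helpers; the constants α, γ and the 1/k and 1/√k schedules are illustrative choices, not prescribed values:

```python
import numpy as np

def constant_step(k, g, alpha=0.1):
    return alpha                                  # rule 1: alpha_k = alpha

def constant_length(k, g, gamma=0.1):
    return gamma / np.linalg.norm(g)              # rule 2: ||x^(k+1) - x^(k)||_2 = gamma

def square_summable(k, g, a=1.0):
    return a / k                                  # rule 3: sum alpha_k^2 < inf, sum alpha_k = inf

def diminishing(k, g, a=1.0):
    return a / np.sqrt(k)                         # rule 4: alpha_k -> 0, sum alpha_k = inf

def diminishing_length(k, g, a=1.0):
    return (a / np.sqrt(k)) / np.linalg.norm(g)   # rule 5: gamma_k = a / sqrt(k)
```

Each helper takes the (1-based) iteration index k and the current subgradient g, even when unused, so that any rule can be dropped into the same iteration loop.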

For all five rules, the step sizes are determined "off-line", before the method is run; they do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used in methods for differentiable functions: many methods for minimizing differentiable functions satisfy Wolfe's conditions for convergence, where step sizes depend on the current point and the current search direction. An extensive discussion of step-size rules for subgradient methods, including incremental versions, is given in the book by Bertsekas [1] and in the book by Bertsekas, Nedić, and Ozdaglar [2].

Convergence

For constant step size and for scaled subgradients having Euclidean norm equal to one, the subgradient method converges to an arbitrarily close approximation of the minimum value, that is,

lim_{k→∞} f_best^(k) − f* < ε

by a result of Shor [3].

Classical subgradient methods have poor convergence and are no longer recommended for general use [4] [5]. However, they are still used in specialized applications because they are simple and easily adapt to special problem structures in order to exploit them.

Subgradient Projections and Bundle Methods

During the 1970s, Claude Lemaréchal and Philip Wolfe proposed "bundle methods" of descent for problems of convex minimization [6]. The meaning of the term "bundle methods" has changed significantly since that time. Modern versions and a full convergence analysis were provided by Kiwiel [7]. Contemporary bundle methods often use "level control" rules for choosing step sizes, developing techniques from the "subgradient projection" method of Boris T. Polyak (1969). However, there are problems on which bundle methods offer little advantage over subgradient projection methods [4] [5].

Constrained Optimization

Subgradient Projection Method

One extension of subgradient methods is the subgradient projection method, which solves the constrained optimization problem

minimize f(x)
subject to x ∈ C

where C is a convex set. The subgradient projection method iterates

x^(k+1) = P( x^(k) − α_k g^(k) )

where P is projection onto C and g^(k) is any subgradient of f at the point x^(k).
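A minimal sketch of the subgradient projection iteration, on the illustrative problem of minimizing f(x) = |x_1 − 2| + |x_2| over the unit Euclidean ball C (chosen as an assumption for this example because the projection has the closed form P(x) = x / max(1, ‖x‖_2)):

```python
import numpy as np

def f(x):
    return abs(x[0] - 2.0) + abs(x[1])            # nondifferentiable convex objective

def subgradient(x):
    return np.array([np.sign(x[0] - 2.0), np.sign(x[1])])

def project(x):
    return x / max(1.0, np.linalg.norm(x))        # Euclidean projection onto the unit ball

x = np.zeros(2)
f_best = f(x)
for k in range(1, 1001):
    g = subgradient(x)
    x = project(x - (1.0 / k) * g)                # subgradient step, then projection
    f_best = min(f_best, f(x))
```

The iterates remain in C by construction; here they settle at the boundary point (1, 0), where f attains its constrained minimum value 1.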

General constraints

The subgradient method can be extended to solve the inequality-constrained problem

minimize f_0(x)
subject to f_i(x) ⩽ 0,  i = 1, …, m

where the functions f_i are convex. The algorithm takes the same form as the unconstrained case:

x^(k+1) = x^(k) − α_k g^(k)

where α_k > 0 is a step size and g^(k) is a subgradient of the objective function or of one of the constraint functions at the point x^(k). Here

g^(k) = { ∂f_0(x)   if f_i(x) ⩽ 0 for all i = 1, …, m
        { ∂f_j(x)   if there exists j such that f_j(x) > 0

where ∂f denotes the subdifferential of f. If the current point is feasible, the algorithm uses a subgradient of the objective; if the point is infeasible, the algorithm chooses a subgradient of any violated constraint.
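A sketch of this constrained variant on a small illustrative problem: minimize f_0(x) = x_1 + x_2 subject to f_1(x) = ‖x‖_2² − 1 ⩽ 0, whose optimum is f_0* = −√2 at x = (−1/√2, −1/√2). The specific problem and the 1/k step sizes are assumptions made for this example:

```python
import numpy as np

def f0(x):
    return x[0] + x[1]                    # objective

def f1(x):
    return x @ x - 1.0                    # constraint: feasible iff f1(x) <= 0

x = np.zeros(2)
f_best = np.inf                           # best objective value over feasible iterates
for k in range(1, 5001):
    if f1(x) <= 0:
        g = np.array([1.0, 1.0])          # feasible: step along the gradient of f0
        f_best = min(f_best, f0(x))       # record the objective only at feasible points
    else:
        g = 2.0 * x                       # infeasible: gradient of the violated f1
    x = x - (1.0 / k) * g                 # diminishing step size
```

Here f_best decreases toward the optimal value −√2 ≈ −1.414; note that only feasible iterates are recorded, since infeasible points can have an even smaller objective value.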

Notes

  1. ↑ Bertsekas, 2015 .
  2. ↑ Bertsekas, Nedic, Ozdaglar, 2003 .
  3. ↑ The approximate convergence of the subgradient method with constant (scaled) step size is stated as Exercise 6.3.14(a) in Bertsekas's book (page 636) ( Bertsekas 1999 ), who attributes this result to Shor ( Shor 1985 ).
  4. ↑ 1 2 Lemaréchal, 2001 , p. 112–156.
  5. ↑ 1 2 Kiwiel, Larsson, Lindberg, 2007 , p. 669–686.
  6. ↑ Bertsekas, 1999 .
  7. ↑ Kiwiel, 1985 , p. 362.

Literature

  • Dimitri P. Bertsekas. Convex Optimization Algorithms. 2nd ed. Belmont, MA: Athena Scientific, 2015. ISBN 978-1-886529-28-1.
  • Dimitri P. Bertsekas, Angelia Nedić, Asuman Ozdaglar. Convex Analysis and Optimization. 2nd ed. Belmont, MA: Athena Scientific, 2003. ISBN 1-886529-45-0.
  • Naum Z. Shor. Minimization Methods for Non-differentiable Functions. Springer-Verlag, 1985. ISBN 0-387-12763-1.
  • Dimitri P. Bertsekas. Nonlinear Programming. 2nd ed. Cambridge, MA: Athena Scientific, 1999. ISBN 1-886529-00-0.
  • Krzysztof Kiwiel. Methods of Descent for Nondifferentiable Optimization. Berlin: Springer-Verlag, 1985. ISBN 978-3540156420.
  • Claude Lemaréchal. Lagrangian relaxation // Computational Combinatorial Optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000. Berlin: Springer-Verlag, 2001. Vol. 2241 (Lecture Notes in Computer Science). P. 112–156. ISBN 3-540-42877-1. DOI: 10.1007/3-540-45586-8_4.
  • Krzysztof C. Kiwiel, Torbjörn Larsson, P. O. Lindberg. Lagrangian relaxation via ballstep subgradient methods // Mathematics of Operations Research. 2007. Vol. 32, no. 3. P. 669–686. DOI: 10.1287/moor.1070.0261.

Further Reading

  • Andrzej Piotr Ruszczyński. Nonlinear Optimization. Princeton, NJ: Princeton University Press, 2006. xii+454 pp. ISBN 978-0691119151.

External Links

  • EE364A and EE364B, Stanford's convex optimization course sequence.
Source - https://ru.wikipedia.org/w/index.php?title=Subgradient methods_old&oldid = 99405616

