# Convex optimization: a brief overview


## Introduction

- Mathematical optimization problems
- Semidefinite programs, in standard and inequality form

### Convex function

1. The sum of two convex functions is also a convex function.

2. The difference of two convex functions need not be a convex function.

One inequality cannot be subtracted from another, so no difference rule holds in general. For example, $h(x) = x^4 - x^2$ is the difference of the convex functions $x^4$ and $x^2$, yet it has two minimum points (at $x = \pm 1/\sqrt{2}$), so it is not a convex function.
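This can be confirmed numerically with a midpoint test, a quick sketch; the function $h(x) = x^4 - x^2$ and the test points below are illustrative choices, not from the text:

```python
# Midpoint test showing h(x) = x**4 - x**2, a difference of the convex
# functions x**4 and x**2, violates the convexity inequality
# h((a + b)/2) <= (h(a) + h(b))/2.  Test points are my own choice.

def h(x):
    return x**4 - x**2

a, b = -0.8, 0.8
midpoint_value = h((a + b) / 2)      # h(0) = 0
chord_value = (h(a) + h(b)) / 2      # negative, since h(±0.8) < 0
print(midpoint_value > chord_value)  # True: h is not convex
```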

## Duality

### Lagrange function

We consider an optimization problem in the standard form:

$\begin{array}{rl}\text{minimize}\quad & f_0(x) \\ \text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\ & h_i(x) = 0, \quad i = 1, \dots, p\end{array}$

Lagrange function

$L\left(x,\lambda ,v\right)={f}_{0}\left(x\right)+\sum _{i=1}^{m}{\lambda }_{i}{f}_{i}\left(x\right)+\sum _{i=1}^{p}{v}_{i}{h}_{i}\left(x\right)$

Lagrange dual function

$g\left(\lambda ,v\right)=\underset{x\in \mathcal{D}}{inf}L\left(x,\lambda ,v\right)=\underset{x\in \mathcal{D}}{inf}\left({f}_{0}\left(x\right)+\sum _{i=1}^{m}{\lambda }_{i}{f}_{i}\left(x\right)+\sum _{i=1}^{p}{v}_{i}{h}_{i}\left(x\right)\right)$
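For intuition, the dual function of a small problem can be approximated by brute-force minimization over a grid and compared with the closed form. The toy problem below (minimize $x^2$ subject to $x \ge 1$) and all constants are illustrative choices, not from the text:

```python
import numpy as np

# Brute-force Lagrange dual function for the toy problem
#   minimize x**2  subject to  1 - x <= 0.
# L(x, lam) = x**2 + lam*(1 - x); minimizing over x analytically
# gives g(lam) = lam - lam**2/4, attained at x = lam/2.

xs = np.linspace(-5.0, 5.0, 200001)      # grid standing in for inf over x

def g_numeric(lam):
    return np.min(xs**2 + lam * (1.0 - xs))

for lam in [0.0, 1.0, 2.0, 4.0]:
    assert abs(g_numeric(lam) - (lam - lam**2 / 4)) < 1e-6
print("numerical dual matches the closed form")
```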

Lagrange dual problem

$\begin{array}{rl}\text{maximize}\quad & g(\lambda, v) \\ \text{subject to}\quad & \lambda \succeq 0\end{array}$

### Convex conjugate

Let $X$ be a real topological vector space and let $X^{\ast}$ be the dual space to $X$. Denote by

$⟨\cdot ,\cdot ⟩:{X}^{\ast }×X\to \mathbb{R}$

the canonical dual pairing, which is defined by

$\left({x}^{\ast },x\right)↦{x}^{\ast }\left(x\right).$

For a function

$f:X\to \mathbb{R}\cup \left\{-\mathrm{\infty },+\mathrm{\infty }\right\}$

taking values on the extended real number line, its convex conjugate is the function

${f}^{\ast }:{X}^{\ast }\to \mathbb{R}\cup \left\{-\mathrm{\infty },+\mathrm{\infty }\right\}$

whose value at

${x}^{\ast }\in {X}^{\ast }$

is defined to be the supremum:

$f^{\ast}(x^{\ast}) = \sup \left\{ \langle x^{\ast}, x \rangle - f(x) : x \in X \right\}$

or, equivalently, in terms of the infimum:

$f^{\ast}(x^{\ast}) = -\inf \left\{ f(x) - \langle x^{\ast}, x \rangle : x \in X \right\}$

This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.
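As a numerical sanity check, the supremum can be approximated over a grid. The example uses the standard fact that $f(x) = x^2/2$ is its own conjugate; the discretization is my own sketch:

```python
import numpy as np

# Numerical convex conjugate f*(y) = sup_x { x*y - f(x) }, illustrated
# on f(x) = x**2 / 2, which is its own conjugate.

xs = np.linspace(-10.0, 10.0, 400001)    # grid standing in for sup over x

def conjugate(f, y):
    return np.max(xs * y - f(xs))

f = lambda x: x**2 / 2
for y in [-2.0, 0.0, 1.0, 3.0]:
    assert abs(conjugate(f, y) - y**2 / 2) < 1e-6
print("the conjugate of x^2/2 is y^2/2")
```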

### Weak duality

The optimal value of the Lagrange dual problem, which we denote $d^{\ast}$, is, by definition, the best lower bound on $p^{\ast}$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality

${d}^{\ast }\le {p}^{\ast }$

which holds even if the original problem is not convex. This property is called weak duality.

As the idiom goes, a starved camel is still bigger than a horse: even the best dual bound $d^{\ast}$ never exceeds $p^{\ast}$.

### Strong duality

$d^{\ast} = p^{\ast}$

If this equality holds, that is, the optimal duality gap is zero, we say that strong duality holds. Strong duality does not hold in general, but it usually holds for convex problems.
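A toy illustration of a zero duality gap, with the problem and all constants being my own example rather than one from the text:

```python
import numpy as np

# Zero duality gap on the convex toy problem:
#   minimize x**2  subject to  x >= 1.
# Primal optimum p* = 1 (at x = 1).  The dual function is
# g(lam) = lam - lam**2/4, maximized at lam = 2 with d* = 1.

lams = np.linspace(0.0, 10.0, 100001)    # grid over lam >= 0
d_star = np.max(lams - lams**2 / 4)
p_star = 1.0
print(abs(d_star - p_star) < 1e-6)       # True: d* = p*
```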

### Slater’s condition

Slater's condition: if the problem is convex and there exists a strictly feasible point $x$ (i.e., $f_i(x) < 0$ for all $i$ and $h_i(x) = 0$), then strong duality holds. In that case

$\underset{\lambda \succeq 0}{\sup}\, \underset{x}{\inf}\, L(x, \lambda) = \underset{x}{\inf}\, \underset{\lambda \succeq 0}{\sup}\, L(x, \lambda)$

$(x, \lambda, v)$ being a saddle point of the Lagrange function $L$ is a necessary and sufficient condition for $(x, \lambda, v)$ to be the primal and dual optimal points; in other words, $p^{\ast} - d^{\ast} = 0$.

### Complementary slackness

$\begin{array}{rl}{f}_{0}\left({x}^{\ast }\right)& =g\left({\lambda }^{\ast },{v}^{\ast }\right)\\ & =\underset{x}{inf}\left\{{f}_{0}\left(x\right)+\sum _{i=1}^{m}{\lambda }_{i}^{\ast }{f}_{i}\left(x\right)+\sum _{i=1}^{p}{v}_{i}^{\ast }{h}_{i}\left(x\right)\right\}\\ & \le {f}_{0}\left({x}^{\ast }\right)+\sum _{i=1}^{m}{\lambda }_{i}^{\ast }{f}_{i}\left({x}^{\ast }\right)+\sum _{i=1}^{p}{v}_{i}^{\ast }{h}_{i}\left({x}^{\ast }\right)\\ & \le {f}_{0}\left({x}^{\ast }\right)\end{array}$

An important conclusion is that

$\sum _{i=1}^{m}{\lambda }_{i}^{\ast }{f}_{i}\left({x}^{\ast }\right)=0$

We can express the complementary slackness condition as

$\left\{\begin{array}{rl}{\lambda }_{i}^{\ast }>0& ⟹{f}_{i}\left({x}^{\ast }\right)=0\\ {f}_{i}\left({x}^{\ast }\right)<0& ⟹{\lambda }_{i}^{\ast }=0\end{array}$

### KKT optimality conditions

$\begin{array}{rl} f_i(x^{\ast}) \le 0, & i = 1, \dots, m \\ h_i(x^{\ast}) = 0, & i = 1, \dots, p \\ \lambda_i^{\ast} \ge 0, & i = 1, \dots, m \\ \lambda_i^{\ast} f_i(x^{\ast}) = 0, & i = 1, \dots, m \\ \mathrm{\nabla} f_0(x^{\ast}) + \sum_{i=1}^{m} \lambda_i^{\ast} \mathrm{\nabla} f_i(x^{\ast}) + \sum_{i=1}^{p} v_i^{\ast} \mathrm{\nabla} h_i(x^{\ast}) = 0 & \end{array}$

These are called the Karush-Kuhn-Tucker (KKT) conditions.

For any optimization problem with differentiable objective and constraint functions for which strong duality holds, any pair of primal and dual optimal points must satisfy the KKT conditions (necessary condition).

Necessary condition: when $(x, \lambda, v)$ are the primal and dual optimal points of a problem with differentiable objective and constraint functions for which strong duality holds, then $(x, \lambda, v)$ must satisfy the KKT conditions.

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal.

Sufficient condition: when $(x, \lambda, v)$ satisfies the KKT conditions and the primal problem is convex, then $(x, \lambda, v)$ are the primal and dual optimal points.
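The conditions can be checked mechanically at a candidate point. The convex toy problem and the hand-computed candidate below are my own example:

```python
# Checking the KKT conditions at a hand-computed candidate point for
# the convex problem: minimize x**2 subject to 1 - x <= 0.
# Since the problem is convex, satisfying KKT is sufficient here.

x_star, lam_star = 1.0, 2.0
f1 = 1.0 - x_star                     # inequality constraint value

assert f1 <= 0                        # primal feasibility
assert lam_star >= 0                  # dual feasibility
assert abs(lam_star * f1) < 1e-12     # complementary slackness
# stationarity: d/dx [x**2 + lam*(1 - x)] = 2*x - lam = 0
assert abs(2 * x_star - lam_star) < 1e-12
print("KKT conditions hold at (x*, lam*) = (1, 2)")
```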

### Perturbation and sensitivity analysis

Perturbed version:

$\begin{array}{rl}\text{minimize}\quad & f_0(x) \\ \text{subject to}\quad & f_i(x) \le u_i, \quad i = 1, \dots, m \\ & h_i(x) = w_i, \quad i = 1, \dots, p\end{array}$

We define $p^{\star}(u, w)$ as the optimal value of the perturbed problem.

When the original problem is convex, the function $p^{\star}(u, w)$ is a convex function of $u$ and $w$.

Now we assume that strong duality holds and that the dual optimum is attained. This is the case if the original problem is convex and Slater's condition is satisfied. Let $(\lambda^{\star}, v^{\star})$ be optimal for the dual of the unperturbed problem. Then for all $u$ and $w$ we have

${p}^{\ast }\left(u,w\right)\ge {p}^{\ast }\left(0,0\right)-{{\lambda }^{\ast }}^{T}u-{{v}^{\ast }}^{T}w$
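This global lower bound can be verified on a scalar problem where the perturbed optimal value is known in closed form. The example (a single inequality constraint, no equality constraints) is my own:

```python
import numpy as np

# Sensitivity sketch on:  minimize x**2  subject to  1 - x <= u.
# In closed form p*(u) = max(1 - u, 0)**2, and the unperturbed dual
# optimum is lam* = 2, so the bound reads p*(u) >= p*(0) - 2*u.

def p_star(u):
    return max(1.0 - u, 0.0) ** 2

lam_star = 2.0
us = np.linspace(-2.0, 3.0, 501)
ok = all(p_star(u) >= p_star(0.0) - lam_star * u - 1e-12 for u in us)
print(ok)  # True: the bound holds for every perturbation tested
```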

## Algorithm

### Steepest descent method

$\underset{x}{min}f\left(x\right)$

First, we make an equivalent substitution:

$\underset{v}{min}f\left({x}^{\left(k\right)}+v\right)$

Taylor expansion:

$f\left({x}^{\left(k\right)}+v\right)\approx f\left({x}^{\left(k\right)}\right)+\mathrm{\nabla }{f}^{T}\left({x}^{\left(k\right)}\right)\cdot v$

Then we get the (normalized) steepest descent direction:

$\Delta x_{\mathrm{nsd}} = \underset{v}{\mathrm{argmin}} \left\{ \mathrm{\nabla} f(x^{(k)})^{T} v : \| v \| \le 1 \right\}$

In terms of the dual norm, the unnormalized steepest descent step scales this direction by $\| \mathrm{\nabla} f(x^{(k)}) \|_{\ast}$:

$\Delta x_{\mathrm{sd}} = \| \mathrm{\nabla} f(x^{(k)}) \|_{\ast} \, \Delta x_{\mathrm{nsd}}$

For the iteration itself we only need the direction; the step length is chosen separately, for example by a line search.
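A minimal sketch of the resulting iteration, using the Euclidean norm (so the direction is the negative gradient) and a fixed step size instead of a line search; the objective, step size, and iteration count are my own choices:

```python
import numpy as np

# Gradient (steepest) descent sketch with a fixed step size on
# f(x) = (x0 - 1)**2 + 2*(x1 + 2)**2, whose minimizer is (1, -2).

def grad(x):
    """Gradient of f at x."""
    return np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])

x = np.zeros(2)          # starting point x^(0)
t = 0.2                  # fixed step length (line search omitted)
for _ in range(100):
    x = x - t * grad(x)  # move along the negative gradient

print(np.allclose(x, [1.0, -2.0], atol=1e-6))  # True
```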

### Newton's method

The second-order Taylor approximation $\hat{f}$ of the real $f$ at the point $x^{(k)}$ is

$f(x) \approx \hat{f}(x) = f(x^{(k)}) + \mathrm{\nabla} f(x^{(k)})^{T} (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^{T} \mathrm{\nabla}^{2} f(x^{(k)}) (x - x^{(k)}), \qquad \mathrm{\nabla}^{2} f(x) \in \mathcal{S}_{++}^{n}, \quad x \in \mathcal{D}$

$\hat{f}(x)$ is a convex quadratic function of $x$.

The necessary and sufficient optimality condition is:

$\mathrm{\nabla }\stackrel{^}{f}\left(x\right)=\frac{\mathrm{\partial }\stackrel{^}{f}\left(x\right)}{\mathrm{\partial }x}=0\phantom{\rule{0ex}{0ex}}⇓\phantom{\rule{0ex}{0ex}}\mathrm{\nabla }f\left({x}^{\left(k\right)}\right)+{\mathrm{\nabla }}^{2}f\left({x}^{\left(k\right)}\right)\left(x-{x}^{\left(k\right)}\right)=0$

The Hessian matrix is invertible because it is positive definite.

$x={x}^{\left(k\right)}-{\left[{\mathrm{\nabla }}^{2}f\left({x}^{\left(k\right)}\right)\right]}^{-1}\mathrm{\nabla }f\left({x}^{\left(k\right)}\right)\phantom{\rule{0ex}{0ex}}{x}_{new}={x}^{\left(k+1\right)}=x$

The vector

$\mathrm{\Delta }{x}_{nt}=-{\left[{\mathrm{\nabla }}^{2}f\left({x}^{\left(k\right)}\right)\right]}^{-1}\mathrm{\nabla }f\left({x}^{\left(k\right)}\right)$

is called the Newton step. We can compute the minimizer of $\hat{f}$ exactly, but $\hat{f}$ is only an approximation of the real $f$.

If the real $f(x)$ is quadratic, then $x^{(new)}$ is the exact minimizer of $f$. If the function $f$ is nearly quadratic, intuition suggests that $x^{(new)}$ should be a very good estimate of the minimizer of $f$.
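A minimal sketch of the Newton iteration on a non-quadratic smooth convex function; the function and the starting point are my own choices:

```python
import math

# Newton's method sketch, x_new = x - f''(x)**-1 * f'(x), on the
# strictly convex function f(x) = exp(x) + exp(-x), minimized at x = 0.

def grad(x):
    return math.exp(x) - math.exp(-x)   # f'(x)

def hess(x):
    return math.exp(x) + math.exp(-x)   # f''(x) > 0, hence invertible

x = 2.0
for _ in range(10):
    x = x - grad(x) / hess(x)           # Newton step

print(abs(x) < 1e-10)  # True: rapid convergence to x* = 0
```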

Since the real $f$ is twice differentiable, the quadratic model $\hat{f}$ will be very accurate when $x$ is near $x^{\star}$. It follows that when $x$ is near $x^{\star}$, the point $x^{(new)}$ should be a very good estimate of $x^{\star}$. Also, $\hat{f}'$ is the linear approximation of the real $f'$ at the point $x$. Even so, the direction will approximately point toward the optimum of the real $f$.

## Appendices

The minimization over $x$:

$g(\lambda) = \underset{x \in \mathcal{D}}{\inf} L(x, \lambda)$

The maximization over $\lambda \ge 0$:

$h\left(x\right)=\underset{\lambda \ge 0}{sup}L\left(x,\lambda \right)$

### Matrix split

### Dual norm

The dual norm of a norm $\| \cdot \|$ is defined as

$\| z \|_{\ast} = \underset{\| x \| \le 1}{\sup} \, z^{T} x$
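For the $\ell_1$ norm, the supremum in the dual-norm definition $\|z\|_{\ast} = \sup\{ z^{T} x : \|x\| \le 1 \}$ is attained at a signed standard basis vector, giving the $\ell_\infty$ norm. A small numerical sketch (the example vector is my own choice):

```python
import numpy as np

# Dual norm of the l1 norm: ||z||_* = sup{ z^T x : ||x||_1 <= 1 }
# equals max_i |z_i|, attained by putting all weight on the
# largest-magnitude coordinate with matching sign.

z = np.array([3.0, -1.0, 2.0])

i = int(np.argmax(np.abs(z)))     # coordinate with largest |z_i|
x = np.zeros_like(z)
x[i] = np.sign(z[i])              # signed basis vector, ||x||_1 = 1

print(z @ x == np.max(np.abs(z)))  # True: dual of ||.||_1 is ||.||_inf
```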