Convex optimization brief overview

Introduction

Mathematical optimization problem

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le b_i, \quad i = 1, \ldots, m.
\end{aligned}
$$

Quadratic programming

$$
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}x^T P x + q^T x + r, \qquad P \in S_+^n \\
\text{subject to} \quad & Ax \preceq b
\end{aligned}
$$
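
As a small numerical sketch (not part of the original notes), this QP form maps directly onto the cvxpy modeling library, assuming it is installed; the data P, q, A, b below are made up for illustration, and the constant r is dropped since it does not affect the minimizer.

```python
import numpy as np
import cvxpy as cp

# Made-up data for: minimize (1/2) x^T P x + q^T x  subject to  A x <= b
np.random.seed(0)
n, m = 3, 5
M = np.random.randn(n, n)
P = M.T @ M + np.eye(n)        # P is positive definite, hence in S_+^n
q = np.random.randn(n)
A = np.random.randn(m, n)
b = np.random.randn(m) + 1.0

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q.T @ x),
                  [A @ x <= b])
prob.solve()
print("optimal value:", prob.value)
print("optimal x:", x.value)
```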

Standard and inequality form semidefinite programs

$$
\begin{aligned}
\text{minimize} \quad & \mathbf{tr}(CX) \\
\text{subject to} \quad & \mathbf{tr}(A_i X) = b_i, \quad i = 1, \ldots, p \\
& X \succeq 0
\end{aligned}
$$

where $X \in S_+^n$, $C \in \mathbb{R}^{n \times n}$, $A_i \in \mathbb{R}^{n \times n}$, $b_i \in \mathbb{R}$.
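
A hedged sketch of the standard-form SDP in cvxpy (my own illustration, assuming cvxpy and its bundled SDP solver are installed); the data C, A_i, b_i below are constructed so that the problem is feasible and bounded.

```python
import numpy as np
import cvxpy as cp

# Made-up data for: minimize tr(C X)  s.t.  tr(A_i X) = b_i,  X >= 0 (PSD)
np.random.seed(1)
n, p = 3, 2
M = np.random.randn(n, n)
C = M.T @ M                                   # C PSD, so the objective is bounded below over PSD X
A_list = []
for _ in range(p):
    B = np.random.randn(n, n)
    A_list.append((B + B.T) / 2)              # symmetric A_i
X0 = np.eye(n)                                # a known feasible point
b = [float(np.trace(A_list[i] @ X0)) for i in range(p)]

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0]                        # X positive semidefinite
constraints += [cp.trace(A_list[i] @ X) == b[i] for i in range(p)]
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print("optimal value:", prob.value)
```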

Inequality form semidefinite programs

$$
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & x_1 F_1 + \cdots + x_n F_n + G \preceq 0 \\
& Ax = b
\end{aligned}
$$

Convex function

$$
\forall\, \theta \in (0,1): \quad f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y)
$$
  1. The sum of two convex functions is also a convex function.

  2. The difference of two convex functions may not be a convex function.

Suppose $f(x)$ and $g(x)$ are convex and let $h(x) = f(x) - g(x)$. Then

$$
h(\theta x + (1-\theta)y) = f(\theta x + (1-\theta)y) - g(\theta x + (1-\theta)y).
$$

One inequality cannot be subtracted from another, so the convexity inequality for $h$ does not follow from those for $f$ and $g$. For example:

$$
f(x) = x^2, \qquad g(x) = x^3, \qquad x \in [0,1]
$$

$$
h(x) = f(x) - g(x) = x^2 - x^3, \qquad x \in [0,1]
$$

$$
h'(x) = 2x - 3x^2 = 0 \;\Longrightarrow\; x = 0, \ \tfrac{2}{3}
$$

The critical point $x = 0$ is a local minimum and $x = \tfrac{2}{3}$ is a local maximum; since $h''(x) = 2 - 6x < 0$ for $x > \tfrac{1}{3}$, $h$ is not a convex function on $[0,1]$.
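
A quick numerical check of the failed convexity inequality (my own snippet, not from the notes): taking $x = 0$, $y = 1$, $\theta = 0.5$, the midpoint value of $h$ exceeds the chord value.

```python
# h(x) = x^2 - x^3 on [0, 1]; convexity would require
# h(t*x + (1-t)*y) <= t*h(x) + (1-t)*h(y) for all x, y in [0, 1] and t in (0, 1).
def h(x):
    return x**2 - x**3

x, y, t = 0.0, 1.0, 0.5
lhs = h(t * x + (1 - t) * y)     # h(0.5) = 0.125
rhs = t * h(x) + (1 - t) * h(y)  # 0.5*h(0) + 0.5*h(1) = 0.0
print(lhs, rhs, lhs <= rhs)      # 0.125 0.0 False  -> h is not convex
```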

Duality

Lagrange function

We consider an optimization problem in the standard form:

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p
\end{aligned}
$$

Lagrange function

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)
$$

Lagrange dual function

$$
g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \left( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \right)
$$

Lagrange dual problem

$$
\begin{aligned}
\text{maximize} \quad & g(\lambda, \nu) \\
\text{subject to} \quad & \lambda \succeq 0
\end{aligned}
$$
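
As a standard worked illustration (added here, not in the original notes), consider the equality-constrained least-norm problem

$$
\begin{aligned}
\text{minimize} \quad & x^T x \\
\text{subject to} \quad & Ax = b.
\end{aligned}
$$

The Lagrangian is $L(x, \nu) = x^T x + \nu^T (Ax - b)$; setting $\nabla_x L = 2x + A^T \nu = 0$ gives $x = -\tfrac{1}{2} A^T \nu$, so the dual function is

$$
g(\nu) = -\tfrac{1}{4} \nu^T A A^T \nu - b^T \nu,
$$

a concave function of $\nu$, and the dual problem is to maximize it (there is no $\lambda \succeq 0$ constraint because there are no inequality constraints).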

Convex conjugate

Let $X$ be a real topological vector space and let $X^*$ be the dual space to $X$. Denote by

$$
\langle \cdot, \cdot \rangle : X^* \times X \to \mathbb{R}
$$

the canonical dual pairing, which is defined by

$$
(x^*, x) \mapsto x^*(x).
$$

For a function

$$
f : X \to \mathbb{R} \cup \{-\infty, +\infty\}
$$

taking values on the extended real number line, its convex conjugate is the function

$$
f^* : X^* \to \mathbb{R} \cup \{-\infty, +\infty\}
$$

whose value at

$$
x^* \in X^*
$$

is defined to be the supremum:

$$
f^*(x^*) := \sup\{\langle x^*, x \rangle - f(x) \;:\; x \in X\},
$$

or, equivalently, in terms of the infimum:

$$
f^*(x^*) := -\inf\{f(x) - \langle x^*, x \rangle \;:\; x \in X\}.
$$

This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.
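
For example (a standard result, added for illustration): take $f(x) = \tfrac{1}{2}\|x\|_2^2$ on $X = \mathbb{R}^n$, identifying $(\mathbb{R}^n)^*$ with $\mathbb{R}^n$. Then

$$
f^*(y) = \sup_{x} \left( y^T x - \tfrac{1}{2}\|x\|_2^2 \right) = \tfrac{1}{2}\|y\|_2^2,
$$

since the supremum is attained at $x = y$; this $f$ equals its own conjugate.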

Weak duality

The optimal value of the Lagrange dual problem, which we denote $d^\star$, is, by definition, the best lower bound on $p^\star$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality

$$
d^\star \le p^\star,
$$

which holds even if the original problem is not convex. This property is called weak duality.

A starved camel is bigger than a horse.

Strong duality

$$
d^\star = p^\star
$$

Slater’s condition

For a convex problem

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& Ax = b \quad (h_i(x) = 0, \ i = 1, \ldots, p),
\end{aligned}
$$

Slater's condition requires a strictly feasible point:

$$
\exists\, x \in \mathbf{relint}\, D : \quad f_i(x) < 0, \ i = 1, \ldots, m, \qquad Ax = b.
$$

If the problem is convex and Slater's condition holds, then strong duality holds, i.e. $d^\star = p^\star$.

Saddle-point interpretation

$$
\sup_{\lambda \succeq 0} \inf_{x} L(x, \lambda) = \inf_{x} \sup_{\lambda \succeq 0} L(x, \lambda)
$$

The point $(x^\star, \lambda^\star, \nu^\star)$ is a saddle point of the Lagrangian $L$ if and only if $x^\star$ and $(\lambda^\star, \nu^\star)$ are primal and dual optimal points with zero duality gap, in other words $p^\star - d^\star = 0$.

Complementary slackness

Assume strong duality holds, $x^\star$ is primal optimal, and $(\lambda^\star, \nu^\star)$ is dual optimal. Then

$$
\begin{aligned}
f_0(x^\star) = g(\lambda^\star, \nu^\star) &= \inf_{x} \left( f_0(x) + \sum_{i=1}^{m} \lambda_i^\star f_i(x) + \sum_{i=1}^{p} \nu_i^\star h_i(x) \right) \\
&\le f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star h_i(x^\star) \\
&\le f_0(x^\star).
\end{aligned}
$$

The important conclusion is that

$$
\sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) = 0,
$$

and since each term $\lambda_i^\star f_i(x^\star)$ is nonpositive, every term must be zero.

We can express the complementary slackness condition as

$$
\lambda_i^\star > 0 \;\Longrightarrow\; f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \;\Longrightarrow\; \lambda_i^\star = 0
$$

KKT optimality conditions

$$
\begin{aligned}
f_i(x^\star) &\le 0, \quad i = 1, \ldots, m \\
h_i(x^\star) &= 0, \quad i = 1, \ldots, p \\
\lambda_i^\star &\ge 0, \quad i = 1, \ldots, m \\
\lambda_i^\star f_i(x^\star) &= 0, \quad i = 1, \ldots, m \\
\nabla f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star \nabla h_i(x^\star) &= 0,
\end{aligned}
$$

which are called the Karush-Kuhn-Tucker (KKT) conditions.

For any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions (necessary condition).

Necessary condition: when $(x^\star, \lambda^\star, \nu^\star)$ are primal and dual optimal points of a problem with differentiable objective and constraint functions for which strong duality obtains, then $(x^\star, \lambda^\star, \nu^\star)$ must satisfy the KKT conditions.

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal.

Sufficient condition: when $(x, \lambda, \nu)$ satisfies the KKT conditions and the primal problem is convex, then $x$ and $(\lambda, \nu)$ are primal and dual optimal points.
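
A minimal worked example (my own, not from the original notes): minimize $x^2$ subject to $x \ge 1$, written with $f_1(x) = 1 - x \le 0$. The KKT conditions are

$$
1 - x^\star \le 0, \qquad \lambda^\star \ge 0, \qquad \lambda^\star (1 - x^\star) = 0, \qquad 2x^\star - \lambda^\star = 0.
$$

If $\lambda^\star = 0$ then $x^\star = 0$, which is infeasible, so complementary slackness forces $1 - x^\star = 0$; hence $x^\star = 1$, $\lambda^\star = 2 \ge 0$, and since the problem is convex these points are primal and dual optimal with $p^\star = d^\star = 1$.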

Perturbation and sensitivity analysis

Perturbed version

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le u_i, \quad i = 1, \ldots, m \\
& h_i(x) = w_i, \quad i = 1, \ldots, p
\end{aligned}
$$

We define $p^\star(u, w)$ as the optimal value of the perturbed problem:

$$
p^\star(u, w) = \inf_{x} \left\{ f_0(x) \;:\; x \in D, \ f_i(x) \le u_i, \ i = 1, \ldots, m, \ h_i(x) = w_i, \ i = 1, \ldots, p \right\}, \qquad p^\star(0, 0) = p^\star
$$

When the original problem is convex, the function $p^\star(u, w)$ is a convex function of $u$ and $w$.


Now we assume that strong duality holds and that the dual optimum is attained. This is the case if the original problem is convex and Slater's condition is satisfied. Let $(\lambda^\star, \nu^\star)$ be optimal for the dual of the unperturbed problem. Then for all $u$ and $w$ we have

$$
p^\star(u, w) \ge p^\star(0, 0) - \lambda^{\star T} u - \nu^{\star T} w
$$

Algorithm

Steepest descent method

$$
\min_{x} f(x)
$$

First, we make an equivalent substitution around the current iterate:

$$
\min_{v} f(x^{(k)} + v)
$$

Taylor expansion:

$$
f(x^{(k)} + v) \approx f(x^{(k)}) + \nabla f(x^{(k)})^T v
$$

Then we get the direction:

$$
d^{(k+1)} = \arg\min_{v} \left\{ f(x^{(k)}) + \nabla f(x^{(k)})^T v \;:\; \|v\| = 1 \right\}
$$

Dual norm

$$
\|z\|_* = \sup\{ z^T x \;:\; \|x\| \le 1 \}
$$

We just need the direction, that's all.
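
A minimal sketch of the idea in code (my own, assuming the Euclidean norm, under which the steepest-descent direction is the normalized negative gradient); with a fixed step length the iterates only reach a neighbourhood of the minimizer, and in practice a line search would choose the step.

```python
import numpy as np

def steepest_descent(grad, x0, step=0.05, tol=1e-6, max_iter=1000):
    """Steepest descent under the Euclidean norm: d = -grad f(x) / ||grad f(x)||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        norm_g = np.linalg.norm(g)
        if norm_g < tol:
            break
        d = -g / norm_g          # unit-norm descent direction (only the direction matters)
        x = x + step * d
    return x

# Example objective: f(x) = (x1 - 1)^2 + 2*(x2 + 2)^2, with minimizer (1, -2)
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
print(steepest_descent(grad_f, x0=[5.0, 5.0]))   # lands close to (1, -2)
```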

Newton's method

The second-order Taylor approximation $\hat{f}$ of the real $f$ at the point $x^{(k)}$ is

$$
f(x) \approx \hat{f}(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \tfrac{1}{2}(x - x^{(k)})^T \nabla^2 f(x^{(k)}) (x - x^{(k)}),
$$

$$
\nabla^2 f(x) \in S_{++}^n, \quad x \in D
$$

The function $\hat{f}(x)$ is a convex quadratic function of $x$.

The necessary and sufficient optimality condition is:

$$
\nabla \hat{f}(x) = \frac{\partial \hat{f}(x)}{\partial x} = 0 \;\Longrightarrow\; \nabla f(x^{(k)}) + \nabla^2 f(x^{(k)}) (x - x^{(k)}) = 0
$$

The Hessian matrix is invertible because it is positive definite.

$$
x = x^{(k)} - \left[\nabla^2 f(x^{(k)})\right]^{-1} \nabla f(x^{(k)}), \qquad x_{\text{new}} = x^{(k+1)} = x
$$

The vector

$$
\Delta x_{\text{nt}} = -\left[\nabla^2 f(x^{(k)})\right]^{-1} \nabla f(x^{(k)})
$$

is called the Newton step.

Interpretation

We can find the exact minimizer of $\hat{f}$, but $\hat{f}$ is only an approximation of the real $f$.

If the real $f(x)$ is quadratic, then $x_{\text{new}}$ is the exact minimizer of $f$. If $f$ is nearly quadratic, intuition suggests that $x_{\text{new}}$ should be a very good estimate of the minimizer of $f$.

Since the real $f$ is twice differentiable, the quadratic model $\hat{f}$ will be very accurate when $x^{(k)}$ is near $x^\star$. It follows that when $x^{(k)}$ is near $x^\star$, the point $x_{\text{new}}$ should be a very good estimate of $x^\star$.

Interpretation

The Newton step can also be obtained by linearizing the optimality condition $\nabla f(x) = 0$ around $x^{(k)}$: $\nabla f(x^{(k)} + v) \approx \nabla f(x^{(k)}) + \nabla^2 f(x^{(k)}) v = 0$, whose solution is $v = \Delta x_{\text{nt}}$. Even so, this direction approximately points towards the true optimum of $f$.
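
A minimal sketch of the pure Newton iteration in code (my own illustration; a practical implementation would add a backtracking line search and a stopping test based on the Newton decrement).

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: x <- x + dx_nt with dx_nt = -[hess f(x)]^{-1} grad f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx_nt = -np.linalg.solve(hess(x), g)   # solve H dx = -g rather than inverting H
        x = x + dx_nt
    return x

# Example: f(x) = exp(x1 + x2 - 1) + x1^2 + x2^2, whose Hessian is positive definite everywhere.
def grad_f(x):
    e = np.exp(x[0] + x[1] - 1)
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def hess_f(x):
    e = np.exp(x[0] + x[1] - 1)
    return np.array([[e + 2.0, e], [e, e + 2.0]])

print(newton_method(grad_f, hess_f, x0=[2.0, -1.0]))
```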


Appendices

The minimization over x:

$$
g(\lambda) = \inf_{x \in D} L(x, \lambda)
$$

The maximization over $\lambda \succeq 0$:

$$
h(x) = \sup_{\lambda \succeq 0} L(x, \lambda)
$$

Matrix split

$$
x = x^+ - x^-, \qquad x^+ \succeq 0, \quad x^- \succeq 0
$$
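
As a standard illustration of why this split is useful (added here, not from the original notes), a linear program with a free variable $x$,

$$
\text{minimize } c^T x \quad \text{subject to } Ax \le b,
$$

is equivalent to a problem with only nonnegative variables:

$$
\text{minimize } c^T (x^+ - x^-) \quad \text{subject to } A(x^+ - x^-) \le b, \quad x^+ \succeq 0, \quad x^- \succeq 0.
$$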

Dual norm

$$
\|z\|_* = \sup\{ z^T x \;:\; \|x\| \le 1 \}
$$