Convex optimization: a brief overview


Mathematical optimization problem

$$
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le b_i, \quad i = 1, \dots, m.
\end{array}
$$

Quadratic programming

$$
\begin{array}{ll}
\text{minimize}   & \tfrac{1}{2} x^T P x + q^T x + r, \quad P \in S_+^n \\
\text{subject to} & A x \preceq b
\end{array}
$$
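As a quick numerical sanity check, a small instance of this QP can be solved with a general-purpose solver. The data below ($P$, $q$, $A$, $b$) are made-up values for illustration, not from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative (made-up) problem data; P is positive definite.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([1.0, -1.0])
r = 0.0
A = np.array([[1.0, 1.0]])   # single inequality: x1 + x2 <= 0.4
b = np.array([0.4])

def objective(x):
    # 1/2 x^T P x + q^T x + r
    return 0.5 * x @ P @ x + q @ x + r

# SLSQP expects inequality constraints written as g(x) >= 0.
constraints = [{"type": "ineq", "fun": lambda x: b - A @ x}]

res = minimize(objective, x0=np.zeros(2), method="SLSQP", constraints=constraints)
print(res.x)   # solving the KKT system by hand gives the exact answer (-0.9, 1.3)
```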

Standard and inequality form semidefinite programs

$$
\begin{array}{ll}
\text{minimize}   & \operatorname{tr}(C X) \\
\text{subject to} & \operatorname{tr}(A_i X) = b_i, \quad i = 1, \dots, p \\
                  & X \succeq 0
\end{array}
$$

with $X \in S_+^n$, $C \in \mathbb{R}^{n \times n}$, $A_i \in \mathbb{R}^{n \times n}$, $b_i \in \mathbb{R}$.

Inequality form semidefinite program

$$
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & x_1 F_1 + \cdots + x_n F_n + G \preceq 0 \\
                  & A x = b
\end{array}
$$

Convex function

$$
\forall \theta \in (0, 1): \quad f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)
$$
  1. The sum of two convex functions is convex.

  2. The difference of two convex functions need not be convex.

Suppose $f$ and $g$ are convex and let $h(x) = f(x) - g(x)$. Then

$$
h(\theta x + (1 - \theta) y) = f(\theta x + (1 - \theta) y) - g(\theta x + (1 - \theta) y).
$$

One inequality cannot be subtracted from another, so the convexity inequalities for $f$ and $g$ do not combine into one for $h$. For example:

$$
f(x) = x^2, \quad g(x) = x^3, \quad x \in [0, 1]
$$

$$
h(x) = f(x) - g(x) = x^2 - x^3, \qquad h'(x) = 2x - 3x^2 = 0 \;\Rightarrow\; x = 0, \; \tfrac{2}{3}
$$

On $[0, 1]$ the function $h$ attains its minimum value $0$ at the two points $x = 0$ and $x = 1$, with a local maximum in between at $x = 2/3$. A convex function cannot have a disconnected set of minimizers, so $h$ is not convex.
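The failure of convexity is easy to confirm numerically: for a convex function the value at a midpoint never exceeds the average of the endpoint values, and $h(x) = x^2 - x^3$ violates this on $[0, 1]$:

```python
def h(x):
    # Difference of two convex functions on [0, 1].
    return x**2 - x**3

a, b, theta = 0.0, 1.0, 0.5
lhs = h(theta * a + (1 - theta) * b)       # h(0.5) = 0.125
rhs = theta * h(a) + (1 - theta) * h(b)    # 0.5 * (0 + 0) = 0
print(lhs > rhs)   # True: the graph lies above the chord, so h is not convex
```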


Lagrange function

We consider an optimization problem in the standard form:

$$
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m \\
                  & h_i(x) = 0, \quad i = 1, \dots, p
\end{array}
$$

Lagrange function

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x), \qquad \lambda \in \mathbb{R}^m, \; \nu \in \mathbb{R}^p
$$

Lagrange dual function

$$
g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big)
$$

Lagrange dual problem

$$
\begin{array}{ll}
\text{maximize}   & g(\lambda, \nu) \\
\text{subject to} & \lambda \succeq 0
\end{array}
$$

Convex conjugate

Let $X$ be a real topological vector space and let $X^*$ be the dual space to $X$. Denote by

$$
\langle \cdot, \cdot \rangle : X^* \times X \to \mathbb{R}
$$

the canonical dual pairing, which is defined by

$$
\langle x^*, x \rangle \mapsto x^*(x).
$$
For a function

$$
f : X \to \mathbb{R} \cup \{-\infty, +\infty\}
$$

taking values on the extended real number line, its convex conjugate is the function

$$
f^* : X^* \to \mathbb{R} \cup \{-\infty, +\infty\}
$$

whose value at

$$
y \in X^*
$$

is defined to be the supremum:

$$
f^*(y) := \sup \{ \langle y, x \rangle - f(x) \;:\; x \in X \},
$$

or, equivalently, in terms of the infimum:

$$
f^*(y) := -\inf \{ f(x) - \langle y, x \rangle \;:\; x \in X \}.
$$

This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.
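A minimal numerical sketch: for $f(x) = x^2$ the conjugate has the closed form $f^*(y) = y^2/4$, which we can compare against a brute-force supremum over a grid (the grid and test points below are arbitrary choices):

```python
import numpy as np

def conjugate(f, y, xs):
    # Numerical approximation of f*(y) = sup_x { y*x - f(x) } over a grid xs.
    return np.max(y * xs - f(xs))

f = lambda x: x**2
xs = np.linspace(-10.0, 10.0, 200001)

for y in [-3.0, 0.0, 2.0, 5.0]:
    approx = conjugate(f, y, xs)
    exact = y**2 / 4.0       # closed-form conjugate of x^2
    print(y, approx, exact)
```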

Weak duality

The optimal value of the Lagrange dual problem, which we denote $d^\star$, is, by definition, the best lower bound on $p^\star$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality

$$
d^\star \le p^\star,
$$
which holds even if the original problem is not convex. This property is called weak duality.

A starved camel is bigger than a horse.
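Weak duality can be observed on a small linear program (made-up data). For LPs, strong duality actually holds, so the bound is tight here:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative primal LP: minimize c^T x  s.t.  A x >= b, x >= 0.
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 6.0])

# linprog uses A_ub x <= b_ub, so flip signs to encode A x >= b.
primal = linprog(c, A_ub=-A, b_ub=-b)

# Dual LP: maximize b^T y  s.t.  A^T y <= c, y >= 0 (minimize -b^T y).
dual = linprog(-b, A_ub=A.T, b_ub=c)

p_star = primal.fun
d_star = -dual.fun
print(d_star <= p_star + 1e-9)   # weak duality: d* <= p*
```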

Strong duality

If the equality

$$
d^\star = p^\star
$$

holds, the optimal duality gap is zero and we say that strong duality holds. Strong duality does not hold in general, but it usually holds for convex problems, e.g. under Slater's condition below.

Slater’s condition

For a convex problem

$$
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m \\
                  & A x = b \quad (h_i(x) = 0, \; i = 1, \dots, p)
\end{array}
$$

Slater's condition requires a strictly feasible point:

$$
\exists \, x \in \operatorname{relint} \mathcal{D}: \quad f_i(x) < 0, \; i = 1, \dots, m, \quad A x = b.
$$

Saddle-point interpretation


The point $(x^\star, \lambda^\star, \nu^\star)$ being a saddle point of the Lagrange function $L$ is a necessary and sufficient condition for $(x^\star, \lambda^\star, \nu^\star)$ to be primal and dual optimal with zero duality gap; in other words, $p^\star - d^\star = 0$.

Complementary slackness

Suppose strong duality holds, $x^\star$ is primal optimal, and $(\lambda^\star, \nu^\star)$ is dual optimal. Then

$$
f_0(x^\star) = g(\lambda^\star, \nu^\star)
= \inf_x \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i^\star f_i(x) + \sum_{i=1}^{p} \nu_i^\star h_i(x) \Big)
\le f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star h_i(x^\star)
\le f_0(x^\star).
$$

An important conclusion is that

$$
\sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) = 0.
$$

We can express the complementary slackness condition as

$$
\lambda_i^\star f_i(x^\star) = 0, \quad i = 1, \dots, m.
$$

KKT optimality conditions

$$
\begin{array}{ll}
f_i(x^\star) \le 0, & i = 1, \dots, m \\
h_i(x^\star) = 0, & i = 1, \dots, p \\
\lambda_i^\star \ge 0, & i = 1, \dots, m \\
\lambda_i^\star f_i(x^\star) = 0, & i = 1, \dots, m \\
\nabla f_0(x^\star) + \displaystyle\sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star \nabla h_i(x^\star) = 0,
\end{array}
$$

which are called the Karush-Kuhn-Tucker (KKT) conditions.

For any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions (a necessary condition).

Necessary condition: when $(x^\star, \lambda^\star, \nu^\star)$ are primal and dual optimal points of a problem with differentiable objective and constraint functions for which strong duality obtains, then $(x^\star, \lambda^\star, \nu^\star)$ must satisfy the KKT conditions.

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal.

Sufficient condition: when $(x^\star, \lambda^\star, \nu^\star)$ satisfies the KKT conditions and the primal problem is convex, then $(x^\star, \lambda^\star, \nu^\star)$ are primal and dual optimal points.
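A minimal sketch of checking all four KKT conditions by hand, on a toy problem chosen here for illustration: minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, whose solution is $x^\star = 1$ with multiplier $\lambda^\star = 2$:

```python
# Toy problem: minimize x^2 subject to 1 - x <= 0.
# Stationarity 2*x - lam = 0 at x* = 1 gives lam* = 2.
x_star, lam_star = 1.0, 2.0

f1 = 1.0 - x_star                           # constraint value at x*
stationarity = 2.0 * x_star - lam_star      # d/dx [x^2 + lam*(1 - x)]

print(f1 <= 0.0)                  # primal feasibility
print(lam_star >= 0.0)            # dual feasibility
print(lam_star * f1 == 0.0)       # complementary slackness
print(stationarity == 0.0)        # gradient of the Lagrangian vanishes
```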

Perturbation and sensitivity analysis

Perturbed version

$$
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le u_i, \quad i = 1, \dots, m \\
                  & h_i(x) = w_i, \quad i = 1, \dots, p
\end{array}
$$

We define $p^\star(u, w)$ as the optimal value of the perturbed problem:

$$
p^\star(u, w) = \inf_x \left\{ f_0(x) \;\middle|\; x \in \mathcal{D},\; f_i(x) \le u_i,\; i = 1, \dots, m,\; h_i(x) = w_i,\; i = 1, \dots, p \right\}, \qquad p^\star(0, 0) = p^\star
$$

When the original problem is convex, the function $p^\star(u, w)$ is a convex function of $(u, w)$.

Now we assume that strong duality holds and that the dual optimum is attained. This is the case if the original problem is convex and Slater's condition is satisfied. Let $(\lambda^\star, \nu^\star)$ be optimal for the dual of the unperturbed problem. Then for all $u$ and $w$ we have

$$
p^\star(u, w) \ge p^\star(0, 0) - {\lambda^\star}^T u - {\nu^\star}^T w.
$$
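As a sanity check, the global sensitivity bound $p^\star(u) \ge p^\star(0) - \lambda^{\star T} u$ can be verified on a toy perturbed problem (minimize $x^2$ subject to $1 - x \le u$), where the optimal value and multiplier are known in closed form:

```python
import numpy as np

# Toy perturbed problem: minimize x^2 subject to 1 - x <= u.
# Closed form: p*(u) = max(1 - u, 0)^2, with p*(0) = 1 and lam* = 2.
def p_star(u):
    return max(1.0 - u, 0.0) ** 2

lam_star = 2.0
ok = all(p_star(u) >= p_star(0.0) - lam_star * u - 1e-12
         for u in np.linspace(-1.0, 2.0, 31))
print(ok)   # True: the lower bound holds for every perturbation u
```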

Steepest descent method


First, we make an equivalent substitution: write the next iterate as

$$
x^{(k+1)} = x^{(k)} + v.
$$

The first-order Taylor expansion of $f$ around $x^{(k)}$ is:

$$
f(x^{(k)} + v) \approx f(x^{(k)}) + \nabla f(x^{(k)})^T v.
$$
Then we get the direction:

$$
d^{(k+1)} = \operatorname*{argmin}_{v} \left\{ f(x^{(k)}) + \nabla f(x^{(k)})^T v \;\middle|\; \|v\| = 1 \right\}
$$

Dual norm

$$
\|z\|_* = \sup \left\{ z^T x \;\middle|\; \|x\| \le 1 \right\}
$$

We only need the direction, not the magnitude; the step size is chosen separately.
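A minimal sketch of steepest descent in the Euclidean norm (plain gradient descent) with a fixed step size, applied to a made-up strongly convex quadratic so the answer can be checked against the exact minimizer $-P^{-1} q$:

```python
import numpy as np

# Made-up quadratic f(x) = 1/2 x^T P x + q^T x with P positive definite.
P = np.array([[3.0, 0.5],
              [0.5, 1.0]])
q = np.array([-1.0, 1.0])

def grad(x):
    return P @ x + q

x = np.zeros(2)
t = 0.1   # fixed step size; must satisfy t < 2 / lambda_max(P) to converge
for _ in range(500):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    x = x - t * g   # step along the negative gradient direction

print(x, np.linalg.solve(P, -q))   # iterate vs. exact minimizer
```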

Newton's method

The second-order Taylor approximation $\hat f$ of the real $f$ at the point $x^{(k)}$ is

$$
\hat f(x^{(k)} + v) = f(x^{(k)}) + \nabla f(x^{(k)})^T v + \tfrac{1}{2} v^T \nabla^2 f(x^{(k)}) v.
$$
The function $\hat f$ is a convex quadratic function of $v$.

The necessary and sufficient optimality condition:

$$
\nabla \hat f = \nabla f(x^{(k)}) + \nabla^2 f(x^{(k)}) \, v = 0.
$$
The Hessian matrix is invertible because it is positive definite, so

$$
v = -\nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)}).
$$

The vector

$$
\Delta x_{\mathrm{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)
$$

is called the Newton step.


We obtain the exact minimizer of $\hat f$, but $\hat f$ is only an approximation of the real $f$.

If the real $f$ is quadratic, then $x^{(\mathrm{new})} = x + \Delta x_{\mathrm{nt}}$ is the exact minimizer of $f$. If $f$ is nearly quadratic, intuition suggests that $x^{(\mathrm{new})}$ should be a very good estimate of the minimizer of $f$.

Since the real $f$ is twice differentiable, the quadratic model $\hat f$ will be very accurate when $x$ is near $x^\star$. It follows that when $x$ is near $x^\star$, the point $x^{(\mathrm{new})}$ should be a very good estimate of $x^\star$.


Even far from $x^\star$, where $\hat f$ is only a local approximation of the real $f$ at the point $x$, the Newton direction still points approximately toward the minimizer of $f$.
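A sketch of the Newton iteration with an analytic gradient and Hessian, on a standard smooth convex test function built from exponentials; by symmetry its minimizer is $x^\star = (-\tfrac{1}{2}\ln 2,\ 0)$, which makes the result easy to verify:

```python
import numpy as np

def fgh(x):
    # f(x) = e^(x1 + 3*x2 - 0.1) + e^(x1 - 3*x2 - 0.1) + e^(-x1 - 0.1)
    a = np.exp(x[0] + 3.0 * x[1] - 0.1)
    b = np.exp(x[0] - 3.0 * x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    f = a + b + c
    g = np.array([a + b - c, 3.0 * a - 3.0 * b])
    H = np.array([[a + b + c, 3.0 * a - 3.0 * b],
                  [3.0 * a - 3.0 * b, 9.0 * a + 9.0 * b]])
    return f, g, H

x = np.zeros(2)
for _ in range(20):
    _, g, H = fgh(x)
    step = np.linalg.solve(H, -g)   # Newton step: -H^{-1} grad
    x = x + step
    if np.linalg.norm(step) < 1e-12:
        break

print(x)   # approaches (-ln(2)/2, 0), roughly (-0.3466, 0.0)
```

Pure (undamped) Newton steps suffice here because the starting point is close enough to the minimizer; in general a line search (damped Newton) is added for robustness.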




The minimization over $x$ yields the Lagrange dual function:

$$
\inf_x L(x, \lambda, \nu) = g(\lambda, \nu).
$$

The maximization over $\lambda \succeq 0$ recovers the primal objective on the feasible set:

$$
\sup_{\lambda \succeq 0} L(x, \lambda) =
\begin{cases}
f_0(x) & \text{if } f_i(x) \le 0, \; i = 1, \dots, m \\
+\infty & \text{otherwise.}
\end{cases}
$$

Matrix split

$$
x = x^+ - x^-, \quad x^+ \succeq 0, \quad x^- \succeq 0
$$
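The elementwise positive/negative split (commonly used to rewrite a free variable as a difference of two nonnegative ones, e.g. when converting a problem to standard form) is straightforward to compute:

```python
import numpy as np

x = np.array([1.5, -2.0, 0.0, 3.0])
x_plus = np.maximum(x, 0.0)    # positive part
x_minus = np.maximum(-x, 0.0)  # negative part

print(np.allclose(x_plus - x_minus, x))                    # True: x = x+ - x-
print(bool((x_plus >= 0).all() and (x_minus >= 0).all()))  # True: both parts >= 0
```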
