
Sparse & Redundant Representations
and Their Applications in
Signal and Image Processing

Greedy Pursuit Algorithms – The Practice

Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel
Defining Our Objective and Directions

We Return to (P0)

So, we are considering again the general (P0) problem

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

and this time we would like to discuss practical ways for solving it.

[Figure: the matrix A, of size n×m with m > n, multiplies the sparse vector x ∈ R^m to produce b ∈ R^n]


What are Our Options?

Here is a possible recipe for solving (P0): an exhaustive search over all the possible supports.

- Denote by k the number of non-zeros in the solution.
- As we do not know how many non-zeros there are in the optimal solution, we should check k = 1, 2, ... till we find the sparsest solution.

The recipe as a flowchart:
1. Set k = 1.
2. Gather all the supports $\{S_i\}_i$ of cardinality k; there are $\binom{m}{k}$ such supports.
3. For each support, solve the LS problem $\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_i$.
4. If the LS error is ≤ ε² for some support: done. Otherwise, set k = k+1 and return to step 2.


Option 1: Exhaustive Search

A typical example: assume that
o m = 2000 (number of atoms in A),
o k is known – k = 15 (so no need to start at k = 1, ...),
o it takes 1 nano-second to check each LS.

We shall need ~7.5e20 years to solve this problem!!

Running the flowchart above at such sizes is hopeless: this is a combinatorial problem, proven to be NP-Hard!
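To make the recipe concrete, here is a minimal Python/numpy sketch of the exhaustive search; the function name is illustrative, and it is feasible only for tiny m and k, for exactly the combinatorial reason just described:

    import numpy as np
    from itertools import combinations

    def exhaustive_p0(A, b, eps):
        """Brute-force (P0): scan supports of growing cardinality k.
        Each k costs (m choose k) least-squares solves."""
        n, m = A.shape
        for k in range(1, m + 1):
            for S in combinations(range(m), k):
                idx = list(S)
                xS, *_ = np.linalg.lstsq(A[:, idx], b, rcond=None)
                if np.linalg.norm(A[:, idx] @ xS - b) <= eps:
                    x = np.zeros(m)
                    x[idx] = xS
                    return x, idx
        return None, None   # no solution within the error tolerance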


What are Our Options?

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

DEFINITELY NOT EXHAUSTIVE SEARCH!

- So, what are the alternatives? The general answer is Approximation Algorithms.
- Approximation? This means that we are willing to sacrifice accuracy and NOT obtain the truly optimal solution of (P0).
- So, how do we design such approximation algorithms?


Approximation Algorithms: Greedy

- Very similar to the exhaustive-search rationale, one could say that the true unknown in (P0) is the support of the solution, which is discrete by nature.
- The set of support possibilities forms a tree with branching factor m – root S={}, then S={1}, S={2}, ..., S={m}, then S={2,1}, S={2,3}, ..., S={2,m}, and so on – and exhaustive search implies checking each node.
- An approximation: search the tree of possibilities while pruning many "unlikely" states.
- This leads to "Greedy Methods".
Approximation Algorithms: Relaxation

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

- As opposed to the above, one could consider the whole vector x as an unknown, rather than focusing on its support.
- We have massive knowledge in continuous optimization ... but ...
- Main difficulty: (P0) is highly non-smooth due to the L0 penalty.
- Solution: smooth (P0) somehow, which leads to "Relaxation Methods".


Approximation Algorithms

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

Greedy methods: build the solution one non-zero element at a time.
Relaxation methods: smooth the L0 and use continuous-optimization techniques.


Greedy Algorithms: The Orthogonal Matching Pursuit Algorithm
Let's Go Greedy

Core idea: exploit the best support from the last round.

- Start: find the atom that best matches Ax to b.
- Next: given the previously chosen atoms, find the next one that best fits Ax to b.
- The solution grows the support one item at a time.
- The algorithm should stop when the error Ax - b gets close enough to zero.
The Relation to the Pruned Tree

    $\|x\|_0 = 0$:   S={}
    $\|x\|_0 = 1$:   S={1}, S={2}, ..., S={m}
    $\|x\|_0 = 2$:   S={2,1}, S={2,3}, ..., S={2,m}
    $\|x\|_0 = 3$:   S={2,3,1}, S={2,3,4}, S={2,3,7}, ..., S={2,3,m}
    $\|x\|_0 = 4$:   S={2,3,7,1}, S={2,3,7,4}, S={2,3,7,8}, ..., S={2,3,7,m}

Many of the possibilities are never checked, as every round has only O(m) tests (instead of m-choose-k).
From Concept to Algorithms

- As it turns out, there are various ways to practice the above rationale, all of them considered "Greedy Algorithms".
- We shall meet several such variants, ranging from the most sophisticated down to simpler methods:
  o Least-Squares Orthogonal Matching Pursuit (LS-OMP)
  o Orthogonal Matching Pursuit (OMP) – our starting point
  o Matching Pursuit (MP)
  o Weak Matching Pursuit (WMP)
  o The Thresholding Algorithm


OMP: The Rationale

- Greedy algorithms such as OMP build the solution sequentially, adding one non-zero at a time: $x_0, x_1, x_2, \ldots, x_k, \ldots$
- Along this path, the found solution $x_k$ may not satisfy the equation: $Ax_k \approx b$.
- We shall refer to this error as the current residual: $r_k = b - Ax_k$.
- OMP strategy: choose the next non-zero so as to reduce the "energy" of the residual as much as possible.

[Figure: b ≈ Ax with a sparse x, leaving the residual r]
OMP: The Details

Our goal: approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.

Initialization: $k=0$, $x_0=0$, $r_0 = b - Ax_0 = b$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Compute $E(i) = \min_z \|z\,a_i - r_{k-1}\|_2^2$ for $1 \le i \le m$.
2. Choose $i_0$ s.t. $E(i_0) \le E(i)$ for all $1 \le i \le m$.
3. Update the support: $S_k = S_{k-1} \cup \{i_0\}$.
4. LS: $x_k = \arg\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_k$.
5. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$; otherwise, iterate.
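A minimal numpy sketch of this iteration, assuming the columns of A are L2-normalized (the function name and the iteration cap are illustrative):

    import numpy as np

    def omp(A, b, eps=1e-6, k_max=None):
        """Orthogonal Matching Pursuit for min ||x||_0 s.t. Ax = b (approx.)."""
        n, m = A.shape
        k_max = k_max or n
        x, r, S = np.zeros(m), b.copy(), []
        while np.linalg.norm(r) > eps and len(S) < k_max:
            # Sweep: minimizing E(i) amounts to maximizing |a_i^T r| (shown next)
            i0 = int(np.argmax(np.abs(A.T @ r)))
            S.append(i0)
            # LS over the current support: all chosen coefficients are re-fitted
            xS, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            x = np.zeros(m)
            x[S] = xS
            r = b - A @ x
        return x, S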


OMP: Choosing the Next Atom

- Let's assume that the columns of A are L2-normalized.
- Evaluating which atom to choose relies on computing E(i):

    $E(i) = \min_z \|z\,a_i - r_{k-1}\|_2^2$ for $1 \le i \le m$.

  Setting the derivative to zero, $a_i^T(z\,a_i - r_{k-1}) = 0$, gives

    $z_{opt} = \frac{a_i^T r_{k-1}}{a_i^T a_i} = a_i^T r_{k-1}$,

  and plugging it back in,

    $E(i) = \|(a_i^T r_{k-1})\,a_i - r_{k-1}\|_2^2 = \|r_{k-1}\|_2^2 - (a_i^T r_{k-1})^2$.

Conclusion: instead of minimizing E(i), we can maximize $|a_i^T r_{k-1}|$.


OMP: Choosing the Next Atom

In order to choose the next atom to join the support, OMP computes the length-m vector $A^T r_{k-1}$ and seeks its maximal entry in absolute value – its location points to the atom to be chosen.


OMP: Least-Squares

After we have updated the support, we should update both the current solution $x_k$ and the residual $r_k$:

    $x_k = \arg\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_k$

Restricted to the support, this reads $\min_{x_S} \|A_S x_S - b\|_2^2$, whose closed-form solution is

    $x_k(S_k) = (A_S^T A_S)^{-1} A_S^T b = A_S^\dagger b$.


Why the Name "Orthogonal ..."?

- Observe that the updated solution $x_k$ minimizes $\|A_S x - b\|_2^2$, and therefore

    $0 = A_S^T (A_S x_k - b) = -A_S^T r_k$.

- The solution in each step is chosen such that the new residual $r_k$ is orthogonal to all the chosen atoms in A.

A positive consequence: OMP can never choose the same atom twice.


Numerical Shortcut

- Our Least-Squares task is given as $\min_{x_k} \|A_{S_k} x_k - b\|_2^2$, solved by $x_k = (A_{S_k}^T A_{S_k})^{-1} A_{S_k}^T b$.
- Could we exploit the fact that we already have the inversion $(A_{S_{k-1}}^T A_{S_{k-1}})^{-1}$ from the previous iteration, given that $A_{S_k} = [A_{S_{k-1}}, a_k]$?
- The answer is positive – there is a recursive method to update the solution (which will not be discussed here).
OMP: Complexity

- The two most demanding parts of the OMP are:
  o The sweep stage, in which we choose the next atom. This stage requires the computation of $A^T r_{k-1}$ and finding the maximal value, a process that requires O(mn) operations.
  o The Least-Squares stage, in which we update the solution. This stage requires computing $A_S^T A_S$ and updating the solution, a process that requires O(k²m) operations.
- The overall complexity of the OMP is governed by the first of the two (applied k times): OMP complexity is O(mnk).


Variations over the Orthogonal Matching Pursuit
Other Greedy Algorithms

- The OMP is just one interpretation of the greedy rationale, and there are others that could be proposed.
- The alternative algorithms suggest different tradeoffs between accuracy and complexity. From fastest to most accurate:

    Thresholding → Weak Matching Pursuit (WMP) → Matching Pursuit (MP) → OMP → Least-Squares OMP


Least-Squares OMP

Here is an algorithm that may appear at first to be equivalent to the OMP ...

1. Set k = 1 and $S_0 = \{\}$.
2. Gather all the supports $\{S_i = S_{k-1} \cup \{i\}\}_{i=1}^m$ (there are only m such supports).
3. For each support, find the solution of $\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_i$.
4. Find the support with the smallest error, $\{S_k, E_k\}$.
5. If $E_k \le \varepsilon$: done. Otherwise, set k = k+1 and return to step 2.

- Is it different from OMP?
- Yes! While OMP uses the residual as a proxy for the error, this method computes the actual error directly.
LS-OMP vs. OMP

Our goal: approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.

Recall the OMP: initialize $k=0$, $x_0=0$, $r_0=b$, $S_0=\{\}$; then per iteration compute $p(i) = |a_i^T r_{k-1}|$ for all i, choose the maximizer $i_0$, update the support, solve the LS over it, and update the residual, stopping when $\|r_k\|_2 \le \varepsilon$.

The atom-selection part (steps 1-2) is the part that will be replaced in order to obtain the LS-OMP.


LS-OMP: Key Observation

- Who is the best atom to join the support?
  o OMP's answer: the one that correlates most with the residual.
  o LS-OMP's answer: the OMP strategy is sub-optimal, since it relies on the residual. A better choice can be made by computing the following set of m LS problems:

    $E(i) = \min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_{k-1} \cup \{i\}$, for $1 \le i \le m$,

    and choosing the atom that led to the smallest error.
- Observation: OMP does this very LS, but only once per iteration, while LS-OMP performs it m times in order to choose the next atom – this is the best possible greedy step.
LS-OMP: Details

Initialization: $k=0$, $x_0=0$, $r_0 = b - Ax_0 = b$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Compute $E(i) = \min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_{k-1} \cup \{i\}$, for $1 \le i \le m$.
2. Choose $i_0$ s.t. $E(i_0) \le E(i)$ for all i.
3. Update the support: $S_k = S_{k-1} \cup \{i_0\}$.
4. LS: $x_k = \arg\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_k$.
5. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$.

Comments:
o Step 1 can be done faster by exploiting the recursive LS algorithm.
o Step 4 is not needed, since its result is already given in step 1.
o The residual is computed here only for the stopping criterion.
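A direct (and deliberately naive) numpy sketch of the LS-OMP sweep; in practice step 1 would reuse the previous factorization recursively, but the plain version below shows the logic (names are illustrative):

    import numpy as np

    def ls_omp(A, b, eps=1e-6, k_max=None):
        """LS-OMP: add the atom whose inclusion yields the smallest LS error."""
        n, m = A.shape
        k_max = k_max or n
        x, r, S = np.zeros(m), b.copy(), []
        while np.linalg.norm(r) > eps and len(S) < k_max:
            best_err, best_i, best_x = np.inf, None, None
            for i in range(m):
                if i in S:
                    continue
                trial = S + [i]
                xT, *_ = np.linalg.lstsq(A[:, trial], b, rcond=None)
                err = np.linalg.norm(A[:, trial] @ xT - b)
                if err < best_err:
                    best_err, best_i, best_x = err, i, xT
            S.append(best_i)
            x = np.zeros(m)
            x[S] = best_x
            r = b - A @ x
        return x, S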


LS-OMP: Complexity

Claim 1: Complexity(OMP) < Complexity(LS-OMP).

Claim 2: Nevertheless, LS-OMP can be made more efficient by exploiting the recursive LS solution mentioned earlier.

From fastest to most accurate:

    Thresholding → Weak Matching Pursuit (WMP) → Matching Pursuit (MP) → OMP → Least-Squares OMP


Simplifying the OMP

- How can we simplify the OMP?
  o Matching Pursuit: by avoiding the LS step somehow.
  o Weak MP: by simplifying the search for the next atom.

(Recall the OMP iteration: compute $p(i) = |a_i^T r_{k-1}|$, choose the maximizer $i_0$, update $S_k$, solve the LS over $S_k$, and update the residual, stopping when $\|r_k\|_2 \le \varepsilon$.)


Matching Pursuit: Rationale

- MP uses the same method for choosing the next atom, so assume that the current support $S_k$ has been set.
- When updating the solution, OMP chooses to "forget" the previous solution $x_{k-1}$ and re-compute it by a full LS:

    $\min_{x_k} \|A_S x_k - b\|_2^2$    (the original OMP LS)

- MP suggested instead keeping $x_{k-1}$ fixed and optimizing only the new coefficient:

    $\min_z \|A_S x_{k-1} + a_{i_0} z - b\|_2^2 = \min_z \|a_{i_0} z - r_{k-1}\|_2^2$    (looks familiar?)

- MP approach: keep $x_{k-1}$ and simply update it by adding the new atom with its coefficient, $x_k(i_0) = a_{i_0}^T r_{k-1}$.
MP: Details

Initialization: $k=0$, $x_0=0$, $r_0 = b - Ax_0 = b$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Compute $p(i) = |a_i^T r_{k-1}|$ for $1 \le i \le m$.
2. Choose $i_0$ s.t. $p(i_0) \ge p(i)$ for all i.
3. Update the support: $S_k = S_{k-1} \cup \{i_0\}$.
4. Update $x_k$: $x_k = x_{k-1}$, and then $x_k(i_0) = x_k(i_0) + a_{i_0}^T r_{k-1}$.
5. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$.

Comments:
o MP might choose the same atom twice, explaining the '+' in the update formula.
o Clearly, MP is faster (and less accurate) than the OMP, since it avoids all the LS computations.
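The corresponding update in numpy – a sketch under the same assumptions (L2-normalized columns; the iteration cap is illustrative):

    import numpy as np

    def mp(A, b, eps=1e-6, max_iter=1000):
        """Matching Pursuit: only the new coefficient is updated (no LS),
        so the same atom may be selected more than once."""
        n, m = A.shape
        x, r = np.zeros(m), b.copy()
        for _ in range(max_iter):
            if np.linalg.norm(r) <= eps:
                break
            c = A.T @ r
            i0 = int(np.argmax(np.abs(c)))
            x[i0] += c[i0]          # '+=': atom i0 may be revisited
            r -= c[i0] * A[:, i0]   # cheaper than recomputing b - A @ x
        return x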


Weak Matching Pursuit: Rationale

- Weak-MP seeks to further simplify the Matching Pursuit by targeting the step of choosing the next atom:
  o MP approach: compute $|A^T r_{k-1}|$ and choose the largest entry.
  o WMP's rationale: if A is huge, this step is too expensive.
- The alternative: compute the values $|a_i^T r_{k-1}|$ one at a time, and stop when the obtained value is big enough. How big is "big enough"?
- Since $E(i) = \|r_{k-1}\|_2^2 - (a_i^T r_{k-1})^2 \ge 0$, the range of possible values is

    $0 \le |a_i^T r_{k-1}| \le \|r_{k-1}\|_2$.

- Weak-MP strategy: if $|a_i^T r_{k-1}|$ is above t (< 1) times this upper bound, it is sufficiently big.


WMP: Details

Initialization: $k=0$, $x_0=0$, $r_0 = b - Ax_0 = b$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Compute $p(i) = |a_i^T r_{k-1}|$, scanning i = 1, ..., m.
2. Choose $i_0$ as soon as $p(i_0) \ge t \cdot \|r_{k-1}\|_2$.
3. Update the support: $S_k = S_{k-1} \cup \{i_0\}$.
4. Update $x_k$: $x_k = x_{k-1}$, and then $x_k(i_0) = x_k(i_0) + a_{i_0}^T r_{k-1}$.
5. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$.

Comments:
o WMP uses the same 'trick' as the MP for avoiding the LS computation.
o t is a parameter – the smaller it is, the earlier the scan stops, so the faster (and less accurate) the algorithm becomes.
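A sketch of the weak selection rule (assuming, as above, L2-normalized columns; falling back to the best atom seen when none passes the test is an implementation assumption):

    import numpy as np

    def wmp_select(A, r, t=0.5):
        """Weak-MP atom choice: stop scanning once the correlation
        exceeds t * ||r||_2."""
        thresh = t * np.linalg.norm(r)
        best_i, best_p = 0, -1.0
        for i in range(A.shape[1]):
            p = abs(A[:, i] @ r)
            if p >= thresh:
                return i            # early exit: the whole point of WMP
            if p > best_p:
                best_i, best_p = i, p
        return best_i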


The Thresholding Algorithm
What Have We Seen So Far?

- There is a set of possible greedy methods that span a wide range of complexities and accuracies. From fastest to most accurate:

    Thresholding → Weak Matching Pursuit (WMP) → Matching Pursuit (MP) → OMP → Least-Squares OMP

- We now turn to discuss the thresholding algorithm, which is the simplest and crudest of all, but could also be considered the most popular pursuit technique.


Thresholding: Core Idea

- How should we choose the next atom to join the support?
  o LS-OMP's answer: by trying each of the atoms and choosing the one that leads to the eventual smallest error.
  o OMP's answer: the one that correlates most with the residual.
  o Thresholding's answer: the above two are too complicated ...
- So, what is the alternative? Take the very first OMP step, the length-m vector $|A^T b|$, and extract from it all the decisions about the order of atoms to bring into S, based on the magnitude of the values of this vector.
Thresholding: Details

Our goal: approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.

Initialization:
1. Compute $A^T b$ and sort this vector by descending absolute value, yielding the order $i_1, i_2, \ldots$ with $|(A^Tb)_{i_1}| \ge |(A^Tb)_{i_2}| \ge |(A^Tb)_{i_3}| \ge \ldots$
2. Set $k=0$, $x_0=0$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Update the support: $S_k = S_{k-1} \cup \{i_k\}$.
2. LS: $x_k = \arg\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_k$.
3. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$.

Here as well, one could use the recursive LS to speed up the process.
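A compact numpy sketch of the thresholding algorithm as described (names are illustrative):

    import numpy as np

    def thresholding(A, b, eps=1e-6, k_max=None):
        """Thresholding: the atom order is fixed up front by |A^T b|;
        atoms are added one by one, each addition followed by an LS fit."""
        n, m = A.shape
        k_max = k_max or n
        order = np.argsort(-np.abs(A.T @ b))   # descending |correlation|
        x = np.zeros(m)
        for k in range(1, k_max + 1):
            S = list(order[:k])
            xS, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            x = np.zeros(m)
            x[S] = xS
            if np.linalg.norm(b - A @ x) <= eps:
                break
        return x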
Thresholding Made Even Simpler

Our goal: approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.

The algorithm is as above, except for the solution update:
- In high-dimensional problems, the LS step becomes prohibitive.
- It can be replaced by a simple projection, following the Matching-Pursuit approach: $x_k(S_k) = A_{S_k}^T b$.
A Test Case: Demonstrating and Testing Greedy Algorithms

Proposed Experiment

- We have several algorithms for approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.
- Let's test these algorithms. This is the structure of the experiment:
  1. Draw $A \in \mathbb{R}^{n \times m}$ (m > n) somehow.
  2. Draw an s-sparse $x_0 \in \mathbb{R}^m$ at random, $\|x_0\|_0 = s \ll n$.
  3. Multiply: $b = Ax_0$.
  4. Feed {A, b} to a solver of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$, obtaining $\hat{x}$.
  5. Compare $\hat{x}$ to $x_0$.


Proposed Experiment (cont.)

- We shall use a random A of size 50×100 with normal entries.
- We shall L2-normalize the columns of A.


Proposed Experiment (cont.)

- We shall test the cardinalities s in the range [1, 15].
- The non-zeros will be drawn from the uniform distribution over [1, 2] and given a random sign, so the values fall in [-2, -1] ∪ [1, 2].


Proposed Experiment (cont.)

- We will compute the relative L2 error:

    $\text{Error}_{L2} = \|\hat{x} - x_0\|_2^2 / \|x_0\|_2^2$

- We will also evaluate the support recovery error:

    $\text{Error}_S = 1 - |\hat{S} \cap S_0| / \max(|\hat{S}|, |S_0|)$

- Results are averaged over 200 experiments.
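A sketch of this experiment harness in numpy, reusing the omp sketch from above as the solver under test (names and defaults are illustrative; the two distances match the error measures just defined):

    import numpy as np

    def run_trial(n=50, m=100, s=5, rng=np.random.default_rng()):
        A = rng.standard_normal((n, m))
        A /= np.linalg.norm(A, axis=0)               # L2-normalize columns
        S0 = rng.choice(m, size=s, replace=False)
        x0 = np.zeros(m)
        x0[S0] = rng.uniform(1, 2, s) * rng.choice([-1, 1], s)
        b = A @ x0
        x_hat, S_hat = omp(A, b, eps=1e-6, k_max=s)  # solver under test
        err_l2 = np.sum((x_hat - x0)**2) / np.sum(x0**2)
        inter = len(set(S_hat) & set(S0.tolist()))
        err_s = 1 - inter / max(len(S_hat), s)
        return err_l2, err_s

    # average over 200 trials, as in the experiment
    mean_errs = np.mean([run_trial(s=5) for _ in range(200)], axis=0)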
Proposed Experiment: Results

[Figure: the average relative L2 error, $\|\hat{x}-x_0\|_2^2 / \|x_0\|_2^2$, as a function of the cardinality s, for the greedy algorithms tested]

Proposed Experiment: Results

[Figure: the average support recovery error, $1 - |\hat{S} \cap S_0| / \max(|\hat{S}|,|S_0|)$, as a function of s]


Performance of Greedy Algorithms

Question: should we be happy with the results we got?

Answer: Yes & No.

Negative: the success is conditioned on a very low s value.

Positive: these algorithms succeed for sparse enough $x_0$ (e.g., consider the cost of OMP for s=9 versus an exhaustive search).


Relaxation Pursuit Algorithms

Relaxation of the L0-Norm: The Core Idea
Back to (P0)

We are considering again the general (P0) problem

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

and we would like to discuss practical ways for solving it that are NOT based on the greedy rationale.

[Figure: A of size n×m multiplies the sparse x ∈ R^m to produce b ∈ R^n]
Relaxation?

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

- We have massive knowledge in continuous optimization ... but ...
- The main difficulty: (P0) is highly non-smooth due to the L0 penalty.
- Solution: smooth (P0). This leads us to "Relaxation Methods".


Smoothing the L0-Norm

- Recall the L0-norm:

    $\|x\|_0 = \sum_{k=1}^m \rho^*(x_k)$, where $\rho^*(x) = 0$ for $x = 0$ and $\rho^*(x) = 1$ for $x \ne 0$.

- There are infinitely many ways to smooth this function.
- We shall propose a few smooth variations of $\rho^*(x)$ to illustrate the possibilities.


Here are a Few Popular Options ...

    $\rho_\alpha(x) = 1 - \exp(-x^2/\alpha)$      $\rho_\alpha(x) = \frac{x^2}{x^2 + \alpha}$      $\rho_\alpha(x) = \frac{|x|}{|x| + \alpha}$

These options all share the property

    $\rho_\alpha(x) \xrightarrow{\alpha \to 0} \rho^*(x)$.

[Figure: the smooth surrogates plotted over [-4, 4], approaching the 0/1 indicator function]


Graduated Optimization

An appealing option is to solve a chain of such problems, with a smoothing effect that starts wide (nearly convex) and progressively gets closer to L0.

For example: $\rho_\alpha(x) = 1 - e^{-x^2/\alpha}$, decreasing $\alpha$ along the sequence $\alpha_1 = 5.0$, $\alpha_2 = 1.5$, $\alpha_3 = 0.5$, $\alpha_4 = 0.1$.

[Figure: the surrogate plotted for the four decreasing α values, sharpening toward the L0 indicator]


Graduated Optimization (cont.)

- Define

    $(P_0^\alpha):\ \min_x \sum_{k=1}^m \rho_\alpha(x_k) \ \text{s.t.}\ Ax = b$.

- Each problem provides a warm-start initialization to the next:
  1. Initialize $x_0 = 0$, set j = 1, and choose $\alpha_1$ somehow.
  2. Solve $(P_0^{\alpha_j})$ initialized at $x_{j-1}$, obtaining $x_j$.
  3. Decrease α, set j = j+1, and repeat.
- Such a process may lead to a better final solution compared to the use of a single α value.
Numerical Solution of the Relaxed (P0)

- How can we numerically solve the problem $\min_x \sum_{k=1}^m \rho(x_k) \ \text{s.t.}\ Ax = b$?
- Optimization theory provides various such ways.
- We shall present one simple option: Iterative Reweighted Least-Squares (IRLS).
- The core idea: write the penalty as a pseudo L2-norm,

    $\sum_{k=1}^m \rho(x_k) = \sum_{k=1}^m \frac{\rho(x_k)}{x_k^2} \cdot x_k^2$,

  and refer to the term $w_k = \rho(x_k)/x_k^2$ as "fixed" weights.
IRLS: The Details

    $\min_x \sum_{k=1}^m w_k x_k^2 \ \text{s.t.}\ Ax = b$, with $w_k = \rho(x_k)/x_k^2$

This algorithm is also known as the FOcal Underdetermined System Solver (FOCUSS).

    $(P_2\{W\}):\ \min_x x^T W x \ \text{s.t.}\ Ax = b$

This problem has a closed-form solution. IRLS iterates between a solution of the L2 problem and an update of the weights:
1. Initialize $x_0 = \mathbf{1}$.
2. Solve $(P_2\{W\})$ for x.
3. Update the diagonal matrix W from the new x, and repeat.
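A minimal IRLS/FOCUSS sketch for the surrogate $\rho(x) = |x|^p$ (with p = 0.5, matching the experiment below); the weighted-L2 step has the closed form $x = W^{-1}A^T(AW^{-1}A^T)^{-1}b$, and the small eps guarding the division is an implementation assumption:

    import numpy as np

    def irls(A, b, p=0.5, n_iter=100, eps=1e-8):
        """IRLS (FOCUSS) for min sum |x_k|^p s.t. Ax = b."""
        n, m = A.shape
        x = np.ones(m)                        # non-degenerate start
        for _ in range(n_iter):
            w = (np.abs(x) + eps) ** (p - 2)  # w_k = rho(x_k)/x_k^2 = |x_k|^(p-2)
            Winv = 1.0 / w
            AWAt = (A * Winv) @ A.T           # A W^{-1} A^T
            lam = np.linalg.solve(AWAt, b)
            x = Winv * (A.T @ lam)            # x = W^{-1} A^T (A W^{-1} A^T)^{-1} b
        return x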


Alternative: Convex Relaxation

- Another possible relaxation that has drawn much attention is $\rho(x) = |x|$, implying ...

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$
    $(P_1):\ \min_x \|x\|_1 \ \text{s.t.}\ Ax = b$

- The resulting problem (P1) is known as Basis Pursuit.
- This relaxation is a convex problem that can be handled by Linear-Programming solvers.
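One standard way to pose (P1) as a linear program – a sketch using scipy's linprog, under the usual split x = u - v with u, v ≥ 0:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        """(P1) as an LP: min 1^T(u+v) s.t. A(u-v) = b, u,v >= 0."""
        n, m = A.shape
        c = np.ones(2 * m)
        A_eq = np.hstack([A, -A])
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
        u, v = res.x[:m], res.x[m:]
        return u - v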
A Test Case: Demonstrating and Testing Relaxation Algorithms
Proposed Experiment: As Before

1. Draw $A \in \mathbb{R}^{n \times m}$ (m > n) somehow.
2. Draw an s-sparse $x_0 \in \mathbb{R}^m$ at random, $\|x_0\|_0 = s \ll n$.
3. Multiply: $b = Ax_0$.
4. Feed {A, b} to an approximate solver of (P0), obtaining $\hat{x}$.
5. Compare $\hat{x}$ to $x_0$.
Proposed Experiment: As Before ...

- We use a random A of size 50×100 with iid Gaussian entries, and normalize A's columns.
- We test the cardinalities s in the range [1, 20].
- The non-zeros are drawn from the uniform distribution over [1, 2] and given a random sign.
- We evaluate the results by
  a) the L2 error of the computed solution, and
  b) a support recovery score.
- We average the results over 200 experiments.
Proposed Experiment: Algorithms

We compare three algorithms:
1. The OMP, applied exactly as in the previous experiment.
2. The IRLS, used for approximating the solution of $(P_{1/2})$ (where $\rho_{1/2}(x) = |x|^{0.5}$):
   o 100 iterations
   o the weights are given by $w_k = 1/|x_k|^{1.5}$
3. A direct solution of (P1) using the linprog instruction in Matlab.
Proposed Experiment: Results

[Figure: the average relative L2 error, $\|\hat{x}-x_0\|_2^2 / \|x_0\|_2^2$, as a function of s, for OMP, IRLS, and BP]

Proposed Experiment: Results

[Figure: the average support recovery error, $1 - |\hat{S} \cap S_0| / \max(|\hat{S}|,|S_0|)$, as a function of s]


Guarantees of Pursuit Algorithms

Our Goal: Theoretical Justification for the Proposed Algorithms
Back to (P0)

- For the problem

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

  we came up with various algorithms, all aiming to approximate its solution:
  o Greedy methods: OMP, LS-OMP, MP, WMP, THR
  o Relaxation methods, such as Basis Pursuit
- Why should we trust these methods?
- A partial and indirect answer was given by the experiments we did.


Theoretical Guarantees

- Our goal now is to show that pursuit algorithms could in fact be accurate, leading to the desired solution.
- Wait! Doesn't this contradict our earlier statement that (P0) is NP-Hard?
- Answer: Yes and no. We shall develop theoretical guarantees for the success of several pursuit algorithms under some conditions on the cardinality of the unknown x.
- The message: when these conditions are met, (P0) is not NP-Hard anymore.
Rules of the Game

1. Choose a specific $A \in \mathbb{R}^{n \times m}$ (m > n).
2. Draw an s-sparse x at random, $\|x\|_0 = s \ll n$.
3. Multiply: $b = Ax$.
4. Feed {A, b} to a pursuit algorithm, obtaining $\hat{x}$.

We shall prove that if s is small enough (i.e., x is sufficiently sparse), then OMP, THR, and BP are all guaranteed to give $\hat{x} = x$.
Implications

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

- While (P0) is generally NP-Hard, it is far less complicated if its unknown, x, is known to be sparse.
- As we are about to show, the guarantees we shall develop may pose additional conditions, either on x or on A.
- We shall commonly refer to all these results as EQUIVALENCE GUARANTEES.


Worst-Case Analysis

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

- The guarantees we are going to develop adopt a worst-case point of view.
- This means that for all {A, b} satisfying the conditions, success is perfectly guaranteed.
- There exists a more sophisticated approach that adopts a probabilistic point of view, claiming the success of the pursuit with probability → 1.
- These offer more "generous" bounds, but their analysis is typically more complicated.
Equivalence: Analyzing the OMP Algorithm
Recall the OMP

Our goal: approximating the solution of $\min_x \|x\|_0 \ \text{s.t.}\ Ax = b$.

Initialization: $k=0$, $x_0=0$, $r_0 = b - Ax_0 = b$, $S_0 = \{\}$.

Main iteration ($k \leftarrow k+1$):
1. Compute $p(i) = |a_i^T r_{k-1}|$ for $1 \le i \le m$.
2. Choose $i_0$ s.t. $p(i_0) \ge p(i)$ for all i.
3. Update the support: $S_k = S_{k-1} \cup \{i_0\}$.
4. LS: $x_k = \arg\min_x \|Ax - b\|_2^2 \ \text{s.t.}\ \text{supp}(x) = S_k$.
5. Update the residual: $r_k = b - Ax_k$.
Stop when $\|r_k\|_2 \le \varepsilon$.


Underlying Assumptions

- Assume that the first s elements in x are the non-zero ones, ordered in decreasing order of absolute values, so that the true support is S = {1, ..., s}:

    $b = \sum_{i=1}^s x_i a_i$, where $|x_1| \ge |x_2| \ge \ldots \ge |x_s| > 0$.


OMP: Condition for Success

- The first step of the OMP succeeds if the inner product of b with $a_1$ is bigger (in absolute value) than with all the columns of A outside the support:

    $|b^T a_1| > \max_{j>s} |b^T a_j|$

- We shall proceed by expanding these two expressions, lower-bounding the left and upper-bounding the right, this way deriving a condition for this inequality to be satisfied:

    $|b^T a_1| \ge \text{(lower bound for the LHS)} > \text{(upper bound for the RHS)} \ge \max_{j>s} |b^T a_j|$


Upper-Bounding the RHS

Using $b = \sum_{i=1}^s x_i a_i$ and the mutual coherence $\max_{i \ne j} |a_i^T a_j| \le \mu(A)$:

    $\text{RHS} = \max_{j>s} |b^T a_j| = \max_{j>s} \left|\sum_{i=1}^s x_i a_i^T a_j\right| \le \max_{j>s} \sum_{i=1}^s |x_i|\,|a_i^T a_j| \le |x_1| \cdot s \cdot \mu(A)$


Lower-Bounding the LHS

Using $b = \sum_{i=1}^s x_i a_i$, the inequality $|a+b| \ge |a| - |b|$, the coherence bound $\max_{i \ne j}|a_i^T a_j| \le \mu(A)$, and $|x_1| \ge |x_i|$ for $i \ge 2$:

    $\text{LHS} = |b^T a_1| = \left|x_1 + \sum_{i=2}^s x_i a_i^T a_1\right| \ge |x_1| - \sum_{i=2}^s |x_i|\,|a_i^T a_1| \ge |x_1|\left(1 - (s-1)\,\mu(A)\right)$


Gathering the Bounds ...

    $|b^T a_1| \ge |x_1|(1 - (s-1)\mu(A)) > |x_1| \cdot s \cdot \mu(A) \ge \max_{j>s} |b^T a_j|$

The middle inequality holds when

    $1 - (s-1)\mu(A) > s\,\mu(A) \iff 1 + \mu(A) > 2s\,\mu(A) \iff s < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$


Moving to the Next Step

Conclusion so far: if $x_0$ is sparse enough,

    $\|x\|_0 = s < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$,

then the first step of the OMP is successful – it finds an atom $i_0$ from within the support.

The next OMP steps:
o Update the solution by $x_1(i_0) = b^T a_{i_0} \equiv c_1$.
o Update the residual by $r_1 = b - Ax_1 = b - c_1 a_{i_0}$.


Moving to the Next Step

Observe that the residual is a linear combination of the s original atoms (as in b):

    $r_1 = b - c_1 a_{i_0} = \sum_{i=1}^s \tilde{x}_i a_i$  (with modified coefficients).

The same condition applies for the second OMP step ... i.e., if $\|x\|_0 = s < \frac{1}{2}(1 + \frac{1}{\mu(A)})$, then the second atom chosen, $i_1$, is within the true support of x.
Moving to the Next Step

- As the algorithm proceeds, the solution $x_k$ is restricted to be a linear combination of atoms from the support S, and thus the residual always has the form $r_k = \sum_{i=1}^s \tilde{x}_i a_i$.
- Therefore, the obtained condition $\|x\|_0 = s < \frac{1}{2}(1 + \frac{1}{\mu(A)})$ guarantees the success of each step of the OMP.
- Since $r_k$ is orthogonal to all chosen atoms, OMP always selects a new one.
- After s steps, OMP finds the exact solution.
OMP (& MP) Equivalence

We are given A and b defining the problem (P0):

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

and we deploy OMP or MP for its solution.

Theorem: Given the above (P0), if the unknown to be found is sparse enough,

    $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$,

then OMP and MP are guaranteed to find it. Furthermore, OMP finds the solution in exactly s = |S| steps.
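The bound is easy to evaluate numerically; here is a small sketch computing $\mu(A)$ and the implied guarantee (function name and the example matrix are illustrative):

    import numpy as np

    def mutual_coherence(A):
        """mu(A) = max_{i != j} |a_i^T a_j|, for L2-normalized columns."""
        A = A / np.linalg.norm(A, axis=0)
        G = np.abs(A.T @ A)
        np.fill_diagonal(G, 0)
        return G.max()

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    mu = mutual_coherence(A)
    print(mu, 0.5 * (1 + 1 / mu))   # e.g. mu near 0.5 guarantees only s = 1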


Equivalence: Analyzing the THR Algorithm
THR: Terms for Success

- We shall use the same assumptions as before: the support S is the set $1 \le i \le s$, and the coefficients are given in descending order of absolute value.
- The THR algorithm succeeds if the inner products with the true atoms are dominant:

    $\min_{1 \le i \le s} |b^T a_i| > \max_{j>s} |b^T a_j|$

- Just as before, we will use the bounding idea:

    $\min_{1 \le i \le s} |b^T a_i| \ge \text{(lower bound for the LHS)} > \text{(upper bound for the RHS)} \ge \max_{j>s} |b^T a_j|$


Upper-Bounding the RHS

As before, using $b = \sum_{i=1}^s x_i a_i$ and $\max_{i \ne j}|a_i^T a_j| \le \mu(A)$:

    $\text{RHS} = \max_{j>s} |b^T a_j| \le \max_{j>s} \sum_{i=1}^s |x_i|\,|a_i^T a_j| \le |x_{max}| \cdot s \cdot \mu(A)$

where $|x_{max}| = |x_1|$ is the largest magnitude among the non-zeros.


Lower-Bounding the LHS

With $b = \sum_{t=1}^s x_t a_t$ and $|a+b| \ge |a| - |b|$:

    $\text{LHS} = \min_{1 \le i \le s} |b^T a_i| = \min_{1 \le i \le s} \left|\sum_{t=1}^s x_t a_t^T a_i\right| \ge \min_{1 \le i \le s} \left(|x_i| - \sum_{t=1, t \ne i}^s |x_t|\,|a_t^T a_i|\right) \ge \min_{1 \le i \le s} |x_i| - \max_{1 \le i \le s} \sum_{t=1, t \ne i}^s |x_t|\,|a_t^T a_i|$


Lower-Bounding the LHS (cont.)

Continuing, with $\max_{i \ne j}|a_i^T a_j| \le \mu(A)$ and $|x_1| \ge \ldots \ge |x_s|$:

    $\text{LHS} \ge |x_{min}| - |x_{max}|\,(s-1)\,\mu(A)$,

where $|x_{min}| = |x_s|$ and $|x_{max}| = |x_1|$.


Gathering the Bounds ...

    $\min_{1 \le i \le s} |b^T a_i| \ge |x_{min}| - |x_{max}|(s-1)\mu(A) > |x_{max}|\,s\,\mu(A) \ge \max_{j>s} |b^T a_j|$

The middle inequality is equivalent to

    $\frac{|x_{min}|}{|x_{max}|} > (s-1)\mu(A) + s\,\mu(A) \iff s < \frac{1}{2}\left(\frac{|x_{min}|}{|x_{max}|} \cdot \frac{1}{\mu(A)} + 1\right)$
THR Equivalence

We are given A and b defining the problem (P0):

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

and we deploy the THR algorithm for its solution.

Theorem: Given the above (P0), if the unknown to be found satisfies

    $\|x\|_0 = s < \frac{1}{2}\left(\frac{|x_{min}|}{|x_{max}|} \cdot \frac{1}{\mu(A)} + 1\right)$,

then THR is guaranteed to find it.
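The two cardinality bounds are cheap to compare numerically; a small sketch (the numbers are illustrative):

    def omp_bound(mu):
        return 0.5 * (1 + 1 / mu)

    def thr_bound(mu, x_min, x_max):
        # THR pays for the contrast x_min/x_max <= 1 of the non-zeros
        return 0.5 * ((x_min / x_max) / mu + 1)

    print(omp_bound(0.2), thr_bound(0.2, x_min=1.0, x_max=2.0))  # 3.0  1.75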


THR vs. OMP: Bounds

OMP condition: $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$
THR condition: $\|x\|_0 < \frac{1}{2}\left(\frac{|x_{min}|}{|x_{max}|} \cdot \frac{1}{\mu(A)} + 1\right)$

- The THR condition is weaker than the OMP one, showing a dependency on the unknown's "contrast" $|x_{min}|/|x_{max}|$.
- Is this sensitivity to the contrast true, or just an artifact of the proof?
- Answer: we present an experiment verifying that THR does not handle highly contrasted non-zeros well.
THR vs. OMP: Experiment

We experiment with the OMP and the THR algorithms following the very same test as before:
- A is a matrix of size 50×100.
- $x_0$ is a sparse vector of cardinality s = 5 and s = 10.
- The non-zeros have a varying contrast c in the range [1, 20].
- The experiment is averaged over 1000 trials.

[Figure: the non-zero magnitudes, with the levels 1 and 0.2c marked]


THR vs. OMP: Experiment (s=5)

[Figure: the average relative L2 error, $\|\hat{x}-x_0\|_2^2 / \|x_0\|_2^2$, versus the contrast c, for OMP and THR with s=5]

THR vs. OMP: Experiment (s=10)

[Figure: the same error measure versus the contrast c, for s=10]


Equivalence: Analyzing the Basis-Pursuit Algorithm – Part 1
Basis Pursuit (BP) Rationale

We approximate the solution of the problem

    $(P_0):\ \hat{x} = \arg\min_x \|x\|_0 \ \text{s.t.}\ b = Ax$

by solving instead

    $(P_1):\ \hat{x} = \arg\min_x \|x\|_1 \ \text{s.t.}\ b = Ax$

We aim to show that a sparse vector x is also the shortest w.r.t. the L1 measure.


BP: Analysis Approach

- We define a set C of all the possible solutions to b = Ax whose L1 length is shorter than (or even equal to) that of the sparse vector x we started with:

    $C = \{z : b = Ax = Az,\ z \ne x,\ \text{and}\ \|z\|_1 \le \|x\|_1\}$

- This set represents the possible solutions of BP that would be considered erroneous.
- Our strategy will be to inflate this set (and simplify it as a consequence), while showing that it is actually empty under sparsity conditions on x.
An Error-Driven Set C

    $C = \{z : b = Ax = Az,\ z \ne x,\ \text{and}\ \|z\|_1 \le \|x\|_1\}$

Rather than defining the set C w.r.t. candidate alternative solutions, let's redefine it w.r.t. the solution's error e, where z = x + e. Since b = Ax = Az implies Ae = 0:

    $C_e = \{e : 0 = Ae,\ e \ne 0,\ \text{and}\ \|x+e\|_1 \le \|x\|_1\}$
Simplifying Ce – Part 1 (1)

Let us focus on Ae = 0 and simplify it:

    $0 = Ae$
    $0 = A^T A e$    (multiply by $A^T$)
    $-e = (A^T A - I)e$    (subtract e from both sides)
    $|e| = |(A^T A - I)e| \le |A^T A - I| \cdot |e|$    (apply abs on both sides, using $|ax + by| \le |a||x| + |b||y|$)


Simplifying Ce – Part 1 (2)

    $|e| \le |A^T A - I| \cdot |e| \ \Rightarrow\ |e| \le \mu\,(\mathbf{1}\mathbf{1}^T - I)\,|e|$

The matrix $|A^T A - I|$ has several properties:
o It is non-negative, due to the absolute value.
o Its main diagonal contains zeros.
o All its off-diagonal entries are in the range [0, μ].

Thus, $|A^T A - I| \le \mu(\mathbf{1}\mathbf{1}^T - I)$, where $\mathbf{1}\mathbf{1}^T$ is a square matrix filled with ones.
Simplifying Ce – Part 1 (3)

    $|e| \le \mu(\mathbf{1}\mathbf{1}^T - I)|e| = \mu(\|e\|_1 \mathbf{1} - |e|) \ \Rightarrow\ (1+\mu)|e| \le \mu\|e\|_1 \mathbf{1} \ \Rightarrow\ |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1}$

- Observe that any e satisfying Ae = 0 satisfies the last inequality, but not vice versa. This means that

    $\{e : 0 = Ae\} \subseteq \left\{e : |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1}\right\}$

- We should be pleased with this replacement because:
  o A is replaced by its property μ.
  o The condition is posed w.r.t. |e|.


Simplifying Ce – Part 1 (4)

We started with this set of problematic vectors:

    $C_e = \{e : 0 = Ae,\ e \ne 0,\ \text{and}\ \|x+e\|_1 \le \|x\|_1\}$

and inflated it to

    $C_e^1 = \left\{e : |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1},\ e \ne 0,\ \text{and}\ \|x+e\|_1 \le \|x\|_1\right\}$

We now target the term $\|x+e\|_1 \le \|x\|_1$ and aim to simplify it to be stated in terms of |e| as well.
Simplifying Ce – Part 2 (1)

- We focus on the condition $\|x+e\|_1 - \|x\|_1 \le 0$.
- Writing these norms explicitly, we get

    $\sum_{i=1}^m \left(|x_i + e_i| - |x_i|\right) \le 0$.

- This sum can be divided into its on-support and off-support parts:

    $\sum_{i=1}^s \left(|x_i + e_i| - |x_i|\right) + \sum_{i>s} |e_i| \le 0$.


Simplifying Ce – Part 2 (2)

- We exploit the inequality $|x+y| - |x| \ge -|y|$ and apply it to the first part, resulting in

    $-\sum_{i=1}^s |e_i| + \sum_{i>s} |e_i| \ \le\ \sum_{i=1}^s \left(|x_i + e_i| - |x_i|\right) + \sum_{i>s} |e_i| \ \le\ 0$.

- Adding and subtracting $\sum_{i=1}^s |e_i|$ leads to

    $\|e\|_1 - 2 \cdot \mathbf{1}_S^T |e| \le 0$,

  where $\mathbf{1}_S^T = (1, 1, \ldots, 1, 0, \ldots, 0)$ is the m-element indicator of the first s entries.


Simplifying Ce – Part 2 (3)

    $\|x+e\|_1 - \|x\|_1 \le 0 \ \Rightarrow\ \|e\|_1 - 2 \cdot \mathbf{1}_S^T |e| \le 0$

- Here as well we get an inclusion of the form

    $\{e : \|x+e\|_1 - \|x\|_1 \le 0\} \subseteq \{e : \|e\|_1 - 2 \cdot \mathbf{1}_S^T |e| \le 0\}$,

  i.e., every e satisfying the original condition satisfies the new one, but not vice versa.
- We are pleased with this replacement because:
  o The condition is posed w.r.t. |e|.
  o The dependency on x is replaced by a simpler dependency on the support S.
Equivalence: Analyzing the Basis-Pursuit Algorithm – Part 2
Simplifying Ce – Summary

We started with this set of problematic vectors:

    $C_e = \{e : 0 = Ae,\ e \ne 0,\ \text{and}\ \|x+e\|_1 \le \|x\|_1\}$

and inflated it, in two stages, to

    $C_e^1 = \left\{e : |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1},\ e \ne 0,\ \text{and}\ \|x+e\|_1 \le \|x\|_1\right\}$
    $\subseteq\ C_S = \left\{e : |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1},\ e \ne 0,\ \text{and}\ \|e\|_1 - 2 \cdot \mathbf{1}_S^T |e| \le 0\right\}$
Scale-Invariance

    $C_S = \left\{e : |e| \le \frac{\mu}{1+\mu}\|e\|_1 \mathbf{1},\ e \ne 0,\ \text{and}\ \|e\|_1 - 2 \cdot \mathbf{1}_S^T |e| \le 0\right\}$

- Observe that if $e \in C_S$ then $\alpha e \in C_S$ for all $\alpha \ne 0$.
- Since our aim is to investigate whether $C_S$ is empty or not, it is sufficient to consider the intersection of this set with the unit L1 sphere.
- Thus, we impose $\|e\|_1 = 1$, getting

    $C_N = \left\{e : \|e\|_1 = 1,\ |e| \le \frac{\mu}{1+\mu}\mathbf{1},\ 1 - 2 \cdot \mathbf{1}_S^T |e| \le 0\right\}$


Last Step

    $C_N = \left\{e : \underbrace{\|e\|_1 = 1}_{\text{Condition 1}},\ \underbrace{|e| \le \frac{\mu}{1+\mu}\mathbf{1}}_{\text{Condition 2}},\ \underbrace{1 - 2 \cdot \mathbf{1}_S^T |e| \le 0}_{\text{Condition 3}}\right\}$

- Condition 3 requires concentrating as much energy as possible in the support elements of e.
- Condition 2 gives an upper bound on these entries.
- Thus, choosing each of the s elements in the support as $\frac{\mu}{1+\mu}$, Condition 3 is violated (and $C_N$ is empty) if

    $1 - 2s\frac{\mu}{1+\mu} > 0 \iff s < \frac{1}{2}\left(1 + \frac{1}{\mu}\right)$


BP Equivalence

We are given A and b defining the problem (P0):

    $(P_0):\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

and we deploy the Basis Pursuit for its solution.

Theorem: Given the above (P0), if the unknown to be found satisfies

    $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$,

then Basis-Pursuit is guaranteed to find it.


BP vs. OMP: Bounds

OMP condition: $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$
BP condition: $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$

- The two bounds are the same!!
- Does this mean that the two algorithms are equivalent? Definitely not.
- The above implies that up to the specified bound, the two algorithms provide perfect recovery, and beyond it, each has its own pattern of mistakes.


Theory vs. Reality?

Remember this graph? [Figure: the relative L2 error $\|\hat{x}-x_0\|_2^2/\|x_0\|_2^2$ versus s, showing perfect recovery for s < 8.] How are these results related to the obtained bound

    $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$ ?

Answer: In this experiment, $\mu \approx 0.48$, implying that this bound predicts success only for s < 1.5.

Too Pessimistic!
