Optimization Techniques Laboratory
Two practical labs focused on implementing and comparing classical and modern optimization algorithms for unconstrained problems and sparse recovery.
Project Overview
This pair of labs for the "Optimization Techniques" course provided hands-on implementations of core numerical optimization methods in Python using Jupyter Notebooks. The projects progressed from foundational gradient-based algorithms to advanced techniques such as L1 regularization for solving real-world signal processing problems.
Lab 1: Unconstrained Nonlinear Optimization
- Situation: Many problems in engineering and machine learning can be framed as minimizing an objective function. For quadratic functions of the form $f(x) = ||Ax-b||_2^2$, several iterative, gradient-based methods exist with different convergence properties.
- Task: To implement and compare the performance of three fundamental optimization algorithms: Steepest Descent (with optimal step size), the Heavy Ball method, and the Conjugate Gradient method.
- Action: I implemented each algorithm from scratch in Python. For each method, I coded the iterative update rule to generate a sequence of iterates $\{x_k\}$ that converges to the minimizer. A key part was deriving and implementing the optimal step size for Steepest Descent and the optimal parameters for the Heavy Ball method.
- Result: The final analysis and plots clearly demonstrated the superior convergence rate of the Conjugate Gradient and Heavy Ball methods over the basic Steepest Descent, which exhibited classic zigzagging behavior. The Conjugate Gradient method converged in the fewest iterations, validating its theoretical efficiency for quadratic problems.
Technical Deep Dive
The core of this lab was the implementation of the update rules for each algorithm; a condensed sketch of all three follows the list below.
- Steepest Descent: The update is $x_{k+1} = x_k - t_k \nabla f(x_k)$. I derived the optimal step size by analytically solving the exact line search $\min_t f(x_k - t \nabla f(x_k))$, which for this quadratic yields $t_k = \frac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T \nabla^2 f(x_k)\, \nabla f(x_k)}$, with Hessian $\nabla^2 f(x) = 2A^T A$.
- Heavy Ball Method: This method adds a "momentum" term to the update rule: $x_{k+1} = x_k - t \nabla f(x_k) + \beta(x_k - x_{k-1})$. The challenge was to implement the optimal constant parameters $t$ and $\beta$, which depend on the smallest ($\lambda_{min}$) and largest ($\lambda_{max}$) eigenvalues of the Hessian $\nabla^2 f = 2A^T A$, as derived in the course.
- Conjugate Gradient: This method constructs a sequence of search directions $d_k$ that are mutually conjugate with respect to the Hessian. The implementation involved iteratively computing the step size $\alpha_k$, the next iterate $x_{k+1}$, the residual $r_{k+1}$, and the next search direction $d_{k+1} = -r_{k+1} + \beta_k d_k$.
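A condensed, self-contained sketch of the three update rules is shown below. It assumes the objective $f(x) = ||Ax-b||_2^2$ on a small, randomly generated test problem (the lab's actual data are not reproduced here), and the heavy-ball parameters use the standard Polyak formulas based on the extreme Hessian eigenvalues, which is my assumption for how "optimal" was defined in the course:

```python
import numpy as np

# Hypothetical test problem; the lab's actual A and b are not reproduced here.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
H = 2.0 * A.T @ A                      # Hessian of f(x) = ||Ax - b||_2^2

def grad(x):
    """Gradient of f(x) = ||Ax - b||_2^2."""
    return 2.0 * A.T @ (A @ x - b)

def steepest_descent(x, iters=200):
    for _ in range(iters):
        g = grad(x)
        t = (g @ g) / (g @ H @ g)      # Exact line-search step for a quadratic
        x = x - t * g
    return x

def heavy_ball(x, iters=200):
    lam = np.linalg.eigvalsh(H)        # Eigenvalues in ascending order
    lo, hi = lam[0], lam[-1]
    t = 4.0 / (np.sqrt(hi) + np.sqrt(lo)) ** 2                               # step size
    beta = ((np.sqrt(hi) - np.sqrt(lo)) / (np.sqrt(hi) + np.sqrt(lo))) ** 2  # momentum
    x_prev = x.copy()
    for _ in range(iters):
        x, x_prev = x - t * grad(x) + beta * (x - x_prev), x
    return x

def conjugate_gradient(x):
    r = grad(x)                        # Residual equals the gradient here
    d = -r
    for _ in range(x.size):            # At most n steps in exact arithmetic
        alpha = (r @ r) / (d @ H @ d)
        x = x + alpha * d
        r_new = r + alpha * (H @ d)
        d = -r_new + (r_new @ r_new) / (r @ r) * d   # Next Hessian-conjugate direction
        r = r_new
    return x

x0 = np.zeros(A.shape[1])
for solver in (steepest_descent, heavy_ball, conjugate_gradient):
    print(solver.__name__, np.linalg.norm(grad(solver(x0.copy()))))
```

The final loop simply reports the gradient norm reached by each method from the same starting point, which is the kind of comparison the convergence plots in the lab were based on.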
Lab 2: Sparse Recovery using L1 Regularization (LASSO)
- Situation: In signal processing, we often want to find a sparse representation of a signal within a dictionary of basis functions. Standard least squares ($||Ax-b||_2^2$) finds a solution but does not enforce sparsity, i.e., a solution in which most elements of $x$ are zero.
- Task: To use L1 regularization (the LASSO problem) to identify the constituent notes of a musical chord from its time-domain signal. The goal was to recover a sparse vector $x$ where non-zero elements correspond to the active notes in a dictionary $A$.
- Action: I first constructed a dictionary matrix `A` whose columns were sampled sine and cosine waves at the frequencies of a musical scale (a sketch of this construction follows this list). I then formulated the optimization problem as $\min_x ||Ax - b||_2^2 + c||x||_1$, where $b$ is the recorded chord signal and $c$ is the regularization parameter. This problem was solved using SciPy's `minimize` function.
- Result: The L1-regularized approach was highly successful. Unlike simple least squares, it produced a sparse solution vector where only the coefficients corresponding to the actual notes present in the chord were non-zero. By tuning the regularization parameter `c`, I was able to effectively filter out noise and perfectly identify the notes.
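A minimal sketch of how such a dictionary could be assembled is shown below. The sampling rate and matrix shape follow the lab description, but the base pitch is an assumption (chosen under the 100 Hz Nyquist limit of a 200 Hz sampling rate), and the equal-tempered frequency formula may differ from the lab's actual note table:

```python
import numpy as np

fs = 200.0                               # Sampling rate in Hz, per the lab setup
t = np.arange(401) / fs                  # 2 seconds at 200 Hz -> 401 samples

# Hypothetical pitches: 12 equal-tempered semitones above an assumed base frequency,
# kept below the 100 Hz Nyquist limit of the 200 Hz sampling rate.
base = 40.0
freqs = base * 2.0 ** (np.arange(12) / 12.0)

# Dictionary: 12 cosine columns followed by 12 sine columns -> shape (401, 24).
A = np.hstack([np.cos(2 * np.pi * freqs * t[:, None]),
               np.sin(2 * np.pi * freqs * t[:, None])])
print(A.shape)   # (401, 24)
```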
Technical Deep Dive
The key insight of this lab is that the L1-norm penalty encourages sparsity in the solution vector. The problem was to find the notes in a chord, which is equivalent to solving for $x$ in the overdetermined system $Ax \approx b$.
- Dictionary Construction: I created a matrix `A` of size `(401, 24)`. The first 12 columns were cosine waves and the last 12 were sine waves, each corresponding to one of the 12 notes in an octave, sampled at 200 Hz for 2 seconds.
- L1 vs. L2 Regularization: A preliminary analysis showed why L2 regularization (Ridge regression) is unsuitable for this task: it shrinks all coefficients towards zero but rarely makes any of them exactly zero. The L1 penalty, whose unit ball has corners on the coordinate axes, drives many coefficients exactly to zero and therefore produces truly sparse solutions.
- Solving with SciPy: The LASSO objective function is non-differentiable at zero, so standard gradient descent is not sufficient. The lab used SciPy's `minimize` function with the SLSQP (Sequential Least Squares Programming) solver, which can handle the problem once it is expressed with a smooth objective and constraints. By providing the objective function, its Jacobian, and the appropriate constraints, the solver efficiently found the sparse coefficient vector representing the musical notes; a sketch of one such setup follows below.
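For concreteness, here is one way the SLSQP call could be set up; this is a sketch, not necessarily the lab's exact formulation. The non-smooth $||x||_1$ term is replaced by auxiliary variables $u_i \ge |x_i|$, which turns the LASSO into a smooth, inequality-constrained problem that `scipy.optimize.minimize` accepts. The helper name `lasso_slsqp`, the synthetic test signal, and the value of `c` are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def lasso_slsqp(A, b, c):
    """Solve min_x ||Ax - b||^2 + c * ||x||_1 via a smooth reformulation.

    Variables are z = [x, u] with constraints u_i >= x_i and u_i >= -x_i,
    so at the optimum u_i = |x_i| and the objective is differentiable.
    """
    m, n = A.shape

    def objective(z):
        x, u = z[:n], z[n:]
        r = A @ x - b
        return r @ r + c * np.sum(u)

    def jac(z):
        x = z[:n]
        return np.concatenate([2.0 * A.T @ (A @ x - b), c * np.ones(n)])

    constraints = [
        {"type": "ineq", "fun": lambda z: z[n:] - z[:n]},   # u - x >= 0
        {"type": "ineq", "fun": lambda z: z[n:] + z[:n]},   # u + x >= 0
    ]
    res = minimize(objective, np.zeros(2 * n), jac=jac, constraints=constraints,
                   method="SLSQP", options={"maxiter": 500})
    return res.x[:n]

# Illustrative check on a synthetic two-"note" signal (not the lab's recording).
rng = np.random.default_rng(1)
A = rng.standard_normal((401, 24))
x_true = np.zeros(24)
x_true[[3, 15]] = [1.0, 0.7]
b = A @ x_true + 0.01 * rng.standard_normal(401)
x_hat = lasso_slsqp(A, b, c=5.0)
print(np.round(x_hat, 2))
```

An equivalent alternative is to split $x$ into its positive and negative parts with nonnegativity bounds; either reformulation exposes a differentiable objective and linear constraints, which is what makes SLSQP applicable.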