Nonlinear Optimization

Generalized Projection, PIE and COPRA have been implemented to support various forms of nonlinear optimization techniques. The scope and usage of these are explained in this section.

Linesearch and Adaptive-Stepsize

In order to accelerate convergence a "backtracking" and a "zoom" linesearch are implemented [31]. In addition a generalization of the Polyak-Stepsize [32] is available. The implementations are located in pulsedjax.core.stepsize.py.
The backtracking linesearch works by reducing the stepsize by a factor until the Armijo-condition is fullfilled or the maximum number of steps is reached. The stepsize is updated via \(\gamma' = \gamma\cdot\Delta\gamma\). The Armijo-Condition reads:

\( \Delta L \leq c_1\gamma\cdot p\top\cdot\nabla L\)

where \(\Delta L\) is the change in the error, \(p\) is the descent direction and \(c_1\) is a constant that tunes the condition. In short, the condition checks if the error has decreased beyond a linear approximation whose slope is set by \(c_1\). As long as \(0<c_1<1\) a stepsize that fulfills the condition should exist.
The zoom-linesearch expands on this by requiring the Strong-Wolfe Condition on top of the Armijo-Condition. This condition reads:

\(|p\top\cdot\nabla L'|\leq c_2 |p\top\cdot\nabla L|\)

It requires that not only the error has to decrease sufficiently but that also the slope of the error along the search direction should decrease. This enforces a stepsize sufficient to reach close to a minimum along the search direction. The amount of reduction in the slope is controlled via \(c_2\). In order for a solution to exist \(0<c_1<c_2<1\) is required.
The zoom linesearch works in two phases. A finding-phase and a zoom-phase. In the finding-phase the step size is increased using \(\gamma' = \gamma\cdot\Delta\gamma\). If the stepsize is increased beyond a minimum along the descent direction the zoom phase begins. In this phase the local minimum is obtained through successive cubic interpolation along the descent direction. In cases where the cubic interpolation fails the algorithm falls back onto a bisection search.

The lineserch parameters can be either set directly as class attributes, or for simplicty via a setter-function as shown below.
It is hardcoded that linesearches will only be performed during global iterations.

algorithm.set_linesearch(method = False, 
                         max_steps = 10, 
                         c1 = 1e-4, c2 = 0.9, 
                         delta_gamma = 0.5)

An alternative to a relative expensive linesearch is presented via an adaptive stepsize, such as the Polyak-Stepsize. The idea is to expand the error function along the descent direction in a Taylor- or Padé-Series. Subsequently a target error \(L'\) can be defined and a stepsize to achieve this error can be obtained. This can be done using a linear or quadratic Taylor expansion or a Padé approximant of order [0/1], [1/1] or [0/2]. In addition a damping factor can be defined which avoids division by zero.
The method and target error reduction can be defined via class attributes or via a setter-function as shown below.

algorithm.set_adaptive_stepsize(local_method = False, global_method = False, # can be any of "pade_nm"
                                
                                # -1 is the default for COPRA
                                # corresponds to an error reduction of 200%
                                # 0 would be 100%, 1 would be 0% reduction
                                local_factor = -1, global_factor = -1,
                                damping = 1e-12)
        

Nonlinear Conjugate-Gradient

The Conjugate Gradient method is used to iteratively solve linear equations. The Nonlinear Conjugate-Gradient Method (NCG, [31]) heuristically expands this approach to minimize nonlinear equations. The descent direction \(p\) is defined as:

\(p_n = -\nabla L + \beta p_{n-1}\)

Essentially, if a linesearch is used the error will be minimized along the previous descent direction \(p_{n-1}\). Thus the steepest descent at the new position is perpendicular to \(p_{n-1}\). Preserving the previous descent direction inhibits a zig-zag motion in the descent and improoves convergence behaviour.
The parameter \(\beta\) can be computed through various formulas which are based on the (linear) Conjugate Gradient method. The available methods to calculate \(\beta\) are "fletcher_reeves", "polak_ribiere", "hestenes_stiefel" and "dai_yuan". The implementation of NCG can be found in pulsedjax.core.nonlinear_cg.py.
NCG can be enabled as shown below.

algorithm.local_conjugate_gradients = "fletcher_reeves" # for local iterations
algorithm.global_conjugate_gradients = "fletcher_reeves" # for global iterations

(Quasi)-Newton Methods

All mentioned algorithms support three Newton-like optimization approaches. One of them is L-BFGS [31, 33]. It is implemented in pulsedjax.core.lbfgs.py.
The core idea of BFGS is to avoid an explicit calculation and inversion of the Hessian. Instead the Hessian inverse can be approximated via past gradients. L-BFGS limits the memory of this process to a fixed number of past iterations.

The remaining two are based on the Pseudo-Hessian [34]. Since the search space is complex valued the true hessian is a block matrix of the form:

\(H = \begin{bmatrix} \partial_{zz} & \partial_{z\bar{z}} \\\ \partial_{\bar{z}z} & \partial_{\bar{z}\bar{z}} \end{bmatrix}\)

where \(\partial_z\) and \(\partial_{\bar{z}}\) refer to the Wirtinger derivatives. In order to relief some numerical burden the Pseudo-Hessian approximates the cross-derivatives as zero.

\(H \approx H_{pseudo} = \begin{bmatrix} \partial_{zz} & 0 \\\ 0 & \partial_{\bar{z}\bar{z}} \end{bmatrix}\)

Based on this a damped Newton method is implemented. However, since a matrix inverse at each iteration is still quite expensive the Pseudo-Hessian may be approximated via its diagonal.
Thus the available options are "lbfgs", "diagonal" or "full". In the case of L-BFGS the memory can be tuned. If the full Pseudo-Hessian is used, different solver options to calculate the matrix inverse are available. These are "scipy" and "lineax". Alternatively a specific lineax-solver can be provided. This includes iterative solvers, which will use a diagonal preconditioner and the previous solution as an initial guess.
The parameters can be controlled as class attributes or via a setter-function.

from lineax import GMRES
algorithm.set_nonlinear_optimization(local_method = "lbfgs", global_method = "full", 
                                     damping = 1e-3, 
                                     memory = 10, 
                                     solver = GMRES(rtol=1e-3, atol=1e-3))
        

Updating the nonlinear Signal-Field

The COPRA [15] has shown that updating of the nonlinear signal field does not need to be a complete projection in order to perform a successful phase retrieval. Instead an approximate iterative updating is sufficient and maybe even preferential. In pulsedjax.core.construct_s_prime.py the projection as well as the iterative update is implemented such that all algorithms may make use of either one.
In addition an alternative definition to the r-error based on the trace magnitude instead of the intensity as well as an diagonal Newton approximation have been implemented.
The parameters can be controlled as class attributes or via a setter-function. In the case of the latter all attributes are overwritten such that algorithms like COPRA may use "projection" by accident.

algorithm.set_S_prime_params(local_method = "projection", global_method = "iteration", 
                             gradient = "intensity", # "amplitude"
                             newton = False, # True for a diagonal Newton step
                             no_iterations = 1) # the number of iterations to perform, 1 should be sufficient.