
Split FISTA and APGD to give more options for momentum #2061

Draft
wants to merge 11 commits into
base: master

Conversation

MargaretDuff
Member

@MargaretDuff MargaretDuff commented Feb 3, 2025

A draft PR for discussion

Changes

Splits the APGD and FISTA algorithms. This includes:

  • FISTA is now a child class of APGD
  • APGD has more options for momentum
  • A new momentum parent class (similar to step-size methods or callbacks) in cil.optimisation.utilities
  • The FISTA algorithm retains backwards compatibility
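The PR itself does not show the new parent class here, but a minimal sketch of what a callable momentum utility in this style could look like (the class names, `__call__` signature, and `NesterovMomentum` rule below are illustrative assumptions modelled on CIL's step-size methods, not the PR's actual API) is:

```python
from abc import ABC, abstractmethod

class Momentum(ABC):
    '''Hypothetical base class for momentum rules, mirroring the
    step-size-method pattern in cil.optimisation.utilities.'''

    @abstractmethod
    def __call__(self, algorithm):
        '''Return the scalar momentum coefficient for this iteration.'''

class NesterovMomentum(Momentum):
    '''FISTA's classic rule: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,
    with coefficient (t_k - 1) / t_{k+1}.'''

    def __init__(self, t=1):
        self.t = t

    def __call__(self, algorithm):
        t_old = self.t
        self.t = 0.5 * (1 + (1 + 4 * t_old**2) ** 0.5)
        return (t_old - 1) / self.t
```

An APGD instance would then call `self.momentum(self)` once per iteration to obtain the extrapolation coefficient.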

Would still need to add:

  • unit tests
  • documentation

Testing you performed

Please add any demo scripts to https://github.com/TomographicImaging/CIL-Demos/tree/main/misc

Related issues/links

Checklist

  • I have performed a self-review of my code
  • I have added docstrings in line with the guidance in the developer guide
  • I have updated the relevant documentation
  • I have implemented unit tests that cover any new or modified functionality
  • CHANGELOG.md has been updated with any functionality change
  • Request review from all relevant developers
  • Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
  • I confirm that the contribution does not violate any intellectual property rights of third parties

@MargaretDuff MargaretDuff linked an issue Feb 3, 2025 that may be closed by this pull request
@MargaretDuff
Member Author

MargaretDuff commented Feb 3, 2025

Thoughts on implementation

  • What should the defaults for APGD be? - currently, they default to FISTA defaults
  • Should APGD and PGD be separate or could we have options for momentum, defaulting to no momentum, in PGD?
  • What should the current FISTA.py file be renamed to? Perhaps something about proximal gradient descent algorithms?

@MargaretDuff MargaretDuff marked this pull request as ready for review February 4, 2025 13:46
@MargaretDuff
Member Author

Current documentation (I will fix the maths rendering):
[screenshot of the rendered documentation]

@MargaretDuff MargaretDuff mentioned this pull request Feb 4, 2025
@MargaretDuff
Member Author

#2057 (comment)

From @samdporter
"
Hey all,
I've been having a look at implementing a few different types of momentum within this new framework. I've just come across one that Nesterov outlines in this paper and is suggested for use with ordered subsets by Kim et al. in this paper. It uses an accumulation of gradients as opposed to the previous two iterates alone and so might lend itself to stochastic reconstruction (I'll have a look at it once I figure out how to implement it).

The problem is that it can't be implemented easily within the currently suggested framework for momentum. Could we instead do something similar to the Preconditioner.apply method, where Momentum.apply spits out the momentum-corrected update as opposed to just the momentum coefficient?

i.e. rather than (at the end of the APGD.update method)

self.x.subtract(self.x_old, out=self.y)          # y = x - x_old
momentum = self.momentum(self)                   # scalar momentum coefficient
self.y.sapyb(momentum, self.x, 1.0, out=self.y)  # y = momentum * (x - x_old) + x

we would have something like

self.momentum.apply(self, out = self.y)

which can then accommodate more complex types of momentum.
What do you think?

P.S. Would it be more appropriate to create a discussion for this?"
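To make the two interfaces in the quoted suggestion concrete, here is a plain-NumPy sketch (illustrative class names, not CIL API) contrasting a coefficient-returning momentum with a Preconditioner-style `apply` that writes the whole extrapolated point, which is what would let gradient-accumulating momentum fit:

```python
import numpy as np

class CoefficientMomentum:
    '''Current proposal: return only a scalar coefficient.'''
    def __init__(self):
        self.t = 1.0
    def __call__(self, algorithm):
        t_old, self.t = self.t, 0.5 * (1 + np.sqrt(1 + 4 * self.t**2))
        return (t_old - 1) / self.t

class ApplyStyleMomentum:
    '''Suggested alternative: compute the extrapolated point directly,
    so rules that keep internal state (e.g. accumulated gradients)
    are not forced through a single scalar.'''
    def __init__(self):
        self.t = 1.0
    def apply(self, algorithm, out):
        t_old, self.t = self.t, 0.5 * (1 + np.sqrt(1 + 4 * self.t**2))
        coeff = (t_old - 1) / self.t
        # out = x + coeff * (x - x_old), written with plain NumPy here
        np.subtract(algorithm.x, algorithm.x_old, out=out)
        out *= coeff
        out += algorithm.x
```

Both produce identical iterates for Nesterov momentum; only the second interface generalises to updates that are not a scalar multiple of `x - x_old`.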

@epapoutsellis
Contributor

@samdporter At the moment, we can implement the Table III algorithm using the FISTA algorithm with preconditioner=D, f=SGFunction for $\Psi_{m}$ (which actually has $M$ in front of the gradient, matching our convention of no $1/M$ in front of the finite sum) and g=IndicatorBox. For the SGFunction we need an ordered Sampler, and step_size=1.
The algorithm in Table IV introduces one more variable and has two projection steps, and I do not think it fits the update method of FISTA/APGD.

[screenshot: Table III and Table IV algorithms]

Your suggestion can indeed cover more complex cases of momentum, for example the Optimised Gradient Method (OGM), which is actually faster than NAG, here.
[screenshot: OGM algorithm]
In the same paper, the authors present FGM2, which is the same as the Table IV algorithm, and the new version OGM2. We also have the proximal version, Proximal OGM (POGM), here.
[screenshot: POGM algorithm]

The issue here is more one of software design and how we want our users to apply these algorithms.

  • self.momentum.apply can be used and can cover OGM (if g=0 or g is a projection) or POGM if g is proximable.
  • We cannot have FGM2 or OGM2 unless we edit the update method of FISTA and introduce another parameter (line 7, OGM1).

I am more in favour of creating new algorithms that implement the (P)OGM algorithms and cover all the cases above. Otherwise we need to edit the update method of FISTA and document all the possible combinations.

The initial value for the momentum coefficient.
'''

def __init__(self, t= 1):

Do we need t=1?

I think self.t=1 fits better in the __init__ as before, because it is one of the parameters needed to initialise the algorithm.

m = 50
A = np.random.normal(0,1, (m, n)).astype('float32')
# A /= np.linalg.norm(A, axis=1, keepdims=True)

You can remove the rescaling.

@@ -45,7 +45,7 @@ An algorithm is designed for a particular generic optimisation problem accepts a
instances of :code:`Function` derived classes and/or :code:`Operator` derived classes as input to
define a specific instance of the generic optimisation problem to be solved.
They are iterable objects which can be run in a for loop.
The user can provide a stopping criterion different than the default max_iteration.
The user can provide a stopping criterion different than the defaul.
default


@epapoutsellis epapoutsellis left a comment


Very minor comments

@MargaretDuff MargaretDuff marked this pull request as draft February 5, 2025 14:00
Successfully merging this pull request may close these issues.

Widen options for momentum for our APGD class