
PD3O formulations and Stochastic Optimisation #2021

Open
epapoutsellis opened this issue Dec 19, 2024 · 1 comment · May be fixed by #2043
epapoutsellis (Contributor) commented Dec 19, 2024

In the update method of the PD3O algorithm, the gradient of the function f is evaluated twice:

```python
self.f.gradient(self.x_old, out=self.grad_f)
self.f.gradient(self.x, out=self.x_old)
```
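For reference, the two gradient calls sit in an update of roughly this shape. This is a minimal NumPy sketch of the (5a)-(5c)-style iteration for min f(x) + g(x) + h(Ax); the callables, buffer names, and step sizes are illustrative assumptions, not CIL's actual API:

```python
import numpy as np

def pd3o_update(x_old, s_old, grad_f, prox_g, prox_h_conj, A, At, gamma, delta):
    """One PD3O iteration in which grad_f is evaluated TWICE (sketch)."""
    grad_old = grad_f(x_old)                       # first gradient call, at x_old
    x = prox_g(x_old - gamma * grad_old - gamma * At(s_old), gamma)
    grad_new = grad_f(x)                           # second gradient call, at the new x
    x_bar = 2 * x - x_old + gamma * (grad_old - grad_new)
    s = prox_h_conj(s_old + delta * A(x_bar), delta)
    return x, s

# toy demo: f(x) = 0.5||x - b||^2 (gradient x - b), g = 0, h = 0
b = np.array([1.0, -2.0])
x, s = np.zeros(2), np.zeros(2)
for _ in range(200):
    x, s = pd3o_update(x, s,
                       grad_f=lambda v: v - b,
                       prox_g=lambda v, t: v,          # g = 0: prox is the identity
                       prox_h_conj=lambda v, t: 0 * v, # h = 0: the dual variable stays 0
                       A=lambda v: v, At=lambda v: v,
                       gamma=0.5, delta=0.5)
```

With g = h = 0 the iteration reduces to gradient descent on f, so x converges to b; the point of the sketch is only to show where the two gradient evaluations occur.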

If f is an instance of ApproximateGradientSumFunction, then f.gradient calls approximate_gradient, which
a) selects a function index based on the selection method (the sampler), and
b) updates the .data_passes attribute:

```python
self.function_num = self.sampler.next()
self._update_data_passes_indices([self.function_num])
```
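The consequence can be seen with a toy stand-in for ApproximateGradientSumFunction (a sketch; the class name and attributes mimic the description above, not the real CIL classes): every call to gradient advances the sampler, so the two calls inside one PD3O iteration generally select different f_i and log two data-pass entries.

```python
import random

class ToySumFunction:
    """Toy stand-in for ApproximateGradientSumFunction (names are assumed)."""
    def __init__(self, num_functions, seed=0):
        self.num_functions = num_functions
        self.rng = random.Random(seed)
        self.data_passes_indices = []   # which f_i each gradient call touched

    def gradient(self, x):
        function_num = self.rng.randrange(self.num_functions)  # plays the role of sampler.next()
        self.data_passes_indices.append(function_num)          # _update_data_passes_indices
        return x   # the actual gradient value is irrelevant for this illustration

f = ToySumFunction(10)
f.gradient(0.0)   # first call in update(): selects some f_i
f.gradient(0.0)   # second call: draws a NEW index, generally a different f_j
```

After one simulated iteration, `f.data_passes_indices` holds two entries for what should be a single subset selection.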

The implementation that we have is based on equations (5a)-(5c) from https://arxiv.org/pdf/1611.09805

[image: equations (5a)-(5c) from the paper]

and NOT on (4a)-(4c):

[image: equations (4a)-(4c) from the paper]

The two formulations are equivalent (I believe I have an implementation of (4a)-(4c)), and both are used in the paper to derive specific subcases of the PD3O algorithm, e.g. PDHG, PAPC.

The stochastic version of PD3O proposed in https://arxiv.org/pdf/2004.02635 follows the (4a)-(4c) formulation, where the gradient of f is computed ONCE, at $x^{k}$.

[image: stochastic PD3O update from the paper]

When we compute the gradient the second time,

```python
self.f.gradient(self.x, out=self.x_old)
```

another function index is selected and data_passes is updated incorrectly. For instance, on the first call $f_{5}$ may be selected, so we compute $\nabla f_{5}(x_{k})$, and on the second call we instead compute $\nabla f_{9}(x_{k+1})$. So far I have not seen any actual convergence issue, but the data passes are certainly wrong.

I also need to re-check the update method itself, because its order differs between the paper above and the implementation here.

@epapoutsellis epapoutsellis self-assigned this Dec 19, 2024
MargaretDuff (Member) commented Jan 15, 2025

Discussed this today with @epapoutsellis, @paskino and @jakobsj. Thanks @epapoutsellis for all your work on this.

To summarise the discussion, the major issue we identified is that in our implementation of PD3O the gradient of $f$ is calculated twice per iteration, once as $\nabla f(x)$ and once as $\nabla f(x^\dagger)$:

[image: PD3O update steps]

When doing stochastic PD3O, for both calls we should use the same approximation of the gradient (i.e. for SGD the approximate gradient should be calculated on the same subset).
As a secondary, more minor problem, making two calls to the gradient on each iteration means that our data passes calculation may not be correct.

One solution (for SGD, SAG, and SAGA) is to replace this line in PD3O

```python
self.f.gradient(self.x, out=self.x_old)
```

with

```python
if isinstance(self.f, ApproximateGradientSumFunction):
    self.f.approximate_gradient(self.function_num, self.x, out=self.x_bar)
else:
    self.f.gradient(self.x, out=self.x_bar)
```

For SVRG and LSVRG, this will need more careful thought when dealing with full gradient snapshot updates and on calculating data passes (as this is done in the approximate_gradient function).
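The idea behind the fix can be sketched with toy classes (a hypothetical sketch; the class, method, and attribute names mirror the discussion above but are assumptions, not CIL's exact API): sample the function index once per iteration and pass it explicitly to approximate_gradient for BOTH evaluations, so the same subset is used and only that subset's data passes are logged.

```python
class ToySampler:
    """Deterministic sampler for illustration; next() yields preset indices."""
    def __init__(self, indices):
        self._it = iter(indices)
    def next(self):
        return next(self._it)

class ToyApproximateSumFunction:
    """Toy f(x) = sum_i f_i(x) with f_i(x) = a_i * x, so grad f_i = a_i."""
    def __init__(self, coeffs, sampler):
        self.coeffs = coeffs
        self.sampler = sampler
        self.data_passes_indices = []
    def approximate_gradient(self, function_num, x):
        self.data_passes_indices.append(function_num)
        return self.coeffs[function_num]   # d/dx of a_i * x

def stochastic_pd3o_gradients(f, x_old, x_new):
    """Both gradient evaluations of one iteration reuse the SAME sampled index."""
    function_num = f.sampler.next()                        # sampled ONCE per iteration
    g_old = f.approximate_gradient(function_num, x_old)
    g_new = f.approximate_gradient(function_num, x_new)    # same subset reused
    return g_old, g_new

f = ToyApproximateSumFunction([1.0, 2.0, 3.0], ToySampler([2, 0]))
g_old, g_new = stochastic_pd3o_gradients(f, 0.0, 1.0)   # both calls use f_2
```

Because the index is drawn outside approximate_gradient, the data-pass log records the same subset twice per iteration, which can then be counted once; this is the part that needs extra care for SVRG/LSVRG snapshot updates.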

@MargaretDuff MargaretDuff self-assigned this Jan 15, 2025
@MargaretDuff MargaretDuff moved this from Blocked to In Progress in CIL work Jan 20, 2025
@MargaretDuff MargaretDuff linked a pull request Jan 20, 2025 that will close this issue