-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PD3O formulations and Stochastic Optimisation #2021
Comments
Discussed this today with @epapoutsellis, @paskino and @jakobsj today - thanks @epapoutsellis for all your work on this. To summarise the discussion, we identified the major issue that in our implementation of PD3O the gradient of When doing stochastic PD3O, for both calls we should use the same approximation of the gradient (i.e. for SGD the approximate gradient should be calculated on the same subset). One solution (for SGD, SAG, and SAGA) is to replace this line in PD3O
if isinstance(self.f, ApproximateGradientSumFunction):
self.f.approximate_gradient(self.func_num, self.x, out=self.x_bar)
else:
self.f.gradient(self.x, out=self.x_bar) For SVRG and LSVRG, this will need more careful thought when dealing with full gradient snapshot updates and on calculating data passes (as this is done in the |
In the update method of PD3O algorithm, we have two gradient methods for the function
f
:CIL/Wrappers/Python/cil/optimisation/algorithms/PD3O.py
Line 121 in d58c330
CIL/Wrappers/Python/cil/optimisation/algorithms/PD3O.py
Line 128 in d58c330
If
f
is an instance ofApproximateGradientSumFunction
, then thef.gradient
method calls theapproximate_gradient
and does:a) selects a function number based on the
selection
methodb) updates the
.data_passes
attribute.CIL/Wrappers/Python/cil/optimisation/functions/ApproximateGradientSumFunction.py
Lines 182 to 184 in d58c330
The implementation that we have is based on equations 5a-5c from https://arxiv.org/pdf/1611.09805.
![Image](https://private-user-images.githubusercontent.com/22398586/397429243-7c6091fe-8dc4-4c73-a9ad-0ff727b4940d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjM1MzgsIm5iZiI6MTczOTIyMzIzOCwicGF0aCI6Ii8yMjM5ODU4Ni8zOTc0MjkyNDMtN2M2MDkxZmUtOGRjNC00YzczLWE5YWQtMGZmNzI3YjQ5NDBkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM1MTA3YzY0OWE1MTk1M2NlMTNlYTg5NDVlMTliOGE4ZDYyNzZmNjM2NjQ1NzhkYTM1YTQyYjQxNGE5ZDkwMzcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.vpGCMQMqffmqhRzThlZaP8tOH4esdlhlqy3zjmiC86E)
![Image](https://private-user-images.githubusercontent.com/22398586/397429330-68d7b22d-084d-4f20-8d44-2dc71e2a0a73.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjM1MzgsIm5iZiI6MTczOTIyMzIzOCwicGF0aCI6Ii8yMjM5ODU4Ni8zOTc0MjkzMzAtNjhkN2IyMmQtMDg0ZC00ZjIwLThkNDQtMmRjNzFlMmEwYTczLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI1ZmQ0ZjJjZDAzMDdjM2Q2NmY3NTJmMDg0YTQ0YTM5MjA5ZTEyOWJmNjMyY2ZmYTk4Yzc3NWVhYjEzYTg1M2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.HPQLMPFAy9F7Lb6pr0mGqc6o6JhbTaV6boomvbXGxa0)
and NOT 4a-4c
Both formulations are equivalent (I think I have an implementation of (4a-4c) ) and are used in the paper to derive specific subcases of PD3O algorithm, PDHG, PAPC etc.
The stochastic version of PD3O proposed in https://arxiv.org/pdf/2004.02635 follows the (4a-4c) formulation where the gradient of f is computed ONCE at
x^{k}
.When we compute the gradient the second time
CIL/Wrappers/Python/cil/optimisation/algorithms/PD3O.py
Line 128 in d58c330
another function is selected and also the
data_passes
is updated wrongly. For instance in the first caseActually, I need to check again the actual
update
method because it has a different order from the paper above and the one that have implemented hereThe text was updated successfully, but these errors were encountered: