Auth expiration causes failed version fetches #235

Closed
Maks1mS opened this issue May 21, 2025 · 2 comments · Fixed by #241

Comments

@Maks1mS

Maks1mS commented May 21, 2025

Hi!
I'm trying to set up my own Image Factory with Forgejo as the image registry.
Everything works initially, but after some time (usually by the next day), fetching Talos versions fails with an error like this:

{"level":"info","ts":1747823359.8735535,"caller":"artifacts/versions.go:26","msg":"fetching available Talos versions"}
{"level":"info","ts":1747823359.8916688,"caller":"http/http.go:190","msg":"request","frontend":"http","method":"POST","path":"/ui/wizard","error":"failed to list Talos versions: GET https://mydomain.com/v2/myorg/imager/tags/list?n=1000: unexpected status code 401 Unauthorized: authGroup.Verify\n"}

The error resolves if I restart the Image Factory, without changing any credentials.

In an attempt to solve the problem, I tried setting the username to <token>, as described in the go-containerregistry auth docs, but the issue persists.

Proposal

As I understand it, this happens because pullers are created once and then cached, which causes problems not only in my case (see #200 (comment)).

Would it be possible to introduce a TTL (e.g. re-initialize the puller every 10–15 minutes)? I suspect that, at least in my case, this would solve the problem.

Maybe something like this:

const pullerTTL = 10 * time.Minute

func (m *Manager) getPuller(arch Arch) (*remote.Puller, error) {
	m.pullersMu.Lock()
	defer m.pullersMu.Unlock()

	now := time.Now()

	// Reuse the cached puller only while it is younger than pullerTTL.
	if cached, ok := m.pullers[arch]; ok {
		if now.Sub(cached.createdAt) < pullerTTL {
			return cached.puller, nil
		}
	}

	// Cache miss or expired entry: create a fresh puller so its auth
	// state starts from scratch.
	puller, err := remote.NewPuller(
		remote.WithPlatform(v1.Platform{
			Architecture: string(arch),
			OS:           "linux",
		}),
		remote.WithAuthFromKeychain(
			authn.NewMultiKeychain(
				authn.DefaultKeychain,
				github.Keychain,
				google.Keychain,
			),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create puller: %w", err)
	}

	m.pullers[arch] = cachedPuller{
		puller:    puller,
		createdAt: now,
	}

	return puller, nil
}

I could send a PR if you don't mind this change.

@smira
Member

smira commented May 21, 2025

The registry library should automatically refresh credentials, and in fact it does so (e.g. with GitHub container registry).

So I'm not sure what exactly is going on in your case.

We have an unrelated issue with this library (#231), which might lead us to remove the cached pullers completely.

@Maks1mS
Author

Maks1mS commented May 26, 2025

I did a bit of research on this.
It looks like ghcr.io issues a token with a fairly long lifespan, which is likely why there are no noticeable issues in most cases.

Here’s what the request sequence looks like when running crane ls:

--> GET https://ghcr.io/v2/
<-- 401 https://ghcr.io/v2/
--> GET https://ghcr.io/token?scope=repository%3Asiderolabs%2Ftalos%3Apull&service=ghcr.io
<-- 200 https://ghcr.io/token?scope=repository%3Asiderolabs%2Ftalos%3Apull&service=ghcr.io
--> GET https://ghcr.io/v2/siderolabs/talos/tags/list?n=1000
<-- 200 https://ghcr.io/v2/siderolabs/talos/tags/list?n=1000

The token I received on May 21st from https://ghcr.io/token is still valid. I couldn’t find any public documentation on its exact expiration time, but it seems to live long enough for the system to reach the next scheduled or incidental restart (such as after an image-factory update), during which the token gets refreshed automatically and everything continues to work smoothly.

In my case, however, the token obtained during the /token step expires after about 24 hours.

smira added a commit to smira/image-service that referenced this issue Jun 3, 2025
This is not a fix, but a bit of a workaround for issues in the upstream
library.

Use a refresh on interval strategy to ensure that both remote pushers
and pullers are refreshed.

Fixes siderolabs#231

Fixes siderolabs#235

Signed-off-by: Andrey Smirnov <[email protected]>