Auth expiration causes failed version fetches #235

Closed
Maks1mS opened this issue May 21, 2025 · 2 comments · Fixed by #241

Comments

@Maks1mS

Maks1mS commented May 21, 2025

Hi!
I'm trying to set up my own Image Factory with Forgejo as the image registry.
Everything works initially, but after some time (usually by the next day), fetching Talos versions fails with an error like this:

{"level":"info","ts":1747823359.8735535,"caller":"artifacts/versions.go:26","msg":"fetching available Talos versions"}
{"level":"info","ts":1747823359.8916688,"caller":"http/http.go:190","msg":"request","frontend":"http","method":"POST","path":"/ui/wizard","error":"failed to list Talos versions: GET https://mydomain.com/v2/myorg/imager/tags/list?n=1000: unexpected status code 401 Unauthorized: authGroup.Verify\n"}

The error resolves if I restart the Image Factory, without changing any credentials.

In an attempt to solve the problem, I tried setting the username to <token>, as described in the go-containerregistry auth docs, but the issue persists.

Proposal

As I understand it, this happens because pullers are created once and then cached, which causes problems not only in my case (see #200 (comment)).

Would it be possible to introduce a TTL (e.g. re-initialize the puller every 10–15 minutes)? I suspect that, at least in my case, this would solve the problem.

Maybe something like this:

const pullerTTL = 10 * time.Minute

func (m *Manager) getPuller(arch Arch) (*remote.Puller, error) {
	m.pullersMu.Lock()
	defer m.pullersMu.Unlock()

	now := time.Now()

	// Reuse the cached puller only while it is younger than pullerTTL.
	if cached, ok := m.pullers[arch]; ok {
		if now.Sub(cached.createdAt) < pullerTTL {
			return cached.puller, nil
		}
	}

	// Cache miss or expired entry: create a fresh puller so its auth
	// state starts from scratch.
	puller, err := remote.NewPuller(
		remote.WithPlatform(v1.Platform{
			Architecture: string(arch),
			OS:           "linux",
		}),
		remote.WithAuthFromKeychain(
			authn.NewMultiKeychain(
				authn.DefaultKeychain,
				github.Keychain,
				google.Keychain,
			),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create puller: %w", err)
	}

	m.pullers[arch] = cachedPuller{
		puller:    puller,
		createdAt: now,
	}

	return puller, nil
}

I could send a PR if you don't mind this change.

@smira
Member

smira commented May 21, 2025

The registry library should automatically refresh credentials, and in fact it does so (e.g. with GitHub container registry).

So I'm not sure what exactly is going on in your case.

We have an unrelated issue with this library (#231), which might lead us to remove the cached pullers completely.

@Maks1mS
Author

Maks1mS commented May 26, 2025

I did a bit of research on this.
It looks like ghcr.io issues a token with a fairly long lifespan, which is likely why there are no noticeable issues in most cases.

Here’s what the request sequence looks like when running crane ls:

--> GET https://ghcr.io/v2/
<-- 401 https://ghcr.io/v2/
--> GET https://ghcr.io/token?scope=repository%3Asiderolabs%2Ftalos%3Apull&service=ghcr.io
<-- 200 https://ghcr.io/token?scope=repository%3Asiderolabs%2Ftalos%3Apull&service=ghcr.io
--> GET https://ghcr.io/v2/siderolabs/talos/tags/list?n=1000
<-- 200 https://ghcr.io/v2/siderolabs/talos/tags/list?n=1000

The token I received on May 21st from https://ghcr.io/token is still valid. I couldn’t find any public documentation on its exact expiration time, but it seems to live long enough for the system to reach the next scheduled or incidental restart (such as after an image-factory update), during which the token gets refreshed automatically and everything continues to work smoothly.

In my case, however, the token obtained during the /token step expires after about 24 hours.

smira added a commit to smira/image-service that referenced this issue Jun 3, 2025
This is not a fix, but a bit of a workaround for issues in the upstream
library.

Use a refresh on interval strategy to ensure that both remote pushers
and pullers are refreshed.

Fixes siderolabs#231

Fixes siderolabs#235

Signed-off-by: Andrey Smirnov <[email protected]>