Issue downloading a specific installer image generated on factory.talos.dev #200

rdegez opened this issue Feb 12, 2025 · 6 comments

rdegez commented Feb 12, 2025

Bug Report

While adding a new worker to an existing Kubernetes cluster with a Talos 1.7.5 installer image that we had used before (factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5), we hit the following error:

 user: warning: [2025-02-12T10:28:17.871898678Z]: [talos] upgrade request received: preserve false, staged false, force false, reboot mode DEFAULT
 user: warning: [2025-02-12T10:28:17.873703678Z]: [talos] validating "factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5"
 user: warning: [2025-02-12T10:28:18.457376678Z]: [talos] retrying error: failed to pull image "factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5": unable to fetch descriptor (sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) which reports content size of zero: invalid argument
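
Side note: the digest in that log is the SHA-256 of empty input, which fits the "content size of zero" part of the message and suggests the registry answered with an empty body. A quick way to check this, using only the Go standard library (`emptyDigest` is just an illustrative helper name, not something from Talos):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// emptyDigest returns the hex-encoded SHA-256 of zero bytes of input.
func emptyDigest() string {
	sum := sha256.Sum256(nil)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Compare this value with the digest reported in the Talos log.
	fmt.Println("sha256:" + emptyDigest())
}
```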

Pulling this image directly with the Docker CLI resulted in a similar failure:

$ docker pull factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5
Error response from daemon: missing or empty Content-Type header

Puzzled by this, we regenerated the image URL from scratch by going through the whole workflow on https://factory.talos.dev:

  1. Hardware Type -> Bare-metal Machine
    Next

  2. Choose Talos Linux Version -> 1.7.5
    Next

  3. Architecture -> amd64 (secureboot DISABLED)
    Next

  4. System extensions:
    • siderolabs/zfs (2.2.4-v1.7.5)
    • siderolabs/qemu-guest-agent (8.2.2)
    • siderolabs/fuse3 (3.16.2)
    Next

  5. Customization -> skipped, i.e. none (default)
    Next

Result:

Schematic Ready
Your image schematic ID is: e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80
customization:
    systemExtensions:
        officialExtensions:
            - siderolabs/fuse3
            - siderolabs/qemu-guest-agent
            - siderolabs/zfs

<...>

Initial Installation
For the initial installation of Talos Linux (not applicable for disk image boot), add the following installer image to the machine configuration:
factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5

Upgrading Talos Linux
To [upgrade](https://www.talos.dev/v1.7/talos-guides/upgrading-talos/) Talos Linux on the machine, use the following image:
factory.talos.dev/installer/e75c3c6dd821c9efb5640b2e280732df16809ce4aea4b8743664e450bed5ee80:v1.7.5

The full URL that would produce the exact same result page is https://factory.talos.dev/?arch=amd64&board=undefined&cmdline-set=true&extensions=-&extensions=siderolabs%2Ffuse3&extensions=siderolabs%2Fqemu-guest-agent&extensions=siderolabs%2Fzfs&platform=metal&secureboot=undefined&target=metal&version=1.7.5

=> Same image link, same result (docker pull still fails)

Now add any other extension. For instance, I added amdgpu-firmware and got this URL: https://factory.talos.dev/?arch=amd64&board=undefined&cmdline-set=true&extensions=-&extensions=siderolabs%2Famdgpu-firmware&extensions=siderolabs%2Ffuse3&extensions=siderolabs%2Fqemu-guest-agent&extensions=siderolabs%2Fzfs&platform=metal&secureboot=undefined&target=metal&version=1.7.5

=> This gives the image link factory.talos.dev/installer/e1fa3815306e2d7605bd1091a42b50979091202aead5960bd2b8e724f13efe3d:v1.7.5, and docker pull works fine on it.

So I guess something is off with the OCI registry behind factory.talos.dev?
Maybe an upgrade or an OCI format change that happened after the image was first generated months ago?

smira (Member) commented Feb 12, 2025

No, this is just a bug, but anyways now it should be fixed. The real issue will be fixed later.

smira transferred this issue from siderolabs/talos Feb 12, 2025
smira added the bug (Something isn't working) label Feb 12, 2025
smira (Member) commented Feb 12, 2025

See google/go-containerregistry#2060

The root cause seems to be that a fetcher got cached with a canceled context (the context is bound to the incoming HTTP request).

rdegez (Author) commented Feb 12, 2025

> No, this is just a bug, but anyways now it should be fixed. The real issue will be fixed later.

Thanks for the insanely fast answer & fix as always! ❤
I do confirm that it's fixed now.
Cheers

smira (Member) commented Feb 14, 2025

If the library authors don't respond, there are two paths out:

  • fork the library and try to fix it
  • create a pusher/puller each time instead of caching it (this should have a small impact on pushing, as it's authenticated, but otherwise should be fine).
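
The second option could look roughly like the sketch below. This is standard-library Go with hypothetical names (`puller`, `newPuller`, `handle`), not the go-containerregistry API: the puller is built from the current request's context on every call, so nothing tied to a finished request is ever reused. The cost is redoing any per-client setup on each call.

```go
package main

import (
	"context"
	"fmt"
)

// puller is a hypothetical stand-in for a registry client bound to a context.
type puller struct {
	ctx context.Context
}

// newPuller builds a puller for exactly one request.
func newPuller(ctx context.Context) *puller {
	return &puller{ctx: ctx}
}

// pull fails if the context the puller was built with is already done.
func (p *puller) pull(ref string) error {
	if err := p.ctx.Err(); err != nil {
		return fmt.Errorf("pull %s: %w", ref, err)
	}
	return nil
}

// handle simulates one incoming request and creates a fresh puller for it
// instead of looking one up in a long-lived cache.
func handle(ref string) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // canceling here cannot affect later requests
	return newPuller(ctx).pull(ref)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(handle("installer:v1.7.5"))
	}
}
```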

mozarik commented Mar 9, 2025

I also hit this bug when following this step: https://www.talos.dev/v1.9/talos-guides/install/virtualized-platforms/proxmox/#qemu-guest-agent-support

I can pull the image with Docker, but Talos itself cannot pull it.

smira (Member) commented Mar 10, 2025

@mozarik please open a specific issue with relevant information. Your issue is different.
