problem with numpy type error (not serializable) #550

Open
mdsumner opened this issue Apr 14, 2025 · 3 comments
Labels
bug Something isn't working Kerchunk Relating to the kerchunk library / specification itself

Comments

@mdsumner
Contributor

I see this

from virtualizarr import open_virtual_dataset
u = 'https://thredds.nci.org.au/thredds/fileServer/gb6/BRAN/BRAN2023/daily/ocean_salt_2024_06.nc'

ds = open_virtual_dataset(u)

ds.virtualize.to_kerchunk('/tmp/test.parquet', format="parquet")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/VirtualiZarr/virtualizarr/accessor.py", line 137, in to_kerchunk
#     refs = dataset_to_kerchunk_refs(self.ds)
#            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/VirtualiZarr/virtualizarr/writers/kerchunk.py", line 72, in dataset_to_kerchunk_refs
#     ".zattrs": ujson.dumps(attrs),
#                ^^^^^^^^^^^^^^^^^^
# TypeError: np.int32(20) is not JSON serializable


## workaround: overwrite the problem numpy attribute (an np.int32) with None
ds.attrs['NumFilesInSet'] = None
## now it works
ds.virtualize.to_kerchunk('/tmp/test.parquet', format="parquet")

I wonder if this typing in attributes has a general solution? I appreciate this may be a kerchunk topic.

(I'm afraid it takes a few minutes to virtualize from the URL; it's a 4.3 GB file.)
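
One possible general approach to the attribute-typing question, sketched with the standard-library `json` module rather than the `ujson` call shown in the traceback: pass a `default` hook that converts numpy values to built-in Python types. The function name `numpy_default` and the sample `attrs` dict are hypothetical, and whether the same hook can be passed to the serializer VirtualiZarr uses is an assumption that has not been checked here.

```python
import json

import numpy as np


def numpy_default(obj):
    """Fallback for json.dumps: convert numpy values to built-in types.

    Hypothetical helper for illustration; not part of VirtualiZarr or kerchunk.
    """
    if isinstance(obj, np.generic):  # numpy scalar, e.g. np.int32(20)
        return obj.item()
    if isinstance(obj, np.ndarray):  # numpy array -> nested Python lists
        return obj.tolist()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


# Made-up attributes mimicking the failing 'NumFilesInSet' value.
attrs = {"NumFilesInSet": np.int32(20), "title": "example"}

# json only calls the hook for values it cannot encode by itself.
encoded = json.dumps(attrs, default=numpy_default)
print(encoded)
```

The hook leaves ordinary strings and numbers untouched, so it only changes behavior for values that would otherwise raise.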

@mdsumner changed the title from "problem with numpy type error (not seralizable)" to "problem with numpy type error (not serializable)" on Apr 14, 2025
@TomNicholas
Member

This is an example where the correct behavior is simply whatever the kerchunk spec says to do / the kerchunk library actually does. Clearly throwing an error is wrong, but otherwise it would be helpful to know what the expected Kerchunk-like behavior is.

@TomNicholas
Member

Are you able to serialize other numpy dtypes? Presumably we must be able to?
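
A quick way to probe this question is to try a handful of numpy scalar types and record which ones serialize. The sketch below uses the standard-library `json` module as a stand-in, so `ujson` (the library in the traceback) may behave differently; `is_json_serializable` and the sample values are hypothetical. With `json`, `np.float64` and `np.str_` pass because they subclass Python's `float` and `str`, while the other types listed raise `TypeError`.

```python
import json

import numpy as np


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False on TypeError."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# Explicit labels avoid relying on numpy's type names, which vary by version.
samples = {
    "np.int32": np.int32(20),
    "np.int64": np.int64(20),
    "np.float32": np.float32(1.5),
    "np.float64": np.float64(1.5),  # subclass of Python float
    "np.bool_": np.bool_(True),
    "np.str_": np.str_("a"),  # subclass of Python str
}

results = {label: is_json_serializable(value) for label, value in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'not serializable'}")
```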

@TomNicholas TomNicholas added bug Something isn't working Kerchunk Relating to the kerchunk library / specification itself labels Apr 14, 2025
@rabernat
Collaborator

This attribute needs to just be coerced to a plain int.
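
A minimal sketch of that coercion, assuming it would run on the attributes before they reach `ujson.dumps(attrs)` in `dataset_to_kerchunk_refs`. The helper name `to_builtin` is hypothetical and not part of VirtualiZarr or kerchunk, and the sample attributes are made up. It generalizes "coerce to a plain int" to any numpy scalar or array by using `.item()` and `.tolist()`.

```python
import numpy as np


def to_builtin(value):
    """Recursively replace numpy scalars and arrays with built-in Python types.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, np.generic):  # e.g. np.int32(20) -> 20
        return value.item()
    if isinstance(value, np.ndarray):  # array -> nested lists of built-ins
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):  # tuples become lists, as JSON would
        return [to_builtin(item) for item in value]
    return value


# Made-up attributes mimicking the failing 'NumFilesInSet' value.
attrs = {
    "NumFilesInSet": np.int32(20),
    "scale": np.float32(0.5),
    "levels": np.array([1, 2, 3]),
    "history": "example",
}

clean = to_builtin(attrs)
print(clean)
```

Because the result contains only built-in types, it should not matter which JSON library does the final encoding.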
