Column-major arrays are returned as row-major #3072

Open
kapadia opened this issue May 19, 2025 · 4 comments
Labels
bug Potential issues with the zarr-python library

Comments

kapadia commented May 19, 2025

Zarr version

v3.0.8

Numcodecs version

0.16.0

Python Version

3.12.5

Operating System

macOS

Installation

uv pip into virtual environment

Description

I recall using fortran-style arrays with zarr v2 with no issue. Since zarr v3, I've observed that fortran-style arrays are read back as C-style arrays. If this is in fact a bug, and not an error on my part, I'd be happy to contribute a fix.

Steps to reproduce

import zarr

# Write fortran style array
group = zarr.group("column-major-example.zarr")
array = group.create_array(
    name="example",
    shape=(128, 128),
    dtype="float32",
    order="F",
    dimension_names=("row", "col"),
)
array[:] = 1


# Read back and ensure the order is preserved
group = zarr.open_group("column-major-example.zarr/", mode="r")
array = group["example"]
assert array[:].flags["F_CONTIGUOUS"]

Additional output

No response

@kapadia kapadia added the bug Potential issues with the zarr-python library label May 19, 2025
Contributor

ianhi commented May 19, 2025

Edit: I think this may not be a bug, but rather a poorly documented change in behavior between Zarr format 2 and format 3, which makes the memory order of arrays read into memory a runtime concern.


I agree that this is a bug.

zarr format
This is only an issue if you use zarr_format=3, which is the default for zarr-python version 3. If you use zarr_format=2 when creating the group, the behavior is as expected.

ordering

Silently dropping this user input is pretty clearly not the correct behavior. I think the issue arises because in Zarr format 3 the order is now delegated to a codec, as noted in the spec:

order has been replaced by the transpose codec,

However, it's pretty opaque to me how to actually implement that with zarr-python 3. My best guess is to add:

filters=zarr.codecs.TransposeCodec(order=[1,0]), to the create_array arguments (which would ideally happen automatically), but doing so does not seem to have the desired effect.
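
For intuition about why the spec can replace order with a transpose codec, here is a minimal NumPy-only sketch. It does not use zarr's codec pipeline, and the variable names are illustrative. It shows that serializing an array in column-major order yields the same bytes as permuting its axes and then serializing the result in row-major order.

```python
import numpy as np

# Small 2-D example array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6, dtype=np.int16).reshape(2, 3)

# Column-major (Fortran-order) serialization of the original array.
f_order_bytes = a.tobytes(order="F")

# Row-major (C-order) serialization after permuting the axes with (1, 0),
# the permutation a transpose step would apply to a 2-D array.
transposed_c_bytes = np.transpose(a, (1, 0)).tobytes(order="C")

# The two byte streams are identical, and both differ from plain C order.
bytes_match = f_order_bytes == transposed_c_bytes
differs_from_c = f_order_bytes != a.tobytes(order="C")
```

Under this view, "F order on disk" is just "C order of the transposed chunk", which would explain why the in-memory layout returned on read can be decided separately.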

Contributor

ianhi commented May 19, 2025

These tests have some clues about what the intended behavior is:

@pytest.mark.parametrize("input_order", ["F", "C"])
@pytest.mark.parametrize("runtime_write_order", ["F", "C"])
@pytest.mark.parametrize("runtime_read_order", ["F", "C"])
@pytest.mark.parametrize("with_sharding", [True, False])
@pytest.mark.parametrize("store", ["local", "memory"], indirect=["store"])
async def test_transpose(

So, seemingly, the memory order on read is set via the global config rather than taken from how the array was saved on disk?

with config.set({"array.order": runtime_read_order}):
    a = await AsyncArray.open(
        spath,

Indeed, doing so passes your assert:

group = zarr.open_group("column-major-example.zarr/", mode="r")
with config.set({"array.order": 'F'}):
    array = group["example"]
assert array[:].flags["F_CONTIGUOUS"]

At a minimum, I think the docs could be improved to specify what a user should expect. Naively, I would have expected that an array written with F order would be reopened as such, but per this page of the docs: https://zarr.readthedocs.io/en/stable/user-guide/config.html it seems to be purely a runtime concern. Maybe a short sentence in the migration guide would help.
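
If the layout really is a runtime concern, a caller who cannot or does not want to touch the global config could also convert after reading. The following is a hedged, NumPy-only sketch: as_order is a hypothetical helper, not a zarr API, and c_data merely stands in for the result of array[:].

```python
import numpy as np


def as_order(data: np.ndarray, order: str) -> np.ndarray:
    """Return data in the requested memory order (hypothetical helper).

    NumPy copies only when the input is not already in that order.
    """
    if order == "F":
        return np.asfortranarray(data)
    if order == "C":
        return np.ascontiguousarray(data)
    raise ValueError(f"order must be 'C' or 'F', got {order!r}")


# Stand-in for the C-contiguous result of reading the zarr array with array[:].
c_data = np.ones((128, 128), dtype="float32")

# Convert to column-major; the values are unchanged, only the layout differs.
f_data = as_order(c_data, "F")
```

This costs an extra copy of the data compared with having the reader produce the desired layout directly, so it is a fallback rather than a replacement for the config setting.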

Author

kapadia commented May 20, 2025

@ianhi - thanks for surfacing the relevant documentation. A minor snippet in the migration guide might be helpful. Perhaps the following:

  1. Array order - the memory order of arrays read from a store (C- or F-contiguous) is set as a
    runtime configuration. Refer to Zarr's :ref:`runtime configuration <user-guide-config>`.

I'd like to better understand what's happening with F-contiguous arrays under the hood. Barring compression and other codecs, are bytes serialized in the same order as the in-memory layout? If so, we might expect the size of the compressed chunk to be different depending on C- or F-style arrays.

The following demonstrates that a random array with everything equivalent, except the order, results in two compressed chunks written to disk with the same size. Does this imply that regardless of an array's memory layout, it's always re-ordered to C-contiguous during serialization?

import numpy as np
import zarr

# Start with a random array of int16
rng = np.random.default_rng()
arr = (
    10_000 * rng.random(size=1024 * 1024, dtype=np.float32).reshape((1024, 1024))
).astype(np.int16)

# Write C style array
group_1 = zarr.group("c-style.zarr")
array_1 = group_1.create_array(
    name="example",
    shape=(1024, 1024),
    chunks=(1024, 1024),
    dtype="float32",
    order="C",
    dimension_names=("row", "col"),
)
array_1[:] = arr

# Write fortran style array
group_2 = zarr.group("f-style.zarr")
array_2 = group_2.create_array(
    name="example",
    shape=(1024, 1024),
    chunks=(1024, 1024),
    dtype="float32",
    order="F",
    dimension_names=("row", "col"),
)
array_2[:] = np.asfortranarray(arr)

with open("c-style.zarr/example/c/0/0", "rb") as fp:
    data_1 = fp.read()

with open("f-style.zarr/example/c/0/0", "rb") as fp:
    data_2 = fp.read()

assert len(data_1) != len(
    data_2
), "Compressed chunks are expected to have different sizes"
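
Equal lengths alone are weak evidence, so comparing the bytes themselves may be more telling. As a reference point, here is a minimal sketch that uses only NumPy and zlib. It is not zarr's actual codec pipeline, and the small seeded array is an assumption made for illustration. It shows that if the two layouts really were serialized as-is, the raw chunks would have equal length but different contents, and the compressed payloads would differ as well.

```python
import zlib

import numpy as np

# Seeded random int16 data, so the example is reproducible.
rng = np.random.default_rng(0)
arr = rng.integers(0, 10_000, size=(64, 64), dtype=np.int16)

# Serialize the same values in row-major and column-major order.
c_bytes = arr.tobytes(order="C")
f_bytes = arr.tobytes(order="F")

# Raw lengths always match; the contents do not (unless arr equals arr.T).
same_raw_length = len(c_bytes) == len(f_bytes)
same_raw_bytes = c_bytes == f_bytes

# zlib is lossless, so different inputs yield different compressed payloads.
c_compressed = zlib.compress(c_bytes)
f_compressed = zlib.compress(f_bytes)
same_compressed_bytes = c_compressed == f_compressed
```

Applied to the two chunk files above, a check like data_1 == data_2 would distinguish "both chunks were written in the same order" from "the sizes merely coincide".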

@dstansby
Contributor

This should at least be giving you a warning - see #2948.
