Column-major arrays are returned as row-major #3072

kapadia · 2025-05-19T16:06:20Z

Zarr version

v3.0.8

Numcodecs version

0.16.0

Python Version

3.12.5

Operating System

macOS

Installation

uv pip into virtual environment

Description

I recall using Fortran-style arrays with zarr v2 with no issue. Since zarr v3, I've observed that Fortran-style arrays are read back as C-style arrays. If this is in fact a bug, and not an error on my part, I'd be happy to contribute a fix.

Steps to reproduce

import zarr

# Write fortran style array
group = zarr.group("column-major-example.zarr")
array = group.create_array(
    name="example",
    shape=(128, 128),
    dtype="float32",
    order="F",
    dimension_names=("row", "col"),
)
array[:] = 1


# Read back and ensure the order is preserved
group = zarr.open_group("column-major-example.zarr/", mode="r")
array = group["example"]
assert array[:].flags["F_CONTIGUOUS"]

Additional output

No response


ianhi · 2025-05-19T22:00:07Z


I think this may not be a bug, but rather a poorly documented change in behavior between zarr format 2 and zarr format 3, which makes the memory order of arrays read back a runtime concern.

~~I agree that this is a bug.~~

zarr format
This is only an issue if you use zarr_format=3, which is the default for zarr-python version 3. If you use zarr_format=2 when creating the group, the behavior is as expected.

ordering

Silently dropping this user input is pretty clearly not the correct behavior. I think the issue arises because in zarr format 3 the order is now delegated to a codec, as noted in the spec:

order has been replaced by the transpose codec,

However, it's pretty opaque to me how to actually implement that with zarr-python 3. My best guess would be to add:

filters=zarr.codecs.TransposeCodec(order=[1,0]), to the create_array arguments (which would ideally happen automatically), but doing so does not seem to have the desired effect.
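For intuition about what a transpose with order (1, 0) means at the byte level, here is a minimal NumPy-only sketch. It does not use zarr and makes no claim about how zarr-python wires up its codec pipeline; the variable names are purely illustrative. It only shows that permuting the axes of a 2D array and then flattening in C order produces the same bytes as flattening the original array column by column.

```python
import numpy as np

# A small C-contiguous array with logical shape (2, 3)
a = np.arange(6, dtype=np.int16).reshape(2, 3)

# Permute the axes with order (1, 0), then serialize.
# ndarray.tobytes() defaults to order="C", so this walks the
# transposed array row by row.
transposed_bytes = np.transpose(a, (1, 0)).tobytes()

# Serialize the original array column by column (Fortran order)
column_major_bytes = a.tobytes(order="F")

# Both byte strings describe the same column-major traversal of `a`
matches = transposed_bytes == column_major_bytes
print(matches)
```

Under that reading, a transpose step is one way to get column-major bytes into storage, independent of how the array is laid out in memory when it is read back.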

ianhi · 2025-05-19T22:40:30Z

These tests have some clues about what the intended behavior is:

zarr-python/tests/test_codecs/test_transpose.py

Lines 14 to 19 in 5710726

    
@pytest.mark.parametrize("input_order", ["F", "C"])
@pytest.mark.parametrize("runtime_write_order", ["F", "C"])
@pytest.mark.parametrize("runtime_read_order", ["F", "C"])
@pytest.mark.parametrize("with_sharding", [True, False])
@pytest.mark.parametrize("store", ["local", "memory"], indirect=["store"])
async def test_transpose(

So seemingly the memory order on read is set via the global config rather than taken from how the array was saved on disk?

zarr-python/tests/test_codecs/test_transpose.py

Lines 44 to 46 in 5710726

    
with config.set({"array.order": runtime_read_order}):
    a = await AsyncArray.open(
        spath,

Indeed doing so passes your assert:

import zarr

group = zarr.open_group("column-major-example.zarr/", mode="r")
# Set the runtime array order while opening the array
with zarr.config.set({"array.order": "F"}):
    array = group["example"]
assert array[:].flags["F_CONTIGUOUS"]

At a minimum, I think the docs could be improved to specify what a user should expect. Naively, I would have expected that an array written with F order would be reopened as such. But per this page of the docs: https://zarr.readthedocs.io/en/stable/user-guide/config.html it seems to be purely a runtime concern. Maybe a short sentence in the migration guide would help.

kapadia · 2025-05-20T22:22:30Z

@ianhi - thanks for surfacing the relevant documentation. A minor snippet in the migration guide might be helpful. Perhaps the following:

Array order - reading F-contiguous arrays should be set as a runtime configuration. Refer
to Zarr's :ref:runtime configuration <user-guide-config>.

I'd like to better understand what's happening with F-contiguous arrays under the hood. Barring compression and other codecs, are bytes serialized in the same order as the in-memory layout? If so, we might expect the size of the compressed chunk to be different depending on C- or F-style arrays.

The following demonstrates that a random array with everything equivalent, except the order, results in two compressed chunks written to disk with the same size. Does this imply that regardless of an array's memory layout, it's always re-ordered to C-contiguous during serialization?

import numpy as np
import zarr

# Start with a random array of int16
rng = np.random.default_rng()
arr = (
    10_000 * rng.random(size=1024 * 1024, dtype=np.float32).reshape((1024, 1024))
).astype(np.int16)

# Write C style array
group_1 = zarr.group("c-style.zarr")
array_1 = group_1.create_array(
    name="example",
    shape=(1024, 1024),
    chunks=(1024, 1024),
    dtype="float32",
    order="C",
    dimension_names=("row", "col"),
)
array_1[:] = arr

# Write fortran style array
group_2 = zarr.group("f-style.zarr")
array_2 = group_2.create_array(
    name="example",
    shape=(1024, 1024),
    chunks=(1024, 1024),
    dtype="float32",
    order="F",
    dimension_names=("row", "col"),
)
array_2[:] = np.asfortranarray(arr)

with open("c-style.zarr/example/c/0/0", "rb") as fp:
    data_1 = fp.read()

with open("f-style.zarr/example/c/0/0", "rb") as fp:
    data_2 = fp.read()

assert len(data_1) != len(
    data_2
), "Compressed chunks are expected to have different sizes"
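One way to reason about the question above without involving zarr at all is to look at how NumPy itself serializes arrays. The sketch below is NumPy-only and is an illustration, not a statement about zarr-python's internals. It shows that two arrays with identical values but different memory layouts yield identical bytes when serialized in C order, and different bytes only when each is serialized in its own layout. If a writer always serializes in C order (an assumption here), identical chunk sizes after compression would be the expected outcome.

```python
import numpy as np

# Same logical values, two different memory layouts
a_c = np.arange(12, dtype=np.int16).reshape(3, 4)  # C-contiguous
a_f = np.asfortranarray(a_c)                       # F-contiguous copy

assert a_c.flags["C_CONTIGUOUS"] and a_f.flags["F_CONTIGUOUS"]
assert np.array_equal(a_c, a_f)

# tobytes() defaults to order="C": it walks the logical array row by row,
# whatever the in-memory layout is.
bytes_c = a_c.tobytes()
bytes_f_as_c = a_f.tobytes()

# order="A" follows the memory layout: Fortran order for an
# F-contiguous array, C order otherwise.
bytes_f_native = a_f.tobytes(order="A")

same_when_c_ordered = bytes_c == bytes_f_as_c   # layouts differ, bytes agree
differs_in_native = bytes_c != bytes_f_native   # layout-following bytes differ
print(same_when_c_ordered, differs_in_native)
```

Comparing the raw chunk bytes of the two stores directly, rather than only their lengths, would distinguish these two cases more reliably than a size check.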

dstansby · 2025-05-21T11:59:07Z

This should at least be giving you a warning - see #2948.

kapadia added the bug Potential issues with the zarr-python library label May 19, 2025
