Zarr version
v3.1.5
Numcodecs version
0.16.5
Python Version
3.11
Operating System
Mac
Installation
uv add zarr
Description
I maintain zarr-python-n5, which implements the n5_default codec. This codec wraps a number of internal codecs (specifically a transpose, a bytes, and optionally a bytes-to-bytes).
The wrapped bytes codec MUST be big-endian (although None is accepted for single-byte types). My platform is little-endian.
As part of the n5_default codec's evolve_from_array_spec method, I call evolve_from_array_spec on each of its constituent codecs, because that seemed like the right thing to do. For single-byte types, this erases the endianness of the wrapped bytes codec, so it serialises to {"name": "bytes"}. When I deserialise that, BytesCodec.from_dict instantiates the codec with BytesCodec(**{}). When endian is not given, it defaults to the platform's endianness ("little", for me). So I can instantiate an explicitly big-endian codec, round-trip it, and get a little-endian codec back, which I found surprising (and which also breaks my n5_default codec validation).
I understand that I could just not evolve the wrapped codecs in my case, or not validate that the codec is big-or-none. However, IMO the bytes codec defaulting to the system endianness when the no-config form is passed to from_dict is surprising and unnecessary. Instead, it should take None from the no-config form. If None is not valid in this case, that's due to an error on the part of the writer and zarr-python shouldn't fabricate possibly-incorrect metadata to account for that.
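To make the difference concrete, here is a minimal sketch using a toy stand-in class rather than zarr-python itself. `ToyBytesCodec`, `from_dict_current` and `from_dict_proposed` are hypothetical names made up for illustration; the "current" variant only mimics the behaviour described above and is not copied from zarr-python's source.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyBytesCodec:
    """Toy stand-in for a bytes codec (hypothetical, not zarr's BytesCodec)."""

    # Like the behaviour described above: omitting endian means "use the platform's".
    endian: str | None = sys.byteorder

    def to_dict(self) -> dict[str, Any]:
        # With no endianness, serialise to the no-config form.
        if self.endian is None:
            return {"name": "bytes"}
        return {"name": "bytes", "configuration": {"endian": self.endian}}

    @classmethod
    def from_dict_current(cls, data: dict[str, Any]) -> ToyBytesCodec:
        # Mimics the surprising behaviour: a missing configuration becomes
        # cls(**{}), which falls back to the platform byte order.
        return cls(**data.get("configuration", {}))

    @classmethod
    def from_dict_proposed(cls, data: dict[str, Any]) -> ToyBytesCodec:
        # Proposed behaviour: the no-config form means endian=None,
        # and nothing is inferred from the reader's platform.
        config = data.get("configuration") or {}
        return cls(endian=config.get("endian"))


erased = ToyBytesCodec(endian=None)  # what evolving does for single-byte types
serialised = erased.to_dict()        # the no-config form

current = ToyBytesCodec.from_dict_current(serialised)
proposed = ToyBytesCodec.from_dict_proposed(serialised)
print(current.endian, proposed.endian)
```

With the proposed variant, the round trip of an endian-less codec stays endian-less on every platform, while an explicit "big" or "little" in the metadata is still honoured.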
Steps to reproduce
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
import sys
from zarr.codecs import BytesCodec, Endian
from zarr.core.array_spec import ArraySpec, ArrayConfig
from zarr.buffer import default_buffer_prototype
assert sys.byteorder == "little"
original = BytesCodec(endian=Endian("big"))
evolved = original.evolve_from_array_spec(
    ArraySpec((2, 2), "uint8", 0, ArrayConfig("C", False), default_buffer_prototype())
)
serialised = evolved.to_dict()
assert serialised.get("configuration") is None
deserialised = BytesCodec.from_dict(serialised)
# System byteorder, not the byteorder I explicitly gave
assert deserialised.endian == Endian("little")
# I want this to fail, but it doesn't.
assert original.endian != deserialised.endian
Additional output
No response