Is your feature request related to a problem? Please describe.
I am not 100% certain whether this request is supported on the `parquet` side for shapes/points (I am only going by https://spatialdata.scverse.org/en/stable/api/SpatialData.html#spatialdata.SpatialData.write), but I'm following up on https://scverse.zulipchat.com/#narrow/channel/315824-spatial/topic/dataloaders.20for.20spatial.20omics/near/591781853
Describe the solution you'd like
I want the `anndata` object to be written in an order that reflects some form of spatial coherence. I can think of two versions:
- A purely spatial approach that uses some form of quad-tree to partition the coordinates. The data would then be written in some traversal order (C order? Morton order?) of the tree's leaf nodes, each of which holds up to 4 elements.
- A user-defined neighborhood approach that outputs an approximate ordering in which neighbors are near one another according to a user-defined neighborhood graph.
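For the first option, here is a minimal sketch of a Morton (Z-order) ordering, assuming the coordinates are an `(n_obs, 2)` array (e.g. `adata.obsm["spatial"]`, a common convention but an assumption here). Sorting by Morton code matches a depth-first traversal of a region quad-tree built over the same quantized grid, so the tree never has to be materialized. `morton_order` is a hypothetical helper, not an existing `spatialdata` or `anndata` API.

```python
import numpy as np


def _spread_bits(v: np.ndarray) -> np.ndarray:
    """Insert a zero bit between each of the low 16 bits of v."""
    v = v.astype(np.uint64) & np.uint64(0xFFFF)
    v = (v | (v << np.uint64(8))) & np.uint64(0x00FF00FF)
    v = (v | (v << np.uint64(4))) & np.uint64(0x0F0F0F0F)
    v = (v | (v << np.uint64(2))) & np.uint64(0x33333333)
    v = (v | (v << np.uint64(1))) & np.uint64(0x55555555)
    return v


def morton_order(coords) -> np.ndarray:
    """Hypothetical helper: permutation sorting 2-D points along a Z-order curve."""
    coords = np.asarray(coords, dtype=np.float64)
    assert coords.ndim == 2 and coords.shape[1] == 2, "expected (n_obs, 2) coordinates"
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for degenerate axes
    max_cell = (1 << 16) - 1  # quantize each axis onto a 16-bit integer grid
    grid = np.floor((coords - lo) / span * max_cell).astype(np.uint64)
    # x bits go to even positions, y bits to odd positions of the Morton code
    codes = _spread_bits(grid[:, 0]) | (_spread_bits(grid[:, 1]) << np.uint64(1))
    return np.argsort(codes, kind="stable")
```

A reordered copy could then be obtained with e.g. `adata[morton_order(adata.obsm["spatial"])].copy()` before writing. The 16-bit grid is an arbitrary resolution choice; points that fall into the same cell keep their original relative order.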
Describe alternatives you've considered
For the second option, we could in theory upstream this into anndata, or factor out the "generate the linearized in-memory ordering of the indices" part so that it is reusable. TBH this might actually make more sense, but it's not immediately clear to me that the second option has a large benefit over the first. Its main advantage is that it would generalize to use cases outside of spatial data; its downside is that it requires a (spatial) graph to be built first, which presumably means extra computation.
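For the second option, one off-the-shelf candidate (swapped in here as an example, not something decided above) is reverse Cuthill-McKee, which permutes a sparse graph to reduce its bandwidth, i.e. it keeps connected nodes close in index space. The minimal sketch below uses SciPy's `reverse_cuthill_mckee` and assumes the user-defined neighborhood graph is available as a sparse adjacency matrix (e.g. `adata.obsp["spatial_connectivities"]` as produced by `squidpy.gr.spatial_neighbors`). `graph_order` and `bandwidth` are hypothetical helper names; since they only take a matrix, they would be easy to factor out independently of spatial data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee


def graph_order(adjacency) -> np.ndarray:
    """Hypothetical helper: permutation that keeps graph neighbours close together."""
    a = csr_matrix(adjacency)
    # Symmetrize so that an edge in either direction counts as a neighbourhood link
    a = ((a + a.T) > 0).astype(np.int8).tocsr()
    return np.asarray(reverse_cuthill_mckee(a, symmetric_mode=True))


def bandwidth(adjacency, order) -> int:
    """Largest index distance between two connected nodes after applying `order`."""
    a = csr_matrix(adjacency)[order][:, order].tocoo()
    return int(np.abs(a.row - a.col).max()) if a.nnz else 0
```

A lower `bandwidth` after reordering means neighbors sit closer together in the written order. Note that this only yields an approximate ordering: RCM is a heuristic and does not guarantee minimal bandwidth.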
Additional context
The need here is to produce data whose ordering is amenable to fast data loading via "batched fetching", i.e., ensuring that as many data points as possible are contiguous on disk (for performance reasons).
I am not super familiar with spatial workflows, so perhaps one approach makes more sense than the other in a typical pipeline, but TBH I suspect we will end up implementing both.
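One rough way to quantify "contiguous on disk" is to count how many contiguous index ranges a batch touches, since each range corresponds to roughly one read. The sketch below is a hypothetical helper (`count_runs`), and treating runs as a proxy for I/O cost is an assumption: chunking in the storage backend also affects how many reads are actually issued.

```python
import numpy as np


def count_runs(indices) -> int:
    """Hypothetical helper: number of contiguous index ranges covering `indices`."""
    idx = np.unique(np.asarray(indices))  # sorted, de-duplicated positions
    if idx.size == 0:
        return 0
    # Every gap larger than 1 between consecutive positions starts a new run
    return int(1 + np.count_nonzero(np.diff(idx) != 1))
```

For a dataset written in permuted order `order`, original observation `i` ends up at position `np.argsort(order)[i]`, so a neighborhood `cells` would need `count_runs(np.argsort(order)[cells])` ranges; a spatially coherent ordering should make this number small.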
cc @ori-kron-wis @timtreis