What we're seeing
A custom codec (a SchemaCodec subclass) was built to store xarray datasets as NetCDF files in an external store configured with protocol: file, one .nc file per row. Two problems show up around deletion and garbage collection:
- Deleting a row does not remove its external file. The .nc file is left orphaned on disk.
- Running dj.gc.collect() to clean up those orphans also removes files that are still referenced by existing rows. dj.gc.scan() reports files belonging to live rows as orphaned, so collect() deletes them along with the genuinely orphaned ones. Existing rows are left pointing at files that no longer exist.
The net effect is that the built-in dj.gc tooling offers no safe cleanup path for a custom codec's external store. Skipping GC leaves orphaned files behind on every delete, and running GC deletes files that are still in use.
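For comparison, the behaviour a safe cleanup would need can be sketched with the standard library alone. A file counts as orphaned only if it exists in the store and no live row references it. find_orphans and collect_orphans are hypothetical helpers, not part of DataJoint. The sketch assumes the caller can gather the store-relative paths that live rows still reference, for example from whatever the custom codec records per row.

```python
import tempfile
from pathlib import Path


def find_orphans(store_root, referenced_paths):
    """Return .nc files under store_root that no live row references.

    Hypothetical helper: referenced_paths must be gathered by the caller
    (e.g. the store-relative paths the custom codec recorded for each row).
    """
    root = Path(store_root).resolve()
    # Normalize references so relative and absolute forms compare equal.
    referenced = {(root / p).resolve() for p in referenced_paths}
    on_disk = {p.resolve() for p in root.rglob("*.nc") if p.is_file()}
    # Orphans = files present on disk minus files still referenced.
    return sorted(on_disk - referenced)


def collect_orphans(store_root, referenced_paths, dry_run=True):
    """Hypothetical GC pass: report orphans, delete them only if dry_run is False."""
    orphans = find_orphans(store_root, referenced_paths)
    if not dry_run:
        for path in orphans:
            path.unlink()
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    # One file per inserted row.
    for name in ("a.nc", "b.nc", "c.nc"):
        (Path(tmp) / name).write_bytes(b"")
    live = ["a.nc", "c.nc"]  # b.nc belonged to a deleted row
    preview = [p.name for p in collect_orphans(tmp, live)]  # dry run: nothing deleted
    collect_orphans(tmp, live, dry_run=False)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(preview)    # ['b.nc']
print(remaining)  # ['a.nc', 'c.nc']
```

The sketch is only as safe as its reference set. If a live row's path is missing from referenced_paths, that row's file is reported as an orphan, which is exactly the symptom described above.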
Sequence that triggers it
- Insert rows into a table with a custom <codec@store> column (each insert writes one file to the store).
- Delete a subset of the rows.
- Run dj.gc.collect(schema, store_name=..., dry_run=False) to reclaim the orphaned files.
- The files for the rows that were not deleted are removed too.
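After running GC, the last step can be detected by checking that every file still referenced by a live row exists. Below is a minimal stdlib-only sketch. missing_referenced_files is a hypothetical helper, not a DataJoint function, and it assumes the caller supplies the store-relative paths that the remaining rows reference. The GC outcome is simulated as a worst case in which every file is removed, rather than produced by calling dj.gc.

```python
import tempfile
from pathlib import Path


def missing_referenced_files(store_root, referenced_paths):
    """Return referenced paths whose files no longer exist under store_root.

    Hypothetical helper: referenced_paths are the store-relative paths still
    recorded by live rows, gathered by the caller after GC has run.
    """
    root = Path(store_root)
    return [p for p in referenced_paths if not (root / p).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    # Step 1: one file per inserted row.
    for name in ("a.nc", "b.nc", "c.nc"):
        (Path(tmp) / name).write_bytes(b"")
    # Step 2: the row for b.nc is deleted; rows for a.nc and c.nc remain.
    live = ["a.nc", "c.nc"]
    # Steps 3-4 (simulated): GC treats every file as orphaned and removes it.
    for name in ("a.nc", "b.nc", "c.nc"):
        (Path(tmp) / name).unlink()
    dangling = missing_referenced_files(tmp, live)

print(dangling)  # ['a.nc', 'c.nc']
```

Any non-empty result means existing rows now point at files that no longer exist.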
Environment
- SchemaCodec subclass backed by a protocol: file store