
Parcels parquet output files can't (always) be opened after a run crashes #2713

Description

@erikvansebille

What version of Parcels are you running?

v4

Is your feature request related to a problem?

If pset.execute() raises an error during a simulation, the Parquet particle file often cannot be opened afterwards. Reading it with parcels.read_particlefile() fails with ArrowInvalid: Error creating dataset. Could not read schema from 'output-matroos.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'output-matroos.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[8], line 1
----> 1 df = parcels.read_particlefile("output-matroos.parquet")
      2 
      3 tri_speed = ds["veluv_abs"].isel(time=0).isel(nMesh_face=xr.DataArray(tri_parent_face, dims="n_face")).values
      4 

File ~/Codes/drifter_campaign_prediction/Matroos/.pixi/envs/default/lib/python3.14/site-packages/parcels/_core/particlefile.py:247, in read_particlefile(path, decode_times)
    243 path = Path(path)
    245 assert path.suffix == ".parquet", "Only Parquet files are supported"
--> 247 table = pq.read_table(path)
    249 try:
    250     time_field = table.field("time")

File ~/Codes/drifter_campaign_prediction/Matroos/.pixi/envs/default/lib/python3.14/site-packages/pyarrow/parquet/core.py:1891, in read_table(source, columns, use_threads, schema, use_pandas_metadata, read_dictionary, binary_type, list_type, memory_map, buffer_size, partitioning, filesystem, filters, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties, thrift_string_size_limit, thrift_container_size_limit, page_checksum_verification, arrow_extensions_enabled)
   1879 def read_table(source, *, columns=None, use_threads=True,
   1880                schema=None, use_pandas_metadata=False, read_dictionary=None,
   1881                binary_type=None, list_type=None, memory_map=False, buffer_size=0,
   (...)   1887                page_checksum_verification=False,
   1888                arrow_extensions_enabled=True):
   1890     try:
-> 1891         dataset = ParquetDataset(
   1892             source,
   1893             schema=schema,
   1894             filesystem=filesystem,
   1895             partitioning=partitioning,
   1896             memory_map=memory_map,
   1897             read_dictionary=read_dictionary,
   1898             binary_type=binary_type,
   1899             list_type=list_type,
   1900             buffer_size=buffer_size,
   1901             filters=filters,
   1902             ignore_prefixes=ignore_prefixes,
   1903             pre_buffer=pre_buffer,
   1904             coerce_int96_timestamp_unit=coerce_int96_timestamp_unit,
   1905             decryption_properties=decryption_properties,
   1906             thrift_string_size_limit=thrift_string_size_limit,
   1907             thrift_container_size_limit=thrift_container_size_limit,
   1908             page_checksum_verification=page_checksum_verification,
   1909             arrow_extensions_enabled=arrow_extensions_enabled,
   1910         )
   1911     except ImportError:
   1912         # fall back on ParquetFile for simple cases when pyarrow.dataset
   1913         # module is not available
   1914         if filters is not None:

File ~/Codes/drifter_campaign_prediction/Matroos/.pixi/envs/default/lib/python3.14/site-packages/pyarrow/parquet/core.py:1471, in ParquetDataset.__init__(self, path_or_paths, filesystem, schema, filters, read_dictionary, binary_type, list_type, memory_map, buffer_size, partitioning, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties, thrift_string_size_limit, thrift_container_size_limit, page_checksum_verification, arrow_extensions_enabled)
   1467 if partitioning == "hive":
   1468     partitioning = ds.HivePartitioning.discover(
   1469         infer_dictionary=True)
-> 1471 self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
   1472                            schema=schema, format=parquet_format,
   1473                            partitioning=partitioning,
   1474                            ignore_prefixes=ignore_prefixes)

File ~/Codes/drifter_campaign_prediction/Matroos/.pixi/envs/default/lib/python3.14/site-packages/pyarrow/dataset.py:790, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
    779 kwargs = dict(
    780     schema=schema,
    781     filesystem=filesystem,
   (...)    786     selector_ignore_prefixes=ignore_prefixes
    787 )
    789 if _is_path_like(source):
--> 790     return _filesystem_dataset(source, **kwargs)
    791 elif isinstance(source, (tuple, list)):
    792     if all(_is_path_like(elem) or isinstance(elem, FileInfo) for elem in source):

File ~/Codes/drifter_campaign_prediction/Matroos/.pixi/envs/default/lib/python3.14/site-packages/pyarrow/dataset.py:482, in _filesystem_dataset(source, schema, filesystem, partitioning, format, partition_base_dir, exclude_invalid_files, selector_ignore_prefixes)
    474 options = FileSystemFactoryOptions(
    475     partitioning=partitioning,
    476     partition_base_dir=partition_base_dir,
    477     exclude_invalid_files=exclude_invalid_files,
    478     selector_ignore_prefixes=selector_ignore_prefixes
    479 )
    480 factory = FileSystemDatasetFactory(fs, paths_or_selector, format, options)
--> 482 return factory.finish(schema)

File pyarrow/_dataset.pyx:3226, in pyarrow._dataset.DatasetFactory.finish()
-> 3226 'Could not get source, probably due dynamically evaluated source code.'

File pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
--> 155 'Could not get source, probably due dynamically evaluated source code.'

File pyarrow/error.pxi:92, in pyarrow.lib.check_status()
---> 92 'Could not get source, probably due dynamically evaluated source code.'

ArrowInvalid: Error creating dataset. Could not read schema from 'output-matroos.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'output-matroos.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

Describe the solution you'd like

Ideally, we would make sure that the Parquet file is always readable, even if pset.execute() fails.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Assignees: No one assigned

Labels: needs-triage (Issue that has not been reviewed by a Parcels team member)

Type: No type


Projects: Status Backlog

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
