Describe the bug
When running the full test suite on models and training examples, three distinct test failures/hangs consistently block test runners:
- AutoencoderVidTok return format mismatch:
AutoencoderVidTok.forward returns the raw tensor dec when return_dict=False, whereas the standard VAE/autoencoder API contract in Diffusers requires a single-element tuple (dec,). The test fixture base_model_output therefore indexes the raw tensor with [0], which slices off the batch dimension (shape [4, 3, 16, 32, 32] becomes [3, 16, 32, 32]). The test_group_offloading run instead returns a DecoderOutput, where [0] yields sample (shape [4, 3, 16, 32, 32]), so the comparison fails with a shape mismatch.
- AutoencoderDC mixed-precision training crash:
TestAutoencoderDCTraining::test_mixed_precision_training crashes with RuntimeError: "GET was unable to find an engine to execute this computation" in some CUDA/cuDNN runner environments that lack a matching execution engine for the DC architecture layers.
- Examples launcher hangs / timeouts:
In single-device/CPU CI workflows, run_command in examples/test_examples_utils.py calls subprocess.check_output without a timeout, so the test blocks indefinitely if the child process deadlocks. In addition, the test runner does not enforce a single-process launch, which can deadlock when training commands try to coordinate a multi-device setup.
I have verified clean fixes for all of these issues locally and will submit a PR shortly.
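The indexing mismatch in the first item can be illustrated without PyTorch. The sketch below is only an illustration under stated assumptions: FakeTensor and FakeDecoderOutput are hypothetical stand-ins that track shapes only, not Diffusers or PyTorch classes, and this is not the fix that will go into the PR.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that tracks only its shape."""

    shape: tuple

    def __getitem__(self, index: int) -> "FakeTensor":
        # Integer indexing drops the leading dimension, as on a real tensor.
        return FakeTensor(self.shape[1:])


@dataclass
class FakeDecoderOutput:
    """Hypothetical stand-in for DecoderOutput; [0] yields the sample field."""

    sample: FakeTensor

    def __getitem__(self, index: int) -> FakeTensor:
        return (self.sample,)[index]


dec = FakeTensor((4, 3, 16, 32, 32))

raw_return = dec                              # reported behaviour with return_dict=False
tuple_return = (dec,)                         # single-element tuple expected by the tests
dict_return = FakeDecoderOutput(sample=dec)   # return_dict=True path

print(raw_return[0].shape)    # (3, 16, 32, 32): batch dimension sliced off
print(tuple_return[0].shape)  # (4, 3, 16, 32, 32)
print(dict_return[0].shape)   # (4, 3, 16, 32, 32)
```

Only the tuple form gives [0] the same meaning on both return paths, which is what the shared test fixtures appear to rely on.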
Reproduction
- Run VidTok autoencoder memory tests:
pytest tests/models/autoencoders/test_models_autoencoder_vidtok.py -k "test_group_offloading"
- Run TestAutoencoderDCTraining::test_mixed_precision_training on a CUDA device with a restricted cuDNN configuration.
- Run unconditional image generation example tests on a single-GPU/CPU host:
pytest examples/unconditional_image_generation/test_unconditional.py
Logs
1. VidTok Offloading shape mismatch:
AssertionError: Shape mismatch — actual torch.Size([3, 16, 32, 32]) vs expected torch.Size([4, 3, ...])
2. AutoencoderDC mixed precision:
RuntimeError: "GET was unable to find an engine to execute this computation"
3. Examples hang:
pytest-timeout (>60s) while reading subprocess stdout
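For the hang in item 3, one way to turn an indefinite block into a fast failure is to pass a timeout to subprocess.check_output. This is a minimal sketch, not the actual run_command from examples/test_examples_utils.py: the helper name run_with_timeout and the timeout values are assumptions chosen for illustration.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], timeout_s: float) -> str:
    """Run command and return its output, or raise if it exceeds timeout_s.

    Hypothetical helper for illustration; not the existing run_command.
    """
    try:
        return subprocess.check_output(
            command,
            stderr=subprocess.STDOUT,  # capture stderr with stdout for debugging
            timeout=timeout_s,         # check_output kills the child on expiry
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(
            f"Command timed out after {timeout_s}s: {' '.join(command)}"
        ) from exc


# A quick command completes normally.
print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30.0).strip())

# A command that sleeps past the limit fails fast instead of hanging.
try:
    run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
except RuntimeError as err:
    print(err)
```

With a bound like this, a deadlocked training subprocess surfaces as an ordinary test failure with the offending command in the message, rather than as a pytest-timeout on the whole test.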
System Info
- Platform: macOS / Linux (CI runner)
- PyTorch version: 2.x
- Diffusers version: 0.30.0.dev0
Who can help?
@sayakpaul @DN6 @pcuenca