Fix model offloading and training tests + prevent examples timeout #14090

Description

@GiGiKoneti

Describe the bug

When running the full test suite for models and training examples, three distinct failures or hangs consistently block the test runners:

  1. AutoencoderVidTok return format mismatch:
    AutoencoderVidTok.forward returns the raw tensor dec when return_dict=False, but the standard VAE/autoencoder API contract in Diffusers requires a single-element tuple (dec,). As a result, the test fixture base_model_output, which indexes the output with [0], slices away the batch dimension (shape [4, 3, 16, 32, 32] becomes [3, 16, 32, 32]). The training check run (test_group_offloading) returns a DecoderOutput, where [0] yields sample (shape [4, 3, 16, 32, 32]), so the two shapes differ and the comparison fails.

  2. AutoencoderDC mixed-precision training crash:
    TestAutoencoderDCTraining::test_mixed_precision_training crashes with RuntimeError: "GET was unable to find an engine to execute this computation" in certain CUDA/cuDNN runner environments, because no matching execution engine is available there for the DC architecture layers.

  3. Examples launcher hangs / timeouts:
    In single-device/CPU CI workflows, run_command in examples/test_examples_utils.py uses subprocess.check_output without a timeout, causing execution to block indefinitely if a deadlock occurs. Additionally, the test runner does not enforce single-process launch, which can deadlock when training commands try to coordinate multi-device setups.
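
The first failure above can be illustrated without loading the model. The following is a minimal sketch, not the actual Diffusers code: ShapeOnlyTensor, buggy_forward, and fixed_forward are hypothetical names introduced here, and the stand-in class only mimics how integer indexing drops the leading dimension of a tensor.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShapeOnlyTensor:
    """Hypothetical stand-in for a tensor; it tracks only the shape."""

    shape: Tuple[int, ...]

    def __getitem__(self, index: int) -> "ShapeOnlyTensor":
        # Integer indexing drops the leading dimension, as on a real tensor.
        if not 0 <= index < self.shape[0]:
            raise IndexError(index)
        return ShapeOnlyTensor(self.shape[1:])


def buggy_forward(dec: ShapeOnlyTensor) -> ShapeOnlyTensor:
    # Assumed model of the reported return_dict=False path: raw tensor.
    return dec


def fixed_forward(dec: ShapeOnlyTensor) -> Tuple[ShapeOnlyTensor]:
    # Assumed model of the expected contract: single-element tuple.
    return (dec,)


dec = ShapeOnlyTensor((4, 3, 16, 32, 32))

# Indexing the raw tensor with [0] selects the first batch item.
buggy_shape = buggy_forward(dec)[0].shape

# Indexing the tuple with [0] returns the whole tensor.
fixed_shape = fixed_forward(dec)[0].shape

print(buggy_shape)  # (3, 16, 32, 32)
print(fixed_shape)  # (4, 3, 16, 32, 32)
```

With the tuple return, [0] behaves the same way for both the return_dict=False output and a DecoderOutput, which is what the shape comparison in the test relies on.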

I have verified clean fixes for all of these issues locally and will submit a PR shortly.

Reproduction

  1. Run VidTok autoencoder memory tests:
    pytest tests/models/autoencoders/test_models_autoencoder_vidtok.py -k "test_group_offloading"
  2. Run DC autoencoder training tests on a CUDA device with restricted cuDNN configurations.
  3. Run unconditional image generation example tests on a single-GPU/CPU host:
    pytest examples/unconditional_image_generation/test_unconditional.py

Logs

1. VidTok Offloading shape mismatch:
AssertionError: Shape mismatch — actual torch.Size([3, 16, 32, 32]) vs expected torch.Size([4, 3, ...])

2. AutoencoderDC mixed precision:
RuntimeError: "GET was unable to find an engine to execute this computation"

3. Examples hang:
pytest-timeout (>60s) while reading subprocess stdout
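
One way to turn the hang in the third log into a fast, readable failure is to pass a timeout to subprocess.check_output. This is a sketch under assumptions, not the code in examples/test_examples_utils.py: the helper name run_command_with_timeout, the 60-second default, and the error messages are all hypothetical.

```python
import subprocess
import sys
from typing import List


def run_command_with_timeout(command: List[str], timeout_s: float = 60.0) -> str:
    """Run `command` and return its combined stdout/stderr as text.

    Hypothetical helper: raises RuntimeError instead of blocking forever
    when the child process does not finish within `timeout_s` seconds.
    """
    try:
        output = subprocess.check_output(
            command,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            timeout=timeout_s,  # check_output kills the child on expiry
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(
            f"Command `{' '.join(command)}` timed out after {timeout_s}s"
        ) from exc
    except subprocess.CalledProcessError as exc:
        details = exc.output.decode("utf-8", errors="replace")
        raise RuntimeError(
            f"Command `{' '.join(command)}` failed with:\n{details}"
        ) from exc
    return output.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A quick command finishes normally and its output is returned.
    print(run_command_with_timeout([sys.executable, "-c", "print('ok')"]))
```

The timeout only bounds how long the test waits. It does not address the underlying deadlock, so enforcing a single-process launch as described in the issue would still be needed.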

System Info

  • Platform: macOS / Linux (CI runner)
  • PyTorch version: 2.x
  • Diffusers version: 0.30.0.dev0

Who can help?

@sayakpaul @DN6 @pcuenca
