Fix model offloading and training tests + prevent examples timeout

### Describe the bug

Running the full test suite for models and training examples hits three distinct failures or hangs that consistently block test runners:

1. **AutoencoderVidTok return format mismatch:**
   `AutoencoderVidTok.forward` returns the raw tensor `dec` when `return_dict=False`, whereas the standard VAE/autoencoder API contract in Diffusers requires a single-element tuple `(dec,)`. The test fixture `base_model_output` indexes the result with `[0]`, which on a raw tensor slices away the batch dimension (shape `[4, 3, 16, 32, 32]` becomes `[3, 16, 32, 32]`). The comparison run in `test_group_offloading` returns a `DecoderOutput`, where `[0]` yields `sample` (shape `[4, 3, 16, 32, 32]`), so the two outputs fail the shape check.

2. **AutoencoderDC mixed-precision training crash:**
   `TestAutoencoderDCTraining::test_mixed_precision_training` crashes with `RuntimeError: "GET was unable to find an engine to execute this computation"` in specific CUDA/cuDNN runner environments due to missing matching execution engines for the DC architecture layers.

3. **Examples launcher hangs / timeouts:**
   In single-device/CPU CI workflows, `run_command` in `examples/test_examples_utils.py` calls `subprocess.check_output` without a timeout, so a deadlocked subprocess blocks the test run indefinitely. In addition, the test runner does not enforce a single-process launch, so training commands can deadlock while trying to coordinate a multi-device setup.
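
For the third item, a minimal sketch of what a bounded subprocess call could look like. The helper name `run_command_with_timeout` and the 60-second default are assumptions for illustration, not the actual `run_command` in `examples/test_examples_utils.py`. It covers only the timeout half of the problem, not the single-process launch.

```python
import subprocess
import sys


def run_command_with_timeout(command, timeout=60):
    """Hypothetical helper: run `command` and fail fast instead of hanging."""
    try:
        # check_output kills the child and raises TimeoutExpired once
        # `timeout` seconds have elapsed.
        return subprocess.check_output(
            command, stderr=subprocess.STDOUT, timeout=timeout
        )
    except subprocess.TimeoutExpired as err:
        raise RuntimeError(
            f"Command `{' '.join(command)}` timed out after {timeout}s"
        ) from err
    except subprocess.CalledProcessError as err:
        # Surface the captured output so the failure is debuggable in CI logs.
        raise RuntimeError(
            f"Command `{' '.join(command)}` failed with:\n{err.output.decode()}"
        ) from err


if __name__ == "__main__":
    # A short-lived child process returns its output normally.
    print(run_command_with_timeout([sys.executable, "-c", "print('ok')"]))
```

With this shape, a stuck training command turns into a regular test failure with a readable message rather than an indefinite block on the child's stdout.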

I have verified clean fixes for all of these issues locally and will submit a PR shortly.

### Reproduction

1. Run VidTok autoencoder memory tests:
   ```bash
   pytest tests/models/autoencoders/test_models_autoencoder_vidtok.py -k "test_group_offloading"
   ```
2. Run DC autoencoder training tests on a CUDA device with restricted cuDNN configurations.
3. Run unconditional image generation example tests on a single-GPU/CPU host:
   ```bash
   pytest examples/unconditional_image_generation/test_unconditional.py
   ```

### Logs

```shell
1. VidTok Offloading shape mismatch:
AssertionError: Shape mismatch — actual torch.Size([3, 16, 32, 32]) vs expected torch.Size([4, 3, ...])

2. AutoencoderDC mixed precision:
RuntimeError: "GET was unable to find an engine to execute this computation"

3. Examples hang:
pytest-timeout (>60s) while reading subprocess stdout
```
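
The shape mismatch in the first log entry can be illustrated without loading the model. The sketch below assumes only PyTorch; the two functions are hypothetical stand-ins for the `return_dict=False` code path, not the real `AutoencoderVidTok.forward`.

```python
import torch

# Stand-in for the decoded batch; the shape is taken from the report above.
dec = torch.zeros(4, 3, 16, 32, 32)


def forward_returning_tensor(sample):
    # Hypothetical stand-in for the reported behaviour: a bare tensor.
    return sample


def forward_returning_tuple(sample):
    # Hypothetical stand-in for the single-element tuple convention.
    return (sample,)


# On a bare tensor, [0] indexes the batch dimension and drops it.
raw_shape = forward_returning_tensor(dec)[0].shape
# On a tuple, [0] unpacks the tuple and leaves the tensor intact.
tuple_shape = forward_returning_tuple(dec)[0].shape

print(tuple(raw_shape))    # (3, 16, 32, 32)
print(tuple(tuple_shape))  # (4, 3, 16, 32, 32)
```

The same `[0]` therefore means two different things depending on the return type, which is why the fixture and the comparison run disagree.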

### System Info

- Platform: macOS / Linux (CI runner)
- PyTorch version: 2.x
- Diffusers version: 0.30.0.dev0

### Who can help?

@sayakpaul @DN6 @pcuenca

