
[AMD/MI300A] DeepSeek-V4 (deepseek_v4) serving support — tracking #28

Description

@xzyaoi

Tracking issue for serving DeepSeek-V4 (model_type: deepseek_v4) on AMD MI300A (gfx942).

Architecture (why it's new vs V3/Kimi)

  • FP4/mxfp4 routed experts + FP8 (block 128×128) for the rest. Flash: 256 experts / hidden 4096; Pro: 384 experts / hidden 7168; top-6, 1 shared.
  • Hybrid attention: Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) + a DSA indexer (top-512 Flash / top-1024 Pro), per-layer compress_ratios, sliding_window: 128, o_lora_rank.
  • mHC (Manifold-Constrained Hyper-Connections: hc_mult, hc_sinkhorn_iters), sqrtsoftplus routing, MTP, YaRN→1M.
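
For sizing, the two quantization schemes above can be turned into rough storage arithmetic. This is a minimal sketch, not the checkpoint's actual layout. It assumes mxfp4 follows the OCP Microscaling convention (4-bit E2M1 elements, one 8-bit E8M0 scale per 32 elements) and that the FP8 path stores one scale per 128×128 block. The helper names and the intermediate size of 2048 are hypothetical and only for illustration.

```python
import math

# Assumed OCP Microscaling (MX) parameters for mxfp4; not read from the checkpoint.
MXFP4_BLOCK = 32       # elements that share one scale
MXFP4_ELEM_BITS = 4    # E2M1 element width
MXFP4_SCALE_BITS = 8   # E8M0 shared-scale width


def mxfp4_bytes(num_elements: int) -> int:
    """Bytes for packed 4-bit values plus one 8-bit scale per 32-element block."""
    blocks = math.ceil(num_elements / MXFP4_BLOCK)
    packed = math.ceil(num_elements * MXFP4_ELEM_BITS / 8)
    return packed + blocks * MXFP4_SCALE_BITS // 8


def fp8_block_scale_shape(rows: int, cols: int, block: int = 128) -> tuple:
    """Shape of the scale grid when one scale covers a block x block tile."""
    return (math.ceil(rows / block), math.ceil(cols / block))


# Hypothetical expert projection: hidden 4096 (Flash) by an illustrative 2048.
hidden, inter = 4096, 2048
print(mxfp4_bytes(hidden * inter))           # 4456448 bytes, i.e. 4.25 bits/weight
print(fp8_block_scale_shape(hidden, inter))  # (32, 16)
```

Under these assumptions mxfp4 costs 4.25 bits per weight, so the routed experts dominate the footprint far less than they would in FP8.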

Findings (single-node probe on MI300A, 2026-06-11)

Much more AMD-ready than V3 was: the model code imports and constructs cleanly on gfx942 (no import wall: deepseek_v4.py feature-detects kernels and gates NVIDIA-only paths on cc=10, with fallbacks for everything else). Confirmed working at construction: model import, attention + DSA indexer construction, and the mxfp4 MoE kernel at ep_size=1 (backend selection and weight allocation both succeed).
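
The gating pattern described above can be sketched roughly as follows. This is an illustration of the idea only, not the code in deepseek_v4.py: the `Device` type, the backend names, and the fallback order are all hypothetical, and "cc" is read here as the CUDA compute-capability major version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    vendor: str                 # "nvidia" or "amd" (hypothetical labels)
    cc_major: Optional[int]     # compute-capability major; None on ROCm
    arch: Optional[str] = None  # e.g. "gfx942"


# Hypothetical backend names, most specialised first.
FALLBACK_ORDER = ("portable_mxfp4", "reference")


def select_moe_backend(device: Device, available: set) -> str:
    """Pick the NVIDIA-only kernel only on cc=10, else the first available fallback."""
    if (device.vendor == "nvidia" and device.cc_major == 10
            and "nvidia_fp4" in available):
        return "nvidia_fp4"
    for name in FALLBACK_ORDER:
        if name in available:
            return name
    raise RuntimeError(f"no MoE backend available for {device}")


mi300a = Device(vendor="amd", cc_major=None, arch="gfx942")
print(select_moe_backend(mi300a, {"nvidia_fp4", "portable_mxfp4"}))  # portable_mxfp4
```

Because the vendor-specific path is chosen at selection time rather than at import time, a missing NVIDIA kernel never prevents the module from loading on gfx942.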

Blockers

Not blockers (confirmed OK at construction)

Model import, attention/indexer module construction, mxfp4 MoE backend selection + weight alloc (ep=1).

Notes

  • FlashInfer-AMD (amd-flashinfer, gfx942) does not help here: the available build is v0.2.5 for ROCm 7.1.1 / torch 2.8, which is ABI-mismatched with our ROCm 7.2 / torch 2.11 image, and it predates V4's FP4-MoE and sparse-indexer kernels.
  • This mirrors the Kimi/V3 bring-up (which needed an AMD INT4-W4A16 MoE + AITER MLA): same shape, smaller remaining lift.
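
The FlashInfer-AMD mismatch noted above is easy to check mechanically before attempting an install. The sketch below is a heuristic under a stated assumption: a differing major.minor version of ROCm or torch is treated as ABI-incompatible. The function names and the dictionary layout are hypothetical. Versions are parsed into integer tuples because a plain string comparison would order "2.11" before "2.8".

```python
def parse_version(version: str) -> tuple:
    """Turn '7.1.1' into (7, 1, 1) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def mismatched_fields(build: dict, image: dict, fields=("rocm", "torch")) -> list:
    """Return the fields whose major.minor differ between a prebuilt wheel and the image."""
    return [
        name for name in fields
        if parse_version(build[name])[:2] != parse_version(image[name])[:2]
    ]


flashinfer_build = {"rocm": "7.1.1", "torch": "2.8"}  # versions from the note above
our_image = {"rocm": "7.2", "torch": "2.11"}

print(mismatched_fields(flashinfer_build, our_image))  # ['rocm', 'torch']
```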

Weights / repro

V4-Pro staged at infra01/hf_models/.../DeepSeek-V4-Pro (806 GB); Flash not staged (MIT, ~150 GB). Probes: jobs/serve-v4pro-1node-probe{,2}.sbatch on beverin. HW: MI300A gfx942 / ROCm 7.2 / torch 2.11.

Labels: enhancement
