
Auto-select sparse MLA single-token decode launch config #55

Description

@xzyaoi

Finding

The sparse MLA config module documents a measured single-token decode config that is 1.13-1.24x faster at Tq=1, but it is opt-in: callers must set the XKERNELS_SPARSE_MLA_CONFIG environment variable to select it.

Why this should improve performance

Single-token decode is a common serving hot path. The measured faster config is not hidden or speculative; it is already checked in, but callers have to know to export an environment override. Selecting it automatically when T == 1 would capture the win without regressing the documented multi-token case.

Suggested implementation

  • Change config resolution to accept T or a decode_mode flag.
  • If no env override is set and T == 1, return DECODE_SPARSE_MLA_CONFIG; otherwise return DEFAULT_SPARSE_MLA_CONFIG.
  • Keep the env override as the highest-priority path for A/B testing and operator control.
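A minimal sketch of the resolution order above, not the module's actual code. The `SparseMLAConfig` fields, their values, the function name `resolve_sparse_mla_config`, and the assumption that the env var holds a config name (`default` or `decode`) are all hypothetical; only the two config constant names and `XKERNELS_SPARSE_MLA_CONFIG` come from this issue.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENV_VAR = "XKERNELS_SPARSE_MLA_CONFIG"


@dataclass(frozen=True)
class SparseMLAConfig:
    # Hypothetical fields: the real launch parameters are not shown in this issue.
    name: str
    block_n: int
    num_warps: int


# Placeholder values for illustration only, not the measured configs.
DEFAULT_SPARSE_MLA_CONFIG = SparseMLAConfig("default", block_n=64, num_warps=8)
DECODE_SPARSE_MLA_CONFIG = SparseMLAConfig("decode", block_n=32, num_warps=4)

# Assumption: the env override selects a config by name.
_NAMED_CONFIGS = {
    cfg.name: cfg for cfg in (DEFAULT_SPARSE_MLA_CONFIG, DECODE_SPARSE_MLA_CONFIG)
}


def resolve_sparse_mla_config(
    T: int, env: Optional[Mapping[str, str]] = None
) -> SparseMLAConfig:
    """Pick a launch config: env override first, then T == 1, then default."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR, "").strip().lower()
    if override:
        # Highest priority: explicit operator choice, kept for A/B testing.
        if override not in _NAMED_CONFIGS:
            raise ValueError(f"unknown {ENV_VAR} value: {override!r}")
        return _NAMED_CONFIGS[override]
    # No override: auto-select the decode config for single-token decode.
    return DECODE_SPARSE_MLA_CONFIG if T == 1 else DEFAULT_SPARSE_MLA_CONFIG
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment; callers in the hot path would call `resolve_sparse_mla_config(T)` and let it read `os.environ`.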

Validation

  • Unit-test config resolution for T=1, T>1, and env override cases.
  • Re-run sparse MLA benchmarks for T=1, T=8, and T=64 to confirm the auto branch keeps the documented win and avoids multi-token regression.
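The three unit-test cases could look roughly like the sketch below. The resolver here is a compact stand-in so the example runs on its own, and the string config values and the name-based override format are assumptions; a real test would import the module's resolver and constants instead.

```python
import os
import unittest
from unittest import mock

ENV_VAR = "XKERNELS_SPARSE_MLA_CONFIG"

# Stand-ins for the real config objects (assumption for this sketch).
DEFAULT_SPARSE_MLA_CONFIG = "default"
DECODE_SPARSE_MLA_CONFIG = "decode"


def resolve_sparse_mla_config(T):
    # Stand-in resolver: env override first, then T == 1, then default.
    override = os.environ.get(ENV_VAR)
    if override:
        return override
    return DECODE_SPARSE_MLA_CONFIG if T == 1 else DEFAULT_SPARSE_MLA_CONFIG


class ResolveConfigTest(unittest.TestCase):
    def setUp(self):
        # patch.dict restores os.environ after each test, including removed keys.
        patcher = mock.patch.dict(os.environ)
        patcher.start()
        self.addCleanup(patcher.stop)
        os.environ.pop(ENV_VAR, None)

    def test_single_token_selects_decode(self):
        self.assertEqual(resolve_sparse_mla_config(1), DECODE_SPARSE_MLA_CONFIG)

    def test_multi_token_selects_default(self):
        for T in (8, 64):
            self.assertEqual(resolve_sparse_mla_config(T), DEFAULT_SPARSE_MLA_CONFIG)

    def test_env_override_wins(self):
        os.environ[ENV_VAR] = DEFAULT_SPARSE_MLA_CONFIG
        self.assertEqual(resolve_sparse_mla_config(1), DEFAULT_SPARSE_MLA_CONFIG)
        os.environ[ENV_VAR] = DECODE_SPARSE_MLA_CONFIG
        self.assertEqual(resolve_sparse_mla_config(64), DECODE_SPARSE_MLA_CONFIG)
```

Saved as a test module, this can be run with `python -m unittest`. It covers only config selection; the T=1, T=8, and T=64 benchmark re-runs are still needed to confirm the speedup and the absence of a multi-token regression.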
