Auto-select sparse MLA single-token decode launch config

## Finding

The sparse MLA config module documents a single-token decode config measured at 1.13-1.24x faster than the default at `Tq=1`. That config is opt-in only, through the `XKERNELS_SPARSE_MLA_CONFIG` environment variable:

- Measured result and default rationale: [`src/xkernels/ops/attention/triton/sparse_mla_config.py#L52-L64`](https://github.com/ResearchComputer/xkernels/blob/00aac7e4a249e7af0a40da9e7981065b823fb4f3/src/xkernels/ops/attention/triton/sparse_mla_config.py#L52-L64)
- Opt-in `DECODE_SPARSE_MLA_CONFIG`: [`src/xkernels/ops/attention/triton/sparse_mla_config.py#L74-L85`](https://github.com/ResearchComputer/xkernels/blob/00aac7e4a249e7af0a40da9e7981065b823fb4f3/src/xkernels/ops/attention/triton/sparse_mla_config.py#L74-L85)
- The launcher knows `T` before resolving config, but currently calls `resolve_sparse_mla_config()` without passing it: [`src/xkernels/ops/attention/triton/sparse_mla_kernel.py#L129-L160`](https://github.com/ResearchComputer/xkernels/blob/00aac7e4a249e7af0a40da9e7981065b823fb4f3/src/xkernels/ops/attention/triton/sparse_mla_kernel.py#L129-L160)

## Why this should improve performance

Single-token decode is a common serving hot path. The faster config is neither hidden nor speculative: it is already checked in, but callers only get it if they know to export an environment override. Selecting it automatically when `T == 1` would capture the win without regressing the documented multi-token case.

## Suggested implementation

- Change config resolution to accept `T` or a `decode_mode` flag.
- If no env override is set and `T == 1`, return `DECODE_SPARSE_MLA_CONFIG`; otherwise return `DEFAULT_SPARSE_MLA_CONFIG`.
- Keep the env override as the highest-priority path for A/B testing and operator control.
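The resolution order above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's code. The `SparseMLAConfig` fields, the placeholder values, and the preset-name format for `XKERNELS_SPARSE_MLA_CONFIG` are all hypothetical. The real definitions live in `sparse_mla_config.py` and may differ. Making `T` optional keeps existing callers that do not pass it on `DEFAULT_SPARSE_MLA_CONFIG`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SparseMLAConfig:
    # Hypothetical fields; the real config in sparse_mla_config.py may differ.
    block_h: int
    block_k: int
    num_warps: int
    num_stages: int


# Placeholder values for illustration only, NOT the measured configs.
DEFAULT_SPARSE_MLA_CONFIG = SparseMLAConfig(block_h=16, block_k=64, num_warps=4, num_stages=2)
DECODE_SPARSE_MLA_CONFIG = SparseMLAConfig(block_h=16, block_k=32, num_warps=8, num_stages=1)

ENV_VAR = "XKERNELS_SPARSE_MLA_CONFIG"

# Assumed env-value format: a preset name. The real override format may differ.
_PRESETS = {"default": DEFAULT_SPARSE_MLA_CONFIG, "decode": DECODE_SPARSE_MLA_CONFIG}


def _config_from_env() -> Optional[SparseMLAConfig]:
    """Return the operator-selected config, or None if no override is set."""
    value = os.environ.get(ENV_VAR)
    if not value:
        return None
    try:
        return _PRESETS[value.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown {ENV_VAR} value: {value!r}") from None


def resolve_sparse_mla_config(T: Optional[int] = None) -> SparseMLAConfig:
    """Priority: env override, then the decode config for T == 1, then the default."""
    env_cfg = _config_from_env()
    if env_cfg is not None:
        return env_cfg
    if T == 1:
        return DECODE_SPARSE_MLA_CONFIG
    return DEFAULT_SPARSE_MLA_CONFIG
```

The launcher would then call `resolve_sparse_mla_config(T)` at the point where `T` is already known. Keying on `T` rather than a separate `decode_mode` flag ties the choice to the shape the kernel actually launches with, so callers cannot pass a flag that disagrees with `T`.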

## Validation

- Unit-test config resolution for `T=1`, `T>1`, and env override cases.
- Re-run sparse MLA benchmarks for `T=1`, `T=8`, and `T=64` to confirm the auto branch keeps the documented win and avoids multi-token regression.
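The unit-test cases above could look roughly like the following sketch. So that the file runs on its own, it includes a compact stand-in resolver with the proposed priority order (env override, then `T == 1`, then default). In the repo, the tests would instead import `resolve_sparse_mla_config` and the two configs from `sparse_mla_config.py`. The string configs and the `"default"`/`"decode"` override values are assumptions for illustration.

```python
import os
import unittest
from typing import Optional
from unittest import mock

ENV_VAR = "XKERNELS_SPARSE_MLA_CONFIG"

# Hypothetical stand-ins; real tests would import the actual config objects.
DEFAULT_SPARSE_MLA_CONFIG = "default-config"
DECODE_SPARSE_MLA_CONFIG = "decode-config"
_PRESETS = {"default": DEFAULT_SPARSE_MLA_CONFIG, "decode": DECODE_SPARSE_MLA_CONFIG}


def resolve_sparse_mla_config(T: Optional[int] = None) -> str:
    """Stand-in with the proposed priority: env override > T == 1 > default."""
    override = os.environ.get(ENV_VAR)
    if override:
        return _PRESETS[override]
    return DECODE_SPARSE_MLA_CONFIG if T == 1 else DEFAULT_SPARSE_MLA_CONFIG


class ResolveSparseMLAConfigTest(unittest.TestCase):
    def setUp(self):
        # Snapshot os.environ, restore it after each test, and start with no override.
        patcher = mock.patch.dict(os.environ)
        patcher.start()
        self.addCleanup(patcher.stop)
        os.environ.pop(ENV_VAR, None)

    def test_single_token_selects_decode_config(self):
        self.assertEqual(resolve_sparse_mla_config(T=1), DECODE_SPARSE_MLA_CONFIG)

    def test_multi_token_keeps_default_config(self):
        for T in (8, 64):
            with self.subTest(T=T):
                self.assertEqual(resolve_sparse_mla_config(T=T), DEFAULT_SPARSE_MLA_CONFIG)

    def test_missing_T_keeps_default_config(self):
        self.assertEqual(resolve_sparse_mla_config(), DEFAULT_SPARSE_MLA_CONFIG)

    def test_env_override_wins_over_auto_select(self):
        os.environ[ENV_VAR] = "default"
        self.assertEqual(resolve_sparse_mla_config(T=1), DEFAULT_SPARSE_MLA_CONFIG)
        os.environ[ENV_VAR] = "decode"
        self.assertEqual(resolve_sparse_mla_config(T=64), DECODE_SPARSE_MLA_CONFIG)
```

The tests run under `python -m unittest` or pytest. The `subTest` loop reuses the `T=8` and `T=64` shapes from the benchmark plan, so the unit tests and the benchmarks exercise the same branches.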