build: benchmark suite in CI — perf regression gates #143

@shadow-kernel

Context

Performance testing today is manual. Editor/Dialogs/StressTestDialog.cs spawns N copies with live FPS and draw-call stats, and the --stress/--benchmark flags launch GameHost in a separate process. There is no automated CI benchmark suite and no results tracking, and EngineTest (EngineTest/Main.cpp, TestECS.h) is a compile-only proof of concept with no assertions or CI integration (verified). The v4 epic promises "10x"; without regression gates, any render-graph, ECS, or culling refactor in this milestone can silently erode those gains. This suite is the guardrail that makes the number provable.

Goal

Build a headless benchmark harness with four canonical scenes (instancing storm, indoor occlusion, skinned crowd, particle storm) that runs in the GitHub Actions pipeline (build-release.yml infrastructure), emits frame-time JSON, fails PRs that regress beyond a threshold, and tracks history for trend charts.
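As a rough illustration of the "emits frame-time JSON" step, the sketch below reduces a list of per-frame times to the avg/p95/p99 figures named in the acceptance criteria. It is a sketch under assumptions, not the engine's format: the field names (`frames`, `avg_ms`, `p95_ms`, `p99_ms`), the nearest-rank percentile method, and the `summarize` helper are all hypothetical, and the draw-call and instance counters are left out because their telemetry names are not given here.

```python
import json
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # Smallest rank whose share of the samples reaches pct percent.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(frame_times_ms, warmup_frames=0):
    """Drop the warmup frames, then reduce the rest to avg/p95/p99."""
    measured = sorted(frame_times_ms[warmup_frames:])
    if not measured:
        raise ValueError("no frames left after warmup")
    return {
        "frames": len(measured),          # hypothetical field names
        "avg_ms": statistics.fmean(measured),
        "p95_ms": percentile(measured, 95),
        "p99_ms": percentile(measured, 99),
    }


if __name__ == "__main__":
    # Two slow warmup frames followed by steady frames with one spike.
    samples = [40.0, 35.0] + [16.0] * 98 + [30.0, 16.0]
    print(json.dumps(summarize(samples, warmup_frames=2), indent=2))
```

Nearest-rank is chosen here only because it always returns a frame time that was actually observed; an interpolating percentile would work just as well as long as the baseline and the PR run use the same method.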

Acceptance Criteria

  • Headless benchmark mode (extend the existing --benchmark GameHost path): fixed camera paths, warmup frames, fixed frame count, deterministic scene content
  • Four benchmark scenes committed as engine test content, each stressing a distinct axis (instance count, occlusion, skinning, transparency/particles)
  • JSON output: avg/p95/p99 frame time, draw calls, instances tested/drawn (existing telemetry counters)
  • CI job runs the suite per PR and fails on >X% regression vs the stored baseline (X configurable, default 5%)
  • Baseline update mechanism gated on explicit approval (label or manual dispatch)
  • History persisted (repo branch or artifact) with a rendered trend chart
  • Documented handling of shared-runner variance (relative thresholds, repeat-run median, or a self-hosted runner option)
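The regression gate and the repeat-run median from the criteria above could be combined roughly as follows. This is a minimal sketch under assumptions: the nested `scene -> metric -> value` layout, the metric names, the scene key `instancing_storm`, and the `gate` and `check_regression` helpers are hypothetical. Only the 5% default threshold and the use of a median over repeated runs come from the issue text.

```python
import statistics


def check_regression(baseline_ms, run_values_ms, threshold_pct=5.0):
    """Return (passed, delta_pct) for one metric where higher is worse."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    # The median over repeated runs damps shared-runner noise.
    current = statistics.median(run_values_ms)
    delta_pct = (current - baseline_ms) / baseline_ms * 100.0
    return delta_pct <= threshold_pct, delta_pct


def gate(baseline, runs, threshold_pct=5.0,
         metrics=("avg_ms", "p95_ms", "p99_ms")):
    """Compare repeated runs with the baseline; return failure messages."""
    failures = []
    for scene, base_metrics in baseline.items():
        for metric in metrics:
            samples = [run[scene][metric] for run in runs]
            ok, delta = check_regression(
                base_metrics[metric], samples, threshold_pct)
            if not ok:
                failures.append(
                    f"{scene}.{metric}: +{delta:.1f}% "
                    f"(limit {threshold_pct:.1f}%)")
    return failures


if __name__ == "__main__":
    # Hypothetical data: one scene, three repeated runs.
    baseline = {"instancing_storm":
                {"avg_ms": 10.0, "p95_ms": 12.0, "p99_ms": 14.0}}
    runs = [
        {"instancing_storm": {"avg_ms": 10.2, "p95_ms": 12.0, "p99_ms": 14.0}},
        {"instancing_storm": {"avg_ms": 10.4, "p95_ms": 13.0, "p99_ms": 14.0}},
        {"instancing_storm": {"avg_ms": 11.5, "p95_ms": 13.2, "p99_ms": 14.0}},
    ]
    problems = gate(baseline, runs)
    for line in problems:
        print(line)
    # A CI step would exit non-zero here to fail the PR.
    raise SystemExit(1 if problems else 0)
```

In this example the single slow run (11.5 ms average) does not trip the gate, because the median average is 10.4 ms (+4%), while the p95 median of 13.0 ms (+8.3%) does. The threshold is relative, so the same gate applies whether the numbers come from a hosted runner or a self-hosted one.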

Technical Notes

GitHub-hosted runners lack a real GPU, so plan for WARP-based determinism checks on the hosted runners, plus a self-hosted RTX runner (the dev machine) that produces true perf numbers via workflow_dispatch or a nightly schedule.

Dependencies

  • None (land this FIRST — before the render graph refactor starts)

Metadata

Assignees

No one assigned

    Labels

      • P1-high (high priority, schedule next)
      • area:build-ci (build, CI/CD, installer, auto-update, release)
      • size:M (1–3 days)
      • type:perf (performance work: FPS, memory, load times)
