issues Search Results · language:Dune language:Python language:HTML language:Java language:HTML linked:pr language:Java

4.5M results  (342 ms)

Problem The baseline gate (compare) fails CI on any raw score drop beyond a fixed tolerance. With non-deterministic agents and small trial counts, small raw drops are often noise, and the gate produces ...
enhancement
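
A minimal sketch of the kind of fixed-tolerance gate the issue above describes, assuming scores are pass rates in the range 0 to 1. The function name `gate_fails` and the default `tolerance` of 0.02 are hypothetical and are not taken from the project. It illustrates why small trial counts are a problem: with 10 trials, a single flipped trial moves the pass rate by 0.10, which is well beyond a small fixed tolerance.

```python
def gate_fails(baseline: float, current: float, tolerance: float = 0.02) -> bool:
    """Return True when the raw score drop exceeds the fixed tolerance.

    Hypothetical helper: the name and the default tolerance are assumptions
    made for illustration only.
    """
    drop = baseline - current  # positive when the current run scored lower
    return drop > tolerance


# With 10 trials, 8 passes vs. 7 passes is a single flipped trial,
# yet the raw drop (about 0.10) still trips the gate.
one_trial_flipped = gate_fails(baseline=8 / 10, current=7 / 10)
```

A gate like this looks only at the raw difference, so it cannot tell a genuine regression from run-to-run variation.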

- [ ] The project is using the blacksmith.sh GitHub Runners instead of the standard ones
enhancement
good first issue

Environment Hermes Desktop connected to a remote gateway/dashboard — desktop app on machine A, backend runtime on machine B (e.g. a Linux mini-PC backend with the desktop app on a laptop over a private ...

[21] Improve create wizard UX with guided sections, colors, output choices, and final summary ...
jules

There is no test-coverage measurement or floor, so regressions in coverage go unnoticed. Goals - Measure test coverage. - Enforce a minimum threshold (>= 85%). - Include the coverage check in CI. ...
ci
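
A minimal sketch of a coverage floor check, assuming the 85% figure in the issue above is a minimum and that coverage is measured as covered lines over total lines. The names `coverage_percent` and `meets_floor` are hypothetical; a real project would more likely rely on its coverage tool's own threshold setting.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage; an empty code base counts as fully covered."""
    if total_lines == 0:
        return 100.0
    return 100.0 * covered_lines / total_lines


def meets_floor(covered_lines: int, total_lines: int, floor: float = 85.0) -> bool:
    """Return True when coverage is at or above the floor (assumed to be 85%)."""
    return coverage_percent(covered_lines, total_lines) >= floor
```

In CI, a check like this would typically exit with a non-zero status when `meets_floor` returns False, so that the job fails.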

Context ci-self-hosted-runner has validated the python-ci runner against central-mcp-gateway. The next useful validation is a second Python consumer: this repository. Goal Add a manual, opt-in workflow ...

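Overview In the Codex usage display, make it clear when the rate_limits data being used was recorded. Also strengthen the selection of the latest rate_limits event in Codex session logs, making it easier to diagnose why the display can appear stale. Background The Codex provider in AI Usage Tray reads the session logs under ~/.codex/sessions rather than an official API. ...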
enhancement

Make the static HTML report more useful for triage. Goals - Filter to only failed cases. - Per-grader aggregation view (passed/total + rate). - Improve transcript expand/collapse behavior. - ...
enhancement

Goal Reduce repeated valid-agent mask filtering in PredictiveMPPIAdapter._sequence_rollout() without changing rollout scoring behavior. Context robot_sf/planner/predictive_mppi.py filters predicted ...
agent
evidence:smoke
performance
resource:local
state:ready

The agent-eval compare command exists but is not wired into CI. CI should fail when quality regresses against a committed baseline, and always upload the run's reports. Goals - Compare current eval ...
ci