Issue search results

4.5M results (342 ms)

arthurzengg/agent-eval-harness
Add statistical confidence and significance testing

Problem: The baseline gate (compare) fails CI on any raw score drop beyond a fixed tolerance. With non-deterministic agents and small trial counts, small raw drops are often noise, and the gate produces ...

enhancement

rijnhardtkotze/tripwire
Blacksmith CI runners

- [ ] The project is using the blacksmith.sh GitHub Runners instead of the standard ones

enhancement

good first issue

NousResearch/hermes-agent
Remote desktop: composer image attachments are sent as local paths, unreadable by the remote backend

Environment: Hermes Desktop connected to a remote gateway/dashboard, with the desktop app on machine A and the backend runtime on machine B (e.g. a Linux mini-PC backend with the desktop app on a laptop over a private ...

wildersa/ml-starter-lab-kit
[21] Improve create wizard UX with guided sections, colors, output choices, and final summary

jules

arthurzengg/agent-eval-harness
Add coverage reporting and enforce a minimum threshold

There is no test-coverage measurement or floor, so regressions in coverage go unnoticed. Goals:
- Measure test coverage.
- Enforce a minimum threshold (≥ 85%).
- Include the coverage check in CI.
- ...

vinicius-ssantos/github-unified-mcp
ci: add manual python-ci self-hosted runner smoke workflow

Context: ci-self-hosted-runner has validated the python-ci runner against central-mcp-gateway. The next useful validation is a second Python consumer: this repository. Goal: Add a manual, opt-in workflow ...

Mir2569/ai-usage-tray
[enhancement] Improve the log-freshness display and retrieval logic for Codex usage

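Overview: Make the Codex usage display show when the rate_limits data it relies on was recorded. Also strengthen the selection of the latest rate_limits event in the Codex session logs, making it easier to diagnose why the display can look stale. Background: The Codex provider in AI Usage Tray reads the session logs under ~/.codex/sessions rather than an official API. ...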

enhancement

arthurzengg/agent-eval-harness
Enhance the HTML report

Make the static HTML report more useful for triage. Goals:
- Filter to only failed cases.
- Per-grader aggregation view (passed/total + rate).
- Improve transcript expand/collapse behavior.
- ...

enhancement

ll7/robot_sf_ll7
perf: hoist predictive MPPI valid-agent mask

Goal: Reduce repeated valid-agent mask filtering in PredictiveMPPIAdapter._sequence_rollout() without changing rollout scoring behavior. Context: robot_sf/planner/predictive_mppi.py filters predicted ...

agent

evidence:smoke

performance

resource:local

state:ready

arthurzengg/agent-eval-harness
Add CI baseline regression gating with report artifacts

The agent-eval compare command exists but is not wired into CI. CI should fail when quality regresses against a committed baseline, and always upload the run's reports. Goals:
- Compare current eval ...

Search query: language:Dune language:Python language:HTML language:Java language:HTML linked:pr language:Java