Issues search results · language:Edge language:Python linked:pr language:Java language:JavaScript language:CSS
5.9M results
There is no test-coverage measurement or floor, so regressions in coverage go unnoticed.
Goals
- Measure test coverage.
- Enforce a minimum threshold (≥ 85%).
- Include the coverage check in CI. ...
ci
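As a rough illustration of the floor check, here is a minimal sketch. It assumes per-file statement counts are already available. The `FileCoverage` record, `total_coverage`, and `coverage_gate` are hypothetical names, not part of any existing tool. The sketch pools statements across files instead of averaging per-file percentages, so large files carry proportionally more weight.

```python
from dataclasses import dataclass

COVERAGE_FLOOR = 85.0  # minimum acceptable total coverage, in percent


@dataclass
class FileCoverage:
    """Hypothetical per-file record: statements executed vs. statements measured."""
    path: str
    covered: int
    statements: int


def total_coverage(files: list[FileCoverage]) -> float:
    """Pool statements across files, so large files weigh more than small ones."""
    statements = sum(f.statements for f in files)
    if statements == 0:
        # An empty measurement should not silently pass the gate.
        raise ValueError("no statements measured; refusing to pass the gate")
    return 100.0 * sum(f.covered for f in files) / statements


def coverage_gate(files: list[FileCoverage], floor: float = COVERAGE_FLOOR) -> int:
    """Return a CI exit code: 0 when total coverage meets the floor, 1 otherwise."""
    total = total_coverage(files)
    print(f"total coverage: {total:.1f}% (floor {floor:.1f}%)")
    return 0 if total >= floor else 1
```

Take `a.py` at 90/100 statements and `b.py` at 5/10. Pooled coverage is 95/110 ≈ 86.4%, so the gate passes. A naive mean of the per-file rates (90% and 50%) would report 70% instead. In practice, coverage.py can enforce the floor directly with `coverage report --fail-under=85`, and pytest-cov offers `--cov-fail-under=85`.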
Current Problem
The current evaluator scores by raw domain count: 0, 1, 2, or 3+ matched domains map to fixed scores of 0, 40, 70, and 100. All domains contribute
equally, regardless of how many skills match within them. Only nodejs ...
enhancement
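The fixed mapping above can be sketched as follows to show the problem concretely. This assumes the evaluator sees one domain label per matched skill. The function name and the input shape are hypothetical. Two candidates that touch the same number of domains get the same score, even when one matches far more skills.

```python
# Fixed score per distinct-domain count, as described in the issue; 3+ domains -> 100.
FIXED_SCORES = {0: 0, 1: 40, 2: 70}


def domain_count_score(matched_skill_domains: list[str]) -> int:
    """Score by the number of distinct domains only; skills per domain are ignored."""
    n_domains = len(set(matched_skill_domains))
    return FIXED_SCORES.get(n_domains, 100)


# One entry per matched skill, labelled with that skill's (hypothetical) domain.
one_skill_per_domain = ["nodejs", "python"]
many_nodejs_skills = ["nodejs"] * 5 + ["python"]

print(domain_count_score(one_skill_per_domain), domain_count_score(many_nodejs_skills))
# prints: 70 70
```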
Make the static HTML report more useful for triage.
Goals
- Filter to only failed cases.
- Per-grader aggregation view (passed/total + rate).
- Improve transcript expand/collapse behavior.
- ...
enhancement
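The first two goals can be sketched as plain data transforms that run before rendering. This assumes each graded case carries a grader name and a pass/fail flag. `CaseResult` and both helpers are hypothetical names, and the grader names in the usage below are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical shape of one graded case in the report."""
    case_id: str
    grader: str
    passed: bool


def failed_only(results: list[CaseResult]) -> list[CaseResult]:
    """Filter to only failed cases, preserving report order."""
    return [r for r in results if not r.passed]


def per_grader_summary(results: list[CaseResult]) -> dict[str, tuple[int, int, float]]:
    """Map each grader name to (passed, total, pass rate), sorted by grader name."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        counts[r.grader][0] += int(r.passed)
        counts[r.grader][1] += 1
    return {g: (p, t, p / t) for g, (p, t) in sorted(counts.items())}
```

For example, a failing and a passing case under `exact_match` plus one passing case under `rubric` would summarize as `{"exact_match": (1, 2, 0.5), "rubric": (1, 1, 1.0)}`.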
Issues Addressed
C2 🔴 / H1 🟧 / H2 🟧 from the UX audit
C2 (Critical) — Tag label not associated with input
- `<label>Tags</label>` has an empty `for=""`
- `<input id="f-tags-input">` exists, but there is no `<label for="f-tags-input">` ...
accessibility
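One way to catch C2-style regressions is to scan the rendered HTML for labels whose `for` is empty or names no existing element id. The sketch below uses Python's standard `html.parser`. `LabelAudit` and `find_orphan_labels` are hypothetical helpers. By assumption, labels with no `for` attribute at all are skipped, because they may wrap their control instead.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Collect element ids and the `for` targets of <label> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.label_targets: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.ids.add(attr_map["id"])
        # Labels without any `for` are skipped: they may wrap their input instead.
        if tag == "label" and "for" in attr_map:
            # A bare `for` attribute parses as None; treat it like an empty string.
            self.label_targets.append(attr_map["for"] or "")


def find_orphan_labels(html: str) -> list[str]:
    """Return `for` values that are empty or point at no element id in the page."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    return [t for t in audit.label_targets if not t or t not in audit.ids]
```

Ids are collected across the whole page before checking, so a label that appears before its input is still matched.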
The agent-eval compare command exists but is not wired into CI. CI should fail when quality regresses against a
committed baseline, and always upload the run's reports.
Goals
- Compare current eval ...
ci
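The regression gate can be sketched as a per-metric comparison against the committed baseline. This assumes both runs reduce to a flat metric-to-score mapping where higher is better. The 0.02 tolerance is an illustrative assumption. The sketch does not reflect the actual `agent-eval compare` output format or flags.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """List metrics that dropped by more than `tolerance` or vanished since the baseline.

    Assumes higher scores are better for every metric.
    """
    problems = []
    for metric, base in sorted(baseline.items()):
        if metric not in current:
            problems.append(f"{metric}: missing from current run")
        elif current[metric] < base - tolerance:
            problems.append(f"{metric}: {base:.3f} -> {current[metric]:.3f}")
    return problems


def gate_exit_code(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> int:
    """Return 0 when nothing regressed, 1 otherwise; print each regression for the CI log."""
    regressions = find_regressions(baseline, current, tolerance)
    for line in regressions:
        print("REGRESSION", line)
    return 1 if regressions else 0
```

Because the reports should be uploaded even when this gate fails, the upload would run as a separate step that ignores the gate's exit code. If CI is GitHub Actions (an assumption), that step would use `if: always()`.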
Fix tests
I cleaned up some functions and removed some redundant casting, but now the tests are failing. Fix the tests.
Context
- Muay Thaiger: PvP Idle Fighting game
- Cleanup
When tasks in one suite use different trial counts, the suite-level k (a single max) mislabels pass@k / pass^k.
Goals
- Record each task's actual k.
- Report a k range (e.g. pass@2..4) instead of ...
bug
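Here is a sketch of per-task k. It rests on three assumptions: a task's k is simply its number of trials, pass@k means at least one of those trials passed, and pass^k means all of them passed. `TaskTrials` and the helpers are hypothetical names. The suite label shows a range whenever tasks disagree on k.

```python
from dataclasses import dataclass


@dataclass
class TaskTrials:
    """Hypothetical record of one task's trial outcomes (True = passed)."""
    task_id: str
    outcomes: list[bool]

    @property
    def k(self) -> int:
        # Each task's own k is its actual trial count, not the suite-wide maximum.
        return len(self.outcomes)


def pass_at_k(task: TaskTrials) -> bool:
    """pass@k: at least one of the task's k trials passed."""
    return any(task.outcomes)


def pass_hat_k(task: TaskTrials) -> bool:
    """pass^k: every one of the task's k trials passed."""
    return all(task.outcomes)


def k_label(metric: str, tasks: list[TaskTrials]) -> str:
    """Render e.g. 'pass@3' when all tasks share k, else a range like 'pass@2..4'.

    `metric` is the prefix, such as "pass@" or "pass^"; `tasks` must be non-empty.
    """
    ks = [t.k for t in tasks]
    lo, hi = min(ks), max(ks)
    return f"{metric}{lo}" if lo == hi else f"{metric}{lo}..{hi}"
```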
Graders, adapters, and reporters are registered by hand in each package's __init__.py. External packages can't add their
own without editing this repo.
Goals
- Discover plugins via Python importlib.metadata ...
enhancement
Problem
starter-kit/SESSION_RUNNER.md step 3C ("Document Learnings") instructs:
"Update the workstream document and/or the Learnings table below:"
…pointing sessions at SESSION_RUNNER's own Learnings (added ...
Suites can only be authored as a single YAML document. Real datasets, logs, and production traces are far easier to emit
as JSONL (one task per line), so the harness should ingest them directly.
Goals ...
enhancement
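A minimal JSONL reader might look like the sketch below. It assumes each line holds one JSON object describing a task. The task fields in the usage below (`id`, `input`) and the `iter_jsonl_tasks` name are hypothetical. The reader accepts any iterable of lines, so an open file handle works without loading the whole file. Errors report the source and line number.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl_tasks(lines: Iterable[str], source: str = "<jsonl>") -> Iterator[dict[str, Any]]:
    """Yield one task dict per non-blank line, reporting `source:line` on bad input."""
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            task = json.loads(stripped)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{source}:{lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(task, dict):
            raise ValueError(f"{source}:{lineno}: expected one JSON object per line")
        yield task
```

Usage would be along the lines of `with open(path, encoding="utf-8") as f: tasks = list(iter_jsonl_tasks(f, path))`. The resulting dicts could then go through the same validation as tasks loaded from YAML.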