issues Search Results · language:Dune language:Python linked:pr language:Java language:JavaScript language:Python
5.8M results
Description
The NotificationRoutingMiddleware currently utilizes an unbounded dictionary (self._settings_cache) to store company
settings. As more unique companies trigger notifications over time, this ...
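One way to bound such a cache is a least-recently-used (LRU) eviction policy. The sketch below is illustrative only: the class name `BoundedSettingsCache`, the `max_entries` parameter, and the `get`/`put` methods are assumptions, not the middleware's actual API.

```python
from collections import OrderedDict


class BoundedSettingsCache:
    """Hypothetical LRU cache holding at most `max_entries` company settings."""

    def __init__(self, max_entries=1024):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, company_id):
        """Return cached settings, or None on a miss; a hit refreshes recency."""
        if company_id not in self._entries:
            return None
        self._entries.move_to_end(company_id)
        return self._entries[company_id]

    def put(self, company_id, settings):
        """Store settings and evict the least recently used entries if over capacity."""
        self._entries[company_id] = settings
        self._entries.move_to_end(company_id)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

With a fixed capacity, memory use stays bounded no matter how many unique companies trigger notifications; a time-based expiry would be an alternative if stale settings are the larger concern.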
From the judge research (AlpacaEval length-correction lesson). Judges may favor verbose agents. Token counts per row
already exist; add a regression of judge scores on output length to aggregate_results ...
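A minimal sketch of such a regression, assuming each row supplies a token count and a judge score. The function name `length_bias_slope` and its inputs are hypothetical; how it would plug into aggregate_results is not specified in the issue.

```python
def length_bias_slope(token_counts, judge_scores):
    """Ordinary least squares fit of judge score on output length.

    Returns (slope, intercept). A clearly positive slope suggests that
    judges reward longer outputs.
    """
    n = len(token_counts)
    if n != len(judge_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 rows")
    mean_x = sum(token_counts) / n
    mean_y = sum(judge_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in token_counts)
    if sxx == 0:
        raise ValueError("all token counts are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(token_counts, judge_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope is in score points per token, so reporting it per 100 or 1,000 tokens may be easier to read.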
From the judge research (Leaderboard Illusion lessons): the real risks for this bench are silent judge-version drift and
run cherry-picking. Write docs/governance.md: results never silently retracted/rerun, ...
From the agentic research (Berkeley RDI audit gamed 8 benchmarks to near-perfect scores). Add a null/do-nothing agent
mode and a documented check that the bench scores it near zero (judges + state checks ...
Description
There is a logical gap in the should_send_email_notification gating rules. While the code blocks a WEEKLY_DIGEST if a
company's frequency is set to daily, it completely ignores the inverse. ...
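A symmetric version of the gate could look like the sketch below. Only the function name and WEEKLY_DIGEST come from the issue; the two-argument signature, the `DAILY_DIGEST` type, the `"daily"`/`"weekly"` frequency strings, and the pass-through for other notification types are assumptions.

```python
# Assumed mapping from digest type to the frequency setting that permits it.
DIGEST_FREQUENCY = {
    "DAILY_DIGEST": "daily",    # hypothetical inverse case
    "WEEKLY_DIGEST": "weekly",
}


def should_send_email_notification(notification_type, company_frequency):
    """Send a digest only when it matches the company's configured frequency."""
    required = DIGEST_FREQUENCY.get(notification_type)
    if required is None:
        # Assumption: non-digest notifications are not gated by frequency.
        return True
    return required == company_frequency
```

Driving both directions from one table means a new digest type cannot be added to one branch and forgotten in the other.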
Overview
Allow users to export their entire shortcut state as a JSON file and re-import it. This enables:
- Manual backups before clearing browser data.
- Migrating to a new browser profile or machine ...
enhancement
From the adoption research: the same-lab judge stat is the strongest launch headline (a finding about evaluation
integrity that travels regardless of rankings). Generalize same_lab_check so EVERY judge ...
From the agentic-benchmark research (C:\Users\conor\cot-bench-research-agentic.md). Adopt tau-bench's pass^k (all k
trials succeed; decays as p^k) computed from the existing 3 reliability runs, published ...
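As a rough sketch, pass^k can be estimated per task as C(c, k) / C(n, k), where c of the n recorded trials succeeded, then averaged over tasks. With n = k = 3 this reduces to "all three runs succeeded". The function name `pass_hat_k` and the list-of-booleans input are assumptions about how the reliability runs are stored.

```python
from math import comb


def pass_hat_k(task_results, k):
    """Average over tasks of the chance that k trials drawn from the n recorded all pass.

    task_results: one list of booleans per task (True = trial succeeded).
    """
    if not task_results:
        raise ValueError("need at least one task")
    per_task = []
    for trials in task_results:
        n = len(trials)
        if k < 1 or k > n:
            raise ValueError("k must be between 1 and the number of trials")
        c = sum(1 for ok in trials if ok)
        # comb(c, k) is 0 when fewer than k trials succeeded.
        per_task.append(comb(c, k) / comb(n, k))
    return sum(per_task) / len(per_task)
```

For k = 1 this is the ordinary mean pass rate, so publishing both values side by side shows how much of the headline score depends on lucky runs.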
From the judge-methodology research (C:\Users\conor\cot-bench-research-judges.md). Replace mean consensus with median
(robust for n=3; BT rejected as unfit for pointwise rubric scores) and replace/augment ...
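The robustness argument for n=3 can be illustrated with a small sketch: one outlying judge shifts the mean noticeably but leaves the median where the other two judges agree. The function name `consensus_score` and the example scores are made up for illustration.

```python
from statistics import mean, median


def consensus_score(judge_scores, method="median"):
    """Combine pointwise rubric scores from several judges into one value."""
    if not judge_scores:
        raise ValueError("need at least one judge score")
    if method == "median":
        return median(judge_scores)
    if method == "mean":
        return mean(judge_scores)
    raise ValueError("unknown method: " + method)


# Two judges roughly agree; the third is an outlier.
scores = [7, 8, 2]
median_consensus = consensus_score(scores, "median")
mean_consensus = consensus_score(scores, "mean")
```

With three judges the median always equals one judge's actual score, so a single miscalibrated judge can change the consensus only by becoming the middle value.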
