issues Search Results · language:Dune language:HTML language:JavaScript language:JavaScript language:Python language:Java
62.3M results
Goal: The comparison system: a single retrieve-then-answer pass per question, producing the same auditable artifacts as the agent.
Tasks
- Implement backend/app/baseline.py: for each subset question — one ...
enhancement
Program-as-Weights: A Programming Paradigm for Fuzzy Functions
Authors: Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng
Published: 2026-07-02
Categories: cs.LG, cs.AI, ...
arxiv:cs.AI
arxiv:cs.CL
arxiv:cs.LG
Both season and plant_type are hardcoded. Replace hardcoded values with input() from the user.
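A minimal sketch of the change this issue asks for, assuming the two values are plain strings. The function name `read_plant_request`, the prompt wording, and the injectable `prompt` parameter are hypothetical and not taken from the issue; the parameter exists only so the function can be exercised without a terminal.

```python
def read_plant_request(prompt=input):
    """Ask the user for season and plant_type instead of hardcoding them.

    `prompt` defaults to the built-in input(); passing another callable
    is an illustrative assumption that makes the function testable.
    """
    # Strip surrounding whitespace so "  Summer " and "Summer" are the same.
    season = prompt("Season: ").strip()
    plant_type = prompt("Plant type: ").strip()
    return season, plant_type


if __name__ == "__main__":
    season, plant_type = read_plant_request()
    print(f"season={season}, plant_type={plant_type}")
```

Any validation of the entered values (for example, restricting season to a fixed set) would depend on rules the issue snippet does not state.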
Distributed Attacks in Persistent-State AI Control
Authors: Josh Hills, Ida Caspary, Asa Cooper Stickland
Published: 2026-07-02
Categories: cs.AI
Links: abs | pdf
Abstract
As AI coding agents become ...
arxiv:cs.AI
Goal: The five agent tools, all emitting trace events.
Tasks
- Complete backend/app/tools.py: search_filing, get_pages, calculate (deterministic calculator already stubbed),
record_answer, flag_outstanding. ...
enhancement
Goal: Curate the weekend-sized eval subset with stratification and the human audit template.
Tasks
- Implement dataset_builder/d5_select_subset.py → data/subset.json (commit) per the §8 schema; apply ...
enhancement
Summary
webui/e2e/ currently covers five flows only: admin-community.spec.ts, auth.spec.ts, file-browser.spec.ts,
members.spec.ts, server-create.spec.ts.
No e2e spec exercises:
- Server lifecycle ...
enhancement
Goal: Deterministic document/page-aware chunking, local embeddings, persisted index, and retrieval.
Tasks
- Implement backend/app/ingest.py + backend/app/retrieval.py.
- Deterministic chunk_id; chunks ...
enhancement
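One way a deterministic chunk_id like the one mentioned in the issue above could be derived is by hashing the chunk's position and content. This is only a sketch: the function name `make_chunk_id`, its parameters, the whitespace normalization, and the 16-character truncation are all assumptions, since the issue text is truncated and does not specify the scheme.

```python
import hashlib


def make_chunk_id(doc_id: str, page: int, chunk_index: int, text: str) -> str:
    """Return a stable ID that depends only on the inputs (hypothetical scheme)."""
    # Collapse whitespace so reflowed text yields the same ID (an assumption).
    normalized = " ".join(text.split())
    # Join fields with a separator unlikely to occur in the data, so that
    # ("a", 12) and ("a1", 2) cannot produce the same payload.
    payload = "\x1f".join([doc_id, str(page), str(chunk_index), normalized])
    # SHA-256 is deterministic across runs and machines, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Because the ID depends only on its inputs, re-running ingestion on unchanged documents would reproduce the same IDs, which is what allows a persisted index and citations to stay aligned.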
Goal: LLM-classify all 150 questions into buckets (A multi-input / B judgment / C lookup) and verify gold evidence
against parsed pages.
Tasks
- Implement dataset_builder/d3_classify.py → data/classified.jsonl ...
enhancement
Goal: The TDD foundation — deterministic scorers run against the fixtures.
Tasks
- Implement evals/scorers.py: answer accuracy, citation precision, citation provenance, arithmetic integrity, trace ...
enhancement
