Issues search results · language:Dune language:HTML language:JavaScript language:Python language:Java

62.3M results  (706 ms)

Goal: The comparison system, a single retrieve-then-answer pass per question that produces the same auditable artifacts as the agent.
Tasks
- Implement backend/app/baseline.py: for each subset question — one ...
enhancement
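A minimal sketch of what a single-pass baseline could look like, assuming a pre-built list of chunks and an injected answer function that would wrap the LLM call. `Chunk`, `retrieve`, `run_baseline`, `answer_fn`, and the artifact fields are hypothetical names, not taken from the repository, and the word-overlap ranking is only a stand-in for real embedding retrieval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    chunk_id: str
    text: str


def retrieve(question: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Rank chunks by word overlap with the question; ties break on chunk_id.

    Word overlap is a hypothetical stand-in for embedding retrieval.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: (-len(q_words & set(c.text.lower().split())), c.chunk_id),
    )
    return ranked[:k]


def run_baseline(
    questions: list[tuple[str, str]],
    chunks: list[Chunk],
    answer_fn: Callable[[str, list[str]], str],
    k: int = 3,
) -> list[dict]:
    """One retrieval and one answer call per question: no tool loop, no retries."""
    artifacts = []
    for question_id, question in questions:
        hits = retrieve(question, chunks, k)
        answer = answer_fn(question, [c.text for c in hits])
        # Keep the retrieved IDs next to the answer so each result can be audited.
        artifacts.append(
            {
                "question_id": question_id,
                "retrieved": [c.chunk_id for c in hits],
                "answer": answer,
            }
        )
    return artifacts
```

Keeping the baseline to exactly one retrieval and one answer call per question is what makes it a like-for-like comparison point against the multi-step agent.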

Program-as-Weights: A Programming Paradigm for Fuzzy Functions
Authors: Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng
Published: 2026-07-02
Categories: cs.LG, cs.AI, ...
arxiv:cs.AI
arxiv:cs.CL
arxiv:cs.LG

Both season and plant_type are hardcoded. Replace the hardcoded values with values read from the user via input().
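A minimal sketch of the requested change, assuming the two values are plain strings. The function name `read_garden_inputs` and its `ask` parameter are hypothetical; `ask` defaults to the built-in `input()` and exists only so the prompts can be replaced in tests.

```python
def read_garden_inputs(ask=input):
    """Prompt the user for season and plant_type instead of hardcoding them."""
    # Strip surrounding whitespace so " summer " and "summer" are treated alike.
    season = ask("Enter the season: ").strip()
    plant_type = ask("Enter the plant type: ").strip()
    return season, plant_type
```

In the script itself, `season, plant_type = read_garden_inputs()` would replace the two hardcoded assignments.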

Distributed Attacks in Persistent-State AI Control
Authors: Josh Hills, Ida Caspary, Asa Cooper Stickland
Published: 2026-07-02
Categories: cs.AI
Abstract: As AI coding agents become ...
arxiv:cs.AI

Goal: The five agent tools, all emitting trace events.
Tasks
- Complete backend/app/tools.py: search_filing, get_pages, calculate (deterministic calculator already stubbed), record_answer, flag_outstanding. ...
enhancement
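A minimal sketch of how each tool could emit a trace event, shown for `calculate` only. The `traced` decorator, the in-memory `TRACE` list, and the event fields are hypothetical. The calculator walks the parsed expression with Python's `ast` module and supports only `+`, `-`, `*`, `/` and unary minus, so it stays deterministic and never calls `eval()`. Tools are assumed to be called with keyword arguments.

```python
from __future__ import annotations

import ast
import functools
import operator

TRACE: list[dict] = []  # hypothetical in-memory trace sink

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def traced(tool):
    """Wrap a tool so every successful call appends a trace event."""
    @functools.wraps(tool)
    def wrapper(**kwargs):
        result = tool(**kwargs)
        TRACE.append({"tool": tool.__name__, "args": kwargs, "result": result})
        return result
    return wrapper


def _eval(node):
    # Only numeric literals, the four binary operators, and negation are allowed.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


@traced
def calculate(expression: str) -> float:
    """Deterministic arithmetic over a parsed expression tree."""
    return _eval(ast.parse(expression, mode="eval").body)
```

Rejecting anything outside a small whitelist of node types means a malformed or malicious expression raises `ValueError` instead of executing code, and a failed call leaves no trace event.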

Goal: Curate the weekend-sized eval subset with stratification and the human audit template.
Tasks
- Implement dataset_builder/d5_select_subset.py → data/subset.json (commit) per the §8 schema; apply ...
enhancement
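A minimal sketch of reproducible stratified selection, assuming each question is a dict with an `"id"` field and a stratum key such as a bucket label. `stratified_subset` and `per_stratum` are hypothetical names; the actual §8 schema for data/subset.json is not reproduced here. Sorting strata and items before sampling with a seeded `random.Random` makes the output identical across runs.

```python
import random
from collections import defaultdict


def stratified_subset(items, key, per_stratum, seed=0):
    """Pick up to per_stratum items from each stratum, reproducibly."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    rng = random.Random(seed)  # private RNG so global random state is untouched
    chosen = []
    for stratum in sorted(strata):
        # Sort by id first so the sample does not depend on input order.
        group = sorted(strata[stratum], key=lambda it: it["id"])
        chosen.extend(rng.sample(group, min(per_stratum, len(group))))
    return chosen
```

The result could then be written to data/subset.json with `json.dump` and committed, so the audit template always refers to the same questions.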

Summary
webui/e2e/ currently covers only five flows: admin-community.spec.ts, auth.spec.ts, file-browser.spec.ts, members.spec.ts, server-create.spec.ts. No e2e spec exercises:
- Server lifecycle ...
enhancement

Goal: Deterministic document/page-aware chunking, local embeddings, persisted index, and retrieval.
Tasks
- Implement backend/app/ingest.py + backend/app/retrieval.py.
- Deterministic chunk_id; chunks ...
enhancement
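A minimal sketch of page-aware chunking with deterministic IDs, assuming a document arrives as a list of page strings. `chunk_pages`, `max_chars`, and the chunk fields are hypothetical; fixed character windows are a simplification, and embedding and index persistence are out of scope here. Hashing the document ID, page number, offset, and text with SHA-256 gives the same `chunk_id` on every run.

```python
import hashlib


def chunk_pages(doc_id, pages, max_chars=500):
    """Split each page into fixed-size character windows with stable IDs."""
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, len(text), max_chars):
            body = text[start:start + max_chars]
            # Same inputs always hash to the same ID; chunks never span pages.
            key = f"{doc_id}:{page_no}:{start}:{body}"
            chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            chunks.append({
                "chunk_id": chunk_id,
                "doc_id": doc_id,
                "page": page_no,
                "start": start,
                "text": body,
            })
    return chunks
```

Because chunks never cross a page boundary, every retrieved chunk maps to exactly one page for citation.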

Goal: LLM-classify all 150 questions into buckets (A multi-input / B judgment / C lookup) and verify gold evidence against parsed pages.
Tasks
- Implement dataset_builder/d3_classify.py → data/classified.jsonl ...
enhancement
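A minimal sketch of the two pieces this issue names: the bucket labels and a check that gold evidence actually appears on the parsed page it cites. `Bucket`, `verify_evidence`, and the page mapping are hypothetical, and the LLM classification call itself is omitted. Normalizing case and whitespace first keeps line breaks introduced by PDF parsing from causing false mismatches.

```python
from __future__ import annotations

import re
from enum import Enum


class Bucket(str, Enum):
    A = "multi-input"
    B = "judgment"
    C = "lookup"


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case.
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_evidence(evidence: str, pages: dict[int, str], page: int) -> bool:
    """True if the gold evidence string appears on the cited parsed page."""
    return _normalize(evidence) in _normalize(pages.get(page, ""))
```

Exact substring matching after normalization is strict by design; a fuzzier match would hide genuine parsing errors.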

Goal: The TDD foundation — deterministic scorers run against the fixtures.
Tasks
- Implement evals/scorers.py: answer accuracy, citation precision, citation provenance, arithmetic integrity, trace ...
enhancement
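A minimal sketch of two of the listed scorers, assuming answers are strings and citations are page or chunk IDs. The function names, the 1% relative tolerance, and the choice to score an empty citation list as 0.0 are hypothetical decisions, not taken from the issue. Both functions are pure, so they give the same score on every run against the fixtures.

```python
from __future__ import annotations

import math


def answer_accuracy(predicted: str, gold: str, rel_tol: float = 0.01) -> bool:
    """Numbers match within a relative tolerance; other answers by normalized text."""
    try:
        p = float(predicted.replace(",", ""))
        g = float(gold.replace(",", ""))
    except ValueError:
        return predicted.strip().lower() == gold.strip().lower()
    return math.isclose(p, g, rel_tol=rel_tol)


def citation_precision(cited: list[str], gold: set[str]) -> float:
    """Fraction of cited IDs that are in the gold evidence set (0.0 if none cited)."""
    if not cited:
        return 0.0
    return sum(1 for c in cited if c in gold) / len(cited)
```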