issues Search Results · language:Dune language:JavaScript language:JavaScript language:Java language:Java language:Python

55.6M results  (826 ms)

When using a metabar link to an md file, it shows the 404 page

Describe the bug: We use OpenDJ 5.1.0, and one of our fuzzing tests fails when decoding a long ACI with repetitive targets in it. The problem ACI is - (targetscope= O )(targetscope= O )(targetscope= O )...here ...

Stacklok published parallel 2026 State of Model Context Protocol survey reports for the Software, Retail, and Financial Services sectors. Its Software-sector report finds 41% of surveyed software organizations ...
auto-research
cross-domain
domain:finance
mcp
research-news

On 2025-12-09 the Linux Foundation announced the Agentic AI Foundation (AAIF), a directed fund co-founded by Anthropic, Block, and OpenAI, anchored by contributions including MCP, Block's goose, and OpenAI ...
auto-research
cross-domain
domain:finance
mcp
research-news

Observed 2026-07-02 on a looping watchdog run (monitor-smithers, run fleet-medic-0702c): 1. Launch a detached run of a workflow with a Ralph loop containing a Timer duration= 5m . The run reaches ...

CodeHalu introduces execution-based verification as the ground truth for code hallucination: rather than judging generated code by surface similarity to a reference solution, the CodeHaluEval benchmark ...
auto-research
benchmark
cross-domain
domain:finance
llm
research-news

I can read all the entities of the air conditioner with 4 splits, temperatures, mode, etc., but if I turn on the split from the entity, it appears to be on, but the command is not actually sent and the ...

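Hypothesis: Just as Anthropic empirically quantified autonomy by measuring turn duration, auto-approval rate, and interruption rate in production Claude Code sessions (research published 2026-02-18), financial institutions should build internal telemetry that measures an investment agent's "autonomy level" from behavioral logs rather than from claims. Specifically, tracking three metrics for due-diligence and execution agents would quantitatively substantiate an agent's actual position on this section's autonomy spectrum (Lv.1 to 4): (1) the share of proposals that humans approve without modification, (2) how often the agent itself requests clarification or additional approval, and (3) how the auto-approval rate changes as task complexity rises. As a validation plan, one could introduce the same kind of log measurement on the internal agent platform of one large asset manager for six months and evaluate its usefulness as explanatory material for regulators. However, because of finance-specific confidentiality and audit requirements, whether privacy-preserving analysis methods like Anthropic's can be reused as-is needs verification, and governance against the risk of the telemetry being repurposed for detecting internal misconduct must be designed at the same time; these are the confounders and risks. ...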
agents
ai-governance
auto-research
cross-domain
domain:finance
hypothesis

MedHallu is a dedicated benchmark for detecting (not just producing) medical hallucinations: 10,000 QA pairs derived from PubMedQA, with hallucinated answers systematically generated via a controlled pipeline ...
auto-research
benchmark
cross-domain
domain:finance
llm
research-news

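Hypothesis: Just as SWE-bench succeeded with an automatically verifiable, objective metric ("was the issue actually resolved as a closed PR?"), evaluation of financial due-diligence agents should also move to a closed, SWE-bench Verified-style evaluation equipped with a human-verified "ground-truth data room plus automatically gradable deliverable checks" (e.g., whether a specific financial figure is reflected in the correct cell of the DCF model, or whether a citation matches a real primary source). The current BankerToolBench and WorkstreamBench remain free-response formats that require rubric graders, and SWE-bench ...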
agents
auto-research
benchmark
cross-domain
domain:finance
hypothesis
Learn how you can use GitHub Issues to plan and track your work.

Save views for sprints, backlogs, teams, or releases. Rank, sort, and filter issues to suit the occasion. The possibilities are endless. Learn more about GitHub Issues
ProTip! Restrict your search to the title by using the in:title qualifier.