Source: rohitg00#782
Title: fix(agent-sdk): scope recursion guard to AsyncLocalStorage, memoize SDK import
Author: rohitg00
State: closed
Draft: no
Merged: yes
Head: rohitg00/agentmemory:fix/agent-sdk-recursion-guard-als @ 75264f2
Base: main @ de95403
Labels: (none)
Changed files: 0
Commits: 0
Created: 2026-06-02T11:57:35Z
Updated: 2026-06-02T16:20:53Z
Closed: 2026-06-02T16:20:49Z
Merged at: 2026-06-02T16:20:49Z
Original PR body:
Problem
Reported in #781. mem::summarize fails with too_many_chunks_skipped: 4/4 chunks failed to parse after retry whenever the agent-sdk provider is active (AGENTMEMORY_ALLOW_AGENT_SDK=true) and a session is large enough to split into ≥2 chunks.
Two earlier PRs interact badly:
- #181 (Stop-hook → /summarize → agent-sdk infinite recursion fix) added the guard:
if (process.env.AGENTMEMORY_SDK_CHILD === '1') return ''
process.env.AGENTMEMORY_SDK_CHILD = '1'
// ...await SDK...
// restore in finally
- #472 (chunk large sessions to fit LLM context) added chunk concurrency:
for (let i = 0; i < chunks.length; i += concurrency) {
await Promise.all(batch.map(/* provider.summarize */));
}
The first chunk in a batch sets the env flag synchronously, before its first await. Sibling chunks in the same batch then enter query(), see the flag, and return "". An empty string has no <title>, so parseSummaryXml returns null and the chunk is counted as skipped. Once the skip ratio exceeds 50%, mem::summarize throws too_many_chunks_skipped.
The guard (cross-process) and the chunk concurrency (in-process) were never reconciled.
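A minimal sketch of the interaction described above, assuming a simulated SDK call. The names guardedQuery and summarizeBatch are hypothetical stand-ins, not the project's real functions. Only the env variable name and the check-then-set pattern come from the PR body.

```typescript
// Hypothetical stand-in for the provider's guarded SDK call (pattern from #181).
async function guardedQuery(prompt: string): Promise<string> {
  // Process-wide flag: every caller in this process sees the same value.
  if (process.env.AGENTMEMORY_SDK_CHILD === "1") return "";
  const previous = process.env.AGENTMEMORY_SDK_CHILD;
  // Set synchronously, before the first await below.
  process.env.AGENTMEMORY_SDK_CHILD = "1";
  try {
    // Simulated SDK round trip; the real code awaits the Claude agent SDK here.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    return `<title>${prompt}</title>`;
  } finally {
    if (previous === undefined) delete process.env.AGENTMEMORY_SDK_CHILD;
    else process.env.AGENTMEMORY_SDK_CHILD = previous;
  }
}

// Hypothetical stand-in for one concurrent batch (pattern from #472).
async function summarizeBatch(chunks: string[]): Promise<string[]> {
  // map() starts every call before any of them resumes after its await,
  // so calls 2..N run their guard check while call 1 still holds the flag.
  return Promise.all(chunks.map((chunk) => guardedQuery(chunk)));
}
```

In this sketch, with the variable initially unset, only the first chunk of a batch gets a real result and every sibling comes back as an empty string, which is the shape of failure the parser then counts as skipped.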
Fix — split the guard by scope
Each concern uses the right primitive:
| Concern | Primitive | Why |
|---|---|---|
| In-process recursion (concurrent siblings) | AsyncLocalStorage | Scoped to the async call tree of the SDK query. Concurrent siblings have separate ALS frames, no longer see each other's marker. |
| Cross-process recursion (hook scripts) | process.env.AGENTMEMORY_SDK_CHILD = '1' around the SDK call | Hook scripts run as separate processes spawned by the Claude SDK; they inherit process.env at spawn time. ALS does not cross process boundaries. |
The check now reads sdkChildContext.getStore() instead of process.env. The env var stays set+restored around the SDK call so spawned hook subprocesses still see the marker and short-circuit their REST callbacks (the original #149/#181 fix still holds).
The race on the env between concurrent in-process siblings is benign because every sibling wants "1" during its own SDK call anyway.
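The split guard could look roughly like the sketch below. Only sdkChildContext, AsyncLocalStorage, and the env variable name come from the PR body. guardedQuery, fakeSdk, and the enter/exit helpers are hypothetical. The sketch counts overlapping calls so the prior env value is restored only when the last call exits. That bookkeeping is this sketch's own choice, and the PR does not say how the real code restores the variable.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// In-process marker: visible only inside the async call tree of one SDK call.
const sdkChildContext = new AsyncLocalStorage<boolean>();

// Assumption of this sketch: track overlapping calls for the env restore.
let activeSdkCalls = 0;
let savedEnv: string | undefined;

function enterSdkCall(): void {
  if (activeSdkCalls === 0) savedEnv = process.env.AGENTMEMORY_SDK_CHILD;
  activeSdkCalls += 1;
  // Cross-process marker: inherited by hook subprocesses at spawn time.
  process.env.AGENTMEMORY_SDK_CHILD = "1";
}

function exitSdkCall(): void {
  activeSdkCalls -= 1;
  if (activeSdkCalls > 0) return; // another sibling still needs the marker
  if (savedEnv === undefined) delete process.env.AGENTMEMORY_SDK_CHILD;
  else process.env.AGENTMEMORY_SDK_CHILD = savedEnv;
}

// Hypothetical guarded call; callSdk stands in for the real SDK query.
async function guardedQuery(
  prompt: string,
  callSdk: (p: string) => Promise<string>,
): Promise<string> {
  // Re-entry check reads the ALS store, not process.env.
  if (sdkChildContext.getStore() === true) return "";
  enterSdkCall();
  try {
    return await sdkChildContext.run(true, () => callSdk(prompt));
  } finally {
    exitSdkCall();
  }
}

// Simulated SDK round trip, used only to make the sketch runnable.
async function fakeSdk(prompt: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  return `<title>${prompt}</title>`;
}
```

Called as `Promise.all(chunks.map((c) => guardedQuery(c, fakeSdk)))`, each sibling starts outside any ALS frame and so passes the guard, while a call made from inside callSdk inherits the frame and still degrades to "".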
Bonus: memoize the dynamic import
Refactored await import('@anthropic-ai/claude-agent-sdk') into a per-instance memoized promise. Concurrent callers share one module resolution. Two benefits:
- Production: one resolution per provider lifetime instead of N
- Test: vi.mock factories apply uniformly across concurrent imports (without this, the test mock raced with real module references under Promise.all, see commit body)
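A per-instance memoized promise might be sketched as follows. MemoizedSdkProvider, SdkModule, and the injected loadSdk loader are hypothetical, and SdkModule is a deliberately simplified shape rather than the real SDK's API. In the real provider the loader would presumably be the dynamic import of '@anthropic-ai/claude-agent-sdk'. It is injected here so the sketch runs without that package.

```typescript
// Simplified, hypothetical module shape; the real SDK exports differ.
type SdkModule = { query: (prompt: string) => Promise<string> };

class MemoizedSdkProvider {
  private readonly loadSdk: () => Promise<SdkModule>;
  private sdkPromise: Promise<SdkModule> | undefined;

  constructor(loadSdk: () => Promise<SdkModule>) {
    this.loadSdk = loadSdk;
  }

  private getSdk(): Promise<SdkModule> {
    // Cache the promise itself, not the resolved module, so callers that
    // arrive while the load is still in flight share the same resolution.
    if (this.sdkPromise === undefined) {
      this.sdkPromise = this.loadSdk();
    }
    return this.sdkPromise;
  }

  async summarize(prompt: string): Promise<string> {
    const sdk = await this.getSdk();
    return sdk.query(prompt);
  }
}
```

Because the assignment happens synchronously on the first call, N concurrent summarize calls trigger one load instead of N. One trade-off of this simple form is that a rejected load stays cached for the lifetime of the instance.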
Tests
5 cases in test/agent-sdk-provider.test.ts:
- 4 concurrent summarize calls each return the real SDK result (no empty siblings) — direct #781 regression
- Mixed concurrent summarize + compress on the same provider
- AGENTMEMORY_SDK_CHILD is set to "1" during the SDK call (verified by reading the env from inside the mocked query) and restored to its prior value on exit
- Genuine re-entry inside the same async tree still degrades to "" so the #149 / #181 recursion guard stays armed
Full suite: 122 files / 1331 tests pass.
Recommended user action
Users on 0.9.24 can remove the SUMMARIZE_CHUNK_CONCURRENCY=1 workaround after this lands. No env or config changes required.
Closes #781.
Summary by CodeRabbit
- Bug Fixes
  - Improved handling of concurrent memory operations to prevent re-entrancy and state leakage during overlapping calls.
- Performance Improvements
  - Reduced repeated SDK initialization by caching the SDK load for faster subsequent calls.
- Tests
  - Added coverage validating concurrency behavior and guard correctness.
Local branch:
Fork PR:
Fork decision:
Verification:
Notes: