I'm not yet very experienced with GitHub development workflows and pull requests, so rather than opening a PR I'm sharing these findings and fixes here. I found and fixed these issues with the help of my AI assistant (Antigravity). Please note that this entire issue description and the corresponding code changes were generated by AI. I hope the details and diffs below are useful for integrating into the main project.

I've been running smallcode extensively in a local environment with LM Studio / Ollama, using Qwen and Gemma 9B models with custom 90k context windows. During testing, I hit a few edge cases where the agent crashed or failed because of context limits, timeouts, and configuration overrides. I've resolved these locally and am sharing the fixes and diffs to help improve the project.
### Summary of Improvements
#### 1. Emergency History Compaction on HTTP 400 (Context Size Exceeded)
With local endpoints, exceeding the model's context limit makes the API return an HTTP 400 error, and the CLI would print the error and stop.
* **Fix:** Intercept HTTP 400 context errors in `chatCompletion()`, run an emergency compaction (using Marrow/cognition's `compressHistoryCompiled` if available, otherwise dropping the oldest non-system messages), rebuild the dynamic context, and retry the request once after a 1-second pause.
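The detection heuristic and the drop-oldest fallback can be isolated from the HTTP plumbing. A minimal sketch, assuming a hypothetical `send` function that resolves to `{ ok, status, body }`; `isContextOverflow`, `dropOldestMessages`, and `sendWithEmergencyRetry` are illustrative names, not smallcode functions:

```javascript
// Hypothetical helpers that mirror the heuristics in the diff; they are not
// part of smallcode's API.

// Treat an HTTP 400 whose body mentions context/token/length/limit as a
// "context size exceeded" error (a keyword heuristic, not a guaranteed signal).
function isContextOverflow(status, bodyText) {
  if (status !== 400) return false;
  const text = String(bodyText).toLowerCase();
  return ['context', 'token', 'length', 'limit'].some((word) => text.includes(word));
}

// Fallback compaction: drop the oldest non-system messages (at least 4, or a
// third of the history, whichever is larger) and keep all system messages.
function dropOldestMessages(messages) {
  const removeCount = Math.max(4, Math.floor(messages.length / 3));
  const kept = [];
  let removed = 0;
  for (const msg of messages) {
    if (msg.role !== 'system' && removed < removeCount) {
      removed++; // skip this old message
      continue;
    }
    kept.push(msg);
  }
  return { kept, removed };
}

// Send once; on a context overflow, compact and retry exactly once.
// `send` is a hypothetical async function resolving to { ok, status, body }.
async function sendWithEmergencyRetry(send, messages) {
  const first = await send(messages);
  if (first.ok || !isContextOverflow(first.status, first.body)) return first;
  const { kept } = dropOldestMessages(messages);
  return send(kept);
}
```

Because the sketch retries at most once, a request that still overflows after compaction returns the error response instead of looping.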
#### 2. Profile-detected Context Limits Were Ignored
In `bin/smallcode.js`, profile defaults (e.g. 32k for Qwen/Gemma) were never applied: `config.js` pre-initializes `config.context.detected_window` to `128000`, which is truthy, so the `!config.context.detected_window` check always failed.
* **Fix:** Apply the profile's `context_length` unless the user has explicitly set the `SMALLCODE_CONTEXT_WINDOW` environment variable.
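The old guard could never fire because `!128000` is `false`. A minimal sketch of the intended precedence, assuming the override arrives as the `SMALLCODE_CONTEXT_WINDOW` string and the config default is `128000`; `resolveContextWindow` is a hypothetical helper, not a smallcode function:

```javascript
// Hypothetical sketch of the precedence the fix aims for:
// 1. SMALLCODE_CONTEXT_WINDOW (explicit user override) wins,
// 2. otherwise the matched profile's context_length,
// 3. otherwise the config default (128000 in config.js).
function resolveContextWindow(env, profile, configDefault = 128000) {
  // parseInt('') and parseInt('abc') yield NaN, which is treated as "no override".
  const override = Number.parseInt(env.SMALLCODE_CONTEXT_WINDOW ?? '', 10);
  if (Number.isFinite(override) && override > 0) return override;
  if (profile && profile.context_length) return profile.context_length;
  return configDefault;
}
```

In the CLI this would be called as `resolveContextWindow(process.env, modelProfile)`. Checking the environment variable directly avoids confusing "unset" with "set to the default".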
#### 3. Modernized Default Profiles & Added Fallbacks for Gemma and Qwen
* **Fix:** Updated `context_length` to `90000` (90k) in `KNOWN_PROFILES` for the Gemma and Qwen entries, matching the 90k context windows I configured for 9B models locally. Also added generic `'gemma'` and `'qwen'` keys, so any local model whose name contains `gemma` or `qwen` is matched instead of falling back to the default OpenAI profile.
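How the new generic keys interact with the specific ones depends on how `getProfile()` matches model names, which the diff doesn't show. One order-independent option is to pick the longest key contained in the lowercased model name. The sketch below illustrates that approach; `matchProfile` and the trimmed `PROFILES` table are hypothetical:

```javascript
// Hypothetical matcher; smallcode's real getProfile() may use a different rule.
// Picks the longest profile key contained in the (lowercased) model name, so a
// specific key like 'qwen2.5-coder' beats the generic 'qwen' fallback.
function matchProfile(modelName, profiles) {
  const name = modelName.toLowerCase();
  let bestKey = null;
  for (const key of Object.keys(profiles)) {
    if (name.includes(key) && (bestKey === null || key.length > bestKey.length)) {
      bestKey = key;
    }
  }
  return bestKey === null ? null : { matched_key: bestKey, ...profiles[bestKey] };
}

// Trimmed, illustrative subset of KNOWN_PROFILES.
const PROFILES = {
  'gemma-4': { context_length: 90000, tool_format: 'native' },
  'gemma': { context_length: 90000, tool_format: 'native' },
  'qwen2.5-coder': { context_length: 90000, tool_format: 'hermes' },
  'qwen': { context_length: 90000, tool_format: 'hermes' },
};
```

With this rule, `'Qwen2.5-Coder-7B-Instruct'` resolves to `qwen2.5-coder`, while a name like `'gemma-2-9b-it'` only hits the generic `gemma` key. Adding entries to the table cannot silently shadow a more specific one.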
#### 4. Asynchronous `runValidation` Execution
The `runValidation` compiler checks in `verify_and_fix.js` were blocking the main loop, and the async validation call was not `await`ed.
* **Fix:** Made `runValidation` an `async` function that `await`s the validation module, and switched the checks to the asynchronous `execFile` so the CLI stays responsive.
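The following sketch shows a non-blocking check built on Node's `child_process.execFile`, wrapped with `util.promisify`. The `node --check` syntax check, the timeout, and the `{ ok, errors }` result shape are assumptions for illustration. The real checks in `verify_and_fix.js` are not shown in this issue:

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical stand-in for the validation module: runs `node --check` on a
// JavaScript file without blocking the event loop. smallcode's real checks may
// invoke other compilers or linters.
async function runValidation(filePath, timeoutMs = 30000) {
  try {
    await execFileAsync(process.execPath, ['--check', filePath], { timeout: timeoutMs });
    return { ok: true, errors: '' };
  } catch (err) {
    // Non-zero exit (e.g. a syntax error) or timeout: report instead of throwing.
    return { ok: false, errors: String(err.stderr || err.message) };
  }
}
```

A caller in the agent loop would `await runValidation(filePath)`. While the child process runs, the event loop keeps servicing spinners and input.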
#### 5. Graceful Shell Session Termination on SIGINT / SIGTERM
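Previously, the `SIGINT`/`SIGTERM` listeners in `shell_session.js` only ran `cleanup()`. Installing a listener for these signals removes Node's default exit-on-signal behavior, so the process could keep running after Ctrl+C.
* **Fix:** The listeners now call the instance `cleanup()` and then exit with the conventional status codes (`130` for SIGINT, `143` for SIGTERM, i.e. 128 + signal number).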
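The exit codes follow the shell convention of 128 + signal number, and Node exposes the signal numbers through `os.constants.signals`. The sketch below uses hypothetical helpers, `exitCodeForSignal` and `registerSignalCleanup`, and is not the code in `shell_session.js`:

```javascript
const os = require('node:os');

// Shell convention: a process terminated by signal N exits with 128 + N.
// SIGINT is 2 (-> 130) and SIGTERM is 15 (-> 143).
function exitCodeForSignal(signal) {
  return 128 + os.constants.signals[signal];
}

// Hypothetical registration helper: run `cleanup` at most once, then exit with
// the conventional code. Adding a SIGINT/SIGTERM listener disables Node's
// default exit-on-signal behavior, so the explicit process.exit() is required.
function registerSignalCleanup(cleanup) {
  let cleaned = false;
  const runOnce = () => {
    if (cleaned) return;
    cleaned = true;
    try { cleanup(); } catch {}
  };
  process.on('exit', runOnce);
  for (const signal of ['SIGINT', 'SIGTERM']) {
    process.on(signal, () => {
      runOnce();
      process.exit(exitCodeForSignal(signal));
    });
  }
}
```

The `cleaned` flag matters because `process.exit()` also fires the `'exit'` listener. Without the flag, `cleanup()` would run twice.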
---
### Code Changes Diffs
<details>
<summary>bin/smallcode.js, src/model/profiles.js, src/tools/shell_session.js: Context Error Handling, Profile Override Check & Signal Handling</summary>
```diff
diff --git a/bin/smallcode.js b/bin/smallcode.js
index d2ac876..f9a9f49 100755
--- a/bin/smallcode.js
+++ b/bin/smallcode.js
@@ -1902,12 +1902,8 @@ async function initLSP() {
return _lspClient;
}
-function runValidation(filePath) {
- return _runValidationModule(filePath);
+async function runValidation(filePath) {
+ return await _runValidationModule(filePath);
}
// Build a compact system prompt
@@ -2440,6 +2436,105 @@ async function chatCompletion(config, messages) {
if (!response.ok) {
const err = await response.text();
+
+ const isContextError = response.status === 400 &&
+ (err.toLowerCase().includes('context') ||
+ err.toLowerCase().includes('token') ||
+ err.toLowerCase().includes('length') ||
+ err.toLowerCase().includes('limit'));
+
+ if (isContextError && !body.__emergencyCompacted) {
+ console.log(`\n \x1b[33m⚡ Context size exceeded. Running emergency history compaction...\x1b[0m`);
+
+ let compacted = false;
+ if (messages.length > 8) {
+ try {
+ const { compressHistoryCompiled, isCompiledCognitionAvailable } = require('./cognition_adapter');
+ if (isCompiledCognitionAvailable()) {
+ const recentCount = 4;
+ const oldStart = messages.findIndex(m => m.role !== 'system');
+ const oldEnd = messages.length - recentCount;
+ if (oldStart >= 0 && oldEnd > oldStart) {
+ const oldMessages = messages.slice(oldStart, oldEnd);
+ const oldSerialized = oldMessages
+ .map(m => `[${m.role || 'unknown'}] ${(typeof m.content === 'string' ? m.content : JSON.stringify(m.content || '')).slice(0, 1000)}`)
+ .join('\n\n');
+ const maxContextTokens = (config.context?.detected_window || 128000) * ((config.context?.max_budget_pct || 70) / 100);
+ const targetTokens = Math.max(150, Math.floor(maxContextTokens * 0.04));
+ const summary = await compressHistoryCompiled(oldSerialized, targetTokens);
+ if (summary && summary.length > 0) {
+ messages.splice(oldStart, oldEnd - oldStart, {
+ role: 'system',
+ content: `[Emergency compressed summary of ${oldMessages.length} earlier messages]\n${summary}`,
+ });
+ compacted = true;
+ console.log(tui.compacted(messages.length));
+ }
+ }
+ }
+ } catch {}
+ }
+
+ if (!compacted) {
+ const removeCount = Math.max(4, Math.floor(messages.length / 3));
+ let removed = 0;
+ for (let i = 0; i < removeCount; i++) {
+ const removeIdx = messages.findIndex(m => m.role !== 'system');
+ if (removeIdx === -1) break;
+ messages.splice(removeIdx, 1);
+ removed++;
+ }
+ console.log(` \x1b[33m⚡ Dropped oldest ${removed} messages to free up context.\x1b[0m`);
+ }
+
+ const processedMessages2 = messages.map(stripAnsiFromMsg);
+ const dynamicCtx2 = buildDynamicContext(messages);
+ if (dynamicCtx2) {
+ const lastIdx = processedMessages2.reduce((last, m, i) => m.role === 'user' ? i : last, -1);
+ if (lastIdx >= 0 && typeof processedMessages2[lastIdx].content === 'string') {
+ processedMessages2[lastIdx].content = dynamicCtx2 + processedMessages2[lastIdx].content;
+ }
+ }
+
+ const normalizedMessages2 = consolidateSystemMessages([systemMsg, ...processedMessages2]);
+ const retryBody = {
+ ...body,
+ messages: normalizedMessages2,
+ __emergencyCompacted: true,
+ };
+
+ await new Promise(r => setTimeout(r, 1000));
+
+ let retrySpinner = null;
+ if (!_fullscreenRef && process.stdout.isTTY) {
+ let spinFrame = 0;
+ retrySpinner = setInterval(() => {
+ process.stdout.write(`\r ${SPINNER_FRAMES[spinFrame % SPINNER_FRAMES.length]} Retrying with compacted context... \r`);
+ spinFrame++;
+ }, 100);
+ }
+
+ try {
+ const retryResponse = await fetch(`${baseUrl}/chat/completions`, {
+ method: 'POST',
+ headers,
+ body: JSON.stringify(retryBody),
+ signal: controller.signal,
+ });
+ if (retryResponse.ok) {
+ const retryData = await retryResponse.json();
+ if (tokenTracker && retryData?.usage) {
+ tokenTracker.record(retryData, config.model.name);
+ }
+ return retryData;
+ }
+ } catch {
+ // Retry failed; fall through to the generic retry below.
+ } finally {
+ // Always stop the spinner, even if fetch() throws.
+ if (retrySpinner) {
+ clearInterval(retrySpinner);
+ if (process.stdout.isTTY) process.stdout.write('\r' + ' '.repeat(50) + '\r');
+ }
+ }
+ }
+
// Retry once on any non-2xx response.
@@ -3026,8 +3121,9 @@ async function main() {
// Detect model profile (drives routing mode, tool format, context budget)
const modelProfile = getProfile(config.model.name, config.context.detected_window);
if (modelProfile.matched_key) {
- // Apply profile-detected context window if not already set
- if (!config.context.detected_window && modelProfile.context_length) {
+ // Apply profile-detected context window if not explicitly overridden by the user
+ const hasEnvOverride = !!process.env.SMALLCODE_CONTEXT_WINDOW;
+ if (!hasEnvOverride && modelProfile.context_length) {
config.context.detected_window = modelProfile.context_length;
}
}
diff --git a/src/model/profiles.js b/src/model/profiles.js
index 5f2bbe6..7e2da9e 100644
--- a/src/model/profiles.js
+++ b/src/model/profiles.js
@@ -7,7 +7,7 @@
const KNOWN_PROFILES = {
// ─── Gemma ─────────────────────────────────────────────────────────────
'gemma-4': {
- context_length: 32768,
+ context_length: 90000,
max_output: 8192,
supports_tool_calling: true,
tool_format: 'native',
@@ -15,17 +15,25 @@ const KNOWN_PROFILES = {
weaknesses: ['very_long_planning'],
},
'gemma-4-e4b': {
- context_length: 32768,
+ context_length: 90000,
max_output: 8192,
supports_tool_calling: true,
tool_format: 'native',
strengths: ['speed', 'code_completion', 'tool_use'],
weaknesses: ['complex_reasoning', 'multi_file'],
},
+ 'gemma': {
+ context_length: 90000,
+ max_output: 8192,
+ supports_tool_calling: true,
+ tool_format: 'native',
+ strengths: ['code_completion', 'instruction_following', 'tool_use'],
+ weaknesses: ['very_long_planning'],
+ },
// ─── Qwen ──────────────────────────────────────────────────────────────
'qwen3': {
- context_length: 32768,
+ context_length: 90000,
max_output: 8192,
supports_tool_calling: true,
tool_format: 'hermes',
@@ -33,13 +41,21 @@ const KNOWN_PROFILES = {
weaknesses: ['verbosity'],
},
'qwen2.5-coder': {
- context_length: 32768,
+ context_length: 90000,
max_output: 8192,
supports_tool_calling: true,
tool_format: 'hermes',
strengths: ['code_completion', 'refactoring'],
weaknesses: ['long_planning', 'multi_file'],
},
+ 'qwen': {
+ context_length: 90000,
+ max_output: 8192,
+ supports_tool_calling: true,
+ tool_format: 'hermes',
+ strengths: ['reasoning', 'code_generation', 'planning'],
+ weaknesses: ['verbosity'],
+ },
diff --git a/src/tools/shell_session.js b/src/tools/shell_session.js
index ac6a3c6..bb184eb 100644
--- a/src/tools/shell_session.js
+++ b/src/tools/shell_session.js
@@ -336,8 +336,8 @@ if (!global.__SMALLCODE_SHELL_EXIT_REGISTERED__) {
global.__SMALLCODE_SHELL_EXIT_REGISTERED__ = true;
const cleanup = () => { if (_instance) try { _instance.stop(); } catch {} };
process.on('exit', cleanup);
- process.on('SIGINT', cleanup);
- process.on('SIGTERM', cleanup);
+ process.on('SIGINT', () => { cleanup(); process.exit(130); });
+ process.on('SIGTERM', () => { cleanup(); process.exit(143); });
}
```
</details>