# Copilot CLI Baseline Application - Implementation Roadmap

> **Historical document**: This roadmap was created during the early baseline phase. Many items listed as blockers or missing features have since been implemented. For current status, see [PROJECT_STATUS.md](PROJECT_STATUS.md) and [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md).

## Vision: Local Agentic Desktop Assistant

This forked Copilot CLI extends beyond a terminal tool into a **local agentic desktop assistant** with:
- **Electron Overlay**: Transparent grid system for precise screen targeting
- **Visual Awareness**: Real-time screen capture, OCR, and UI element detection
- **System Automation**: Mouse, keyboard, and window control via native APIs
- **AI Integration**: Multi-provider support (Copilot, OpenAI, Anthropic, Ollama)

The goal is to create a baseline application where the AI agent can:
1. See the user's screen via screen capture
2. Identify UI elements via accessibility APIs and inspect mode
3. Execute actions (click, type, scroll) with precision
4. Verify outcomes and self-correct

---

## 🔴 CRITICAL BLOCKERS

### BLOCKER-1: Preload Script Failure ✅ FIXED
- **File**: `src/renderer/overlay/preload.js`
- **Issue**: `require('../../shared/grid-math')` fails in Electron sandbox
- **Impact**: `window.electronAPI` = undefined, overlay doesn't render
- **Status**: [x] Fixed

**Solution Applied**:
- Inlined grid-math constants and `labelToScreenCoordinates()` directly in preload.js
- Sandboxed preload can't use `require('path')` or load external modules
- Removed dependency on external grid-math.js in preload context

### BLOCKER-2: PowerShell Here-String Syntax Broken ✅ FIXED
- **File**: `src/main/visual-awareness.js`
- **Issue**: `.replace(/\n/g, ' ')` breaks Here-Strings (`@" ... "@`)
- **Impact**: All 4 PowerShell functions return parse errors
- **Status**: [x] Fixed

**Solution Applied**:
- Created `executePowerShellScript()` helper function
- Writes PS1 to temp files in `os.tmpdir()/liku-ps`
- Executes with `powershell -NoProfile -ExecutionPolicy Bypass -File <path>`
- Cleans up temp files after execution
- Updated all 4 functions: `getActiveWindow()`, `extractWithWindowsOCR()`, `detectUIElements()`, `findElementAtPoint()`

### BLOCKER-3: Click-Through Failure on Background Windows ✅ FIXED
- **File**: `src/main/system-automation.js`
- **Issue**: AI clicks not reaching VS Code through transparent overlay
- **Root Cause**: `mouse_event()` is deprecated and doesn't work reliably with:
  - Electron's `setIgnoreMouseEvents(true, { forward: true })` (only forwards hardware events)
  - Layered windows with `WS_EX_TRANSPARENT` style
  - Background windows that aren't focused
- **Status**: [x] Fixed

**Solution Applied**:
1. **Replaced `mouse_event()` with `SendInput()`** - Modern Win32 API with better UIPI handling
2. **Added `SetForegroundWindow()` activation** - Activates target window before clicking
3. **Implemented `GetRealWindowFromPoint()`** - Finds actual window under cursor, skipping transparent overlays
4. **Added `ForceForeground()` with thread attachment** - Uses `AttachThreadInput()` to overcome focus restrictions
5. Updated functions: `click()`, `doubleClick()`, `drag()`

**Technical Details**:
```csharp
// Skip transparent windows (like Electron overlay)
int exStyle = GetWindowLong(hwnd, GWL_EXSTYLE);
bool isTransparent = (exStyle & WS_EX_TRANSPARENT) != 0;

// Force activate target window
AttachThreadInput(currentThread, foregroundThread, true);
SetForegroundWindow(hwnd);
AttachThreadInput(currentThread, foregroundThread, false);

// Use SendInput instead of deprecated mouse_event
SendInput(2, inputs, Marshal.SizeOf(typeof(INPUT)));
```

---

## 🟠 MISSING FEATURES

### FEATURE-3: targetId-Based Actions (M5.2)
- **Spec**: "Prefer `targetId` actions; fallback to grid if no regions"
- **Status**: [ ] Not Implemented

**Implementation Tasks**:
- [ ] Add `targetId` field to action schema in system prompt
- [ ] Create `resolveTargetId(targetId)` function in ai-service.js
- [ ] Implement fallback: targetId → inspect region center → grid coordinate
- [ ] Update action executor to accept targetId

### FEATURE-4: Inject generateAIInstructions()
- **File**: `src/main/inspect-service.js`
- **Status**: [ ] Defined but never called

**Implementation Tasks**:
- [ ] Call `generateAIInstructions()` in ai-service.js message builder
- [ ] Append to system prompt when inspect mode is active
- [ ] Include instructions for referencing regions by ID

### FEATURE-5: Screenshot Diff Heatmap (M4.2)
- **File**: `src/main/visual-awareness.js`
- **Status**: [ ] Placeholder only

**Implementation Tasks**:
- [ ] Implement pixel-level comparison using canvas or native image lib
- [ ] Generate bounding boxes for changed regions
- [ ] Create heatmap overlay visualization
- [ ] Expose diff results to AI context

### FEATURE-6: Verification Summary (M4.3)
- **Spec**: "Attach verification summary to AI response"
- **Status**: [ ] Not Implemented

**Implementation Tasks**:
- [ ] Capture screenshot after action sequence
- [ ] Compare with expected outcome (from `verification` field)
- [ ] Generate verification report
- [ ] Feed back to AI for confirmation/retry

### FEATURE-7: Input Focus Tracking
- **Spec**: "Highlight focused control; expose caret position"
- **Status**: [ ] Not Implemented

**Implementation Tasks**:
- [ ] Query focused element via UIAutomation API
- [ ] Extract caret position from text controls
- [ ] Highlight focused element in overlay
- [ ] Include focus info in AI context

### FEATURE-8: Window zOrder Population
- **File**: `src/shared/inspect-types.js`
- **Status**: [ ] Hardcoded to 0

**Implementation Tasks**:
- [ ] Query window z-order via Win32 API
- [ ] Populate `zOrder` field in window context
- [ ] Use for multi-window targeting

---

## 🟡 PARTIAL IMPLEMENTATIONS

### PARTIAL-9: Inspect Regions Display (A1-A3)
- **Status**: ✅ Code exists, blocked by BLOCKER-1 and BLOCKER-2
- **Files**: `overlay.js`, `index.html` (styles exist)

**Unblock Tasks**:
- [ ] Fix BLOCKER-1 (preload)
- [ ] Fix BLOCKER-2 (PowerShell)
- [ ] Test region rendering end-to-end

### PARTIAL-10: AI Context Payload (M5.1)
- **Status**: ⚠️ Regions appended to user message

**Completion Tasks**:
- [ ] Call `generateAIInstructions()` (FEATURE-4)
- [ ] Add `targetId` support (FEATURE-3)
- [ ] Include window context with zOrder (FEATURE-8)

---

## Implementation Phases

### Phase 1: Critical Fixes (Unblock Overlay)
**Priority**: 🔴 CRITICAL
**Estimate**: 2-3 hours

| Task ID | Description | File | Status |
|---------|-------------|------|--------|
| P1.1 | Fix preload require path | `preload.js` | [ ] |
| P1.2 | Rewrite PowerShell execution | `visual-awareness.js` | [ ] |
| P1.3 | Test overlay renders dots | Manual test | [ ] |
| P1.4 | Test inspect mode toggle | Manual test | [ ] |

### Phase 2: Core Functionality
**Priority**: 🟠 HIGH
**Estimate**: 4-6 hours

| Task ID | Description | File | Status |
|---------|-------------|------|--------|
| P2.1 | Implement targetId resolution | `ai-service.js` | [ ] |
| P2.2 | Update system prompt with targetId | `ai-service.js` | [ ] |
| P2.3 | Inject generateAIInstructions | `ai-service.js` | [ ] |
| P2.4 | Test AI uses targetId for clicks | Manual test | [ ] |

### Phase 3: Verification & Feedback Loop
**Priority**: 🟡 MEDIUM
**Estimate**: 4-6 hours

| Task ID | Description | File | Status |
|---------|-------------|------|--------|
| P3.1 | Implement pixel diff comparison | `visual-awareness.js` | [ ] |
| P3.2 | Create heatmap overlay rendering | `overlay.js` | [ ] |
| P3.3 | Add verification summary to response | `ai-service.js` | [ ] |
| P3.4 | Implement retry logic on failure | `ai-service.js` | [ ] |

### Phase 4: Enhanced Context
**Priority**: 🟢 LOW
**Estimate**: 2-4 hours

| Task ID | Description | File | Status |
|---------|-------------|------|--------|
| P4.1 | Track focused control | `visual-awareness.js` | [ ] |
| P4.2 | Extract caret position | `visual-awareness.js` | [ ] |
| P4.3 | Populate window zOrder | `visual-awareness.js` | [ ] |
| P4.4 | Add focus info to AI context | `ai-service.js` | [ ] |

---

## Testing Checklist

### Unit Tests
- [ ] `grid-math.js` coordinate conversions
- [ ] `inspect-types.js` region functions
- [ ] `system-automation.js` action parsing

### Integration Tests
- [ ] Overlay renders with electronAPI available
- [ ] Inspect mode detects UI elements
- [ ] AI receives inspect context in prompt
- [ ] Actions execute at correct coordinates

### Manual Verification
- [ ] Launch app: `npm start`
- [ ] Overlay dots visible on screen
- [ ] Ctrl+Alt+I toggles inspect mode
- [ ] Hover shows tooltips on detected regions
- [ ] Chat AI can describe screen contents
- [ ] Chat AI can click detected buttons
- [ ] Screenshot diff shows changes after action

---

## Architecture Notes

### Local Agent Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Electron Main Process                    │
├─────────────────────────────────────────────────────────────┤
│  ai-service.js      │ system-automation.js │ visual-awareness │
│  - Multi-provider   │ - Mouse/keyboard     │ - Screen capture  │
│  - Context builder  │ - PowerShell exec    │ - UIAutomation    │
│  - Action parser    │ - Grid math          │ - OCR integration │
├─────────────────────────────────────────────────────────────┤
│                       IPC Bridge                             │
├─────────────────────────────────────────────────────────────┤
│        Overlay Renderer        │       Chat Renderer         │
│  - Canvas grid dots            │  - Message history          │
│  - Inspect region boxes        │  - Action confirmation      │
│  - Tooltip rendering           │  - Visual context preview   │
└─────────────────────────────────────────────────────────────┘
```

### CLI Integration Path
Future: The Electron features can be exposed to a CLI interface:
- `liku inspect` - Launch overlay in inspect mode
- `liku click <targetId>` - Click a detected region
- `liku capture` - Take screenshot and analyze
- `liku ask "click the submit button"` - Natural language action

---

## Success Criteria

### Baseline Application Complete When:
1. ✅ Overlay renders dot grid without errors
2. ✅ Inspect mode detects and displays UI regions
3. ✅ AI receives region data in context
4. ✅ AI can target regions by ID or coordinates
5. ✅ Actions execute and verify outcomes
6. ✅ Screenshot diff shows what changed

### Definition of Done for Each Task:
- Code implemented and compiles without errors
- Feature works in manual testing
- No regression in existing functionality
- Console shows expected log messages

---

## Changelog

| Date | Version | Changes |
|------|---------|---------|
| 2026-01-28 | 0.0.2 | Identified critical blockers, created baseline roadmap |
| TBD | 0.1.0 | Phase 1 complete - Overlay functional |
| TBD | 0.2.0 | Phase 2 complete - targetId actions working |
| TBD | 0.3.0 | Phase 3 complete - Verification loop |
| TBD | 1.0.0 | Baseline application complete |