Copilot CLI Baseline Application - Implementation Roadmap

Historical document: This roadmap was created during the early baseline phase. Many items listed as blockers or missing features have since been implemented. For current status, see PROJECT_STATUS.md and IMPLEMENTATION_SUMMARY.md.

Vision: Local Agentic Desktop Assistant

This forked Copilot CLI extends beyond a terminal tool into a local agentic desktop assistant with:

Electron Overlay: Transparent grid system for precise screen targeting
Visual Awareness: Real-time screen capture, OCR, and UI element detection
System Automation: Mouse, keyboard, and window control via native APIs
AI Integration: Multi-provider support (Copilot, OpenAI, Anthropic, Ollama)

The goal is to create a baseline application where the AI agent can:

See the user's screen via screen capture
Identify UI elements via accessibility APIs and inspect mode
Execute actions (click, type, scroll) with precision
Verify outcomes and self-correct

🔴 CRITICAL BLOCKERS

BLOCKER-1: Preload Script Failure ✅ FIXED

File: src/renderer/overlay/preload.js
Issue: require('../../shared/grid-math') fails in Electron sandbox
Impact: window.electronAPI = undefined, overlay doesn't render
Status: [x] Fixed

Solution Applied:

Inlined grid-math constants and labelToScreenCoordinates() directly in preload.js
Sandboxed preload can't use require('path') or load external modules
Removed dependency on external grid-math.js in preload context

BLOCKER-2: PowerShell Here-String Syntax Broken ✅ FIXED

File: src/main/visual-awareness.js
Issue: .replace(/\n/g, ' ') breaks Here-Strings (@" ... "@)
Impact: All 4 PowerShell functions return parse errors
Status: [x] Fixed

Solution Applied:

Created executePowerShellScript() helper function
Writes PS1 to temp files in os.tmpdir()/liku-ps
Executes with powershell -NoProfile -ExecutionPolicy Bypass -File <path>
Cleans up temp files after execution
Updated all 4 functions: getActiveWindow(), extractWithWindowsOCR(), detectUIElements(), findElementAtPoint()

BLOCKER-3: Click-Through Failure on Background Windows ✅ FIXED

File: src/main/system-automation.js
Issue: AI clicks not reaching VS Code through transparent overlay
Root Cause: mouse_event() is deprecated and doesn't work reliably with:
- Electron's setIgnoreMouseEvents(true, { forward: true }) (only forwards hardware events)
- Layered windows with WS_EX_TRANSPARENT style
- Background windows that aren't focused
Status: [x] Fixed

Solution Applied:

Replaced mouse_event() with SendInput() - Modern Win32 API with better UIPI handling
Added SetForegroundWindow() activation - Activates target window before clicking
Implemented GetRealWindowFromPoint() - Finds actual window under cursor, skipping transparent overlays
Added ForceForeground() with thread attachment - Uses AttachThreadInput() to overcome focus restrictions
Updated functions: click(), doubleClick(), drag()

Technical Details:

// Skip transparent windows (like Electron overlay)
int exStyle = GetWindowLong(hwnd, GWL_EXSTYLE);
bool isTransparent = (exStyle & WS_EX_TRANSPARENT) != 0;

// Force activate target window
AttachThreadInput(currentThread, foregroundThread, true);
SetForegroundWindow(hwnd);
AttachThreadInput(currentThread, foregroundThread, false);

// Use SendInput instead of deprecated mouse_event
SendInput(2, inputs, Marshal.SizeOf(typeof(INPUT)));

🟠 MISSING FEATURES

FEATURE-3: targetId-Based Actions (M5.2)

Spec: "Prefer targetId actions; fallback to grid if no regions"
Status: [ ] Not Implemented

Implementation Tasks:

Add targetId field to action schema in system prompt
Create resolveTargetId(targetId) function in ai-service.js
Implement fallback: targetId → inspect region center → grid coordinate
Update action executor to accept targetId

FEATURE-4: Inject generateAIInstructions()

File: src/main/inspect-service.js
Status: [ ] Defined but never called

Implementation Tasks:

Call generateAIInstructions() in ai-service.js message builder
Append to system prompt when inspect mode is active
Include instructions for referencing regions by ID

FEATURE-5: Screenshot Diff Heatmap (M4.2)

File: src/main/visual-awareness.js
Status: [ ] Placeholder only

Implementation Tasks:

Implement pixel-level comparison using canvas or native image lib
Generate bounding boxes for changed regions
Create heatmap overlay visualization
Expose diff results to AI context

FEATURE-6: Verification Summary (M4.3)

Spec: "Attach verification summary to AI response"
Status: [ ] Not Implemented

Implementation Tasks:

Capture screenshot after action sequence
Compare with expected outcome (from verification field)
Generate verification report
Feed back to AI for confirmation/retry

FEATURE-7: Input Focus Tracking

Spec: "Highlight focused control; expose caret position"
Status: [ ] Not Implemented

Implementation Tasks:

Query focused element via UIAutomation API
Extract caret position from text controls
Highlight focused element in overlay
Include focus info in AI context

FEATURE-8: Window zOrder Population

File: src/shared/inspect-types.js
Status: [ ] Hardcoded to 0

Implementation Tasks:

Query window z-order via Win32 API
Populate zOrder field in window context
Use for multi-window targeting

🟡 PARTIAL IMPLEMENTATIONS

PARTIAL-9: Inspect Regions Display (A1-A3)

Status: ✅ Code exists, blocked by BLOCKER-1 and BLOCKER-2
Files: overlay.js, index.html (styles exist)

Unblock Tasks:

Fix BLOCKER-1 (preload)
Fix BLOCKER-2 (PowerShell)
Test region rendering end-to-end

PARTIAL-10: AI Context Payload (M5.1)

Status: ⚠️ Regions appended to user message

Completion Tasks:

Call generateAIInstructions() (FEATURE-4)
Add targetId support (FEATURE-3)
Include window context with zOrder (FEATURE-8)

Implementation Phases

Phase 1: Critical Fixes (Unblock Overlay)

Priority: 🔴 CRITICAL Estimate: 2-3 hours

Task ID	Description	File	Status
P1.1	Fix preload require path	`preload.js`	[ ]
P1.2	Rewrite PowerShell execution	`visual-awareness.js`	[ ]
P1.3	Test overlay renders dots	Manual test	[ ]
P1.4	Test inspect mode toggle	Manual test	[ ]

Phase 2: Core Functionality

Priority: 🟠 HIGH Estimate: 4-6 hours

Task ID	Description	File	Status
P2.1	Implement targetId resolution	`ai-service.js`	[ ]
P2.2	Update system prompt with targetId	`ai-service.js`	[ ]
P2.3	Inject generateAIInstructions	`ai-service.js`	[ ]
P2.4	Test AI uses targetId for clicks	Manual test	[ ]

Phase 3: Verification & Feedback Loop

Priority: 🟡 MEDIUM Estimate: 4-6 hours

Task ID	Description	File	Status
P3.1	Implement pixel diff comparison	`visual-awareness.js`	[ ]
P3.2	Create heatmap overlay rendering	`overlay.js`	[ ]
P3.3	Add verification summary to response	`ai-service.js`	[ ]
P3.4	Implement retry logic on failure	`ai-service.js`	[ ]

Phase 4: Enhanced Context

Priority: 🟢 LOW Estimate: 2-4 hours

Task ID	Description	File	Status
P4.1	Track focused control	`visual-awareness.js`	[ ]
P4.2	Extract caret position	`visual-awareness.js`	[ ]
P4.3	Populate window zOrder	`visual-awareness.js`	[ ]
P4.4	Add focus info to AI context	`ai-service.js`	[ ]

Testing Checklist

Unit Tests

grid-math.js coordinate conversions
inspect-types.js region functions
system-automation.js action parsing

Integration Tests

Overlay renders with electronAPI available
Inspect mode detects UI elements
AI receives inspect context in prompt
Actions execute at correct coordinates

Manual Verification

Launch app: npm start
Overlay dots visible on screen
Ctrl+Alt+I toggles inspect mode
Hover shows tooltips on detected regions
Chat AI can describe screen contents
Chat AI can click detected buttons
Screenshot diff shows changes after action

Architecture Notes

Local Agent Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Electron Main Process                    │
├─────────────────────────────────────────────────────────────┤
│  ai-service.js      │ system-automation.js │ visual-awareness │
│  - Multi-provider   │ - Mouse/keyboard     │ - Screen capture  │
│  - Context builder  │ - PowerShell exec    │ - UIAutomation    │
│  - Action parser    │ - Grid math          │ - OCR integration │
├─────────────────────────────────────────────────────────────┤
│                       IPC Bridge                             │
├─────────────────────────────────────────────────────────────┤
│        Overlay Renderer        │       Chat Renderer         │
│  - Canvas grid dots            │  - Message history          │
│  - Inspect region boxes        │  - Action confirmation      │
│  - Tooltip rendering           │  - Visual context preview   │
└─────────────────────────────────────────────────────────────┘

CLI Integration Path

Future: The Electron features can be exposed to a CLI interface:

liku inspect - Launch overlay in inspect mode
liku click <targetId> - Click a detected region
liku capture - Take screenshot and analyze
liku ask "click the submit button" - Natural language action

Success Criteria

Baseline Application Complete When:

✅ Overlay renders dot grid without errors
✅ Inspect mode detects and displays UI regions
✅ AI receives region data in context
✅ AI can target regions by ID or coordinates
✅ Actions execute and verify outcomes
✅ Screenshot diff shows what changed

Definition of Done for Each Task:

Code implemented and compiles without errors
Feature works in manual testing
No regression in existing functionality
Console shows expected log messages

Changelog

Date	Version	Changes
2026-01-28	0.0.2	Identified critical blockers, created baseline roadmap
TBD	0.1.0	Phase 1 complete - Overlay functional
TBD	0.2.0	Phase 2 complete - targetId actions working
TBD	0.3.0	Phase 3 complete - Verification loop
TBD	1.0.0	Baseline application complete

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Copilot CLI Baseline Application - Implementation Roadmap

Vision: Local Agentic Desktop Assistant

🔴 CRITICAL BLOCKERS

BLOCKER-1: Preload Script Failure ✅ FIXED

BLOCKER-2: PowerShell Here-String Syntax Broken ✅ FIXED

BLOCKER-3: Click-Through Failure on Background Windows ✅ FIXED

🟠 MISSING FEATURES

FEATURE-3: targetId-Based Actions (M5.2)

FEATURE-4: Inject generateAIInstructions()

FEATURE-5: Screenshot Diff Heatmap (M4.2)

FEATURE-6: Verification Summary (M4.3)

FEATURE-7: Input Focus Tracking

FEATURE-8: Window zOrder Population

🟡 PARTIAL IMPLEMENTATIONS

PARTIAL-9: Inspect Regions Display (A1-A3)

PARTIAL-10: AI Context Payload (M5.1)

Implementation Phases

Phase 1: Critical Fixes (Unblock Overlay)

Phase 2: Core Functionality

Phase 3: Verification & Feedback Loop

Phase 4: Enhanced Context

Testing Checklist

Unit Tests

Integration Tests

Manual Verification

Architecture Notes

Local Agent Architecture

CLI Integration Path

Success Criteria

Baseline Application Complete When:

Definition of Done for Each Task:

Changelog

FilesExpand file tree

baseline-app.md

Latest commit

History

baseline-app.md

File metadata and controls

Copilot CLI Baseline Application - Implementation Roadmap

Vision: Local Agentic Desktop Assistant

🔴 CRITICAL BLOCKERS

BLOCKER-1: Preload Script Failure ✅ FIXED

BLOCKER-2: PowerShell Here-String Syntax Broken ✅ FIXED

BLOCKER-3: Click-Through Failure on Background Windows ✅ FIXED

🟠 MISSING FEATURES

FEATURE-3: targetId-Based Actions (M5.2)

FEATURE-4: Inject generateAIInstructions()

FEATURE-5: Screenshot Diff Heatmap (M4.2)

FEATURE-6: Verification Summary (M4.3)

FEATURE-7: Input Focus Tracking

FEATURE-8: Window zOrder Population

🟡 PARTIAL IMPLEMENTATIONS

PARTIAL-9: Inspect Regions Display (A1-A3)

PARTIAL-10: AI Context Payload (M5.1)

Implementation Phases

Phase 1: Critical Fixes (Unblock Overlay)

Phase 2: Core Functionality

Phase 3: Verification & Feedback Loop

Phase 4: Enhanced Context

Testing Checklist

Unit Tests

Integration Tests

Manual Verification

Architecture Notes

Local Agent Architecture

CLI Integration Path

Success Criteria

Baseline Application Complete When:

Definition of Done for Each Task:

Changelog