Judge RETRY = Same station sends work back to AI to fix
Graph loop = Route to a fix step, then back to validation
Human review = Person must sign off before the line continues
flowchart TB
subgraph whole_job [Whole job — GOAL level]
G[Goal: mission + final checklist + rules]
end
subgraph one_step [One step — NODE level]
N[Node: do one piece of work]
O[Required outputs — output_keys]
J[Judge: good enough?]
end
G --> N
N --> O --> J
J -->|RETRY| N
J -->|ACCEPT| Next[Next step]
Behavior: Second LLM quality check via conversation_judge.py
Status: Supported in code, rarely used in templates today
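The optional quality check described above can be pictured with a small sketch. It is an illustration only, not the contents of conversation_judge.py: `quality_check` and the `QualityJudge` callable are hypothetical names, the second LLM is replaced by a plain function, and the assumption is that the check is skipped whenever a node sets no success_criteria.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the second LLM call:
# (transcript, rubric) -> (passed, reason). Not the real judge interface.
QualityJudge = Callable[[str, str], Tuple[bool, str]]


def quality_check(
    transcript: str,
    success_criteria: Optional[str],
    judge: QualityJudge,
) -> Tuple[str, str]:
    """Return (verdict, feedback) for the optional quality rubric."""
    if not success_criteria:
        # Assumption: nodes without a rubric skip the quality judge entirely.
        return "ACCEPT", ""
    passed, reason = judge(transcript, success_criteria)
    if passed:
        return "ACCEPT", ""
    return "RETRY", f"[Judge feedback]: {reason}"


def keyword_judge(transcript: str, rubric: str) -> Tuple[bool, str]:
    """Toy judge: passes only if the rubric keyword appears in the transcript."""
    if rubric in transcript:
        return True, ""
    return False, f"expected the answer to mention '{rubric}'"
```

With the toy judge, `quality_check("total hours: 8", "total", keyword_judge)` accepts, while a transcript that never mentions the rubric keyword comes back as RETRY with feedback.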
Auto-correction — existing implementation
Do not rebuild — document, test, and fill gaps.
| Capability | Code location |
| --- | --- |
| Step judge (missing output_keys → RETRY) | event_loop/node.py → _evaluate() |
| Feedback to LLM on retry | [Judge feedback]: ... via add_user_message() |
| Optional Level 2 quality judge | conversation_judge.py + success_criteria |
| Per-step retry limit | loop_config.max_iterations |
| Between-step loops (validate → fix) | Conditional EdgeSpec in graph/edge.py |
| Retry telemetry (partial) | ExecutionResult.total_retries, runtime logs |
| Whole-job Goal scorecard | OutcomeAggregator (tracks only, no full-agent auto-retry) |
Not built: separate EvaluationNode — judge runs inside each event_loop step.
flowchart TD
A[EventLoopNode] --> B{_evaluate / judge}
B -->|ACCEPT| C[Next step]
B -->|RETRY| A
B -->|max iterations| D[Step fails]
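The Judge RETRY loop in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code in event_loop/node.py: `run_step`, `evaluate`, and `StepResult` are hypothetical names. The only behavior taken from this ticket is that missing output_keys produce a RETRY with a `[Judge feedback]` message, and that the loop is bounded by max_iterations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    accepted: bool
    outputs: Dict[str, object]
    iterations: int
    feedback_log: List[str] = field(default_factory=list)


def evaluate(outputs: Dict[str, object], output_keys: List[str]) -> Tuple[str, str]:
    """Hypothetical step judge: RETRY if any required output is missing."""
    missing = [key for key in output_keys if outputs.get(key) is None]
    if missing:
        return "RETRY", "[Judge feedback]: missing outputs: " + ", ".join(missing)
    return "ACCEPT", ""


def run_step(
    work: Callable[[List[str]], Dict[str, object]],
    output_keys: List[str],
    max_iterations: int = 3,
) -> StepResult:
    """Run `work` until the judge accepts or the iteration budget is spent."""
    feedback_log: List[str] = []
    outputs: Dict[str, object] = {}
    for iteration in range(1, max_iterations + 1):
        # `work` stands in for the LLM turn; it sees all feedback so far.
        outputs = work(feedback_log)
        verdict, feedback = evaluate(outputs, output_keys)
        if verdict == "ACCEPT":
            return StepResult(True, outputs, iteration, feedback_log)
        feedback_log.append(feedback)
    # Budget exhausted: the step fails rather than retrying forever.
    return StepResult(False, outputs, max_iterations, feedback_log)


def flaky_worker(feedback: List[str]) -> Dict[str, object]:
    """Toy worker that forgets `total` until it has received feedback."""
    if not feedback:
        return {"summary": "ok"}
    return {"summary": "ok", "total": 5}
```

Calling `run_step(flaky_worker, ["summary", "total"])` is accepted on the second iteration, after one feedback message naming `total` as missing.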
All four feedback mechanisms
How correction, routing, approval, and scoring fit together in one agent run:
flowchart TB
subgraph goal_layer [End of job — measurement only]
G[Goal checklist in agent.py]
OA[OutcomeAggregator — final score / KPIs]
G --> OA
end
subgraph step_judge [Inside one step — Judge RETRY]
N[EventLoopNode: AI work]
J{output_keys complete?}
RF["[Judge feedback] → retry"]
N --> J
J -->|no| RF --> N
J -->|yes| OUT[Step outputs to shared memory]
J -->|max iterations| FAIL[Step fails]
end
subgraph graph_loop [Between steps — validate → fix loop]
V[Validate step]
FX[Fix / remap step]
V -->|fail| FX --> V
V -->|pass| NEXT[Continue graph]
end
subgraph human [Human review — pause_nodes]
P[Execution PAUSED]
APP[Approver in web dashboard]
INJ[inject_input → resume]
P --> APP --> INJ
end
OUT --> V
NEXT --> goal_layer
V -->|needs approver| P
INJ --> V
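The validate → fix loop between steps can be illustrated with a toy graph runner. This is a sketch under stated assumptions: `Edge` and `run_graph` are hypothetical and much simpler than the real EdgeSpec in graph/edge.py, the node functions are plain Python instead of LLM steps, and the `max_hops` guard is an added safety limit that this ticket does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Memory = Dict[str, object]


@dataclass
class Edge:
    """Hypothetical conditional edge: follow it when `condition` is true."""
    source: str
    target: str
    condition: Callable[[Memory], bool]


def run_graph(
    nodes: Dict[str, Callable[[Memory], None]],
    edges: List[Edge],
    start: str,
    memory: Memory,
    max_hops: int = 10,
) -> List[str]:
    """Walk the graph, returning the list of visited node names."""
    current = start
    path: List[str] = []
    for _ in range(max_hops):
        path.append(current)
        nodes[current](memory)
        # Take the first outgoing edge whose condition holds.
        target = next(
            (e.target for e in edges if e.source == current and e.condition(memory)),
            None,
        )
        if target is None:
            return path  # no outgoing edge matched: the run is finished
        current = target
    raise RuntimeError("validate/fix loop did not converge")


def validate(memory: Memory) -> None:
    memory["valid"] = memory["hours"] <= 24


def fix(memory: Memory) -> None:
    memory["hours"] = min(memory["hours"], 24)


def done(memory: Memory) -> None:
    memory["finished"] = True


NODES = {"validate": validate, "fix": fix, "done": done}
EDGES = [
    Edge("validate", "fix", lambda m: not m["valid"]),
    Edge("validate", "done", lambda m: bool(m["valid"])),
    Edge("fix", "validate", lambda m: True),
]
```

Starting from `{"hours": 30}`, the runner visits validate, fix, validate, done: the fix step is a separate node, and validation runs again before the graph continues.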
Retry vs human review
| Mechanism | Who acts | When |
| --- | --- | --- |
| Judge RETRY | AI, same step | Missing outputs |
| Graph loop | Another step | validate → fix edges |
| Human pause | Person in dashboard | pause_nodes / approval |
| Goal criteria | Measurement only | End of run / KPIs |
ESCALATE (judge) = step fails. Not equivalent to human review.
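To make the distinction concrete, here is a small dispatch sketch. `Verdict` and `next_action` are hypothetical names, and the sketch assumes that a human pause is checked only after a step's outputs are accepted. The point it illustrates comes from this ticket: ESCALATE ends the step as a failure, while human review is triggered separately through pause_nodes.

```python
from enum import Enum
from typing import Set


class Verdict(Enum):
    ACCEPT = "accept"
    RETRY = "retry"
    ESCALATE = "escalate"


def next_action(
    verdict: Verdict,
    node_id: str,
    pause_nodes: Set[str],
    iteration: int,
    max_iterations: int,
) -> str:
    """Map a judge verdict to what the runner does next (illustrative only)."""
    if verdict is Verdict.ESCALATE:
        # ESCALATE never routes to a person; the step simply fails.
        return "fail_step"
    if verdict is Verdict.RETRY:
        # Retry the same step until the iteration budget runs out.
        return "retry_step" if iteration < max_iterations else "fail_step"
    # ACCEPT: assumed point at which pause_nodes is consulted.
    return "pause_for_human" if node_id in pause_nodes else "continue"
```

For example, `next_action(Verdict.ESCALATE, "extract", {"extract"}, 1, 3)` returns `"fail_step"` even though `extract` is listed in `pause_nodes`.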
Overview
Document how EngineX evaluates success at each layer — and audit what already exists in code.
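Every agent has:
- Layer 1: Goal (whole job checklist)
- Layer 2: Node outputs (did this step finish?)
- Layer 3: Node success_criteria (optional quality rubric)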
These layers are often conflated. This ticket covers documentation (with diagrams), tests, and gap analysis.
Assignee: @P00rkavi
Supersedes: #1 (closed — scope merged here)
Ticket metadata
Conceptual model
Think of an agent as a factory line:
Common misconceptions
- agent.py
- pause_nodes
The three layers
Layer 1: Goal (whole job checklist)
- examples/templates/<agent>/agent.py
- core/engine/graph/goal.py, core/engine/runtime/outcome_aggregator.py
Layer 2: Node outputs (did this step finish?)
- NodeSpec.output_keys in nodes/__init__.py
- Per-step retry limit: loop_config.max_iterations
- core/engine/graph/event_loop/node.py → _evaluate()
Layer 3: Node success_criteria (optional quality rubric)
- NodeSpec.success_criteria
- conversation_judge.py
Deliverables
Docs
- docs/GOALS.md: overview + diagrams (this issue is the spec)
- examples/templates/hourly_tracking/ + docs/ENGINEX_COMPLETE_GUIDE.md Section 6–8
Tests
- output_keys missing: test_event_loop_missing_output_keys_retried, test_event_loop_node.py
Examples
- NodeSpec.success_criteria: meeting_scheduler, agreement_analysis nodes
Optional (P1)
- docs/ENGINEX_COMPLETE_GUIDE.md Section 22
Out of scope: new EvaluationNode unless audit identifies a gap.
Reference templates
- agreement_analysis: HITL + judge RETRY on extract
- log_monitor: timer + conditional edges + human review
- hourly_tracking ([P0][Phase 1][Agent] Build Enterprise-Grade Hourly Tracking Agent #2): validate → fix graph loop
Definition of done
- docs/GOALS.md published and linked from README
- success_criteria example in templates