Skip to content

[DEBT] Move review-finish label reconciliation into wf with thin fallback (Lever F2) #140

Description

@AdrienneBosch

Context

The code-review body asks the model to hand-execute a deterministic
procedure in Step 10/10b: remove stale state labels, apply exactly the
verdict label, then read back labels to verify and create-if-missing. That
is pure logic expressed as prose — paid in tokens every run and a recurring
"model did the label dance slightly wrong" bug class.

The repo already proved the better pattern: wf (pick, review-next,
update-next, post-merge, config) lifts deterministic workflow logic out of
the prompt into tested Python. This extends it to the review finish step.

See docs/context-optimization-plan.md → Lever F2. Gated: do this only
after Lever F1 (and ideally F3) have shown wf reliably carrying more of
the workflow, so we act on observed reliability, not faith.

Requirements (acceptance criteria)

  • New wf review-finish --verdict <approved|changes-requested|needs-discussion>
    subcommand performs the Step 10 label reconciliation + Step 10b readback
    verify (resolving label names by purpose key, guarded create-if-missing,
    no --force).
  • wf.py change is covered by the offline decision-logic tests
    (label-set-in → label-set-out for each verdict, including the
    create-if-missing path).
  • The body calls wf review-finish on the happy path and keeps a
    thin inline fallback ("apply the verdict label, remove the others" —
    just gh calls) for when wf errors or Python is absent. The verbose
    Step 10/10b prose is removed from the body.
  • Python is not made a hard dependency; the thin fallback preserves
    graceful degradation.
  • One real dry run confirms both the wf review-finish path and the
    inline fallback produce the correct label state.
  • Quality gate green; _shared-skills/ synced if applicable; versions
    bumped.

Savings goal

Eliminate (not defer) the ~30–40 lines of Step 10/10b label-reconcile
prose from the code-review body and remove the associated bug class
label state becomes a tested code path. Goal: the model never hand-executes
the label dance on the happy path.

Notes

Blocked-by/sequenced-after the Lever F1 story (#TBD) per the plan's
"earn wf-reliability evidence before pushing more logic down".

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions