Summary
Add direct backend characterization tests for the existing exam grading helper behavior before any future extraction or modification.
Background / Context
The V4 refactoring plan identified exam grading as behavior-sensitive. backend/app/presentation/exam_grading.py contains pure grading helpers and async grading orchestration that are mostly covered indirectly today. Before moving or changing that code, we need direct tests that pin current scoring, result shape, retry, and fallback behavior.
Problem
Exam grading behavior affects user scores and exam results. Without direct tests, later refactors could silently change grading status values, scores, retry counts, or returned dict shapes.
Goal / Expected Behavior
Create passing tests that document the current behavior of exam grading and exam helper functions without changing production code.
Scope
This issue should cover:
- Create backend/tests/presentation/test_exam_grading.py.
- Create backend/tests/presentation/test_exam_helpers.py.
- Cover grade_objective_item, grade_short_answer_item, build_tracking_update, build_grading_result, build_exam_summary, and make_exam_item.
- Cover all existing ProblemType branches and the PENDING_REVIEW fallback path.
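One way to make "all existing ProblemType branches" enforceable is a guard test that fails whenever the enum gains a member with no matching characterization case. The sketch below assumes ProblemType is an Enum. The members and the case fields shown are hypothetical placeholders; the real test should import the real enum and copy its inputs from the current code.

```python
from enum import Enum


class ProblemType(str, Enum):
    # Hypothetical members for illustration only; import the real enum instead.
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    SHORT_ANSWER = "short_answer"


# One characterization input per ProblemType branch. Field names are
# hypothetical; mirror the real item dictionaries used by the grading helpers.
CASES_BY_TYPE = {
    ProblemType.MULTIPLE_CHOICE: {"correct_answer": "B", "user_answer": "B"},
    ProblemType.TRUE_FALSE: {"correct_answer": True, "user_answer": False},
    ProblemType.SHORT_ANSWER: {"correct_answer": "photosynthesis", "user_answer": "photo"},
}


def test_every_problem_type_has_a_case():
    # Fails as soon as a ProblemType member is added without a matching case,
    # so a new grading branch cannot go untested silently.
    missing = set(ProblemType) - set(CASES_BY_TYPE)
    assert not missing, f"no grading case for: {sorted(m.value for m in missing)}"
```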
Out of Scope
This issue should not cover:
- Refactoring production grading code.
- Changing scoring thresholds or grading algorithms.
- Changing exam route handlers.
- Moving grading helpers to another package.
Chosen Implementation Approach
Use characterization tests only. Build minimal representative item dictionaries and light test doubles for VLM/storage where async grading requires them. Assert the exact current output shape and current retry/fallback behavior.
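A "light test double" for the VLM can be a small scripted fake. Each call returns or raises the next scripted outcome and records its arguments, so tests can assert exact call counts. The method name `grade`, its signature, and `RetryableVLMError` are hypothetical. Match them to the real client interface and error types used by exam_grading.py.

```python
import asyncio
from collections import deque


class RetryableVLMError(Exception):
    """Hypothetical stand-in for the error type the grader treats as retryable."""


class FakeVLM:
    """Scripted async VLM double.

    Each call pops the next scripted outcome. Exception instances are raised,
    and any other value is returned as the VLM response. Every call is
    recorded so tests can pin the exact retry count.
    """

    def __init__(self, *outcomes):
        self._outcomes = deque(outcomes)
        self.calls = []

    async def grade(self, prompt, **kwargs):
        # Hypothetical method name and signature; mirror the real client.
        self.calls.append((prompt, kwargs))
        if not self._outcomes:
            raise AssertionError("FakeVLM called more times than scripted")
        outcome = self._outcomes.popleft()
        if isinstance(outcome, BaseException):
            raise outcome
        return outcome


def test_fake_vlm_replays_script_in_order():
    vlm = FakeVLM(RetryableVLMError("busy"), {"score": 1.0})
    try:
        asyncio.run(vlm.grade("q1"))
    except RetryableVLMError:
        pass
    else:
        raise AssertionError("first scripted outcome should raise")
    assert asyncio.run(vlm.grade("q2")) == {"score": 1.0}
    assert [prompt for prompt, _ in vlm.calls] == ["q1", "q2"]
```

`unittest.mock.AsyncMock(side_effect=[...])` gives similar raise-then-return behavior. A hand-written fake keeps the scripted sequence and the "called too many times" failure explicit in the test file.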
Implementation Plan
The implementor should:
- Inspect exam_grading.py and exam_helpers.py to enumerate inputs and return shapes.
- Add pure helper tests for objective grading, result construction, and tracking updates.
- Add async tests for short-answer grading success, retryable failure, non-retryable failure, and unexpected exception fallback.
- Add exam helper tests for summary/item construction where that behavior is currently only covered indirectly.
- Run the targeted tests and then the full backend suite.
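The async retry and fallback tests can follow this shape. An always-failing double counts attempts, and the test pins both the attempt count and the fallback status. The sketch does not know the real internals, so `grade_with_retry` is a hypothetical stand-in for the loop inside grade_short_answer_item, written only so the example runs. Three further details are assumptions: the cap of 3, the fallback dict shape, and the behavior that non-retryable errors skip further attempts. The sketch also uses `asyncio.run` inside plain test functions so it does not depend on pytest-asyncio being configured.

```python
import asyncio

PENDING_REVIEW = "PENDING_REVIEW"  # hypothetical literal; use the real constant
MAX_ATTEMPTS = 3  # hypothetical cap; read the current value from exam_grading.py


class RetryableVLMError(Exception):
    """Hypothetical retryable error type."""


class CountingVLM:
    """Async double that always raises `error` and counts the attempts."""

    def __init__(self, error):
        self.error = error
        self.attempts = 0

    async def grade(self, answer):
        self.attempts += 1
        raise self.error


async def grade_with_retry(vlm, answer, max_attempts=MAX_ATTEMPTS):
    # Hypothetical stand-in for the retry loop in grade_short_answer_item.
    # The real test calls the real function instead.
    for _ in range(max_attempts):
        try:
            return await vlm.grade(answer)
        except RetryableVLMError:
            continue  # retryable: try again until the cap is reached
        except Exception:
            break  # assumed: other errors fall back immediately
    return {"status": PENDING_REVIEW}


def test_retryable_error_hits_cap_then_pending_review():
    vlm = CountingVLM(RetryableVLMError("rate limited"))
    result = asyncio.run(grade_with_retry(vlm, "some answer"))
    assert vlm.attempts == 3  # pinned literal, not the imported constant
    assert result["status"] == PENDING_REVIEW


def test_unexpected_error_falls_back_without_retry():
    vlm = CountingVLM(RuntimeError("boom"))
    result = asyncio.run(grade_with_retry(vlm, "some answer"))
    assert vlm.attempts == 1
    assert result["status"] == PENDING_REVIEW
```

Pinning the attempt count as a literal, rather than importing the production constant, is deliberate. A changed cap should fail the characterization test, not silently update it.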
Relevant Files / Areas
Likely relevant areas:
backend/app/presentation/exam_grading.py
backend/app/presentation/exam_helpers.py
backend/tests/presentation/test_exam_grading.py
backend/tests/presentation/test_exam_helpers.py
Tests Required
The implementor must add or update automated tests covering:
- Direct unit tests for grading result dict shape.
- Objective grading tests across supported problem types.
- Short-answer VLM success and fallback tests.
- Tracking update tests for correct and incorrect answers.
- Exam helper tests for existing summary/item behavior.
At minimum, tests should verify:
- Correct objective answers produce the same status and score as before.
- Incorrect or missing answers produce the same status and score as before.
- Retryable VLM errors are retried up to the current cap and then fall back to PENDING_REVIEW.
- build_grading_result includes the exact expected keys.
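"Exact expected keys" is stricter than a subset check such as `expected <= result.keys()`, which would miss keys added by a refactor. A small helper that reports both missing and unexpected keys gives a readable failure. The key set below is a hypothetical placeholder. Copy the real keys from the current return value of build_grading_result.

```python
# Hypothetical key set: copy the real keys from build_grading_result's current
# return value when writing the actual test.
EXPECTED_RESULT_KEYS = frozenset({"item_id", "status", "score", "feedback"})


def assert_exact_keys(result, expected=EXPECTED_RESULT_KEYS):
    """Fail with a readable message if keys were added to or removed from result."""
    actual = set(result)
    missing = sorted(expected - actual)
    extra = sorted(actual - expected)
    assert not missing and not extra, f"missing keys: {missing}; unexpected keys: {extra}"


def test_assert_exact_keys_reports_drift():
    # A renamed key shows up as one missing and one unexpected entry.
    drifted = {"item_id": 1, "status": "any", "points": 1.0, "feedback": ""}
    try:
        assert_exact_keys(drifted)
    except AssertionError as exc:
        assert "score" in str(exc) and "points" in str(exc)
    else:
        raise AssertionError("key drift was not reported")
```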
Manual Verification / Self-Check
Before claiming this issue is done, the implementor must:
- Run the relevant automated test suite.
- Manually verify the main behavior described in this issue when applicable.
- Verify that no related existing behavior regressed.
- Record the exact commands run and their results in the PR description.
Suggested verification commands:
cd backend && uv run --frozen pytest tests/presentation/test_exam_grading.py
cd backend && uv run --frozen pytest tests/presentation/test_exam_helpers.py
cd backend && uv run --frozen pytest
Reviewer Acceptance Checklist
The reviewer should verify that:
Dependencies
None.
Follow-Up Work
Future exam-domain extraction may use these tests as a safety net, but that extraction is not part of this issue.
Definition of Done
This issue is done when:
- The new presentation test files exist and pass.
- The full backend suite passes.
- No production code is changed, except for test-only support where strictly necessary and justified.
- The PR description records exact verification commands and results.