🧠 Context
Right now the source URLs show up twice for every answer:
- The system prompt (src/completions/prompts.py) tells the model to end its answer with its own Sources: block, so the model writes a list of URLs into the answer text.
- completion_service.ask() also attaches the retrieved chunks' URLs to the structured Answer.sources field, and the renderers (the dev CLI today, the Discord bot) print that field underneath the answer text.
Net result: the answer text contains a Sources: list, and then a second Sources: list is printed right after it from Answer.sources. The two can even disagree.
This ticket removes the duplication by making the structured Answer.sources field the single source of truth and stopping the model from writing its own list. The user should see the sources exactly once — rendered from Answer.sources.
Out of scope (deliberately): deciding which of the retrieved URLs the model actually used. Answer.sources will continue to list all retrieved chunks' URLs (bounded to a small number by TOP_K). Occasionally a retrieved-but-not-very-relevant page may be listed — that is accepted for now. Refining the list down to only the pages the model actually used is a possible later effort and is not part of this ticket.
🛠 Implementation Plan
- src/completions/prompts.py: stop the model from listing sources.
  - Remove the instruction that tells the model to end with a Sources: block.
  - Replace it with an explicit negative instruction, e.g. "Do not list sources, URLs, or a 'Sources:' section. Just write the answer. The source links are attached automatically."
  - Keep the [Source: <url>] markers in the context block. That is how the model sees which page each chunk came from; we're only changing whether it echoes them back.
- src/completions/services/completion_service.py: keep Answer.sources as-is, defend the text.
  - Leave Answer.sources = _extract_sources(chunks) unchanged: all retrieved chunks' URLs, already deduped by _extract_sources. No change to confidence ("high" on this path).
  - As a safety net for the case where the model ignores the instruction and writes a Sources: block anyway, add a small pure helper that strips a trailing, line-anchored Sources: block from the model's text before it becomes Answer.text:
    - Match a header that is its own line (^Sources: with multiline matching) and keep everything before the last such line as the prose. If there's no such line, return the text unchanged.
    - This is line-anchored on purpose so the word "sources" appearing mid-sentence in normal prose does not truncate the answer.
  - Important design property: this strip only edits the display text. It must never feed back into Answer.sources (which is independently _extract_sources(chunks)). So even a wrong strip can only drop some trailing text; it can never change which sources are cited. That's what makes this safe, unlike trying to parse the model's block to decide the citation list.
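The strip described in the plan could look roughly like the sketch below. The function name strip_trailing_sources_block and the trailing-whitespace trim are assumptions made for illustration; the ticket only fixes the behaviour (line-anchored ^Sources: with multiline matching, cut at the last such line, return the text unchanged when there is none).

```python
import re

# Line-anchored header: "Sources:" must start a line (re.MULTILINE makes ^
# match after every newline, not only at the start of the string).
_SOURCES_HEADER = re.compile(r"^Sources:", re.MULTILINE)


def strip_trailing_sources_block(text: str) -> str:
    """Return the model text without a trailing, line-anchored Sources: block.

    Hypothetical helper name. Pure function: no I/O, no access to chunks,
    so it cannot influence Answer.sources.
    """
    matches = list(_SOURCES_HEADER.finditer(text))
    if not matches:
        # No header line at all: return the text unchanged.
        return text
    # Keep everything before the last header line as the prose.
    # Assumption: also trim the blank lines left in front of the header.
    return text[: matches[-1].start()].rstrip()


raw = "Restart the bot with /reset.\n\nSources:\n- https://example.com/reset\n"
print(strip_trailing_sources_block(raw))  # Restart the bot with /reset.
```

Because the helper only receives and returns a string, the caller would apply it to the model output when filling Answer.text and keep computing Answer.sources from the chunks exactly as before.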
📝 Notes
- Do not change src/domain/types.py. The Answer field shapes already support this: you are changing what fills Answer.text, not the contract.
- You are not removing the sources field from Answer. Renderers print sources from that field; you're only removing the duplicate Sources: text the model embedded in its prose.
- The abstain path (no chunks retrieved) is unchanged: it returns before the model runs.
- The prompt instruction is the primary mechanism; the strip is a fallback for non-compliant model output. Unusual formats (e.g. an inline Sources: not on its own line) may slip through the strip. That's acceptable, because the worst case is occasional cosmetic duplication, never a wrong source list.
- Refining Answer.sources to only the pages the model actually used is intentionally left for a later effort; do not attempt URL parsing/matching here.
✅ Acceptance Criteria
- The system prompt no longer instructs the model to list sources, and instructs it not to.
- Answer.sources is unchanged in shape: the retrieved chunks' URLs, deduped, no domain/types.py change.
- The strip logic lives in a pure helper (input: model text → output: text with any trailing Sources: block removed) and is unit-tested offline (no Ollama or DB).
- Tests cover: a trailing line-anchored Sources: block is stripped; the word "sources" used mid-sentence in the prose is not stripped; text with no Sources: block is returned unchanged.
- End to end, the user sees the source list exactly once (from Answer.sources), not twice.
- make test and make lint pass.
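The three required test cases might be written as plain assert-style test functions like the ones below, which need neither Ollama nor a DB. The helper name, the test names, and the example.com URL are placeholders; the helper is redefined inline only so the sketch runs on its own, whereas real tests would import it from wherever it ends up living.

```python
import re


def strip_trailing_sources_block(text: str) -> str:
    # Stand-in for the real helper (hypothetical name), following the
    # ticket's rule: cut at the last line that starts with "Sources:".
    matches = list(re.finditer(r"^Sources:", text, re.MULTILINE))
    return text[: matches[-1].start()].rstrip() if matches else text


def test_trailing_block_is_stripped():
    raw = "Use the /reset command.\n\nSources:\n- https://example.com/reset\n"
    assert strip_trailing_sources_block(raw) == "Use the /reset command."


def test_mid_sentence_word_is_kept():
    # "sources:" appears mid-line, so the line-anchored pattern must not match.
    raw = "The docs cite several sources: the FAQ and the changelog."
    assert strip_trailing_sources_block(raw) == raw


def test_text_without_block_is_unchanged():
    raw = "Use the /reset command.\nIt takes no arguments."
    assert strip_trailing_sources_block(raw) == raw
```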