Local OCR (Tesseract) for text-in-image search #4

Description

@exekyute

Add optional OCR so that scanned documents and text inside images become searchable.

What to do: wire a pluggable OCR step into the ingest pipeline (keepstack/ingest.py) that runs Tesseract when it is available and writes the extracted text into the FTS index. Keep it optional like the AI features, with no hard dependency.
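A minimal sketch of what such an optional step could look like, assuming the Tesseract CLI is invoked as `tesseract <image> stdout`. The function name `ocr_image` and its `find_binary` / `runner` parameters are hypothetical and not part of the existing `keepstack/ingest.py`; they are there only so the step can be swapped out or tested without Tesseract installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional, Union


def ocr_image(
    path: Union[str, Path],
    *,
    find_binary: Callable[[str], Optional[str]] = shutil.which,
    runner: Callable = subprocess.run,
) -> str:
    """Return text extracted from an image, or "" if OCR is unavailable.

    Hypothetical helper: both keyword arguments are injection points,
    not an existing keepstack API.
    """
    binary = find_binary("tesseract")
    if binary is None:
        # Tesseract is not installed: behave as a no-op.
        return ""
    try:
        # "stdout" as the output base makes Tesseract print the text
        # instead of writing a .txt file next to the image.
        result = runner(
            [binary, str(path), "stdout"],
            capture_output=True,
            text=True,
            timeout=60,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A failed or hung OCR run should not break ingestion.
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.strip()
```

The ingest pipeline could then call `ocr_image(path)` for image uploads and pass any non-empty result on to the indexer. Because every failure mode returns an empty string, the caller needs no special handling for hosts without Tesseract.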

Acceptance: uploading an image with visible text makes that text searchable, and the feature is a no-op when Tesseract is not installed.
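The acceptance criteria could be checked roughly as follows. This sketch assumes the FTS index is a SQLite FTS5 table, which the issue does not state; the table name `docs_fts`, its columns, and the helpers `index_text` and `search` are hypothetical. The OCR output is passed in as a plain string so the check runs without Tesseract.

```python
import sqlite3
from typing import List


def index_text(conn: sqlite3.Connection, doc_id: str, text: str) -> None:
    """Add OCR text to the index; empty text (OCR unavailable) is skipped."""
    if not text:
        return
    conn.execute(
        "INSERT INTO docs_fts (doc_id, body) VALUES (?, ?)",
        (doc_id, text),
    )


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Return the ids of documents whose indexed text matches the query."""
    rows = conn.execute(
        "SELECT doc_id FROM docs_fts WHERE docs_fts MATCH ?",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


# Assumed schema: one FTS5 table with an unindexed id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(doc_id UNINDEXED, body)")

# Case 1: OCR produced text, so the image becomes searchable.
index_text(conn, "scan-1", "Invoice number 2041")

# Case 2: Tesseract missing, OCR returned "", so nothing is indexed.
index_text(conn, "scan-2", "")

invoice_hits = search(conn, "invoice")
```

Here `invoice_hits` contains only `scan-1`, and `scan-2` never appears in any search, which matches the no-op requirement.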

Metadata

Labels

enhancement (New feature or request)