GPU inference unreachable: [gpu] extra wheel collision + model() creates session without explicit providers #3

@titusz

Summary

GPU inference is currently unreachable in iscc-sci for two independent reasons, and both need fixing before the [gpu] extra can work:

  1. Packaging: the gpu extra adds onnxruntime-gpu on top of the unconditional base dependency onnxruntime (pyproject: base dep + gpu = ["onnxruntime-gpu"]). Both wheels ship the same onnxruntime package directory and collide in site-packages; in practice the CPU wheel wins and CUDA is silently unavailable. This is the same flaw, with the same fix options, as iscc-sct#23 ("[gpu] extra is a no-op: base onnxruntime dependency shadows onnxruntime-gpu (silent CPU fallback)").

  2. Session creation never selects CUDA: model() in iscc_sci/code_semantic_image.py creates the session without a providers argument:

    _model = rt.InferenceSession(model_path)

    Verified with onnxruntime-gpu 1.26.0 correctly installed on a CUDA-capable machine (RTX 3090 Ti, CUDA 12.9, cuDNN 9.17): the call succeeds, but the resulting session reports providers=['CPUExecutionProvider'], so CUDA is simply not selected. Even with the packaging fixed, iscc-sci would still run on CPU.
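The packaging collision from point 1 can be checked in an affected environment with a small diagnostic. This is a minimal sketch, not part of iscc-sci: the helper names `installed_ort_wheels` and `has_wheel_collision` are hypothetical, and it only assumes that the two distributions are registered under the names onnxruntime and onnxruntime-gpu.

```python
from importlib import metadata

# Distribution names of the two wheels that share the onnxruntime package directory.
ORT_WHEELS = {"onnxruntime", "onnxruntime-gpu"}


def installed_ort_wheels(names=None):
    """Return the onnxruntime distributions found in `names`.

    If `names` is None, the distributions of the current environment are used.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    # Normalize case and underscores so "onnxruntime_gpu" matches "onnxruntime-gpu".
    normalized = {str(name).lower().replace("_", "-") for name in names if name}
    return sorted(ORT_WHEELS & normalized)


def has_wheel_collision(names=None):
    """True when both the CPU and the GPU wheel are installed side by side."""
    return len(installed_ort_wheels(names)) == 2


if __name__ == "__main__":
    print("onnxruntime wheels installed:", installed_ort_wheels())
    print("collision:", has_wheel_collision())
```

If both wheels are reported, comparing the result with `rt.get_available_providers()` shows whether CUDA actually survived the install.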

Suggested fix

  • Mirror iscc-sct's provider selection in model():

    available = rt.get_available_providers()
    providers = ["CPUExecutionProvider"]
    if "CUDAExecutionProvider" in available:
        providers.insert(0, "CUDAExecutionProvider")
    _model = rt.InferenceSession(model_path, providers=providers)

    (Optionally set SessionOptions.graph_optimization_level = ORT_ENABLE_ALL for parity with iscc-sct.)

  • Apply the same packaging fix as decided for iscc-sct#23 ("[gpu] extra is a no-op: base onnxruntime dependency shadows onnxruntime-gpu (silent CPU fallback)"): either exclusive [cpu]/[gpu] extras, or drop the [gpu] extra and document the wheel-swap workaround.
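For the first bullet, the provider selection and the optional optimization level could be combined roughly as follows. This is a sketch under assumptions rather than the actual iscc-sct code: `select_providers` and `create_session` are hypothetical names, and the onnxruntime import is deferred so that the selection logic can be tested without the library installed.

```python
def select_providers(available):
    """Prefer CUDA when the installed build offers it; always keep CPU as fallback."""
    providers = ["CPUExecutionProvider"]
    if "CUDAExecutionProvider" in available:
        providers.insert(0, "CUDAExecutionProvider")
    return providers


def create_session(model_path):
    """Create an InferenceSession with explicit providers and full graph optimization."""
    # Imported lazily so select_providers() stays usable without onnxruntime.
    import onnxruntime as rt

    options = rt.SessionOptions()
    options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL

    providers = select_providers(rt.get_available_providers())
    return rt.InferenceSession(model_path, sess_options=options, providers=providers)
```

After creating a session, `session.get_providers()` lists the providers it actually uses, which makes a silent CPU fallback easy to detect.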

Expected impact

Not benchmarked for iscc-sci specifically, but the analogous fix in iscc-sct (transformer ONNX model, CUDA EP, RTX 3090 Ti) cut embedding time by a factor of 16.8 with bit-identical output, which is indicative of what to expect for the vision transformer used here.

Environment

Windows 10, Python 3.13, iscc-sci 0.2.0 (also verified on current main), onnxruntime/onnxruntime-gpu 1.26.0.
