[azure-ai-agentserver-langgraph] Historical conversation items drop image/file content blocks when rebuilding from store #47391

@niallkeys

Package: azure-ai-agentserver-langgraph==1.0.0b17 (with azure-ai-agentserver-core==1.0.0b17)
Python: 3.11

Describe the bug

When a hosted LangGraph agent reconstructs conversation history from the stored conversation, image (and file) content blocks on prior-turn messages are silently dropped.

ResponseAPIDefaultConverter._fetch_historical_items (in azure/ai/agentserver/langgraph/models/response_api_default_converter.py) converts each stored item with convert_item_resource_to_message (in .../response_api_request_converter.py). For message items, that function extracts only input_text / output_text / text blocks and collapses the message content to a plain string. Any input_image / image / input_file block is discarded.
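The flattening described above can be approximated with a small self-contained sketch. This is not the package's source: `flatten_text_only` is a hypothetical stand-in, and joining multiple text blocks with a newline is an assumption made only for illustration.

```python
from typing import Any

# Block types that the historical path reportedly keeps.
TEXT_BLOCK_TYPES = {"input_text", "output_text", "text"}


def flatten_text_only(item: dict[str, Any]) -> str:
    """Hypothetical stand-in: keep text blocks, drop everything else."""
    content = item.get("content", [])
    if isinstance(content, str):
        return content
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") in TEXT_BLOCK_TYPES
    ]
    # Assumption: the separator is illustrative, not taken from the package.
    return "\n".join(parts)


item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": "data:image/png;base64,AAAA"},
    ],
}
flattened = flatten_text_only(item)
```

Any block whose type falls outside the text set never reaches the result, which matches the symptom reported here.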

As a result, a multimodal agent loses uploaded images on every turn after the one in which they were sent, whenever the in-memory checkpoint is cold and history is rebuilt from the store. In production this is routine: per-replica MemorySaver, scale-to-zero, restarts, multiple replicas. The image is correctly persisted to the conversation store; it just isn't carried back into the model's context on replay, so the model behaves as if it never saw the image (and frequently confabulates about its contents rather than reporting that it cannot see it).

This affects both transports — inline base64 image blocks and input_image with a file_id — so it is a general multimodal-history gap, not transport-specific.

Minimal repro

from azure.ai.agentserver.langgraph.models.response_api_request_converter import (
    convert_item_resource_to_message,
)

# Shape returned by conversations.items.list() for a user turn that included an
# inline base64 image (the file_id form behaves identically):
item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {
            "type": "input_image",
            "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk+M8AAAMBAQDJ/pLvAAAAAElFTkSuQmCC",
        },
    ],
}

message = convert_item_resource_to_message(item)
print(repr(message.content))
# -> 'What is in this image?'
# The input_image block is gone; content has been flattened to a string.

Expected behavior

History reconstruction should preserve non-text content blocks (image/file), so multimodal turns survive multi-turn conversations — e.g. message.content remains the list including the image block.
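One possible shape for a content-preserving conversion is sketched below, under stated assumptions: `preserve_content_blocks` is a hypothetical helper, not part of the package, and the target block format (text normalized to `{"type": "text", ...}`, all other blocks passed through unchanged) is an illustrative choice. A real fix would presumably map blocks the same way the current-request path does.

```python
from typing import Any

TEXT_BLOCK_TYPES = {"input_text", "output_text", "text"}


def preserve_content_blocks(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical sketch: return a content list that keeps non-text blocks."""
    content = item.get("content", [])
    if isinstance(content, str):
        # A plain-string message becomes a single text block.
        return [{"type": "text", "text": content}]
    blocks: list[dict[str, Any]] = []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") in TEXT_BLOCK_TYPES:
            blocks.append({"type": "text", "text": block.get("text", "")})
        else:
            # Image/file blocks (inline data URL or file_id) are copied as-is.
            blocks.append(dict(block))
    return blocks


history_item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "file_id": "file-example"},  # placeholder id
    ],
}
preserved = preserve_content_blocks(history_item)
```

With this approach the message content stays a list, so the image block remains available to the model on replay regardless of which transport carried it.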

Actual behavior

message.content is a plain string containing only the text; the image/file block is dropped.

Impact

Multi-turn vision is broken for hosted LangGraph agents whenever the checkpoint is cold (the common case in production). Single-turn works, because the current request's input goes through a different, content-preserving path (convert_message / convert_OpenAIItemContentList); only the historical path flattens.

Labels: customer-reported, needs-triage, question
