Package: azure-ai-agentserver-langgraph==1.0.0b17 (with azure-ai-agentserver-core==1.0.0b17)
Python: 3.11
Describe the bug
When a hosted LangGraph agent reconstructs conversation history from the stored conversation, image (and file) content blocks on prior-turn messages are silently dropped.
ResponseAPIDefaultConverter._fetch_historical_items (in azure/ai/agentserver/langgraph/models/response_api_default_converter.py) converts each stored item with convert_item_resource_to_message (in .../response_api_request_converter.py). For message items, that function extracts only input_text / output_text / text blocks and collapses the message content to a plain string. Any input_image / image / input_file block is discarded.
As a result, a multimodal agent loses uploaded images on every turn after the one in which they were sent, whenever the in-memory checkpoint is cold and history is rebuilt from the store. In production this is routine: per-replica MemorySaver, scale-to-zero, restarts, and multiple replicas all cause it. The image is correctly persisted to the conversation store; it just isn't carried back into the model's context on replay, so the model behaves as if it never saw the image (and frequently confabulates about its contents rather than reporting that it cannot see it).
This affects both transports — inline base64 image blocks and input_image with a file_id — so it is a general multimodal-history gap, not transport-specific.
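The flattening described above can be modeled without the package installed. The sketch below is a hypothetical stand-in, not the library's code: `flatten_like_history_path`, the newline join, and the `file-placeholder` ID are all assumptions made for illustration. It only mimics the reported behavior (keep text-type blocks, collapse them to a string) to show that both image transports lose their image block in the same way.

```python
from typing import Any

# Block types the report says are extracted by the historical path.
TEXT_BLOCK_TYPES = {"input_text", "output_text", "text"}


def flatten_like_history_path(item: dict[str, Any]) -> str:
    """Hypothetical model of the reported behavior, not the library's implementation."""
    content = item.get("content", [])
    if isinstance(content, str):
        return content
    texts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") in TEXT_BLOCK_TYPES
    ]
    # Joining with a newline is an assumption; the report only says the
    # result is a plain string.
    return "\n".join(texts)


# Inline base64 transport (payload shortened to a placeholder).
inline_item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": "data:image/png;base64,PLACEHOLDER"},
    ],
}

# file_id transport; "file-placeholder" is a made-up ID.
file_id_item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "file_id": "file-placeholder"},
    ],
}

for item in (inline_item, file_id_item):
    # Both transports yield the same text-only string; the image block is lost.
    print(repr(flatten_like_history_path(item)))
```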
Minimal repro
from azure.ai.agentserver.langgraph.models.response_api_request_converter import (
    convert_item_resource_to_message,
)

# Shape returned by conversations.items.list() for a user turn that included an
# inline base64 image (the file_id form behaves identically):
item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {
            "type": "input_image",
            "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk+M8AAAMBAQDJ/pLvAAAAAElFTkSuQmCC",
        },
    ],
}

message = convert_item_resource_to_message(item)
print(repr(message.content))
# -> 'What is in this image?'
# The input_image block is gone; content has been flattened to a string.
Expected behavior
History reconstruction should preserve non-text content blocks (image/file), so multimodal turns survive multi-turn conversations — e.g. message.content remains the list including the image block.
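One possible shape for a content-preserving conversion is sketched below, assuming plain dicts as input. `preserve_content_blocks` is a hypothetical helper, not part of the package. The output block shape (`{"type": "image_url", "image_url": {"url": ...}}`) is the OpenAI chat-style form that LangChain chat models commonly accept; whether the library's current-request path emits that same shape is not stated in this report, so treat it as an assumption. Blocks the sketch does not recognize, including `input_image` with a `file_id` and `input_file`, are passed through unchanged rather than dropped.

```python
from typing import Any, Union

TEXT_BLOCK_TYPES = {"input_text", "output_text", "text"}


def preserve_content_blocks(item: dict[str, Any]) -> Union[str, list[dict[str, Any]]]:
    """Hypothetical sketch: keep non-text blocks when rebuilding history."""
    content = item.get("content", [])
    if isinstance(content, str):
        return content

    blocks: list[dict[str, Any]] = []
    for block in content:
        block_type = block.get("type")
        if block_type in TEXT_BLOCK_TYPES:
            blocks.append({"type": "text", "text": block.get("text", "")})
        elif block_type == "input_image" and block.get("image_url"):
            # Assumed target shape (OpenAI chat-style image block).
            blocks.append({"type": "image_url", "image_url": {"url": block["image_url"]}})
        else:
            # Unknown or file_id-based block: pass through instead of dropping.
            blocks.append(dict(block))

    # Text-only messages can stay a plain string (an assumption, chosen to
    # match the existing behavior for messages without attachments).
    if all(b.get("type") == "text" for b in blocks):
        return "\n".join(b["text"] for b in blocks)
    return blocks


item = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": "data:image/png;base64,PLACEHOLDER"},
    ],
}

result = preserve_content_blocks(item)
print(result)
```

With this approach the content stays a list whenever any non-text block is present, so the image block survives replay; only text-only messages collapse to a string.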
Actual behavior
message.content is a plain string containing only the text; the image/file block is dropped.
Impact
Multi-turn vision is broken for hosted LangGraph agents whenever the checkpoint is cold (the common case in production). Single-turn works, because the current request's input goes through a different, content-preserving path (convert_message / convert_OpenAIItemContentList); only the historical path flattens.