Description
JsonParseNode.get_bytes_value() does not base64-decode Edm.Binary values. It returns the base64 string encoded to UTF-8, i.e. the raw base64 text as bytes, instead of the decoded binary.
This is asymmetric with JsonSerializationWriter.write_bytes_value(), which does base64-encode. So a bytes value written by kiota cannot be round-tripped back through kiota — the read side gives you the base64 text, not the original bytes.
Versions
microsoft-kiota-serialization-json 1.10.3 (behavior is identical on main)
Current implementation
Reader — packages/serialization/json/kiota_serialization_json/json_parse_node.py:
@staticmethod
def _get_bytes_value(value: Any) -> Optional[bytes]:
    # if the node is a string, we need to decode it
    # This ensures that the string is properly converted to bytes
    if isinstance(value, str):
        base64_string = value
    else:
        base64_string = json.dumps(value)
    if not base64_string:
        return None
    return base64_string.encode("utf-8")  # <-- never base64-decodes
The comment says "we need to decode it", but the code only UTF-8-encodes the base64 string.
Writer — packages/serialization/json/kiota_serialization_json/json_serialization_writer.py (correct, for contrast):
def write_bytes_value(self, key, value):
    if isinstance(value, bytes):
        base64_string = base64.b64encode(value).decode("utf-8")
        ...
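The asymmetry between the two snippets can be shown with the standard library alone, without installing kiota. This is a minimal sketch: the variable names are illustrative and are not part of the library.

```python
import base64

raw = b"PK\x03\x04hello"

# Writer side: bytes -> base64 text, as write_bytes_value() does.
base64_text = base64.b64encode(raw).decode("utf-8")

# Reader side today: the base64 text is only re-encoded as UTF-8 bytes.
current_result = base64_text.encode("utf-8")

# A symmetric reader would decode the base64 text back to the original bytes.
decoded_result = base64.b64decode(base64_text)

print(current_result)         # b'UEsDBGhlbGxv'
print(decoded_result == raw)  # True
```

Only the second reader-side expression inverts what the writer did.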
Reproduction
import base64
from kiota_serialization_json.json_serialization_writer import JsonSerializationWriter
from kiota_serialization_json.json_parse_node import JsonParseNode
raw = b"PK\x03\x04hello" # arbitrary binary
w = JsonSerializationWriter()
w.write_bytes_value("contentBytes", raw)
serialized = w.writer["contentBytes"] # 'UEsDBGhlbGxv' (base64 text — correct)
got = JsonParseNode(serialized).get_bytes_value()
print(repr(got)) # b'UEsDBGhlbGxv' <-- base64 TEXT, not the bytes
print(got == raw) # False
print(got == serialized.encode("utf-8")) # True
Expected: get_bytes_value() returns b"PK\x03\x04hello".
Actual: it returns b"UEsDBGhlbGxv" (the base64 string as UTF-8 bytes).
Impact
Any Edm.Binary property deserialized via this code path is wrong. A concrete real-world case: Microsoft Graph fileAttachment.contentBytes comes back as the base64 text rather than the file. Saving attachment.content_bytes to disk yields a file that is the base64 of the real file — e.g. a downloaded .xlsx fails to open because the bytes are ASCII base64 (UEsDBBQA...), not the zip (PK\x03\x04...).
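Until the reader is fixed, a caller can undo the missing decode itself. The helper below is a hypothetical caller-side workaround, not part of kiota or the Graph SDK, and it relies on a heuristic: it treats the value as base64 text only if it passes strict validation.

```python
import base64
import binascii


def recover_binary(value: bytes) -> bytes:
    """Hypothetical helper: apply the base64 decode the reader skipped."""
    try:
        # validate=True rejects input containing non-base64 characters,
        # so most genuinely binary payloads fall through unchanged.
        return base64.b64decode(value, validate=True)
    except binascii.Error:
        return value


print(recover_binary(b"UEsDBGhlbGxv"))  # b'PK\x03\x04hello'
```

The heuristic is not safe in general: a payload whose real content happens to be valid base64 text would be decoded by mistake, so a workaround like this should be removed once the library decodes correctly.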
Suggested fix
Base64-decode in _get_bytes_value, mirroring the writer:
@staticmethod
def _get_bytes_value(value: Any) -> Optional[bytes]:
    # Assumes `import base64` is present at module level.
    if not isinstance(value, str) or not value:
        return None
    return base64.b64decode(value)
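A standalone sketch of the proposed behavior, with a round-trip check, might look like the following. The function names are stand-ins rather than the library's own. Using `validate=True` and mapping malformed input to `None` are assumptions layered on top of the suggested fix. By default `base64.b64decode` silently discards characters outside the base64 alphabet.

```python
import base64
import binascii
from typing import Any, Optional


def read_bytes_value(value: Any) -> Optional[bytes]:
    """Standalone stand-in for the proposed _get_bytes_value (hypothetical name)."""
    if not isinstance(value, str) or not value:
        return None
    try:
        # validate=True raises on characters outside the base64 alphabet
        # instead of silently discarding them (the b64decode default).
        return base64.b64decode(value, validate=True)
    except binascii.Error:
        # Assumption: malformed input maps to None; raising is equally defensible.
        return None


def write_bytes_value(value: bytes) -> str:
    """Stand-in for the writer's encoding step."""
    return base64.b64encode(value).decode("utf-8")


raw = b"PK\x03\x04hello"
round_tripped = read_bytes_value(write_bytes_value(raw))
print(round_tripped == raw)  # True
```

A regression test in the package could assert the same round trip through `JsonSerializationWriter` and `JsonParseNode`, as in the reproduction above.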