JsonParseNode.get_bytes_value() does not base64-decode Edm.Binary (asymmetric with write_bytes_value) #636

@RockyMM

Description

JsonParseNode.get_bytes_value() does not base64-decode Edm.Binary values. It returns the base64 string encoded to UTF-8, i.e. the raw base64 text as bytes, instead of the decoded binary.

This is asymmetric with JsonSerializationWriter.write_bytes_value(), which does base64-encode. As a result, a bytes value written by kiota cannot be round-tripped back through kiota: the read side returns the base64 text, not the original bytes.

Versions

  • microsoft-kiota-serialization-json 1.10.3 (behavior is identical on main)

Current implementation

Reader — packages/serialization/json/kiota_serialization_json/json_parse_node.py:

@staticmethod
def _get_bytes_value(value: Any) -> Optional[bytes]:
    # if the node is a string, we need to decode it
    # This ensures that the string is properly converted to bytes
    if isinstance(value, str):
        base64_string = value
    else:
        base64_string = json.dumps(value)
    if not base64_string:
        return None
    return base64_string.encode("utf-8")   # <-- never base64-decodes

The comment says "we need to decode it", but the code only UTF-8-encodes the base64 string.

Writer — packages/serialization/json/kiota_serialization_json/json_serialization_writer.py (correct, for contrast):

def write_bytes_value(self, key, value):
    if isinstance(value, bytes):
        base64_string = base64.b64encode(value).decode("utf-8")
        ...

Reproduction

import base64
from kiota_serialization_json.json_serialization_writer import JsonSerializationWriter
from kiota_serialization_json.json_parse_node import JsonParseNode

raw = b"PK\x03\x04hello"  # arbitrary binary

w = JsonSerializationWriter()
w.write_bytes_value("contentBytes", raw)
serialized = w.writer["contentBytes"]      # 'UEsDBGhlbGxv'  (base64 text — correct)

got = JsonParseNode(serialized).get_bytes_value()
print(repr(got))                           # b'UEsDBGhlbGxv'  <-- base64 TEXT, not the bytes
print(got == raw)                          # False
print(got == serialized.encode("utf-8"))   # True
print(base64.b64decode(serialized) == raw) # True   (what the reader should return)

Expected: get_bytes_value() returns b"PK\x03\x04hello".
Actual: it returns b"UEsDBGhlbGxv" (the base64 string as UTF-8 bytes).

Impact

Any Edm.Binary property deserialized via this code path is wrong. A concrete real-world case: Microsoft Graph fileAttachment.contentBytes comes back as the base64 text rather than the file. Saving attachment.content_bytes to disk yields a file that is the base64 of the real file — e.g. a downloaded .xlsx fails to open because the bytes are ASCII base64 (UEsDBBQA...), not the zip (PK\x03\x04...).
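
Until the reader is fixed, a caller can decode the value by hand. The following is a minimal sketch, assuming an affected library version is in use and that the returned value holds the base64 text as UTF-8 bytes, as described above. `recover_binary` is a hypothetical helper name, not part of kiota. It would need to be removed once the reader decodes correctly, otherwise the data would be decoded twice.

```python
import base64
import binascii


def recover_binary(content_bytes: bytes) -> bytes:
    """Decode base64 text that the reader returned as UTF-8 bytes.

    Hypothetical workaround helper, not part of kiota.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(content_bytes, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is not valid base64; it may already be decoded") from exc


# What the affected reader hands back for a payload starting with the zip magic bytes
returned_by_reader = b"UEsDBGhlbGxv"
fixed = recover_binary(returned_by_reader)
print(fixed)  # b'PK\x03\x04hello'
```

The strict validation is a design choice for this sketch: real binary content almost always contains bytes outside the base64 alphabet, so an already-decoded value is likely to raise instead of being silently corrupted.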

Suggested fix

Base64-decode in _get_bytes_value, mirroring the writer:

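import base64  # at the top of json_parse_node.py, if not already imported

@staticmethod
def _get_bytes_value(value: Any) -> Optional[bytes]:
    # Non-string or empty input carries no binary payload
    if not isinstance(value, str) or not value:
        return None
    return base64.b64decode(value)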
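
The suggested fix differs from the current code in one respect beyond decoding: non-string input yields None instead of being passed through json.dumps. The sketch below is a stricter standalone variant, written as free functions so it runs without kiota installed. The names `get_bytes_value_sketch` and `write_bytes_value_sketch` and the use of `validate=True` are assumptions for illustration, not the maintainers' decision.

```python
import base64
from typing import Any, Optional


def get_bytes_value_sketch(value: Any) -> Optional[bytes]:
    """Standalone stand-in for the proposed reader logic (hypothetical name)."""
    if not isinstance(value, str) or not value:
        return None
    # validate=True raises binascii.Error on non-base64 characters
    # instead of silently discarding them
    return base64.b64decode(value, validate=True)


def write_bytes_value_sketch(value: bytes) -> str:
    """Mirrors the writer's encoding step shown in the issue (hypothetical name)."""
    return base64.b64encode(value).decode("utf-8")


# Round-trip check over a few payloads, including every possible byte value
for payload in (b"PK\x03\x04hello", b"\x00", bytes(range(256))):
    assert get_bytes_value_sketch(write_bytes_value_sketch(payload)) == payload
```

One edge case is worth noting: an empty payload encodes to an empty string, which this sketch (like the suggested fix) maps to None rather than b"".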
