Lz4CompressionCodec trusts the uncompressed-length prefix when decompressing #1206

Description

@Arawoof06

Describe the bug

Lz4CompressionCodec.doDecompress reads the 8-byte uncompressed-length prefix from the compressed buffer and then sets the output buffer's writerIndex to that value, without checking it against the number of bytes actually produced by the LZ4 stream:

long decompressedLength = readUncompressedLength(compressedBuffer);
...
byte[] outBytes = out.toByteArray();
ArrowBuf decompressedBuffer = allocator.buffer(outBytes.length);
decompressedBuffer.setBytes(0, outBytes);
decompressedBuffer.writerIndex(decompressedLength);

The output buffer is allocated with capacity outBytes.length (the real decompressed size), but writerIndex is set to the prefix value, which comes straight from the untrusted input. An input whose prefix claims a length larger than the real output therefore yields an ArrowBuf whose writerIndex exceeds its capacity, so downstream consumers that trust writerIndex read off-heap memory past the allocation. The ZSTD codec already guards this exact case by comparing the claimed length against the actual decompressed size and throwing on a mismatch; the LZ4 codec does not.
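To make the missing check concrete, here is a minimal, JDK-only sketch of the kind of guard described above. It is not the Arrow implementation: the class LengthPrefixGuard, both method names, and the exception message are hypothetical, and the 8-byte prefix is assumed to be little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow: illustrates validating a length prefix.
final class LengthPrefixGuard {
    static final int PREFIX_BYTES = 8;

    /** Reads the claimed uncompressed length from the first 8 bytes (assumed little-endian). */
    static long readClaimedLength(byte[] frame) {
        if (frame.length < PREFIX_BYTES) {
            throw new IllegalArgumentException("frame is shorter than the 8-byte length prefix");
        }
        return ByteBuffer.wrap(frame, 0, PREFIX_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong();
    }

    /** Throws unless the claimed length equals the number of bytes actually decompressed. */
    static void checkDecompressedLength(long claimedLength, long actualLength) {
        if (claimedLength != actualLength) {
            throw new IllegalStateException(
                    "claimed uncompressed length " + claimedLength
                            + " does not match actual decompressed length " + actualLength);
        }
    }
}
```

In doDecompress, a check of this shape would sit between out.toByteArray() and the writerIndex call, comparing decompressedLength with outBytes.length before the buffer is handed to the caller.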

Reproduction

Compress a small, highly compressible buffer, overwrite the 8-byte prefix with a large value, then decompress. For example, with a 512-byte input and the prefix overwritten with 1000000, the returned buffer has capacity() == 512 but writerIndex() == 1000000.
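The tampering step can be sketched without Arrow on the classpath. The following is a stand-in, not the actual reproduction: it uses java.util.zip (DEFLATE) instead of LZ4 because the JDK ships no LZ4 codec, assumes a little-endian 8-byte prefix, and every name in it (TamperedPrefixDemo and its methods) is hypothetical. It only shows how the claimed length and the real decompressed length diverge once the prefix is overwritten.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in: DEFLATE replaces LZ4 so the sketch runs on the JDK alone.
final class TamperedPrefixDemo {
    /** Builds a frame: 8-byte little-endian uncompressed length, then the compressed payload. */
    static byte[] compressWithPrefix(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            payload.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return ByteBuffer.allocate(8 + payload.size())
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(input.length)
                .put(payload.toByteArray())
                .array();
    }

    /** Overwrites the length prefix in place, as an attacker-controlled input could. */
    static void tamperPrefix(byte[] frame, long fakeLength) {
        ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).putLong(0, fakeLength);
    }

    /** Returns the length the prefix claims. */
    static long claimedLength(byte[] frame) {
        return ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    /** Decompresses the payload and returns the number of bytes really produced. */
    static int actualDecompressedLength(byte[] frame) {
        Inflater inflater = new Inflater();
        inflater.setInput(frame, 8, frame.length - 8);
        byte[] chunk = new byte[256];
        int total = 0;
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid payload");
                }
                total += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("invalid payload", e);
        } finally {
            inflater.end();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] frame = compressWithPrefix(new byte[512]); // 512 zero bytes compress well
        tamperPrefix(frame, 1_000_000L);
        // prints "claimed=1000000 actual=512"
        System.out.println("claimed=" + claimedLength(frame)
                + " actual=" + actualDecompressedLength(frame));
    }
}
```

Against the real codec, the same three steps apply: compress, overwrite the first 8 bytes of the compressed ArrowBuf, and decompress.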

Component(s)

Java
