Memory leak in previousField of ValidatingRecordConsumer #3621

@njlaw

Describe the bug, including details regarding any error messages, version, and platform.

I stumbled on this when an MQTT to parquet program that I'm working on kept getting OOM errors much sooner than I expected. I know the data needs to be retained in memory until it's written to disk, but the JVM was showing significantly more memory growth than the data. I probably wouldn't have noticed this except my data itself is largely uint8 and uint16 data from a PLC, so an extra ~24 bytes per field is a lot (relatively) in this case.

The .hprof showed:

[Image: screenshot of the .hprof heap dump]

Tested on 1.17.1 and master@f280d557509da65df7e205dce6898bce9b4a396e.

It looks like when endField() is called, it pushes the current field onto the previousField queue instead of replacing the last previous field, so as records are processed, the previousField queue grows without bound, on the order of $O(\text{records} \times \text{fields})$.

public void endField(String field, int index) {
    delegate.endField(field, index);
    fieldValueCount.pop();
    previousField.push(fields.pop());
}
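
To illustrate the growth pattern described above, here is a minimal, self-contained sketch. It is not the actual ValidatingRecordConsumer code: the `Tracker` class, its `replaceTop` flag, and the `startMessage()`/`endMessage()` bookkeeping (one push at the start of a record, one pop at the end) are assumptions made for the demo. It compares the push-only `endField()` with a variant that replaces the top entry, which is one possible fix and not necessarily the one the project would adopt.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PreviousFieldLeakSketch {

    // Hypothetical, stripped-down model of the field bookkeeping; not the real class.
    static class Tracker {
        final Deque<Integer> fields = new ArrayDeque<>();
        final Deque<Integer> previousField = new ArrayDeque<>();
        final boolean replaceTop;

        Tracker(boolean replaceTop) {
            this.replaceTop = replaceTop;
        }

        // Assumption: each record pushes one sentinel entry that endMessage() removes.
        void startMessage() {
            previousField.push(-1);
        }

        void startField(int index) {
            fields.push(index);
        }

        void endField() {
            if (replaceTop) {
                // Candidate fix: drop the old "previous field" before recording the new one.
                previousField.pop();
            }
            // Push-only behaviour from the issue when replaceTop is false.
            previousField.push(fields.pop());
        }

        void endMessage() {
            previousField.pop();
        }
    }

    // Number of entries left in previousField after writing all records.
    static int residualDepth(boolean replaceTop, int records, int fieldsPerRecord) {
        Tracker t = new Tracker(replaceTop);
        for (int r = 0; r < records; r++) {
            t.startMessage();
            for (int f = 0; f < fieldsPerRecord; f++) {
                t.startField(f);
                t.endField();
            }
            t.endMessage();
        }
        return t.previousField.size();
    }

    public static void main(String[] args) {
        System.out.println("push only:   " + residualDepth(false, 1000, 5)); // prints 5000
        System.out.println("replace top: " + residualDepth(true, 1000, 5));  // prints 0
    }
}
```

Under these assumptions the push-only version retains one entry per field per record, while the replace-top version returns to an empty stack after every record.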

Component(s)

Core
