Describe the bug, including details regarding any error messages, version, and platform.
I stumbled on this when an MQTT-to-Parquet program that I'm working on kept hitting OOM errors much sooner than I expected. I know the data needs to be retained in memory until it's written to disk, but the JVM was showing significantly more memory growth than the data accounts for. I probably wouldn't have noticed this, except that my data is largely uint8 and uint16 values from a PLC, so an extra ~24 bytes per field is (relatively) a lot in this case.
The .hprof showed:
Tested on 1.17.1 and master@f280d557509da65df7e205dce6898bce9b4a396e.
It looks like when `endField()` is called, it pushes the current field onto the `previousField` queue instead of replacing the last previous field, so as records are processed, the `previousField` queue grows without bound, in $O(\text{records} \times \text{fields})$:

    public void endField(String field, int index) {
      delegate.endField(field, index);
      fieldValueCount.pop();
      previousField.push(fields.pop());
    }

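
To make the growth pattern concrete, here is a minimal sketch that models only the two stacks involved. It is a simplified stand-in, not the real `ValidatingRecordConsumer`: the class name `PreviousFieldGrowth`, the `simulate` method, and the `replaceLast` flag are hypothetical, and the "replace last" branch only illustrates the behaviour I expected, not a proposed patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PreviousFieldGrowth {

  // Hypothetical model of the bookkeeping described above.
  // Returns the size of previousField after writing all records.
  public static int simulate(int records, int fieldsPerRecord, boolean replaceLast) {
    Deque<Integer> fields = new ArrayDeque<>();
    Deque<Integer> previousField = new ArrayDeque<>();
    for (int r = 0; r < records; r++) {
      for (int f = 0; f < fieldsPerRecord; f++) {
        fields.push(f);                  // what startField would do
        int ended = fields.pop();        // endField: take the current field...
        if (replaceLast && !previousField.isEmpty()) {
          previousField.pop();           // assumed alternative: drop the old entry first
        }
        previousField.push(ended);       // ...and push it onto previousField
      }
    }
    return previousField.size();
  }

  public static void main(String[] args) {
    System.out.println("push only:    " + simulate(1000, 5, false)); // 5000
    System.out.println("replace last: " + simulate(1000, 5, true));  // 1
  }
}
```

With push-only behaviour the stack ends up holding one boxed entry per field per record, which matches the records × fields growth seen in the heap dump.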
Component(s)
Core
The `endField()` code quoted above is at parquet-java/parquet-column/src/main/java/org/apache/parquet/io/ValidatingRecordConsumer.java, lines 112 to 116 in f280d55.