Describe the bug, including details regarding any error messages, version, and platform.
I don't have a C++ reproducer, but the following uses very preliminary (unmerged) Python bindings to illustrate the issue:
>>> import pyarrow as pa
>>> builder = pa.lib.StringViewBuilder()
>>> builder.append("test")
>>> builder.append("very long string that is not inlined")
>>> builder.append(None)
>>> builder.append("test")
>>> arr = builder.finish()
>>> arr
<pyarrow.lib.Array object at 0x7f9a2e1fc4c0>
[
"test",
"very long string that is not inlined",
null,
"test"
]
>>> arr.type
DataType(string_view)
Computing the unique values of this array returns the missing value as an empty string instead of null:
>>> arr.unique()
<pyarrow.lib.Array object at 0x7f9a2e45fe20>
[
"test",
"very long string that is not inlined",
""
]
I haven't checked the code, but I assume the result is "just" missing the validity bitmap (the empty string being the value that the bitmap would otherwise mask).
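To make that guess concrete, here is a toy pure-Python model of a nullable string array as a list of values plus a validity mask. It is only a sketch of the suspected failure mode, not Arrow's actual implementation: the function names and the list-based layout are hypothetical, and I am assuming the null slot stores an empty string underneath the mask.

```python
# Toy model of a nullable string array: raw values plus a validity mask.
# This is NOT Arrow's implementation; all names here are hypothetical.

def unique_ignoring_validity(values):
    """Deduplicate the raw values without ever consulting the validity mask."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def unique_respecting_validity(values, validity):
    """Deduplicate, treating every masked slot as None (null)."""
    seen = []
    for value, is_valid in zip(values, validity):
        item = value if is_valid else None
        if item not in seen:
            seen.append(item)
    return seen


# Assumption: the null slot holds an empty string that the mask should hide.
values = ["test", "very long string that is not inlined", "", "test"]
validity = [True, True, False, True]

buggy = unique_ignoring_validity(values)
expected = unique_respecting_validity(values, validity)

print(buggy)     # ['test', 'very long string that is not inlined', '']
print(expected)  # ['test', 'very long string that is not inlined', None]
```

The first result matches the output reported above, and the second is what I would expect `unique()` to return once the validity information is carried through to the result.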
Component(s)
C++