[C++] Unique of StringView array incorrect with nulls #39635

Description

@jorisvandenbossche

Describe the bug, including details regarding any error messages, version, and platform.

This is not a C++ reproducer, but the following uses very preliminary (non-merged) Python bindings to illustrate the issue:

import pyarrow as pa

# StringViewBuilder comes from the preliminary, non-merged Python bindings
builder = pa.lib.StringViewBuilder()
builder.append("test")
builder.append("very long string that is not inlined")
builder.append(None)
builder.append("test")

>>> arr = builder.finish()
>>> arr
<pyarrow.lib.Array object at 0x7f9a2e1fc4c0>
[
  "test",
  "very long string that is not inlined",
  null,
  "test"
]
>>> arr.type
DataType(string_view)

Calculating the unique values of this array returns the missing value as an empty string instead of null:

>>> arr.unique()
<pyarrow.lib.Array object at 0x7f9a2e45fe20>
[
  "test",
  "very long string that is not inlined",
  ""
]

I didn't check in the code, but I assume that it's "just" missing the validity bitmap (the empty string being the value that would otherwise be masked).
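
To make that hypothesis concrete, here is a toy model in plain Python. It is not Arrow's actual kernel code: the two function names and the list-based "values plus validity" layout are assumptions made purely for illustration. It shows how deduplicating the raw slots without consulting the validity bitmap would surface the empty placeholder string, while consulting the bitmap keeps the null.

```python
# Toy model only: a column as raw value slots plus a parallel validity list.
# The null slot holds an empty-string placeholder, as guessed in the report.
values = ["test", "very long string that is not inlined", "", "test"]
validity = [True, True, False, True]


def unique_ignoring_validity(values, validity):
    """Hypothetical buggy path: deduplicate raw slots, never reading validity."""
    seen = {}
    for value in values:
        seen.setdefault(value, None)  # dict keeps first-seen order
    return list(seen)


def unique_respecting_validity(values, validity):
    """Hypothetical expected path: masked slots are treated as null (None)."""
    seen = {}
    for value, is_valid in zip(values, validity):
        key = value if is_valid else None
        seen.setdefault(key, None)
    return list(seen)


buggy = unique_ignoring_validity(values, validity)
expected = unique_respecting_validity(values, validity)

print(buggy)     # the placeholder "" leaks out as a distinct value
print(expected)  # the null is preserved as None
```

If the guess is right, the first result matches the output reported above, and the second is what `unique()` would be expected to return.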

Component(s)

C++
