Describe the enhancement requested
Implement a kernel that replicates the semantics of NumPy’s searchsorted function (without the sorter argument).
The kernel should support all primitive types as well as run-end encoded arrays.
Example API
>>> sorted_array = pa.array()
>>> to_search = pa.array()
>>> pc.search_sorted(sorted_array, to_search, side='left')
<pyarrow.lib.UInt64Array>
[
0,
1,
3,
5
]
>>> pc.search_sorted(sorted_array, to_search, side='right')
<pyarrow.lib.UInt64Array>
[
0,
3,
3,
5
]
Explanation
50 < all → index 0
200 equals first occurrence → index 1 (side='left')
200 for side='right' → index 3
250 between 200 and 300 → index 3
400 > all values → index 5
Null Handling
Nulls in the first (sorted) array
If nulls are clustered first:
sorted_array = pa.array([null, 200, 300, 300])
to_search = pa.array()
Expected:
side='left' → [1, 1, 2, 4]
side='right' → [1, 2, 2, 4]
If nulls are clustered last:
sorted_array = pa.array([200, 300, 300, null, null])
Expected:
side='left' → [0, 0, 1, 3]
side='right' → [0, 1, 1, 3]
Nulls in the second (to_search) array
Two options:
- Emit nulls for null search keys.
- Match nulls within the null portion of the sorted array.
Example for (2):
>>> sorted_array = pa.array([null, null, 200]) # nulls first
>>> to_search = pa.array([null, 100, 300])
>>> pc.search_sorted(sorted_array, to_search, side='left')
<pyarrow.lib.UInt64Array object at 0x7dc6c1bcefe0>
[
0,
2,
3
]
>>> pc.search_sorted(sorted_array, to_search, side='right')
<pyarrow.lib.UInt64Array object at 0x7dc6c1bcefe0>
[
1,
2,
3
]
>>> sorted_array = pa.array([200, null, null]) # nulls last
>>> pc.search_sorted(sorted_array, to_search, side='left')
<pyarrow.lib.UInt64Array object at 0x7dc6c1bcefe0>
[
1,
0,
1
]
>>> pc.search_sorted(sorted_array, to_search, side='left')
<pyarrow.lib.UInt64Array object at 0x7dc6c1bcefe0>
[
3,
0,
1
]
Requirements
- Implement for all primitive types (
int*, float*, boolean, etc.)
- Support run-end encoded arrays
- Handle both
side='left' and side='right'
- Define consistent behavior for null placement
- Return a
UInt64Array of insertion indices
References
Component(s)
Python, C++
Describe the enhancement requested
Implement a kernel that replicates the semantics of NumPy’s
searchsortedfunction (without thesorterargument).The kernel should support all primitive types as well as run-end encoded arrays.
Example API
Explanation
50< all → index0200equals first occurrence → index1(side='left')200forside='right'→ index3250between200and300→ index3400> all values → index5Null Handling
Nulls in the first (sorted) array
If nulls are clustered first:
Expected:
side='left'→[1, 1, 2, 4]side='right'→[1, 2, 2, 4]If nulls are clustered last:
Expected:
side='left'→[0, 0, 1, 3]side='right'→[0, 1, 1, 3]Nulls in the second (to_search) array
Two options:
Example for (2):
Requirements
int*,float*,boolean, etc.)side='left'andside='right'UInt64Arrayof insertion indicesReferences
searchsorteddocumentationComponent(s)
Python, C++