Conversation
|
@greptile |
| using DistanceFunc = | ||
| std::function<void(const void *m, const void *q, size_t dim, float *out)>; | ||
| using BatchDistanceFunc = std::function<void( | ||
| const void **m, const void *q, size_t num, size_t dim, float *out)>; |
There was a problem hiding this comment.
size_t → int narrowing in function-type aliases
DistanceFunc declares size_t dim and BatchDistanceFunc declares size_t num, size_t dim, but the concrete implementations (l2_int8_distance_avx512_vnni, l2_int8_batch_distance_avx512_vnni) accept int dim and int n. When these function pointers are stored inside the std::function wrappers and later called with size_t arguments, the std::function machinery performs a silent narrowing conversion from size_t to int at the call site. For any dimension or batch count that exceeds INT_MAX the values will be silently truncated, producing wrong results without any diagnostic.
Either align the concrete function signatures to use size_t:
void l2_int8_distance_avx512_vnni(const void *a, const void *b, size_t dim,
float *distance);
void l2_int8_batch_distance_avx512_vnni(const void *const *vectors,
const void *query, size_t n, size_t dim,
float *distances);or change the aliases to use int to match the actual implementations:
using DistanceFunc =
std::function<void(const void *m, const void *q, int dim, float *out)>;
using BatchDistanceFunc = std::function<void(
const void **m, const void *q, int num, int dim, float *out)>;| setup_compiler_march_for_x86(TURBO_MARCH_FLAG_SSE TURBO_MARCH_FLAG_AVX2 TURBO_MARCH_FLAG_AVX512) | ||
| set_source_files_properties( | ||
| ${CMAKE_CURRENT_SOURCE_DIR}/euclidean/avx512_impl.cc | ||
| PROPERTIES | ||
| COMPILE_FLAGS "${TURBO_MARCH_FLAG_AVX512}" | ||
| ) |
There was a problem hiding this comment.
TURBO_MARCH_FLAG_AVX2 computed but never applied
setup_compiler_march_for_x86 is called with all three flag variables, and TURBO_MARCH_FLAG_AVX512 is correctly applied to avx512_impl.cc. However TURBO_MARCH_FLAG_AVX2 is never used in any set_source_files_properties call — avx2_impl.cc (and avx2_impl.h) are currently empty placeholders, but when AVX2 code is eventually added there will be no compile-time reminder to apply the flag. Consider adding a commented-out placeholder now so it is not forgotten:
# set_source_files_properties(
# ${CMAKE_CURRENT_SOURCE_DIR}/euclidean/avx2_impl.cc
# PROPERTIES
# COMPILE_FLAGS "${TURBO_MARCH_FLAG_AVX2}"
# )
Greptile Summary
This PR introduces a "turbo" acceleration layer (
zvec_turbo) that provides runtime-dispatched, architecture-optimised distance kernels for int8 SquaredEuclidean search, initially targeting AVX-512 VNNI. On VNNI-capable hosts the turbo functions are returned as a fast path inQuantizedIntegerMetric, transparently replacing the genericDistanceMatrixComputeandBaseDistanceBatchWithScoreUnquantizedimplementations.Key changes:
src/turbo/library with AVX-512 VNNI kernels (l2_int8_distance_avx512_vnni,l2_int8_batch_distance_avx512_vnni,l2_int8_query_preprocess_avx512_vnni) and empty AVX2 stubs for future work.cmake/option.cmakedrops the global-marchfor x86 inAUTO_DETECT_ARCHmode; per-file march flags are now managed inside each library's ownCMakeLists.txt.QuantizedIntegerMetricis updated to try the turbo dispatch first and fall through to the original implementation when no turbo path is available.DistanceFunc/BatchDistanceFunctype aliases inturbo.hdeclaresize_tfor dimension and count parameters, while the concrete implementations acceptint— creating a silent narrowing conversion when thestd::functionwrappers are invoked (see inline comment).TURBO_MARCH_FLAG_AVX2is computed bysetup_compiler_march_for_x86but is never applied toavx2_impl.cc, meaning future AVX2 code added to that file will silently compile without the required-march=core-avx2flag.Confidence Score: 3/5
size_t→intnarrowing mismatch in the public API and several items carried over from prior review threads remain unresolved.QuantizedIntegerMetricfollows a clean try-turbo-then-fallback pattern. However theDistanceFunc/BatchDistanceFunctype aliases usesize_twhile the implementations acceptint, introducing a silent narrowing conversion that can produce wrong results for any caller that passes a dimension or batch count assize_t. Additionally, several issues raised in prior review threads (bias correction, stray double-comment markers, missing trailing newline) have not yet been fully resolved, keeping overall confidence low.src/include/zvec/turbo/turbo.h(size_t/int mismatch in function-type aliases) andsrc/turbo/euclidean/avx512_impl.cc(ongoing items from prior threads).Important Files Changed
Sequence Diagram
sequenceDiagram participant Caller participant QIM as QuantizedIntegerMetric participant TD as turbo dispatch<br/>(turbo.cc) participant CPU as CpuFeatures participant VNNI as avx512_impl.cc participant Legacy as Legacy impl Caller->>QIM: get_query_preprocess_func() QIM->>TD: get_query_preprocess_func(kSquaredEuclidean, kInt8, kDefault) TD->>CPU: AVX512_VNNI? alt AVX-512 VNNI available CPU-->>TD: true TD-->>QIM: l2_int8_query_preprocess_avx512_vnni QIM-->>Caller: turbo preprocess func Caller->>VNNI: preprocess(query, dim)<br/>(int8 → uint8, first d bytes) else no VNNI CPU-->>TD: false TD-->>QIM: nullptr QIM-->>Caller: legacy preprocess func end Caller->>QIM: batch_distance() QIM->>TD: get_batch_distance_func(kSquaredEuclidean, kInt8, kDefault) TD->>CPU: AVX512_VNNI? alt AVX-512 VNNI available CPU-->>TD: true TD-->>QIM: l2_int8_batch_distance_avx512_vnni QIM-->>Caller: turbo batch func Caller->>VNNI: batch_distance(vectors[], query, n, dim, out[])<br/>(VNNI inner products + bias correction + L2 formula) else no VNNI CPU-->>TD: false TD-->>QIM: nullptr QIM-->>Caller: legacy batch func (BaseDistanceBatchWithScoreUnquantized) Caller->>Legacy: ComputeBatch(...) end Caller->>QIM: distance_matrix(m, n) QIM->>TD: get_distance_func(kSquaredEuclidean, kInt8, kDefault) TD->>CPU: AVX512_VNNI? alt AVX-512 VNNI available CPU-->>TD: true TD-->>QIM: l2_int8_distance_avx512_vnni QIM-->>Caller: turbo single-distance func Caller->>VNNI: distance(a, b, dim, out)<br/>(sign-trick inner product + L2 formula) else no VNNI CPU-->>TD: false TD-->>QIM: nullptr QIM-->>Caller: legacy DistanceMatrixCompute Caller->>Legacy: compute(m, n) endLast reviewed commit: 67663e0