@ahuber21 ahuber21 commented Jan 27, 2026

Increments the runtime API version to v1, because LeanVecTrainingData::build() changed significantly for OOD support.

Unchanged entities from v0 are aliased in v1.
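
For readers unfamiliar with this kind of versioning, here is a minimal sketch of how unchanged v0 entities can be aliased into v1 while only the changed class is redefined. Every name in it (`demo_runtime`, `MetricType`, the struct members) is a hypothetical placeholder, not the actual runtime API, and the v1 parameter order simply mirrors the one discussed in this PR.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical namespace and type names, for illustration only.
namespace demo_runtime {

namespace v0 {
// An entity that does not change between API versions.
enum class MetricType { L2, InnerProduct };

// v0-style training data: built from the data vectors alone.
struct LeanVecTrainingData {
    std::size_t leanvec_dims;
    bool uses_queries;

    static LeanVecTrainingData build(std::size_t n, const float* x,
                                     std::size_t leanvec_dims) {
        (void)n;
        (void)x;
        return LeanVecTrainingData{leanvec_dims, false};
    }
};
} // namespace v0

namespace v1 {
// Unchanged entity: v1::MetricType is the very same type as v0::MetricType.
using v0::MetricType;

// Changed entity: redefined in v1 because build() now also accepts queries.
struct LeanVecTrainingData {
    std::size_t leanvec_dims;
    bool uses_queries;

    static LeanVecTrainingData build(std::size_t n, const float* x,
                                     std::size_t n_train, const float* q,
                                     std::size_t leanvec_dims) {
        (void)n;
        (void)x;
        return LeanVecTrainingData{leanvec_dims, n_train > 0 && q != nullptr};
    }
};
} // namespace v1

} // namespace demo_runtime
```

With this layout, code written against `v0` keeps compiling unchanged, and code that opts into `v1` sees the same `MetricType` but the new `build()` signature.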

A successful runtime lib build is pending changes from the svs prebuilt binaries.

@ahuber21 ahuber21 requested a review from mihaic as a code owner January 27, 2026 10:48
virtual ~LeanVecTrainingData();

/* Build LeanVec training data (compression matrices) from the provided
* data.
Member

Should we call this "LeanVec transformation matrices" instead of training data?

Contributor Author

LeanVecTrainingData was purposefully chosen to be generic. In hindsight, I think we could have been more specific. But changing this would require an API update and therefore conflicts with your suggestion in the other comment.

size_t n,
const float* x,
size_t n_train,
const float* q,
Member

Can we keep these two new arguments at the end and default-initialize them to 0 and nullptr? In that case, the older v0 calls to build will still work without any modifications. Also, "n_train" is kind of confusing, since both data and queries are used for training. How about we explicitly say "n_queries"?
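
A minimal sketch of the suggestion above, assuming a hypothetical free function `describe_build` and result struct `BuildRequest` (neither is part of the actual API): the query arguments go last and default to 0 and nullptr, so an existing three-argument call still compiles. This only demonstrates source compatibility.

```cpp
#include <cstddef>

// Hypothetical result type, used only to make the effect observable.
struct BuildRequest {
    std::size_t n_data;
    std::size_t n_queries;
    std::size_t leanvec_dims;
    bool has_queries;
};

// Hypothetical stand-in for build(): the new query arguments are trailing
// and defaulted, so callers that pass only the first three still compile.
inline BuildRequest describe_build(std::size_t n_data, const float* data,
                                   std::size_t leanvec_dims,
                                   std::size_t n_queries = 0,
                                   const float* queries = nullptr) {
    (void)data;
    return BuildRequest{n_data, n_queries, leanvec_dims,
                        n_queries > 0 && queries != nullptr};
}
```

An old-style call such as `describe_build(n, x, dims)` and a new-style call such as `describe_build(n, x, dims, n_queries, q)` both resolve to the same function.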

Contributor Author

That's exactly the discussion I wanted to have. Your suggestion would indeed allow us to stick with v0 (except that we'd need a copy of the function and couldn't use default values, due to ABI compatibility, but that's just a detail).

In my opinion, the order (n_data, const float* data, n_query, const float* queries, size_t leanvec_dims) just makes more sense than (n_data, const float* data, size_t leanvec_dims, n_query, const float* queries). But does it justify bumping to v1?

Your preference would be to stick with v0 for as long as possible?
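
To illustrate the ABI point raised in this comment: default arguments are filled in by the compiler at each call site, and under common C++ ABIs the parameter list is part of a function's mangled symbol name. Adding defaulted parameters to an existing function therefore replaces its symbol, and binaries compiled against the old header would no longer find it. A minimal sketch of the "copy of the function" alternative follows, with hypothetical names (`abi_sketch`, `TrainingSummary`) that are not taken from the actual library.

```cpp
#include <cstddef>

// Hypothetical names; in a real shared library these functions would be
// declared in the public header and defined in the library's sources.
namespace abi_sketch {

struct TrainingSummary {
    std::size_t n_data;
    std::size_t n_queries;
    std::size_t leanvec_dims;
};

// New entry point: takes the query set explicitly, no default arguments.
inline TrainingSummary build(std::size_t n_data, const float* data,
                             std::size_t leanvec_dims,
                             std::size_t n_queries, const float* queries) {
    (void)data;
    (void)queries;
    return TrainingSummary{n_data, n_queries, leanvec_dims};
}

// Original entry point: signature untouched, so its symbol is unchanged.
// It forwards to the new overload with "no queries".
inline TrainingSummary build(std::size_t n_data, const float* data,
                             std::size_t leanvec_dims) {
    return build(n_data, data, leanvec_dims, 0, nullptr);
}

} // namespace abi_sketch
```

Existing callers keep resolving to the three-argument overload, while new callers can pass queries through the five-argument one.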
