Description
Current best practice is still to split text for embeddings into chunks of roughly 500 tokens, and we already split on sentence boundaries, which has worked well. What we haven't done is add overlap between chunks: 15–20% overlap between adjacent chunks is generally recommended so that context carries across chunk boundaries. This issue adds that.
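A minimal sketch of the intended behavior, assuming sentences are the split unit and the token count is approximated by whitespace word count (a real implementation would use the embedding model's tokenizer; all names here are hypothetical, not the project's actual API):

```python
import re

def chunk_sentences(text, max_tokens=500, overlap_ratio=0.15):
    """Split text into sentence-aligned chunks of roughly max_tokens,
    carrying ~overlap_ratio of each chunk's tokens into the next chunk."""
    # Naive sentence split on terminal punctuation (assumption; swap in
    # a proper sentence segmenter in practice).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    tokens = lambda s: len(s.split())  # word count as a token-count proxy

    chunks, current, current_len = [], [], 0
    for sent in sentences:
        if current and current_len + tokens(sent) > max_tokens:
            chunks.append(' '.join(current))
            # Carry trailing sentences worth ~overlap_ratio of the
            # budget into the next chunk to preserve context.
            overlap, carried = [], 0
            for prev in reversed(current):
                if carried + tokens(prev) > max_tokens * overlap_ratio:
                    break
                overlap.insert(0, prev)
                carried += tokens(prev)
            current, current_len = overlap, carried
        current.append(sent)
        current_len += tokens(sent)
    if current:
        chunks.append(' '.join(current))
    return chunks
```

Because the overlap is built from whole trailing sentences, each chunk stays sentence-aligned; the effective overlap lands at or just under the 15–20% target rather than exactly on it.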
Acceptance Criteria
- Only affects future embeddings
- Existing embeddings continue to work
Priority
None
Additional Context
No response