Further speeding up the quantization process #67

@SyphonArch

Description

I previously contributed a pull request that reduced the runtime of the main clustering algorithm from over two hours to just six minutes for the Llama 2 7B model (#60). In the 'Further Suggestions' section of that PR, I mentioned potential optimizations that exploit the 1D nature of the task.

I'm excited to share that I've developed a Python package, flash1dkmeans, which implements a faster 1D K-means algorithm. The package is now used in the Any-Precision LLM project, a variable bit-rate quantization scheme that uses SqueezeLLM as its seed model. With this new implementation, we've reduced the execution time for SqueezeLLM to 38 seconds on an i9-13900K machine, roughly another tenfold speedup.

If you're interested in integrating this speed enhancement, the code in Any-Precision LLM, where we use the package to create the seed model, can serve as an example. For maximum performance gains, consider accelerating the caller function with @numba.njit(parallel=True). However, even a standard multiprocessing pool should yield significant improvements (see the sketch below).
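To illustrate the multiprocessing route, here is a minimal sketch of parallelizing the per-channel weighted 1D K-means that SqueezeLLM performs. The `cluster_row` helper and the array names are mine for illustration, not taken from either codebase; only the sklearn and multiprocessing calls are standard.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.cluster import KMeans


def cluster_row(args):
    """Weighted 1D K-means for one weight row (illustrative helper, not from SqueezeLLM)."""
    row, sensitivity, n_clusters = args
    km = KMeans(n_clusters=n_clusters, n_init=1).fit(
        row.reshape(-1, 1),          # sklearn expects 2D input, so reshape the 1D row
        sample_weight=sensitivity,   # per-weight sensitivity as sample weights
    )
    return km.cluster_centers_.flatten(), km.labels_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(64, 4096))        # stand-in for one layer's weight rows
    sensitivities = rng.random(size=(64, 4096))  # stand-in for the sensitivity values
    n_clusters = 8                               # e.g. 3-bit quantization

    tasks = [(w, s, n_clusters) for w, s in zip(weights, sensitivities)]
    with Pool() as pool:
        results = pool.map(cluster_row, tasks)   # one row per worker task
```

Replacing the KMeans call inside such a helper with the flash1dkmeans equivalent is where the additional speedup would come from.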

This package can serve as an almost drop-in replacement for sklearn's K-means if you're looking to speed up SqueezeLLM further (a rough sketch of the swap is below). Of course, sticking with sklearn for better transparency is perfectly fine too. I wanted to share these findings, as your work helped make ours possible 👍.
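For the drop-in idea, here is a rough sketch of what the swap could look like. The flash1dkmeans import, function name, and keyword arguments below are placeholders I'm using for illustration; please check the package's README or the Any-Precision LLM code for the actual entry point and signature.

```python
import numpy as np
# NOTE: function name and keyword arguments are assumptions for illustration only;
# consult the flash1dkmeans README for the real API.
from flash1dkmeans import kmeans_1d  # hypothetical import


def cluster_row_flash(row, sensitivity, n_clusters):
    """Same weighted 1D clustering as the sklearn helper above, sketched with flash1dkmeans."""
    centroids, labels = kmeans_1d(
        row,                          # data is already 1D, so no reshape is needed
        n_clusters,
        sample_weights=sensitivity,   # assumed keyword for per-sample weights
    )
    return centroids, labels
```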
