Google Research details TurboQuant, a quantization algorithm that enables massive compression of LLMs and vector search engines without sacrificing accuracy
We introduce a set of advanced, theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines.