Accelerating GPU indexes in Faiss with NVIDIA cuVS

The Faiss library

Faiss is an open source library, developed by Meta FAIR, for efficient vector search and clustering of dense vectors. Faiss pioneered vector search on GPUs, as well as the ability to seamlessly switch between GPUs and CPUs. It has made a lasting impact in both research and industry, being used as an integrated library in several databases (e.g., Milvus and OpenSearch), machine learning libraries, data processing libraries, and AI workflows. Faiss is also used heavily by researchers and data scientists as a standalone library, often paired with PyTorch.

Collaboration with NVIDIA

Three years ago, Meta and NVIDIA worked together to enhance the capabilities of vector search technology and to accelerate vector search on GPUs. Previously, in 2016, Meta had incorporated high-performing vector search algorithms made for NVIDIA GPUs: GpuIndexFlat, GpuIndexIVFFlat, and GpuIndexIVFPQ. After the partnership began, NVIDIA rapidly contributed GpuIndexCagra, a state-of-the-art graph-based index designed specifically for GPUs. In its latest release, Faiss 1.10.0 officially includes these algorithms from the NVIDIA cuVS library.

Faiss 1.10.0 also includes a new conda package that unlocks the ability to choose between the classic Faiss GPU implementations and the newer NVIDIA cuVS algorithms, making it easy for users to switch between GPU and CPU.
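As a rough illustration of that choice, the sketch below builds a GpuIndexIVFFlat and copies it back to the CPU. It assumes a GPU build of Faiss is installed and that the index config exposes a `use_cuvs` field for selecting the cuVS implementation; the helper names, the `nlist` and `nprobe` values, and the random data are hypothetical and only for illustration.

```python
try:
    import faiss  # needs a GPU build of Faiss; cuVS needs the cuVS-enabled package
except ImportError:
    faiss = None


def build_gpu_ivf_flat(xb, nlist=1024, use_cuvs=True):
    """Train and fill a GpuIndexIVFFlat on GPU 0 from a float32 array xb."""
    d = xb.shape[1]
    res = faiss.StandardGpuResources()
    config = faiss.GpuIndexIVFFlatConfig()
    # Assumption: `use_cuvs` switches between the classic and cuVS code paths.
    # Setting it to True on a build without cuVS is expected to raise an error.
    config.use_cuvs = use_cuvs
    index = faiss.GpuIndexIVFFlat(res, d, nlist, faiss.METRIC_L2, config)
    index.train(xb)
    index.add(xb)
    # Return the resources object too so it outlives the index.
    return res, index


def to_cpu(gpu_index):
    """Copy a GPU index into its CPU counterpart."""
    return faiss.index_gpu_to_cpu(gpu_index)


def demo():
    import numpy as np

    rng = np.random.default_rng(0)
    xb = rng.random((100_000, 96), dtype=np.float32)  # made-up database vectors
    xq = rng.random((5, 96), dtype=np.float32)        # made-up queries
    res, gpu_index = build_gpu_ivf_flat(xb, nlist=256, use_cuvs=True)
    gpu_index.nprobe = 16
    distances, ids = gpu_index.search(xq, 10)
    cpu_index = to_cpu(gpu_index)
    print(ids.shape, cpu_index.ntotal)


if __name__ == "__main__":
    # Only run when a GPU build of Faiss and at least one GPU are present.
    if faiss is not None and hasattr(faiss, "StandardGpuResources"):
        if faiss.get_num_gpus() > 0:
            demo()
```

Passing `use_cuvs=False` to the same helper would request the classic implementation, which keeps the rest of the calling code unchanged.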

Benchmarking

The following benchmarks were conducted using the cuVS-bench tool. 

We measured index build time and search latency on two embedding datasets, one of 100M vectors with 96 dimensions and one of 5M vectors with 1536 dimensions. Tests were conducted on an NVIDIA H100 GPU and compared to an Intel Xeon Platinum 8480CL system. Results are reported in the tables below at 95% recall along the Pareto frontiers for k=10 nearest neighbors.
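To make the reporting criterion concrete, here is a minimal sketch of how recall@k and a recall/latency Pareto frontier can be computed from a parameter sweep. It is not the cuVS-bench implementation; the function names and the sample points are hypothetical.

```python
def recall_at_k(found, truth, k=10):
    """Fraction of the true k nearest neighbors present in the returned top-k."""
    hits = sum(len(set(f[:k]) & set(t[:k])) for f, t in zip(found, truth))
    return hits / (k * len(truth))


def pareto_frontier(points):
    """Keep (recall, latency) points that no other point beats on both axes."""
    frontier = []
    best_recall = -1.0
    # Walk from fastest to slowest; a slower point survives only if it
    # reaches a higher recall than every faster point seen so far.
    for recall, latency in sorted(points, key=lambda p: (p[1], -p[0])):
        if recall > best_recall:
            frontier.append((recall, latency))
            best_recall = recall
    return frontier


def fastest_at_recall(points, target=0.95):
    """Lowest-latency frontier point whose recall meets the target, or None."""
    ok = [p for p in pareto_frontier(points) if p[0] >= target]
    return min(ok, key=lambda p: p[1]) if ok else None


# Hypothetical (recall, latency in ms) results from sweeping search parameters.
sweep = [(0.90, 0.10), (0.96, 0.25), (0.95, 0.30), (0.99, 0.60), (0.97, 0.70)]
print(pareto_frontier(sweep))
print(fastest_at_recall(sweep, target=0.95))
```

Reporting a single number per index this way compares each implementation at its best configuration that still meets the recall target, rather than at arbitrary default settings.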

Build time (95% recall@10)

Index                        Embeddings 100M x 96 (seconds)   Embeddings 5M x 1536 (seconds)
Faiss Classic   Faiss cuVS   Faiss Classic   Faiss cuVS       Faiss Classic   Faiss cuVS
IVF Flat        IVF Flat     101.4           37.9 (2.7x)      24.4            15.2 (1.6x)
IVF PQ          IVF PQ       168.2           72.7 (2.3x)      42.0            9.0 (4.7x)
HNSW (CPU)      CAGRA        3322.1          518.5 (6.4x)     1106.1          89.7 (12.3x)

Table 1: Index build times for Faiss-classic and Faiss-cuVS in seconds (with NVIDIA cuVS speedups in parentheses).
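The speedups in parentheses are the ratio of the Faiss-classic time to the Faiss-cuVS time. A short sketch that recomputes them from the Table 1 values (the dictionary layout and function name are just for illustration):

```python
# (classic seconds, cuVS seconds) taken from Table 1.
build_seconds = {
    ("IVF Flat", "100M x 96"): (101.4, 37.9),
    ("IVF Flat", "5M x 1536"): (24.4, 15.2),
    ("IVF PQ", "100M x 96"): (168.2, 72.7),
    ("IVF PQ", "5M x 1536"): (42.0, 9.0),
    ("HNSW (CPU) vs. CAGRA", "100M x 96"): (3322.1, 518.5),
    ("HNSW (CPU) vs. CAGRA", "5M x 1536"): (1106.1, 89.7),
}


def speedup(classic, cuvs):
    """How many times faster cuVS is, rounded to one decimal place."""
    return round(classic / cuvs, 1)


speedups = {key: speedup(*times) for key, times in build_seconds.items()}
for (index, dataset), value in speedups.items():
    print(f"{index:22s} {dataset:10s} {value}x")
```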

Search latency (95% recall@10)

Index                        Embeddings 100M x 96 (milliseconds)   Embeddings 5M x 1536 (milliseconds)
Faiss Classic   Faiss cuVS   Faiss Classic   Faiss cuVS            Faiss Classic   Faiss cuVS
IVF Flat        IVF Flat     0.75            0.39 (1.9x)           1.98            1.14 (1.7x)
IVF PQ          IVF PQ       0.49            0.17 (2.9x)           1.78            0.22 (8.1x)
HNSW (CPU)      CAGRA        0.56            0.23 (2.4x)           0.71            0.15 (4.7x)

Table 2: Online (i.e., one at a time) search query latency for Faiss-classic and Faiss-cuVS in milliseconds (with NVIDIA cuVS speedups in parentheses).
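Online latency here means each query is submitted on its own rather than in a batch. The sketch below shows one way such a measurement could be taken; it is not how cuVS-bench does it, and the brute-force search is a hypothetical stand-in for a real index's search call.

```python
import statistics
import time


def make_brute_force_search(database, k=10):
    """Stand-in search: exact top-k by squared L2 distance, in pure Python."""
    def search_one(query):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(query, vec)), i)
            for i, vec in enumerate(database)
        ]
        return [i for _, i in sorted(dists)[:k]]
    return search_one


def online_latency_ms(search_one, queries, warmup=2):
    """Time each query individually and return (mean ms, per-query ms)."""
    for q in queries[:warmup]:
        search_one(q)  # warm-up calls are not timed
    samples = []
    for q in queries:
        start = time.perf_counter()
        search_one(q)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), samples


database = [[float(i), 0.0] for i in range(50)]  # made-up 2-D vectors
search = make_brute_force_search(database, k=3)
mean_ms, samples = online_latency_ms(search, [[10.2, 0.0]] * 5)
print(f"mean latency: {mean_ms:.4f} ms over {len(samples)} queries")
```

When timing a GPU index, the search call should return only once results are back on the host, otherwise the timer may stop before the work has finished.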

Looking forward

The emergence of state-of-the-art NVIDIA GPUs has revolutionized the field of vector search, enabling high recall and lightning-fast search speeds. The integration of Faiss and cuVS will continue to incorporate state-of-the-art algorithms, and we look forward to unlocking new innovations in this partnership between Meta and NVIDIA. 

Read here for more details about NVIDIA cuVS.

The post Accelerating GPU indexes in Faiss with NVIDIA cuVS appeared first on Engineering at Meta.