Accelerating GPU indexes in Faiss with NVIDIA cuVS

Meta and NVIDIA collaborated to accelerate vector search on GPUs by integrating NVIDIA cuVS into Faiss v1.10, Meta’s open source library for similarity search.
The cuVS-backed implementations outperform the classic Faiss GPU implementations in several areas.
For inverted file (IVF) indexes, NVIDIA cuVS reduces build times by up to 4.7x and search latency by up to 8.1x compared with the classic GPU-accelerated IVF indexes.
For graph indexes, CUDA ANN Graph (CAGRA) on the GPU reduces build times by up to 12.3x and search latency by up to 4.7x compared with Hierarchical Navigable Small World (HNSW) graphs on the CPU.

The Faiss library

The Faiss library is an open source library, developed by Meta FAIR, for efficient vector search and clustering of dense vectors. Faiss pioneered vector search on GPUs, as well as the ability to seamlessly switch between GPUs and CPUs. It has made a lasting impact in both research and industry, being used as an integrated library in several databases (e.g., Milvus and OpenSearch), machine learning libraries, data processing libraries, and AI workflows. Faiss is also used heavily by researchers and data scientists as a standalone library, often paired with PyTorch.

Collaboration with NVIDIA

Three years ago, Meta and NVIDIA worked together to enhance the capabilities of vector search technology and to accelerate vector search on GPUs. Previously, in 2016, Meta had incorporated high-performing vector search algorithms made for NVIDIA GPUs: GpuIndexFlat, GpuIndexIVFFlat, and GpuIndexIVFPQ. After the partnership began, NVIDIA rapidly contributed GpuIndexCagra, a state-of-the-art graph-based index designed specifically for GPUs. In its latest release, Faiss 1.10.0 officially includes these algorithms from the NVIDIA cuVS library.

Faiss 1.10.0 also includes a new conda package that unlocks the ability to choose between the classic Faiss GPU implementations and the newer NVIDIA cuVS algorithms, making it easy for users to switch between GPU and CPU.
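
As a rough sketch of what that choice can look like from Python, the helper below builds a GPU IVF Flat index and selects the backend through the `use_cuvs` field on the index config. It assumes a Faiss 1.10 GPU build with cuVS support and a CUDA-capable device. The helper name `build_gpu_ivf_flat`, the `nlist` default, and the usage values in the comments are illustrative and are not the settings used in the benchmarks.

```python
def build_gpu_ivf_flat(xb, nlist=1024, use_cuvs=True):
    """Build a GPU IVF Flat index over `xb`.

    `xb` is expected to be a C-contiguous float32 NumPy array of shape (n, d).
    `use_cuvs=True` requests the cuVS implementation, `False` the classic one.
    Assumption: the installed Faiss package was built with GPU and cuVS support.
    """
    # Imported lazily so this module can be loaded on machines without Faiss.
    import faiss

    d = xb.shape[1]
    res = faiss.StandardGpuResources()       # GPU memory and stream handles
    config = faiss.GpuIndexIVFFlatConfig()
    config.use_cuvs = use_cuvs               # backend switch (assumed field name)
    index = faiss.GpuIndexIVFFlat(res, d, nlist, faiss.METRIC_L2, config)
    index.train(xb)                          # learn the nlist coarse centroids
    index.add(xb)                            # assign vectors to inverted lists
    return index

# Example usage (requires NumPy, Faiss with GPU support, and a GPU):
#   import numpy as np
#   xb = np.random.rand(100_000, 96).astype("float32")
#   index = build_gpu_ivf_flat(xb, nlist=256, use_cuvs=True)
#   index.nprobe = 16
#   distances, ids = index.search(xb[:5], 10)
```

Keeping the backend behind a single boolean makes it straightforward to run the same workload against both implementations and compare them.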

Benchmarking

The following benchmarks were conducted using the cuVS-bench tool.

We measured two datasets:

A tall, slender image dataset: A subset of 100 million 96-dimensional vectors from the Deep1B dataset.
A short, wide dataset of text embeddings: 5 million 1536-dimensional vector embeddings, curated using the OpenAI text-embedding-ada-002 model.

Tests for index build times and search latency were conducted on an NVIDIA H100 GPU and compared to an Intel Xeon Platinum 8480CL system. Results are reported in the tables below at 95% recall along the Pareto frontiers for k=10 nearest neighbors.
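
For readers unfamiliar with the metric, recall@k is commonly computed as the fraction of the true k nearest neighbors that the approximate search actually returns, averaged over all queries. The function below is a minimal sketch of that common definition; it is not taken from cuVS-bench, and the name `recall_at_k` and the toy neighbor IDs are hypothetical.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Average fraction of true top-k neighbors found in the retrieved top-k.

    `retrieved` and `ground_truth` are lists with one list of neighbor IDs
    per query, each ordered from nearest to farthest.
    """
    if len(retrieved) != len(ground_truth):
        raise ValueError("need one result list per ground-truth list")
    hits = 0
    for got, truth in zip(retrieved, ground_truth):
        # Order within the top-k does not matter, only membership.
        hits += len(set(got[:k]) & set(truth[:k]))
    return hits / (len(ground_truth) * k)


# Toy example with k=3: the first query misses one true neighbor (ID 9).
approx = [[1, 2, 3], [4, 5, 6]]
exact = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(approx, exact, k=3))  # 5 of 6 true neighbors found
```

Under this definition, 95% recall at k=10 means that, on average, 9.5 of the 10 true nearest neighbors are returned per query.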

Build time (95% recall@10)

Faiss Classic index	Faiss cuVS index	100M x 96, Faiss Classic (seconds)	100M x 96, Faiss cuVS (seconds)	5M x 1536, Faiss Classic (seconds)	5M x 1536, Faiss cuVS (seconds)
IVF Flat	IVF Flat	101.4	37.9 (2.7x)	24.4	15.2 (1.6x)
IVF PQ	IVF PQ	168.2	72.7 (2.3x)	42.0	9.0 (4.7x)
HNSW (CPU)	CAGRA	3322.1	518.5 (6.4x)	1106.1	89.7 (12.3x)

Table 1: Index build times for Faiss-classic and Faiss-cuVS in seconds (with NVIDIA cuVS speedups in parentheses).
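
The speedups in parentheses are the ratio of the Faiss Classic time to the Faiss cuVS time, rounded to one decimal place. The short script below recomputes them from the build times in Table 1; the dictionary layout and the `speedup` helper are just one illustrative way to organize the numbers.

```python
# Build times in seconds from Table 1: (Faiss Classic, Faiss cuVS).
BUILD_SECONDS = {
    "IVF Flat, 100M x 96": (101.4, 37.9),
    "IVF Flat, 5M x 1536": (24.4, 15.2),
    "IVF PQ, 100M x 96": (168.2, 72.7),
    "IVF PQ, 5M x 1536": (42.0, 9.0),
    "HNSW (CPU) vs. CAGRA, 100M x 96": (3322.1, 518.5),
    "HNSW (CPU) vs. CAGRA, 5M x 1536": (1106.1, 89.7),
}


def speedup(classic, cuvs):
    """How many times faster the cuVS measurement is than the classic one."""
    return classic / cuvs


for name, (classic, cuvs) in BUILD_SECONDS.items():
    print(f"{name}: {speedup(classic, cuvs):.1f}x")
```

The same ratio applies to the search latencies, since both tables report quantities where lower is better.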

Search latency (95% recall@10)

Faiss Classic index	Faiss cuVS index	100M x 96, Faiss Classic (milliseconds)	100M x 96, Faiss cuVS (milliseconds)	5M x 1536, Faiss Classic (milliseconds)	5M x 1536, Faiss cuVS (milliseconds)
IVF Flat	IVF Flat	0.75	0.39 (1.9x)	1.98	1.14 (1.7x)
IVF PQ	IVF PQ	0.49	0.17 (2.9x)	1.78	0.22 (8.1x)
HNSW (CPU)	CAGRA	0.56	0.23 (2.4x)	0.71	0.15 (4.7x)

Table 2: Online (i.e., one at a time) search query latency for Faiss-classic and Faiss-cuVS in milliseconds (with NVIDIA cuVS speedups in parentheses).
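
Reporting a latency "at 95% recall along the Pareto frontier" generally means sweeping the search parameters, discarding any configuration that another configuration beats on both recall and latency, and then taking the fastest remaining configuration that meets the recall target. The sketch below illustrates that selection. The sweep values are made up for illustration and are not benchmark results, and the function names are hypothetical rather than part of cuVS-bench.

```python
def pareto_frontier(points):
    """Keep (recall, latency_ms) points that no other point dominates.

    A point is dominated when another point has recall at least as high
    and latency at least as low, and is strictly better in one of the two.
    """
    frontier = []
    best_recall = float("-inf")
    # Walk from fastest to slowest; at equal latency, higher recall first.
    for recall, latency in sorted(points, key=lambda p: (p[1], -p[0])):
        if recall > best_recall:  # slower is only acceptable if more accurate
            frontier.append((recall, latency))
            best_recall = recall
    return frontier


def fastest_at_recall(points, target=0.95):
    """Lowest-latency frontier point whose recall meets the target, or None."""
    eligible = [p for p in pareto_frontier(points) if p[0] >= target]
    return min(eligible, key=lambda p: p[1]) if eligible else None


# Hypothetical parameter sweep: (recall@10, latency in milliseconds).
sweep = [(0.80, 0.10), (0.90, 0.14), (0.90, 0.20),
         (0.96, 0.23), (0.95, 0.30), (0.99, 0.41)]
print(pareto_frontier(sweep))
print(fastest_at_recall(sweep, target=0.95))
```

In this toy sweep, the configuration at 0.95 recall and 0.30 ms is dropped because the one at 0.96 recall and 0.23 ms is both faster and more accurate.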

Looking forward

The emergence of state-of-the-art NVIDIA GPUs has revolutionized the field of vector search, enabling high recall and lightning-fast search speeds. The integration of Faiss and cuVS will continue to incorporate state-of-the-art algorithms, and we look forward to unlocking new innovations in this partnership between Meta and NVIDIA.

Read here for more details about NVIDIA cuVS.

The post Accelerating GPU indexes in Faiss with NVIDIA cuVS appeared first on Engineering at Meta.