Adding pgvector to Local PostgreSQL (Homebrew)
What is pgvector?
pgvector is a PostgreSQL extension for vector similarity search. It lets you store embeddings (e.g., from OpenAI’s text-embedding models) directly in Postgres and query them using cosine similarity, L2 distance, or inner product.
Prerequisites
- PostgreSQL installed via Homebrew (
brew install postgresql@17) - PostgreSQL service running (
brew services start postgresql@17)
Install pgvector
brew install pgvector
Enable the Extension
Connect to your database and create the extension:
psql -d your_database_name -c "CREATE EXTENSION IF NOT EXISTS vector;"
Verify it’s installed:
psql -d your_database_name -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
Expected output:
extname | extversion
---------+------------
vector | 0.8.0
Create a Table with Vector Columns
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
);
The 1536 dimension matches OpenAI’s text-embedding-3-small model. Adjust for your embedding model:
| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| text-embedding-ada-002 | 1536 |
Insert and Query Vectors
Insert an embedding:
INSERT INTO documents (content, embedding)
VALUES ('Hello world', '[0.1, 0.2, 0.3, ...]'::vector);
Insert a test row with random 1536-dimensional vector:
INSERT INTO documents (content, embedding)
VALUES ('test document', ('[' || array_to_string(ARRAY(SELECT random() FROM generate_series(1, 1536)), ',') || ']')::vector);
Find the 5 most similar documents using cosine distance:
SELECT id, content, embedding <=> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY distance
LIMIT 5;
Find documents similar to an existing document:
SELECT id, content, embedding <=> (SELECT embedding FROM documents WHERE id = 1) AS distance
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;
Distance Operators
| Operator | Distance Metric |
|---|---|
<-> | L2 (Euclidean) |
<=> | Cosine |
<#> | Inner product (negative) |
Add an Index for Performance
For large tables, add an index. IVFFlat is good for most cases:
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
For higher recall, use HNSW (slower to build, faster to query):
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
Using with Python (psycopg2)
pip install pgvector
import psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect(host="localhost", dbname="your_database_name", user="postgres")
register_vector(conn)
cur = conn.cursor()
# Insert
embedding = [0.1, 0.2, ...] # from your embedding model
cur.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s)", ("hello", embedding))
conn.commit()
# Query similar documents
cur.execute("SELECT id, content FROM documents ORDER BY embedding <=> %s LIMIT 5", (embedding,))
results = cur.fetchall()
Cloud SQL (GCP)
On Cloud SQL PostgreSQL 15+, pgvector is available out of the box — no database flags needed. Just connect and run:
CREATE EXTENSION vector;
Verify it’s available:
SELECT * FROM pg_available_extensions WHERE name = 'vector';
Using with Alembic
In an Alembic migration, use raw SQL since SQLAlchemy doesn’t natively support the vector type:
def upgrade() -> None:
op.execute("CREATE EXTENSION IF NOT EXISTS vector")
op.execute("""
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
)
""")
def downgrade() -> None:
op.drop_table("documents")
op.execute("DROP EXTENSION IF EXISTS vector")
Troubleshooting
Extension not found
If CREATE EXTENSION vector fails, make sure pgvector was installed for the correct PostgreSQL version:
brew install pgvector
brew services restart postgresql@17
Check available extensions
SELECT * FROM pg_available_extensions WHERE name = 'vector';
If it doesn’t show up, the pgvector shared library isn’t in PostgreSQL’s extension directory. Reinstall with brew reinstall pgvector.