added new impls, docs and changeset for pg vector store #2781

Merged
5 changes: 5 additions & 0 deletions .changeset/sharp-drinks-hang.md
@@ -0,0 +1,5 @@
---
'@mastra/pg': minor
---

Added implementations of the new MastraVector interface operations (`updateIndexById`, `deleteIndexById`) to the pg vector store
119 changes: 105 additions & 14 deletions docs/src/pages/docs/reference/rag/pg.mdx
@@ -6,7 +6,7 @@ description: Documentation for the PgVector class in Mastra, which provides vect
# PG Vector Store

The PgVector class provides vector search using [PostgreSQL](https://www.postgresql.org/) with the [pgvector](https://github.com/pgvector/pgvector) extension.
It provides robust vector similarity search capabilities within your existing PostgreSQL database.
It provides robust vector similarity search capabilities within your existing PostgreSQL database.

## Constructor Options

@@ -76,17 +76,20 @@ It provides robust vector similarity search capabilities within your existing Po
{
name: "flat",
type: "flat",
description: "Sequential scan (no index) that performs exhaustive search.",
description:
"Sequential scan (no index) that performs exhaustive search.",
},
{
name: "ivfflat",
type: "ivfflat",
description: "Clusters vectors into lists for approximate search.",
description:
"Clusters vectors into lists for approximate search.",
},
{
name: "hnsw",
type: "hnsw",
description: "Graph-based index offering fast search times and high recall.",
description:
"Graph-based index offering fast search times and high recall.",
},
],
},
@@ -104,9 +107,10 @@ It provides robust vector similarity search capabilities within your existing Po
{
name: "lists",
type: "number",
description: "Number of lists. If not specified, automatically calculated based on dataset size. (Minimum 100, Maximum 4000)",
description:
"Number of lists. If not specified, automatically calculated based on dataset size. (Minimum 100, Maximum 4000)",
isOptional: true,
}
},
],
},
],
@@ -123,7 +127,8 @@ It provides robust vector similarity search capabilities within your existing Po
{
name: "m",
type: "number",
description: "Maximum number of connections per node (default: 8)",
description:
"Maximum number of connections per node (default: 8)",
isOptional: true,
},
{
@@ -134,14 +139,15 @@ It provides robust vector similarity search capabilities within your existing Po
},
],
},
]
],
},
]}
/>
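
As a rough illustration of how a `lists` value could be derived and clamped to the documented bounds, here is a minimal sketch. It assumes pgvector's published starting point (rows / 1000 for up to 1M rows, the square root of the row count beyond that); the formula PgVector actually applies is not shown in this reference and may differ. `suggestLists` is a hypothetical helper, not part of the package.

```typescript
// Bounds quoted in the `lists` description above.
const MIN_LISTS = 100;
const MAX_LISTS = 4000;

/**
 * Suggest an IVFFlat `lists` value for a table holding `rowCount` vectors.
 * ASSUMPTION: uses pgvector's documented rule of thumb, not PgVector's
 * internal calculation, which this reference does not spell out.
 */
function suggestLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new RangeError("rowCount must be a non-negative finite number");
  }
  // rows / 1000 up to one million rows, sqrt(rows) beyond that.
  const heuristic =
    rowCount <= 1000000 ? rowCount / 1000 : Math.sqrt(rowCount);
  // Clamp the rounded value into the documented [100, 4000] range.
  return Math.min(MAX_LISTS, Math.max(MIN_LISTS, Math.round(heuristic)));
}
```

With this sketch, small tables always land on the minimum of 100, and very large tables are capped at 4000.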

#### Memory Requirements

HNSW indexes require significant shared memory during construction. For 100K vectors:

- Small dimensions (64d): ~60MB with default settings
- Medium dimensions (256d): ~180MB with default settings
- Large dimensions (384d+): ~250MB+ with default settings
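
For comparison with the figures above, the raw size of the stored vectors can be computed directly. This is a minimal sketch that relies on pgvector's documented storage cost of `4 * dimensions + 8` bytes per vector; it only covers the vector payload and does not attempt to reproduce the estimates above, which presumably also account for graph links and build-time overhead. `rawVectorBytes` and `toMegabytes` are hypothetical helper names.

```typescript
/**
 * Raw storage, in bytes, for `count` vectors with `dimensions` components.
 * pgvector documents each vector as taking 4 * dimensions + 8 bytes.
 */
function rawVectorBytes(dimensions: number, count: number): number {
  return (4 * dimensions + 8) * count;
}

/** Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes). */
function toMegabytes(bytes: number): number {
  return bytes / 1000000;
}
```

For 100K vectors this gives 26.4 MB at 64 dimensions, 103.2 MB at 256 dimensions, and 154.4 MB at 384 dimensions, so the raw payload alone is a sizeable share of each estimate.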
@@ -291,6 +297,90 @@ interface PGIndexStats {
]}
/>

### updateIndexById()

<PropertiesTable
content={[
{
name: "indexName",
type: "string",
description: "Name of the index containing the vector",
},
{
name: "id",
type: "string",
description: "ID of the vector to update",
},
{
name: "update",
type: "object",
description: "Update parameters",
properties: [
{
type: "object",
parameters: [
{
name: "vector",
type: "number[]",
description: "New vector values",
isOptional: true,
},
{
name: "metadata",
type: "Record<string, any>",
description: "New metadata values",
isOptional: true,
},
],
},
],
},
]}
/>

Updates an existing vector by ID. At least one of `vector` or `metadata` must be provided; otherwise the call rejects with a `No updates provided` error.

```typescript copy
// Update just the vector
await pgVector.updateIndexById("my_vectors", "vector123", {
vector: [0.1, 0.2, 0.3],
});

// Update just the metadata
await pgVector.updateIndexById("my_vectors", "vector123", {
metadata: { label: "updated" },
});

// Update both vector and metadata
await pgVector.updateIndexById("my_vectors", "vector123", {
vector: [0.1, 0.2, 0.3],
metadata: { label: "updated" },
});
```
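
To make the "at least one of `vector` or `metadata`" rule concrete, the sketch below builds a parameterized `UPDATE` statement from the same update object. It is an illustration under stated assumptions, not the actual PgVector implementation: the table is assumed to be named after the index, and the column names `vector_id`, `embedding`, and `metadata` as well as the helper `buildUpdateStatement` are hypothetical.

```typescript
interface VectorUpdate {
  vector?: number[];
  metadata?: Record<string, unknown>;
}

interface SqlStatement {
  text: string;
  values: string[];
}

// ASSUMPTION: table named after the index, with hypothetical columns
// `vector_id`, `embedding` (pgvector) and `metadata` (jsonb).
function buildUpdateStatement(
  indexName: string,
  id: string,
  update: VectorUpdate,
): SqlStatement {
  // The table name is interpolated, so only allow plain identifiers.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(indexName)) {
    throw new Error(`Invalid index name: ${indexName}`);
  }

  const assignments: string[] = [];
  const values: string[] = [];

  if (update.vector !== undefined) {
    // pgvector accepts the text form "[0.1,0.2,0.3]" cast to ::vector.
    values.push(JSON.stringify(update.vector));
    assignments.push(`embedding = $${values.length}::vector`);
  }
  if (update.metadata !== undefined) {
    values.push(JSON.stringify(update.metadata));
    assignments.push(`metadata = $${values.length}::jsonb`);
  }
  if (assignments.length === 0) {
    // Same message the PR's test suite expects for an empty update.
    throw new Error("No updates provided");
  }

  values.push(id);
  return {
    text: `UPDATE ${indexName} SET ${assignments.join(", ")} WHERE vector_id = $${values.length}`,
    values,
  };
}
```

Only the fields present in the update end up in the `SET` clause, which is why a metadata-only update leaves the stored vector untouched.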

### deleteIndexById()

<PropertiesTable
content={[
{
name: "indexName",
type: "string",
description: "Name of the index containing the vector",
},
{
name: "id",
type: "string",
description: "ID of the vector to delete",
},
]}
/>

Deletes a single vector by ID from the specified index.

```typescript copy
await pgVector.deleteIndexById("my_vectors", "vector123");
```
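
Since `deleteIndexById()` removes one vector per call, removing several IDs means calling it repeatedly. The following is a minimal sketch of such a loop; `deleteManyById` is a hypothetical helper that is not part of PgVector, and the `Promise<void>` return type on the `VectorDeleter` interface is an assumption made so the example stays self-contained.

```typescript
// Minimal shape of the store used by this sketch.
// ASSUMPTION: deleteIndexById resolves with no value.
interface VectorDeleter {
  deleteIndexById(indexName: string, id: string): Promise<void>;
}

/**
 * Delete several vectors one at a time and report which IDs failed,
 * instead of aborting on the first error.
 */
async function deleteManyById(
  store: VectorDeleter,
  indexName: string,
  ids: string[],
): Promise<string[]> {
  const failed: string[] = [];
  for (const id of ids) {
    try {
      await store.deleteIndexById(indexName, id);
    } catch {
      failed.push(id);
    }
  }
  return failed;
}
```

Any object with a matching `deleteIndexById` method satisfies the interface structurally, so a `PgVector` instance could be passed as `store` if its method has this shape.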

### disconnect()

Closes the database connection pool. Call it when you are done using the store so that pooled connections are released.
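
One way to make sure `disconnect()` always runs, even when a query throws, is a `try`/`finally` wrapper. This is a minimal sketch; `withStore` is a hypothetical helper, and the `Disconnectable` interface only assumes that `disconnect()` returns a promise.

```typescript
// Minimal shape needed by the wrapper.
// ASSUMPTION: disconnect() is async and resolves with no value.
interface Disconnectable {
  disconnect(): Promise<void>;
}

/**
 * Run `work` against the store and always disconnect afterwards,
 * whether `work` resolves or rejects.
 */
async function withStore<S extends Disconnectable, T>(
  store: S,
  work: (store: S) => Promise<T>,
): Promise<T> {
  try {
    return await work(store);
  } finally {
    await store.disconnect();
  }
}
```

Because the pool is closed in `finally`, the wrapper suits one-off scripts; a long-running server would more likely keep a single store open and disconnect on shutdown.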
@@ -327,21 +417,21 @@ await pgVector.buildIndex("my_vectors", "cosine", {
type: "hnsw",
hnsw: {
m: 8,
efConstruction: 32
}
efConstruction: 32,
},
});

// Define IVF index
await pgVector.buildIndex("my_vectors", "cosine", {
type: "ivfflat",
ivf: {
lists: 100,
}
lists: 100,
},
});

// Define flat index
await pgVector.buildIndex("my_vectors", "cosine", {
type: "flat"
type: "flat",
});
```

@@ -383,4 +473,5 @@ try {
- Rebuild indexes periodically to maintain efficiency, especially after significant data changes.

### Related
- [Metadata Filters](./metadata-filters)

- [Metadata Filters](./metadata-filters)
131 changes: 131 additions & 0 deletions stores/pg/src/vector/index.test.ts
@@ -1,6 +1,7 @@
import { describe, it, expect, beforeAll, afterAll, beforeEach, afterEach, vi } from 'vitest';

import { PgVector } from '.';
import { type QueryResult } from '@mastra/core';

describe('PgVector', () => {
let vectorDB: PgVector;
@@ -232,6 +233,136 @@ describe('PgVector', () => {
});
});

describe('updates', () => {
const testVectors = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
];

beforeEach(async () => {
await vectorDB.createIndex({ indexName: testIndexName, dimension: 3 });
});

afterEach(async () => {
await vectorDB.deleteIndex(testIndexName);
});

it('should update the vector by id', async () => {
const ids = await vectorDB.upsert({ indexName: testIndexName, vectors: testVectors });
expect(ids).toHaveLength(3);

const idToBeUpdated = ids[0];
const newVector = [1, 2, 3];
const newMetaData = {
test: 'updates',
};

const update = {
vector: newVector,
metadata: newMetaData,
};

await vectorDB.updateIndexById(testIndexName, idToBeUpdated, update);

const results: QueryResult[] = await vectorDB.query({
indexName: testIndexName,
queryVector: newVector,
topK: 2,
includeVector: true,
});
expect(results[0]?.id).toBe(idToBeUpdated);
expect(results[0]?.vector).toEqual(newVector);
expect(results[0]?.metadata).toEqual(newMetaData);
});

it('should only update the metadata by id', async () => {
const ids = await vectorDB.upsert({ indexName: testIndexName, vectors: testVectors });
expect(ids).toHaveLength(3);

const idToBeUpdated = ids[0];
const newMetaData = {
test: 'updates',
};

const update = {
metadata: newMetaData,
};

await vectorDB.updateIndexById(testIndexName, idToBeUpdated, update);

const results: QueryResult[] = await vectorDB.query({
indexName: testIndexName,
queryVector: testVectors[0],
topK: 2,
includeVector: true,
});
expect(results[0]?.id).toBe(idToBeUpdated);
expect(results[0]?.vector).toEqual(testVectors[0]);
expect(results[0]?.metadata).toEqual(newMetaData);
});

it('should only update vector embeddings by id', async () => {
const ids = await vectorDB.upsert({ indexName: testIndexName, vectors: testVectors });
expect(ids).toHaveLength(3);

const idToBeUpdated = ids[0];
const newVector = [1, 2, 3];

const update = {
vector: newVector,
};

await vectorDB.updateIndexById(testIndexName, idToBeUpdated, update);

const results: QueryResult[] = await vectorDB.query({
indexName: testIndexName,
queryVector: newVector,
topK: 2,
includeVector: true,
});
expect(results[0]?.id).toBe(idToBeUpdated);
expect(results[0]?.vector).toEqual(newVector);
});

it('should throw exception when no updates are given', async () => {
// Await the assertion so a missing rejection fails the test instead of being ignored.
await expect(vectorDB.updateIndexById(testIndexName, 'id', {})).rejects.toThrow('No updates provided');
});
});

describe('deletes', () => {
const testVectors = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
];

beforeEach(async () => {
await vectorDB.createIndex({ indexName: testIndexName, dimension: 3 });
});

afterEach(async () => {
await vectorDB.deleteIndex(testIndexName);
});

it('should delete the vector by id', async () => {
const ids = await vectorDB.upsert({ indexName: testIndexName, vectors: testVectors });
expect(ids).toHaveLength(3);
const idToBeDeleted = ids[0];

await vectorDB.deleteIndexById(testIndexName, idToBeDeleted);

const results: QueryResult[] = await vectorDB.query({
indexName: testIndexName,
queryVector: [1.0, 0.0, 0.0],
topK: 2,
});

expect(results).toHaveLength(2);
expect(results.map(res => res.id)).not.toContain(idToBeDeleted);
});
});

describe('Basic Query Operations', () => {
['flat', 'hnsw', 'ivfflat'].forEach(indexType => {
const indexName = `test_query_2_${indexType}`;