Preface

GitHub: https://github.com/HealerJean

Blog: http://blog.healerjean.com

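I. Core Concepts

1. What Is a Vector Database

A vector database stores data as numeric vectors and retrieves records by similarity rather than by exact match. When you submit a query (for example, "a four-legged pet"), the system converts it into a vector and looks for the stored vectors closest to it (for example, the vectors for "dog" or "cat"), instead of looking for records that contain the keyword "four-legged".

Conventional database: query "name = Zhang San" → exact match.
Vector database: query "semantically similar content" → similarity search.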
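The difference between the two lookups can be sketched in a few lines of plain Java. The three-dimensional vectors below are made-up values standing in for real embeddings, and `NearestNeighborDemo`, `cosine`, and `nearest` are illustrative names, not part of any library. A real vector database would normally use an index instead of scanning every entry.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NearestNeighborDemo {

    // Cosine similarity: close to 1.0 for vectors pointing the same way, 0.0 for orthogonal ones.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force search: score every stored vector and return the k most similar labels.
    static List<String> nearest(Map<String, double[]> store, double[] query, int k) {
        List<String> labels = new ArrayList<>(store.keySet());
        labels.sort(Comparator.comparingDouble((String label) -> -cosine(store.get(label), query)));
        return labels.subList(0, Math.min(k, labels.size()));
    }

    public static void main(String[] args) {
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("dog", new double[] {0.9, 0.8, 0.1}); // invented toy vectors
        store.put("cat", new double[] {0.8, 0.9, 0.2});
        store.put("car", new double[] {0.1, 0.2, 0.9});
        double[] query = {0.85, 0.85, 0.1};              // stands in for "a four-legged pet"
        // "dog" and "cat" rank ahead of "car" although no keyword is compared.
        System.out.println(nearest(store, query, 2));
    }
}
```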

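2. What Is a Vector Embedding (`Embedding`)

An embedding turns text into an array of numbers that a computer can compare.

The closer two sentences are in meaning → the closer their vectors.
Embeddings are generated by an EmbeddingModel.
A vector database only stores vectors; it does not generate them.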

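3. Key Components: `Document` and `EmbeddingModel`

Document: the unit of data in Spring AI. It contains:
- Content: the text (String).
- Metadata: key-value pairs such as filename, author, and date.
EmbeddingModel: the model that converts text (the content of a Document) into an array of numbers (a vector).
- Example: text-embedding-ada-002.
- Note: generating vectors is the job of the EmbeddingModel; the vector database only stores them.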

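4. Core Operations

1) The `VectorStore` core interface (read and write)

// Simplified view of the interface; only the most commonly used methods are shown
public interface VectorStore extends VectorStoreRetriever, DocumentWriter {
    void add(List<Document> docs);                           // write
    void delete(List<String> ids);                           // delete
    List<Document> similaritySearch(SearchRequest request);  // search (declared in VectorStoreRetriever)
}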

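2) `SearchRequest` (the most important part)

Controls how precise the search is:

Parameter	Description
`topK`	Return the N most similar documents
`similarityThreshold`	Similarity threshold (0 to 1); results scoring below it are dropped
`filterExpression`	Metadata filter (similar to a SQL WHERE clause)

3) `Document` structure

Field	Description
id	Unique ID
content	Text content
metadata	Metadata (can be used for filtering)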
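To make the three `SearchRequest` parameters concrete, here is a rough sketch in plain Java of what they do to a list of already-scored candidates. `Scored` and `search` are hypothetical stand-ins rather than Spring AI types, and the scores are invented. A real store applies these steps inside the database.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class SearchRequestSketch {

    // Hypothetical stand-in for a stored document plus its similarity score against the query.
    record Scored(String id, double score, Map<String, Object> metadata) {}

    static List<String> search(List<Scored> candidates, int topK, double threshold,
                               Predicate<Map<String, Object>> filter) {
        return candidates.stream()
                .filter(c -> filter.test(c.metadata()))                 // filterExpression
                .filter(c -> c.score() >= threshold)                    // similarityThreshold
                .sorted((x, y) -> Double.compare(y.score(), x.score())) // most similar first
                .limit(topK)                                            // topK
                .map(Scored::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
                new Scored("a", 0.92, Map.of("year", 2023)),
                new Scored("b", 0.75, Map.of("year", 2022)),  // removed by the metadata filter
                new Scored("c", 0.40, Map.of("year", 2023)),  // removed by the threshold
                new Scored("d", 0.81, Map.of("year", 2023)));
        Predicate<Map<String, Object>> year2023 = m -> Integer.valueOf(2023).equals(m.get("year"));
        System.out.println(search(candidates, 5, 0.6, year2023)); // prints [a, d]
    }
}
```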

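II. Core Interfaces and Methods

1. Batching strategy `BatchingStrategy`: preventing token overflow

This is an important advanced feature in the documentation. Embedding models limit how many tokens a single request may contain (for example, 8191 tokens). If you try to embed a 1000-page book in one request, the call fails.

1) `TokenCountBatchingStrategy`

Spring AI provides a default implementation named TokenCountBatchingStrategy. It estimates the token count of each document and groups the documents into batches before sending them to the model, so that no batch exceeds the calculated maximum input token count.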

public interface BatchingStrategy {
    List<List<Document>> batch(List<Document> documents);
}

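a. Main characteristics of `TokenCountBatchingStrategy`

Uses OpenAI's maximum input token count (8191) as the default upper limit.
Includes a reserve percentage (default 10%) as a buffer for potential overhead.
Calculates the actual maximum input token count as: actualMaxToken = maxToken × (1 - reservePercentage)

The strategy estimates the token count of each document and groups documents into batches that stay within the maximum input token count. If a single document exceeds the limit, it throws an exception.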
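The arithmetic and the grouping rule described above can be sketched in plain Java as follows. This is not the Spring AI implementation: `TokenBatchingSketch` is an illustrative name, documents are reduced to their (assumed) token counts, and rounding the limit to the nearest integer is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class TokenBatchingSketch {

    // Effective limit after holding back a share of the budget as a buffer.
    static int effectiveLimit(int maxTokens, double reservePercentage) {
        return (int) Math.round(maxTokens * (1 - reservePercentage));
    }

    // Greedy batching: start a new batch when the next document would push the batch over the limit.
    static List<List<Integer>> batch(List<Integer> tokenCounts, int limit) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int tokens : tokenCounts) {
            if (tokens > limit) {
                // A single oversized document cannot fit into any batch.
                throw new IllegalArgumentException(
                        "Document with " + tokens + " tokens exceeds limit " + limit);
            }
            if (used + tokens > limit) {
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(tokens);
            used += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        int limit = effectiveLimit(8191, 0.1);
        System.out.println(limit);                                          // prints 7372
        System.out.println(batch(List.of(3000, 3000, 3000, 500), limit));   // prints [[3000, 3000], [3000, 500]]
    }
}
```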

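b. Customizing `TokenCountBatchingStrategy`

EncodingType.CL100K_BASE: the encoding type used for tokenization. JTokkitTokenCountEstimator uses it to estimate token counts accurately.
8000: the maximum input token count. This value should be less than or equal to the maximum context window size of the embedding model.
0.1: the reserve percentage, that is, the share of the maximum input token count that is held back. It creates a buffer for token counts that may grow during processing.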

@Configuration
public class EmbeddingConfig {
    @Bean
    public BatchingStrategy customTokenCountBatchingStrategy() {
        return new TokenCountBatchingStrategy(
            EncodingType.CL100K_BASE,  // Specify the encoding type
            8000,                      // Set the maximum input token count
            0.1                        // Set the reserve percentage
        );
    }
}

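2) Auto-truncation

When auto-truncation is enabled, set the batching strategy's maximum input token count far above the model's actual limit. This keeps the batching strategy from throwing on large documents and lets the embedding model handle truncation internally.

Note: auto-truncation prevents errors, but it can produce incomplete embeddings. Important information at the end of a long document may be lost. If your application needs all content embedded, split documents into smaller chunks before embedding.

a. Model support: some models (such as `Vertex AI`) support this feature.

When it is enabled, the model truncates over-long text automatically.
In that case, configure a very high token limit (such as 132900) so that the batching strategy's check always passes and the model handles truncation itself.

b. How it works

TokenCountBatchingStrategy checks whether any single document exceeds the configured maximum and throws an IllegalArgumentException if it does.
Setting a very high limit in the batching strategy ensures that this check never fails.
Documents or batches that exceed the model's limit are silently truncated and processed by the embedding model's auto-truncation feature.

c. Best practices when using auto-truncation:

Set the batching strategy's maximum input token count to at least 5-10 times the model's actual limit, so that the batching strategy does not throw prematurely.
Monitor logs for truncation warnings from the embedding model (note: not all models log truncation events).
Consider the impact of silent truncation on embedding quality.
Test with sample documents to make sure truncated embeddings still meet your requirements.
Document this configuration for future maintainers, because it is non-standard.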

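2. Retrieval with `VectorStoreRetriever`

The VectorStoreRetriever interface provides a read-only view of a vector store and exposes only similarity search. This follows the principle of least privilege and is especially useful in RAG (retrieval-augmented generation) applications, where you only need to retrieve documents and never modify the underlying data.

1) Benefits of `VectorStoreRetriever`

Separation of concerns: read operations are clearly separated from write operations.
Interface segregation: clients that only need retrieval are not exposed to mutating methods.
Functional interface: for simple use cases it can be implemented with a lambda expression or a method reference.
Reduced dependencies: components that only search do not need to depend on the full VectorStore interface.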
2) Example usage

When you only need similarity search, you can use VectorStoreRetriever directly.

@Service
public class DocumentRetrievalService {

    private final VectorStoreRetriever retriever;

    public DocumentRetrievalService(VectorStoreRetriever retriever) {
        this.retriever = retriever;
    }

    public List<Document> findSimilarDocuments(String query) {
        return retriever.similaritySearch(query);
    }

    public List<Document> findSimilarDocumentsWithFilters(String query, String country) {
        SearchRequest request = SearchRequest.builder()
            .query(query)
            .topK(5)
            .filterExpression("country == '" + country + "'")
            .build();

        return retriever.similaritySearch(request);
    }
}

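3. Metadata Filters

In a similarity search you care not only about whether the meaning is close, but also about whether the attributes match.

Purpose: similar to a SQL WHERE clause, but applied to vector search.
Scenario: when a user asks for "the 2023 financial report", add the filter year == 2023 to the search, so that semantically similar documents from 2022 are not retrieved.

1) Filter strings

You can pass a SQL-like filter expression as a String to a similarity search, for example through filterExpression on a SearchRequest.

Consider the following examples: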

"country == 'BG'"
"genre == 'drama' && year >= 2020"
"genre in ['comedy', 'documentary', 'drama']"

2) `Filter.Expression`

You can create Filter.Expression instances with FilterExpressionBuilder, which exposes a fluent API. A simple example:

FilterExpressionBuilder b = new FilterExpressionBuilder();
Filter.Expression expression = b.eq("country", "BG").build();
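4. Deleting Documents from a Vector Store

The VectorStore interface provides several ways to delete documents: by specific document IDs or by a filter expression.

1) Deleting by document ID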

void delete(List<String> idList);

// Create and add document
Document document = new Document("The World is Big",
    Map.of("country", "Netherlands"));
vectorStore.add(List.of(document));

// Delete document by ID
vectorStore.delete(List.of(document.getId()));

2) Deleting by filter expression

For more complex deletion criteria, you can use a filter expression:

void delete(Filter.Expression filterExpression);

Filter.Expression filterExpression = new Filter.Expression(
    Filter.ExpressionType.EQ,
    new Filter.Key("country"),
    new Filter.Value("Bulgaria")
);
vectorStore.delete(filterExpression);

3) Deleting by string filter expression

For convenience, you can also delete documents with a string-based filter expression.

void delete(String filterExpression);

// Delete Bulgarian documents using string filter
vectorStore.delete("country == 'Bulgaria'");

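4) Error handling when calling the delete API

Any of the delete methods may throw an exception when something goes wrong.

try {
    vectorStore.delete("country == 'Bulgaria'");
}
catch (Exception e) {
    // The failure may come from the filter expression or from the store itself
    logger.error("Failed to delete documents by filter", e);
}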

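5) Document versioning use case

A common scenario is managing document versions: you upload a new version of a document and delete the old one. Filter expressions handle this as follows: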

// Delete old version using string filter
vectorStore.delete("docId == 'AIML-001' AND version == '1.0'");

// Add new version
vectorStore.add(List.of(documentV2));

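6) Performance considerations when deleting documents

Deleting by a list of IDs is usually faster when you know exactly which documents to remove.
Filter-based deletion may need to scan the index to find matching documents; this depends on the vector store implementation.
Large delete operations should be batched to avoid overloading the system.
When deleting by document attributes, consider using a filter expression instead of collecting IDs first.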
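One way to batch a large delete is to split the ID list into fixed-size chunks and issue one delete call per chunk. The helper below is a minimal sketch in plain Java: `deleteInBatches` is a hypothetical name, and the delete operation is passed in as a `Consumer` so that the example runs without a vector store. A suitable batch size depends on the store and is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchedDeleteSketch {

    // Hands the ids to deleteFn in chunks of at most batchSize; returns the number of calls made.
    static int deleteInBatches(List<String> ids, int batchSize, Consumer<List<String>> deleteFn) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            deleteFn.accept(new ArrayList<>(ids.subList(from, to))); // copy so the chunk is independent
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("id-1", "id-2", "id-3", "id-4", "id-5");
        // Here the "delete" just prints each chunk.
        int calls = deleteInBatches(ids, 2, chunk -> System.out.println("deleting " + chunk));
        System.out.println(calls + " calls"); // prints 3 calls
    }
}
```

In an application, the consumer could be something like `chunk -> vectorStore.delete(chunk)`.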

III. Hands-On Examples

1. Writing

1) Writing a document (`add`)

// ========================================================================
// 1. Write a document (add)
// Request: GET http://localhost:8080/vector/add
// Parameters: content and metadata
// Note: GET keeps the demo easy to call from a browser; a write endpoint would normally use POST
// ========================================================================
@GetMapping("/add")
public String add(
        @RequestParam String content,
        @RequestParam String country,
        @RequestParam int year) {
    Document doc = new Document(content, Map.of("country", country, "year", year));
    vectorStore.add(List.of(doc));

    return "✅ Written to the vector store.";
}

2) Batch writing (automatic batching to stay within the `token` limit)

// ========================================================================
// 7. Batch write (automatic batching to stay within the token limit)
// Request: GET http://localhost:8080/vector/batch-add
// ========================================================================
@GetMapping("/batch-add")
public String batchAdd() {
    List<Document> docs = List.of(
            new Document("Beijing is the capital of China", Map.of("country", "China", "year", 2025)),
            new Document("Washington, D.C. is the capital of the USA", Map.of("country", "USA", "year", 2025)),
            new Document("London is the capital of the UK", Map.of("country", "UK", "year", 2025))
    );
    vectorStore.add(docs); // batching is handled by the configured BatchingStrategy
    return "✅ Batch write complete.";
}

2. Searching

1) Simple similarity search (`topK=4`)

// ========================================================================
// 2. Simple similarity search (default topK=4)
// Request: GET http://localhost:8080/vector/search
// ========================================================================
@GetMapping("/search")
public List<Document> search(@RequestParam String query) {
    return vectorStore.similaritySearch(query);
}

2) Advanced search (explicit topK + similarity threshold)

// ========================================================================
// 3. Advanced search (explicit topK + similarity threshold)
// Request: GET http://localhost:8080/vector/search-advanced
// ========================================================================
@GetMapping("/search-advanced")
public List<Document> searchAdvanced(
        @RequestParam String query,
        @RequestParam(defaultValue = "5") int topK,
        @RequestParam(defaultValue = "0.6") double threshold
) {
    SearchRequest request = SearchRequest.builder()
            .query(query)
            .topK(topK)
            .similarityThreshold(threshold)
            .build();
    return vectorStore.similaritySearch(request);
}

3) Search with a metadata filter (similar to `WHERE`)

// ========================================================================
// 4. Search with a metadata filter (similar to WHERE)
// Request: GET http://localhost:8080/vector/search-filter
// ========================================================================
@GetMapping("/search-filter")
public List<Document> searchWithFilter(
        @RequestParam String query,
        @RequestParam String country
) {
    // country is concatenated into the filter as-is; validate it first in real code
    String filter = "country == '" + country + "' && year >= 2020";
    SearchRequest request = SearchRequest.builder()
            .query(query)
            .filterExpression(filter)
            .build();
    return vectorStore.similaritySearch(request);
}
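Because the filter string is assembled from a request parameter, a value containing a quote could change the meaning of the expression. One cautious option is to accept only values that match an allow-list pattern before building the string. The sketch below assumes that country values are short alphabetic names or codes; `FilterValueGuard` and `countryFilter` are hypothetical names, and the pattern should be adapted to your data.

```java
import java.util.regex.Pattern;

class FilterValueGuard {

    // Assumption: letters, spaces and hyphens only, at most 64 characters.
    private static final Pattern SAFE_VALUE = Pattern.compile("[A-Za-z][A-Za-z \\-]{0,63}");

    // Builds the filter string only when the value cannot break out of the quoted literal.
    static String countryFilter(String country) {
        if (country == null || !SAFE_VALUE.matcher(country).matches()) {
            throw new IllegalArgumentException("Unsupported country value");
        }
        return "country == '" + country + "' && year >= 2020";
    }

    public static void main(String[] args) {
        System.out.println(countryFilter("BG")); // prints country == 'BG' && year >= 2020
        try {
            countryFilter("BG' || year >= 0 || country == 'x");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // prints rejected: Unsupported country value
        }
    }
}
```

Building the condition with a programmatic filter builder instead of a string is another way to avoid hand-written quoting.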

4) Full RAG flow (retrieve → `AI` answer)

// ========================================================================
// 8. Full RAG flow (retrieve → AI answer)
// Request: GET http://localhost:8080/vector/rag
// ========================================================================

@GetMapping("/rag")
public String rag(@RequestParam String question) {
    // 1. Retrieve the most relevant documents
    List<Document> docs = vectorStore.similaritySearch(
            SearchRequest.builder()
                    .query(question)
                    .topK(3)
                    .similarityThreshold(0.6)
                    .build()
    );

    // Join the document texts with newlines (requires java.util.stream.Collectors)
    String context = docs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n"));

    // 2. Generate the answer with the retrieved context in the system prompt
    return chatClient.prompt()
            .system("Answer based on the following context:\n" + context)
            .user(question)
            .call()
            .content();
}

3. Deleting

1) Deleting by ID

// ========================================================================
// 5. Delete by ID
// Request: GET http://localhost:8080/vector/delete
// ========================================================================
@GetMapping("/delete")
public String delete(@RequestParam String id) {
    vectorStore.delete(List.of(id));
    return "✅ Deleted: " + id;
}

2) Deleting by filter condition (`WHERE`)

// ========================================================================
// 6. Delete by filter condition (WHERE)
// Request: GET http://localhost:8080/vector/delete-by-filter
// ========================================================================
@GetMapping("/delete-by-filter")
public String deleteByFilter(@RequestParam String country) {
    // country is concatenated into the filter as-is; validate it first in real code
    String filter = "country == '" + country + "'";
    vectorStore.delete(filter);
    return "✅ Deleted by filter: " + filter;
}

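IV. Mainstream Vector Databases for Common Scenarios

Personal / small projects / development: Chroma > PostgreSQL+pgvector (works out of the box, zero cost).
Small and medium businesses / production RAG: PostgreSQL+pgvector > Milvus (stable, easy to operate, well supported by Spring AI).
Large companies / large scale (hundreds of millions of vectors and up): in-house > managed cloud (Alibaba Cloud ADB / Tencent Cloud VectorDB) > Milvus (performance + security + control).
Spring AI projects: prefer pgvector / Milvus / Redis; official support is mature, configuration is simple, and they are well proven in production.
Local: SimpleVectorStore, Spring AI's in-memory vector store, intended for demos and tests rather than production. It is usually declared as a bean by hand, and its data lives in memory unless you save it explicitly.

Database	Spring AI dependency	Enterprise fit	Typical use
PostgreSQL+pgvector	spring-ai-pgvector-store	⭐⭐⭐⭐⭐	Small to medium production, RAG, mixed SQL queries
Milvus	spring-ai-milvus-store	⭐⭐⭐⭐⭐	Large-scale production, hundreds of millions of vectors
Redis	spring-ai-redis-store	⭐⭐⭐⭐	Real-time retrieval, high-concurrency hot data
Pinecone	spring-ai-pinecone-store	⭐⭐⭐⭐	Fully managed, quick to adopt
Elasticsearch	spring-ai-elasticsearch-store	⭐⭐⭐⭐	Full-text search combined with vector search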
