09 Hybrid Search and Reranking

What is Hybrid Search?

Hybrid Search combines two retrieval methods, Keyword Search and Vector Search, so that each covers the other's blind spots.

  • Keyword Search (BM25): Excellent at finding exact matches (e.g., specific error codes, product IDs, or unique names). It's like "Command+F" on steroids.
  • Vector Search (Semantic): Amazing at understanding context and meaning. It knows that "dog" and "puppy" are related, even if the words don't match exactly.
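To make the contrast concrete, here is a toy sketch. The three-dimensional vectors are hand-written stand-ins for real embeddings (which come from a model and have hundreds of dimensions), and `keywordMatch` and `cosineSimilarity` are illustrative helpers, not part of this project.

```typescript
// Keyword matching: true only if every query term appears literally in the text.
function keywordMatch(query: string, text: string): boolean {
  const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return terms.every((t) => words.has(t));
}

// Cosine similarity: close to 1 when two vectors point the same way, near 0 when unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Made-up "embeddings", for illustration only.
const dog = [0.9, 0.1, 0.0];
const puppy = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

// Keyword search finds the exact error code but cannot connect "dog" to "puppy".
console.log(keywordMatch("E1042", "Request failed with code E1042")); // true
console.log(keywordMatch("dog", "Our puppy loves the park")); // false

// Vector search scores "dog" much closer to "puppy" than to "invoice".
console.log(cosineSimilarity(dog, puppy) > cosineSimilarity(dog, invoice)); // true
```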

Why do we need it?

Imagine searching for "Java connection error".

Vector Search Alone

Might return results about "Coffee shop wifi issues" because embeddings can place Java (the language) close to Java (the coffee), and it can miss exact tokens such as specific error codes.

Keyword Search Alone

Might miss a useful article titled "Solving JDBC Connectivity Issues" because the title contains none of the literal query words "Java", "connection", or "error".

Hybrid Search runs both, merges the results, and gives you the most relevant answers.

The Role of Reranking

Searching is fast, but sorting by true relevance is hard. This is where Reranking comes in.

Think of the Retriever as a fast librarian who creates a pile of 50 potentially relevant books. The Reranker is the expert professor who carefully reads through the pile and picks the 5 best ones.
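The two-stage idea can be sketched as follows. This is a minimal illustration, not the project's code: `Candidate`, `RelevanceScorer`, `termOverlapScorer`, and `rerank` are hypothetical names, and the term-overlap scorer is only a stand-in. A real reranker would typically call a dedicated relevance model at that point.

```typescript
type Candidate = { content: string; score: number };

// Hypothetical scorer signature: higher means more relevant to the query.
type RelevanceScorer = (query: string, content: string) => number;

// Stand-in scorer: fraction of query terms that appear in the passage.
// Swap this for a call to a real reranking model in practice.
const termOverlapScorer: RelevanceScorer = (query, content) => {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(content.toLowerCase().split(/\W+/).filter(Boolean));
  return terms.filter((t) => words.has(t)).length / terms.length;
};

// Re-score every retrieved candidate against the query and keep the best `topN`.
function rerank(
  query: string,
  candidates: Candidate[],
  scorer: RelevanceScorer,
  topN = 5,
): Candidate[] {
  return candidates
    .map((c) => ({ ...c, score: scorer(query, c.content) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// The retriever's "pile", with its original (fast but rough) scores.
const pile: Candidate[] = [
  { content: "Coffee shop wifi issues", score: 0.81 },
  { content: "Fixing a Java connection error in JDBC", score: 0.78 },
  { content: "Java collections overview", score: 0.74 },
];

const top = rerank("Java connection error", pile, termOverlapScorer, 2);
console.log(top.map((c) => c.content));
// [ "Fixing a Java connection error in JDBC", "Java collections overview" ]
```

Keeping the scorer as a parameter means the retrieval code does not change when the stand-in is replaced by a stronger model.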

Hybrid Search

Creating Hybrid Search Library

lib/hybridSearch.ts
import { getOrCreateCollection } from "./chromaClient";
import { miniSearch } from "./lexicalIndex";

const SEMANTIC_WEIGHT = 0.7;
const LEXICAL_WEIGHT = 0.3;

export async function hybridSearch(query: string) {
    // Lexical
    const lexicalResults = miniSearch.search(query, {
        prefix: true,
    });

    // Semantic
    const collection = await getOrCreateCollection("secondbrain");
    const semanticResults = await collection.query({
        queryTexts: [query],
        nResults: 5,
        // Embeddings are not used below, so they are not requested.
        include: ["documents", "metadatas", "distances"],
    });

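    // Normalize both result sets to a common shape and a comparable 0..1 score.
    // Note: `1 - distance` is a similarity only if the collection uses cosine distance.
    const semanticDocs =
        semanticResults.documents?.[0]?.map((doc, i) => ({
            content: doc,
            meta: semanticResults.metadatas?.[0]?.[i],
            // A missing distance is treated as "far away" (score 0).
            score: 1 - (semanticResults.distances?.[0]?.[i] ?? 1),
            source: "semantic",
        })) ?? [];

    // MiniSearch scores are unbounded, so scale them by the top score.
    // Results arrive sorted by score, best first.
    const maxLexicalScore = lexicalResults[0]?.score || 1;
    const lexicalDocs = lexicalResults.map((r) => ({
        content: r.content,
        meta: { filePath: r.filePath },
        score: r.score / maxLexicalScore,
        source: "lexical",
    }));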

    // Merge and Rank
    const combined = [...semanticDocs, ...lexicalDocs];

    const ranked = combined.map((d) => ({
        ...d,
        finalScore:
            d.source === "semantic" ? d.score * SEMANTIC_WEIGHT : d.score * LEXICAL_WEIGHT,
    }))
        .sort((a, b) => b.finalScore - a.finalScore)
        .slice(0, 5);

    return ranked;
}
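In `hybridSearch`, a chunk returned by both retrievers appears twice in `combined`, once per source. One possible variation, sketched below under assumptions, keys chunks by their content, scales each list by its top score, and sums the weighted scores so that a chunk found by both retrievers is ranked once with a combined score. `Scored`, `scaleByMax`, and `fuseScores` are hypothetical names, and the sample scores are made up.

```typescript
type Scored = { content: string; score: number };

// Scale scores into 0..1 by dividing by the list's top score.
function scaleByMax(docs: Scored[]): Scored[] {
  const max = Math.max(0, ...docs.map((d) => d.score));
  return docs.map((d) => ({ ...d, score: max > 0 ? d.score / max : 0 }));
}

// Weighted sum of both retrievers' scaled scores, one entry per distinct chunk.
function fuseScores(
  semantic: Scored[],
  lexical: Scored[],
  semanticWeight = 0.7,
  lexicalWeight = 0.3,
  topK = 5,
): Scored[] {
  const totals = new Map<string, number>();
  const add = (docs: Scored[], weight: number) => {
    for (const d of scaleByMax(docs)) {
      totals.set(d.content, (totals.get(d.content) ?? 0) + d.score * weight);
    }
  };
  add(semantic, semanticWeight);
  add(lexical, lexicalWeight);

  return Array.from(totals.entries())
    .map(([content, score]) => ({ content, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// "B" is found by both retrievers, so it collects weight from each list.
const fused = fuseScores(
  [{ content: "A", score: 0.9 }, { content: "B", score: 0.6 }],
  [{ content: "B", score: 12 }, { content: "C", score: 6 }],
);
console.log(fused.map((d) => d.content)); // [ "B", "A", "C" ]
```

Keying by content is the simplest choice for a sketch. If chunks carry a stable ID in both indexes, keying by that ID would be more robust.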

Next, create the lexical index that lib/hybridSearch.ts imports. We will be using MiniSearch, an in-memory full-text search library, for this.

lib/lexicalIndex.ts
import MiniSearch from "minisearch";

export type LexicalDoc = {
    id: string;
    content: string;
    filePath: string;
};

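// In-memory full-text index over chunk content. `id` is MiniSearch's default ID field.
export const miniSearch = new MiniSearch<LexicalDoc>({
    fields: ["content"], // fields indexed for search
    storeFields: ["content", "filePath"], // fields returned with each search result
    searchOptions: {
        boost: {
            content: 2,
        },
        fuzzy: 0.2, // allow typos up to 20% of the term's length
    },
});

// Add newly ingested chunks to the index. MiniSearch rejects duplicate IDs.
export function addToLexicalIndex(docs: LexicalDoc[]) {
    miniSearch.addAll(docs);
}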

Update the Chunk and Ingest Function to Create the Lexical Index

lib/chunkAndIngest.ts
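import { addToLexicalIndex } from "./lexicalIndex";

// Inside the ingest function, wherever `chunks` and `filePath` are in scope:
addToLexicalIndex(
    chunks.map((chunk, i) => ({
        id: `${filePath}-${i}`, // unique per chunk within a file
        content: chunk,
        filePath,
    }))
);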

Update Chat API Logic

In the chat API route, we will switch the RAG retrieval step from the traditional Semantic Search to Hybrid Search. Replace everything after the ChromaDB call with the following:

app/api/chat/route.ts
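// At the top of the file. The "@/" alias is an assumption; adjust the path to match your project.
import { hybridSearch } from "@/lib/hybridSearch";

// Inside the handler: retrieve with Hybrid Search, then format the hits as numbered sources.
const ragResults = await hybridSearch(query);
const context = ragResults
    .map((r, i) => `Source ${i + 1} (${r.meta?.filePath ?? "unknown"}):\n${r.content}`)
    .join("\n\n");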
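To see what the resulting `context` string looks like, here is the same formatting step run on made-up data. `RagResult`, `buildContext`, and the sample entries are hypothetical and exist only for this illustration.

```typescript
type RagResult = { content: string; meta?: { filePath?: string } | null };

// Same formatting as in the route: a numbered source header, then the chunk text.
function buildContext(results: RagResult[]): string {
  return results
    .map((r, i) => `Source ${i + 1} (${r.meta?.filePath ?? "unknown"}):\n${r.content}`)
    .join("\n\n");
}

// Sample results: the second one has no metadata, so it falls back to "unknown".
const sample: RagResult[] = [
  { content: "JDBC timeouts are set per connection.", meta: { filePath: "notes/jdbc.md" } },
  { content: "Retry with backoff on transient errors.", meta: null },
];

console.log(buildContext(sample));
// Source 1 (notes/jdbc.md):
// JDBC timeouts are set per connection.
//
// Source 2 (unknown):
// Retry with backoff on transient errors.
```

Numbering the sources gives the model something to cite, and the file path tells the reader where each chunk came from.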

Next Steps

In the next section, we'll cover Delete Sessions.

Just like we have an option to delete the sessions and messages under a chat, we will add the same feature to our second brain project.
