Image Parsing

In this section, we add image parsing to our application. Images referenced in uploaded content are described by the LLM during ingestion, including any text, labels, and diagram relationships they contain, so that this content becomes retrievable. In the previous section, we already added the prompt that tells the LLM to return an image when one is present in the database. Note that images are stored under the public/uploads folder.

Creating a Markdown Sanitizer Library

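We will create a small library that defines the sanitization schema our application uses when rendering markdown, so that img and iframe elements are allowed through together with a fixed set of attributes. Create a file lib/markdownSanitizer.ts and add the following code: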

lib/markdownSanitizer.ts


import { defaultSchema } from "hast-util-sanitize";

export const markdownSchema = {
  ...defaultSchema,

  tagNames: [
    ...(defaultSchema.tagNames ?? []),
    "img",
    "iframe",
  ],

  attributes: {
    ...defaultSchema.attributes,

    img: [
      "src",
      "alt",
      "title",
      "width",
      "height",
      "loading",
    ],

    iframe: [
      "src",
      "allow",
      "allowfullscreen",
      "frameborder",
      "referrerpolicy",
    ],
  },
};
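To make the effect of this schema concrete, here is a minimal sketch of how a tag and attribute allow-list of this shape can be consulted. The `Schema` type, the `stubDefaultSchema` object, and `isAttributeAllowed` are hypothetical stand-ins written for illustration; they are not part of hast-util-sanitize, whose real matching rules are richer (for example, it also supports value restrictions and protocol checks).

```typescript
// Simplified stand-in for the shape used above; NOT the real hast-util-sanitize types.
type Schema = {
  tagNames: string[];
  attributes: Record<string, string[]>;
};

// Hypothetical stub playing the role of `defaultSchema`.
const stubDefaultSchema: Schema = {
  tagNames: ["p", "a"],
  attributes: { a: ["href"], "*": ["title"] },
};

// Same merge pattern as markdownSchema: spread the defaults, then extend them.
const schema: Schema = {
  ...stubDefaultSchema,
  tagNames: [...stubDefaultSchema.tagNames, "img", "iframe"],
  attributes: {
    ...stubDefaultSchema.attributes,
    img: ["src", "alt"],
    iframe: ["src", "allowfullscreen"],
  },
};

// An attribute survives if its tag is allowed and the attribute is listed
// either for that tag or under the "*" wildcard key.
function isAttributeAllowed(s: Schema, tag: string, attr: string): boolean {
  if (!s.tagNames.includes(tag)) return false;
  const forTag = s.attributes[tag] ?? [];
  const forAll = s.attributes["*"] ?? [];
  return forTag.includes(attr) || forAll.includes(attr);
}

console.log(isAttributeAllowed(schema, "img", "src"));     // true
console.log(isAttributeAllowed(schema, "img", "onerror")); // false
console.log(isAttributeAllowed(schema, "script", "src"));  // false
console.log(isAttributeAllowed(schema, "a", "href"));      // true
```

Spreading the defaults first is what keeps the existing rules (here, `a` and `*`) in place while the two new tags are added on top.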

Update the Chunk and Ingestion Logic

We will update the chunking and ingestion logic to handle images. Add the following function to lib/chunkAndInjest.ts; it ingests any images present in the markdown.

lib/chunkAndInjest.ts

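// Requires `fs` from "fs/promises" and `path` from "path"; `genAI` is assumed to be
// the Gemini client already created in this file.
async function processImagesInMarkdown(text: string): Promise<string> {
    // Matches ![alt text](path) and captures the alt text and the path
    const imageRegex = /!\[([^\]]*)\]\((.*?)\)/g;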
    const matches = [...text.matchAll(imageRegex)];

    if (matches.length === 0) return text;

    // Rebuild the text from match offsets instead of calling String.replace,
    // so a repeated, identical image tag is annotated once per occurrence.
    let processedText = "";
    let cursor = 0;

    for (const match of matches) {
        const [fullMatch, altText, imagePath] = match;

        // Copy everything up to and including this image tag unchanged
        const matchEnd = (match.index ?? 0) + fullMatch.length;
        processedText += text.slice(cursor, matchEnd);
        cursor = matchEnd;

        // Heuristic: is it a local upload?
        if (!imagePath.includes("uploads/")) continue;

        // Resolve path: remove leading slash if present
        let cleanPath = imagePath.startsWith("/") ? imagePath.slice(1) : imagePath;

        // Remove "public/" prefix if present to avoid duplication when joining with process.cwd()/public
        if (cleanPath.startsWith("public/")) {
            cleanPath = cleanPath.slice("public/".length);
        }

        const fullLocalPath = path.join(process.cwd(), "public", cleanPath);

        try {
            await fs.access(fullLocalPath);
            const data = await fs.readFile(fullLocalPath);
            const b64 = data.toString("base64");

            // Simple mime type detection; anything unrecognised falls back to PNG
            const ext = path.extname(fullLocalPath).toLowerCase();
            let mimeType = "image/png";
            if (ext === ".jpg" || ext === ".jpeg") mimeType = "image/jpeg";
            if (ext === ".webp") mimeType = "image/webp";

            const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
            const result = await model.generateContent([
                "Describe this image in detail for retrieval purposes. Focus on text, labels, and relationships in diagrams.",
                { inlineData: { data: b64, mimeType } }
            ]);

            const description = result.response.text();

            // Append the description as a blockquote right after the image tag
            processedText += `\n\n> **AI Description of ${altText || 'Image'}:**\n> ${description.trim().replace(/\n/g, "\n> ")}\n\n`;

        } catch (e) {
            // A missing file or a failed model call leaves the image tag as it was
            console.error(`Error processing image ${imagePath}:`, e);
        }
    }

    // Copy whatever follows the last image tag
    return processedText + text.slice(cursor);
}
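The matching, path clean-up, and MIME detection steps in this function can be tried out in isolation. The sketch below is only an illustration: `extractImages`, `toPublicRelativePath`, and `mimeTypeFor` are hypothetical helper names that do not exist in the project, and the code works on plain strings so that it runs without the file system or the Gemini client.

```typescript
type ImageRef = { alt: string; path: string };

// Hypothetical helper: list every ![alt](path) image found in a markdown string.
function extractImages(markdown: string): ImageRef[] {
  const imageRegex = /!\[([^\]]*)\]\((.*?)\)/g;
  const found: ImageRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = imageRegex.exec(markdown)) !== null) {
    found.push({ alt: m[1], path: m[2] });
  }
  return found;
}

// Hypothetical helper: turn a markdown image path into a path relative to public/,
// or return null when the image is not a local upload.
function toPublicRelativePath(imagePath: string): string | null {
  if (!imagePath.includes("uploads/")) return null;
  let clean = imagePath.startsWith("/") ? imagePath.slice(1) : imagePath;
  if (clean.startsWith("public/")) clean = clean.slice("public/".length);
  return clean;
}

// Hypothetical helper: extension-based MIME lookup with the same PNG fallback.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};

function mimeTypeFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  return MIME_BY_EXT[ext] ?? "image/png";
}

const doc = "Intro ![Diagram](/public/uploads/a.PNG) and ![Remote](https://example.com/b.jpg)";
for (const image of extractImages(doc)) {
  const local = toPublicRelativePath(image.path);
  console.log(image.alt, local, local ? mimeTypeFor(local) : "skipped");
}
```

With the sample string, the first image resolves to `uploads/a.PNG` and the second is skipped because its path does not contain `uploads/`.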

Call this function inside ingestTextIntoChroma, before the text is split into chunks:

lib/chunkAndInjest.ts


// 🖼️ Augment text with image descriptions
const augmentedText = await processImagesInMarkdown(text);
const chunks = await splitter.splitText(augmentedText);
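To see what the splitter receives, the text rewriting can be separated from the model call. The following is a sketch under that assumption: `annotateImages` is a hypothetical helper, and the descriptions are supplied up front in a map rather than generated by Gemini.

```typescript
// Hypothetical helper: insert a blockquote description after every image tag
// whose path has an entry in `descriptions` (image path -> description text).
function annotateImages(markdown: string, descriptions: Map<string, string>): string {
  const imageRegex = /!\[([^\]]*)\]\((.*?)\)/g;
  // A replacer function is called once per match, so repeated identical tags
  // are all annotated, and any "$" in a description is inserted literally.
  return markdown.replace(imageRegex, (fullMatch: string, alt: string, imagePath: string) => {
    const description = descriptions.get(imagePath);
    if (description === undefined) return fullMatch; // leave unknown images untouched
    const quoted = description.trim().replace(/\n/g, "\n> ");
    return `${fullMatch}\n\n> **AI Description of ${alt || "Image"}:**\n> ${quoted}\n\n`;
  });
}

const sample = "See ![Flow](/uploads/flow.png) for details.";
const augmented = annotateImages(
  sample,
  new Map([["/uploads/flow.png", "A flowchart.\nTwo boxes."]]),
);
console.log(augmented);
```

Because the description sits directly after the image tag, a chunk that contains the description will usually also contain the tag, which lets the image be returned alongside the retrieved text.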

Render Images from the Response in the Frontend

Add the following img renderer to the custom components of the markdown component:

app/chat/[id]/page.tsx


img({ src, alt }) {
    if (!src) return null;

    return (
        <div className="group relative my-8 overflow-hidden rounded-2xl border border-white/10 bg-zinc-900 shadow-2xl transition-all duration-300 hover:border-white/20">
            <Image
                src={(src as string).replace("/public", "")}
                alt={alt ?? "Knowledge Image"}
                className="h-auto w-full transition-transform duration-700 ease-out group-hover:scale-[1.02]"
                loading="lazy"
                width={1600}
                height={900}
                quality={100}
            />
        </div>
    );
},
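The `src` rewrite above removes the first "/public" it finds anywhere in the string. A stricter variant, sketched here under the assumption that this is a Next.js app (where files in public/ are served from the site root), only strips a leading prefix and leaves absolute URLs alone. `normalizeImageSrc` is a hypothetical helper name, not something the project already defines.

```typescript
// Hypothetical helper: map a stored markdown path to a URL the app can serve.
// Files in public/ are served from the site root, so "/public/uploads/a.png"
// has to become "/uploads/a.png".
function normalizeImageSrc(src: string): string {
  // Leave absolute http(s) URLs untouched
  if (/^https?:\/\//i.test(src)) return src;
  // Strip only a leading "public/" or "/public/"
  const stripped = src.replace(/^\/?public\//, "");
  // Make sure the result is root-relative
  return stripped.startsWith("/") ? stripped : `/${stripped}`;
}

console.log(normalizeImageSrc("/public/uploads/a.png")); // "/uploads/a.png"
console.log(normalizeImageSrc("uploads/a.png"));         // "/uploads/a.png"
```

It could be used in the renderer as `src={normalizeImageSrc(src as string)}`.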

Next Steps

In the next section, YouTube URL Parsing, we'll parse any YouTube video URL present in the knowledge base and display it in the response.
