01 INTRODUCTION

Let's Get Started

If you use ChatGPT, wonder how it works internally, and want to build your own version of it, this is the ONLY project you need to build.

🎯 What are we doing here?

We will build a complete end-to-end RAG (Retrieval-Augmented Generation) application that answers questions from your own custom knowledge base, and along the way learn how RAG works and how to build it from scratch. We will also dive deep into GitHub Actions to automate our flows whenever a new knowledge base needs to be added.

🧠 Project Flow

Ingestion Flow

The user's custom knowledge base is stored as Markdown (.md) files. Whenever the user adds new knowledge files under the knowledge folder and pushes the code to GitHub, a GitHub Actions workflow runs automatically and performs the following tasks:

Get Latest Changes
Fetch the latest changes from the knowledge folder in the repository.
Chunking
Split the Markdown content into smaller, manageable chunks.
Embedding Creation
Convert these chunks into vector embeddings using an embedding model.
Storage
Store the embeddings in ChromaDB, which acts as our vector database.
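The first two ingestion steps can be sketched in plain TypeScript. This is a minimal sketch under stated assumptions, not the project's actual code. It assumes the knowledge folder is named `knowledge/` and that the list of changed paths comes from something like `git diff --name-only`. It also assumes chunks are split first on Markdown headings and then by a character budget with overlap. The helper names (`selectKnowledgeFiles`, `chunkMarkdown`) and the default sizes are hypothetical. The embedding and ChromaDB storage steps are left out because they depend on the client libraries chosen.

```typescript
// Hypothetical helpers sketching the "Get Latest Changes" and "Chunking" steps.

interface Chunk {
  source: string;  // file the chunk came from
  heading: string; // nearest Markdown heading above the chunk ("" if none)
  text: string;    // chunk content
}

// Keep only Markdown files under the (assumed) knowledge/ folder.
function selectKnowledgeFiles(changedPaths: string[]): string[] {
  return changedPaths.filter(
    (p) => p.startsWith("knowledge/") && p.toLowerCase().endsWith(".md"),
  );
}

// Split Markdown into sections at headings, then cut long sections into
// windows of at most `maxChars` characters that overlap by `overlap` characters.
function chunkMarkdown(
  source: string,
  markdown: string,
  maxChars = 800,
  overlap = 100,
): Chunk[] {
  if (overlap >= maxChars) throw new Error("overlap must be smaller than maxChars");

  // A line starting with 1-6 '#' characters and whitespace opens a new section.
  const sections: { heading: string; lines: string[] }[] = [{ heading: "", lines: [] }];
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      sections.push({ heading: match[1].trim(), lines: [] });
    } else {
      sections[sections.length - 1].lines.push(line);
    }
  }

  const chunks: Chunk[] = [];
  for (const { heading, lines } of sections) {
    const body = lines.join("\n").trim();
    if (body.length === 0) continue; // skip sections with no text
    // Slide a window across the section; each step advances maxChars - overlap.
    for (let start = 0; start < body.length; start += maxChars - overlap) {
      chunks.push({ source, heading, text: body.slice(start, start + maxChars) });
      if (start + maxChars >= body.length) break; // this window reached the end
    }
  }
  return chunks;
}
```

Splitting on headings keeps each chunk focused on one topic, and the overlap preserves context that would otherwise be cut at a window boundary. This naive splitter also treats `#` lines inside fenced code as headings. A library splitter, such as one of LangChain's text splitters, would handle such cases more carefully.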

Retrieval Flow

The process to retrieve the data involves the following steps:

Semantic Search
Perform a semantic search to find relevant context.
Response Generation
Use the search results to generate a response and display it to the user.
Multi-modal Support
Start with text and code responses; future steps will add images and YouTube videos.
Hybrid Search
Improve retrieval by combining semantic search with lexical (keyword) search.
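Semantic search and the planned hybrid search can be illustrated with a small in-memory sketch. It assumes each chunk already has an embedding vector. In the real project these vectors would come from the embedding model and be queried through ChromaDB. The sketch ranks chunks by cosine similarity for the semantic side and by simple query-term counts for the lexical side. It then merges the two rankings with Reciprocal Rank Fusion (RRF). RRF, the constant `k = 60`, the keyword scorer, and all function names are assumptions chosen for illustration. A production lexical scorer would more likely be BM25.

```typescript
// Minimal in-memory sketch of semantic, lexical, and hybrid ranking.
// Embeddings are assumed to be precomputed; all names are hypothetical.

interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Lowercase and split on any run of non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Semantic ranking: doc ids sorted by cosine similarity to the query embedding.
function semanticRank(queryEmbedding: number[], docs: Doc[]): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .map((r) => r.id);
}

// Lexical ranking: doc ids sorted by how many query-term occurrences they contain.
// Docs that match no query term are dropped.
function lexicalRank(query: string, docs: Doc[]): string[] {
  const terms = new Set(tokenize(query));
  return docs
    .map((d) => ({ id: d.id, score: tokenize(d.text).filter((t) => terms.has(t)).length }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .map((r) => r.id);
}

// Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) per id, with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}

// Hybrid search: fuse the semantic and lexical rankings and keep the top results.
function hybridSearch(query: string, queryEmbedding: number[], docs: Doc[], topK = 3): string[] {
  const fused = reciprocalRankFusion([
    semanticRank(queryEmbedding, docs),
    lexicalRank(query, docs),
  ]);
  return fused.slice(0, topK);
}
```

Fusing ranks rather than raw scores sidesteps the fact that cosine similarities and keyword counts live on different scales. RRF only needs each list's ordering. A chunk that ranks well in both lists rises to the top even if neither list put it first.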

📦 Tech Stack

Next.js + Tailwind CSS

TypeScript

ChromaDB

MongoDB

GitHub Actions

LangChain

Google Gemini

Google Antigravity

Next Steps

In the next section, we’ll:

Set up the environment

Install necessary tools and libraries.

Initialise the Backend

Create the project structure and initial scripts.

Get Supabase Credentials

Connect your project to Supabase.
