01 INTRODUCTION

Let's Get Started

If you use ChatGPT, wonder how it works internally, and want to build your own version of it, this is the project to build.


🎯 What are we doing here?

We will build a complete end-to-end RAG (Retrieval-Augmented Generation) application that answers questions from your own custom knowledge base. Along the way, you will learn how RAG works and how to build it from scratch. We will also dive deep into GitHub Actions to automate the flow whenever new content is added to the knowledge base.

🧠 Project Flow

Ingestion Flow

The user has a custom knowledge base in .md (Markdown) format. Whenever the user adds new files under the knowledge folder and pushes the code to GitHub, a GitHub Action runs automatically and performs the following tasks:

  1. Get Latest Changes

    Fetch the latest changes from the knowledge folder in the repository.

  2. Chunking

    Convert the markdown contents into smaller, manageable chunks.

  3. Embedding Creation

    Convert these chunks into vector embeddings using an embedding model.

  4. Storage

    Store the embeddings into ChromaDB, which acts as our Vector Database.
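
The chunking step above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Chunk` shape, the `chunkMarkdown` name, and the 500-character default are all assumptions made for the example. It groups paragraphs (separated by blank lines) into chunks that stay under a size limit, and hard-splits any single paragraph that is too long on its own.

```typescript
// Hypothetical shape for a chunk; the field names are illustrative only.
interface Chunk {
  source: string; // path of the markdown file the chunk came from
  index: number;  // position of the chunk within that file
  text: string;   // the chunk contents
}

// Split markdown into chunks of at most `maxChars` characters,
// breaking on blank lines (paragraph boundaries) where possible.
function chunkMarkdown(source: string, markdown: string, maxChars = 500): Chunk[] {
  const paragraphs = markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current = "";

  // Push the paragraphs accumulated so far as one chunk.
  const flush = (): void => {
    if (current.length > 0) {
      chunks.push({ source, index: chunks.length, text: current });
      current = "";
    }
  };

  for (const paragraph of paragraphs) {
    // A paragraph longer than the limit is cut into fixed-size pieces.
    if (paragraph.length > maxChars) {
      flush();
      for (let i = 0; i < paragraph.length; i += maxChars) {
        chunks.push({ source, index: chunks.length, text: paragraph.slice(i, i + maxChars) });
      }
      continue;
    }

    const candidate = current.length === 0 ? paragraph : `${current}\n\n${paragraph}`;
    if (candidate.length > maxChars) {
      // Adding this paragraph would exceed the limit, so start a new chunk.
      flush();
      current = paragraph;
    } else {
      current = candidate;
    }
  }

  flush();
  return chunks;
}
```

Each resulting chunk would then be passed to an embedding model and stored in the vector database together with its `source` and `index`, so that a retrieved chunk can be traced back to the file it came from. Splitting on paragraph boundaries is one simple choice; splitting on headings or adding overlap between chunks are common alternatives.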

Retrieval Flow

The process to retrieve the data involves the following steps:

  1. Semantic Search

    Perform a semantic search to find relevant context.

  2. Response Generation

    Use the retrieved context to generate a response and display it to the user.

  3. Multi-modal Support

    Start with text and code responses. Later steps will add images and YouTube videos.

  4. Hybrid Search

    Improve retrieval by combining Semantic Search + Lexical Search.
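
To make the idea of semantic and hybrid search concrete, here is a small self-contained sketch. It is an illustration under stated assumptions, not the way ChromaDB or LangChain implement search: the `Doc` shape, the function names, the term-overlap lexical score, and the `alpha` weighting are all hypothetical. Semantic relevance is measured with cosine similarity between embedding vectors, lexical relevance as the fraction of query terms found in the text, and the two are blended into one score.

```typescript
// Hypothetical document shape: text plus a precomputed embedding.
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

interface SearchResult {
  id: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (normA * normB);
}

// Lowercase, split on non-alphanumeric characters, and remove duplicates.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t, i, all) => t.length > 0 && all.indexOf(t) === i);
}

// Simple lexical score: fraction of distinct query terms present in the text.
function lexicalScore(query: string, text: string): number {
  const queryTerms = tokenize(query);
  if (queryTerms.length === 0) return 0;
  const docTerms = tokenize(text);
  const hits = queryTerms.filter((t) => docTerms.indexOf(t) !== -1).length;
  return hits / queryTerms.length;
}

// Blend both scores: alpha = 1 is purely semantic, alpha = 0 purely lexical.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  docs: Doc[],
  alpha = 0.5,
  topK = 3
): SearchResult[] {
  return docs
    .map((doc) => ({
      id: doc.id,
      score:
        alpha * cosineSimilarity(queryEmbedding, doc.embedding) +
        (1 - alpha) * lexicalScore(query, doc.text),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In a real setup the query embedding would come from the same embedding model used during ingestion, and the lexical side is usually a stronger ranking function such as BM25. The weighted sum shown here is only one way to combine the two signals.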

📦 Tech Stack

Next.js + Tailwind CSS
TypeScript
ChromaDB
MongoDB
GitHub Actions
LangChain
Google Gemini
Google Antigravity

Next Steps

In the next section, we’ll:

  1. Set Up the Environment

    Install necessary tools and libraries.

  2. Initialise the Backend

    Create the project structure and initial scripts.

  3. Get Supabase Credentials

    Connect your project to Supabase.