Let's Get Started
If you use ChatGPT, wonder how it works internally, and want to build your own version of it, this is the ONLY project you need to build.
🎯 What are we doing here?
We will build a complete end-to-end RAG (Retrieval-Augmented Generation) application that answers questions from your own custom knowledge base. Along the way, you will learn how RAG works and how to build it from scratch. We will also dive deep into GitHub Actions to automate the ingestion flow whenever new content is added to the knowledge base.
🧠 Project Flow

Ingestion Flow
The user has a custom knowledge base in Markdown (.md) format. Whenever the user adds new content under the knowledge folder and pushes the code to GitHub, a GitHub Actions workflow is triggered automatically and performs the following tasks:
- Get Latest Changes
Fetch the latest changes from the knowledge folder in the repository.
- Chunking
Convert the markdown contents into smaller, manageable chunks.
- Embedding Creation
Convert these chunks into vector embeddings using an embedding model.
- Storage
Store the embeddings into ChromaDB, which acts as our Vector Database.
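The ingestion steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: `chunk_markdown`, `toy_embed`, and `ingest` are hypothetical names, `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and a plain dict stands in for the vector database.

```python
import hashlib
import math
import re


def chunk_markdown(text, max_chars=200):
    """Split markdown at blank lines and pack paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole in this sketch.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the chunk, so close it.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model: hashed word counts, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(path, text, store):
    """Chunk one file, embed each chunk, and upsert it into store.

    store is a plain dict standing in for the vector database (assumption).
    """
    for i, chunk in enumerate(chunk_markdown(text)):
        store[f"{path}#{i}"] = {"text": chunk, "embedding": toy_embed(chunk)}
    return store


doc = "# Title\n\nFirst paragraph about RAG.\n\nSecond paragraph about embeddings."
store = ingest("knowledge/intro.md", doc, {})
print(list(store))
```

Keying each chunk by file path and chunk index means re-running ingestion on a changed file overwrites its earlier entries instead of duplicating them, although stale entries remain if the file shrinks to fewer chunks.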
Retrieval Flow
The process to retrieve the data involves the following steps:
- Semantic Search
Perform a semantic search to find relevant context.
- Response Generation
Generate a response from the search results and display it to the user.
- Multi-modal Support
We start with text and code responses. Later steps will add images and YouTube videos.
- Hybrid Search
Improve retrieval by combining Semantic Search + Lexical Search.
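To make the idea of hybrid search concrete, here is a minimal Python sketch that blends a vector-similarity score with a keyword-overlap score. Everything in it is an assumption for illustration: the function names, the `alpha` weighting, and especially `toy_embed`, which is a bag-of-words placeholder and not truly semantic. In a real setup the `embed` argument would be an actual embedding model, and the similarity search would be run by the vector database.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text):
    """Hypothetical placeholder for an embedding model: a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_score(query, text):
    """Fraction of distinct query terms that appear verbatim in the chunk."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(text))) / len(query_terms)


def hybrid_search(query, chunks, embed=toy_embed, alpha=0.5, top_k=2):
    """Rank chunks by alpha * vector similarity + (1 - alpha) * keyword overlap."""
    query_vec = embed(query)
    scored = []
    for chunk in chunks:
        semantic = cosine(query_vec, embed(chunk))
        lexical = lexical_score(query, chunk)
        scored.append((alpha * semantic + (1 - alpha) * lexical, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    "ChromaDB stores vector embeddings.",
    "GitHub Actions automates the ingestion workflow.",
    "Markdown files live in the knowledge folder.",
]
results = hybrid_search("how are embeddings stored in ChromaDB", chunks)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

Setting `alpha` closer to 1 favours the vector score, while values closer to 0 favour exact keyword matches, which tends to help with identifiers and rare terms.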
📦 Tech Stack
Next Steps
In the next section, we’ll:
- Set up the environment
Install the necessary tools and libraries.
- Initialise the backend
Create the project structure and initial scripts.
- Get Supabase credentials
Connect your project to Supabase.
