How Does RAG Work? Retrieval-Augmented Generation Explained

You might have heard the term RAG, or Retrieval-Augmented Generation, but you might not know exactly what it is or how it works. This article is for you! Here, I will explain what RAG is, why it is needed, and how it works.

Before that, a quick dive into how an LLM works

An LLM is simply a deep-learning model trained on a large corpus of text data. Think of it this way: a human baby is born completely new to the world and doesn’t know anything. However, as the baby grows, they learn by observing, listening, reading, and more. By the age of 18 or so, they have accumulated significant knowledge about the world and can make informed decisions.

Similarly, an LLM is trained on a vast amount of text data from the internet. It has absorbed information from public forum conversations, online books, blogs, research papers, and more. As a result, it knows how to generate text in a human-like manner and possesses knowledge about the world based on what it has read.

You know what prompting is, but have you heard of the context window?

You interact with an LLM by writing prompts or instructions, just as you would when talking to a human. Now, imagine this scenario: you are explaining a complex task with multiple steps to a friend. After hearing your instructions, the friend says, 'Whoa, whoa... I missed that. Can you slow down so I can take notes?'

LLMs face a similar challenge. They can only handle a limited amount of input (your prompt and instructions) at one time. This limitation is known as the context window.
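
To make that limit concrete, here is a minimal sketch. The 8,000-token window and the "1 token ≈ 0.75 words" rule of thumb below are assumptions made up for illustration; every model ships its own tokenizer and its own limit.

```python
# Rough illustration of the context-window limit, not a real tokenizer.
# Assumptions: the 8,000-token limit and the "1 token ~ 0.75 words" rule
# of thumb are made up for illustration only.

CONTEXT_WINDOW_TOKENS = 8_000  # hypothetical limit for some model

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: assume roughly 0.75 words per token."""
    return int(len(text.split()) / 0.75)

book_text = "word " * 300_000  # stand-in for a 300,000-word book

print(estimate_tokens(book_text))                           # roughly 400,000 tokens
print(estimate_tokens(book_text) <= CONTEXT_WINDOW_TOKENS)  # False: the book won't fit
```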

Just tell me what RAG is!

Exactly, now you understand that there’s a limit to the amount of context or information you can pass to an LLM. Now, think about this scenario: you want the LLM to understand a huge book (which you’ll provide) and then answer questions on topics from that book.

The LLM is helpless here for two reasons. First, it hasn’t read this book during its initial training, so it has no prior knowledge of it. Second, the book is so large that it exceeds the context window limit of the LLM.

Ok, so RAG is a workaround for the context window size limit?

Exactly, you're getting there! RAG is a technique used to work around this context-window size limit. I'll explain how!

Consider the previous example. When you have a question about that book, what would you do manually? Would you read the book from page one all the way to the end? Of course not! Instead, you’d go to the index, find the relevant section, and refer to just that section, right?

This is what RAG is!

When you ask a question, another service matches the context of your question to different sections of the book. This is achieved by performing a similarity search on a vector database. (The details of this process are beyond the scope of this article.)

It’s important to note that this matching process is not done by the LLM itself but by a separate service.
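
Here is a minimal sketch of that retrieval step. In a real setup, the sections would be embedded with a learned embedding model and stored in a vector database (such as FAISS or Chroma); in this sketch, TF-IDF vectors stand in for the embeddings so the example stays self-contained, and the book sections are made up for illustration.

```python
# A minimal retrieval sketch. Real RAG systems use learned embeddings and a
# vector database; TF-IDF stands in for the embeddings here so the example
# runs on its own. The sections and question are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sections = [
    "Chapter 1: The history of the kingdom and its founding families.",
    "Chapter 7: How the irrigation system was engineered in the valley.",
    "Chapter 12: The treaty that ended the war between the two cities.",
]

question = "Who signed the treaty that ended the war?"

# Turn the sections and the question into vectors, then rank the sections
# by how similar they are to the question.
vectorizer = TfidfVectorizer()
section_vectors = vectorizer.fit_transform(sections)
question_vector = vectorizer.transform([question])

scores = cosine_similarity(question_vector, section_vectors)[0]
best_section = sections[scores.argmax()]
print(best_section)  # Chapter 12 is the closest match to the question
```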

Once the relevant section is identified, it is retrieved and combined with your original question. This process is called augmenting the prompt (the "A" in RAG).
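
As a sketch of what "augmenting" means in practice, the retrieved section is simply placed alongside the question in the prompt that is sent to the LLM. The variables best_section and question come from the retrieval sketch above, and send_to_llm is a hypothetical placeholder for whichever LLM client you use.

```python
# A minimal sketch of augmenting the prompt: the retrieved section is placed
# in front of the user's question before anything is sent to the LLM.
# best_section and question come from the retrieval sketch above;
# send_to_llm is a hypothetical placeholder for your LLM client of choice.
augmented_prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{best_section}\n\n"
    f"Question: {question}"
)

# answer = send_to_llm(augmented_prompt)  # hypothetical LLM call
print(augmented_prompt)
```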

With this additional knowledge, the LLM is now equipped to reason about and answer your question accurately and effectively.

It’s a simple concept, but it significantly improves the performance of an LLM!

If you’re interested in learning more AI topics explained in a simplified way, consider following me here. I’ll be sharing more articles that break down complex concepts into easy-to-understand explanations.