Apr 23, 2024

How To Build a RAG Application?

Imagine a world where AI assistants can not only answer your questions but also provide insightful summaries, craft compelling creative text formats, and even access and integrate the latest information. This is the power of Retrieval-Augmented Generation (RAG) applications.

LLMs have revolutionized tasks like question answering and content creation. However, their reliance on static training data can limit their access to the most recent information and to domain-specific knowledge. RAG applications bridge this gap by combining the best of both worlds: the information retrieval capabilities of search engines and the powerful text generation abilities of LLMs.
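
At its core, the loop is simple: retrieve text relevant to the user's question, add it to the prompt, and let the model generate a grounded answer. Here is a minimal conceptual sketch in Python; search_index and llm_generate are hypothetical stand-ins for a real vector store query and a real LLM call, not actual library functions.

def search_index(query, k=3):
    # Hypothetical retriever: a real app would query a vector store here
    return ["relevant chunk 1", "relevant chunk 2", "relevant chunk 3"][:k]

def llm_generate(prompt):
    # Hypothetical generator: a real app would call an LLM API here
    return f"Answer based on: {prompt[:60]}..."

def rag_answer(question):
    context = "\n".join(search_index(question))               # 1. retrieve
    prompt = f"Context:\n{context}\n\nQuestion: {question}"   # 2. augment the prompt
    return llm_generate(prompt)                               # 3. generate

print(rag_answer("What is Retrieval-Augmented Generation?"))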

This article empowers you to build your own RAG application. We'll walk you through the step-by-step process, from understanding the core components to implementing them for real-world use cases.

Building Your RAG Application: A Step-by-Step Guide

The world of RAG applications is no longer just for tech giants. With the right tools and guidance, you can build your own intelligent assistant! Here's a breakdown of the key steps involved:

Step 1: Setting up the Environment

Before we jump into building the RAG application, we need to set up the environment. That means installing a handful of Python packages: jupyterlab, openai, langchain, panel (version 1.3 or later), and pypdf.
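
If you are starting from a clean environment, a single pip command covers these. Note that the Chroma vector store used in Step 2 also needs the chromadb package, so it is included here as well; pin exact versions as you see fit:

pip install jupyterlab openai langchain "panel>=1.3" pypdf chromadb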

Step 2: Initiating LangChain RAG Application Development

One of the best frameworks available to developers who want to build applications with LLM capabilities is LangChain. It provides the building blocks (document loaders, text splitters, embeddings, vector stores, and chains) for grounding an LLM's responses in your own, up-to-date data rather than only its training data.

There are multiple RAG approaches available in LangChain; in this example, we will use the RetrievalQA chain. Below is the function that initializes the chain using the RetrievalQA method:

Code Snippet

import os
import tempfile

import panel as pn
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

def my_chain_initialization():
    # Set the OpenAI API key from the password widget (defined in Step 3)
    if my_key_input.value:
        os.environ["OPENAI_API_KEY"] = my_key_input.value

    # Reuse a previously built chain if the same PDF, k and chain type were selected
    choices = (my_pdf_input.value, my_k_slider.value, my_chain_select.value)
    if choices in pn.state.cache:
        return pn.state.cache[choices]

    my_chat_input.placeholder = "Ask any question here!"

    # Load the uploaded PDF from a temporary file
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        f.write(my_pdf_input.value)
    file_name = f.name
    loader = PyPDFLoader(file_name)
    documents = loader.load()
    # Split the documents into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    # Select the embeddings to use
    embeddings = OpenAIEmbeddings()
    # Create the vector store to use as the index
    db = Chroma.from_documents(texts, embeddings)
    # Expose this index in a retriever interface
    retriever = db.as_retriever(
        search_type="similarity", search_kwargs={"k": my_k_slider.value}
    )
    # Create a chain for answering questions
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type=my_chain_select.value,
        retriever=retriever,
        return_source_documents=True,
        verbose=True,
    )
    # Cache the chain so repeated questions don't rebuild the index
    pn.state.cache[choices] = qa
    return qa

The code above sets up a retrieval-based question-answering pipeline. It sets the OpenAI API key, loads the uploaded PDF, splits it into manageable chunks, and embeds each chunk. The embeddings are indexed in a Chroma vector store and exposed through a retriever, so the chunks most relevant to a user query can be fetched quickly. Finally, a RetrievalQA chain built on an OpenAI language model uses those retrieved chunks to answer user questions.
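
Once the chain is initialized you can also query it directly, which is a handy sanity check before wiring up the UI. The snippet below assumes the widgets from Step 3 are already defined and a PDF has been uploaded:

qa = my_chain_initialization()
response = qa({"query": "What is the main topic of this document?"})
print(response["result"])                    # the generated answer
for doc in response["source_documents"]:     # the chunks the answer was grounded on
    print(doc.metadata["page"], doc.page_content[:80])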

Step 3: Configuring Panel Components (Widgets)

In this step, we’ll define and configure the visual components (widgets) that provide the interface for interacting with our RAG application. You’ll need to consider the layout and functionality of these widgets based on your requirements. In this example we define five Panel widgets:

  • my_pdf_input: accepts the PDF file uploaded by the user.
  • my_k_slider: selects how many relevant text chunks (k) to retrieve.
  • my_key_input: accepts the OpenAI API key.
  • my_chain_select: selects which chain type to use.
  • my_chat_input: the text input where the user types questions.

Code Snippet

import panel as pn 
pn.extension()

my_pdf_input = pn.widgets.FileInput(accept=".pdf", value="", height=50)
my_key_input = pn.widgets.PasswordInput(
    name="My OpenAI Key",
    placeholder="key...",
)
my_k_slider = pn.widgets.IntSlider(
    name="The Number of Relevant Chunks", start=1, end=5, step=1, value=2
)
my_chain_select = pn.widgets.RadioButtonGroup(
    name="The Chain Type", options=["stuff", "map_reduce", "refine", "map_rerank"]
)
my_chat_input = pn.widgets.TextInput(placeholder="Firstly, please upload a PDF file!")
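
Before wiring these widgets into the chat interface, you can optionally preview them in a simple column layout (for example in JupyterLab) to check their look and behavior. This is just a quick inspection aid, not part of the final app:

pn.Column(my_key_input, my_pdf_input, my_k_slider, my_chain_select, my_chat_input)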

Step 4: Building a Chat Interface

We have now almost completed the development; all that's left is a chat interface so we can ask questions about our documents. For this we'll use Panel's ChatInterface component. Below is the code:

async def my_respond(contents, user, my_chat_interface):
    # Ask for a PDF first if none has been uploaded yet
    if not my_pdf_input.value:
        my_chat_interface.send(
            {"user": "System", "value": "Please first upload a PDF!"}, respond=False
        )
        return
    # On the first message after upload, switch to the question input widget
    elif my_chat_interface.active == 0:
        my_chat_interface.active = 1
        my_chat_interface.active_widget.placeholder = "Ask questions here!"
        yield {"user": "MyOpenAI", "value": "Let's chat about the PDF!"}
        return

    # Build (or fetch the cached) RetrievalQA chain and answer the question
    qa = my_chain_initialization()
    response = qa({"query": contents})
    # Show the answer followed by the source chunks it was based on
    answers = pn.Column(response["result"])
    answers.append(pn.layout.Divider())
    for doc in response["source_documents"][::-1]:
        answers.append(f"**Page {doc.metadata['page']}**:")
        answers.append(f"```\n{doc.page_content}\n```")
    yield {"user": "MyOpenAI", "value": answers}

my_chat_interface = pn.chat.ChatInterface(
    callback=my_respond, sizing_mode="stretch_width", widgets=[my_pdf_input, my_chat_input]
)
my_chat_interface.send(
    {"user": "System", "value": "Please first upload a PDF and click send!"},
    respond=False,
)

Step 5: Finalizing The Look

This is the final step, where we bind our components to the chat interface of our RAG application. Any Panel template can be used for this; in this example we use BootstrapTemplate to organize the widgets.

my_template = pn.template.BootstrapTemplate(
    sidebar=[my_key_input, my_k_slider, my_chain_select], main=[my_chat_interface]
)
my_template.servable()

To run the app, save the code as app.py and execute the command “panel serve app.py” in the terminal.
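
During development it can be convenient to let Panel reload the app when the file changes and open the browser automatically; both are standard panel serve flags:

panel serve app.py --autoreload --show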

Evaluating The Quality of Your RAG

Now that the development phase is over, let's talk about evaluating your RAG application with LangWatch. LangWatch offers a whole range of tools to help you understand and improve the behavior of your LLM applications: you can improve the quality of your LLM app, gain insights into user activity, and examine individual interactions.

With LangWatch's Evaluations, you can better understand your users' interactions and sentiment, and pinpoint areas that need improvement. Alongside evaluation criteria like "Reliability and Faithfulness scores," the platform also provides tools for evaluating quality and performance, such as detecting jailbreak attempts and biased outputs, as well as real-time mitigation of hallucinated responses.
Reach out to us at contact@langwatch.ai and we are happy to help you further.