This example shows how to build a Retrieval-Augmented Generation (RAG) system that automatically redacts PII from both user queries and retrieved documents before they are sent to the LLM.

Overview

RAG systems often process sensitive documents (HR records, customer data, legal contracts). This example demonstrates:
  • Auto-redacting PII from user queries
  • Sanitizing retrieved documents before LLM context
  • Maintaining answer quality while protecting data

Architecture
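
User query → PromptGuard query scan/redaction → vector store retrieval → sanitization of retrieved documents → LLM call → response scan → final answer.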

Implementation

Setup

pip install promptguard-sdk openai chromadb

Basic RAG with Protection

import promptguard
from openai import OpenAI
import chromadb

# Initialize PromptGuard with response scanning
promptguard.init(
    api_key="pg_xxx",
    mode="enforce",
    scan_responses=True,
)

# Initialize clients
openai_client = OpenAI()
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("documents")
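
# The collection starts empty in this snippet. Seed it with a few documents
# before querying, using chromadb's standard add() call. The ids and record
# texts below are purely illustrative; in practice they come from your own corpus.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "John Smith, Senior Engineer, john.smith@company.com, (555) 123-4567.",
        "Jane Doe is enrolled in the standard benefits plan.",
    ],
)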

def secure_rag_query(user_query: str) -> str:
    # Step 1: The user query is auto-scanned by PromptGuard (configured via promptguard.init above)

    # Step 2: Retrieve relevant documents
    results = collection.query(
        query_texts=[user_query],
        n_results=5
    )

    # Step 3: Build context from retrieved docs
    context = "\n\n".join(results["documents"][0])

    # Step 4: LLM call - auto-scanned by PromptGuard
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": f"Answer based on this context:\n\n{context}"
            },
            {
                "role": "user",
                "content": user_query
            }
        ]
    )

    return response.choices[0].message.content

# Example usage
answer = secure_rag_query("What's the salary for employee John Smith?")
print(answer)

With LangChain

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
from promptguard.integrations.langchain import PromptGuardCallbackHandler

pg_handler = PromptGuardCallbackHandler(
    api_key="pg_xxx",
    scan_responses=True,
)

llm = ChatOpenAI(model="gpt-4", callbacks=[pg_handler])
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(embedding_function=embeddings)
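
# The vector store above is empty; index your documents before querying,
# using the standard add_texts() call. The text below is illustrative only.
vectorstore.add_texts([
    "Jane Doe is enrolled in the standard benefits plan.",
])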

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

result = qa_chain.invoke({"query": "What benefits does Jane Doe have?"})
answer = result["result"]

PII Types Detected

Type          Example                Redacted As
Names         John Smith             [PERSON]
Email         john@company.com       [EMAIL]
Phone         (555) 123-4567         [PHONE]
SSN           123-45-6789            [SSN]
Credit Card   4532-1234-5678-9012    [CREDIT_CARD]
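
For intuition, this is roughly what redaction does to text before it reaches the LLM. This is a minimal sketch using only the example values from the table above; the actual substitution happens automatically inside PromptGuard.

original_chunk = "Contact John Smith at john@company.com or (555) 123-4567."
redacted_chunk = "Contact [PERSON] at [EMAIL] or [PHONE]."  # what the LLM sees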

Next Steps