This example shows how to build a Retrieval-Augmented Generation (RAG) system that automatically redacts PII from both user queries and retrieved documents before they are sent to the LLM.

Overview

RAG systems often process sensitive documents (HR records, customer data, legal contracts). This example demonstrates:
  • Auto-redacting PII from user queries
  • Sanitizing retrieved documents before LLM context
  • Maintaining answer quality while protecting data

Architecture
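
User query → PromptGuard query scan/redaction → vector store retrieval → sanitization of retrieved documents → LLM call → response scan → final answer.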

Implementation

Setup

pip install promptguard-sdk openai chromadb

Basic RAG with Protection

import promptguard
from openai import OpenAI
import chromadb

# Initialize PromptGuard with response scanning
promptguard.init(
    api_key="pg_xxx",
    mode="enforce",
    scan_responses=True,
)

# Initialize clients
openai_client = OpenAI()
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("documents")
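
# The collection starts empty in this snippet. Seed it with a few documents
# before querying, using chromadb's standard add() call. The ids and record
# texts below are purely illustrative; in practice they come from your own corpus.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "John Smith, Senior Engineer, john.smith@company.com, (555) 123-4567.",
        "Jane Doe is enrolled in the standard benefits plan.",
    ],
)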

def secure_rag_query(user_query: str) -> str:
    # Step 1: The user query is auto-scanned by PromptGuard (configured via promptguard.init above)

    # Step 2: Retrieve relevant documents
    results = collection.query(
        query_texts=[user_query],
        n_results=5
    )

    # Step 3: Build context from retrieved docs
    context = "\n\n".join(results["documents"][0])

    # Step 4: LLM call - auto-scanned by PromptGuard
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": f"Answer based on this context:\n\n{context}"
            },
            {
                "role": "user",
                "content": user_query
            }
        ]
    )

    return response.choices[0].message.content

# Example usage
answer = secure_rag_query("What's the salary for employee John Smith?")
print(answer)

With LangChain

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
from promptguard.integrations.langchain import PromptGuardCallbackHandler

pg_handler = PromptGuardCallbackHandler(
    api_key="pg_xxx",
    scan_responses=True,
)

llm = ChatOpenAI(model="gpt-4", callbacks=[pg_handler])
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(embedding_function=embeddings)
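
# The vector store above is empty; index your documents before querying,
# using the standard add_texts() call. The text below is illustrative only.
vectorstore.add_texts([
    "Jane Doe is enrolled in the standard benefits plan.",
])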

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

result = qa_chain.invoke({"query": "What benefits does Jane Doe have?"})
answer = result["result"]

PII Types Detected

Type          Example                Redacted As
Names         John Smith             [PERSON]
Email         john@company.com       [EMAIL]
Phone         (555) 123-4567         [PHONE]
SSN           123-45-6789            [SSN]
Credit Card   4532-1234-5678-9012    [CREDIT_CARD]
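
For intuition, this is roughly what redaction does to text before it reaches the LLM. This is a minimal sketch using only the example values from the table above; the actual substitution happens automatically inside PromptGuard.

original_chunk = "Contact John Smith at john@company.com or (555) 123-4567."
redacted_chunk = "Contact [PERSON] at [EMAIL] or [PHONE]."  # what the LLM sees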

Next Steps