Secure Scraping API

The Scraping API allows AI agents to safely fetch web content while automatically scanning for threats like indirect prompt injections and malicious payloads.

Why Use Secure Scraping?

When AI agents browse the web, they can encounter:
  • Indirect Prompt Injections: Hidden instructions in web pages designed to hijack the agent
  • Malicious Content: Scripts, hidden text, or encoded payloads
  • Exfiltration Attempts: Content designed to extract sensitive data
PromptGuard’s scraping API scans all content before it reaches your AI agent.

Endpoints

Scrape URL

Securely scrape a single URL.
POST /api/v1/scrape
Request Body
{
  "url": "https://example.com/article",
  "render_js": false,
  "extract_text": true,
  "timeout": 30
}
Response
{
  "url": "https://example.com/article",
  "status": "safe",
  "content": "Article content here...",
  "threats_detected": [],
  "message": "Content scanned and deemed safe."
}
Blocked Response
{
  "url": "https://malicious-site.com/page",
  "status": "blocked",
  "content": "",
  "threats_detected": ["indirect_prompt_injection"],
  "message": "Malicious pattern detected: hidden instruction found"
}
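As a minimal sketch of calling this endpoint without the SDK, the following builds the documented request body and gates on the documented response fields. The host in `BASE_URL` and the bearer-token `Authorization` header are assumptions for illustration (neither is specified in this section), and the helper names `build_scrape_request` and `safe_content` are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_scrape_request(url, api_key, render_js=False, extract_text=True, timeout=30):
    """Build (but do not send) a POST request for /api/v1/scrape."""
    body = {
        "url": url,
        "render_js": render_js,
        "extract_text": extract_text,
        "timeout": timeout,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the scheme is not documented here.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def safe_content(response):
    """Return the page content only if the scan reported it safe, else None."""
    if response.get("status") == "safe" and not response.get("threats_detected"):
        return response.get("content", "")
    return None
```

The request can then be sent with `urllib.request.urlopen(req)` and the body decoded with `json.load`. Checking both `status` and `threats_detected` is a deliberately conservative choice: content is passed on only when both fields agree that it is clean.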

Batch Scrape

Submit multiple URLs in a single request. The response returns a job ID while the URLs are processed.
POST /api/v1/scrape/batch
Request Body
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2"
  ],
  "render_js": false,
  "extract_text": true
}
Response
{
  "job_id": "batch_abc123",
  "status": "processing",
  "urls_submitted": 2
}
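The batch request body can be assembled on the client before submission. The sketch below validates and de-duplicates URLs, then compares `urls_submitted` in the response with the number of URLs sent. The helper names are hypothetical, and client-side de-duplication is a design choice made here, not documented API behavior.

```python
from urllib.parse import urlparse


def build_batch_body(urls, render_js=False, extract_text=True):
    """Return a request body for /api/v1/scrape/batch with unique, absolute URLs."""
    seen = set()
    unique = []
    for u in urls:
        parsed = urlparse(u)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {u!r}")
        if u not in seen:  # keep first occurrence, preserve order
            seen.add(u)
            unique.append(u)
    return {"urls": unique, "render_js": render_js, "extract_text": extract_text}


def accepted_all(body, response):
    """True if the API reports as many submitted URLs as the body contained."""
    return response.get("urls_submitted") == len(body["urls"])
```

A mismatch reported by `accepted_all` is worth logging alongside the `job_id`, since it suggests that some URLs were not accepted.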

SDK Usage

from promptguard import PromptGuard

pg = PromptGuard(api_key="pg_xxx")

# Scrape a URL safely
result = pg.scrape.url("https://example.com/article")

if result["status"] == "safe":
    content = result["content"]
    # Pass to your AI agent
else:
    print(f"Blocked: {result['message']}")

Threat Detection

The scraping API detects:
Threat | Description
indirect_prompt_injection | Hidden instructions in HTML comments, CSS, or JavaScript
malicious_script | Dangerous JavaScript or encoded payloads
hidden_content | Invisible text or elements designed to manipulate AI
exfiltration_attempt | Content designed to extract data via the AI
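One way to use these identifiers is to turn the `threats_detected` list of a response into readable log lines. This is a sketch only: the mapping copies the descriptions listed above, `describe_threats` is a hypothetical helper, and the fallback text for unknown identifiers is an assumption in case the API reports a type not listed here.

```python
# Descriptions copied from the threat list above.
THREAT_DESCRIPTIONS = {
    "indirect_prompt_injection": "Hidden instructions in HTML comments, CSS, or JavaScript",
    "malicious_script": "Dangerous JavaScript or encoded payloads",
    "hidden_content": "Invisible text or elements designed to manipulate AI",
    "exfiltration_attempt": "Content designed to extract data via the AI",
}


def describe_threats(result):
    """Return one 'identifier: description' line per detected threat."""
    return [
        # Assumption: unknown identifiers are reported rather than raising.
        f"{threat}: {THREAT_DESCRIPTIONS.get(threat, 'unrecognized threat type')}"
        for threat in result.get("threats_detected", [])
    ]
```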

Options

Parameter | Type | Default | Description
render_js | boolean | false | Render JavaScript (slower but more complete)
extract_text | boolean | true | Extract clean text only
timeout | integer | 30 | Request timeout in seconds
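The defaults above can be captured in a small value object so that callers override only what they need. This is an illustrative sketch: `ScrapeOptions` is a hypothetical class, not part of the SDK, and the positive-integer check on `timeout` is an assumption, since no bounds are documented.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScrapeOptions:
    """Request options with the documented defaults."""
    render_js: bool = False
    extract_text: bool = True
    timeout: int = 30  # seconds

    def __post_init__(self):
        # Assumption: timeout must be a positive integer; bool is rejected
        # explicitly because it is a subclass of int in Python.
        if isinstance(self.timeout, bool) or not isinstance(self.timeout, int) or self.timeout <= 0:
            raise ValueError("timeout must be a positive integer number of seconds")

    def to_body(self, url):
        """Merge the options with a target URL into a request body."""
        return {"url": url, **asdict(self)}
```

For example, `ScrapeOptions(render_js=True).to_body("https://example.com/article")` yields a body that enables JavaScript rendering and keeps the other defaults.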