Secure Scraping API

The Scraping API allows AI agents to safely fetch web content while automatically scanning for threats like indirect prompt injections and malicious payloads.

Why Use Secure Scraping?

When AI agents browse the web, they can encounter:

Indirect Prompt Injections: Hidden instructions in web pages designed to hijack the agent
Malicious Content: Scripts, hidden text, or encoded payloads
Exfiltration Attempts: Content designed to extract sensitive data

PromptGuard’s scraping API scans all content before it reaches your AI agent.

Endpoints

Scrape URL

Securely scrape a single URL.

POST /api/v1/scrape

Request Body

{
  "url": "https://example.com/article",
  "render_js": false,
  "extract_text": true,
  "timeout": 30
}
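If you call the endpoint without an SDK, Python's standard library is enough to send the request body above. This is a minimal sketch under two assumptions that this page does not document. The base URL `https://api.promptguard.example` is a hypothetical placeholder. The `Authorization: Bearer` header is an assumed auth scheme. Confirm both against your account and the API keys documentation. The helper only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

# Hypothetical placeholder: this page does not document the API host.
BASE_URL = "https://api.promptguard.example"


def build_scrape_request(api_key: str, url: str, *, render_js: bool = False,
                         extract_text: bool = True,
                         timeout: int = 30) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/scrape request."""
    body = {
        "url": url,
        "render_js": render_js,
        "extract_text": extract_text,
        "timeout": timeout,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; verify against the API keys documentation.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, socket_timeout: float = 35.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=socket_timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `result = send(build_scrape_request("pg_xxx", "https://example.com/article"))`. The client-side socket timeout is set slightly above the 30-second server-side `timeout`, so the server can finish and report its own result first.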

Safe Response

{
  "url": "https://example.com/article",
  "status": "safe",
  "content": "Article content here...",
  "threats_detected": [],
  "message": "Content scanned and deemed safe."
}

Blocked Response

{
  "url": "https://malicious-site.com/page",
  "status": "blocked",
  "content": "",
  "threats_detected": ["indirect_prompt_injection"],
  "message": "Malicious pattern detected: hidden instruction found"
}
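Both response shapes carry the same fields, so client code can branch on `status` alone. The sketch below assumes, as in the two examples above, that `status` is either `"safe"` or `"blocked"`. Any other value is treated as an error, so unrecognized content is never passed through. `BlockedContentError` and `safe_content` are hypothetical names used only for illustration.

```python
class BlockedContentError(Exception):
    """Raised when the scraping API withholds content because of detected threats."""

    def __init__(self, url: str, threats: list, message: str):
        super().__init__(f"{url}: {message} (threats: {', '.join(threats) or 'none'})")
        self.url = url
        self.threats = threats


def safe_content(response: dict) -> str:
    """Return scanned content, or raise if the page was not deemed safe."""
    status = response.get("status")
    if status == "safe":
        return response["content"]
    if status == "blocked":
        raise BlockedContentError(
            response.get("url", ""),
            list(response.get("threats_detected", [])),
            response.get("message", ""),
        )
    # Fail closed on any status this sketch does not recognize.
    raise ValueError(f"unexpected status: {status!r}")
```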

Batch Scrape

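Submit multiple URLs in a single request. Batches are processed asynchronously: the response acknowledges the job with a `job_id` instead of returning scraped content.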

POST /api/v1/scrape/batch

Request Body

{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2"
  ],
  "render_js": false,
  "extract_text": true
}
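Before submitting a batch, you may want to normalize the URL list on the client. The sketch below removes duplicate URLs while preserving their order, then builds the request body shown above. Deduplication is a client-side choice made here, not documented API behavior. The batch body above has no `timeout` field, so the sketch omits it as well. `build_batch_body` is a hypothetical helper name.

```python
from typing import Iterable


def build_batch_body(urls: Iterable[str], *, render_js: bool = False,
                     extract_text: bool = True) -> dict:
    """Build a POST /api/v1/scrape/batch body with duplicate URLs removed."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    unique_urls = list(dict.fromkeys(urls))
    if not unique_urls:
        raise ValueError("at least one URL is required")
    return {
        "urls": unique_urls,
        "render_js": render_js,
        "extract_text": extract_text,
    }
```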

Response

{
  "job_id": "batch_abc123",
  "status": "processing",
  "urls_submitted": 2
}
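The batch endpoint returns an acknowledgment, not scraped content. This page does not document how to retrieve finished results. The sketch below therefore only parses the acknowledgment and checks that the server accepted every URL that was sent. `BatchJob` and `parse_batch_ack` are hypothetical client-side names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchJob:
    job_id: str
    status: str
    urls_submitted: int


def parse_batch_ack(response: dict, urls_sent: int) -> BatchJob:
    """Parse a batch acknowledgment and verify the submitted URL count."""
    job = BatchJob(
        job_id=str(response["job_id"]),
        status=str(response["status"]),
        urls_submitted=int(response["urls_submitted"]),
    )
    # A mismatch suggests some URLs were rejected or dropped server-side.
    if job.urls_submitted != urls_sent:
        raise ValueError(
            f"sent {urls_sent} URLs but server accepted {job.urls_submitted}"
        )
    return job
```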

SDK Usage

Python

from promptguard import PromptGuard

pg = PromptGuard(api_key="pg_xxx")

# Scrape a URL safely
result = pg.scrape.url("https://example.com/article")

if result["status"] == "safe":
    content = result["content"]
    # Pass to your AI agent
else:
    print(f"Blocked: {result['message']}")

Node.js

import { PromptGuard } from 'promptguard-sdk';

const pg = new PromptGuard({ apiKey: 'pg_xxx' });

// Scrape a URL safely
const result = await pg.scrape.url('https://example.com/article');

if (result.status === 'safe') {
  const content = result.content;
  // Pass to your AI agent
} else {
  console.log(`Blocked: ${result.message}`);
}

Threat Detection

The scraping API reports the following threat codes in the `threats_detected` field:

Threat	Description
`indirect_prompt_injection`	Hidden instructions in HTML comments, CSS, or JavaScript
`malicious_script`	Dangerous JavaScript or encoded payloads
`hidden_content`	Invisible text or elements designed to manipulate the AI agent
`exfiltration_attempt`	Content designed to extract data via the AI
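Because `threats_detected` contains codes from this table, a client can parse them into an enum and use it to route alerts or logging. The sketch below mirrors the four documented codes. Codes it does not recognize are returned separately instead of being dropped, so a newly introduced threat type is not silently ignored. `Threat` and `classify_threats` are hypothetical client-side names.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Threat(str, Enum):
    INDIRECT_PROMPT_INJECTION = "indirect_prompt_injection"
    MALICIOUS_SCRIPT = "malicious_script"
    HIDDEN_CONTENT = "hidden_content"
    EXFILTRATION_ATTEMPT = "exfiltration_attempt"


def classify_threats(codes: Iterable[str]) -> Tuple[List[Threat], List[str]]:
    """Split threats_detected into known Threat members and unrecognized codes."""
    known: List[Threat] = []
    unknown: List[str] = []
    for code in codes:
        try:
            known.append(Threat(code))
        except ValueError:
            # Not one of the documented codes; keep it for reporting.
            unknown.append(code)
    return known, unknown
```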

Options

Parameter	Type	Default	Description
`render_js`	boolean	false	Render JavaScript before extraction (slower, but captures dynamically loaded content)
`extract_text`	boolean	true	Extract clean text only
`timeout`	integer	30	Request timeout in seconds
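The defaults in this table can be mirrored on the client so that options are checked before a request is sent. The sketch below fills in the documented defaults, rejects unknown option names, and rejects values of the wrong type. This page does not state an upper bound for `timeout`, so the sketch only requires a positive integer. `resolve_options` is a hypothetical helper name.

```python
DEFAULT_OPTIONS = {"render_js": False, "extract_text": True, "timeout": 30}


def resolve_options(**overrides) -> dict:
    """Merge caller overrides onto the documented defaults and type-check them."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    for name in ("render_js", "extract_text"):
        if not isinstance(options[name], bool):
            raise TypeError(f"{name} must be a boolean")
    timeout = options["timeout"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        raise ValueError("timeout must be a positive integer (seconds)")
    return options
```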
