LLM Parse Website

Search a domain (or directly scrape a URL) and extract structured data using AI. Unlike the JSON API Request agent, this agent can handle HTML pages and extract human-readable content. Pages are loaded in a real web browser, so the agent also works on sites that require JavaScript rendering.

Overview

This agent is designed for on-chain callers that need trustless, auditable web extraction:

  • Find relevant pages via search, or scrape an explicit URL

  • Fetch HTML and generate markdown

  • Build a context window across sources

  • Run a structured LLM extraction

  • Return a typed, ABI-encoded response

Example Use Cases

  • Scraping price information from e-commerce sites

  • Extracting news headlines for on-chain curation

  • Monitoring website content for changes

  • Building oracles for data not available via APIs

How It Works


Flow Stages (Receipt)

The agent emits receipt steps for auditability. Each step includes timing, inputs, and outputs.

  1. request: Input parameters and schema

  2. search: URL discovery (skipped when resolveUrl=false)

  3. scrape: Fetch HTML artifacts

  4. sanitise: Markdown conversion and context assembly

  5. extract: LLM structured extraction
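
The stage list above can be sketched as a small runner that records one receipt step per stage. This is a minimal illustration, not the agent's implementation: only the stage names and the resolveUrl=false skip rule come from this page. The function name runWithReceipt, the receipt field names (step, durationMs, input, output), and the stub handlers are all assumptions. The sketch omits a skipped search stage from the receipt entirely; the real receipt may record it differently.

```javascript
// Stage names as listed in this document; everything else is hypothetical.
const STAGES = ["request", "search", "scrape", "sanitise", "extract"];

// Runs each stage in order and records a receipt step with timing,
// input, and output. `handlers` maps a stage name to (input) => output.
function runWithReceipt(params, handlers) {
  const receipt = [];
  let current = params;
  for (const stage of STAGES) {
    // The search stage is skipped when resolveUrl is false.
    if (stage === "search" && params.resolveUrl === false) {
      continue;
    }
    const startedAt = Date.now();
    const output = handlers[stage](current);
    receipt.push({
      step: stage,
      durationMs: Date.now() - startedAt,
      input: current,
      output,
    });
    current = output;
  }
  return { result: current, receipt };
}

// Stub handlers standing in for the real search, browser, and LLM calls.
const stubHandlers = {
  request: (p) => ({ ...p }),
  search: (p) => ({ ...p, urls: [`https://${p.url}/page-1`] }),
  scrape: (p) => ({ ...p, html: "<h1>Hello</h1>" }),
  sanitise: (p) => ({ ...p, markdown: "# Hello" }),
  extract: (p) => ({ output: p.markdown.replace("# ", "") }),
};

const direct = runWithReceipt(
  { url: "example.com/page", resolveUrl: false, prompt: "Page title" },
  stubHandlers
);
console.log(direct.receipt.map((s) => s.step).join(" -> "));
```

With resolveUrl set to false, the logged stage order is request -> scrape -> sanitise -> extract.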

ABI Functions

The ABI exposes the following functions:

ExtractString

Extract a single string field, optionally choosing from a list of options.

Inputs

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| key | string | best_drama | Field name to extract |
| description | string | Title of the film that won Best Motion Picture - Drama. | Field description for the LLM |
| options | string[] | [] | Literal options; pass an empty array when you do not want to constrain the output |
| prompt | string | Best Picture winners at the 2026 Golden Globe Awards | Natural language extraction prompt, also used as the search term |
| url | string | goldenglobes.com | Base URL of the domain to search, or a direct URL to scrape |
| resolveUrl | bool | true | true: search the domain for relevant pages; false: scrape the given URL directly |
| numPages | uint8 | 3 | Max pages to fetch; if resolveUrl is off, the value is capped at 1 |

Output

  • output (string)
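
As a rough client-side illustration of the inputs listed above, the sketch below assembles an ExtractString argument list and mirrors two documented rules: numPages is a uint8 that is capped at 1 when resolveUrl is off, and an empty options array means the output is unconstrained. The helper names buildExtractStringArgs and isAllowedOutput are hypothetical, and the positional argument order is assumed to follow the order of the inputs on this page; check the actual ABI before relying on it.

```javascript
// Hypothetical helper: assembles ExtractString inputs in the order
// they are listed in this document (assumed to match the ABI order).
function buildExtractStringArgs({ key, description, options = [], prompt, url, resolveUrl, numPages }) {
  if (typeof resolveUrl !== "boolean") {
    throw new TypeError("resolveUrl must be a boolean");
  }
  if (!Number.isInteger(numPages) || numPages < 0 || numPages > 255) {
    throw new RangeError("numPages must fit in a uint8 (0-255)");
  }
  // Documented rule: with resolveUrl off, numPages is capped at 1.
  const pages = resolveUrl ? numPages : Math.min(numPages, 1);
  return [key, description, options, prompt, url, resolveUrl, pages];
}

// An empty options array leaves the output unconstrained; otherwise the
// returned string is expected to be one of the literal options.
function isAllowedOutput(output, options) {
  return options.length === 0 || options.includes(output);
}

const args = buildExtractStringArgs({
  key: "best_drama",
  description: "Title of the film that won Best Motion Picture - Drama.",
  options: [],
  prompt: "Best Picture winners at the 2026 Golden Globe Awards",
  url: "goldenglobes.com",
  resolveUrl: true,
  numPages: 3,
});
console.log(args.length); // 7 positional inputs
```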

ExtractANumber

Extract a single numeric field (type fixed to number), optionally bounded by min/max.

Inputs

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| key | string | senegal_goals | Field name to extract |
| description | string | Number of goals scored by Senegal in the 18/1/26 AFCON final vs Morocco. | Field description for the LLM |
| min | uint256 | 0 | Minimum bound (set both min and max to 0 to disable bounds) |
| max | uint256 | 0 | Maximum bound |
| prompt | string | Africa Cup of Nations final score: number of goals for Senegal on 18/1/26 against Morocco. | Natural language extraction prompt, also used as the search term |
| url | string | espn.com | Base URL of the domain to search, or a direct URL to scrape |
| resolveUrl | bool | true | true: search the domain for relevant pages; false: scrape the given URL directly |
| numPages | uint8 | 3 | Max pages to fetch; if resolveUrl is off, the value is capped at 1 |

Output

  • output (uint256)

Notes

  • Values are coerced to integers; negative values are clamped to 0.

  • If bounds are provided, they must be within the JavaScript safe integer range (at most 2^53 − 1).
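
The notes above can be illustrated with a small sketch. Several details are assumptions rather than documented behaviour: truncation toward zero stands in for "coerced to integers" (the agent's actual rounding rule is not stated here), the min <= max ordering check is added for illustration, and the function names are hypothetical. How the agent reacts to an out-of-bounds value is also not documented, so the sketch only reports whether a value is within bounds.

```javascript
// Assumption: integer coercion is modelled with Math.trunc.
function coerceToUint(value) {
  const n = Math.trunc(Number(value));
  if (!Number.isFinite(n)) {
    throw new TypeError("value is not a finite number");
  }
  return Math.max(0, n); // negative values are clamped to 0
}

// Setting both min and max to 0 disables bounds.
function boundsEnabled(min, max) {
  return !(min === 0 && max === 0);
}

// Bounds, when provided, must be within the JavaScript safe integer range.
function validateBounds(min, max) {
  if (!boundsEnabled(min, max)) return;
  for (const bound of [min, max]) {
    if (!Number.isSafeInteger(bound) || bound < 0) {
      throw new RangeError("bounds must be non-negative safe integers");
    }
  }
  // Assumption: min must not exceed max.
  if (min > max) {
    throw new RangeError("min must not exceed max");
  }
}

function withinBounds(value, min, max) {
  return !boundsEnabled(min, max) || (value >= min && value <= max);
}

validateBounds(0, 10);
const goals = coerceToUint("2.9");
console.log(goals, withinBounds(goals, 0, 10));
```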

JavaScript Example

Solidity Example

Structured Output Schema

Internally, the expected output of each LLM call is described by an output schema, encoded as a JSON object.

Supported field types:

  • str (string)

  • int (integer)

  • bool (boolean)

  • lit (literal; use options)

The server automatically injects the following fields unless already present:

  • reasoning (str)

  • answerable (bool)

  • confidence_score (int, 0–100)

These fields are not returned in the ABI output, but they are included in the receipt for auditability.
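
To make the injection rule concrete, here is a minimal sketch of adding the three server-side fields to a caller's field list unless a field with the same name is already present. The field object shape ({ name, type, options, min, max }) is hypothetical, since the exact JSON layout of the schema is not shown on this page; only the type codes and the injected field names come from the text above.

```javascript
// Type codes listed in this document.
const SUPPORTED_TYPES = new Set(["str", "int", "bool", "lit"]);

// Fields the server injects unless already present (object shape assumed).
const INJECTED_FIELDS = [
  { name: "reasoning", type: "str" },
  { name: "answerable", type: "bool" },
  { name: "confidence_score", type: "int", min: 0, max: 100 },
];

// Validates the caller's fields and appends any missing injected fields.
function withInjectedFields(fields) {
  for (const field of fields) {
    if (!SUPPORTED_TYPES.has(field.type)) {
      throw new TypeError(`unsupported field type: ${field.type}`);
    }
    // A literal field needs its list of options.
    if (field.type === "lit" && !(Array.isArray(field.options) && field.options.length > 0)) {
      throw new TypeError(`literal field "${field.name}" needs options`);
    }
  }
  const present = new Set(fields.map((field) => field.name));
  const missing = INJECTED_FIELDS.filter((field) => !present.has(field.name));
  return [...fields, ...missing];
}

const schemaFields = withInjectedFields([
  { name: "best_drama", type: "str" },
  { name: "reasoning", type: "str" }, // already present, so not injected again
]);
console.log(schemaFields.map((field) => field.name).join(", "));
```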
