LLM Parse Website
Search a domain (or scrape a URL directly) and extract structured data from it using an LLM. Unlike the JSON API Request agent, this agent can handle HTML pages and extract human-readable content. Pages are loaded in a real web browser, so it is suitable for sites that require JavaScript rendering.
Overview
This agent is designed for on-chain callers that need trustless, auditable web extraction. It works in five steps:
Find relevant pages via search, or scrape an explicit URL
Fetch the HTML and convert it to Markdown
Build a context window across sources
Run a structured LLM extraction
Return a typed, ABI-encoded response
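The first two steps can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the agent's implementation: planUrls is a hypothetical helper, and search stands in for an unspecified search backend that is assumed to return an ordered list of URLs for a domain and a query.

```javascript
// Hypothetical sketch of URL discovery; not the agent's actual code.
// `search` is an assumed async function: (domain, query) => Promise<string[]>.
async function planUrls({ url, prompt, resolveUrl, numPages }, search) {
  // numPages is a uint8 in the ABI, so reject anything outside 0..255.
  if (!Number.isInteger(numPages) || numPages < 0 || numPages > 255) {
    throw new RangeError("numPages must fit in a uint8");
  }
  if (!resolveUrl) {
    // Direct scrape: search is skipped and numPages is capped at 1.
    // (Treating numPages = 0 as "fetch nothing" is an assumption.)
    return [url].slice(0, Math.min(numPages, 1));
  }
  // Search mode: the prompt doubles as the search term; keep the top numPages hits.
  const results = await search(url, prompt);
  return results.slice(0, numPages);
}

// Usage with a stubbed search backend.
const fakeSearch = async (domain) => [1, 2, 3, 4].map((i) => `https://${domain}/page${i}`);
planUrls(
  { url: "goldenglobes.com", prompt: "Best Picture winners at the 2026 Golden Globe Awards", resolveUrl: true, numPages: 3 },
  fakeSearch,
).then((urls) => console.log(urls));
```

In direct mode the search function is never called, which mirrors the search stage being skipped when resolveUrl=false.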
Example Use Cases
Scraping price information from e-commerce sites
Extracting news headlines for on-chain curation
Monitoring website content for changes
Building oracles for data not available via APIs
How It Works
Flow Stages (Receipt)
The agent emits receipt steps for auditability. Each step includes timing, inputs, and outputs.
request: Input parameters and schema
search: URL discovery (skipped when resolveUrl=false)
scrape: Fetch HTML artifacts
sanitise: Markdown conversion and context assembly
extract: LLM structured extraction
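A receipt can be pictured as an ordered list of step records. The sketch below assumes a hypothetical record shape (step, startedAt, durationMs, inputs, outputs) and stubbed stage functions; only the five stage names come from this page.

```javascript
// Hypothetical receipt recorder: each step stores timing, inputs, and outputs.
async function runStep(receipt, name, inputs, fn) {
  const startedAt = Date.now();
  const outputs = await fn(inputs);
  receipt.push({ step: name, startedAt, durationMs: Date.now() - startedAt, inputs, outputs });
  return outputs;
}

// Runs the five stages in order; `stages` holds assumed async stage functions.
async function runPipeline(params, stages) {
  const receipt = [];
  await runStep(receipt, "request", params, async (p) => p);
  const urls = params.resolveUrl
    ? await runStep(receipt, "search", params, stages.search)
    : [params.url]; // search skipped: no receipt entry is written for it
  const pages = await runStep(receipt, "scrape", urls, stages.scrape);
  const context = await runStep(receipt, "sanitise", pages, stages.sanitise);
  const result = await runStep(receipt, "extract", context, stages.extract);
  return { result, receipt };
}

// Usage with stub stages.
const stages = {
  search: async (p) => [`https://${p.url}/page`],
  scrape: async (urls) => urls.map((u) => `<h1>${u}</h1>`),
  sanitise: async (pages) => pages.join("\n"),
  extract: async (context) => ({ output: context.length }),
};
runPipeline({ url: "https://espn.com/", resolveUrl: false, numPages: 1 }, stages)
  .then(({ receipt }) => console.log(receipt.map((s) => s.step)));
```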
ABI Functions
The ABI exposes the following functions:
ExtractString
Extract a single string field, optionally choosing from a list of options.
Inputs
key (string): Field name to extract. Example: best_drama
description (string): Field description for the LLM. Example: Title of the film that won Best Motion Picture - Drama.
options (string[]): Literal options; pass an empty array to leave the output unconstrained. Example: []
prompt (string): Natural-language extraction prompt, also used as the search term. Example: Best Picture winners at the 2026 Golden Globe Awards
url (string): Base domain to search, or a direct page URL to scrape. Example: goldenglobes.com
resolveUrl (bool): true searches the url domain for relevant pages; false scrapes url directly. Example: true
numPages (uint8): Maximum number of pages to fetch; capped at 1 when resolveUrl is false. Example: 3
Output
output (string)
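One plausible way the options input maps onto the field types listed under Structured Output Schema is sketched below. The helper stringFieldSpec and its output shape are hypothetical; the sketch simply assumes that a non-empty options array selects the lit type and an empty one selects str.

```javascript
// Hypothetical mapping from ExtractString inputs to a schema field spec.
// Assumption: non-empty `options` => "lit" (constrained), empty => "str" (free text).
function stringFieldSpec({ key, description, options = [] }) {
  if (typeof key !== "string" || key.length === 0) {
    throw new TypeError("key must be a non-empty string");
  }
  if (!Array.isArray(options) || !options.every((o) => typeof o === "string")) {
    throw new TypeError("options must be a string[]");
  }
  return options.length > 0
    ? { name: key, type: "lit", description, options: [...options] }
    : { name: key, type: "str", description };
}

// Usage with the example inputs from the table above.
console.log(stringFieldSpec({
  key: "best_drama",
  description: "Title of the film that won Best Motion Picture - Drama.",
  options: [],
}));
```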
ExtractANumber
Extract a single numeric field (type fixed to number), optionally bounded by min/max.
Inputs
key (string): Field name to extract. Example: senegal_goals
description (string): Field description for the LLM. Example: Number of goals scored by Senegal in the 18/1/26 AFCON final vs Morocco.
min (uint256): Minimum bound; set both min and max to 0 to disable bounds. Example: 0
max (uint256): Maximum bound. Example: 0
prompt (string): Natural-language extraction prompt, also used as the search term. Example: Africa Cup of Nations final score: number of goals for Senegal on 18/1/26 against Morocco.
url (string): Base domain to search, or a direct page URL to scrape. Example: espn.com
resolveUrl (bool): true searches the url domain for relevant pages; false scrapes url directly. Example: true
numPages (uint8): Maximum number of pages to fetch; capped at 1 when resolveUrl is false. Example: 3
Output
output (uint256)
Notes
Values are coerced to integers; negative values are clamped to 0.
If bounds are provided, they must not exceed the JavaScript safe integer limit, Number.MAX_SAFE_INTEGER (2^53 - 1).
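The notes above can be read as the following post-processing, sketched with the hypothetical helpers coerceToUint and resolveBounds. Rounding toward zero and rejecting min > max are assumptions; the page only states that values become integers, negatives clamp to 0, and bounds must be safe integers.

```javascript
// Hypothetical validation of the min/max inputs (uint256 values may arrive as BigInt).
function resolveBounds(min, max) {
  const lo = Number(min);
  const hi = Number(max);
  if (lo === 0 && hi === 0) return null; // both 0: bounds disabled
  if (!Number.isSafeInteger(lo) || !Number.isSafeInteger(hi)) {
    throw new RangeError("bounds must be within the JS safe integer range");
  }
  if (lo > hi) throw new RangeError("min must not exceed max"); // assumed rule
  return { min: lo, max: hi };
}

// Hypothetical coercion of the LLM's value into a uint256-compatible BigInt.
function coerceToUint(value) {
  const n = Math.trunc(Number(value)); // assumption: truncate toward zero
  if (!Number.isFinite(n)) throw new TypeError(`not a number: ${value}`);
  return BigInt(Math.max(0, n)); // negative values are clamped to 0
}

// Usage
console.log(coerceToUint("2"), coerceToUint(-3), resolveBounds(0, 0));
```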
JavaScript Example
Solidity Example
Structured Output Schema
Internally, each call to the LLM is constrained by an output schema, expressed as a JSON object of typed fields.
Supported field types:
str (string)
int (integer)
bool (boolean)
lit (literal; use options)
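Assuming the internal schema follows JSON Schema conventions (this page does not say so), the four field types could translate as follows. toJsonSchemaProperty, toJsonSchema, and the { name, type, description, options } field shape are hypothetical.

```javascript
// Hypothetical translation of the four field types into JSON Schema fragments.
function toJsonSchemaProperty(field) {
  switch (field.type) {
    case "str": return { type: "string", description: field.description };
    case "int": return { type: "integer", description: field.description };
    case "bool": return { type: "boolean", description: field.description };
    case "lit":
      // Literals must come with a non-empty options list.
      if (!Array.isArray(field.options) || field.options.length === 0) {
        throw new Error(`lit field "${field.name}" needs options`);
      }
      return { type: "string", enum: [...field.options], description: field.description };
    default:
      throw new Error(`unsupported field type: ${field.type}`);
  }
}

// Builds an object schema in which every field is required.
function toJsonSchema(fields) {
  const properties = {};
  for (const f of fields) properties[f.name] = toJsonSchemaProperty(f);
  return { type: "object", properties, required: fields.map((f) => f.name), additionalProperties: false };
}

// Usage
console.log(JSON.stringify(toJsonSchema([{ name: "senegal_goals", type: "int", description: "Goals scored" }]), null, 2));
```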
The server automatically injects the following fields unless already present:
reasoning (str)
answerable (bool)
confidence_score (int, 0–100)
These fields are not returned in the ABI output, but they are included in the receipt for auditability.
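The injection rule and the split between ABI output and receipt can be sketched like this. AUDIT_FIELDS, withAuditFields, splitResult, and the { name, type } field records are hypothetical; only the three field names and their types come from this page.

```javascript
// Hypothetical list of server-injected audit fields.
const AUDIT_FIELDS = [
  { name: "reasoning", type: "str" },
  { name: "answerable", type: "bool" },
  { name: "confidence_score", type: "int", min: 0, max: 100 },
];

// Appends each audit field unless the caller already defined a field with that name.
function withAuditFields(fields) {
  const present = new Set(fields.map((f) => f.name));
  return [...fields, ...AUDIT_FIELDS.filter((f) => !present.has(f.name))];
}

// Separates the value returned through the ABI from receipt-only metadata.
function splitResult(llmResult, key) {
  const { [key]: output, ...rest } = llmResult;
  const audit = {};
  for (const { name } of AUDIT_FIELDS) if (name in rest) audit[name] = rest[name];
  return { abiOutput: output, receiptMetadata: audit };
}

// Usage with placeholder values.
const fields = withAuditFields([{ name: "best_drama", type: "str" }]);
console.log(fields.map((f) => f.name));
const { abiOutput, receiptMetadata } = splitResult(
  { best_drama: "placeholder title", reasoning: "placeholder", answerable: true, confidence_score: 50 },
  "best_drama",
);
console.log(abiOutput, receiptMetadata);
```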