What is ScalePostBot?
ScalePostBot is an automated agent operated by ScalePost Corporation (scalepost.ai). It fetches publicly available web pages so that our customers can analyze how their own content is being crawled, cited, and referenced by AI systems.
How to identify ScalePostBot
Every request we make includes the following headers:
    User-Agent: Mozilla/5.0 (compatible; ScalePostBot/1.0; +https://scalepost.ai/bot)
    From: abuse@scalepost.ai
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.9
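If you want to pick ScalePostBot requests out of your access logs, one option is to match the product token in the User-Agent header. The following is a minimal sketch, not an official tool; the function name `scalepostbot_version` is hypothetical. User-Agent strings can be spoofed, so treat a match as a hint rather than proof of origin.

```python
import re

# Product token as it appears in the documented User-Agent header.
_BOT_TOKEN = re.compile(r"\bScalePostBot/(\d+(?:\.\d+)*)")


def scalepostbot_version(user_agent):
    """Return the ScalePostBot version found in a User-Agent string, or None.

    Hypothetical helper for log analysis; it only inspects the string and
    cannot confirm that the request really came from ScalePost.
    """
    match = _BOT_TOKEN.search(user_agent or "")
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; ScalePostBot/1.0; +https://scalepost.ai/bot)"
    print(scalepostbot_version(ua))
```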
What ScalePostBot does
- Fetches the URL with a single GET request.
- Reads the HTML returned by the server to extract metadata.
- Does not execute JavaScript, submit forms, follow login flows, scrape contact details, or scan for vulnerabilities.
- Does not crawl recursively from the page it fetched. We do not follow links from the body of the document.
- Does not retain page bodies long-term — only the extracted metadata is stored against the customer that requested the URL.
Volume is light and bursty: we fetch on demand when a customer requests an analysis, not on a continuous schedule. Most domains will see a small number of requests at most, paced according to the rules below.
How ScalePostBot behaves
ScalePostBot is built to be a polite citizen of the web.
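- robots.txt: Before fetching any URL on a host, we retrieve robots.txt and obey it. We honor both User-agent: ScalePostBot directives and User-agent: * fallbacks, including Disallow. If robots.txt is unreachable (network error, 5xx), we treat the host as allow-all. Note that RFC 9309 §2.3.1.3 defines allow-all for the "unavailable" case (4xx responses); for the "unreachable" case (network errors, 5xx), §2.3.1.4 specifies complete disallow.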
- Crawl-Delay: If your robots.txt declares a Crawl-Delay, we enforce it across all of our parallel workers for that host.
- Retry-After: On HTTP 429 or HTTP 503, we read the Retry-After header (numeric seconds or HTTP-date) and back off for at least that long before retrying.
- Permanent failures: On HTTP 403, 404, 410, and 451, we mark the URL as permanently unavailable and stop trying.
- Caching: We cache robots.txt for one hour per origin so we don't re-fetch it on every request.
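The status-code rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not ScalePostBot's actual implementation: the names `parse_retry_after` and `next_action` are hypothetical, and the 60-second fallback used when Retry-After is missing or unparseable is an assumption, since this page does not specify one.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

PERMANENT_FAILURES = {403, 404, 410, 451}  # stop trying
BACKOFF_STATUSES = {429, 503}              # honor Retry-After
DEFAULT_BACKOFF_SECONDS = 60.0             # assumption: not stated on this page


def parse_retry_after(value, now):
    """Return the delay in seconds requested by a Retry-After value, or None.

    Accepts either numeric seconds or an HTTP-date.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means no further waiting is required.
    return max(0.0, (when - now).total_seconds())


def next_action(status, headers, now=None):
    """Map a response to ("give_up" | "retry_after" | "proceed", delay)."""
    now = now or datetime.now(timezone.utc)
    if status in PERMANENT_FAILURES:
        return ("give_up", None)
    if status in BACKOFF_STATUSES:
        # Header lookup is case-sensitive here; a real client would normalize.
        delay = parse_retry_after(headers.get("Retry-After", ""), now)
        if delay is None:
            delay = DEFAULT_BACKOFF_SECONDS
        return ("retry_after", delay)
    return ("proceed", None)
```

For example, `next_action(429, {"Retry-After": "120"})` asks the caller to wait at least 120 seconds, while `next_action(410, {})` tells it to stop requesting the URL.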
How to block or limit ScalePostBot
Add a directive to your robots.txt. For example, to block ScalePostBot entirely:
    User-agent: ScalePostBot
    Disallow: /
To slow it down:
    User-agent: ScalePostBot
    Crawl-Delay: 5
To allow most of your site but exclude a section:
    User-agent: ScalePostBot
    Disallow: /private/
    Disallow: /admin/
Changes to robots.txt are picked up within an hour.
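Before publishing new rules, you may want to check how a standard parser reads them. The sketch below uses Python's built-in `urllib.robotparser` on a rule set that combines the examples above. The host `example.com` and the helper `load_rules` are hypothetical, and Python's parser may differ in edge cases from the one ScalePostBot uses.

```python
from urllib.robotparser import RobotFileParser

# Rules combining the examples above: a crawl delay plus two excluded sections.
ROBOTS_TXT = """\
User-agent: ScalePostBot
Crawl-Delay: 5
Disallow: /private/
Disallow: /admin/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # Pass the bare product token, not the full User-Agent string.
    for path in ("/private/report.html", "/admin/", "/blog/post"):
        url = "https://example.com" + path  # example.com is a placeholder host
        print(path, rules.can_fetch("ScalePostBot", url))
    print("crawl delay:", rules.crawl_delay("ScalePostBot"))
```

With these rules, Python's parser reports the `/private/` and `/admin/` paths as disallowed for ScalePostBot, other paths as allowed, and a crawl delay of 5 seconds.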
Reporting a problem
If ScalePostBot is misbehaving on your site — or if you have any questions — email abuse@scalepost.ai with:
- the affected hostname(s),
- a sample User-Agent and a few timestamps from your access logs,
- and a brief description of the problem.
We respond within one business day and will pause crawling of the affected host(s) while we investigate.