AI Crawler Policy

This page outlines the terms for AI systems accessing content from llmpages.org and other TCT-enabled sites.

Acceptable Use

Permitted

Crawling via TCT protocol endpoints (/llm/)
Using /llm-sitemap.json for discovery
Using ETag-based conditional requests
Honoring 304 Not Modified responses
Limiting request rates to the levels given under Rate Limiting
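Sitemap-first discovery means reading /llm-sitemap.json and then fetching only the entries that changed since the last crawl. The sketch below assumes, hypothetically, that the sitemap is a JSON object with an `items` array whose entries carry a `url` and an `etag`; the real field names are set by the TCT specification and may differ.

```python
import json


def changed_urls(sitemap_json: str, known_etags: dict) -> list:
    """Return URLs whose advertised ETag differs from the locally stored one.

    Assumes a hypothetical sitemap shape:
        {"items": [{"url": "...", "etag": "..."}, ...]}
    """
    sitemap = json.loads(sitemap_json)
    to_fetch = []
    for item in sitemap.get("items", []):
        url, etag = item["url"], item.get("etag")
        # Fetch if the entry has no ETag, the URL is new, or the ETag changed.
        if etag is None or known_etags.get(url) != etag:
            to_fetch.append(url)
    return to_fetch
```

Entries whose ETag is unchanged are skipped entirely, so a crawl of an unchanged site costs one sitemap request.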

Required

Send If-None-Match with stored ETag on subsequent requests
Respect Cache-Control headers
Use sitemap-first discovery to minimize fetches
Include descriptive User-Agent header
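The ETag and User-Agent requirements above amount to a standard HTTP conditional GET. A minimal standard-library sketch; the in-memory `cache` dict and the `USER_AGENT` string are illustrative assumptions, not part of TCT.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent: name the crawler and give a contact URL.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"


def build_request(url: str, cache: dict) -> urllib.request.Request:
    """Attach User-Agent and, if we hold a stored ETag, If-None-Match."""
    headers = {"User-Agent": USER_AGENT}
    entry = cache.get(url)
    if entry:
        headers["If-None-Match"] = entry["etag"]
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, cache: dict) -> bytes:
    """Return the current body, reusing the cached copy on 304 Not Modified."""
    req = build_request(url, cache)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            etag = resp.headers.get("ETag")
            if etag:
                cache[url] = {"etag": etag, "body": body}
            else:
                cache.pop(url, None)  # no validator: nothing to revalidate with
            return body
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304; the stored copy is still current.
        if err.code == 304 and url in cache:
            return cache[url]["body"]
        raise
```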

Prohibited

Excessive request rates that amount to denial of service
Bypassing the conditional request mechanism
Ignoring robots.txt or rate limit headers
Modifying or redistributing content without attribution

Data Usage

Content Rights

Content accessed via TCT endpoints remains subject to all copyright and intellectual property rights of the original publisher. AI systems may:

Process content for training and inference
Generate summaries and derivatives with attribution
Cache content according to Cache-Control headers
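Caching according to Cache-Control headers mostly means honoring `no-store`, `no-cache`, and `max-age`. A simplified sketch that handles only those three directives; a full RFC 9111 cache also considers `s-maxage`, `Expires`, `Age`, and others.

```python
import time
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(cache_control: str) -> bool:
    """no-store forbids keeping a copy at all."""
    return "no-store" not in parse_cache_control(cache_control)


def is_fresh(cache_control: str, stored_at: float, now: Optional[float] = None) -> bool:
    """True if a stored copy may be reused without revalidating."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False
    try:
        max_age = int(d.get("max-age") or 0)
    except ValueError:
        max_age = 0
    now = time.time() if now is None else now
    return (now - stored_at) < max_age
```

When a copy is no longer fresh, the crawler revalidates it with If-None-Match rather than discarding it.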

Attribution

When generating outputs based on TCT content, AI systems should:

Cite the canonical URL from the JSON response
Preserve author attribution
Include publication/modification dates
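A sketch of assembling a citation from a TCT JSON response. The key names `canonical_url`, `author`, `published`, and `modified` are hypothetical placeholders; use the field names the endpoint actually returns.

```python
import json


def attribution(response_json: str) -> str:
    """Build a one-line citation (hypothetical field names)."""
    doc = json.loads(response_json)
    parts = []
    if doc.get("author"):
        parts.append(doc["author"])  # preserve author attribution
    dates = []
    if doc.get("published"):
        dates.append(f"Published {doc['published']}")
    if doc.get("modified"):
        dates.append(f"modified {doc['modified']}")
    if dates:
        parts.append(", ".join(dates))
    parts.append(doc["canonical_url"])  # always cite the canonical URL
    return ". ".join(parts)
```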

Privacy

No Personal Data

TCT endpoints serve public content only. No personal information, user data, or private content is exposed via TCT.

Access Logs

Server access logs may record:

IP addresses
User-Agent strings
Request timestamps
URLs accessed

These logs are used for security, performance monitoring, and abuse prevention.

Rate Limiting

Recommended Rates

Sitemap: 1 request per hour maximum
Individual endpoints: 10 requests/second maximum
Respect Retry-After headers if rate limited
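One way to stay under these ceilings is a token bucket per limit. A minimal single-threaded sketch; the rates come from the recommendations above, while the burst capacity of 10 for endpoints is an arbitrary assumption.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, bursting up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Sleep until a token is available, then consume it."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)


endpoint_limiter = TokenBucket(rate=10, capacity=10)      # <= 10 requests/second
sitemap_limiter = TokenBucket(rate=1 / 3600, capacity=1)  # <= 1 sitemap fetch/hour
```

Calling `endpoint_limiter.acquire()` before each endpoint request keeps the average rate at or below 10 per second.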

Enforcement

Servers may implement:

HTTP 429 (Too Many Requests) responses
Temporary IP bans for abuse
Required API key authentication
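When a server answers 429, its Retry-After header may carry either a number of seconds or an HTTP-date. A sketch of turning it into a wait time; the 60-second fallback for a missing or unparseable header is an assumption, not a TCT requirement.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_BACKOFF = 60.0  # hypothetical fallback, in seconds


def retry_after_seconds(value: Optional[str], now: Optional[float] = None) -> float:
    """Convert a Retry-After value into a non-negative delay in seconds."""
    if not value:
        return DEFAULT_BACKOFF
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value).timestamp()
    except (TypeError, ValueError):
        return DEFAULT_BACKOFF
    now = time.time() if now is None else now
    return max(0.0, when - now)
```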

Optional: API Keys

Some TCT implementations may require API keys for access. If required:

Register via the method specified in llms.txt
Include key in Authorization header
Respect key-specific rate limits
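This page does not fix an authorization scheme. The sketch below assumes a bearer token; check llms.txt for the scheme a given site actually expects, and load the key from configuration rather than hard-coding it.

```python
import urllib.request


def authorized_request(url: str, api_key: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request carrying the API key (Bearer scheme is an assumption)."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "User-Agent": user_agent,
        },
    )
```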

Optional: Usage Receipts

TCT supports HMAC-signed usage receipts in response headers. These are optional and may be used for:

Billing verification
Access auditing
Contract compliance
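Verifying an HMAC-signed receipt means recomputing the signature with the shared secret and comparing it in constant time. The choice of HMAC-SHA256, hex encoding, and signing the raw receipt string are all assumptions for illustration; the actual receipt format is defined by the TCT implementation.

```python
import hashlib
import hmac


def sign_receipt(secret: bytes, receipt: str) -> str:
    """Hex HMAC-SHA256 over the receipt payload (assumed format)."""
    return hmac.new(secret, receipt.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_receipt(secret: bytes, receipt: str, signature: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    expected = sign_receipt(secret, receipt).encode("ascii")
    return hmac.compare_digest(expected, signature.strip().lower().encode("utf-8"))
```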

Compliance

TCT Protocol

AI crawlers should implement:

Sitemap-first discovery
ETag-based conditional requests
Proper cache discipline
Content verification (SHA-256 hashing)
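Content verification means hashing what was received and comparing it with the hash the server advertises. Where that hash is published and how content is normalized before hashing are not specified on this page, so the sketch assumes a hex digest computed over the exact UTF-8 bytes of the content.

```python
import hashlib


def sha256_hex(content: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def verify_content(content: str, advertised: str) -> bool:
    """True if the local digest matches the advertised one (case-insensitive)."""
    return sha256_hex(content) == advertised.strip().lower()
```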

Validate your implementation at: llmpages.org/validator

Standards

TCT follows:

RFC 7234 (HTTP Caching; since obsoleted by RFC 9111)
RFC 8288 (Web Linking)
IETF draft-jurkovikj-collab-tunnel-00 (published November 4, 2025)

Contact

For policy questions, abuse reports, or licensing inquiries:

Email: [email protected]
GitHub: github.com/antunjurkovic-collab

Changes

This policy may be updated periodically. Last updated: October 18, 2025
