AI Crawler Policy
This page outlines the terms for AI systems accessing content from llmpages.org and other TCT-enabled sites.
Acceptable Use
Permitted
- Crawling via TCT protocol endpoints (/llm/)
- Using /llm-sitemap.json for discovery
- Respecting ETag-based conditional requests
- Honoring 304 Not Modified responses
- Rate limiting requests to reasonable levels (see Recommended Rates)
Required
- Send If-None-Match with stored ETag on subsequent requests
- Respect Cache-Control headers
- Use sitemap-first discovery to minimize fetches
- Include descriptive User-Agent header
Prohibited
- Excessive request rates (DDoS)
- Bypassing conditional request mechanism
- Ignoring robots.txt or rate limit headers
- Modifying or redistributing content without attribution
Data Usage
Content Rights
Content accessed via TCT endpoints retains all copyright and intellectual property rights of the original publisher. AI systems may:
- Process content for training and inference
- Generate summaries and derivatives with attribution
- Cache content according to Cache-Control headers
Attribution
When generating outputs based on TCT content, AI systems should:
- Cite the canonical URL from the JSON response
- Preserve author attribution
- Include publication/modification dates
Privacy
No Personal Data
TCT endpoints serve public content only. No personal information, user data, or private content is exposed via TCT.
Access Logs
Server access logs may record:
- IP addresses
- User-Agent strings
- Request timestamps
- URLs accessed
These logs are used for security, performance monitoring, and abuse prevention.
Rate Limiting
Recommended Rates
- Sitemap: 1 request per hour maximum
- Individual endpoints: 10 requests/second maximum
- Respect Retry-After headers if rate limited
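One way a crawler might stay within these limits is sketched below, assuming a single-threaded client. The helper retry_after_seconds and the class MinIntervalLimiter are hypothetical names for this example. Retry-After may carry either a number of seconds or an HTTP-date, so both forms are handled.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None if the value is neither a number of seconds nor an HTTP-date.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class MinIntervalLimiter:
    """Client-side throttle that enforces a minimum spacing between requests."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self._next_allowed = 0.0

    def wait(self, clock=time.monotonic, sleep=time.sleep):
        """Block until the next request is allowed, then reserve the slot."""
        now = clock()
        if now < self._next_allowed:
            sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

For example, MinIntervalLimiter(10) spaces endpoint requests at least 0.1 seconds apart, and MinIntervalLimiter(1 / 3600) allows one sitemap fetch per hour. After an HTTP 429, sleeping for retry_after_seconds(header_value) before the next attempt honors the server's requested delay.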
Enforcement
Servers may implement:
- HTTP 429 (Too Many Requests) responses
- Temporary IP bans for abuse
- Required API key authentication
Optional: API Keys
Some TCT implementations may require API keys for access. If required:
- Register via the method specified in llms.txt
- Include key in Authorization header
- Respect key-specific rate limits
Optional: Usage Receipts
TCT supports HMAC-signed usage receipts in response headers. These are optional and may be used for:
- Billing verification
- Access auditing
- Contract compliance
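Checking an HMAC-signed receipt might look like the sketch below. This policy does not specify the receipt header name, which fields are signed, the hash function, or the encoding. The example therefore assumes HMAC-SHA256 over an opaque byte string, a hex-encoded signature, and a shared secret obtained out of band. The function name verify_receipt is hypothetical.

```python
import hashlib
import hmac


def verify_receipt(secret: bytes, signed_payload: bytes, receipt_hex: str) -> bool:
    """Return True if receipt_hex is the HMAC-SHA256 of signed_payload.

    Assumption: the receipt is a hex-encoded HMAC-SHA256 signature. Check the
    implementation's documentation for the actual scheme before relying on this.
    """
    expected = hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, receipt_hex.strip().lower())
```

The constant-time comparison matters when receipts back billing or contract checks, since a plain string equality test can leak how many leading characters matched.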
Compliance
TCT Protocol
AI crawlers should implement:
- Sitemap-first discovery
- ETag-based conditional requests
- Proper cache discipline
- Content verification (SHA-256 hashing)
Validate your implementation at: llmpages.org/validator
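The content verification step listed above can be illustrated with a short sketch. It assumes the publisher supplies an expected SHA-256 digest as a hex string; where that digest comes from (for example the sitemap or a response header) and exactly which bytes are hashed are not stated in this policy, so treat those details as assumptions. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of content as lowercase hex."""
    return hashlib.sha256(content).hexdigest()


def content_matches(content: bytes, expected_hex: str) -> bool:
    """Return True if content hashes to expected_hex (case-insensitive)."""
    return hmac.compare_digest(sha256_hex(content), expected_hex.strip().lower())
```

A crawler could call content_matches on each fetched body and discard or refetch any response whose digest does not match before caching it.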
Standards
TCT follows:
- RFC 7234 (HTTP Caching; since obsoleted by RFC 9111)
- RFC 8288 (Web Linking)
- IETF draft-jurkovikj-collab-tunnel-00 (published November 4, 2025)
Contact
For policy questions, abuse reports, or licensing inquiries:
- Email: [email protected]
- GitHub: github.com/antunjurkovic-collab
Changes
This policy may be updated periodically. Last updated: October 18, 2025
