AI-friendly web content fetching tool designed for LLM consumption. Rust library with CLI, MCP server, and Python bindings.
- HTTP fetching - GET and HEAD methods with streaming support
- HTML-to-Markdown - Built-in conversion optimized for LLMs
- HTML-to-Text - Plain text extraction with clean formatting
- Binary detection - Returns metadata only for images, PDFs, etc.
- Timeout handling - 1s first-byte, 30s body with partial content on timeout
- URL filtering - Allow/block lists for controlled access
- MCP server - Model Context Protocol support for AI tool integration
Install the CLI with Cargo:

```bash
cargo install --git https://github.com/everruns/fetchkit fetchkit-cli
```

Or build from source:

```bash
git clone https://github.com/everruns/fetchkit
cd fetchkit
cargo install --path crates/fetchkit-cli
```

Basic usage:

```bash
# Fetch URL (outputs markdown with frontmatter)
fetchkit fetch https://example.com

# Output as JSON instead
fetchkit fetch https://example.com -o json

# Custom user agent
fetchkit fetch https://example.com --user-agent "MyBot/1.0"

# Show full documentation
fetchkit --llmtxt
```

Default output is markdown with YAML frontmatter:
```markdown
---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---

# Example Domain

This domain is for use in illustrative examples in documents...
```

JSON output (`-o json`):
```json
{
  "url": "https://example.com",
  "status_code": 200,
  "content_type": "text/html",
  "size": 1256,
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples..."
}
```

Run as a Model Context Protocol server:
```bash
fetchkit mcp
```

This exposes the `fetchkit` tool over JSON-RPC 2.0 stdio transport. It returns markdown with frontmatter (same format as the CLI) and is compatible with Claude Desktop and other MCP clients.
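For example, a Claude Desktop entry could look like the sketch below. This assumes the `fetchkit` binary installed by `fetchkit-cli` is on your `PATH`; otherwise point `command` at the installed path.

```json
{
  "mcpServers": {
    "fetchkit": {
      "command": "fetchkit",
      "args": ["mcp"]
    }
  }
}
```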
To use fetchkit as a Rust library, add it to `Cargo.toml`:
```toml
[dependencies]
fetchkit = { git = "https://github.com/everruns/fetchkit" }
```

```rust
use fetchkit::{fetch, FetchRequest};

#[tokio::main]
async fn main() {
    let request = FetchRequest {
        url: "https://example.com".to_string(),
        method: None,
        as_markdown: Some(true),
        as_text: None,
    };

    let response = fetch(request).await;
    println!("{}", response.content.unwrap_or_default());
}
```

The `Tool` builder configures conversion, user agent, and URL filtering:
```rust
use fetchkit::Tool;

let tool = Tool::builder()
    .enable_markdown(true)
    .enable_text(false)
    .user_agent("MyBot/1.0")
    .allow_prefix("https://docs.example.com")
    .block_prefix("https://internal.example.com")
    .build();

let response = tool.fetch(request).await;
```

Install the Python bindings:

```bash
pip install fetchkit
```
```python
from fetchkit import fetch, FetchRequest, FetchKitTool

# Simple fetch
response = fetch("https://example.com", as_markdown=True)
print(response.content)

# With configuration
tool = FetchKitTool(
    enable_markdown=True,
    user_agent="MyBot/1.0",
    allow_prefixes=["https://docs.example.com"]
)
response = tool.fetch(FetchRequest(url="https://example.com"))
```

Response fields:
| Field | Type | Description |
|---|---|---|
| `url` | string | Fetched URL |
| `status_code` | int | HTTP status code |
| `content_type` | string? | Content-Type header |
| `size` | int? | Content size in bytes |
| `last_modified` | string? | Last-Modified header |
| `filename` | string? | From Content-Disposition |
| `format` | string | `"markdown"`, `"text"`, or `"raw"` |
| `content` | string? | Page content |
| `truncated` | bool | True if content was cut off |
| `method` | string | HTTP method used |
| `error` | string? | Error message if failed |
Errors are returned in the `error` field:

- `InvalidUrl` - Malformed URL
- `UrlBlocked` - URL blocked by filter
- `NetworkError` - Connection failed
- `Timeout` - Request timed out
- `HttpError` - 4xx/5xx response
- `ContentError` - Failed to read body
- `BinaryContent` - Binary content not supported
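In the Rust API shown above, `fetch` returns a response value rather than a `Result`, so a caller checks the `error` field. A minimal sketch, continuing the earlier example and assuming `error` and `content` are optional strings:

```rust
// Sketch: handle a failed fetch by inspecting the `error` field.
let response = fetch(request).await;
if let Some(err) = &response.error {
    eprintln!("fetch of {} failed: {err}", response.url);
} else {
    println!("{}", response.content.unwrap_or_default());
}
```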
Timeouts:

- First-byte: 1 second (connect + initial response)
- Body: 30 seconds total

Partial content is returned on a body timeout, with `truncated: true`.
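A short sketch of how a caller might react to a body timeout, assuming the `truncated` and `content` fields from the table above:

```rust
// Sketch: a body timeout still yields usable (partial) content.
let response = fetch(request).await;
if response.truncated {
    eprintln!("body read timed out; content below is partial");
}
println!("{}", response.content.unwrap_or_default());
```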
Binary content is detected automatically, and only metadata is returned for:
- Images, audio, video, fonts
- PDFs, archives (zip, tar, rar, 7z)
- Office documents
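For example, fetching a PDF should come back as metadata rather than converted content. A hedged sketch, reusing the `FetchRequest` shape from the Rust example above (the PDF URL is purely illustrative):

```rust
// Sketch: binary content is returned as metadata only, without a body.
let request = FetchRequest {
    url: "https://example.com/report.pdf".to_string(), // illustrative URL
    method: None,
    as_markdown: None,
    as_text: None,
};
let response = fetch(request).await;
// Per the behavior described above, expect content_type/size to be set and the
// body to be absent; depending on configuration, `error` may be BinaryContent.
println!("{:?} {:?}", response.content_type, response.size);
```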
HTML is automatically converted to markdown:
- Headers: `h1`-`h6` → `#` to `######`
- Lists: Proper nesting with 2-space indent
- Code: Fenced blocks and inline backticks
- Links: `[text](url)` format
- Strips: scripts, styles, iframes, SVGs
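A rough illustration of these rules (a sketch only; the converter's exact output may differ):

```html
<h2>Docs</h2>
<p>See the <a href="https://example.com/guide">guide</a>.</p>
<ul><li>Install</li><li>Run</li></ul>
```

becomes roughly:

```markdown
## Docs

See the [guide](https://example.com/guide).

- Install
- Run
```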
License: MIT