Skip to content

AI-friendly webfetch tool, cli, mcp server, and lib

License

Notifications You must be signed in to change notification settings

everruns/fetchkit

Repository files navigation

fetchkit

AI-friendly web content fetching tool designed for LLM consumption. Rust library with CLI, MCP server, and Python bindings.

Features

  • HTTP fetching - GET and HEAD methods with streaming support
  • HTML-to-Markdown - Built-in conversion optimized for LLMs
  • HTML-to-Text - Plain text extraction with clean formatting
  • Binary detection - Returns metadata only for images, PDFs, etc.
  • Timeout handling - 1s first-byte, 30s body with partial content on timeout
  • URL filtering - Allow/block lists for controlled access
  • MCP server - Model Context Protocol support for AI tool integration

Installation

From Git (recommended)

cargo install --git https://github.com/everruns/fetchkit fetchkit-cli

From Source

git clone https://github.com/everruns/fetchkit
cd fetchkit
cargo install --path crates/fetchkit-cli

CLI Usage

# Fetch URL (outputs markdown with frontmatter)
fetchkit fetch https://example.com

# Output as JSON instead
fetchkit fetch https://example.com -o json

# Custom user agent
fetchkit fetch https://example.com --user-agent "MyBot/1.0"

# Show full documentation
fetchkit --llmtxt

Default output is markdown with YAML frontmatter:

---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---
# Example Domain

This domain is for use in illustrative examples in documents...

JSON output (-o json):

{
  "url": "https://example.com",
  "status_code": 200,
  "content_type": "text/html",
  "size": 1256,
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples..."
}

MCP Server

Run as a Model Context Protocol server:

fetchkit mcp

Exposes fetchkit tool over JSON-RPC 2.0 stdio transport. Returns markdown with frontmatter (same format as CLI). Compatible with Claude Desktop and other MCP clients.

Library Usage

Add to Cargo.toml:

[dependencies]
fetchkit = { git = "https://github.com/everruns/fetchkit" }

Basic Fetch

use fetchkit::{fetch, FetchRequest};

#[tokio::main]
async fn main() {
    let request = FetchRequest {
        url: "https://example.com".to_string(),
        method: None,
        as_markdown: Some(true),
        as_text: None,
    };

    let response = fetch(request).await;
    println!("{}", response.content.unwrap_or_default());
}

With Tool Builder

use fetchkit::Tool;

let tool = Tool::builder()
    .enable_markdown(true)
    .enable_text(false)
    .user_agent("MyBot/1.0")
    .allow_prefix("https://docs.example.com")
    .block_prefix("https://internal.example.com")
    .build();

let response = tool.fetch(request).await;

Python Bindings

pip install fetchkit
from fetchkit import fetch, FetchRequest, FetchKitTool

# Simple fetch
response = fetch("https://example.com", as_markdown=True)
print(response.content)

# With configuration
tool = FetchKitTool(
    enable_markdown=True,
    user_agent="MyBot/1.0",
    allow_prefixes=["https://docs.example.com"]
)
response = tool.fetch(FetchRequest(url="https://example.com"))

Response Fields

Field Type Description
url string Fetched URL
status_code int HTTP status code
content_type string? Content-Type header
size int? Content size in bytes
last_modified string? Last-Modified header
filename string? From Content-Disposition
format string "markdown", "text", or "raw"
content string? Page content
truncated bool True if content was cut off
method string HTTP method used
error string? Error message if failed

Error Handling

Errors are returned in the error field:

  • InvalidUrl - Malformed URL
  • UrlBlocked - URL blocked by filter
  • NetworkError - Connection failed
  • Timeout - Request timed out
  • HttpError - 4xx/5xx response
  • ContentError - Failed to read body
  • BinaryContent - Binary content not supported

Configuration

Timeouts

  • First-byte: 1 second (connect + initial response)
  • Body: 30 seconds total

Partial content is returned on body timeout with truncated: true.

Binary Content

Automatically detected and returns metadata only for:

  • Images, audio, video, fonts
  • PDFs, archives (zip, tar, rar, 7z)
  • Office documents

HTML Conversion

HTML is automatically converted to markdown:

  • Headers: h1-h6# to ######
  • Lists: Proper nesting with 2-space indent
  • Code: Fenced blocks and inline backticks
  • Links: [text](url) format
  • Strips: scripts, styles, iframes, SVGs

License

MIT

About

AI-friendly webfetch tool, cli, mcp server, and lib

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages