fetchkit

AI-friendly web content fetching tool designed for LLM consumption. Rust library with CLI, MCP server, and Python bindings.

Features

HTTP fetching - GET and HEAD methods with streaming support
HTML-to-Markdown - Built-in conversion optimized for LLMs
HTML-to-Text - Plain text extraction with clean formatting
Binary detection - Returns metadata only for images, PDFs, etc.
Timeout handling - 1s first-byte, 30s body with partial content on timeout
URL filtering - Allow/block lists for controlled access
MCP server - Model Context Protocol support for AI tool integration

Installation

From Git (recommended)

cargo install --git https://github.com/everruns/fetchkit fetchkit-cli

From Source

git clone https://github.com/everruns/fetchkit
cd fetchkit
cargo install --path crates/fetchkit-cli

CLI Usage

# Fetch URL (outputs markdown with frontmatter)
fetchkit fetch https://example.com

# Output as JSON instead
fetchkit fetch https://example.com -o json

# Custom user agent
fetchkit fetch https://example.com --user-agent "MyBot/1.0"

# Show full documentation
fetchkit --llmtxt

Default output is markdown with YAML frontmatter:

---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---
# Example Domain

This domain is for use in illustrative examples in documents...
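
Because the default output opens with a frontmatter block delimited by `---` lines, a caller can separate the metadata from the markdown body with plain string handling. The following is a minimal sketch, assuming the frontmatter contains only flat `key: value` lines as in the sample above; `split_frontmatter` is a hypothetical helper and not part of fetchkit.

```python
def split_frontmatter(output: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the markdown body.

    Assumes flat `key: value` lines only (no nested YAML).
    """
    lines = output.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, output  # no frontmatter present
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, output  # unterminated frontmatter: treat it all as body
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


sample = """---
url: https://example.com
status_code: 200
source_content_type: text/html; charset=UTF-8
source_size: 1256
---
# Example Domain
"""
meta, body = split_frontmatter(sample)
print(meta["url"], meta["status_code"])
print(body)
```

All values come back as strings; a full YAML parser would be the safer choice if the frontmatter ever carries nested or quoted values.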

JSON output (-o json):

{
  "url": "https://example.com",
  "status_code": 200,
  "content_type": "text/html",
  "size": 1256,
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples..."
}
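
For scripting, the JSON output is easier to consume than the frontmatter form. The sketch below wraps the documented `fetchkit fetch <url> -o json` invocation with Python's `subprocess` module. The helper names are hypothetical, and it assumes the `fetchkit` binary is on `PATH` and prints a single JSON object to stdout.

```python
import json
import subprocess


def build_command(url: str) -> list[str]:
    # Mirrors the documented invocation: fetchkit fetch <url> -o json
    return ["fetchkit", "fetch", url, "-o", "json"]


def parse_output(stdout: str) -> dict:
    # The JSON output is a single object, as in the sample above.
    return json.loads(stdout)


def fetch_json(url: str) -> dict:
    # check=True raises CalledProcessError if the CLI exits non-zero.
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    return parse_output(result.stdout)
```

With fetchkit installed, `fetch_json("https://example.com")["status_code"]` would return the status code from the JSON object.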

MCP Server

Run as a Model Context Protocol server:

fetchkit mcp

Exposes a fetchkit tool over JSON-RPC 2.0 using the stdio transport. Returns markdown with frontmatter (same format as the CLI). Compatible with Claude Desktop and other MCP clients.
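
To make the wire format concrete, the sketch below builds the kind of message an MCP client would write to the server's stdin. In MCP, tools are invoked with the `tools/call` method, and the stdio transport carries one JSON message per line. The tool name `fetchkit` comes from the description above; the `url` argument name is an assumption that mirrors `FetchRequest`, and the `initialize` handshake that a real session performs first is omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call_message(url: str) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for the fetchkit tool.

    Assumption: the tool accepts a `url` argument, mirroring FetchRequest.
    """
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": "fetchkit", "arguments": {"url": url}},
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(request) + "\n"


print(tool_call_message("https://example.com"), end="")
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the pipe.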

Library Usage

Add to Cargo.toml:

[dependencies]
fetchkit = { git = "https://github.com/everruns/fetchkit" }

Basic Fetch

use fetchkit::{fetch, FetchRequest};

#[tokio::main]
async fn main() {
    let request = FetchRequest {
        url: "https://example.com".to_string(),
        method: None,
        as_markdown: Some(true),
        as_text: None,
    };

    let response = fetch(request).await;
    println!("{}", response.content.unwrap_or_default());
}

With Tool Builder

use fetchkit::Tool;

let tool = Tool::builder()
    .enable_markdown(true)
    .enable_text(false)
    .user_agent("MyBot/1.0")
    .allow_prefix("https://docs.example.com")
    .block_prefix("https://internal.example.com")
    .build();

// `request` is a FetchRequest, built as in Basic Fetch above
let response = tool.fetch(request).await;

Python Bindings

pip install fetchkit

from fetchkit import fetch, FetchRequest, FetchKitTool

# Simple fetch
response = fetch("https://example.com", as_markdown=True)
print(response.content)

# With configuration
tool = FetchKitTool(
    enable_markdown=True,
    user_agent="MyBot/1.0",
    allow_prefixes=["https://docs.example.com"]
)
response = tool.fetch(FetchRequest(url="https://example.com"))
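
The allow and block prefixes suggest a simple prefix match on the URL string. The sketch below illustrates one plausible reading. Its semantics are assumptions, not confirmed fetchkit behavior: a block entry wins over an allow entry, and an empty allow list permits everything that is not blocked. `is_allowed` and its parameters are hypothetical names.

```python
def is_allowed(url: str, allow_prefixes: list[str], block_prefixes: list[str]) -> bool:
    """Prefix-based URL filter.

    Assumed semantics (not confirmed by fetchkit): block entries win,
    and an empty allow list permits everything else.
    """
    if any(url.startswith(p) for p in block_prefixes):
        return False
    if not allow_prefixes:
        return True
    return any(url.startswith(p) for p in allow_prefixes)


allow = ["https://docs.example.com"]
block = ["https://internal.example.com"]
for candidate in (
    "https://docs.example.com/guide",
    "https://internal.example.com/admin",
    "https://other.example.com/",
):
    print(candidate, is_allowed(candidate, allow, block))
```

Plain string prefixes are easy to reason about but also easy to get wrong: `https://docs.example.com` also matches `https://docs.example.com.evil.test`, so ending each prefix with `/` is a sensible precaution.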

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Fetched URL |
| `status_code` | int | HTTP status code |
| `content_type` | string? | Content-Type header |
| `size` | int? | Content size in bytes |
| `last_modified` | string? | Last-Modified header |
| `filename` | string? | From Content-Disposition |
| `format` | string | "markdown", "text", or "raw" |
| `content` | string? | Page content |
| `truncated` | bool | True if content was cut off |
| `method` | string | HTTP method used |
| `error` | string? | Error message if failed |

Error Handling

Errors are returned in the error field:

InvalidUrl - Malformed URL
UrlBlocked - URL blocked by filter
NetworkError - Connection failed
Timeout - Request timed out
HttpError - 4xx/5xx response
ContentError - Failed to read body
BinaryContent - Binary content not supported
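
A caller typically checks `error` first, then `content`, then `truncated`. The sketch below does this for a response decoded from the JSON output into a dict. `describe` is a hypothetical helper, and it makes no assumption about the exact text of the error message.

```python
def describe(response: dict) -> str:
    """Summarise a response using the documented fields."""
    if response.get("error"):
        return f"failed: {response['error']}"
    if response.get("content") is None:
        # Metadata only (the README describes this case for binary content).
        return f"no content ({response.get('content_type')}, {response.get('size')} bytes)"
    suffix = " (truncated)" if response.get("truncated") else ""
    return f"{response['format']} content, {len(response['content'])} chars{suffix}"


print(describe({"error": "Timeout"}))
print(describe({"format": "markdown", "content": "# Example Domain", "truncated": False}))
```

Using `.get` for the optional fields keeps the helper tolerant of responses that omit them.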

Configuration

Timeouts

First-byte: 1 second (connect + initial response)
Body: 30 seconds total

If the body timeout expires, the content received so far is returned with truncated: true.
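
The two-stage timeout can be illustrated with `asyncio`. The sketch below is not fetchkit's implementation: it reads from any async iterator of byte chunks, fails if the first chunk is late, and returns partial data once the body deadline passes. It assumes the body clock starts when the first chunk arrives, which the README does not specify. `read_with_timeouts` and `slow_body` are hypothetical names.

```python
import asyncio

FIRST_BYTE_TIMEOUT = 1.0  # seconds, from the README
BODY_TIMEOUT = 30.0       # seconds, from the README


async def read_with_timeouts(chunks, first_byte=FIRST_BYTE_TIMEOUT, body=BODY_TIMEOUT):
    """Read an async iterator of byte chunks under two deadlines.

    Returns (data, truncated). Raises asyncio.TimeoutError if the first
    chunk does not arrive within `first_byte` seconds.
    """
    iterator = chunks.__aiter__()
    loop = asyncio.get_running_loop()
    try:
        first = await asyncio.wait_for(iterator.__anext__(), timeout=first_byte)
    except StopAsyncIteration:
        return b"", False  # empty body
    data = bytearray(first)
    deadline = loop.time() + body  # assumption: body clock starts at first chunk
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return bytes(data), True
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return bytes(data), False  # body finished in time
        except asyncio.TimeoutError:
            return bytes(data), True  # keep what arrived, flag truncation
        data.extend(chunk)


async def slow_body():
    # Simulated response: the second chunk arrives after a delay.
    yield b"hello "
    await asyncio.sleep(0.2)
    yield b"world"


# A short body deadline is used here so the demo finishes quickly.
data, truncated = asyncio.run(read_with_timeouts(slow_body(), body=0.05))
print(data, truncated)
```

Tracking a single deadline, rather than a fresh timeout per chunk, is what makes the body limit a total budget: a server that trickles one byte at a time cannot extend it.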

Binary Content

Binary content is detected automatically, and only metadata is returned for:

Images, audio, video, fonts
PDFs, archives (zip, tar, rar, 7z)
Office documents
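
The README does not say whether detection relies on the Content-Type header, file signatures, or both. As an illustration only, the sketch below classifies by Content-Type, using common MIME types for the categories listed above. `is_binary` and the type sets are assumptions, not fetchkit's actual list.

```python
# Top-level MIME types treated as binary in this sketch.
BINARY_TOP_LEVEL = {"image", "audio", "video", "font"}

# Specific MIME types for PDFs, archives, and legacy Office formats.
BINARY_TYPES = {
    "application/pdf",
    "application/zip",
    "application/x-tar",
    "application/vnd.rar",
    "application/x-7z-compressed",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
}

# Modern Office formats (docx, xlsx, pptx) share this prefix.
OFFICE_PREFIX = "application/vnd.openxmlformats-officedocument."


def is_binary(content_type: str) -> bool:
    # Drop parameters such as "; charset=UTF-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    top_level = mime.split("/", 1)[0]
    return (
        top_level in BINARY_TOP_LEVEL
        or mime in BINARY_TYPES
        or mime.startswith(OFFICE_PREFIX)
    )


for ct in ("text/html; charset=UTF-8", "image/png", "application/pdf"):
    print(ct, is_binary(ct))
```

Header-based checks are cheap because they need no body bytes, but they trust the server; a mislabeled response would need signature sniffing to catch.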

HTML Conversion

HTML is automatically converted to markdown:

Headings: h1-h6 → # to ######
Lists: Proper nesting with 2-space indent
Code: Fenced blocks and inline backticks
Links: [text](url) format
Strips: scripts, styles, iframes, SVGs
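
To show how a few of these rules combine, the sketch below converts a small HTML fragment with Python's standard `html.parser`. It is a toy illustration, not fetchkit's converter: it covers only headings, paragraphs, links, inline code, and the stripped elements, and it leaves out lists and fenced code blocks. `MiniMarkdown` and `to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser

SKIPPED = {"script", "style", "iframe", "svg"}
HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}  # h1 -> "#", ..., h6 -> "######"


class MiniMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a stripped element
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.parts.append(f"\n\n{HEADINGS[tag]} ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")
        elif tag == "code":
            self.parts.append("`")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None
        elif tag == "code":
            self.parts.append("`")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = (
    "<h1>Title</h1><script>alert(1)</script>"
    '<p>See <a href="https://example.com">the docs</a> and <code>x</code>.</p>'
)
print(to_markdown(page))
```

The depth counter, rather than a boolean flag, keeps nested stripped elements (for example an `svg` inside an `iframe`) from re-enabling output too early.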

License

MIT
