Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
Why?
- Smaller footprint: a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
- Compliance-friendly: deterministic, offline, security-boundary-safe
- Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI for batch processing
- MCP server (Model Context Protocol)
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| Sequence Classification | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| Token Classification | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers: no decoders, no encoder-decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph input must include `input_ids` and, optionally, `attention_mask` (see the check after this list).
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
- XLNet, Transformer-XL, and derivative architectures are not yet supported.
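A quick way to verify the graph-input constraint is to inspect the exported model with the `onnx` Python package. A minimal sketch, assuming your export lives at `./sentiment-model/model.onnx`:

```python
# Inspect the ONNX graph inputs before building an encoderfile.
# Assumes `pip install onnx` and an export at ./sentiment-model/model.onnx.
import onnx

model = onnx.load("sentiment-model/model.onnx")
input_names = [inp.name for inp in model.graph.input]
print(input_names)

# input_ids is required; attention_mask is optional.
assert "input_ids" in input_names, "missing required graph input: input_ids"
```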
Download the encoderfile CLI tool to build your own model binaries:

```bash
curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh
```

Note for Windows users: pre-built binaries are not available for Windows. See our guide on building from source for instructions.
Move the binary to a location in your PATH:

```bash
# Linux/macOS
sudo mv encoderfile /usr/local/bin/

# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/
```

See our guide on building from source for detailed instructions on building the CLI tool from source.
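To confirm the binary is now discoverable, a quick check from Python's standard library (running `which encoderfile` in a shell does the same):

```python
# Print the resolved path of the encoderfile binary, or None if not on PATH.
import shutil

print(shutil.which("encoderfile"))
```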
Quick build:

```bash
cargo build --bin encoderfile --release
./target/release/encoderfile --help
```

First, you need an ONNX-exported model. Export any Hugging Face model:
Requires Python 3.13+ for ONNX export.

```bash
# Install optimum for ONNX export
pip install optimum[exporters]

# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  ./sentiment-model
```
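Before building, it can help to confirm the export produced the expected artifacts. A small sketch; the file names below reflect a typical optimum-cli export and are an assumption, not a guaranteed layout:

```python
# Check that the export directory contains the files a build needs.
# File names are typical for optimum-cli exports (assumption).
from pathlib import Path

export_dir = Path("sentiment-model")
for name in ("model.onnx", "config.json", "tokenizer.json"):
    status = "ok" if (export_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")
```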
Create `sentiment-config.yml`:

```yaml
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
```

Use the downloaded encoderfile CLI tool:

```bash
encoderfile build -f sentiment-config.yml
```

This creates a self-contained binary at `./build/sentiment-analyzer.encoderfile`.
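As a sanity check, you can inspect the artifact's size; per the footprint note above, it should land in the tens-to-hundreds of megabytes depending on the model. A sketch:

```python
# Report the size of the built encoderfile binary.
from pathlib import Path

binary = Path("build/sentiment-analyzer.encoderfile")
print(f"{binary.name}: {binary.stat().st_size / 1_000_000:.1f} MB")
```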
Start the server:
```bash
./build/sentiment-analyzer.encoderfile serve
```

The server will start on http://localhost:8080 by default.
Sentiment Analysis:
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "This is the cutest cat ever!",
      "Boring video, waste of time",
      "These cats are so funny!"
    ]
  }'
```

Response:
```json
{
  "results": [
    {
      "logits": [0.00021549065, 0.9997845],
      "scores": [0.00021549074, 0.9997845],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    },
    {
      "logits": [0.9998148, 0.00018516644],
      "scores": [0.9998148, 0.0001851664],
      "predicted_index": 0,
      "predicted_label": "NEGATIVE"
    },
    {
      "logits": [0.00014975034, 0.9998503],
      "scores": [0.00014975043, 0.9998503],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    }
  ],
  "model_id": "sentiment-analyzer"
}
```
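The same call from Python, using the response fields shown above. A sketch with the `requests` package, assuming the server is running on the default port:

```python
# POST a batch of inputs to the local encoderfile REST server and print
# each prediction with its top score.
import requests

resp = requests.post(
    "http://localhost:8080/predict",
    json={"inputs": ["This is the cutest cat ever!", "Boring video, waste of time"]},
    timeout=30,
)
resp.raise_for_status()

for result in resp.json()["results"]:
    print(result["predicted_label"], max(result["scores"]))
```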
Embeddings:

```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Hello world"],
    "normalize": true
  }'
```
Token Classification (NER):

```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Apple Inc. is located in Cupertino, California"]
  }'
```

| Mode | Command | Default |
|---|---|---|
| REST API | `./my-model.encoderfile serve` | http://localhost:8080 |
| gRPC | `./my-model.encoderfile serve` | localhost:50051 |
| CLI | `./my-model.encoderfile infer "text"` | stdout |
| MCP Server | `./my-model.encoderfile mcp` | — |
Both HTTP and gRPC servers start by default. Use --disable-grpc or --disable-http to run only one.
See the CLI Reference for all server options, port configuration, and output formats.
- Getting Started Guide - Step-by-step tutorial
- Building Guide - Build encoderfiles from ONNX models
- CLI Reference - Complete command-line documentation
- API Reference - REST, gRPC, and MCP API docs
Once you have the encoderfile CLI tool installed, you can build binaries from any compatible HuggingFace model.
See our guide on building from source for detailed instructions including:
- How to export models to ONNX format
- Configuration file options
- Advanced features (Lua transforms, custom paths, etc.)
- Troubleshooting tips
Quick workflow (see the scripted sketch after this list):

1. Export your model to ONNX: `optimum-cli export onnx ...`
2. Create a config file: `config.yml`
3. Build the binary: `encoderfile build -f config.yml`
4. Deploy anywhere: `./build/my-model.encoderfile serve`
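Steps 1–3 can be scripted end to end. A minimal sketch reusing the example model and config from earlier in this document:

```python
# Export the model to ONNX, then build the encoderfile binary.
# Mirrors the CLI commands shown in the quickstart above.
import subprocess

subprocess.run(
    [
        "optimum-cli", "export", "onnx",
        "--model", "distilbert-base-uncased-finetuned-sst-2-english",
        "--task", "text-classification",
        "./sentiment-model",
    ],
    check=True,
)
subprocess.run(["encoderfile", "build", "-f", "sentiment-config.yml"], check=True)
```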
We welcome contributions! See CONTRIBUTING.md for guidelines.
```bash
# Clone the repository
git clone https://github.com/mozilla-ai/encoderfile.git
cd encoderfile

# Set up development environment
make setup

# Run tests
make test

# Build documentation
make docs
```

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- Built with ONNX Runtime
- Inspired by Llamafile
- Powered by the Hugging Face model ecosystem
- Discord - Join our community
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and share ideas