usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).
- ⚡ High Performance: Multi-threading, SIMD, and CUDA-accelerated processing
- ✨ Cross-Platform: Linux, macOS, Windows with ONNX Runtime execution providers (CUDA, TensorRT, CoreML, OpenVINO, DirectML, etc.)
- 🎯 Precision Support: FP32, FP16, INT8, UINT8, Q4, Q4F16, BNB4, and more
- 🛠️ Full-Stack Suite: `DataLoader`, `Annotator`, and `Viewer` for complete workflows
- 🏗️ Unified API: Single `Model` trait inference with `run()` / `forward()` / `encode_images()` / `encode_texts()` and unified `Y` output
- 📥 Auto-Management: Automatic model download (HuggingFace/GitHub), caching, and path resolution
- 📦 Multiple Inputs: Image, directory, video, webcam, stream, and combinations
- 🌱 Model Ecosystem: 50+ SOTA vision and VLM models
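The "unified API" idea above can be illustrated with a self-contained sketch: every model implements one trait and returns the same output type, so pipelines can swap models freely. The names below mirror the README's description (`Model`, `run()`, `Y`) but are illustrative only, not the crate's actual signatures:

```rust
/// Unified output type, analogous in spirit to the crate's `Y` output.
#[derive(Debug, PartialEq)]
struct Y {
    labels: Vec<String>,
}

/// Unified model trait, analogous in spirit to the crate's `Model` trait.
trait Model {
    fn run(&self, image: &str) -> Y;
}

struct Detector;
struct Classifier;

impl Model for Detector {
    fn run(&self, _image: &str) -> Y {
        // A real detector would run ONNX inference here.
        Y { labels: vec!["person".into(), "car".into()] }
    }
}

impl Model for Classifier {
    fn run(&self, _image: &str) -> Y {
        Y { labels: vec!["cat".into()] }
    }
}

fn main() {
    // Any model behind the same trait object fits the same pipeline.
    let models: Vec<Box<dyn Model>> = vec![Box::new(Detector), Box::new(Classifier)];
    for m in &models {
        let y = m.run("image.jpg");
        assert!(!y.labels.is_empty());
        println!("{:?}", y);
    }
}
```

The design payoff is that downstream code (annotation, tracking, serving) depends only on the trait and the unified output, not on any concrete model.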
Run the YOLO-Series demo to explore models across different tasks, precisions, and execution providers:
- Tasks: `detect`, `segment`, `pose`, `classify`, `obb`
- Versions: `v5`, `v6`, `v7`, `v8`, `v9`, `v10`, `11`, `12`, `v13`, `26`
- Scales: `n`, `s`, `m`, `l`, `x`
- Precision: `fp32`, `fp16`, `q8`, `int8`, `q4`, `q4f16`, `bnb4`, and more
- Execution Providers: `CPU`, `CUDA`, `TensorRT`, `TensorRT-RTX`, `CoreML`, `OpenVINO`, and more
CPU

```shell
cargo run -r --example yolo -- --task detect --ver 26 --scale n --dtype fp16
```

Nvidia CUDA + CUDA Image Processor

```shell
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cuda:0
```

Nvidia TensorRT + CUDA Image Processor

```shell
cargo run -r -F tensorrt-full --example yolo -- --device tensorrt:0 --processor-device cuda:0
```

Nvidia TensorRT-RTX + CUDA Image Processor

```shell
cargo run -r -F nvrtx-full --example yolo -- --device nvrtx:0 --processor-device cuda:0
```

Apple Silicon CoreML

```shell
cargo run -r -F coreml --example yolo -- --device coreml
```

Intel OpenVINO (CPU/GPU/VPU)

```shell
cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino:CPU
```

📊 Performance Benchmarks
Environment: NVIDIA RTX 3060Ti (TensorRT-10.11.0.33, CUDA 12.8, TensorRT-RTX-1.3.0.35) / Intel i5-12400F
Setup: YOLO26 Detection, COCO2017-val (5,000 images), 640x640, Conf thresholds: [0.35, 0.3, ..]
Results are for rough reference only.
| Scale | EP | Image Processor | DType | Batch | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|---|---|---|
| n | TensorRT | CUDA | FP16 | 1 | ~233µs | ~1.3ms | ~14µs | ~1.55ms |
| n | TensorRT-RTX | CUDA | FP32 | 1 | ~233µs | ~2.0ms | ~10µs | ~2.24ms |
| n | TensorRT-RTX | CUDA | FP16 | 1 | ❓ | ❓ | ❓ | ❓ |
| n | CUDA | CUDA | FP32 | 1 | ~233µs | ~5.0ms | ~17µs | ~5.25ms |
| n | CUDA | CUDA | FP16 | 1 | ~233µs | ~3.6ms | ~17µs | ~3.85ms |
| n | CUDA | CPU | FP32 | 1 | ~800µs | ~6.5ms | ~14µs | ~7.31ms |
| n | CUDA | CPU | FP16 | 1 | ~800µs | ~5.0ms | ~14µs | ~5.81ms |
| n | CPU | CPU | FP32 | 1 | ~970µs | ~20.5ms | ~14µs | ~21.48ms |
| n | CPU | CPU | FP16 | 1 | ~970µs | ~25.0ms | ~14µs | ~25.98ms |
| n | TensorRT | CUDA | FP16 | 8 | ~1.2ms | ~6.0ms | ~55µs | ~7.26ms |
| n | TensorRT | CPU | FP16 | 8 | ~18.0ms | ~25.5ms | ~55µs | ~43.56ms |
| m | TensorRT | CUDA | FP16 | 1 | ~233µs | ~3.6ms | ~14µs | ~3.85ms |
| m | TensorRT | CUDA | Int8 | 1 | ~233µs | ~2.6ms | ~14µs | ~2.84ms |
| m | CUDA | CUDA | FP32 | 1 | ~233µs | ~16.1ms | ~17µs | ~16.35ms |
| m | CUDA | CUDA | FP16 | 1 | ~233µs | ~8.8ms | ~17µs | ~9.05ms |
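The Total column is approximately the sum of the three stage latencies, and per-image throughput follows directly from it. A quick sanity check on two rows of the table (my own arithmetic, not output from the benchmark harness):

```rust
/// Estimate images/sec from per-stage latencies (in microseconds) and batch size.
fn throughput_fps(pre_us: f64, infer_us: f64, post_us: f64, batch: f64) -> f64 {
    let total_s = (pre_us + infer_us + post_us) / 1e6;
    batch / total_s
}

fn main() {
    // n / TensorRT / FP16, batch 1: ~233µs + ~1.3ms + ~14µs ≈ 1.55ms total.
    let fps = throughput_fps(233.0, 1300.0, 14.0, 1.0);
    assert!((fps - 646.4).abs() < 1.0); // roughly 646 images/sec

    // n / TensorRT / FP16, batch 8: ~1.2ms + ~6.0ms + ~55µs ≈ 7.26ms for 8 images.
    let fps8 = throughput_fps(1200.0, 6000.0, 55.0, 8.0);
    assert!(fps8 > 1100.0); // batching more than doubles effective throughput
}
```

Note how batching amortizes the preprocess cost: per-image latency rises slightly, but aggregate throughput roughly doubles.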
This is a personal project maintained in spare time, so progress on performance optimization and new model support may vary.
PRs for model optimization are very welcome! If you have expertise in specific models and can help optimize their interfaces or post-processing, your contributions would be invaluable. Feel free to open an issue or submit a pull request with suggestions, bug reports, or feature requests.
- This project is built on top of ort (ONNX Runtime for Rust), which provides seamless Rust bindings for ONNX Runtime. Special thanks to the `ort` maintainers.
- Special thanks to @kadu-v for the jamtrack-rs project, which inspired our ByteTracker implementation.
Thanks to all the open-source libraries and their maintainers that make this project possible. See Cargo.toml for a complete list of dependencies.
This project is licensed under the terms described in LICENSE.
