
BotModels - AI Inference Service

Version: 1.0.0
Purpose: Multimodal AI inference service for General Bots


Overview

BotModels is a Python-based AI inference service that provides multimodal capabilities to the General Bots platform. It serves as a companion to botserver (Rust), specializing in cutting-edge AI/ML models from the Python ecosystem, including image generation, video creation, speech synthesis, and vision/captioning.

While botserver handles business logic, networking, and systems-level operations, BotModels exists solely to leverage the extensive Python AI/ML ecosystem for inference tasks that are impractical to implement in Rust.

For comprehensive documentation, see docs.pragmatismo.com.br or the BotBook for detailed guides, API references, and tutorials.


Features

  • Image Generation: Generate images from text prompts using Stable Diffusion
  • Video Generation: Create short videos from text descriptions using Zeroscope
  • Speech Synthesis: Text-to-speech using Coqui TTS
  • Speech Recognition: Audio transcription using OpenAI Whisper
  • Vision/Captioning: Image and video description using BLIP2

Quick Start

Installation

# Clone the repository
git clone https://github.com/GeneralBots/botmodels.git
cd botmodels

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
.\venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Configuration

Copy the example environment file and configure:

cp .env.example .env

Edit .env with your settings:

HOST=0.0.0.0
PORT=8085
API_KEY=your-secret-key
DEVICE=cuda
IMAGE_MODEL_PATH=./models/stable-diffusion-v1-5
VIDEO_MODEL_PATH=./models/zeroscope-v2
VISION_MODEL_PATH=./models/blip2

Running the Server

# Development mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --reload

# Production mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --workers 4

# With HTTPS (production)
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem

🐍 Philosophy & Scope

Why Python?

  • Rust vs. Python Rule:
    • If logic is deterministic, systems-level, or performance-critical: Do it in Rust (botserver)
    • If logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries: Do it here

Architecture Principles

  • Inference Only: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
  • Stateless: Treated as a sidecar to botserver.
  • API First: Exposes strict HTTP/REST endpoints consumed by botserver.

🛠 Technology Stack

  • Runtime: Python 3.10+
  • Web Framework: FastAPI (preferred over Flask for async/performance)
  • ML Frameworks: PyTorch, HuggingFace Transformers, Diffusers
  • Quality: ruff (linting), black (formatting), mypy (typing)

📡 API Endpoints

All endpoints require the X-API-Key header for authentication.
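
Server-side, the key check can be implemented with a small helper (a sketch; `verify_api_key` is an illustrative name, not part of the actual codebase). Using `hmac.compare_digest` avoids leaking information about the key through comparison timing:

```python
import hmac
from typing import Optional

def verify_api_key(provided: Optional[str], expected: str) -> bool:
    """Return True only if the X-API-Key header value matches the configured key.

    hmac.compare_digest runs in constant time, which avoids timing
    side channels when comparing secrets.
    """
    if not provided:
        return False
    return hmac.compare_digest(provided, expected)
```

In FastAPI, a helper like this would typically be wired up as a dependency (e.g. in `src/api/dependencies.py`) that reads the `X-API-Key` header and raises a 401 response on mismatch.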

Image Generation

POST /api/image/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "a cute cat playing with yarn",
  "steps": 30,
  "width": 512,
  "height": 512,
  "guidance_scale": 7.5,
  "seed": 42
}
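
A minimal Python client for this endpoint might look like the following sketch (stdlib only; `BASE_URL` and the helper names are illustrative assumptions, and the payload defaults mirror the request body above):

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

BASE_URL = "http://localhost:8085"  # assumption: default host/port from .env

def build_image_request(prompt: str, api_key: str,
                        **overrides: Any) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Assemble headers and a JSON payload using the documented defaults."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "steps": 30,
        "width": 512,
        "height": 512,
        "guidance_scale": 7.5,
    }
    payload.update(overrides)  # e.g. seed=42
    headers = {"Content-Type": "application/json", "X-API-Key": api_key}
    return headers, payload

def generate_image(prompt: str, api_key: str, **overrides: Any) -> bytes:
    """POST to /api/image/generate; requires a running botmodels server."""
    headers, payload = build_image_request(prompt, api_key, **overrides)
    req = urllib.request.Request(
        f"{BASE_URL}/api/image/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call
        return resp.read()
```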

Video Generation

POST /api/video/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "a rocket launching into space",
  "num_frames": 24,
  "fps": 8,
  "steps": 50
}

Speech Generation (TTS)

POST /api/speech/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "Hello, welcome to our service!",
  "voice": "default",
  "language": "en"
}

Speech to Text

POST /api/speech/totext
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <audio_file>

Image Description

POST /api/vision/describe
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <image_file>
prompt: "What is in this image?" (optional)

Video Description

POST /api/vision/describe_video
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <video_file>
num_frames: 8 (optional)

Visual Question Answering

POST /api/vision/vqa
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <image_file>
question: "How many people are in this image?"

Health Check

GET /api/health

Interactive API documentation:

  • Swagger UI: http://localhost:8085/api/docs
  • ReDoc: http://localhost:8085/api/redoc

🔗 Integration with BotServer

Configuration (config.csv)

key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8

BASIC Script Keywords

// Generate an image
file = IMAGE "a beautiful sunset over mountains"
SEND FILE TO user, file

// Generate a video
video = VIDEO "waves crashing on a beach"
SEND FILE TO user, video

// Generate speech
audio = AUDIO "Welcome to General Bots!"
SEND FILE TO user, audio

// Get image/video description
caption = SEE "/path/to/image.jpg"
TALK caption

πŸ—οΈ Architecture

┌─────────────┐     HTTPS      ┌─────────────┐
│  botserver  │ ─────────────▶ │  botmodels  │
│   (Rust)    │                │  (Python)   │
└─────────────┘                └─────────────┘
      │                              │
      │ BASIC Keywords               │ AI Models
      │ - IMAGE                      │ - Stable Diffusion
      │ - VIDEO                      │ - Zeroscope
      │ - AUDIO                      │ - TTS/Whisper
      │ - SEE                        │ - BLIP2
      ▼                              ▼
┌─────────────┐                ┌─────────────┐
│   config    │                │   outputs   │
│   .csv      │                │  (files)    │
└─────────────┘                └─────────────┘

⚡️ Development Guidelines

Modern Model Usage

  • Deprecate Legacy: Move away from outdated libs (e.g., old allennlp) in favor of HuggingFace Transformers and Diffusers
  • Quantization: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage

Performance & Loading

  • Lazy Loading: Do NOT load 10GB models at module import time. Load on startup lifecycle or first request with locking
  • GPU Handling: Robustly detect CUDA/MPS (Mac) and fallback to CPU gracefully
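
The two points above can be sketched together as follows (hedged: the names `pick_device` and `get_pipeline` are illustrative, and the `torch` import is optional so the snippet degrades gracefully to CPU when PyTorch is absent):

```python
import threading
from typing import Any, Callable

def pick_device() -> str:
    """Detect the best available device, falling back to CPU."""
    try:
        import torch  # optional dependency; absence means CPU
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)  # MPS only on recent torch/Mac
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

_pipeline: Any = None
_lock = threading.Lock()

def get_pipeline(loader: Callable[[], Any]) -> Any:
    """Load the model once, on first use, with double-checked locking.

    Nothing heavy happens at import time; the loader runs at most once
    even under concurrent first requests.
    """
    global _pipeline
    if _pipeline is None:
        with _lock:
            if _pipeline is None:  # re-check under the lock
                _pipeline = loader()
    return _pipeline
```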

Code Quality

  • Type Hints: All functions MUST have type hints
  • Error Handling: No bare except:. Catch precise exceptions and return structured JSON errors to botserver
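
One way to return structured errors is sketched below (framework-agnostic; the payload shape and function names are assumptions for illustration, not the service's actual schema):

```python
from typing import Any, Dict, Union

def to_error_payload(exc: Exception, status: int = 500) -> Dict[str, Any]:
    """Map a caught exception to a structured, JSON-serializable error body."""
    return {
        "error": {
            "type": type(exc).__name__,
            "detail": str(exc),
            "status": status,
        }
    }

def parse_steps(raw: str) -> Union[int, Dict[str, Any]]:
    """Example: catch a precise exception instead of a bare `except:`."""
    try:
        return int(raw)
    except ValueError as exc:  # only the expected failure mode
        return to_error_payload(exc, status=422)
```

In FastAPI this mapping would usually live in exception handlers registered on the app, so every endpoint reports errors to botserver in the same shape.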

Project Structure

botmodels/
├── src/
│   ├── api/
│   │   ├── v1/
│   │   │   └── endpoints/
│   │   │       ├── image.py
│   │   │       ├── video.py
│   │   │       ├── speech.py
│   │   │       └── vision.py
│   │   └── dependencies.py
│   ├── core/
│   │   ├── config.py
│   │   └── logging.py
│   ├── schemas/
│   │   └── generation.py
│   ├── services/
│   │   ├── image_service.py
│   │   ├── video_service.py
│   │   ├── speech_service.py
│   │   └── vision_service.py
│   └── main.py
├── outputs/
├── models/
├── tests/
├── requirements.txt
└── README.md

🧪 Testing

pytest tests/

🔒 Security

  1. Always use HTTPS in production
  2. Use strong, unique API keys
  3. Restrict network access to the service
  4. Consider running on a separate GPU server
  5. Monitor resource usage and set appropriate limits

📚 Documentation

For complete documentation, guides, and API references, see docs.pragmatismo.com.br or the BotBook.

📦 Requirements

  • Python 3.10+
  • CUDA-capable GPU (recommended, 8GB+ VRAM)
  • 16GB+ RAM


🔑 Remember

  • Inference Only: No business state, just predictions
  • Modern Models: Use HuggingFace Transformers, Diffusers
  • Type Safety: All functions must have type hints
  • Lazy Loading: Don't load models at import time
  • GPU Detection: Graceful fallback to CPU
  • Version 1.0.0 - Do not change without approval
  • GIT WORKFLOW - ALWAYS push to ALL repositories (github, pragmatismo)

📄 License

See LICENSE file for details.
