Mathdee/Python-stigmergic-memory

Context Garbage Collection for LLMs. A stigmergic memory engine that uses biological decay to solve context dilution in RAG systems.

This repository serves as a proof-of-concept for Stigmergic Memory Decay in RAG systems. It is provided as-is for researchers and engineers to fork and adapt.

Adaptive Memory System for LLMs

Bio-inspired decay + tiered storage: optimize forgetting, not just storage.

The Problem

Current AI memory systems remember everything equally. That dilutes signal with noise and wastes compute.

The Solution

  • Single-index architecture: One HNSW index (Qdrant) stores everything with health metadata. No physical hot/cold movement.
  • Metabolic decay: Linear health decay (health -= rate × days). Frequent memories (access_count > 2) decay slower. Pinned memories have a health floor (0.2).
  • Health-based filtering: Dead memories (health < 0.1) filtered during retrieval. No data movement – just metadata updates.
  • Reranking: adjusted_score = similarity × (1 + health). A 0.8 match with 1.0 health beats a 0.9 match with 0.1 health (see the sketch below).
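
These rules are small enough to show inline. A minimal sketch of the decay and reranking math described above (illustrative only; names follow the Tuning Parameters table below, not necessarily the actual module API):

from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    health: float = 1.0
    access_count: int = 0
    pinned: bool = False

def decay(m: Memory, days: float, daily_rate: float = 0.05,
          frequent_rate: float = 0.01, threshold: int = 2,
          floor: float = 0.2) -> None:
    # Linear metabolic decay; reinforced memories decay more slowly.
    rate = frequent_rate if m.access_count > threshold else daily_rate
    m.health -= rate * days
    if m.pinned:
        m.health = max(m.health, floor)  # pinned memories keep a health floor
    m.health = max(m.health, 0.0)

def rerank(similarity: float, health: float, multiplier: float = 1.0) -> float:
    return similarity * (1.0 + health * multiplier)

# A 0.8 match at full health beats a 0.9 match at 0.1 health:
assert rerank(0.8, 1.0) > rerank(0.9, 0.1)  # 1.60 > 0.99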

Quick Start

# Install
pip install -r requirements.txt

# Run API
uvicorn api:app --reload --host 0.0.0.0 --port 8000

# Demo (decay + tiering behavior)
python demo.py

# Benchmark (toy: 12 memories)
python tests/benchmark.py

# Scaled benchmark (100–10K memories, 90 days)
python tests/benchmark_scaled.py --quick          # 100 mem, 30 days
python tests/benchmark_scaled.py --scale 1000 --days 90

API

Method  Endpoint                     Description
POST    /memory/add                  Add a memory ({"content": "..."})
GET     /memory/query?q=...&k=5      Retrieve by semantic query
POST    /memory/decay                Apply metabolic decay
GET     /memory/stats                Hot/cold counts
POST    /memory/reinforce/{id}       Strengthen a memory by id
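
A quick smoke test against these endpoints (assumes the Quick Start server is running on port 8000; the printed response shapes are illustrative):

import requests

BASE = "http://localhost:8000"

# Store a memory, then query it back semantically.
requests.post(f"{BASE}/memory/add", json={"content": "User prefers dark mode"})
hits = requests.get(f"{BASE}/memory/query", params={"q": "UI preferences", "k": 5})
print(hits.json())

# Apply one round of decay, then check the health stats.
requests.post(f"{BASE}/memory/decay")
print(requests.get(f"{BASE}/memory/stats").json())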

Moltbot Microservice API (plug-and-play)

When running the Docker container (src.server:app), these vector-first endpoints are available. You provide embeddings; the engine handles storage, decay, and retrieval.

Method  Endpoint       Description
POST    /remember      Store memory: {"text": "...", "vector": [384 floats], "metadata": {}}
POST    /recall        Retrieve by vector: {"vector": [384 floats], "top_k": 15}
POST    /decay?days=1  Apply metabolic decay
GET     /info          Vector size (384) and embedding model hint
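
The /info hint points at a 384-dimensional model (all-MiniLM-L6-v2, used in the sidecar example below). One way to produce compatible vectors client-side, assuming the sentence-transformers package is installed (any 384-dim embedder works):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("User likes Python").tolist()  # list of 384 floats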

Results Summary

Toy run (12 memories, 20 days): On a small dataset, adaptive and full RAG have almost identical latency (~12 ms). Run python demo.py and python tests/benchmark.py to reproduce.

Scaled run (1,000 memories, 90 days): reproduce with python tests/benchmark_scaled.py --scale 1000 --days 90, or use --quick for a faster 100-memory, 30-day pass.

Benchmark (1K memories, 90 days, daily reinforcement for frequent tier):

Metric                       Adaptive (1K/90d)   Full RAG
Healthy (above threshold)    55 items            1000 items
Dead (filtered as noise)     945 items           0 items
Latency (frequent query)     ~14 ms              ~13 ms
Latency (noise query)        ~10 ms              ~14 ms
Heavy decay (all filtered)   10.8 ms             16.5 ms
  • Noise filtering: Adaptive filters 945 dead memories (health < 0.1); queries return 0% noise. Full RAG returns noise.
  • Speed when filtered: When most memories are dead, adaptive is ~35% faster (10.8 ms vs 16.5 ms).
  • Reinforcement matters: Daily reinforcement keeps ~55 frequent memories hot over 90 days; weekly is too sparse.

Honest framing: Early results on 12 memories showed no real advantage. Scaling to 100+ memories and 30+ days is when the pattern emerges: health filtering prunes noise, retrieval stays fast, and adaptive beats full RAG when most memories are dead.

Tuning Parameters

Parameter                 Default   Description
daily_decay_rate          0.05      5%/day for unvisited memories; noise dies in ~20 days.
frequent_decay_rate       0.01      1%/day for reinforced memories (access_count > 2); signal survives.
reinforce_boost           0.15      Linear boost on access: health += boost (capped at 1.0).
reinforce_threshold       2         access_count above this switches to frequent_decay_rate.
health_floor              0.2       Pinned/core memories never drop below this.
min_health_threshold      0.1       Memories below this are filtered out as noise.
health_boost_multiplier   1.0       Reranking: adjusted_score = similarity × (1 + health × multiplier).
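
A quick sanity check on these defaults (plain arithmetic, not the module API): with linear decay, an unvisited memory crosses the 0.1 filter threshold after (1.0 - 0.1) / 0.05 = 18 days, while a reinforced memory decaying at 0.01/day lasts the full 90-day run.

def days_until_filtered(start_health=1.0, rate=0.05, threshold=0.1):
    # Days of linear decay before health drops below the filter threshold.
    return (start_health - threshold) / rate

print(days_until_filtered())           # 18.0 - unvisited memories become noise
print(days_until_filtered(rate=0.01))  # 90.0 - reinforced memories survive the run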

Why this beats the baseline:

  • Speed: One HNSW index served by Qdrant. Filtering happens on metadata inside the index, so there is no physical data movement.
  • Recall: Full dense vectors (no binary quantization). High recall from HNSW, then health-based reranking.
  • Noise filtering: Dead memories (health < 0.1) are filtered out at query time. Precision without sacrificing recall (see the sketch below).
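
What health-based filtering looks like in Qdrant terms: a sketch assuming the qdrant-client package, a collection named "memories", and a health payload field. The repository's actual collection name and payload keys may differ.

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, Range

client = QdrantClient("localhost", port=6333)
query_vector = [0.0] * 384  # your query embedding

# Retrieve only living memories: a payload filter on health applied
# inside the single HNSW index - no hot/cold copies, no data movement.
hits = client.search(
    collection_name="memories",  # assumed name
    query_vector=query_vector,
    query_filter=Filter(must=[FieldCondition(key="health", range=Range(gte=0.1))]),
    limit=15,
)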

Project Layout

Memory Dilution/
├── api.py              # Full FastAPI server (content-based)
├── decay_worker.py     # Scheduled decay (e.g. daily 03:00)
├── demo.py             # Demo script
├── requirements.txt
├── src/
│   ├── server.py         # Moltbot microservice (vector-first, /remember, /recall)
│   ├── memory_system.py  # MemorySystem, add_memory_from_vector, retrieve_by_vector
│   └── models.py         # MemoryNode dataclass
└── tests/
    ├── benchmark.py        # Toy benchmark (12 memories)
    └── benchmark_scaled.py # Scaled: 100/1K/10K memories, 90 days

How to Use It

Method 1: The "Sidecar" (Best for Moltbot)

Build and run the container to start the memory engine:

docker build -t metabolic-memory .
docker run -p 8000:8000 metabolic-memory

Then, in your agent code, hit the API:

import requests

# Get vector size (384 for all-MiniLM-L6-v2)
r = requests.get("http://localhost:8000/info")
vector_size = r.json()["vector_size"]

# Send a memory (you provide the embedding - use your own model or API)
vector = [...]  # 384-dim embedding of "User likes Python"
requests.post("http://localhost:8000/remember", json={
    "text": "User likes Python",
    "vector": vector,
    "metadata": {}
})

# Recall by query vector
query_vector = [...]  # 384-dim embedding of "What does the user prefer?"
r = requests.post("http://localhost:8000/recall", json={
    "vector": query_vector,
    "top_k": 5
})
memories = r.json()["results"]

# Trigger decay (e.g. daily cron)
requests.post("http://localhost:8000/decay?days=1")

Method 2: The Library (For Builders)

Use src/memory_system.py directly in your project for a self-contained setup with built-in embeddings:

from src.memory_system import MemorySystem

brain = MemorySystem()
brain.add_memory("User likes Python")  # Auto-embeds with all-MiniLM-L6-v2
results = brain.retrieve("What does the user prefer?", top_k=5)
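
Continuing the same example, the project layout also names vector-first entry points (add_memory_from_vector, retrieve_by_vector) for when you bring your own embeddings. The calls below assume signatures that mirror the HTTP API; check src/memory_system.py before relying on them.

vector = [...]  # 384-dim embedding from your own model
brain.add_memory_from_vector("User likes Python", vector)  # assumed signature
results = brain.retrieve_by_vector(vector, top_k=5)        # assumed signature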

Deploy

Railway / Render: Point at api:app and set start command to uvicorn api:app --host 0.0.0.0 --port $PORT. For the Moltbot microservice, use src.server:app instead.

Docker (Moltbot microservice):

docker build -t metabolic-memory .
docker run -p 8000:8000 metabolic-memory

# Quick Deploy
docker-compose up -d

License

MIT
