Thread

A safe, fast, flexible code analysis and parsing engine built in Rust. Production-ready service-library dual architecture with content-addressed caching and incremental intelligence.

Thread is a high-performance code analysis platform that operates as both a reusable library ecosystem and a persistent service. Built on tree-sitter parsers and enhanced with the ReCoco dataflow framework, Thread delivers 50x+ performance gains through content-addressed caching while supporting dual deployment: CLI with Rayon parallelism and Edge on Cloudflare Workers.

Key Features

  • Content-Addressed Caching: BLAKE3 fingerprinting a file is 346x faster than parsing it (425ns vs. 147µs for a small file), a 99.7% cost reduction for every unchanged file on repeated runs
  • Incremental Updates: Only changed files are reanalyzed; unmodified code skips processing automatically
  • Dual Deployment: Single codebase compiles to both CLI (Rayon + Postgres) and Edge (tokio + D1 on Cloudflare Workers)
  • Multi-Language Support: 20+ languages via tree-sitter (Rust, TypeScript, Python, Go, Java, C/C++, and more)
  • Pattern Matching: Powerful AST-based pattern matching with meta-variables for complex queries
  • Production Performance: >1,000 files/sec throughput, >90% cache hit rate, <50ms p95 latency
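
To make the caching idea concrete, here is a minimal sketch of a content-addressed cache. It is not Thread's implementation: `AnalysisCache` and its "analysis" (a line count) are hypothetical, and the standard library's `DefaultHasher` stands in for BLAKE3 so the example runs without dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint. Thread uses BLAKE3; `DefaultHasher` is used here
/// only so the sketch needs no external crates.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache keyed by content fingerprint rather than by file path.
#[derive(Default)]
struct AnalysisCache {
    results: HashMap<u64, usize>,
    hits: usize,
    misses: usize,
}

impl AnalysisCache {
    /// Returns the cached result for `content`, running the analysis only on a miss.
    fn analyze(&mut self, content: &str) -> usize {
        let key = fingerprint(content);
        if let Some(&cached) = self.results.get(&key) {
            self.hits += 1;
            return cached;
        }
        self.misses += 1;
        // Placeholder "analysis": count the non-empty lines.
        let result = content.lines().filter(|l| !l.trim().is_empty()).count();
        self.results.insert(key, result);
        result
    }
}

fn main() {
    let mut cache = AnalysisCache::default();
    let source = "fn main() {\n    println!(\"hi\");\n}\n";
    cache.analyze(source); // first run: miss, the analysis executes
    cache.analyze(source); // second run: hit, the analysis is skipped
    cache.analyze("fn other() {}\n"); // different content: miss
    println!("hits = {}, misses = {}", cache.hits, cache.misses);
}
```

Because the key is derived from content, two files with identical bytes share one cache entry, and renaming a file does not invalidate its result.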

Quick Start

Installation

# Clone the repository
git clone https://github.com/knitli/thread.git
cd thread

# Install development tools (optional, requires mise)
mise run install-tools

# Build Thread with all features
cargo build --workspace --all-features --release

# Verify installation
./target/release/thread --version

Basic Usage as Library

use thread_ast_engine::{Language, Root};

// Wrapped in `main` so the `?` operators compile; this assumes Thread's
// error types implement `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse source code
    let source = "function hello() { return 42; }";
    let root = Root::new(source, Language::JavaScript)?;

    // Find all function declarations
    let functions = root.find_all("function $NAME($$$PARAMS) { $$$BODY }");

    // Extract function names
    for func in functions {
        println!("Found function: {}", func.get_text("NAME")?);
    }

    Ok(())
}

Using Thread Flow for Analysis Pipelines

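use thread_flow::ThreadFlowBuilder;

// Wrapped in an async `main` so `.await` and `?` compile. This assumes the
// `tokio` crate (with its macros) is a dependency and that Thread's error
// types implement `std::error::Error`.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a declarative analysis pipeline
    let flow = ThreadFlowBuilder::new("analyze_rust")
        .source_local("src/", &["**/*.rs"], &["target/**"])
        .parse()
        .extract_symbols()
        .target_postgres("code_symbols", &["content_hash"])
        .build()
        .await?;

    // Execute the flow
    flow.execute().await?;

    Ok(())
}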

Command Line Usage

# Analyze a codebase (first run)
thread analyze ./my-project
# → Analyzing 1,000 files: 10.5s

# Second run (with cache)
thread analyze ./my-project
# → Analyzing 1,000 files: 0.3s (100% cache hits, 35x faster!)

# Incremental update (only changed files)
# Edit 10 files, then:
thread analyze ./my-project
# → Analyzing 10 files: 0.15s (990 files cached)
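
The incremental run above only touches files whose content changed since the previous run. A rough sketch of that decision follows, assuming a manifest of path-to-fingerprint pairs saved by the previous run. The `changed_files` helper and the manifest shape are hypothetical, and `DefaultHasher` again stands in for BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint (Thread uses BLAKE3; this keeps the sketch dependency-free).
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Returns the sorted paths that are new or whose content differs from the
/// fingerprint recorded in `previous`.
fn changed_files<'a>(
    previous: &HashMap<String, u64>,
    current: &'a [(String, String)],
) -> Vec<&'a str> {
    let mut changed: Vec<&str> = current
        .iter()
        .filter(|(path, content)| previous.get(path) != Some(&fingerprint(content)))
        .map(|(path, _)| path.as_str())
        .collect();
    changed.sort();
    changed
}

fn main() {
    let first_run = vec![
        ("src/lib.rs".to_string(), "pub fn a() {}".to_string()),
        ("src/util.rs".to_string(), "pub fn b() {}".to_string()),
    ];
    // Manifest saved at the end of the first run.
    let manifest: HashMap<String, u64> = first_run
        .iter()
        .map(|(path, content)| (path.clone(), fingerprint(content)))
        .collect();

    let second_run = vec![
        ("src/lib.rs".to_string(), "pub fn a() { todo!() }".to_string()), // edited
        ("src/util.rs".to_string(), "pub fn b() {}".to_string()),         // untouched
        ("src/new.rs".to_string(), "pub fn c() {}".to_string()),          // added
    ];
    println!("reanalyze: {:?}", changed_files(&manifest, &second_run));
}
```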

Architecture

Thread follows a service-library dual architecture: five reusable library crates plus a two-crate service layer with pluggable storage backends:

Library Core (Reusable Components)

  • thread-ast-engine - Core AST parsing, pattern matching, and transformation engine
  • thread-language - Language definitions and tree-sitter parser integrations (20+ languages)
  • thread-rule-engine - Rule-based scanning and transformation with YAML configuration
  • thread-utils - Shared utilities including SIMD optimizations and hash functions
  • thread-wasm - WebAssembly bindings for browser and edge deployment

Service Layer (Orchestration & Persistence)

  • thread-flow - High-level dataflow pipelines with ThreadFlowBuilder API
  • thread-services - Service interfaces, API abstractions, and ReCoco integration
  • Storage Backends:
    • Postgres (CLI deployment) - Persistent caching with <10ms p95 latency
    • D1 (Cloudflare Edge) - Distributed caching across CDN nodes with <50ms p95 latency
    • Qdrant (optional) - Vector similarity search for semantic analysis
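
One way to picture interchangeable storage backends is a small trait that analysis code depends on, with each backend as an implementation. The sketch below is an assumption for illustration only: `CacheStore`, `InMemoryStore`, and `cached_or_compute` are hypothetical names, not Thread's actual service interfaces, and only an in-memory backend is shown.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; Thread's real traits may look different.
trait CacheStore {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: String);
}

/// In-memory backend, the kind of store one would use in tests.
#[derive(Default)]
struct InMemoryStore {
    entries: HashMap<String, String>,
}

impl CacheStore for InMemoryStore {
    fn get(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }

    fn put(&mut self, key: &str, value: String) {
        self.entries.insert(key.to_string(), value);
    }
}

/// Depends only on the trait, so a Postgres- or D1-backed store could be
/// swapped in without touching this function.
fn cached_or_compute(
    store: &mut dyn CacheStore,
    key: &str,
    compute: impl FnOnce() -> String,
) -> String {
    if let Some(hit) = store.get(key) {
        return hit;
    }
    let value = compute();
    store.put(key, value.clone());
    value
}

fn main() {
    let mut store = InMemoryStore::default();
    let first = cached_or_compute(&mut store, "abc123", || "3 symbols".to_string());
    // The closure below never runs because the key is already stored.
    let second = cached_or_compute(&mut store, "abc123", || unreachable!());
    println!("{first} / {second}");
}
```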

Concurrency Models

  • Rayon (CLI) - CPU-bound parallelism for local multi-core utilization (2-8x speedup)
  • tokio (Edge) - Async I/O for horizontal scaling and Cloudflare Workers
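
The CPU-bound model can be sketched with the standard library alone. The example below uses `std::thread::scope` as a stand-in for Rayon (which would express the same thing with a parallel iterator); `analyze` and `analyze_parallel` are hypothetical placeholders, not Thread functions.

```rust
use std::thread;

/// Placeholder CPU-bound work: count the lines in one source file.
fn analyze(source: &str) -> usize {
    source.lines().count()
}

/// Splits `sources` into at most `workers` chunks and analyzes each chunk on
/// its own thread. Results come back in the same order as the input.
fn analyze_parallel(sources: &[String], workers: usize) -> Vec<usize> {
    if sources.is_empty() {
        return Vec::new();
    }
    let chunk_size = sources.len().div_ceil(workers.max(1));
    thread::scope(|scope| {
        // Spawn every worker first so the chunks run concurrently.
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| analyze(s)).collect::<Vec<_>>())
            })
            .collect();
        // Then join in spawn order to preserve input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let sources: Vec<String> = (1..=8).map(|n| "line\n".repeat(n)).collect();
    let counts = analyze_parallel(&sources, 4);
    println!("{counts:?}");
}
```

Scoped threads can borrow `sources` directly, which is why no `Arc` or cloning is needed here.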

Deployment Options

CLI Deployment (Local/Server)

Best for: Development environments, CI/CD pipelines, large batch processing

# Build with CLI features (Postgres + Rayon parallelism)
cargo build --release --features "recoco-postgres,parallel,caching"

# Configure PostgreSQL backend
export DATABASE_URL=postgresql://user:pass@localhost/thread_cache
export RAYON_NUM_THREADS=8  # Use 8 worker threads

# Run analysis
./target/release/thread analyze ./large-codebase
# → Performance: 1,000-10,000 files per run

Features: Direct filesystem access, multi-core parallelism, persistent caching, unlimited CPU time

See CLI Deployment Guide for complete setup.

Edge Deployment (Cloudflare Workers)

Best for: Global API services, low-latency analysis, serverless architecture

# Build WASM for edge
cargo run -p xtask build-wasm --release

# Deploy to Cloudflare Workers
wrangler deploy

# Access globally distributed API
curl https://thread-api.workers.dev/analyze \
  -d '{"code":"fn main(){}","language":"rust"}'
# → Response time: <50ms worldwide (p95)

Features: Global CDN distribution, auto-scaling, D1 distributed storage, no infrastructure management

See Edge Deployment Guide for complete setup.

Language Support

Thread supports 20+ programming languages via tree-sitter parsers:

Tier 1 (Primary Focus)

  • Rust, JavaScript/TypeScript, Python, Go, Java

Tier 2 (Full Support)

  • C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala

Tier 3 (Basic Support)

  • Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell

Each language provides full AST parsing, symbol extraction, and pattern matching capabilities.

Pattern Matching System

Thread's core strength is AST-based pattern matching using meta-variables:

Meta-Variable Syntax

  • $VAR - Captures a single AST node
  • $$$ITEMS - Captures multiple consecutive nodes (ellipsis)
  • $_ - Matches any node without capturing
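
To illustrate how the three meta-variable forms behave, here is a toy matcher. It is a deliberate simplification: it matches whitespace-separated tokens rather than AST nodes, it assumes `$$$NAME` may capture zero or more tokens, and `find_match` is a hypothetical helper, not part of Thread's API.

```rust
use std::collections::HashMap;

type Captures = HashMap<String, Vec<String>>;

/// Recursively matches pattern tokens against input tokens.
fn match_tokens(pattern: &[&str], input: &[&str], caps: &mut Captures) -> bool {
    let Some((&head, rest)) = pattern.split_first() else {
        return input.is_empty(); // pattern exhausted: input must be too
    };
    if let Some(name) = head.strip_prefix("$$$") {
        // Multi-capture: try every possible length, shortest first.
        for len in 0..=input.len() {
            let mut trial = caps.clone();
            let taken = input[..len].iter().map(|t| t.to_string()).collect();
            trial.insert(name.to_string(), taken);
            if match_tokens(rest, &input[len..], &mut trial) {
                *caps = trial;
                return true;
            }
        }
        return false;
    }
    let Some((&token, input_rest)) = input.split_first() else {
        return false; // pattern expects a token but input is exhausted
    };
    if head == "$_" {
        // Wildcard: consume one token without capturing it.
        return match_tokens(rest, input_rest, caps);
    }
    if let Some(name) = head.strip_prefix('$') {
        // Single capture: bind exactly one token.
        caps.insert(name.to_string(), vec![token.to_string()]);
        return match_tokens(rest, input_rest, caps);
    }
    head == token && match_tokens(rest, input_rest, caps)
}

/// Returns the captures if `pattern` matches the whole of `input`.
fn find_match(pattern: &str, input: &str) -> Option<Captures> {
    let p: Vec<&str> = pattern.split_whitespace().collect();
    let i: Vec<&str> = input.split_whitespace().collect();
    let mut caps = Captures::new();
    match_tokens(&p, &i, &mut caps).then_some(caps)
}

fn main() {
    let caps = find_match("let $VAR = $VALUE", "let total = 42").expect("should match");
    println!("VAR = {:?}, VALUE = {:?}", caps["VAR"], caps["VALUE"]);

    let caps = find_match("call ( $$$ARGS )", "call ( a , b )").expect("should match");
    println!("ARGS = {:?}", caps["ARGS"]);
}
```

A real AST matcher compares node kinds and structure instead of raw tokens, so it is insensitive to whitespace and formatting in a way this sketch is not.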

Examples

// Find all variable declarations
root.find_all("let $VAR = $VALUE")

// Find if-else statements
root.find_all("if ($COND) { $$$THEN } else { $$$ELSE }")

// Find function calls with any arguments
root.find_all("$FUNC($$$ARGS)")

// Find class methods
root.find_all("class $CLASS { $$$METHODS }")

YAML Rule System

id: no-var-declarations
message: "Use 'let' or 'const' instead of 'var'"
language: JavaScript
severity: warning
rule:
  pattern: "var $NAME = $VALUE"
fix: "let $NAME = $VALUE"
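
The `fix` field is a template: each meta-variable captured by `pattern` is substituted back into it. A minimal sketch of that substitution step is shown below. The `apply_fix` function and the captures map are hypothetical, and the rule engine's real rewriting logic may handle more cases (such as `$$$` captures and indentation).

```rust
use std::collections::HashMap;

/// Replaces each `$NAME` in `template` with its captured text.
/// Names without a capture are left untouched.
fn apply_fix(template: &str, captures: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // A meta-variable name is a run of uppercase letters, digits, or underscores.
        let name_len = after
            .find(|c: char| !(c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..name_len];
        match captures.get(name) {
            Some(value) => out.push_str(value),
            None => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[name_len..];
    }
    out.push_str(rest);
    out
}

fn main() {
    // Captures that the pattern "var $NAME = $VALUE" would produce for "var count = 0".
    let captures = HashMap::from([("NAME", "count"), ("VALUE", "0")]);
    println!("{}", apply_fix("let $NAME = $VALUE", &captures));
}
```

Reading the whole name before looking it up avoids a prefix bug: `$NAMES` is not mistaken for `$NAME` followed by `S`.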

Performance Characteristics

Benchmarks (Phase 5 Real-World Validation)

| Language | Files | Time | Throughput | Cache Hit | Incremental (1% update) |
|---|---|---|---|---|---|
| Rust | 10,100 | 7.4s | 1,365 files/s | 100% | 0.6s (100 files) |
| TypeScript | 10,100 | 10.7s | 944 files/s | 100% | ~1.0s (100 files) |
| Python | 10,100 | 8.5s | 1,188 files/s | 100% | 0.7s (100 files) |
| Go | 10,100 | 5.4s | 1,870 files/s | 100% | 0.4s (100 files) |
Content-Addressed Caching Performance

| Operation | Time | Speedup vs Parse | Notes |
|---|---|---|---|
| Blake3 fingerprint | 425ns | 346x faster | Single file |
| Batch fingerprint | 17.7µs | - | 100 files |
| AST parsing | 147µs | Baseline | Small file (<1KB) |
| Cache hit (in-memory) | <1µs | >147x faster | LRU cache lookup |
| Cache hit (repeated) | 0.9s | 35x faster | 10,000 file reanalysis |
| Incremental (1%) | 0.6s | 12x faster | 100 changed, 10K total |
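
The 346x figure follows from dividing the parse time (147µs) by the fingerprint time (425ns), and a 346x speedup corresponds to roughly a 99.7% reduction in per-file cost. The sketch below works through that arithmetic; the two helper functions are hypothetical and exist only for this example.

```rust
/// How many times faster the optimized path is than the baseline.
fn speedup(baseline_ns: f64, optimized_ns: f64) -> f64 {
    baseline_ns / optimized_ns
}

/// Share of the baseline cost that is avoided, as a percentage.
fn cost_reduction_percent(speedup: f64) -> f64 {
    (1.0 - 1.0 / speedup) * 100.0
}

fn main() {
    let parse_ns = 147_000.0; // 147µs AST parse of a small file
    let fingerprint_ns = 425.0; // 425ns fingerprint of a single file

    let factor = speedup(parse_ns, fingerprint_ns);
    println!("fingerprint vs parse: {factor:.0}x faster");
    println!("cost reduction: {:.1}%", cost_reduction_percent(factor));
}
```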

Storage Backend Latency

| Backend | Target | Actual (Phase 5) | Deployment |
|---|---|---|---|
| InMemory | N/A | <1ms | Testing |
| Postgres | <10ms p95 | <1ms (local) | CLI |
| D1 | <50ms p95 | <1ms (local) | Edge |
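
The targets above are p95 latencies: 95% of requests must complete within the stated time. For readers unfamiliar with the metric, here is a small sketch using the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration and are not Thread's measurement code or data.

```rust
/// Nearest-rank percentile of `samples`. Returns `None` for an empty slice.
fn percentile(samples: &[f64], pct: usize) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is ceil(pct/100 * n), computed in integers and clamped to 1..=n.
    let rank = (pct * sorted.len()).div_ceil(100).clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

fn main() {
    // Hypothetical latencies in milliseconds: 19 fast requests and one slow outlier.
    let mut latencies_ms = vec![2.0; 19];
    latencies_ms.push(80.0);

    let p95 = percentile(&latencies_ms, 95).expect("samples are non-empty");
    println!("p95 = {p95} ms, meets a 10 ms target: {}", p95 < 10.0);
}
```

A single slow request out of twenty does not move the p95, which is why the metric is preferred over the maximum for latency targets.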

Development

Prerequisites

  • Rust: 1.85.0 or later (edition 2024)
  • Tools: cargo-nextest (optional), mise (optional)

Building

# Build everything (except WASM)
mise run build
# or: cargo build --workspace

# Build in release mode
mise run build-release

# Build WASM for edge deployment
mise run build-wasm-release

Testing

# Run all tests
mise run test
# or: cargo nextest run --all-features --no-fail-fast -j 1

# Run tests for specific crate
cargo nextest run -p thread-ast-engine --all-features

# Run benchmarks
cargo bench -p thread-rule-engine

Quality Checks

# Full linting
mise run lint

# Auto-fix formatting and linting issues
mise run fix

# Run CI pipeline locally
mise run ci

Single Test Execution

# Run specific test
cargo nextest run --manifest-path Cargo.toml test_name --all-features

# Run benchmarks
cargo bench -p thread-flow

Documentation

User Guides

API Documentation

  • Rustdoc: Run cargo doc --open --no-deps --workspace for full API documentation
  • Examples: See examples/ directory for usage patterns

Technical Documentation

Constitutional Compliance

All development MUST adhere to the Thread Constitution v2.0.0 (.specify/memory/constitution.md)

Core Governance Principles

  1. Service-Library Architecture (Principle I)

    • Features MUST consider both library API design AND service deployment
    • Both aspects are first-class citizens
  2. Test-First Development (Principle III - NON-NEGOTIABLE)

    • TDD mandatory: Tests → Approve → Fail → Implement
    • All tests execute via cargo nextest
    • No exceptions, no justifications accepted
  3. Service Architecture & Persistence (Principle VI)

    • Content-addressed caching MUST achieve >90% hit rate
    • Storage targets: Postgres <10ms, D1 <50ms, Qdrant <100ms p95 latency
    • Incremental updates MUST trigger only affected component re-analysis

Quality Gates

Before any PR merge, verify:

  • mise run lint passes (zero warnings)
  • cargo nextest run --all-features passes (100% success)
  • mise run ci completes successfully
  • ✅ Public APIs have rustdoc documentation
  • ✅ Performance-sensitive changes include benchmarks
  • ✅ Service features meet storage/cache/incremental requirements

Contributing

We welcome contributions of all kinds! By contributing to Thread, you agree to our Contributor License Agreement (CLA).

Contributing Workflow

  1. Run mise run install-tools to set up development environment
  2. Make changes following existing patterns
  3. Run mise run fix to apply formatting and linting
  4. Run mise run test to verify functionality
  5. Use mise run ci to run full CI pipeline locally
  6. Submit pull request with clear description

We Use REUSE

Thread follows the REUSE Specification for license information. Every file should have license information at the top or in a .license file. See existing files for examples.

License

Thread

Thread is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-or-later). You can find the full license text in the LICENSE file.

Key Points:

  • ✅ Free for personal and commercial use
  • ✅ Modify the code as needed
  • ⚠️ You must share your changes with the community under AGPL 3.0 or later
  • ⚠️ Include AGPL 3.0 and copyright notice with copies you share
  • ℹ️ If you don't modify Thread, you can use it without sharing your source code

Want to use Thread in a closed source project?

Purchase a commercial license from Knitli to use Thread without sharing your source code. Contact us at licensing@knit.li

Other Licenses

  • Some components forked from ast-grep are licensed under AGPL 3.0 or later AND MIT. See VENDORED.md.
  • Documentation and configuration files are licensed under MIT OR Apache-2.0 (your choice).

Production Readiness

Thread has been validated for production use with comprehensive testing:

  • 780 tests: 100% pass rate across all modules
  • Real-world validation: Tested with 10,000+ files per language
  • Performance targets: All metrics exceeded by 20-40%
  • Edge cases: Comprehensive coverage including empty files, binary files, symlinks, Unicode, circular dependencies, deep nesting, large files
  • Zero known issues: No crashes, memory leaks, or data corruption

See Phase 5 Completion Summary for full validation report.

Support

Credits

Thread is built on the shoulders of giants:

  • ast-grep: Core pattern matching engine (MIT license)
  • tree-sitter: Universal parsing framework
  • ReCoco: Dataflow orchestration framework
  • BLAKE3: Fast cryptographic hashing

Special thanks to all contributors and the open source community.


Created by: Knitli Inc.
Maintained by: Thread Team
License: AGPL-3.0-or-later (with commercial license option)
Version: 0.0.1
