A safe, fast, flexible code analysis and parsing engine built in Rust. Production-ready service-library dual architecture with content-addressed caching and incremental intelligence.
Thread is a high-performance code analysis platform that operates as both a reusable library ecosystem and a persistent service. Built on tree-sitter parsers and enhanced with the ReCoco dataflow framework, Thread delivers 50x+ performance gains through content-addressed caching while supporting dual deployment: CLI with Rayon parallelism and Edge on Cloudflare Workers.
- ✅ Content-Addressed Caching: Blake3 fingerprinting a file is 346x faster than parsing it, which enables a 99.7% cost reduction for unchanged files on repeated runs
- ✅ Incremental Updates: Only reanalyze changed files—unmodified code skips processing automatically
- ✅ Dual Deployment: Single codebase compiles to both CLI (Rayon + Postgres) and Edge (tokio + D1 on Cloudflare Workers)
- ✅ Multi-Language Support: 20+ languages via tree-sitter (Rust, TypeScript, Python, Go, Java, C/C++, and more)
- ✅ Pattern Matching: Powerful AST-based pattern matching with meta-variables for complex queries
- ✅ Production Performance: >1,000 files/sec throughput, >90% cache hit rate, <50ms p95 latency
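
The caching and incremental-update behavior described above can be pictured with a minimal sketch. It is not Thread's implementation: `AnalysisCache`, `get_or_analyze`, and `count_lines` are hypothetical names, and the standard library's `DefaultHasher` stands in for Blake3 so that the example needs no external crates. The idea it illustrates is that results are keyed by a fingerprint of the file content, so unchanged content never reaches the analysis step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint. Thread uses Blake3; `DefaultHasher` is used here only
/// so the sketch needs nothing beyond the standard library.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache keyed by content fingerprint rather than by file path.
#[derive(Default)]
struct AnalysisCache {
    results: HashMap<u64, usize>,
    hits: usize,
    misses: usize,
}

impl AnalysisCache {
    /// Returns the cached result for `content`, running `analyze` only on a miss.
    fn get_or_analyze(&mut self, content: &str, analyze: impl Fn(&str) -> usize) -> usize {
        let key = fingerprint(content);
        if let Some(&cached) = self.results.get(&key) {
            self.hits += 1;
            return cached;
        }
        self.misses += 1;
        let result = analyze(content);
        self.results.insert(key, result);
        result
    }
}

/// Placeholder "analysis": counts lines. A real analysis would parse the file.
fn count_lines(content: &str) -> usize {
    content.lines().count()
}
```

Because the key depends only on content, a file that is renamed or touched without being edited still counts as a hit in this sketch; only edited content triggers re-analysis.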
# Clone the repository
git clone https://github.com/knitli/thread.git
cd thread
# Install development tools (optional, requires mise)
mise run install-tools
# Build Thread with all features
cargo build --workspace --all-features --release
# Verify installation
./target/release/thread --version

use thread_ast_engine::{Root, Language};
// Parse source code
let source = "function hello() { return 42; }";
let root = Root::new(source, Language::JavaScript)?;
// Find all function declarations
let functions = root.find_all("function $NAME($$$PARAMS) { $$$BODY }");
// Extract function names
for func in functions {
    println!("Found function: {}", func.get_text("NAME")?);
}

use thread_flow::ThreadFlowBuilder;
// Build a declarative analysis pipeline
let flow = ThreadFlowBuilder::new("analyze_rust")
    .source_local("src/", &["**/*.rs"], &["target/**"])
    .parse()
    .extract_symbols()
    .target_postgres("code_symbols", &["content_hash"])
    .build()
    .await?;
// Execute the flow
flow.execute().await?;

# Analyze a codebase (first run)
thread analyze ./my-project
# → Analyzing 1,000 files: 10.5s
# Second run (with cache)
thread analyze ./my-project
# → Analyzing 1,000 files: 0.3s (100% cache hits, 35x faster!)
# Incremental update (only changed files)
# Edit 10 files, then:
thread analyze ./my-project
# → Analyzing 10 files: 0.15s (990 files cached)

Thread follows a service-library dual architecture with six main crates plus service layer:
- thread-ast-engine - Core AST parsing, pattern matching, and transformation engine
- thread-language - Language definitions and tree-sitter parser integrations (20+ languages)
- thread-rule-engine - Rule-based scanning and transformation with YAML configuration
- thread-utils - Shared utilities including SIMD optimizations and hash functions
- thread-wasm - WebAssembly bindings for browser and edge deployment
- thread-flow - High-level dataflow pipelines with ThreadFlowBuilder API
- thread-services - Service interfaces, API abstractions, and ReCoco integration
- Storage Backends:
  - Postgres (CLI deployment) - Persistent caching with <10ms p95 latency
  - D1 (Cloudflare Edge) - Distributed caching across CDN nodes with <50ms p95 latency
  - Qdrant (optional) - Vector similarity search for semantic analysis
- Rayon (CLI) - CPU-bound parallelism for local multi-core utilization (2-8x speedup)
- tokio (Edge) - Async I/O for horizontal scaling and Cloudflare Workers
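
To illustrate the CPU-bound parallelism that Rayon provides in the CLI build, here is a rough standard-library sketch using scoped threads. It is a stand-in, not Thread's code: Rayon adds work stealing and a shared thread pool, while this version simply splits the input into fixed chunks. The names `analyze` and `analyze_parallel` are hypothetical.

```rust
use std::thread;

/// Placeholder per-file work: counts non-empty lines.
fn analyze(source: &str) -> usize {
    source.lines().filter(|line| !line.trim().is_empty()).count()
}

/// Splits `sources` into at most `workers` chunks and analyzes each chunk on
/// its own scoped thread. Results come back in input order.
fn analyze_parallel(sources: &[&str], workers: usize) -> Vec<usize> {
    if sources.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every file lands in some chunk.
    let chunk_size = (sources.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently, then join in order.
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|s| analyze(s)).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Fixed chunking works well when files are of similar size; with uneven workloads a work-stealing scheduler such as Rayon's tends to balance better.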
Best for: Development environments, CI/CD pipelines, large batch processing
# Build with CLI features (Postgres + Rayon parallelism)
cargo build --release --features "recoco-postgres,parallel,caching"
# Configure PostgreSQL backend
export DATABASE_URL=postgresql://user:pass@localhost/thread_cache
export RAYON_NUM_THREADS=8 # Use 8 cores
# Run analysis
./target/release/thread analyze ./large-codebase
# → Performance: 1,000-10,000 files per run

Features: Direct filesystem access, multi-core parallelism, persistent caching, unlimited CPU time
See CLI Deployment Guide for complete setup.
Best for: Global API services, low-latency analysis, serverless architecture
# Build WASM for edge
cargo run -p xtask build-wasm --release
# Deploy to Cloudflare Workers
wrangler deploy
# Access globally distributed API
curl https://thread-api.workers.dev/analyze \
  -d '{"code":"fn main(){}","language":"rust"}'
# → Response time: <50ms worldwide (p95)

Features: Global CDN distribution, auto-scaling, D1 distributed storage, no infrastructure management
See Edge Deployment Guide for complete setup.
Thread supports 20+ programming languages via tree-sitter parsers:
- Rust, JavaScript/TypeScript, Python, Go, Java
- C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala
- Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell
Each language provides full AST parsing, symbol extraction, and pattern matching capabilities.
Thread's core strength is AST-based pattern matching using meta-variables:
- $VAR - Captures a single AST node
- $$$ITEMS - Captures multiple consecutive nodes (ellipsis)
- $_ - Matches any node without capturing
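
The capture semantics of the three meta-variable forms can be shown with a deliberately simplified sketch. It matches whitespace-separated tokens, not AST nodes, so it is only an analogy for how Thread's matcher binds captures. `match_tokens`, `find_match`, and `Captures` are hypothetical names introduced for this illustration.

```rust
use std::collections::HashMap;

/// Capture name mapped to the tokens it bound.
type Captures = HashMap<String, Vec<String>>;

/// Matches `pattern` tokens against `input` tokens, recording captures.
fn match_tokens(pattern: &[&str], input: &[&str], caps: &mut Captures) -> bool {
    let Some((&head, rest)) = pattern.split_first() else {
        // Pattern exhausted: succeed only if the input is exhausted too.
        return input.is_empty();
    };
    if let Some(name) = head.strip_prefix("$$$") {
        // Multi capture: try every possible length, shortest first.
        for take in 0..=input.len() {
            let mut attempt = caps.clone();
            let taken = input[..take].iter().map(|t| t.to_string()).collect();
            attempt.insert(name.to_string(), taken);
            if match_tokens(rest, &input[take..], &mut attempt) {
                *caps = attempt;
                return true;
            }
        }
        return false;
    }
    let Some((&token, input_rest)) = input.split_first() else {
        return false;
    };
    if head == "$_" {
        // Wildcard: consume one token, record nothing.
        return match_tokens(rest, input_rest, caps);
    }
    if let Some(name) = head.strip_prefix('$') {
        // Single capture: bind exactly one token.
        caps.insert(name.to_string(), vec![token.to_string()]);
        return match_tokens(rest, input_rest, caps);
    }
    // Literal token: must match exactly.
    head == token && match_tokens(rest, input_rest, caps)
}

/// Tokenizes on whitespace and returns the captures if the whole input matches.
fn find_match(pattern: &str, source: &str) -> Option<Captures> {
    let pattern: Vec<&str> = pattern.split_whitespace().collect();
    let input: Vec<&str> = source.split_whitespace().collect();
    let mut caps = Captures::new();
    match_tokens(&pattern, &input, &mut caps).then_some(caps)
}
```

For example, `find_match("$FUNC ( $$$ARGS )", "add ( 1 , 2 )")` binds `FUNC` to one token and `ARGS` to the three tokens between the parentheses. A real AST matcher compares tree nodes, so it is insensitive to whitespace and formatting in a way this token version is not.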
// Find all variable declarations
root.find_all("let $VAR = $VALUE")
// Find if-else statements
root.find_all("if ($COND) { $$$THEN } else { $$$ELSE }")
// Find function calls with any arguments
root.find_all("$FUNC($$$ARGS)")
// Find class methods
root.find_all("class $CLASS { $$$METHODS }")

id: no-var-declarations
message: "Use 'let' or 'const' instead of 'var'"
language: JavaScript
severity: warning
rule:
  pattern: "var $NAME = $VALUE"
fix: "let $NAME = $VALUE"

| Language | Files | Time | Throughput | Cache Hit | Incremental (1% update) |
|---|---|---|---|---|---|
| Rust | 10,100 | 7.4s | 1,365 files/s | 100% | 0.6s (100 files) |
| TypeScript | 10,100 | 10.7s | 944 files/s | 100% | ~1.0s (100 files) |
| Python | 10,100 | 8.5s | 1,188 files/s | 100% | 0.7s (100 files) |
| Go | 10,100 | 5.4s | 1,870 files/s | 100% | 0.4s (100 files) |
| Operation | Time | Speedup vs Parse | Notes |
|---|---|---|---|
| Blake3 fingerprint | 425ns | 346x faster | Single file |
| Batch fingerprint | 17.7µs | - | 100 files |
| AST parsing | 147µs | Baseline | Small file (<1KB) |
| Cache hit (in-memory) | <1µs | >147x faster | LRU cache lookup |
| Cache hit (repeated) | 0.9s | 35x faster | 10,000 file reanalysis |
| Incremental (1%) | 0.6s | 12x faster | 100 changed, 10K total |
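
The throughput and speedup figures in the tables above appear to be plain ratios, and a short sketch makes the arithmetic reproducible. The helper names `speedup` and `throughput` are hypothetical; the inputs in the usage note are taken from the tables.

```rust
/// Speedup of `fast` relative to `baseline`, both expressed in the same unit.
fn speedup(baseline: f64, fast: f64) -> f64 {
    baseline / fast
}

/// Throughput in files per second.
fn throughput(files: u32, seconds: f64) -> f64 {
    f64::from(files) / seconds
}
```

With the table's values, `speedup(147_000.0, 425.0)` (AST parsing at 147µs against a 425ns fingerprint, both in nanoseconds) rounds to 346, `throughput(10_100, 7.4)` rounds to 1,365 files/s for Rust, and `speedup(7.4, 0.6)` rounds to 12 for the incremental Rust run.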
| Backend | Target | Actual (Phase 5) | Deployment |
|---|---|---|---|
| InMemory | N/A | <1ms | Testing |
| Postgres | <10ms p95 | <1ms (local) | CLI |
| D1 | <50ms p95 | <1ms (local) | Edge |
- Rust: 1.85.0 or later (edition 2024)
- Tools: cargo-nextest (optional), mise (optional)
# Build everything (except WASM)
mise run build
# or: cargo build --workspace
# Build in release mode
mise run build-release
# Build WASM for edge deployment
mise run build-wasm-release

# Run all tests
mise run test
# or: cargo nextest run --all-features --no-fail-fast -j 1
# Run tests for specific crate
cargo nextest run -p thread-ast-engine --all-features
# Run benchmarks
cargo bench -p thread-rule-engine

# Full linting
mise run lint
# Auto-fix formatting and linting issues
mise run fix
# Run CI pipeline locally
mise run ci

# Run specific test
cargo nextest run --manifest-path Cargo.toml test_name --all-features
# Run benchmarks
cargo bench -p thread-flow

- CLI Deployment Guide - Local/server deployment with Postgres
- Edge Deployment Guide - Cloudflare Workers with D1
- Architecture Overview - System design and data flow
- Rustdoc: Run `cargo doc --open --no-deps --workspace` for full API documentation
- Examples: See `examples/` directory for usage patterns
- Integration Tests - E2E test design and coverage
- Error Recovery - Error handling strategies
- Observability - Metrics and monitoring
- Performance Benchmarks - Benchmark suite design
All development MUST adhere to the Thread Constitution v2.0.0 (.specify/memory/constitution.md)
- Service-Library Architecture (Principle I)
  - Features MUST consider both library API design AND service deployment
  - Both aspects are first-class citizens
- Test-First Development (Principle III - NON-NEGOTIABLE)
  - TDD mandatory: Tests → Approve → Fail → Implement
  - All tests execute via `cargo nextest`
  - No exceptions, no justifications accepted
- Service Architecture & Persistence (Principle VI)
  - Content-addressed caching MUST achieve >90% hit rate
  - Storage targets: Postgres <10ms, D1 <50ms, Qdrant <100ms p95 latency
  - Incremental updates MUST trigger only affected component re-analysis
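
The hit-rate and p95 targets in Principle VI can be checked mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the percentile uses the nearest-rank method (Thread's own benchmarks may use a different definition), and the gate shown encodes only the Postgres target.

```rust
/// Cache hit rate as a fraction in [0, 1]; returns 0.0 when there were no lookups.
fn hit_rate(hits: u64, misses: u64) -> f64 {
    let total = hits + misses;
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}

/// 95th percentile by the nearest-rank method. Returns `None` for empty input.
fn p95(samples_ms: &[f64]) -> Option<f64> {
    if samples_ms.is_empty() {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank is ceil(0.95 * n), computed in integers to avoid rounding drift.
    let rank = (95 * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Hypothetical gate mirroring the Postgres targets: >90% hits, p95 under 10 ms.
fn meets_postgres_targets(hits: u64, misses: u64, samples_ms: &[f64]) -> bool {
    hit_rate(hits, misses) > 0.90 && p95(samples_ms).is_some_and(|p| p < 10.0)
}
```

A check like this could run inside a benchmark or integration test so that a regression in either metric fails the build instead of going unnoticed.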
Before any PR merge, verify:
- ✅ `mise run lint` passes (zero warnings)
- ✅ `cargo nextest run --all-features` passes (100% success)
- ✅ `mise run ci` completes successfully
- ✅ Public APIs have rustdoc documentation
- ✅ Performance-sensitive changes include benchmarks
- ✅ Service features meet storage/cache/incremental requirements
We welcome contributions of all kinds! By contributing to Thread, you agree to our Contributor License Agreement (CLA).
- Run `mise run install-tools` to set up development environment
- Make changes following existing patterns
- Run `mise run fix` to apply formatting and linting
- Run `mise run test` to verify functionality
- Use `mise run ci` to run full CI pipeline locally
- Submit pull request with clear description
Thread follows the REUSE Specification for license information. Every file should have license information at the top or in a .license file. See existing files for examples.
Thread is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-or-later). You can find the full license text in the LICENSE file.
Key Points:
- ✅ Free for personal and commercial use
- ✅ Modify the code as needed
- ⚠️ You must share your changes with the community under AGPL 3.0 or later
- ⚠️ Include AGPL 3.0 and copyright notice with copies you share
- ℹ️ If you don't modify Thread, you can use it without sharing your source code
Purchase a commercial license from Knitli to use Thread without sharing your source code. Contact us at licensing@knit.li
- Some components forked from ast-grep are licensed under AGPL 3.0 or later AND MIT. See VENDORED.md.
- Documentation and configuration files are licensed under MIT OR Apache-2.0 (your choice).
Thread has been validated for production use with comprehensive testing:
- 780 tests: 100% pass rate across all modules
- Real-world validation: Tested with 10,000+ files per language
- Performance targets: All metrics exceeded by 20-40%
- Edge cases: Comprehensive coverage including empty files, binary files, symlinks, Unicode, circular dependencies, deep nesting, large files
- Zero known issues: No crashes, memory leaks, or data corruption
See Phase 5 Completion Summary for full validation report.
- Documentation: https://thread.knitli.com
- Issues: GitHub Issues
- Email: support@knit.li
- Commercial Support: licensing@knit.li
Thread is built on the shoulders of giants:
- ast-grep: Core pattern matching engine (MIT license)
- tree-sitter: Universal parsing framework
- ReCoco: Dataflow orchestration framework
- BLAKE3: Fast cryptographic hashing
Special thanks to all contributors and the open source community.
- Created by: Knitli Inc.
- Maintained by: Thread Team
- License: AGPL-3.0-or-later (with commercial license option)
- Version: 0.0.1