1 change: 1 addition & 0 deletions .gitignore
@@ -15,3 +15,4 @@ nul
*~
.claude
.codebase-intelligence.json
.cursor/
56 changes: 56 additions & 0 deletions AGENTS.md
@@ -12,6 +12,62 @@ These are non-negotiable. Every PR, feature, and design decision must respect th
- **No overclaiming in public docs**: README and CHANGELOG must be evidence-backed. Don't claim capabilities that aren't shipped and tested.
- **internal-docs is private**: Never commit `internal-docs/` pointer changes unless explicitly intended. The submodule is always dirty locally; ignore it.

## Evaluation Integrity (NON-NEGOTIABLE)

These rules prevent metric gaming, overfitting, and false quality claims. Violation of these rules means the feature CANNOT ship.

### Rule 1: Eval Sets are Frozen Before Implementation

- **Define test queries and expected results BEFORE writing any code**
- Commit the eval fixture (e.g., `tests/fixtures/eval-queries.json`) BEFORE starting implementation
- **NEVER adjust expected results to match system output** - If the system returns different results, that's a failure, not a fixture bug
- Exception: If the original expected result was factually wrong (file doesn't exist, query is ambiguous), document the correction with justification

### Rule 2: Eval Sets Must Be General

- **Minimum 20 queries** across diverse patterns (exact names, conceptual, multi-concept, edge cases)
- Test on **multiple codebases** (minimum 2: one you control, one public/real-world)
- Include queries that are HARD and likely to fail - don't cherry-pick easy wins
- Eval set must represent real user queries, not synthetic examples designed to pass
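
The countable parts of Rule 2 (query count, number of codebases) can be checked mechanically before an eval run. The following is a minimal sketch, not the project's harness: the `EvalQuery` shape, its field names, and the `checkGenerality` helper are all assumptions made for illustration, and the real `eval-queries.json` schema may differ.

```typescript
// Hypothetical fixture entry; the real fixture schema may differ.
interface EvalQuery {
  query: string;
  category: string; // e.g. "exact-name", "conceptual", "multi-concept", "edge-case"
  codebase: string; // identifier of the codebase the query targets
  expected: string[]; // expected file paths
}

// Returns a list of Rule 2 problems; an empty list means these checks pass.
// Only the countable requirements are covered here. Whether queries are
// hard and realistic still needs human review.
function checkGenerality(
  queries: EvalQuery[],
  minQueries = 20,
  minCodebases = 2
): string[] {
  const problems: string[] = [];
  if (queries.length < minQueries) {
    problems.push(
      `eval set has ${queries.length} queries; Rule 2 requires at least ${minQueries}`
    );
  }
  const codebases = new Set(queries.map((q) => q.codebase));
  if (codebases.size < minCodebases) {
    problems.push(
      `eval set covers ${codebases.size} codebase(s); Rule 2 requires at least ${minCodebases}`
    );
  }
  return problems;
}
```

A check like this only guards the minimums. It cannot tell whether the set was cherry-picked, so it complements the rule rather than enforcing it.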

### Rule 3: Public Eval Methodology

- Full eval harness code must be in `tests/` (public repository)
- Eval fixtures must be public (or provide reproducible public examples)
- Document how to run eval: `npm run eval -- /path/to/codebase`
- Results must be reproducible by external users

### Rule 4: No Score Manipulation

- **NEVER add heuristics specifically to game eval metrics** (e.g., "if query contains X, boost Y")
- **NEVER adjust scoring to break ties just to improve top-1 accuracy**
- If you add ranking heuristics, they must be general-purpose and justified by search theory, not by "it makes test #7 pass"
- Document all ranking heuristics with research citations or principled justification

### Rule 5: Report Honestly

- Report **both improvements AND failures** (e.g., "9/20 pass, 11/20 fail")
- If top-3 recall is 80% but top-1 is 45%, say so - don't hide behind a single cherry-picked metric
- Acknowledge when improvements are **workarounds** (filtering, heuristics) vs **fundamental** (better embeddings, ML models)
- Include failure analysis in CHANGELOG: "Known limitations: struggles with multi-concept queries"
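
Reporting top-1 and top-3 side by side, together with the failing queries, could look like the sketch below. It assumes a single expected file per query and a hypothetical `EvalCase` shape; neither is taken from the actual harness.

```typescript
// Hypothetical per-query result: the expected file from the frozen fixture
// and the ranked file paths the search returned.
interface EvalCase {
  query: string;
  expected: string;
  ranked: string[];
}

// Fraction of cases whose expected file appears within the first k results.
function recallAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => c.ranked.slice(0, k).includes(c.expected)).length;
  return hits / cases.length;
}

// Builds a report that shows both metrics and lists every query that
// missed the top 3, so failures are visible next to the headline numbers.
function summarize(cases: EvalCase[]): string {
  const top1 = recallAtK(cases, 1);
  const top3 = recallAtK(cases, 3);
  const failed = cases.filter((c) => !c.ranked.slice(0, 3).includes(c.expected));
  const lines = [
    `top-1: ${(top1 * 100).toFixed(0)}%`,
    `top-3: ${(top3 * 100).toFixed(0)}%`,
    `failed (not in top 3): ${failed.length}/${cases.length}`,
    ...failed.map((c) => `  - ${c.query}`),
  ];
  return lines.join("\n");
}
```

Listing the failed queries by name makes the failure analysis required for the CHANGELOG a by-product of every run rather than a separate step.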

### Rule 6: Cross-Check with Real Usage

- Before claiming "X% improvement", test on a real codebase you didn't develop against
- Ask: "Would this improvement generalize to a Python codebase? A Go codebase?"
- If the improvement is framework-specific (e.g., Angular-only), say so explicitly

### Violation Response

If any agent violates these rules:
1. **STOP immediately** - do not proceed with the release
2. **Revert** any fixture adjustments made to game metrics
3. **Re-run eval** with frozen fixtures
4. **Document the violation** in internal-docs for learning
5. **Delay the release** until honest metrics are available

These rules exist because **trustworthiness is more valuable than a good-looking number**.

## Codebase Context

**At start of each task:** Call `get_memory` to load team conventions.
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,37 @@
# Changelog

## [1.6.0](https://github.com/PatrickSys/codebase-context/compare/v1.5.1...v1.6.0) (2026-02-10)

### Added

- **Search Quality Improvements** — Weighted hybrid search with intent-aware classification
  - Intent-aware query classification (EXACT_NAME, CONCEPTUAL, FLOW, CONFIG, WIRING)
  - Reciprocal Rank Fusion (RRF, k=60) for robust rank-based score combination
  - Hard test-file filtering (eliminates spec contamination in non-test queries)
  - Import-graph proximity reranking (structural centrality boosting)
  - File-level deduplication (one best chunk per file)
- **Evaluation Harness** — Frozen fixture set with reproducible methodology
- **Embedding Upgrade** — Granite model support (47M params, 8192 context)
- **Chunk Optimization** — 100→50 lines, overlap 10→0, merge small chunks
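
For readers unfamiliar with Reciprocal Rank Fusion, the standard formula scores each item as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below shows that textbook form with k = 60. It is not the project's implementation, the function name is hypothetical, and any weighting this release applies on top is not shown.

```typescript
// Textbook Reciprocal Rank Fusion over several ranked lists of item ids.
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// Items absent from a list contribute nothing for that list.
// Assumes each individual list contains an id at most once.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Usage: fuse a keyword ranking with a semantic ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // e.g. keyword search order
  ["b", "c", "a"], // e.g. embedding search order
]);
```

Because only ranks enter the formula, the raw scores of the two retrievers never need to be normalized against each other.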

### Changed

- **Dependencies**: `@xenova/transformers` v2 → `@huggingface/transformers` v3
- **Indexing**: Tighter chunks (50 lines) with zero overlap
- **Search**: RRF combines ranks rather than raw scores, so differences in score distributions between retrievers do not skew the result
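
To illustrate what 50-line chunks with zero overlap and small-chunk merging mean in practice, here is a deliberately simple line-based sketch. The actual indexer may split along code structure instead. The `chunkLines` helper and the `minChunk` threshold of 10 lines are assumptions for illustration only.

```typescript
// Splits lines into consecutive chunks of `chunkSize` with no overlap.
// If the final chunk is shorter than `minChunk`, it is merged into the
// previous chunk so that tiny fragments are not indexed on their own.
function chunkLines(
  lines: string[],
  chunkSize = 50,
  minChunk = 10 // hypothetical threshold, not taken from the project
): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < lines.length; i += chunkSize) {
    chunks.push(lines.slice(i, i + chunkSize));
  }
  if (chunks.length > 1 && chunks[chunks.length - 1].length < minChunk) {
    const tail = chunks.pop()!;
    chunks[chunks.length - 1] = chunks[chunks.length - 1].concat(tail);
  }
  return chunks;
}
```

With these defaults, a 105-line file yields chunks of 50 and 55 lines, while a 120-line file yields 50, 50, and 20.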

### Fixed

- Intent-blind search (conceptual queries now classified and routed correctly)
- Spec file contamination (test files hard-filtered from non-test query results)
- Embedding truncation (Granite's 8192-token context replaces the previous 512-token limit)

### BREAKING CHANGES

**Re-index required** after upgrade due to model and chunking changes:
- Existing `.codebase-context/` indices from v1.5.x are incompatible
- Run `refresh_index(incrementalOnly: false)` or delete `.codebase-context/` folder

## [1.5.1](https://github.com/PatrickSys/codebase-context/compare/v1.5.0...v1.5.1) (2026-02-08)


3 changes: 2 additions & 1 deletion package.json
@@ -94,10 +94,10 @@
"type-check": "tsc --noEmit"
},
"dependencies": {
"@huggingface/transformers": "^3.8.1",
"@lancedb/lancedb": "^0.4.0",
"@modelcontextprotocol/sdk": "^1.25.2",
"@typescript-eslint/typescript-estree": "^7.0.0",
"@xenova/transformers": "^2.17.0",
"fuse.js": "^7.0.0",
"glob": "^10.3.10",
"hono": "4.11.7",
@@ -125,6 +125,7 @@
"pnpm": {
"onlyBuiltDependencies": [
"esbuild",
"onnxruntime-node",
"protobufjs",
"sharp"
]