This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
CodeGraph is a local-first code intelligence library + CLI + MCP server. It parses any supported codebase with tree-sitter, stores symbols/edges/files in SQLite (FTS5), and exposes a knowledge graph to AI agents (Claude Code, Cursor, Codex CLI, opencode) over MCP. Per-project data lives in .codegraph/. Extraction is deterministic — derived from AST, not LLM-summarized.
Distributed as @colbymchenry/codegraph on npm; same binary serves as installer, indexer, and MCP server.
npm run build # tsc + copy schema.sql and *.wasm into dist/; chmods dist/bin/codegraph.js
npm run dev # tsc --watch
npm run clean # rm -rf dist
npm test # vitest run (all)
npm run test:watch
npm run test:eval # only __tests__/evaluation/
npm run eval # build then run __tests__/evaluation/runner.ts via tsx
npm run cli # build then run the local dist binary
# Single test file / pattern
npx vitest run __tests__/installer-targets.test.ts
npx vitest run __tests__/extraction.test.ts -t "TypeScript"

copy-assets (called from build) copies src/db/schema.sql and all src/extraction/wasm/*.wasm files into dist/. Any new SQL or grammar wasm must be copied or it won't ship.
Node engines: >=18.0.0 <25.0.0. There is a hard exit on Node 25.x (see src/bin/node-version-check.ts).
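The engine range above can be sketched as a small guard. This is a minimal illustration, not the contents of src/bin/node-version-check.ts; the function name and constants are hypothetical.

```typescript
// Hypothetical helper; the real check lives in src/bin/node-version-check.ts.
const MIN_MAJOR = 18; // inclusive lower bound (>=18.0.0)
const MAX_MAJOR_EXCLUSIVE = 25; // exclusive upper bound (<25.0.0)

function isSupportedNode(version: string): boolean {
  // Accept both "v22.5.0" and "22.5.0".
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  // NaN (unparseable input) fails Number.isInteger and is rejected.
  return Number.isInteger(major) && major >= MIN_MAJOR && major < MAX_MAJOR_EXCLUSIVE;
}
```

A CLI entry point could call this with process.version and exit early when it returns false.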
files → ExtractionOrchestrator (tree-sitter) → DB (nodes/edges/files)
↓
ReferenceResolver (imports, name-matching, framework patterns)
↓
GraphQueryManager / GraphTraverser (callers, callees, impact)
↓
ContextBuilder (markdown/JSON for AI consumption)
The public API surface is src/index.ts — the CodeGraph class wires all the layers and re-exports types. Library users only touch this file; the MCP server and CLI also drive it.
- src/index.ts: the CodeGraph class (init/open/close, indexAll, sync, searchNodes, getCallers/getCallees, getImpactRadius, buildContext, watch/unwatch).
- src/db/: DatabaseConnection, QueryBuilder (prepared statements), schema.sql. Backed by better-sqlite3 (native) when available, transparently falls back to node-sqlite3-wasm. codegraph status surfaces which backend is live; wasm is the slow path.
- src/extraction/: ExtractionOrchestrator, tree-sitter wrappers, per-language extractors under languages/ (one file per language), plus standalone extractors for non-tree-sitter formats (svelte-extractor.ts, vue-extractor.ts, liquid-extractor.ts, dfm-extractor.ts for Delphi). parse-worker.ts runs heavy parsing off the main thread.
- src/resolution/: ReferenceResolver orchestrates import-resolver.ts (with path-aliases.ts for tsconfig path aliases + cargo workspace member globs), name-matcher.ts, and frameworks/ (Express, Laravel, Rails, FastAPI, Django, Flask, Spring, Gin, Axum, ASP.NET, Vapor, React Router, SvelteKit, Vue/Nuxt, Cargo workspaces). Frameworks emit route nodes and references edges.
- src/graph/: GraphTraverser (BFS/DFS, impact radius, path finding) and GraphQueryManager (high-level queries).
- src/context/: ContextBuilder + formatter for markdown/JSON output.
- src/search/: full-text query parser and helpers for FTS5.
- src/sync/: FileWatcher (native FSEvents/inotify/RDCW) with debounce + filter, and git-hook helpers.
- src/mcp/: MCP server (MCPServer, tools.ts, transport.ts). server-instructions.ts is what the server returns in the MCP initialize response; keep it in sync with the user-facing tool guidance.
- src/installer/: see below.
- src/bin/codegraph.ts: CLI (commander). Subcommands: install, init, uninit, index, sync, status, query, files, context, affected, serve --mcp.
- src/ui/: terminal UI (shimmer progress, worker).
NodeKind and EdgeKind are defined in src/types.ts. Both extractors and resolvers must use these exact strings.
- NodeKind: file, module, class, struct, interface, trait, protocol, function, method, property, field, variable, constant, enum, enum_member, type_alias, namespace, parameter, import, export, route, component.
- EdgeKind: contains, calls, imports, exports, extends, implements, references, type_of, returns, instantiates, overrides, decorates.
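One way to keep extractors and resolvers on the exact strings is a literal union with a runtime guard. The sketch below only mirrors the two lists above; how src/types.ts actually declares them is not shown here, and the guard names are hypothetical.

```typescript
// Sketch only: the authoritative definitions are in src/types.ts.
const NODE_KINDS = [
  "file", "module", "class", "struct", "interface", "trait", "protocol",
  "function", "method", "property", "field", "variable", "constant",
  "enum", "enum_member", "type_alias", "namespace", "parameter",
  "import", "export", "route", "component",
] as const;
type NodeKind = (typeof NODE_KINDS)[number];

const EDGE_KINDS = [
  "contains", "calls", "imports", "exports", "extends", "implements",
  "references", "type_of", "returns", "instantiates", "overrides", "decorates",
] as const;
type EdgeKind = (typeof EDGE_KINDS)[number];

// Hypothetical guards: reject any string that is not an exact, case-sensitive match.
function isNodeKind(value: string): value is NodeKind {
  return (NODE_KINDS as readonly string[]).includes(value);
}

function isEdgeKind(value: string): value is EdgeKind {
  return (EDGE_KINDS as readonly string[]).includes(value);
}
```

With this shape, a misspelled kind such as "Function" or "typeAlias" fails at compile time where the type is used and at runtime where the guard is applied.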
src/installer/ is the entry point for codegraph install (and the bare codegraph/npx @colbymchenry/codegraph invocation). Architecture:
- targets/registry.ts lists every supported agent. targets/types.ts defines the AgentTarget interface; adding a 5th agent (Continue, Zed, Windsurf…) is one new file in targets/ plus one entry in registry.ts. Each target owns its config-file location, MCP-server JSON/TOML/JSONC writing, and instructions-file path.
- Current targets: claude.ts, cursor.ts, codex.ts, opencode.ts.
- targets/toml.ts is a hand-rolled TOML serializer scoped to [mcp_servers.codegraph] (used by Codex). Sibling tables and [[array_of_tables]] are preserved verbatim. No new dependency.
- opencode reads opencode.jsonc by default; the installer prefers existing .jsonc, falls back to .json, and creates .jsonc for greenfield installs. Edits are surgical via jsonc-parser so user comments and formatting survive install/re-install/uninstall round-trips.
- instructions-template.ts is the agent-agnostic instructions file written to each target (e.g. CLAUDE.md, .cursor/rules/codegraph.mdc, ~/.codex/AGENTS.md, ~/.config/opencode/AGENTS.md). It explicitly says "trust codegraph results, don't re-verify with grep"; earlier versions prescribed Claude-specific "spawn an Explore agent" and confused other agents. claude-md-template.ts is the legacy Claude-only template, retained for compatibility paths.
- All installer changes need matching coverage in __tests__/installer-targets.test.ts: there are ~47 parameterized contract tests covering install idempotency, sibling preservation, uninstall reverses install, byte-equal re-runs returning unchanged, and partial-state recovery for Codex.
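The opencode file-selection rule from the list above can be sketched as a pure function. The function name, return shape, and the injected exists predicate are all hypothetical; opencode.json as the fallback file name is inferred from the ".json" wording.

```typescript
// Hypothetical shape: which file to edit, and whether it has to be created first.
interface ConfigChoice {
  file: string;
  create: boolean;
}

// `exists` is injected so the rule can be exercised without touching disk.
function chooseOpencodeConfig(dir: string, exists: (p: string) => boolean): ConfigChoice {
  const jsonc = `${dir}/opencode.jsonc`;
  const json = `${dir}/opencode.json`; // assumed fallback name
  if (exists(jsonc)) return { file: jsonc, create: false }; // prefer an existing .jsonc
  if (exists(json)) return { file: json, create: false }; // fall back to an existing .json
  return { file: jsonc, create: true }; // greenfield: create .jsonc
}
```

In real use the predicate would typically be fs.existsSync; passing it in keeps the decision testable with an in-memory set of paths.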
Cursor launches MCP subprocesses with the wrong cwd and doesn't pass rootUri in initialize. The installer injects --path into Cursor's MCP args — absolute path for local installs, ${workspaceFolder} for global installs. If you touch Cursor wiring, preserve this.
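A minimal sketch of the --path rule, assuming the server is started with the serve --mcp subcommand listed earlier. The function name, the scope type, and the argument order are illustrative assumptions, not the installer's actual code.

```typescript
type InstallScope = "local" | "global";

// Hypothetical helper: builds the args written into Cursor's MCP config.
function cursorMcpArgs(scope: InstallScope, projectRoot: string): string[] {
  // Local installs pin the absolute project path; global installs defer to
  // Cursor's own variable, written literally (not interpolated here).
  const pathArg = scope === "local" ? projectRoot : "${workspaceFolder}";
  return ["serve", "--mcp", "--path", pathArg];
}
```

Either way the server receives an explicit root, so it does not depend on the subprocess cwd or on rootUri.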
src/mcp/server-instructions.ts is sent back to the agent in the MCP initialize response. This is the first thing every agent sees about how to use the tools — treat it as the authoritative tool guidance and keep it in sync with instructions-template.ts and .cursor/rules/codegraph.mdc.
CodeGraph's core value is letting an agent answer structural/flow questions ("how does X reach Y", trace, impact, callers) with a few fast codegraph calls and zero Read/Grep. The optimization target is wall-clock latency + tool-call count — don't optimize for token cost. (Cost is lower, not "flat" as earlier framing claimed: a current-build with-vs-without A/B across the 7 README repos, median of 4, saved on average 35% cost · 57% tokens · 46% time · 71% tool calls — reproducing the published README. The mechanism is far fewer turns over a much smaller accumulated context — NOT cache-ability: the without-arm's huge token volume is mostly cheap cache-reads, which is why token-count savings (57%) look bigger than cost savings (35%). Measure tokens by summing per-turn assistant usage, not result.usage (last-turn only in current Claude Code). See docs/benchmarks/call-sequence-analysis.md.) The mechanism that drives everything here: an agent falls back to Read/Grep the instant a codegraph answer is insufficient. So every change is judged by one question — is codegraph's answer sufficient enough to stop the agent from reading?
Target behavior: a flow question resolves in 1 codegraph call on small repos, scaling to 3–5 on large, with Read/Grep = 0. When reviewing a PR or trying something new, do not regress this.
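The token-measurement note above (sum per-turn assistant usage rather than the final result usage) can be sketched as follows. The event and usage shapes are assumptions about the transcript format, modeled on typical usage fields; adjust them to whatever the eval scripts actually emit.

```typescript
// Assumed usage shape; cache fields are optional because not every turn reports them.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Assumed transcript event shape: only "assistant" events are counted.
interface TranscriptEvent {
  type: string;
  message?: { usage?: Usage };
}

function sumAssistantTokens(events: TranscriptEvent[]): number {
  let total = 0;
  for (const event of events) {
    if (event.type !== "assistant") continue; // skip result/user/system events
    const usage = event.message?.usage;
    if (!usage) continue;
    total +=
      usage.input_tokens +
      usage.output_tokens +
      (usage.cache_read_input_tokens ?? 0) +
      (usage.cache_creation_input_tokens ?? 0);
  }
  return total;
}
```

Keeping cache reads in the sum is what makes the without-codegraph arm's volume visible; dropping them would hide most of the gap described above.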
The lever that decides whether a retrieval change lands. Test before building anything here: does this make a tool the agent already calls do more with the input it already gives? If it instead needs the agent to behave differently — pick a different tool, query differently, learn from examples — it hits the low-salience wall and won't land.
CodeGraph's only channels to influence the agent are low-salience: the MCP initialize instructions (server-instructions.ts) and the tool descriptions. Changing them does not reliably move the agent's tool choice or query style — validated: trace-first steering ported into the server-instructions + tool descriptions (3 wording variants) never reproduced what a CLI --append-system-prompt achieved, and regressed wall-clock vs baseline. New tools fare worse (rarely chosen — the agent under-picks even trace); "better examples" is the same steering. The agent's tool-choice does improve on its own as host models get better at tool use — but that is not ours to force.
What works is meeting the agent where it already is:
- Sufficiency: codegraph_trace inlines each hop's body + the destination's own callees, so one trace call ends the flow investigation (no follow-up explore/node/Read).
- explore-flow: codegraph_explore's query is a precise bag of symbol names (incl. qualified Class.method) spanning the flow the agent is after; explore finds the call path among those named symbols (riding synthesized edges) and leads its output with it, delivering trace-quality flow through the call the agent reliably makes. (buildFlowFromNamedSymbols: segment/co-naming disambiguation; ≤1 unnamed bridge so it never wanders a god-function's fan-out.)
What fails is the inverse — folding a precise answer into a fuzzy-input tool. codegraph_context gets a description, not symbols, so it can't disambiguate a flow's endpoints and surfaces the wrong feature. Precise output needs precise input.
The remaining lever under this axis is coverage: every flow made to connect statically (a new dynamic-dispatch synthesizer) is then surfaced automatically by explore-flow/trace, no agent change needed. Reactive/reconciler runtimes (Halo's ReactiveExtensionClient, MediatR, Vue Proxy) are the frontier — flows there have no static edges, so nothing surfaces (correctly — silent beats wrong). Full investigation + A/B record: docs/benchmarks/call-sequence-analysis.md.
Two functions in src/mcp/tools.ts scale explore with indexed file count. This is the expected resolution (a regression here silently forces agents back to Read):
| Repo | files | explore calls | chars/call | max chars/file |
|---|---|---|---|---|
| express (small) | 147 | 1 | 18K | 3800 |
| excalidraw/django (medium) | 643–3043 | 2 | 28K | 6500 |
| vscode (large) | 10446 | 3 | 35K | 7000 |
| ~20k / ~40k | — | 4 / 5 | 38K | 7000 |
- getExploreBudget(fileCount) → call budget: <500→1, <5000→2, <15000→3, <25000→4, ≥25000→5 (max 5).
- getExploreOutputBudget(fileCount) → per-call output (chars / files / per-file). Invariant: a larger tier must never get a smaller maxCharsPerFile than a smaller tier. (Regression that motivated this doc: the <5000 tier's 2500 was below the <500 tier's 3800, so on a god-file repo, excalidraw's 415 KB App.tsx, one explore returned <1% of the file and forced a Read.)
- Explore output must never tell the agent to "use Read"; steer to another codegraph_explore and "treat returned source as already Read."
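The thresholds and the monotonicity invariant above can be sketched as below. This re-implements the documented numbers for illustration and is not copied from src/mcp/tools.ts; the tier table, its field names, and firstInvariantViolation are hypothetical, with per-file values taken from the table above.

```typescript
// Call budget by indexed file count, per the thresholds listed above.
function getExploreBudget(fileCount: number): number {
  if (fileCount < 500) return 1;
  if (fileCount < 5000) return 2;
  if (fileCount < 15000) return 3;
  if (fileCount < 25000) return 4;
  return 5; // maximum
}

// Hypothetical tier table, ordered from the smallest to the largest repo tier.
interface OutputTier {
  belowFileCount: number; // exclusive upper bound of the tier
  maxCharsPerFile: number;
}

const OUTPUT_TIERS: OutputTier[] = [
  { belowFileCount: 500, maxCharsPerFile: 3800 },
  { belowFileCount: 5000, maxCharsPerFile: 6500 },
  { belowFileCount: 15000, maxCharsPerFile: 7000 },
  { belowFileCount: 25000, maxCharsPerFile: 7000 },
  { belowFileCount: Infinity, maxCharsPerFile: 7000 },
];

// Returns the index of the first tier whose maxCharsPerFile is smaller than
// the tier before it, or -1 when the invariant holds.
function firstInvariantViolation(tiers: OutputTier[]): number {
  let previous = Number.NEGATIVE_INFINITY;
  let index = 0;
  for (const tier of tiers) {
    if (tier.maxCharsPerFile < previous) return index;
    previous = tier.maxCharsPerFile;
    index += 1;
  }
  return -1;
}
```

A unit test that asserts firstInvariantViolation returns -1 for the real tier table would have caught the 3800-then-2500 regression described above.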
Static tree-sitter extraction misses computed/indirect calls, so flows break at dynamic dispatch and the agent reads to reconstruct them. Synthesizers/resolvers bridge these so trace/explore connect end-to-end (src/resolution/callback-synthesizer.ts, src/resolution/frameworks/). Channels today: callback/observer, EventEmitter, React re-render (setState→render), JSX child (render→child component), django ORM descriptor. All synthesized edges are provenance:'heuristic' with metadata.synthesizedBy + registeredAt (the wiring site), surfaced inline in trace, the node trail, and context call-paths.
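A rough sketch of what a synthesized edge might look like and how a precision spot-check could tally edges per synthesizer. Only provenance, metadata.synthesizedBy, and metadata.registeredAt come from the text above; the other field names, the string type of registeredAt, and the sample values are assumptions.

```typescript
// Assumed edge shape; source/target/kind are illustrative field names.
interface SynthesizedEdge {
  source: string;
  target: string;
  kind: string;
  provenance: "heuristic";
  metadata: {
    synthesizedBy: string; // which synthesizer produced the edge
    registeredAt: string; // the wiring site (format assumed)
  };
}

// Tally edges per synthesizer, e.g. to compare counts before and after a change.
function countBySynthesizer(edges: SynthesizedEdge[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const edge of edges) {
    const key = edge.metadata.synthesizedBy;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

A sudden jump in one synthesizer's count is a cheap signal of an edge explosion before any agent run is needed.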
Principle: partial coverage is WORSE than none. Bridging one boundary but not the next reveals a hop the agent then drills + reads to finish. Measured on excalidraw: react-render alone raised reads to 5–7; only completing the flow (adding the jsx-child hop) dropped it to 0–1. Always close the flow end-to-end and re-measure — never ship a half-bridged flow.
For each language × framework, validate on small, medium, and large real repos with ≥3 different flow prompts each:
- Pick the canonical flow for the framework ("how does X reach Y": state→render, request→handler→view, query→SQL, action→reducer→store…).
- Deterministic probes (scripts/agent-eval/probe-{trace,node,context,explore}.mjs against the built dist/): trace(from,to) connects end-to-end with no break; no node explosion (select count(*) from nodes stable before/after re-index); synthesized-edge precision spot-check (select … where provenance='heuristic').
- Agent A/B (scripts/agent-eval/run-all.sh <repo> "<Q>"): with vs without codegraph, ≥2 runs/arm (run-to-run variance is large; never conclude from n=1). Record duration, total tool calls, Read, Grep. Optional forced-Read-0 sufficiency proof via the block-read hook (scripts/agent-eval/hook-settings.json).
- Pass bar: a normal flow question reaches ~0 Read/Grep within the repo's explore-call budget, runs faster than without-codegraph, and shows no regression on a control repo. Record the numbers in docs/design/dynamic-dispatch-coverage-playbook.md (the coverage matrix).
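The "≥2 runs per arm, report the range" rule from the A/B step can be sketched as a small summarizer. The Run shape and function names are hypothetical and do not describe the output of scripts/agent-eval/run-all.sh.

```typescript
// Hypothetical per-run record for one arm of the A/B.
interface Run {
  durationSec: number;
  toolCalls: number;
  reads: number;
  greps: number;
}

interface Range {
  min: number;
  max: number;
}

function toRange(values: number[]): Range {
  return { min: Math.min(...values), max: Math.max(...values) };
}

// Refuses to summarize a single run, since n=1 supports no conclusion.
function summarizeArm(runs: Run[]): { n: number; duration: Range; toolCalls: Range; reads: Range; greps: Range } {
  if (runs.length < 2) {
    throw new Error("need at least 2 runs per arm");
  }
  return {
    n: runs.length,
    duration: toRange(runs.map((r) => r.durationSec)),
    toolCalls: toRange(runs.map((r) => r.toolCalls)),
    reads: toRange(runs.map((r) => r.reads)),
    greps: toRange(runs.map((r) => r.greps)),
  };
}
```

Reporting min and max rather than a mean keeps the run-to-run variance visible in the coverage matrix.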
Full playbook + per-mechanism design: docs/design/dynamic-dispatch-coverage-playbook.md and docs/design/callback-edge-synthesis.md.
The template to replicate per language/framework. Question: "how does updating an element re-render the canvas on screen?" (the full flow crosses three React boundaries: observer callback, setState→render, and JSX child).
| Stage | duration | Read | Grep | codegraph |
|---|---|---|---|---|
| Without codegraph | 115–139s | 9–10 | 10–11 | 0 |
| Broken (explore-budget regression) | 131–139s | 5–10 | 3–5 | 6–14 |
| Fixed (budget + msgs + synthesis) | 64–112s | 0–2 | 2–4 | 3–10 |
| + trace-first steering | 51–74s | 0–2 | 0–4 | 3–4 |
n=4 unhooked runs/stage, same prompt. After steering flow questions to codegraph_trace first: best run 0 Read / 0 Grep / 3 codegraph / 51s; 2 of 4 fully clean (0 Read, 0 Grep). Steering eliminated the over-drill variance — call count tightened from 3–10 to 3–4, trace adoption went 3/4 → 4/4, and the search+callers path-reconstruction floundering dropped to 0. Run-to-run variance is still real; report the range, never a single run. Residual reads/greps are all the nonce data-flow (canvasNonce — a local prop with no graph edges); that's the def-use/data-flow frontier, left deliberately uncovered (tracking every local would explode the graph). Validated: trace(mutateElement, renderStaticScene) connects in 6 hops across all three boundaries (mutateElement → triggerUpdate → [callback] triggerRender → [react-render] render → [jsx] StaticCanvas → renderStaticScene), each hop showing inline source + the wiring site; node count stable at 9,289; 1 callback + 46 react-render + 280 jsx-render synthesized edges (no explosion, precision-checked).
Tests live in __tests__/ and mirror the module they cover. Notable ones beyond the obvious:
- installer-targets.test.ts: parameterized contract suite across all 4 agent targets (see installer notes above).
- evaluation/: runner.ts + test-cases.ts exercise codegraph against synthetic projects and score the results; run via npm run eval (builds first). Not part of npm test.
- sqlite-backend.test.ts: covers native + wasm backend selection and fallback.
- pr19-improvements.test.ts, frameworks-integration.test.ts: regression coverage for specific past PRs/incidents; don't rename these, the names anchor to git history.
Tests create temp dirs with fs.mkdtempSync and clean up in afterEach. They write real files and exercise real SQLite — there is no DB mocking.
Behavior that differs by platform (path resolution, drive letters, SENSITIVE_PATHS, %APPDATA% config dirs, CRLF) must be gated, not assumed. Use it.runIf(process.platform === 'win32')(...) for Windows-only assertions and it.runIf(process.platform !== 'win32')(...) for POSIX-only ones — e.g. /etc is sensitive on POSIX but resolves to C:\etc (non-existent) on Windows, so an ungated /etc assertion fails on Windows. Validate the Windows side for real (see below); don't merge a Windows-gated test you haven't seen run.
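The /etc example above can be reproduced on any host with Node's platform-specific path modules, which apply POSIX or Windows rules regardless of the machine running the code. The variable names and the C:\dev\codegraph working directory are illustrative.

```typescript
import * as path from "node:path";

// POSIX rules: "/etc" is already absolute and resolves to itself.
const onPosix = path.posix.resolve("/etc");

// Windows rules: "/etc" is rooted but has no drive letter, so it inherits
// the drive of the preceding segment and lands on that drive's root.
const onWindows = path.win32.resolve("C:\\dev\\codegraph", "/etc");
```

Here onPosix is "/etc" while onWindows is "C:\etc", which is why an assertion written against "/etc" needs a platform gate.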
For any Windows-specific PR, bug, or implementation, validate it on the real Windows VM rather than guessing. Connection details live in the gitignored .parallels file at the repo root (VM name, guest IP, SSH user/key). prlctl exec needs Parallels Pro and is unavailable, so SSH is the bridge.
- Connect / run from the Mac host: ssh <user>@<guest_ip> "...". For multi-line work, pipe PowerShell over stdin and refresh PATH from the registry first (sshd's session has a stale PATH after winget installs):

      ssh colby@10.211.55.3 "powershell -NoProfile -ExecutionPolicy Bypass -Command -" <<'PS'
      $env:Path = [Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [Environment]::GetEnvironmentVariable("Path","User")
      Set-Location C:\dev\codegraph
      PS

- Clone fresh into a Windows-local path (C:\dev\codegraph) and npm ci there; never run npm against the shared Mac repo, since esbuild/rollup ship platform-specific binaries.
- Guest toolchain (winget): Node LTS, Git, and the VC++ ARM64 redistributable (required by @rollup/rollup-win32-arm64-msvc, which vitest pulls in).
- Fetch a contributor PR head straight from their fork to dodge pull/<n>/head lag: git fetch <fork-url> <branch> then git checkout -f FETCH_HEAD.
- Known pre-existing Windows failure: security.test.ts > Session marker symlink resistance > does not follow a pre-planted symlink (symlink creation needs privileges on Windows). Unrelated to current work; don't let it mask new regressions.
Released to npm and mirrored as GitHub Releases. CHANGELOG.md is the source of truth; GitHub Release notes are extracted from it.
When asked for an entry for a new version:
- Add a new ## [X.Y.Z] - YYYY-MM-DD block at the top of CHANGELOG.md (under the intro, above the previous version).
- Group under ### Added, ### Changed, ### Fixed, ### Removed, ### Deprecated, ### Security; omit empty sections.
- Write from the user's perspective, not the implementation's. Lead with the observable symptom or capability; mention internals only if a user needs them (e.g., to work around an existing bad install).
- Add the link reference at the bottom: [X.Y.Z]: https://github.com/colbymchenry/codegraph/releases/tag/vX.Y.Z.
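The entry layout described in the steps above can be sketched as a small renderer. The function and type names are hypothetical; the heading format, section names, and link pattern are the ones listed above.

```typescript
type Section = "Added" | "Changed" | "Fixed" | "Removed" | "Deprecated" | "Security";

// Section order as listed in the steps above.
const SECTION_ORDER: Section[] = ["Added", "Changed", "Fixed", "Removed", "Deprecated", "Security"];

function renderChangelogEntry(
  version: string,
  date: string, // YYYY-MM-DD
  notes: Partial<Record<Section, string[]>>,
): { block: string; link: string } {
  const lines: string[] = [`## [${version}] - ${date}`];
  for (const section of SECTION_ORDER) {
    const items = notes[section];
    if (!items || items.length === 0) continue; // omit empty sections
    lines.push("", `### ${section}`, ...items.map((item) => `- ${item}`));
  }
  return {
    block: lines.join("\n"),
    link: `[${version}]: https://github.com/colbymchenry/codegraph/releases/tag/v${version}`,
  };
}
```

The block goes at the top of CHANGELOG.md under the intro and the link line goes with the other references at the bottom; the wording of each item still has to be written from the user's perspective by hand.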
Releases are built and published by the GitHub Actions "Release" workflow
(.github/workflows/release.yml). It bundles a Node runtime per platform
(scripts/build-bundle.sh) and publishes both the GitHub Release and the npm
thin-installer (scripts/pack-npm.sh: a shim package + per-platform packages).
Publishing manually is wrong now — a plain npm publish ships the root
package (non-bundled), which breaks anyone on Node < 22.5.
After the changelog entry is written and package.json is bumped:
git add package.json package-lock.json CHANGELOG.md
git commit -m "release: X.Y.Z (<one-line summary>)"
git push

Then trigger Actions → Release → Run workflow (on main). It reads the
version from package.json, builds every platform bundle on one runner, creates
the GitHub Release with notes from the matching CHANGELOG.md section, and
publishes to npm. Requires the NPM_TOKEN repo secret.
Do not run npm publish, git push, or git tag yourself — these are
publish actions on shared state. Write the files, hand the user the commands.
- The 0.7.x line is in active multi-agent rollout. Any change to src/installer/ (especially targets/) needs corresponding test coverage and a CHANGELOG entry; installer regressions break every new install silently.
- When changing what the MCP tools do or how agents should use them, update all three of src/mcp/server-instructions.ts, src/installer/instructions-template.ts, and .cursor/rules/codegraph.mdc; they're written to different places but say the same thing.
- CodeGraph provides code context, not product requirements. For new features, ask the user about UX, edge cases, and acceptance criteria; the graph won't tell you.