qmd Configuration - kakkoyun's Brain Overflow

# qmd Configuration

Local-first semantic search engine for markdown files by Tobi Lütke. Combines BM25 keyword search, vector embeddings, and LLM reranking, all running on-device via GGUF models with Metal GPU acceleration. No API keys, no cloud, always available.

> [!info] Why qmd over obsidian-cli?
> `obsidian-cli` requires Obsidian to be running. qmd works from any terminal, any Claude Code session, any time. It also covers project markdown (READMEs, ADRs, docs) that Obsidian doesn't index.

## Installation

```bash
brew install tobil/tap/qmd
```

**Binary:** `/opt/homebrew/bin/qmd`

## Collections

11 collections are configured, covering four vaults and seven project repositories:

| Collection | Path | Contents |
|---|---|---|
| `personal` | `~/Vaults/personal/` | Journals, decisions, notes, projects, ideas (PARA) |
| `work` | `~/Vaults/work/` | Meetings, projects, devlog, team docs |
| `blog` | `~/Vaults/blog/content/` | Blog posts and drafts (Hugo) |
| `public-content` | `~/Vaults/public-content/` | Talks, presentations, public writing |
| `claude-marketplace` | `~/Workspace/Projects/Personal/claude-marketplace/` | Plugin skills, commands, agents |
| `dotfiles` | `~/Workspace/Projects/Personal/dotfiles/` | Shell config, Claude rules, hooks |
| `dd-trace-go` | `~/dd/dd-trace-go/` | Datadog APM Go tracer docs |
| `orchestrion` | `~/dd/orchestrion/` | Orchestrion compile-time instrumentation docs |
| `otel-go` | `~/…/opentelemetry-go/` | OpenTelemetry Go SDK docs |
| `otel-compile` | `~/…/otel-go-compile-instrumentation/` | OTel compile-time instrumentation docs |
| `otel-contrib` | `~/…/opentelemetry-go-contrib/` | OTel Go contrib docs |

### Adding a collection

```bash
qmd collection add ~/path/to/dir --name name
qmd context add qmd://name/ "What this collection contains"
qmd embed  # generate embeddings (run once; slow)
```

## Query Syntax

### Default: implicit expand (recommended)

```bash
qmd query "how does compile-time instrumentation work" -c orchestrion -n 5
```
`qmd query` runs query expansion, then hybrid BM25 + vector retrieval, then LLM reranking. It gives the best results for natural-language questions.

### Typed query document (precision)

Multi-line queries combine search strategies. Each line starts with a type prefix such as `lex:` or `vec:`:

```bash
# $'...' is ANSI-C quoting: the shell turns \n into a real newline

# BM25 + vector combined
qmd query $'lex: CAP theorem distributed\nvec: consistency availability tradeoffs'

# Phrase search with negation
qmd query $'lex: "exact phrase" -excluded-term'

# HyDE: find by hypothetical document
qmd query $'hyde: A Go library that wraps HTTP handlers with distributed tracing spans'

# With intent context
qmd query $'intent: find design decision\nlex: authentication\nvec: session token JWT'
```

### Query type comparison

| Command | Strategy | Best for |
|---|---|---|
| `qmd query` | expand + hybrid + rerank | Default: natural language |
| `qmd search` | BM25 keyword only | Fast exact-term lookup |
| `qmd vsearch` | Vector similarity only | Semantic without keyword bias |

### Flags

```bash
-c <name>   # scope to one collection
-n <N>      # result count (default 5)
```

## Common Commands

```bash
# Search
qmd query "morning journal workflow" -c personal -n 5
qmd search "ADR authentication" -c personal -n 5
qmd query "span attribute extraction" -c dd-trace-go -n 5

# Get document content
qmd get qmd://personal/Projects/my-project.md

# Batch fetch
qmd multi-get "qmd://personal/Journal/2025/**"

# Inspect
qmd status               # index health, collection list, pending embeddings
qmd ls personal          # browse indexed files
qmd ls orchestrion/docs  # browse subdirectory

# Maintenance
qmd embed         # generate/refresh embeddings (slow; run after adding collections)
qmd update        # re-index changed files
qmd context list  # show all context metadata
```

## Models

All models run on-device via Metal GPU:

| Model | Size | Purpose |
|---|---|---|
| `embeddinggemma-300M-Q8_0` | 328 MB | Document and query embeddings |
| `Qwen3-Reranker-0.6B-Q8_0` | 600 MB | Re-scores candidate results |
| `qmd-query-expansion-1.7B` | 1.28 GB | HyDE and query expansion (expand mode) |

Models download automatically on first use
to `~/.cache/qmd/models/`.

> [!note] First-run performance
> `qmd embed` downloads the embedding model (~328 MB) and processes all documents.
> `qmd query` (expand mode) downloads the 1.3 GB generation model on first use.
> Subsequent queries are fast: all models stay loaded in memory via Metal.

## Claude Code Integration

### MCP Server

Configured globally in `~/.claude/mcp.json` (stow-managed via dotfiles):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "/opt/homebrew/bin/qmd",
      "args": ["mcp"]
    }
  }
}
```

Exposes `mcp__qmd__query`, `mcp__qmd__get`, `mcp__qmd__multi_get`, `mcp__qmd__status` in every Claude Code session.

### Plugin Skill

Installed as `search@kakkoyun-tools` in the claude-marketplace. Invoked as `/search:qmd [query]` or auto-triggered when the user says "search notes", "find in vault", or "look up in knowledge base".

```
/search:qmd how does automatic instrumentation work
```

## Anti-Patterns

> [!warning] Never during a search session
> - **Never run `qmd embed`**: it takes minutes and blocks the index. Run it only when explicitly re-indexing.
> - **Never run `qmd update`**: collections auto-scan on query.
> - **Never omit `-c`** when intent is scoped to a specific vault or project.
> - **Never use `--full`** as default; use `qmd get` for full document content.

## Verification

```bash
# Check all collections and embedding status
qmd status

# Test vault search
qmd query "morning journal workflow" -c personal -n 3

# Test project search
qmd query "span instrumentation" -c dd-trace-go -n 3

# Verify MCP symlink (stow-managed)
ls -la ~/.claude/mcp.json
```

## Decisions

1. **Local-first over cloud**: All models run on-device. No API keys, no rate limits, works offline.
2. **11 collections, not one giant index**: Scoped searches (`-c`) are more precise and faster than all-collection queries. Context metadata per collection improves relevance.
3. **MCP server globally configured**: `~/.claude/mcp.json` (not per-project) because note search is useful across all projects, not tied to any one repo.
4. **qmd over obsidian-cli**: obsidian-cli requires Obsidian to be open. qmd is always available from any terminal session.
5. **Expand mode as default**: `qmd query` (implicit expand) outperforms bare keyword or vector-only search for natural language questions. The 1.3 GB model is worth it for precision.
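
The file-level checks from the Verification section can be bundled into a small preflight script. The sketch below is one possible way to do that, assuming the binary path and MCP config path given on this page. The function names (`qmd_binary_present`, `mcp_config_has_qmd`, `verify_qmd_setup`) are hypothetical helpers, not part of qmd, and the config check is a loose text match rather than a JSON parse.

```bash
#!/usr/bin/env bash
# Preflight sketch for the setup described on this page.
# All function names here are hypothetical; none of them ship with qmd.

# Succeeds if BIN is a regular file with the executable bit set.
qmd_binary_present() {
  local bin="$1"
  [ -f "$bin" ] && [ -x "$bin" ]
}

# Succeeds if FILE exists and contains the string "qmd" in double quotes.
# This is a text match only; it does not validate the JSON structure.
mcp_config_has_qmd() {
  local file="$1"
  [ -f "$file" ] && grep -q '"qmd"' "$file"
}

# Runs both checks, prints one line per check, and returns non-zero
# if either check fails. Arguments default to the paths documented above.
verify_qmd_setup() {
  local bin="${1:-/opt/homebrew/bin/qmd}"
  local config="${2:-$HOME/.claude/mcp.json}"
  local failed=0

  if qmd_binary_present "$bin"; then
    echo "ok: binary $bin"
  else
    echo "missing: binary $bin"
    failed=1
  fi

  if mcp_config_has_qmd "$config"; then
    echo "ok: MCP config $config"
  else
    echo "missing: qmd entry in $config"
    failed=1
  fi

  return "$failed"
}
```

Calling `verify_qmd_setup` with no arguments checks the default paths. It only covers the files on disk; `qmd status` is still the way to inspect index health and pending embeddings.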