- Should I use RAG or fine-tune?
- Almost always RAG for facts that change, fine-tune (or post-train) for behaviour that doesn't. Most enterprise cases are RAG. We have a deeper take on the blog (RAG vs fine-tuning).
- Do I need a dedicated vector database?
- Usually not. pgvector with HNSW matches or beats dedicated vector databases up to ~1M vectors on equivalent compute (Supabase 2026 benchmarks). We move to Qdrant / Milvus / Vespa when scale, query throughput, or workload genuinely demands it. See the vector-databases page for the decision matrix.
- How do you measure RAG quality?
- We use a golden set of representative queries with expected citations, plus a separate eval set of adversarial cases. Retrieval metrics (Recall@K, MRR) gate the index. End-to-end metrics (faithfulness, and answer quality via LLM-as-judge) gate the prompt and model. See /llm/evaluation for the broader eval discipline.
- What about GraphRAG?
- Microsoft's GraphRAG fits domains where the questions require multi-hop reasoning across linked entities: clinical research, regulatory analysis, complex case files. The cost is building and maintaining the graph. See architectures for when it earns that cost.
- How long does a RAG engagement run?
- First production-grade build is typically an eight-week sprint. Enterprise programs (multi-corpus, GraphRAG, multimodal) run as quarterly phases. Ongoing partnerships make sense post-launch for eval ops, model migrations, and corpus drift monitoring.
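
For the retrieval metrics mentioned under "How do you measure RAG quality?", here is a minimal Python sketch of Recall@K and MRR computed over a golden set. The data shapes, document IDs, function names, and the `RECALL_GATE` threshold are all illustrative assumptions, not our actual tooling or thresholds.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(golden: Dict[str, Set[str]],
             results: Dict[str, List[str]],
             k: int) -> Dict[str, float]:
    """Average Recall@K and MRR over every query in the golden set."""
    recalls = [recall_at_k(results.get(q, []), rel, k) for q, rel in golden.items()]
    rranks = [reciprocal_rank(results.get(q, []), rel) for q, rel in golden.items()]
    n = len(golden)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(rranks) / n}


# Hypothetical golden set: query -> IDs of the documents it should cite.
golden = {"q1": {"a", "b"}, "q2": {"c"}}
# Hypothetical retriever output: query -> ranked document IDs.
results = {"q1": ["a", "x", "b"], "q2": ["y", "c", "z"]}

metrics = evaluate(golden, results, k=2)

# Assumed gate: the index only ships if recall clears an agreed threshold.
RECALL_GATE = 0.7  # illustrative value only
index_passes = metrics["recall@2"] >= RECALL_GATE
```

With this toy data, both averages come out at 0.75: `q1` finds one of its two expected documents in the top 2 and hits at rank 1, while `q2` finds its only expected document at rank 2. In practice the gate runs on every index change, so a chunking or embedding change that hurts retrieval is caught before it reaches the prompt and model evals.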