Why Graph-RAG eats vector search for breakfast in the enterprise
Chunks-and-cosine works for blogs. It quietly falls apart on a deal room, where the answer lives across a memo, three appendices, and a footnote that cites them.
How we think about retrieval; what we've learned shipping Ember into PE firms, top-quartile consultancies, and venture funds; and the occasional opinion the rest of the industry won't have for another year.
A pointer file and a manifest. That's the whole integration surface. Here's the design rationale, and what we deliberately left out.
Six weeks at a $40B AUM firm. The agents the partners already use, the documents they wouldn't share with each other, and the structure that finally made the agents useful.
If a memo references an exhibit and the exhibit references a model, your chunker has no idea. The graph does — and it changes which sources get retrieved.
Most enterprise RAG vendors will, given the right prompt, leak across customers. Here's the boring, low-glamour fix — and why it has to live below the retriever.
It's not the Confluence wiki. It's not even the deal memos. It's the way the partners reason about them — and that's the thing agents need next.
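To make the memo-to-exhibit-to-model point concrete, here is a minimal sketch of expanding a retrieved seed set along reference edges. It is an illustration under stated assumptions, not Ember's implementation: the `REFERENCES` graph, the document names, and the `expand_by_references` function are all hypothetical.

```python
from collections import deque

# Hypothetical document graph: an edge A -> B means "A references B".
REFERENCES = {
    "memo": ["exhibit_a"],
    "exhibit_a": ["model"],
    "model": [],
    "unrelated_note": [],
}

def expand_by_references(seeds, references, max_hops=2):
    """Return the seeds plus every document reachable within max_hops
    reference edges, in breadth-first order with duplicates removed."""
    ordered = []
    seen = set()
    queue = deque((doc, 0) for doc in seeds)
    while queue:
        doc, hops = queue.popleft()
        if doc in seen:
            continue
        seen.add(doc)
        ordered.append(doc)
        # Follow outgoing reference edges only while under the hop budget.
        if hops < max_hops:
            for target in references.get(doc, []):
                queue.append((target, hops + 1))
    return ordered

# A chunk-only retriever that matched "memo" would stop there.
# Following edges also pulls in the exhibit and the model it cites.
print(expand_by_references(["memo"], REFERENCES))
```

The hop limit is a design choice: it keeps expansion bounded on densely cross-referenced corpora, at the cost of missing sources further down a citation chain.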
Peer-reviewed work, preprints, and internal technical reports. Most of what we publish is a direct consequence of something we hit in production at a customer — written up so the rest of the field can build on it.
A retrieval architecture that treats citations as first-class edges. Outperforms dense baselines by 17.4 nDCG@10 on a held-out PE deal-room benchmark.
A permission-aware retriever that enforces tenant boundaries below the embedding layer. Zero cross-tenant hits across 1.2M adversarial probes; 0.3% latency overhead.
Treating the document graph as a primary access path — not a post-hoc rerank — yields a 4× reduction in tokens-to-correct-answer on multi-hop financial queries.
A 12,400-query benchmark across 47 anonymized deal rooms. We release the evaluation harness; the underlying corpora remain under NDA.
Frontier 200k-context models degrade by 28% on attribution faithfulness past 60k tokens, even when raw recall stays flat. Implications for long-document RAG.
How we extract, normalize, and provenance-track 14 document types — including handwritten margin notes on legacy memos — for audit-grade retrieval.
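For readers unfamiliar with the nDCG@10 metric quoted above, a minimal sketch of how it is commonly computed follows. It assumes linear gain and, as a simplification, derives the ideal ranking from the same list of graded relevances that the system returned; the function names are illustrative and this is not the released evaluation harness.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain).
    The result at 0-based rank i is discounted by log2(i + 2)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering.
    Simplifying assumption: the ideal ordering is built from the
    same relevance list, not from all judged documents."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the results in the order the retriever returned them.
print(ndcg_at_k([3, 2, 1]))   # already in ideal order
print(ndcg_at_k([0, 1]))      # the only relevant result is ranked second
```

Scores fall between 0 and 1 and are often reported multiplied by 100. The source does not state which scale the 17.4 figure uses.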