Ember Insights · The Blog

Notes from inside real deployments.

How we think about retrieval, what we've learned shipping Ember into PE firms, top-quartile consultancies, and venture funds — and the occasional opinion the rest of the industry won't have for another year.

Retrieval

Why Graph-RAG eats vector search for breakfast in the enterprise

Chunks-and-cosine works for blogs. It quietly falls apart on a deal room, where the answer lives across a memo, three appendices, and a footnote that cites them.

2026-04-22

Architecture

Two files, any agent: how Ember Native plugs into your stack in an afternoon

A pointer file and a manifest. That's the whole integration surface. Here's the design rationale, and what we deliberately left out.

2026-04-08

Field Notes

Inside a PE deployment: what a deal team's brain actually looks like

Six weeks at a $40B AUM firm. The agents the partners already use, the documents they wouldn't share with each other, and the structure that finally made the agents useful.

2026-03-19

Graph

Document graphs, citations, and why chunking lies to you

If a memo references an exhibit and the exhibit references a model, your chunker has no idea. The graph does — and it changes which sources get retrieved.

2026-03-02

Security

The cross-tenant problem nobody talks about

Most enterprise RAG vendors will, given the right prompt, leak across customers. Here's the boring, low-glamour fix — and why it has to live below the retriever.

2026-02-14

Strategy

What we actually mean when we say "institutional knowledge"

It's not the Confluence wiki. It's not even the deal memos. It's the way the partners reason about them — and that's the thing agents need next.

2026-01-28

Ember Output · Technical Papers

Research from the team building the knowledge layer.

Peer-reviewed work, preprints, and internal technical reports. Most of what we publish is a direct consequence of something we hit in production at a customer — written up so the rest of the field can build on it.

NeurIPS '26 Workshop Apr 2026

GraphRAG-X: Citation-Aware Retrieval over Heterogeneous Enterprise Corpora

S. Chen, K. Thiagarajan, D. Lin, et al.

A retrieval architecture that treats citations as first-class edges. Outperforms dense baselines by 17.4 nDCG@10 points on a held-out PE deal-room benchmark.

14 pp

arXiv preprint Mar 2026 · 2603.11421

Cross-Tenant Retrieval Without Cross-Tenant Leakage

K. Thiagarajan, S. Chen

A permission-aware retriever that enforces tenant boundaries below the embedding layer. Zero cross-tenant hits across 1.2M adversarial probes; 0.3% latency overhead.

22 pp

VLDB '26 Feb 2026 · in submission

Document Graphs as a First-Class Index

D. Lin, S. Chen, K. Thiagarajan

Treating the document graph as a primary access path — not a post-hoc rerank — yields a 4× reduction in tokens-to-correct-answer on multi-hop financial queries.

18 pp

Tech Report Jan 2026 · TR-2026-01

Benchmarking Retrieval over Real Private-Equity Deal Rooms

D. Lin, S. Chen, K. Thiagarajan, M. Park

A 12,400-query benchmark across 47 anonymized deal rooms. We release the evaluation harness; the underlying corpora remain under NDA.

ICML '25 Workshop Jul 2025 · R2-FM

Long-Context Models Forget Earlier Citations: A Failure Mode We Should Stop Hiding

S. Chen, K. Thiagarajan

Frontier 200k-context models degrade by 28% on attribution faithfulness past 60k tokens, even when raw recall stays flat. Implications for long-document RAG.

Tech Report Oct 2025 · TR-2025-04

Ingestion at Audit Standards: A Pipeline for Regulated Verticals

M. Park, D. Lin

How we extract, normalize, and provenance-track 14 document types — including handwritten margin notes on legacy memos — for audit-grade retrieval.