Mathematical Foundations
Three techniques from differential geometry, algebraic topology, and stochastic analysis — each a first in agent memory systems.
Why Mathematics?
Current memory systems rely on cosine similarity for retrieval, pairwise checking for consistency, and hardcoded thresholds for lifecycle management. These approaches work, but each has a known limitation: cosine similarity ignores confidence, pairwise checking misses transitive contradictions, and hardcoded thresholds do not adapt.
We asked: what if we replaced these heuristics with principled mathematics? The answer turned out to be three techniques from fields not previously applied to agent memory.
Fisher-Rao Information Geometry
Retrieval Layer
The problem: Cosine similarity treats embeddings as direction vectors. Two memories with the same meaning but different confidence look identical.
Our approach: We model each memory embedding as a diagonal Gaussian distribution with learned mean and variance. Similarity is measured along the Fisher-Rao geodesic — the natural metric on statistical manifolds. This is not an arbitrary choice: the Fisher-Rao metric is the unique Riemannian metric that is invariant under sufficient statistics.
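For univariate Gaussians the Fisher-Rao geodesic distance has a closed form (the (μ, σ) half-plane with the Fisher metric is hyperbolic), and a diagonal Gaussian is a product of univariate ones. The sketch below computes this distance under the assumption that per-dimension distances are combined in the usual product-manifold (l2) way; function names are illustrative, not the system's actual API.

```python
import numpy as np

def fisher_rao_gaussian_1d(mu1, sig1, mu2, sig2):
    # Closed-form Fisher-Rao distance between two univariate Gaussians.
    # The (mu, sigma) parameter space with the Fisher metric is a
    # hyperbolic half-plane, which yields this arccosh formula.
    num = (mu1 - mu2) ** 2 / 2.0 + (sig1 - sig2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * sig1 * sig2))

def fisher_rao_diagonal(mu1, var1, mu2, var2):
    # Product-manifold distance for diagonal Gaussians:
    # l2-combine the per-dimension geodesic distances.
    d = fisher_rao_gaussian_1d(mu1, np.sqrt(var1), mu2, np.sqrt(var2))
    return float(np.sqrt(np.sum(d ** 2)))
```

Note that two memories with identical means but different variances get a nonzero distance here, whereas cosine similarity on the means alone would call them identical.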
What this means in practice:
- High-confidence memories and low-confidence memories about the same topic are distinguished
- Retrieval improves as the system learns — variance shrinks with repeated access (Bayesian conjugate updates)
- After 10 accesses, the system transitions from cosine similarity to full Fisher-Rao distance (graduated ramp)
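The variance shrinkage and the graduated ramp can be sketched as follows. This is a minimal illustration, assuming a standard Gaussian conjugate update (each access acts like one more observation) and a linear ramp over the first 10 accesses; the constant, the 1/(1+d) distance-to-similarity mapping, and the function names are assumptions, not the system's actual code.

```python
RAMP = 10  # accesses until full Fisher-Rao weight (illustrative constant)

def update_variance(var, n_access, obs_var=1.0):
    # Gaussian conjugate update for a mean with known observation
    # variance: posterior variance 1 / (1/var + n/obs_var).
    # Repeated access shrinks variance, i.e. confidence grows.
    return (var * obs_var) / (obs_var + n_access * var)

def blended_score(cos_sim, fr_dist, n_access):
    # Graduated ramp: weight w goes 0 -> 1 over the first RAMP accesses,
    # moving from pure cosine similarity to a Fisher-Rao-based score.
    # The distance is mapped to a similarity via 1 / (1 + d).
    w = min(n_access / RAMP, 1.0)
    return (1.0 - w) * cos_sim + w / (1.0 + fr_dist)
```

At zero accesses the score is plain cosine similarity; from the tenth access on it depends only on the Fisher-Rao distance.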
Measured impact: Removing Fisher-Rao drops multi-hop accuracy by 12 percentage points (50% → 38%). Across six conversations, the three mathematical layers collectively contribute +12.7pp average improvement.
Sheaf Cohomology for Consistency
Consistency Layer
The problem: As memories accumulate, contradictions emerge. "Alice moved to London in March" vs "Alice lives in Paris as of April." Pairwise checking catches direct contradictions but misses transitive ones — and scales as O(n²).
Our approach: We model the knowledge graph as a cellular sheaf — an algebraic structure from topology that assigns vector spaces to nodes and edges, with restriction maps encoding how local data relates across the graph. Computing the first cohomology group H¹(G,F) reveals global inconsistencies:
- H¹ = 0 — All memories are globally consistent
- H¹ ≠ 0 — Contradictions exist, even if every local pair looks fine
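A toy example makes the local-to-global obstruction concrete. Below, a 3-cycle with one-dimensional stalks and identity restriction maps carries edge data saying "head = tail + 1" on every edge: each edge constraint is satisfiable on its own, but the sum around the cycle is 3 ≠ 0, so no global section exists. Consistency reduces to checking whether the edge data lies in the image of the coboundary map δ, which a least-squares residual detects. This is a hand-rolled sketch; the actual system uses learned restriction maps on a real knowledge graph.

```python
import numpy as np

# Toy cellular sheaf on a 3-cycle: 1-D stalks, identity restriction
# maps. The coboundary delta sends node data x to edge differences
# (delta x)_e = x_head - x_tail.
edges = [(0, 1), (1, 2), (2, 0)]
n_nodes = 3
delta = np.zeros((len(edges), n_nodes))
for i, (u, v) in enumerate(edges):
    delta[i, u] = -1.0
    delta[i, v] = 1.0

def obstruction(b):
    # Distance from edge data b to im(delta). Zero residual: a global
    # section exists. Nonzero residual: b represents a nontrivial
    # class in H^1, i.e. a contradiction no single edge reveals.
    x, *_ = np.linalg.lstsq(delta, b, rcond=None)
    return float(np.linalg.norm(delta @ x - b))

print(obstruction(np.array([1.0, 1.0, 1.0])))   # > 0: global contradiction
print(obstruction(np.array([1.0, 1.0, -2.0])))  # ~ 0: globally consistent
```

Every pair of nodes in the first case is locally fine; only the cycle-level check exposes the inconsistency, which is exactly what pairwise methods miss.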
What this means in practice: The system detects contradictions that no pairwise method can find. When detected, contradictions are resolved automatically (newer supersedes older with SUPERSEDES edges) or surfaced to the user.
Why sheaf theory: Sheaves are the natural mathematical language for "local-to-global" problems. Checking consistency of a knowledge graph from local edge data is exactly this kind of problem.
Riemannian Langevin Dynamics
Lifecycle Layer
The problem: Memory systems need lifecycle management — old, unused memories should be archived. Current systems use hardcoded thresholds ("archive after 30 days"). This does not adapt to usage patterns and requires manual tuning.
Our approach: Memory lifecycle evolves via stochastic gradient flow on the Poincaré ball. The potential function encodes access frequency, trust score, and recency. The dynamics are a discretized Langevin SDE on a Riemannian manifold with provable convergence to a stationary distribution.
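A single discretized step of such dynamics can be sketched as below. The Poincaré ball metric is conformal with factor λ(x) = 2/(1 − ‖x‖²), so the Riemannian gradient rescales the Euclidean one by 1/λ² and the injected noise is scaled by the inverse metric. The plain-addition retraction with clipping, the step size, and the function name are simplifying assumptions for illustration, not the system's actual integrator.

```python
import numpy as np

def poincare_langevin_step(x, grad_U, eta=1e-2, rng=None):
    # One Euler-Maruyama-style Langevin step on the Poincare ball.
    rng = np.random.default_rng() if rng is None else rng
    lam = 2.0 / (1.0 - x @ x)                  # conformal factor of the metric
    drift = -eta * grad_U(x) / lam**2          # Riemannian gradient descent
    noise = np.sqrt(2.0 * eta) / lam * rng.standard_normal(x.shape)
    y = x + drift + noise                      # crude retraction: add in R^n ...
    norm = np.linalg.norm(y)
    if norm >= 1.0:                            # ... then clip back inside the ball
        y *= (1.0 - 1e-5) / norm
    return y
```

With a potential like U(x) = ‖x‖² (gradient 2x) encoding "frequently accessed", iterating this step keeps a memory's point near the origin; weakening the pull lets noise drift it toward the boundary.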
Four lifecycle states:
- Active — Near the origin. Frequently used, instantly available
- Warm — Intermediate. Recently used, included in searches
- Cold — Further out. Older, retrievable on demand
- Archived — Near the boundary. Compressed, restorable
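Reading the four states off as radial shells could look like the sketch below. The cut-off radii here are purely illustrative — the source gives no numeric boundaries, and in the actual system the bands emerge from the stationary distribution rather than hand-picked numbers.

```python
def lifecycle_state(radius):
    # Map distance from the origin of the Poincare ball to a lifecycle
    # state. Thresholds are hypothetical placeholders for illustration.
    if radius < 0.3:
        return "active"
    if radius < 0.6:
        return "warm"
    if radius < 0.9:
        return "cold"
    return "archived"
```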
What this means in practice: No manual thresholds. Frequently accessed memories stay active longer. Low-trust memories decay faster. The system self-organizes toward the mathematically optimal distribution.
Ablation Results
Each row disables one component. The difference shows what each layer contributes.
| Configuration | Micro Avg | Multi-Hop | Open Domain |
|---|---|---|---|
| Full system (all layers) | 62.3% | 50% | 78% |
| − Math layers | 59.3% | 38% | 70% |
| − Entity channel | 56.8% | 38% | 73% |
| − BM25 channel | 53.2% | 23% | 71% |
| − Cross-encoder | 31.8% | 17% | — |
LoCoMo conv-30, 81 scored questions, Mode A (zero-LLM). Mathematical layers contribute +12pp on multi-hop reasoning (50% vs 38%).
Full Mathematical Treatment
Proofs, theorems, and detailed experimental methodology are given in the V3 paper.
Every algorithm is open source under MIT license. Take it, use it, extend it.