SuperLocalMemory
Novel Contributions

Mathematical Foundations

Three techniques from differential geometry, algebraic topology, and stochastic analysis — each a first in agent memory systems.

Why Mathematics?

Current memory systems rely on cosine similarity for retrieval, pairwise checking for consistency, and hardcoded thresholds for lifecycle management. These approaches work — but they have known limitations. Cosine ignores confidence. Pairwise checking misses transitive contradictions. Hardcoded thresholds do not adapt.

We asked: what if we replaced these heuristics with principled mathematics? The answer turned out to be three techniques from fields not previously applied to agent memory.

Fisher-Rao Information Geometry

Retrieval Layer

The problem: Cosine similarity treats embeddings as direction vectors. Two memories with the same meaning but different confidence look identical.

Our approach: We model each memory embedding as a diagonal Gaussian distribution with learned mean and variance. Similarity is measured along the Fisher-Rao geodesic — the natural metric on statistical manifolds. This is not an arbitrary choice: by Chentsov's theorem, the Fisher-Rao metric is, up to scaling, the unique Riemannian metric invariant under sufficient statistics.

What this means in practice:

  • High-confidence memories and low-confidence memories about the same topic are distinguished
  • Retrieval improves as the system learns — variance shrinks with repeated access (Bayesian conjugate updates)
  • After 10 accesses, the system transitions from cosine similarity to full Fisher-Rao distance (graduated ramp)
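For diagonal Gaussians the Fisher-Rao geodesic distance has a closed form: each dimension is an independent univariate Gaussian, whose statistical manifold is (a rescaled) hyperbolic half-plane, and the product-manifold distance is the l2 norm of the per-dimension distances. A minimal sketch (the function name and shapes are illustrative, not the system's actual API):

```python
import numpy as np

def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao distance between diagonal Gaussians.

    Per dimension, N(mu, sigma^2) lives on a hyperbolic half-plane
    (coordinates mu/sqrt(2), sigma), giving the arccosh formula below.
    The total distance is the l2 norm over dimensions.
    """
    mu1, sigma1 = np.asarray(mu1, float), np.asarray(sigma1, float)
    mu2, sigma2 = np.asarray(mu2, float), np.asarray(sigma2, float)
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    per_dim = np.sqrt(2.0) * np.arccosh(1.0 + num / (4.0 * sigma1 * sigma2))
    return float(np.linalg.norm(per_dim))

# Two memories with identical means but different confidence: cosine
# similarity would call them identical, Fisher-Rao does not.
d = fisher_rao_distance([0.5, 0.1], [0.05, 0.05], [0.5, 0.1], [0.5, 0.5])
```

Note how the distance depends on the variances even when the means coincide — this is exactly the confidence-sensitivity that direction-only cosine similarity lacks.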

Measured impact: Removing Fisher-Rao drops multi-hop accuracy by 12 percentage points (50% → 38%). Across six conversations, the three mathematical layers collectively contribute +12.7pp average improvement.

Sheaf Cohomology for Consistency

Consistency Layer

The problem: As memories accumulate, contradictions emerge. "Alice moved to London in March" vs "Alice lives in Paris as of April." Pairwise checking catches direct contradictions but misses transitive ones — and scales as O(n²).

Our approach: We model the knowledge graph as a cellular sheaf — an algebraic structure from topology that assigns vector spaces to nodes and edges, with restriction maps encoding how local data relates across the graph. Computing the first cohomology group H¹(G,F) reveals global inconsistencies:

  • H¹ = 0 — All memories are globally consistent
  • H¹ ≠ 0 — Contradictions exist, even if every local pair looks fine

What this means in practice: The system detects contradictions that no pairwise method can find. When detected, contradictions are resolved automatically (newer supersedes older with SUPERSEDES edges) or surfaced to the user.

Why sheaf theory: Sheaves are the natural mathematical language for "local-to-global" problems. Checking consistency of a knowledge graph from local edge data is exactly this kind of problem.
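The local-to-global check can be sketched concretely. Assuming, for illustration only, 1-D stalks and difference-type restriction maps (the document does not specify the system's actual stalk structure): observed edge data forms a 1-cochain y, and a consistent global assignment exists exactly when y lies in the image of the coboundary δ — i.e. when its obstruction class in H¹ = C¹/im δ vanishes. The least-squares residual measures that obstruction:

```python
import numpy as np

def contradiction_residual(nodes, edges, edge_data):
    """Obstruction to a globally consistent assignment of node values.

    The coboundary delta maps node values x to (x_v - x_u) per edge
    (u, v). Edge observations y are globally consistent iff delta x = y
    is solvable; the least-squares residual is the size of y's class in
    H^1 = C^1 / im(delta).
    """
    idx = {v: i for i, v in enumerate(nodes)}
    delta = np.zeros((len(edges), len(nodes)))
    y = np.zeros(len(edges))
    for k, (u, v) in enumerate(edges):
        delta[k, idx[u]], delta[k, idx[v]] = -1.0, 1.0
        y[k] = edge_data[(u, v)]
    x, *_ = np.linalg.lstsq(delta, y, rcond=None)
    return float(np.linalg.norm(delta @ x - y))

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
# Every single edge looks fine in isolation, but summed around the
# cycle the constraints conflict (1 + 1 + 1 != 0) — a transitive
# contradiction that no pairwise check sees.
bad = contradiction_residual(nodes, edges,
                             {edges[0]: 1.0, edges[1]: 1.0, edges[2]: 1.0})
ok = contradiction_residual(nodes, edges,
                            {edges[0]: 1.0, edges[1]: 1.0, edges[2]: -2.0})
```

The contradiction lives on the cycle as a whole, not on any single edge — which is why it only appears at the cohomological (global) level.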

Riemannian Langevin Dynamics

Lifecycle Layer

The problem: Memory systems need lifecycle management — old, unused memories should be archived. Current systems use hardcoded thresholds ("archive after 30 days"). This does not adapt to usage patterns and requires manual tuning.

Our approach: Memory lifecycle evolves via stochastic gradient flow on the Poincaré ball. The potential function encodes access frequency, trust score, and recency. The dynamics are a discretized Langevin SDE on a Riemannian manifold with provable convergence to a stationary distribution.

Four lifecycle states:

  • Active — Near the origin. Frequently used, instantly available
  • Warm — Intermediate. Recently used, included in searches
  • Cold — Further out. Older, retrievable on demand
  • Archived — Near the boundary. Compressed, restorable

What this means in practice: No manual thresholds. Frequently accessed memories stay active longer. Low-trust memories decay faster. The system self-organizes toward the mathematically optimal distribution.

Ablation Results

Each row disables one component; the drop relative to the full system shows what that component contributes.

Configuration              Micro Avg   Multi-Hop   Open Domain
Full system (all layers)   62.3%       50%         78%
− Math layers              59.3%       38%         70%
− Entity channel           56.8%       38%         73%
− BM25 channel             53.2%       23%         71%
− Cross-encoder            31.8%       17%

LoCoMo conv-30, 81 scored questions, Mode A (zero-LLM). Mathematical layers contribute +12pp on multi-hop reasoning (50% vs 38%).

Full Mathematical Treatment

Proofs, theorems, and detailed experimental methodology in the V3 paper.

Every algorithm is open source under MIT license. Take it, use it, extend it.