Today we're shipping Oceum Recall — a compressed memory system that lets AI agent fleets share and search semantic knowledge at 1/10th the storage cost, with tenant-isolated encryption and time-based fidelity decay. Here's how it works and why it matters.

The problem

AI agents generate and consume embeddings constantly. Knowledge retrieval, memory search, categorization, similarity matching — every operation starts with a vector. At 1536 dimensions times 4 bytes per float, that's 6 KB per vector. A fleet of agents managing 100K knowledge chunks burns 600 MB of pure vector storage before you store a single row of metadata.
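The storage math above, spelled out (the dimensions and chunk count are the example figures from this paragraph):

```typescript
// Quick sanity check of the storage figures above.
const dims = 1536;
const bytesPerFloat = 4;                         // float32
const bytesPerVector = dims * bytesPerFloat;     // 6,144 bytes ≈ 6 KB
const chunks = 100_000;
const totalMB = (bytesPerVector * chunks) / 1e6; // ≈ 614 MB, roughly 600 MB
```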

Scale that across organizations and the numbers compound. Cross-org isolation adds complexity — you can't just index everything together without leaking semantic similarity across tenant boundaries. And cold memories that haven't been accessed in months sit at the same fidelity as hot, actively-searched ones, wasting identical resources on data with fundamentally different access patterns.

The approach — inspired by TurboQuant

Google's TurboQuant algorithm (ICLR 2026) demonstrated that random orthogonal rotation followed by Lloyd-Max scalar quantization can compress embedding vectors to 3 bits per dimension with near-zero accuracy loss on retrieval benchmarks. The core insight: rotate the vector into a coordinate space where dimensions are decorrelated, then quantize each dimension independently with an optimal scalar quantizer.
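The rotate-then-quantize idea can be sketched in a few lines. This is an illustrative 2-D toy, not the production implementation: a uniform scalar quantizer stands in for the Lloyd-Max codebooks the paper uses, and the rotation angle is arbitrary.

```typescript
// Sketch: apply an orthogonal rotation, then quantize each dimension
// independently. Uniform quantization stands in for Lloyd-Max here.
function rotate2d(v: [number, number], theta: number): [number, number] {
  const c = Math.cos(theta), s = Math.sin(theta);
  return [c * v[0] - s * v[1], s * v[0] + c * v[1]];
}

function quantize(x: number, bits: number, lo = -1, hi = 1): number {
  const levels = (1 << bits) - 1;                    // 7 levels for 3 bits
  const t = Math.min(Math.max((x - lo) / (hi - lo), 0), 1);
  return Math.round(t * levels);                     // integer code in [0, levels]
}

function dequantize(code: number, bits: number, lo = -1, hi = 1): number {
  const levels = (1 << bits) - 1;
  return lo + (code / levels) * (hi - lo);
}

const v: [number, number] = [0.6, -0.3];
const rotated = rotate2d(v, Math.PI / 7);            // decorrelating rotation
const codes = rotated.map((x) => quantize(x, 3));    // 3 bits per dimension
const recovered = codes.map((c) => dequantize(c, 3));
```

Because the rotation is orthogonal, dot products are preserved; at search time you can compare vectors in the rotated space directly, and the quantization error per dimension is bounded by half a quantization step.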

We took that core math and built three novel layers on top — turning an academic compression algorithm into a production-grade memory system for multi-tenant AI fleets.

Three layers of innovation

1. Adaptive bit-width quantization

TurboQuant targets a fixed lossy bit-width. Recall goes further with four compression tiers: a functionally lossless mode using IEEE 754 half-precision with Brotli entropy coding (54% savings at 99.999997% cosine fidelity), plus 4-bit, 3-bit, and 2-bit lossy quantization with per-tier Lloyd-Max codebooks. Hot memories get lossless compression — zero fidelity loss. Warm memories compress to 4-bit. Cold memories decay to 3-bit, then 2-bit. The system automatically selects the optimal tier based on importance scoring.
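As a sketch, tier selection might look like the following. The tier names mirror the four modes described above, but the thresholds and the shape of the importance score are hypothetical, not Recall's actual policy:

```typescript
// Hypothetical tier-selection policy: map an importance score in [0, 1]
// to one of the four compression tiers described above.
type Tier = "lossless" | "4-bit" | "3-bit" | "2-bit";

function selectTier(importance: number): Tier {
  if (importance >= 0.75) return "lossless"; // hot: float16 + Brotli
  if (importance >= 0.5) return "4-bit";     // warm
  if (importance >= 0.25) return "3-bit";    // cooling
  return "2-bit";                            // cold
}
```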

2. Keyed rotation for tenant isolation

Instead of a random orthogonal rotation shared across all data, Recall derives a per-organization rotation matrix from HMAC-SHA256(master_secret, org_id). The rotation is deterministic (same org always gets the same matrix) and orthogonal (preserves dot products for similarity search). The result: even co-located data from different tenants lives in incompatible coordinate spaces. Cross-org similarity search returns noise, not neighbors.
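A minimal sketch of deriving a deterministic orthogonal rotation from the HMAC, assuming Node.js. Only the HMAC-SHA256 keying step comes from the description above; the PRNG choice (mulberry32) and Gram-Schmidt construction are illustrative stand-ins, and the secret and org ID are placeholders:

```typescript
import { createHmac } from "node:crypto";

// mulberry32: a small deterministic PRNG seeded from the HMAC digest
// (illustrative choice; any seedable PRNG works for the sketch).
function mulberry32(a: number): () => number {
  return () => {
    a |= 0; a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gram-Schmidt on a seeded random matrix yields an orthogonal matrix:
// deterministic per org, and dot-product-preserving.
function orthogonalMatrix(dim: number, seed: Buffer): number[][] {
  const rng = mulberry32(seed.readUInt32LE(0));
  const m = Array.from({ length: dim }, () =>
    Array.from({ length: dim }, () => rng() * 2 - 1),
  );
  for (let i = 0; i < dim; i++) {
    for (let j = 0; j < i; j++) {
      const dot = m[i].reduce((s, x, k) => s + x * m[j][k], 0);
      for (let k = 0; k < dim; k++) m[i][k] -= dot * m[j][k];
    }
    const norm = Math.hypot(...m[i]);
    for (let k = 0; k < dim; k++) m[i][k] /= norm;
  }
  return m;
}

// Per-org seed: HMAC-SHA256(master_secret, org_id), as described above.
const seed = createHmac("sha256", "master-secret").update("org-123").digest();
const Q = orthogonalMatrix(4, seed);
```

Two tenants with different org IDs get different digests, hence different rotation matrices, so their compressed vectors live in mutually incompatible coordinate spaces.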

3. Drift-aware fidelity decay

Memory isn't static. Recall implements cascading compression: vectors start lossless (float16 + Brotli, 54% savings), then decay to 4-bit, 3-bit, and finally 2-bit over time. When a cold memory is accessed, it snaps back to full fidelity from the original embedding. The decay schedule is configurable per-organization, and fidelity guarantees are maintained at each tier transition — you always know the minimum cosine similarity between the compressed and original vector.
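The decay cascade can be sketched as a ladder walk. The tier ordering matches the description above, but the idle-day thresholds and function names are hypothetical (the article says the schedule is configurable per organization):

```typescript
// Illustrative decay ladder: lossless -> 4-bit -> 3-bit -> 2-bit.
type Tier = "lossless" | "4-bit" | "3-bit" | "2-bit";

const DECAY_LADDER: Tier[] = ["lossless", "4-bit", "3-bit", "2-bit"];

// Hypothetical per-transition idle thresholds, in days.
const THRESHOLDS = [30, 90, 180];

function tierAfterIdle(daysIdle: number): Tier {
  let step = 0;
  for (const t of THRESHOLDS) if (daysIdle >= t) step++;
  return DECAY_LADDER[step];
}

// On access, a memory snaps back to full fidelity by re-encoding
// from the original embedding.
function tierOnAccess(): Tier {
  return "lossless";
}
```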

The numbers

54%: lossless savings (2,800 bytes vs 6,144 bytes per 1536-dim vector)
3e-8: cosine error in lossless mode, smaller than floating-point noise
4: compression tiers (lossless, 4-bit, 3-bit, 2-bit) with automatic decay
0: external dependencies (Node.js built-in zlib only; fully self-contained)

Recall ships today in both Oceum Cloud and Oceum Enterprise (self-hosted). The compression is transparent — agents interact with the same memory APIs, and Recall handles encoding, decoding, rotation, and tier management under the hood.

What this means for your fleet

Recall is included in Oceum v0.3.0. Update your fleet and your agents get better memory for free.

Read the full security architecture documentation for details on keyed rotation encryption, key management, and compression fidelity guarantees.