Today we're shipping Oceum Recall — a compressed memory system that lets AI agent fleets share and search semantic knowledge at 1/10th the storage cost, with tenant-isolated encryption and time-based fidelity decay. Here's how it works and why it matters.
The problem
AI agents generate and consume embeddings constantly. Knowledge retrieval, memory search, categorization, similarity matching — every operation starts with a vector. At 1536 dimensions times 4 bytes per float, that's 6 KB per vector. A fleet of agents managing 100K knowledge chunks burns 600 MB of pure vector storage before you store a single row of metadata.
Scale that across organizations and the numbers compound. Cross-org isolation adds complexity — you can't just index everything together without leaking semantic similarity across tenant boundaries. And cold memories that haven't been accessed in months sit at the same fidelity as hot, actively-searched ones, wasting identical resources on data with fundamentally different access patterns.
The approach — inspired by TurboQuant
Google's TurboQuant algorithm (ICLR 2026) demonstrated that random orthogonal rotation followed by Lloyd-Max scalar quantization can compress embedding vectors to 3 bits per dimension with near-zero accuracy loss on retrieval benchmarks. The core insight: rotate the vector into a coordinate space where dimensions are decorrelated, then quantize each dimension independently with an optimal scalar quantizer.
We took that core math and built three novel layers on top — turning an academic compression algorithm into a production-grade memory system for multi-tenant AI fleets.
Three layers of innovation
1. Adaptive bit-width quantization
TurboQuant targets a fixed lossy bit-width. Recall goes further with four compression tiers: a functionally lossless mode using IEEE 754 half-precision with Brotli entropy coding (54% savings at 99.999997% cosine fidelity), plus 4-bit, 3-bit, and 2-bit lossy quantization with per-tier Lloyd-Max codebooks. Hot memories get lossless compression — zero fidelity loss. Warm memories compress to 4-bit. Cold memories decay to 3-bit, then 2-bit. The system automatically selects the optimal tier based on importance scoring.
2. Keyed rotation for tenant isolation
Instead of a random orthogonal rotation shared across all data, Recall derives a per-organization rotation matrix from HMAC-SHA256(master_secret, org_id). The rotation is deterministic (same org always gets the same matrix) and orthogonal (preserves dot products for similarity search). The result: even co-located data from different tenants lives in incompatible coordinate spaces. Cross-org similarity search returns noise, not neighbors.
3. Drift-aware fidelity decay
Memory isn't static. Recall implements cascading compression: vectors start lossless (float16 + Brotli, 54% savings), then decay to 4-bit, 3-bit, and finally 2-bit over time. When a cold memory is accessed, it snaps back to full fidelity from the original embedding. The decay schedule is configurable per-organization, and fidelity guarantees are maintained at each tier transition — you always know the minimum cosine similarity between the compressed and original vector.
Availability
Recall ships today in both Oceum Cloud and Oceum Enterprise (self-hosted). The compression is transparent — agents interact with the same memory APIs, and Recall handles encoding, decoding, rotation, and tier management under the hood.
What this means for your fleet
- 54% smaller with zero quality loss. In lossless mode, your PostgreSQL vector storage drops from 600 MB to 274 MB for 100K vectors — with cosine error of 3e-8. Enable lossy decay tiers and cold memories compress to 78 MB. That's the difference between hitting your Supabase storage cap and having room to grow.
- Cross-org search is mathematically isolated. No runtime encryption overhead, no key exchange, no decrypt-before-search. The rotation preserves similarity within an org while making cross-org results meaningless.
- Old memories gracefully compress. Instead of deleting cold knowledge to save space, Recall decays it to 2-bit. When an agent needs it again, it snaps back to full fidelity from the source embedding. Nothing is lost — it just costs less to store.
- The package is open source. The @oceum/vectorquant package is available on npm. Pure TypeScript, zero dependencies, works in Node.js, Deno, and the browser.
Recall is included in Oceum v0.3.0. Update your fleet and your agents get better memory for free.
Read the full security architecture documentation for details on keyed rotation encryption, key management, and compression fidelity guarantees.