We’ve been working on Federated Learning (FL) for autonomous agents and hit a hard bottleneck: standard FL (sending gradients back and forth) is bandwidth-heavy and requires massive edge compute.
We wrote this paper to propose an architectural shift we call Fluid Federated Learning (FFL).
The core engineering contributions are:
Prism Protocol: We implemented a "Software-Defined Memory" architecture. It uses io_uring to stream sparse, random projections of model weights directly from NVMe storage to the GPU.
This allows us to process "Virtual Batches" of terabyte-scale models on commodity hardware by exploiting the Johnson-Lindenstrauss lemma (Holographic Slicing).
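To make the "Holographic Slicing" idea concrete, here is a minimal, hypothetical sketch of the underlying Johnson-Lindenstrauss step: an Achlioptas-style sparse random projection applied to a flattened weight shard, processed in column blocks so the projection matrix is never materialised in full (the same property that lets a shard be projected as it streams in from storage). The function name, block size, and the block-wise regeneration from a seed are my assumptions, not the paper's Prism Protocol; the actual pipeline drives this with io_uring rather than an in-memory tensor.

```python
import torch

def sparse_jl_project(shard: torch.Tensor, k: int, seed: int, block: int = 8192) -> torch.Tensor:
    """Project a flattened d-dim weight shard down to k << d dims, block by block.

    Achlioptas-style sparse JL projection: effective entries are +sqrt(3/k),
    -sqrt(3/k), or 0 with probabilities 1/6, 1/6, 2/3, which preserves pairwise
    distances in expectation (the JL guarantee). The (k, d) projection matrix is
    never built in full; each column block is regenerated from the seed.
    """
    flat = shard.flatten()
    d = flat.numel()
    out = torch.zeros(k, dtype=flat.dtype)
    gen = torch.Generator().manual_seed(seed)   # seed makes the projection reproducible across devices
    scale = (3.0 / k) ** 0.5
    for start in range(0, d, block):
        cols = flat[start:start + block]
        draw = torch.randint(0, 6, (k, cols.numel()), generator=gen)   # 0 -> +1, 1 -> -1, 2..5 -> 0
        blockmat = (draw == 0).to(flat.dtype) - (draw == 1).to(flat.dtype)
        out += scale * (blockmat @ cols)
    return out

# Usage: compress a 1M-parameter shard into a 1024-dim sketch before it leaves the device.
shard = torch.randn(1_000_000)
sketch = sparse_jl_project(shard, k=1024, seed=42)
```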
Federated State-Space Duality (F-SSD): Instead of averaging gradients (which is bandwidth-heavy and can leak information about local training data), we exploit the duality between Transformers and SSMs (like Mamba) to federate the recurrent states themselves.
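For readers unfamiliar with the idea, here is a hypothetical sketch of what federating recurrent states could look like, rather than the paper's actual F-SSD rule: each client runs its local sequence through a linear state-space recurrence and ships only the final hidden state, and the server does a sample-count-weighted average of those states. The function names, the toy recurrence, and the FedAvg-style weighting are all my assumptions; the point is only that the per-round payload is a small state vector instead of a full gradient.

```python
import torch

def local_ssm_state(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Run a linear SSM recurrence h_t = A h_{t-1} + B x_t over a local sequence
    x of shape (T, d_in) and return the final hidden state h_T."""
    h = torch.zeros(A.shape[0])
    for x_t in x:                      # sequential scan for clarity; real SSMs use a parallel scan
        h = A @ h + B @ x_t
    return h

def federate_states(states: list[torch.Tensor], counts: list[int]) -> torch.Tensor:
    """Server-side aggregation: sample-count-weighted average of client states.
    This is FedAvg applied to states instead of gradients -- a guess at the
    aggregation rule, shown only to illustrate the payload size (d_state floats)."""
    total = sum(counts)
    return sum(n / total * s for s, n in zip(states, counts))

# Usage: two clients with different amounts of local data.
d_state, d_in = 16, 8
A, B = 0.9 * torch.eye(d_state), 0.1 * torch.randn(d_state, d_in)
client_states = [local_ssm_state(torch.randn(T, d_in), A, B) for T in (128, 512)]
global_state = federate_states(client_states, counts=[128, 512])
```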
The Result: We can run massive foundation models on edge devices with limited VRAM by treating the SSD as a "slow" memory tier without destroying optimization fidelity.
Has anyone here experimented with io_uring for model serving? We found the async I/O overhead negligible compared to the memory gains, but we're wondering if there are better ways to handle the sparse projections.
Happy to answer questions on the implementation.