Training sits inside export-controlled, tightly networked GPU clusters; deployment already runs on memory-heavy, heterogeneous systems. Inference rewards large memory hierarchies, cheap accelerators, aggressive quantization, and orchestration across CPUs and SSDs, so hyperscalers (Amazon’s Trainium) and memory-rich devices (Mac Studio M3 Ultra running...