The latest research from Google
Jun 26, 2026
Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction · The latest research from Google
Science, Technology & Innovation · Jun 26, 2026
Google retrofitted Multi-Token Prediction (MTP) onto Gemini Nano v3 by freezing the backbone and training only a lightweight drafting head plus a verifier. The verified outputs are bit-for-bit identical to the base model's, so Pixel devices get latency speedups out of the box without retraining or requalifying the base model.
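The draft-then-verify loop behind this claim can be illustrated with a toy example. The sketch below is not Google's implementation: `toy_target`, `toy_drafter`, the draft length `k`, and the greedy acceptance rule are all assumptions chosen to show why a frozen target model that checks every drafted token reproduces its own greedy output exactly.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, drafter: NextToken, prompt: List[Token], n_new: int, k: int = 3
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (sequence, verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # 1. The cheap drafter proposes k tokens autoregressively.
        ctx = list(seq)
        draft: List[Token] = []
        for _ in range(k):
            tok = drafter(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. One verification pass. A real system would score all drafted
        #    positions in a single batched forward pass; this toy simulates
        #    that by calling the target once per position.
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's correction replaces the bad draft
                break
        else:
            accepted.append(target(seq + accepted))  # all drafts accepted: bonus token
        seq.extend(accepted)
    return seq[:goal], passes


# Hypothetical stand-ins for the frozen backbone and the drafting head.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + len(seq)) % 7


def toy_drafter(seq: List[Token]) -> Token:
    guess = toy_target(seq)
    # Assumption: the drafter is wrong whenever the context length is a multiple of 4.
    return (guess + 1) % 7 if len(seq) % 4 == 0 else guess


baseline = greedy_decode(toy_target, [1, 2], 20)
spec_out, spec_passes = speculative_decode(toy_target, toy_drafter, [1, 2], 20, k=3)
print(spec_out == baseline, spec_passes)
```

Every token that reaches the output was either confirmed or produced by the target, so the sequence cannot differ from plain greedy decoding; the drafter only affects how many passes are needed.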
A “zero-copy” drafter design has the MTP head cross-attend to the main model’s frozen key-value cache instead of building its own. This eliminates drafter prefill latency and cuts about 130 MB of runtime memory per instance compared with a standalone drafter. The design targets the dynamic-memory bottleneck on mobile and shows that memory architecture can be as decisive as model quality for edge AI deployment.
Google's MTP combines speculative verification with richer drafts and a redesigned on-device inference stack, validating nearly two extra tokens per pass in production features such as AI Notification Summaries and Proofread. Fewer verification passes mean the heavy processors wake less often, which lowers energy use and improves battery life, making on-device AI more usable and commercially defensible.
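As a rough back-of-the-envelope illustration of why extra validated tokens reduce how often the backbone runs, the sketch below counts verification passes for a response of a given length. The function name `verification_passes`, the 120-token response, and the rounded figure of 2.0 extra tokens per pass are illustrative assumptions, not measurements from the article, which says only "nearly two".

```python
import math


def verification_passes(n_tokens: int, extra_accepted_per_pass: float) -> int:
    """Backbone passes needed when each pass yields 1 + extra tokens on average."""
    tokens_per_pass = 1.0 + extra_accepted_per_pass
    return math.ceil(n_tokens / tokens_per_pass)


# Hypothetical 120-token response.
baseline_passes = verification_passes(120, 0.0)  # plain decoding: one token per pass
mtp_passes = verification_passes(120, 2.0)       # assumed two extra tokens per pass
print(baseline_passes, mtp_passes)  # prints: 120 40
```

Under these assumptions the backbone runs about a third as often, which is the mechanism the summary ties to fewer processor wake-ups.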
Google found that attaching a lightweight MTP drafter head to a model’s final hidden states (a late-exit design) outperforms similarly sized standalone drafters in speculative decoding. The design delivers speedups of roughly 50% or more on Pixel 9 and up to 55% higher token acceptance, which suggests that reusing the backbone's internal state matters more than matching parameter count.