Apple's AI Breakthrough: Memory Limitations Overcome
Apple's new architecture gets around the DRAM capacity limit on on-device AI by keeping model weights in flash storage. Here is how the approach works and what it means for the size of on-device models.
Apple's Innovative AI Architecture
Apple has unveiled its third-generation AI models, the AFM 3 family, which address a longstanding memory constraint in on-device AI. On-device models have traditionally had to fit their entire weight set into DRAM, which caps their parameter counts. The AFM 3 Core Advanced model instead stores its 20 billion parameters in NAND flash, so its size is not bounded by the DRAM available on the device.
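To put the 20 billion figure in perspective, the short calculation below estimates how much storage the weights alone would need. The bit widths are illustrative assumptions chosen for the example; the article does not state the precision Apple uses.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 20_000_000_000  # parameter count stated in the article

# Bit widths below are assumptions for illustration, not published figures.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_footprint_gb(NUM_PARAMS, bits):.0f} GB")
# prints:
# 16-bit weights: 40 GB
# 8-bit weights: 20 GB
# 4-bit weights: 10 GB
```

Whatever the actual precision, the weights alone run to many gigabytes, which is the pressure on DRAM that flash storage relieves.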
The approach relies on a mechanism called Instruction-Following Pruning (IFP), which treats flash memory as the permanent home for the model's weights. IFP makes its routing decision once per prompt rather than once per token, so the selection does not have to be repeated for each generated token. Key features include:
- Storage in NAND Flash: The entire weight set is stored in flash, not DRAM.
- Expert Routing: Routing decisions are made once per prompt rather than once per token, which avoids repeating the decision at every generation step.
- Collaboration with Google: The server-side models run on Nvidia GPUs in Google Cloud.
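The difference between per-prompt and per-token routing can be sketched in a few lines. This is a toy illustration, not Apple's implementation: `FlashStore`, `score_experts`, and `generate` are hypothetical names, the scoring function is a placeholder, and the "experts" are just short lists of numbers. It only shows that when experts are chosen once per prompt, the number of flash reads depends on how many experts are selected, not on how many tokens are generated.

```python
from dataclasses import dataclass


@dataclass
class FlashStore:
    """Stand-in for NAND flash: holds every expert and counts reads."""
    experts: dict[int, list[float]]
    reads: int = 0

    def load(self, expert_id: int) -> list[float]:
        self.reads += 1
        return self.experts[expert_id]


def score_experts(prompt: str, num_experts: int) -> list[int]:
    """Hypothetical router: a toy deterministic score per expert."""
    base = sum(ord(ch) for ch in prompt)
    return [(base * (e + 1)) % 97 for e in range(num_experts)]


def generate(prompt: str, flash: FlashStore, top_k: int, num_tokens: int) -> list[float]:
    # Route once, using the whole prompt.
    scores = score_experts(prompt, len(flash.experts))
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]

    # One flash read per chosen expert; the results stay resident in "DRAM".
    dram = {e: flash.load(e) for e in chosen}

    outputs = []
    for step in range(num_tokens):
        # Every decoding step reuses the experts already loaded (dummy computation).
        outputs.append(sum(sum(weights) for weights in dram.values()) + step)
    return outputs


flash = FlashStore({e: [float(e)] * 4 for e in range(8)})
tokens = generate("summarize this email", flash, top_k=2, num_tokens=50)
print(f"tokens generated: {len(tokens)}, flash reads: {flash.reads}")
```

In this sketch, generating 50 tokens costs the same two flash reads as generating one. A per-token router would instead have to consult the store at every step whenever its choice changed.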