Title: Titans: Learning to Memorize at Test Time
🔗 https://arxiv.org/abs/2501.00663
TL;DR: This paper introduces Titans, a new family of neural architectures that combines attention (short-term memory) with a neural long-term memory module that learns to memorize information at test time. The memory is updated by gradient descent with momentum and a forgetting gate, storing information in proportion to how “surprising” it is. The authors present three ways to incorporate the memory into an architecture, as context (MAC), gate (MAG), or layer (MAL), and show that Titans outperforms Transformers and modern recurrent models on language modeling, commonsense reasoning, and needle-in-a-haystack tasks. Unlike Transformers, whose attention cost grows quadratically with context length, Titans scales efficiently to context windows beyond 2 million tokens while maintaining strong performance.
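For intuition, here is a minimal sketch of the test-time update rule in NumPy, assuming the simplest case where the memory is a single linear map M trained on the loss ||M k - v||^2. The function name and the fixed scalar gates are our simplifications; in the paper the momentum (eta), learning rate (theta), and forgetting gate (alpha) are data-dependent.

```python
import numpy as np

def memory_update(M, S, k, v, eta=0.9, theta=0.1, alpha=0.01):
    """One test-time step of the long-term memory (linear special case).

    M: (d, d) memory matrix, S: (d, d) surprise momentum,
    k, v: (d,) key/value projections of the current token.
    eta/theta/alpha are fixed here for brevity; the paper makes
    them data-dependent gates.
    """
    grad = 2 * np.outer(M @ k - v, k)  # momentary surprise: grad of ||Mk - v||^2
    S = eta * S - theta * grad         # accumulate past surprise with momentum
    M = (1 - alpha) * M + S            # forget a little, then write the update
    return M, S

# Toy usage: stream 16 random "tokens" through the memory.
d = 8
M, S = np.zeros((d, d)), np.zeros((d, d))
for x in np.random.randn(16, d):
    k, v = x, x  # stand-in for the learned key/value projections
    M, S = memory_update(M, S, k, v)
```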
Speaker: Davide Napolitano
—
🗓️ Friday, May 16, 2025, 12:00-13:00
📍 Meeting Room 1 – DAUIN
💻 Zoom Meeting