{"id":12222,"date":"2025-05-15T18:56:51","date_gmt":"2025-05-15T16:56:51","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=12222"},"modified":"2025-05-20T19:04:31","modified_gmt":"2025-05-20T17:04:31","slug":"reading-group-16-may-2025","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2025\/reading-group-16-may-2025\/","title":{"rendered":"Reading Group 16 May 2025"},"content":{"rendered":"\n<p class=\" eplus-wrapper\"><strong>Title:<\/strong> Titans: Learning to Memorize at Test Time<br>\ud83d\udd17 <a href=\"https:\/\/arxiv.org\/abs\/2501.00663\">https:\/\/arxiv.org\/abs\/2501.00663<\/a><\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>TL;DR:<\/strong> This paper introduces Titans, a new family of neural architectures that combines attention (short-term memory) with a neural long-term memory module that learns to memorize information at test time. The neural memory is updated with gradient-based steps that include momentum and a forgetting mechanism, storing information according to a &#8220;surprise&#8221; metric. The authors present three ways to incorporate this memory into an architecture (as a context, a gate, or a layer) and show that Titans outperform transformers and modern recurrent models on language modeling, commonsense reasoning, and needle-in-a-haystack tasks. Unlike transformers, whose attention cost grows quadratically with context length, Titans scale efficiently to context windows beyond 2 million tokens while maintaining strong performance.<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Speaker:<\/strong> Davide Napolitano<\/p>\n\n\n\n<p class=\" eplus-wrapper\">&#8212;<br>\ud83d\uddd3\ufe0f<strong> <\/strong>Friday, May 16, 2025, Time 12:00-13:00<br>\ud83d\udccd Meeting Room 1 \u2013 DAUIN<br>\ud83d\udcbb <a href=\"https:\/\/polito-it.zoom.us\/j\/85220942232?pwd=7MVooHhMRAnO0Cv9dj4dhDNTQ8KGwk.1&amp;from=addon\">Zoom Meeting<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\ud83d\udcc3 Titans: Learning to Memorize at Test Time<\/p>\n<p>\ud83d\uddd3\ufe0f May 16, 2025, Time 12:00-13:00<\/p>\n","protected":false},"author":35,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[42,45],"tags":[],"class_list":["post-12222","post","type-post","status-publish","format-standard","hentry","category-events","category-reading-group"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=12222"}],"version-history":[{"count":1,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222\/revisions"}],"predecessor-version":[{"id":12223,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222\/revisions\/12223"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=12222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=12222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=12222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}