{"id":12222,"date":"2025-05-15T18:56:51","date_gmt":"2025-05-15T16:56:51","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=12222"},"modified":"2025-05-20T19:04:31","modified_gmt":"2025-05-20T17:04:31","slug":"reading-group-16-may-2025","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2025\/reading-group-16-may-2025\/","title":{"rendered":"Reading Group 16 May 2025"},"content":{"rendered":"\n<p class=\"eplus-wrapper wp-block-paragraph\"><strong>Title:<\/strong> Titans: Learning to Memorize at Test Time<br>\ud83d\udd17 <a href=\"https:\/\/arxiv.org\/abs\/2501.00663\">https:\/\/arxiv.org\/abs\/2501.00663<\/a><\/p>\n\n\n\n<p class=\"eplus-wrapper wp-block-paragraph\"><strong>TL;DR:<\/strong> This paper introduces Titans, a new neural architecture family that combines attention (short-term memory) with a neural long-term memory module that learns to memorize information at test time. The neural memory uses gradient-based updates with momentum and forgetting mechanisms to store important information based on &#8220;surprise&#8221; metrics. The authors present three ways to incorporate this memory into architectures (as context, gate, or layer) and show Titans outperforms transformers and modern recurrent models across language modeling, commonsense reasoning, and needle-in-haystack tasks. 
Unlike transformers, whose attention cost grows quadratically with context length, Titans can efficiently scale to context windows beyond 2 million tokens while maintaining strong performance.<\/p>\n\n\n\n<p class=\"eplus-wrapper wp-block-paragraph\"><strong>Speaker:<\/strong> Davide Napolitano<\/p>\n\n\n\n<p class=\"eplus-wrapper wp-block-paragraph\">&#8212;<br>\ud83d\uddd3\ufe0f Friday, May 16, 2025, 12:00-13:00<br>\ud83d\udccd Meeting Room 1 \u2013 DAUIN<br>\ud83d\udcbb <a href=\"https:\/\/polito-it.zoom.us\/j\/85220942232?pwd=7MVooHhMRAnO0Cv9dj4dhDNTQ8KGwk.1&amp;from=addon\">Zoom Meeting<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\ud83d\udcc3 Titans: Learning to Memorize at Test Time<\/p>\n<p>\ud83d\uddd3\ufe0f May 16, 2025, Time 12:00-13:00<\/p>\n","protected":false},"author":35,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[42,45],"tags":[],"class_list":["post-12222","post","type-post","status-publish","format-standard","hentry","category-events","category-reading-group"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=12222"}],"version-history":[{"count":1,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222\/revisions"}],"predecessor-version":[{"id":12223,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/12222\/revisions\/12223"}],"wp:attachment":[{"href":"https
:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=12222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=12222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=12222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}