{"id":11193,"date":"2025-02-05T13:58:42","date_gmt":"2025-02-05T12:58:42","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=11193"},"modified":"2025-03-03T12:13:30","modified_gmt":"2025-03-03T11:13:30","slug":"reading-group-7-feb-2025","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2025\/reading-group-7-feb-2025\/","title":{"rendered":"Reading Group 7 Feb 2025"},"content":{"rendered":"<p class=\" eplus-wrapper eplus-styles-uid-452832\"><strong>Title: <\/strong>DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning<br>\ud83d\udd17<a href=\"https:\/\/arxiv.org\/pdf\/2501.12948?\">https:\/\/arxiv.org\/pdf\/2501.12948<\/a><\/p>\n\n\n<p class=\" eplus-wrapper\"><strong>Abstract: <\/strong>We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. 
To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Speaker:<\/strong> Simone Papicchio<\/p>\n\n\n\n<p class=\" eplus-wrapper\">&#8212;<br>\ud83d\uddd3\ufe0f<strong> <\/strong>Friday, February 7, 2025, Time 17:00-18:00<br>\ud83d\udccd Meeting Room 1 &#8211; DAUIN<br>\ud83d\udcbb <a href=\"https:\/\/polito-it.zoom.us\/j\/85220942232?pwd=7MVooHhMRAnO0Cv9dj4dhDNTQ8KGwk.1&amp;from=addon\">Zoom Meeting<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\ud83d\udcc3 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning<\/p>\n<p>\ud83d\uddd3\ufe0f February 7, 2025, Time 17:00-18:00<\/p>\n","protected":false},"author":36,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[42,45],"tags":[],"class_list":["post-11193","post","type-post","status-publish","format-standard","hentry","category-events","category-reading-group"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/11193","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/36"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=11193"}],"version-history":[{"count":3,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/11193\/revisions"}],"predecessor-version":[{"id":11441,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/11193\/revisions\/11441"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.i
t\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=11193"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=11193"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=11193"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}