Nature
[Paper Note] DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning, Guo+, Nature'25, 2025.09
#NLP #LanguageModel #Reasoning #read-later Issue Date: 2025-09-18 GPT Summary- This study shows that reinforcement learning (RL) can improve the reasoning abilities of LLMs while eliminating the need for human labeling. The proposed RL framework fosters the development of advanced reasoning patterns and achieves strong performance on tasks such as math and coding competitions. Moreover, the emergent reasoning patterns also help improve the capabilities of smaller models. Comment
It appears the Nature version of the DeepSeek-R1 paper has been published.
Commentary:
Supplementary Materials:
https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_2025_9422_MOESM1_ESM.pdf
These are probably the more important part.
[Paper Note] A foundation model to predict and capture human cognition, Binz+, Nature'25, 2025.07
#LanguageModel #FoundationModel #CognitiveScience Issue Date: 2025-07-06 Comment
Original post:
[Paper Note] Training large language models on narrow tasks can lead to broad misalignment, Nature 649, 2026.01
#Article #NLP #LanguageModel #Alignment #Safety #read-later #Selected Papers/Blogs #EmergentMisalignment Issue Date: 2026-01-15 Comment
Original post:
According to the original post, the Emergent Misalignment literature took shape along the following timeline:
- [Paper Note] Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs, Jan Betley+, arXiv'25, 2025.02
- [Paper Note] Persona Features Control Emergent Misalignment, Miles Wang+, arXiv'25, 2025.06
- [Paper Note] Model Organisms for Emergent Misalignment, Edward Turner+, arXiv'25, 2025.06
- [Paper Note] Convergent Linear Representations of Emergent Misalignment, Anna Soligo+, arXiv'25, 2025.06
- Narrow Misalignment is Hard, Emergent Misalignment is Easy, Turner+, 2025.07
- [Paper Note] School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs, Mia Taylor+, arXiv'25, 2025.08
- From shortcuts to sabotage: natural emergent misalignment from reward hacking, Anthropic, 2025.11
- [Paper Note] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs, Jan Betley+, arXiv'25, 2025.12