Off-Policy
#Analysis#Pocket#NLP#LanguageModel#ReinforcementLearning#TransferLearning#DPO#GRPO#VerifiableRewards#On-Policy#Non-VerifiableRewards
Issue Date: 2025-06-30 Bridging Offline and Online Reinforcement Learning for LLMs, Jack Lanchantin+, arXiv25 Comment元ポスト:https://x.com/jaseweston/status/1939673136842313960?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ... #Article#Tutorial#Pocket#ReinforcementLearning#Blog
Issue Date: 2021-06-07 ゼロから始めてオフライン強化学習とConservative Q-Learningを理解する, aiueola, 2021.05
Issue Date: 2025-06-30 Bridging Offline and Online Reinforcement Learning for LLMs, Jack Lanchantin+, arXiv25 Comment元ポスト:https://x.com/jaseweston/status/1939673136842313960?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ... #Article#Tutorial#Pocket#ReinforcementLearning#Blog
Issue Date: 2021-06-07 ゼロから始めてオフライン強化学習とConservative Q-Learningを理解する, aiueola, 2021.05