mid-training

#Analysis #Pocket #NLP #LanguageModel #ReinforcementLearning #PostTraining #read-later #Admin'sPick
Issue Date: 2025-06-27
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling, Zengzhi Wang+, arXiv25
Comment: Original post: https://x.com/sinclairwang1/status/1938244843857449431?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
From the perspective of mid-training, the conditions under which RL scales in post-training are systematical ...