New post: nanochat miniseries v1,
#Article #Pocket #LanguageModel #read-later Issue Date: 2026-01-09
The next equalizer is not model architecture, but mastery over data behavior, gm8xx8, 2025.12
#Article #Pretraining #NLP #LanguageModel #SyntheticData #Selected Papers/Blogs #DataMixture #PhaseTransition Issue Date: 2026-01-07 Comment
Related (a study showing that reusing data for up to 4 epochs is cost-effective):
- Scaling Data-Constrained Language Models, Niklas Muennighoff+, NeurIPS'23
Related (phase transitions induced by the synthetic-data mixing ratio):
- [Paper Note] Data Mixing Can Induce Phase Transitions in Knowledge Acquisition, Xinran Gu+, NeurIPS'25 Spotlight, 2025.05
- [Paper Note] Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls, Feiyang Kang+, EMNLP'25, 2025.10
- [Paper Note] Why Less is More (Sometimes): A Theory of Data Curation, Elvis Dohmatob+, arXiv'25, 2025.11
Today's conversations about AI-assisted programming are strikingly similar to those from decades ago about the choice between low-level languages like C versus high-level languages like Python, Arvind Narayanan, 2025.12
#Article #NLP #LanguageModel #AIAgents #Coding #SoftwareEngineering Issue Date: 2025-12-31
Hot topics in RL, Kimbo, X, 2025.12
#Article #EfficiencyImprovement #NLP #LanguageModel #ReinforcementLearning #Diversity #train-inference-gap Issue Date: 2025-12-22 Comment
The observation that a mismatch in token log-probs between the rollout engine and the training engine means that RL you intend to run on-policy is in fact off-policy (a minimal correction sketch follows the references below);
- Your Efficient RL Framework Secretly Brings You Off-Policy RL Training, Yao+, 2025.08
- [Paper Note] Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model, Ling Team+, arXiv'25, 2025.10
- [Paper Note] Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers, Wenhan Ma+, arXiv'25, 2025.10
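To make the mismatch concrete, here is a minimal sketch of a truncated importance-sampling correction: recompute token log-probs under the training engine, compare them with the log-probs the rollout engine reported, and reweight the policy-gradient loss by the clipped ratio. The function and tensor names are illustrative assumptions, not APIs from the cited papers.

```python
import torch

def importance_corrected_pg_loss(trainer_logprobs, rollout_logprobs, advantages, clip=10.0):
    """Policy-gradient loss with a truncated importance ratio that corrects for
    the log-prob mismatch between the rollout engine and the training engine.

    trainer_logprobs: [B, T] log-probs of the sampled tokens, recomputed by the trainer
    rollout_logprobs: [B, T] log-probs of the same tokens, as reported by the rollout engine
    advantages:       [B, T] per-token advantage estimates
    """
    # If both engines were numerically identical, this ratio would be exactly 1.
    ratio = torch.exp(trainer_logprobs - rollout_logprobs.detach())
    # Truncate the ratio so a large numerical mismatch cannot blow up the update.
    ratio = torch.clamp(ratio, max=clip)
    # REINFORCE-style surrogate; the detached ratio reweights the off-policy tokens.
    loss = -(ratio.detach() * advantages * trainer_logprobs).mean()
    return loss, ratio
```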
The problem that waiting for long rollouts leaves the learner idle and makes training very slow; instead of waiting for overly long rollouts, you can push updated model weights mid-rollout and continue generating with the new policy, which speeds training up without collapsing it (= in-flight updates; a toy simulation follows the PipelineRL references below);
- [Paper Note] PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation, Alexandre Piché+, arXiv'25, 2025.09
- PipelineRL, Piche+, ServiceNow, 2025.04
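A toy, self-contained simulation of the in-flight-update idea: a learner thread keeps publishing new policy versions while the rollout loop generates a long sequence in chunks, pulling the latest version between chunks instead of blocking until the whole rollout finishes. The sleeps and integer versions are stand-ins for real gradient steps, weight broadcasts, and token generation.

```python
import threading, time, queue

latest_version = 0            # stand-in for the learner's published weights
trajectories = queue.Queue()  # trajectories handed back to the learner

def learner():
    global latest_version
    for _ in range(5):
        time.sleep(0.05)      # pretend a gradient step happened
        latest_version += 1   # publish updated weights

def rollout(total_tokens=1000, chunk=200):
    tokens, versions = 0, []
    while tokens < total_tokens:
        versions.append(latest_version)  # pull the freshest policy mid-rollout
        time.sleep(0.03)                 # pretend we generated `chunk` tokens
        tokens += chunk
    trajectories.put(versions)           # mixed-policy trajectory goes to training

t = threading.Thread(target=learner); t.start()
rollout(); t.join()
print("policy version per chunk:", trajectories.get())  # e.g. [0, 0, 1, 2, 3] (timing-dependent)
```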
The claim that RLVR does not expand the reasoning abilities the model already acquired during pre-training, but merely makes eliciting them more efficient;
- [Paper Note] Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?, Yang Yue+, NeurIPS'25, 2025.04
And, given the conflicting claims in the community that RLVR only improves efficiency versus that it genuinely expands reasoning ability, a study that tests both under fully controlled conditions in the Physics of Language Models style and clarifies under which conditions which behavior appears;
- [Paper Note] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models, Charlie Zhang+, arXiv'25, 2025.12
RLVR can be seen as using Pass@1 as the reward; switching the reward to Pass@k improves the model's ability to explore during RL and raises downstream-task Pass@k (a minimal reward sketch follows the reference below).
- [Paper Note] Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models, Zhipeng Chen+, arXiv'25
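A minimal sketch of the reward difference: with Pass@1 every sampled completion is rewarded independently, whereas a Pass@k-style group reward succeeds if any of the k samples passes the verifier, so exploratory (wrong) samples are not penalized as long as one sample solves the problem. This is a simplified illustration, not the exact formulation of the cited paper.

```python
from typing import List

def pass_at_1_rewards(correct: List[bool]) -> List[float]:
    # Standard RLVR: each sampled completion is rewarded on its own.
    return [1.0 if c else 0.0 for c in correct]

def pass_at_k_reward(correct: List[bool]) -> float:
    # Pass@k-style group reward: the group of k samples succeeds if ANY sample
    # passes the verifier, so wrong but exploratory samples are not punished.
    return 1.0 if any(correct) else 0.0

# Example: 4 rollouts for one prompt, only one of which is verified correct.
correct = [False, False, True, False]
print(pass_at_1_rewards(correct))  # [0.0, 0.0, 1.0, 0.0] -> per-sample exploitation pressure
print(pass_at_k_reward(correct))   # 1.0 -> exploration within the group is not penalized
```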
These are the threads highlighted as hot topics.
On the train-inference mismatch, the following were also interesting:
- SID-1 Technical Report: Test-Time Compute for Retrieval, SID Research, 2025.12
- [Paper Note] Defeating the Training-Inference Mismatch via FP16, Penghui Qi+, arXiv'25, 2025.10
Launching two tightly connected milestones in the Physics of LM series: a sharpened Part 4.1 (v2.0) and a brand new Part 4.2, Zeyuan Allen-Zhu, X, 2025.12
#Article Issue Date: 2025-12-17
Why Training MoEs is So Hard, _xjdr, X Post
#Article #NLP #LanguageModel #SmallModel #MoE(Mixture-of-Experts) #read-later #reading Issue Date: 2025-12-08
[Thread Memo] Views on recent optimization research, Seunghyun Seo, 2025.10
#Article #NeuralNetwork #Optimizer Issue Date: 2025-10-28 Comment
Related:
- [Paper Note] Weight Decay may matter more than muP for Learning Rate Transfer in Practice, Atli Kosson+, arXiv'25, 2025.10
- [Paper Note] Robust Layerwise Scaling Rules by Proper Weight Decay Tuning, Zhiyuan Fan+, arXiv'25, 2025.10
- [Paper Note] When Does Second-Order Optimization Speed Up Training?, Ishikawa+, ICLR'24 Tiny Paper
- [Paper Note] Fantastic Pretraining Optimizers and Where to Find Them, Kaiyue Wen+, arXiv'25
A few prompt engineering tips that Ilya Sutskever picked up at OpenAI, Ilya Sutskever, 2024.09
#Article #NLP #LanguageModel #Prompting Issue Date: 2024-09-08