read-later

#Analysis#Pocket#NLP#LanguageModel#ReinforcementLearning#mid-training#PostTraining#Admin'sPick
Issue Date: 2025-06-27 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling, Zengzhi Wang+, arXiv25 Comment: Original post: https://x.com/sinclairwang1/status/1938244843857449431?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q From the perspective of mid-training, systematically ... the conditions under which RL scales during post-training ...
#Analysis#Pocket#NLP#LanguageModel#SelfImprovement#ICLR#Verification
Issue Date: 2025-06-24 Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models, Yuda Song+, ICLR25 Comment: Reference: https://joisino.hatenablog.com/entry/mislead Seems very useful for deepening one's understanding of verification ...
#Pocket
Issue Date: 2025-06-23 Reinforcement Learning Teachers of Test Time Scaling, Edoardo Cetin+, arXiv25 Comment: Original post: https://x.com/sakanaailabs/status/1936965841188425776?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Pocket#NLP#LanguageModel#ReinforcementLearning#Reasoning#PostTraining#Admin'sPick
Issue Date: 2025-06-22 Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective, Zhoujun Cheng+, arXiv25 Comment: Original post: https://x.com/chengzhoujun/status/1936113985507803365?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q On cross-domain RL in post-training (Math, Code, Science, Logic, T ...
#Analysis#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)
Issue Date: 2025-06-18 Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality, Yuto Harada+, arXiv25 Comment: Original post: https://x.com/odashi_t/status/1935191113981403359?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#EfficiencyImprovement#MachineLearning#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#PostTraining
Issue Date: 2025-06-13 Resa: Transparent Reasoning Models via SAEs, Shangshang Wang+, arXiv25 Comment: Original post: https://x.com/iscienceluvr/status/1933101904529363112?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Author's post: https://x.com/upupwang/status/1933207676663865482?s=46&t ...
#Analysis#Pocket#NLP#LanguageModel#Memorization
Issue Date: 2025-06-05 How much do language models memorize?, John X. Morris+, arXiv25 Comment: Original post: https://x.com/rohanpaul_ai/status/1929989864927146414?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Pocket#NLP#LanguageModel#LLMAgent#SelfImprovement
Issue Date: 2025-06-05 Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents, Jenny Zhang+, arXiv25 Comment: Original post: https://www.linkedin.com/posts/omarsar_new-paper-open-ended-evolution-of-self-improving-activity-7334610178832556033-8dA-?utm_source=share&utm_me ...
#Analysis#Pocket#NLP#LanguageModel#ReinforcementLearning
Issue Date: 2025-06-04 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models, Mingjie Liu+, arXiv25 Comment: Original post: https://x.com/hillbig/status/1930043688329326962?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Pocket#NLP#LanguageModel#VerifiableRewards#RLVR#Verification
Issue Date: 2025-06-03 Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning, Yuzhen Huang+, arXiv25 Comment: Original post: https://x.com/junxian_he/status/1929371821767586284?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q A Discriminative Classifier finetuned specifically for the verification task, rewa ...
#Pocket#NLP#LanguageModel#LLMAgent#SoftwareEngineering
Issue Date: 2025-06-01 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering, Guangtao Zeng+, arXiv25 Comment: Original post: https://x.com/gan_chuang/status/1928963872188244400?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Pocket#NLP#Temporal#LanguageModel
Issue Date: 2025-05-27 Temporal Sampling for Forgotten Reasoning in LLMs, Yuetai Li+, arXiv25 Comment: Original post: https://x.com/iscienceluvr/status/1927286319018832155?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Temporal Forgetting and Temporal Sampling ...
#Pocket#NLP#LanguageModel#LongSequence#OpenWeight
Issue Date: 2025-05-27 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning, Fanqi Wan+, arXiv25 Comment: Original post: https://x.com/_akhaliq/status/1927011243597967524?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#EfficiencyImprovement#Pocket#NLP#LanguageModel#ReinforcementLearning
Issue Date: 2025-05-09 Reinforcement Learning for Reasoning in Large Language Models with One Training Example, Yiping Wang+, arXiv25 Comment: ![image](https://github.com/user-attachments/assets/03cd9200-7fed-4c6d-a5a6-2379d2c8950a) In the post below, giving Qwen an appropriate prompt elicits strong mathematical ability without any additional post-training ...
#Pocket#NLP#Dataset#LanguageModel#Mathematics#Coding
Issue Date: 2025-05-08 Rewriting Pre-Training Data Boosts LLM Performance in Math and Code, Kazuki Fujii+, arXiv25 Comment: Original post: https://x.com/okoge_kaz/status/1920141189652574346?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Explanation post: https://x.com/hillbig/status/1920613041026314274?s=46&t=Y6U ...
#Pocket#LanguageModel#RLVR
Issue Date: 2025-05-08 Absolute Zero: Reinforced Self-play Reasoning with Zero Data, Andrew Zhao+, arXiv25 Comment: Original post: https://x.com/arankomatsuzaki/status/1919946713567264917?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#ComputerVision#Embeddings#Analysis#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#Chain-of-Thought#SSM (StateSpaceModel)#ICML#PostTraining
Issue Date: 2025-05-04 Layer by Layer: Uncovering Hidden Representations in Language Models, Oscar Skean+, ICML25 Comment: For representative modern language-model architectures (decoder-only models, encoder-only models, SSMs), intermediate-layer embeddings consistently outperform final-layer embeddings on downstream tasks (the average over MTEB's 32 tasks) (though this MTE ...
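The layer-wise comparison described above can be reproduced in outline as follows; a minimal sketch assuming a HuggingFace causal LM, where the model name and mean pooling are illustrative choices, not the paper's exact MTEB protocol.

```python
# Minimal sketch: compare mean-pooled sentence embeddings from every layer.
# Assumes a HuggingFace causal LM; model name and pooling are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates several architectures
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_embeddings(text: str) -> torch.Tensor:
    """Return one mean-pooled embedding per layer, shape (n_layers+1, d_model)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states: tuple of (1, seq_len, d_model), one per layer incl. embeddings
    return torch.stack([h.mean(dim=1).squeeze(0) for h in out.hidden_states])

embs = layer_embeddings("Intermediate layers often embed text better.")
print(embs.shape)  # feed each row to a downstream evaluator (e.g., an MTEB task)
```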
#Analysis#NLP#LanguageModel#Supervised-FineTuning (SFT)#ReinforcementLearning#Evaluation#SmallModel#PostTraining
Issue Date: 2025-04-13 A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility, Andreas Hochlehnert+, arXiv25 Comment: Original post: https://x.com/wenhuchen/status/1911143014258405420?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q When post-training an SLM for math reasoning, rather than RL (of the kinds tried in existing work), (large-scale mod ...
#Analysis#Pretraining#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#ICLR
Issue Date: 2025-03-27 Overtrained Language Models Are Harder to Fine-Tune, Jacob Mitchell Springer+, ICLR25 Comment: Author's post: https://x.com/jacspringer/status/1904960783341023521?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Increasing the number of pretraining tokens increases the model's sensitivity, and degradation of post-training performance ...
#MachineLearning#Pocket#LanguageModel#ReinforcementLearning#Reasoning#LongSequence#GRPO
Issue Date: 2025-03-20 DAPO: An Open-Source LLM Reinforcement Learning System at Scale, Qiying Yu+, arXiv25 Comment: Argues that existing reasoning models' technical reports hide the key recipes for scalable RL training; indeed, when they ran GRPO as their baseline, its performance fell far below the AIME2024 score reported by DeepSeek (47 points), at (30 point ...
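For context, the quantity at the heart of GRPO, which DAPO builds on, is the group-relative advantage; a minimal NumPy sketch of the standard formulation, with function and variable names my own (DAPO's own modifications are not included):

```python
# Minimal sketch of GRPO's group-relative advantage (standard formulation;
# names are mine, and DAPO's modifications are not included).
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (G,), one scalar reward per sampled completion for a prompt.
    Returns per-completion advantages normalized within the group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

print(group_relative_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
```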
#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#Test-Time Scaling
Issue Date: 2025-02-07 s1: Simple test-time scaling, Niklas Muennighoff+, arXiv25 Comment: Explanation: https://x.com/hillbig/status/1887260791981941121?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#EfficiencyImprovement#Pocket#NLP#LanguageModel#Scaling Laws
Issue Date: 2025-05-27 Densing Law of LLMs, Chaojun Xiao+, arXiv24 Comment: Original post: https://x.com/hillbig/status/1926785750277693859?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ![image](https://github.com/user-attachments/assets/8cdcfe78-6682-4 ...
#RecommenderSystems#Pocket#UAI#ColdStart
Issue Date: 2025-05-16 Cold-start Recommendation by Personalized Embedding Region Elicitation, Hieu Trung Nguyen+, UAI24 Comment: OpenReview: https://openreview.net/forum?id=ciOkU5YpvU ...
#NLP#Dataset#Japanese#Trustfulness
Issue Date: 2025-05-10 日本語TrustfulQAの構築 (Construction of a Japanese TrustfulQA), 中村+, NLP24
#Pocket#NLP#Dataset#LanguageModel#EMNLP#KnowledgeEditing
Issue Date: 2025-05-07 Editing Large Language Models: Problems, Methods, and Opportunities, Yunzhi Yao+, EMNLP24
#Analysis#NLP#LanguageModel#SyntheticData
Issue Date: 2025-05-06 Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers, Zeyuan Allen-Zhu+, ICML24 Tutorial Comment: Original post: https://x.com/hillbig/status/1919878625488449849?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q The discovery of Canon layers ...
#NLP#LanguageModel#RLHF#Reasoning#Mathematics#GRPO
Issue Date: 2025-01-04 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Zhihong Shao+, arXiv24 Comment: Original post: https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_the-rlhf-method-behind-the-best-open-models-activity-7280850174522843137-3V9v?utm_source= ...
#Pocket#NLP#LanguageModel#TheoryOfMind
Issue Date: 2024-12-31 Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning, Melanie Sclar+, arXiv24 Comment: Looks interesting; to read later ...
#NLP#LanguageModel#Alignment#DPO#PostTraining#Admin'sPick
Issue Date: 2024-09-25 Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafael Rafailov+, N_A, NeurIPS24 Comment: The paper that proposed DPO. Explanation post: https://x.com/thet ...
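The DPO objective itself is compact enough to state in code; a minimal PyTorch sketch of the paper's loss, where the log-probability tensors are assumed to be precomputed per response:

```python
# Minimal sketch of the DPO objective: maximize the margin of policy-vs-
# reference log-ratios between chosen and rejected responses.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logps, pi_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Each argument: summed log-probs of a response under the policy (pi_*)
    or the frozen reference model (ref_*), shape (batch,)."""
    chosen_ratio = pi_chosen_logps - ref_chosen_logps
    rejected_ratio = pi_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```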
#MachineLearning#Pocket#NLP#LanguageModel#ICLR#ModelMerge
Issue Date: 2024-01-23 Knowledge Fusion of Large Language Models, Fanqi Wan+, N_A, ICLR24 Summary: This work proposes a method for creating a single strong model by fusing existing pretrained large language models (LLMs). Using three popular LLMs with different architectures, it demonstrates improved benchmark and task performance. The code, model weights, and data for the proposed method are publicly available on GitHub.
#EfficiencyImprovement#Pocket#NLP#LanguageModel#Inference
Issue Date: 2025-06-12 SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills, Amey Agrawal+, arXiv23 Comment: Proposes `Chunked Prefills` and `Decode-Maximal Batching`, which are also adopted in vLLM. ![Image](https://github.com/user-attachments/assets/4db0f73d-bdf4-4c2b-a765-2c9b ...
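A rough way to picture the scheduling idea; this is an illustrative sketch, not SARATHI's or vLLM's actual scheduler, and the token budget and chunk size are made-up constants:

```python
# Minimal scheduling sketch of the idea: split a long prefill into fixed-size
# chunks, then fill each iteration's remaining token budget with decode
# requests ("piggybacking decodes"). Constants and names are assumptions.
from dataclasses import dataclass

TOKEN_BUDGET = 512  # max tokens processed per model iteration (assumed value)
CHUNK = 256         # prefill chunk size (assumed value)

@dataclass
class Request:
    rid: int
    prompt_tokens_left: int  # >0 while prefilling; 0 once decoding

def schedule_iteration(requests: list[Request]) -> list[tuple[int, int]]:
    """Return (request_id, n_tokens) pairs for one iteration."""
    batch, budget = [], TOKEN_BUDGET
    # One prefill chunk first, so prefills progress without hogging the step.
    for r in requests:
        if r.prompt_tokens_left > 0:
            n = min(CHUNK, r.prompt_tokens_left, budget)
            batch.append((r.rid, n)); budget -= n
            break
    # Fill what's left with decodes: each decoding request adds one token.
    for r in requests:
        if r.prompt_tokens_left == 0 and budget > 0:
            batch.append((r.rid, 1)); budget -= 1
    return batch

print(schedule_iteration([Request(0, 1000), Request(1, 0), Request(2, 0)]))
```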
#MachineLearning#Pocket#NLP#LanguageModel#NeurIPS#ITI (Inference Time Intervention)#Probing#Trustfulness
Issue Date: 2025-05-09 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, Kenneth Li+, NeurIPS23 Comment: The paper that proposed Inference-Time Intervention. Linear probing[^1] is applied to attention heads to identify the top-K heads most associated with truthfulness, and at inference time a direction vector v that increases truthfulness is added to those heads' outputs (= interven ...
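The intervention step could be sketched as a forward hook like the following; this is my illustrative reconstruction, not the authors' code, and it assumes head directions have already been selected by probing:

```python
# Minimal sketch of the intervention step: add a scaled truthfulness direction
# v to the outputs of pre-selected attention heads via a forward hook (for
# inference only). Probing to choose heads/directions is assumed done elsewhere.
import torch

def make_iti_hook(head_dirs: dict[int, torch.Tensor], alpha: float, n_heads: int):
    """head_dirs maps head index -> unit direction v of shape (head_dim,)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        b, t, d = hidden.shape
        # View as per-head slices sharing storage, then shift in place.
        heads = hidden.view(b, t, n_heads, d // n_heads)
        for h, v in head_dirs.items():
            heads[:, :, h, :] += alpha * v  # shift along the truthful direction
        return output
    return hook

# Usage (hypothetical layer/heads): attach to an attention module of a HF model.
# layer.attn.register_forward_hook(make_iti_hook({3: v3, 7: v7}, alpha=5.0, n_heads=12))
```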
#MachineLearning#Pocket#NLP#LanguageModel#NeurIPS#Scaling Laws
Issue Date: 2025-03-23 Scaling Data-Constrained Language Models, Niklas Muennighoff+, NeurIPS23 Comment: OpenReview: https://openreview.net/forum?id=j5BuTrEj35 Scaling laws like the Chinchilla law assume scaling both parameters and data, with all data being unique; but given the current situation, where data exhaustion is a concern ...
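For reference, the Chinchilla-style parametric loss that such laws fit, with $N$ the parameter count and $D$ the number of training tokens; this paper extends it with effective (repetition-discounted) parameter and data terms:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$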
#NLP#LanguageModel#Alignment#ChatGPT#RLHF#PPO (ProximalPolicyOptimization)#PostTraining
Issue Date: 2024-04-28 Training language models to follow instructions with human feedback, Long Ouyang+, N_A, NeurIPS22 Summary: Large language models can produce outputs that do not match the user's intent. This work fine-tunes GPT-3 with human feedback and proposes a model called InstructGPT. With this method, outputs of the 1.3B-parameter InstructGPT model were preferred over those of the 175B GPT-3, with improved truthfulness and fewer harmful outputs. Performance degradation on common NLP datasets was minimal. While InstructGPT still has room for improvement, it shows that fine-tuning with human feedback is a promising direction. Comment: The paper that proposed the SFT → reward-model training → RLHF pipeline behind ChatGPT. Because SFT on demonstration data alone did not make the model behave as humans intended, RLHF with a reward model is additionally applied after SFT so the model aligns with human intent ...
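The reward-model stage of that pipeline boils down to a pairwise preference loss, pushing the reward of the human-preferred response above the rejected one; a minimal PyTorch sketch (tensor names are mine):

```python
# Minimal sketch of the reward-model step in the SFT -> RM -> RLHF pipeline:
# pairwise loss over human preference comparisons (names are mine).
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor):
    """r_*: scalar rewards for preferred/rejected responses, shape (batch,)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```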
#RecommenderSystems#Pocket#Reproducibility
Issue Date: 2025-05-16 A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research, Maurizio Ferrari Dacrema+, TOIS21
#RecommenderSystems#NeuralNetwork#CollaborativeFiltering#Pocket#MatrixFactorization#RecSys#Reproducibility
Issue Date: 2025-05-16 Neural Collaborative Filtering vs. Matrix Factorization Revisited, Steffen Rendle+, RecSys20
#RecommenderSystems#RecSys#Reproducibility
Issue Date: 2025-05-14 Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison, Zun+, RecSys20 Comment: Japanese explanation: https://qiita.com/smochi/items/c4cecc48e4aba0071ead ...
#NeuralNetwork#ComputerVision#MachineLearning#Pocket#NLP#ICLR#KnowledgeEditing
Issue Date: 2025-05-07 Editable Neural Networks, Anton Sinitsin+, ICLR20 Comment: (Probably) the first paper to propose knowledge editing. OpenReview: https://openreview.net/forum?id=HJedXaEtvS ...
#RecommenderSystems#Pocket#Reproducibility
Issue Date: 2025-05-14 On the Difficulty of Evaluating Baselines: A Study on Recommender Systems, Steffen Rendle+, arXiv19
#Article#LLMAgent#Blog#Programming
Issue Date: 2025-06-21 AI-assisted coding for teams that can't get away with vibes, Atharva Raykar, 2025.05 Comment: Original post: https://x.com/deedydas/status/1936090859319259321?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Article#NLP#LLMAgent#Blog
Issue Date: 2025-06-21 Single vs Multi-Agent System?, PHILSCHMID, 2025.06 Comment: Original post: https://x.com/_philschmid/status/1935985099171840140?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Related: #2050 ...
#Article#NLP#LanguageModel
Issue Date: 2025-06-18 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities, Gemini Team, 2025.06 Comment: Related post: https://x.com/jaguring1/status/1935203032922485080?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Explanation post: https://x.com/_philschmid/status/1935019697683980603?s=46& ...
#Article#Multi#NLP#LLMAgent#Blog
Issue Date: 2025-06-17 Don’t Build Multi-Agents, Cognition, 2025.06 Comment: Original post: https://x.com/ngo275/status/1934819225111285852?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Article#Blog
Issue Date: 2025-05-18 Lesson.3 秋葉氏に学ぶ AI 研究の最前線から見るこれまでとこれから (Lesson 3: Learning from Mr. Akiba, the past and future seen from the front lines of AI research), EM.FM, 2025.05 Comment: Original post: https://x.com/srt_taka/status/1923380837246275692?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Article#MachineLearning#Pocket#NLP#LanguageModel#Reasoning#GRPO
Issue Date: 2025-03-22 Understanding R1-Zero-Like Training: A Critical Perspective, 2025.03 Comment: Related work: #1815 Explanation post: https://x.com/wenhuchen/status/1903464313391624668?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Per the explanation post, something like the Token Level Policy Update in DAPO, with respect to length ...
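A minimal sketch of the token-level vs. sample-level loss aggregation being referred to; this is my reconstruction of the commonly described difference, not code from the post:

```python
# Contrast sample-level loss averaging (GRPO-style, where long responses are
# down-weighted per token) with token-level averaging (DAPO-style).
# Masks and names are my own.
import torch

def sample_level(loss_tok: torch.Tensor, mask: torch.Tensor):
    # Average per sequence first, then across sequences: long responses get
    # the same total weight as short ones, diluting per-token signal.
    per_seq = (loss_tok * mask).sum(-1) / mask.sum(-1)
    return per_seq.mean()

def token_level(loss_tok: torch.Tensor, mask: torch.Tensor):
    # Average over all valid tokens in the batch: every token counts equally.
    return (loss_tok * mask).sum() / mask.sum()
```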