read-later

#Pocket
Issue Date: 2025-05-09 Reinforcement Learning for Reasoning in Large Language Models with One Training Example, Yiping Wang+, arXiv25

#Pocket#NLP#Dataset#LanguageModel#Mathematics#Coding
Issue Date: 2025-05-08 Rewriting Pre-Training Data Boosts LLM Performance in Math and Code, Kazuki Fujii+, arXiv25 Comment: Original post: https://x.com/okoge_kaz/status/1920141189652574346?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Explanation post: https://x.com/hillbig/status/1920613041026314274?s=46&t=Y6U ...

#Pocket#LanguageModel#RLVR
Issue Date: 2025-05-08 Absolute Zero: Reinforced Self-play Reasoning with Zero Data, Andrew Zhao+, arXiv25 Comment: Original post: https://x.com/arankomatsuzaki/status/1919946713567264917?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#ComputerVision#Embeddings#Analysis#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#Chain-of-Thought#SSM (StateSpaceModel)#ICML#PostTraining
Issue Date: 2025-05-04 Layer by Layer: Uncovering Hidden Representations in Language Models, Oscar Skean+, ICML25 Comment: For the representative modern language model architectures (decoder-only models, encoder-only models, SSMs), intermediate-layer embeddings consistently outperform the final-layer embedding on downstream tasks (the average over 32 MTEB tasks) (although this is MTE ...
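
A quick way to act on this observation is to read hidden states from an intermediate layer rather than the last one. Below is a minimal sketch assuming a HuggingFace `transformers` model; the model name (`gpt2`), the middle-layer index, and mean pooling are illustrative assumptions, not the paper's MTEB evaluation protocol.

```python
# Minimal sketch: compare a mid-layer embedding with the final-layer one.
# Model choice and mean pooling are illustrative, not the paper's setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

inputs = tokenizer("An example sentence to embed.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

hidden = outputs.hidden_states                   # tuple: (input embeddings, layer 1, ..., layer N)
mid_emb = hidden[len(hidden) // 2].mean(dim=1)   # mean-pool tokens at a middle layer
final_emb = hidden[-1].mean(dim=1)               # same pooling at the final layer
print(mid_emb.shape, final_emb.shape)            # both (1, hidden_size)
```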

#Analysis#NLP#LanguageModel#Supervised-FineTuning (SFT)#ReinforcementLearning#Evaluation#SmallModel#PostTraining
Issue Date: 2025-04-13 A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility, Andreas Hochlehnert+, arXiv25 Comment: Original post: https://x.com/wenhuchen/status/1911143014258405420?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q When post-training an SLM for math reasoning, rather than RL (the methods tried in existing work), (large-scale mod ...

#MachineLearning#Pocket#LanguageModel#ReinforcementLearning#Reasoning#LongSequence#GRPO
Issue Date: 2025-03-20 DAPO: An Open-Source LLM Reinforcement Learning System at Scale, Qiying Yu+, arXiv25 Comment: Argues that the technical reports of existing reasoning models keep the key recipes for scalable RL training hidden; indeed, when they ran GRPO as their baseline, they got substantially lower performance on AIME2024 (30 points) than the 47 points reported by DeepSeek ...

#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#Test-time Compute
Issue Date: 2025-02-07 s1: Simple test-time scaling, Niklas Muennighoff+, arXiv25 Comment: Explanation: https://x.com/hillbig/status/1887260791981941121?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#RecommenderSystems#Pocket#UAI#ColdStart
Issue Date: 2025-05-16 Cold-start Recommendation by Personalized Embedding Region Elicitation, Hieu Trung Nguyen+, UAI24 Comment: OpenReview: https://openreview.net/forum?id=ciOkU5YpvU ...

#NLP#Dataset#Japanese#Trustfulness
Issue Date: 2025-05-10 Construction of a Japanese TrustfulQA, Nakamura+, NLP24

#Pocket#NLP#Dataset#LanguageModel#EMNLP#KnowledgeEditing
Issue Date: 2025-05-07 Editing Large Language Models: Problems, Methods, and Opportunities, Yunzhi Yao+, EMNLP24

#Analysis#NLP#LanguageModel#SyntheticData
Issue Date: 2025-05-06 Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers, Zeyuan Allen-Zhu+, ICML24 Tutorial Comment: Original post: https://x.com/hillbig/status/1919878625488449849?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q The discovery of Canon layers ...

#NLP#LanguageModel#RLHF#Reasoning#Mathematics#GRPO
Issue Date: 2025-01-04 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Zhihong Shao+, arXiv24 Comment: Original post: https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_the-rlhf-method-behind-the-best-open-models-activity-7280850174522843137-3V9v?utm_source= ...

#Pocket#NLP#LanguageModel#TheoryOfMind
Issue Date: 2024-12-31 Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning, Melanie Sclar+, arXiv24 Comment: Looks interesting; to read later. ...

#NLP#LanguageModel#Alignment#DPO#PostTraining#Admin'sPick
Issue Date: 2024-09-25 Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafael Rafailov+, N_A, NeurIPS24 Comment: The study that proposed DPO. ...
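
The DPO objective itself is compact enough to sketch. A minimal sketch below, assuming per-response log-probabilities that are already summed over tokens; `beta=0.1` is an illustrative value.

```python
# Minimal sketch of the DPO loss: preference optimization directly on the
# policy's log-probs, with a frozen reference model instead of an explicit
# reward model.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_w / logp_l: policy log-probs of the chosen / rejected responses."""
    ratio_w = logp_w - ref_logp_w      # log pi(y_w|x) - log pi_ref(y_w|x)
    ratio_l = logp_l - ref_logp_l      # log pi(y_l|x) - log pi_ref(y_l|x)
    # Maximize the implicit reward margin of the preferred response.
    return -F.logsigmoid(beta * (ratio_w - ratio_l)).mean()
```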

#MachineLearning#Pocket#NLP#LanguageModel#NeurIPS#ITI (Inference Time Intervention)#Probing#Trustfulness
Issue Date: 2025-05-09 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, Kenneth Li+, NeurIPS23 Comment: The study that proposed Inference-Time Intervention. Linear probing[^1] is applied to attention heads so that the top-K heads presumably related to truthfulness can be identified, and a direction vector v that increases truthfulness is added to those heads' outputs at inference time (=interven ...
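
A minimal sketch of that procedure on synthetic data: a logistic-regression probe per head, its accuracy used to pick the top-K heads, and its normalized weight vector standing in for the truthful direction v. The probe type, `alpha`, and all shapes are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal ITI-style sketch: probe each attention head for truthfulness,
# keep the top-K heads, and shift their outputs along the probe direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_heads, head_dim, K, alpha = 512, 32, 64, 8, 5.0

# Hypothetical probe data: per-head activations with truthfulness labels.
acts = rng.normal(size=(n_samples, n_heads, head_dim))
labels = rng.integers(0, 2, size=n_samples)

accs, dirs = [], []
for h in range(n_heads):
    probe = LogisticRegression(max_iter=1000).fit(acts[:, h], labels)
    accs.append(probe.score(acts[:, h], labels))  # use held-out accuracy in practice
    w = probe.coef_[0]
    dirs.append(w / np.linalg.norm(w))            # probe direction standing in for v

top_heads = np.argsort(accs)[-K:]                 # top-K truth-related heads

def intervene(head_outputs):
    """Add alpha * v to the selected heads' outputs at one decoding step."""
    out = head_outputs.copy()                     # shape: (n_heads, head_dim)
    for h in top_heads:
        out[h] += alpha * dirs[h]
    return out
```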

#NLP#LanguageModel#Supervised-FineTuning (SFT)#InstructionTuning#ACL
Issue Date: 2023-03-30 Self-Instruct: Aligning Language Models with Self-Generated Instructions, Wang+ (w/ Noah Smith), University of Washington, ACL23 Comment: A paper on the self-instruct technique that Alpaca and others also use. ...
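
A minimal sketch of one round of that bootstrapping loop, assuming a hypothetical `llm()` completion function; the word-overlap filter is a crude stand-in for the paper's ROUGE-L-based deduplication, and the prompt text is illustrative.

```python
# Minimal sketch of one self-instruct round: bootstrap new instructions
# from the model itself, filter near-duplicates, and grow the task pool.
import random

def self_instruct_round(task_pool, llm, num_new=8):
    # 1) Prompt the model with a few in-context tasks sampled from the pool.
    demos = "\n".join(random.sample(task_pool, k=min(6, len(task_pool))))
    prompt = f"Come up with a new task instruction:\n{demos}\nNew instruction:"
    candidates = [llm(prompt) for _ in range(num_new)]

    # 2) Filter near-duplicates (the paper uses a ROUGE-L threshold).
    def too_similar(a, b):
        overlap = len(set(a.split()) & set(b.split()))
        return overlap / max(len(a.split()), 1) > 0.7  # crude stand-in
    for cand in candidates:
        if cand and not any(too_similar(cand, t) for t in task_pool):
            task_pool.append(cand)

    # 3) The grown pool is then used to generate (instruction, input, output)
    #    instances for supervised fine-tuning.
    return task_pool
```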

#NLP#LanguageModel#Alignment#ChatGPT#RLHF#PPO (ProximalPolicyOptimization)#PostTraining
Issue Date: 2024-04-28 Training language models to follow instructions with human feedback, Long Ouyang+, N_A, NeurIPS22 Summary: Large language models can generate outputs that do not match user intent. This work fine-tunes GPT-3 with human feedback and proposes a model called InstructGPT. With this method, outputs of the 1.3B-parameter InstructGPT model were preferred over those of the 175B GPT-3, with improved truthfulness and reduced harmful outputs. Furthermore, the performance drop on common NLP datasets was minimal. While InstructGPT still has room for improvement, it shows that fine-tuning with human feedback is a promising direction. Comment: The study that proposed the SFT → reward model training → RLHF pipeline behind ChatGPT. SFT on demonstration data alone left the problem that the model did not behave as humans intended, so to align it with human intent, RLHF with a reward model is additionally applied after SFT ...
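
The reward-model stage in that pipeline trains on human preference pairs. A minimal sketch of its pairwise loss, where `reward_model` is a placeholder module that returns a scalar score per (prompt, response):

```python
# Minimal sketch of the pairwise reward-model loss used in RLHF: the model
# is trained so the human-preferred response scores higher than the rejected one.
import torch.nn.functional as F

def rm_pairwise_loss(reward_model, prompt, chosen, rejected):
    r_chosen = reward_model(prompt, chosen)      # scalar reward for preferred y_w
    r_rejected = reward_model(prompt, rejected)  # scalar reward for rejected y_l
    # -log sigmoid(r_w - r_l): push the preferred response's score above the other.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```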

#RecommenderSystems#Pocket#Reproducibility
Issue Date: 2025-05-16 A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research, Maurizio Ferrari Dacrema+, TOIS21

#RecommenderSystems#NeuralNetwork#CollaborativeFiltering#Pocket#MatrixFactorization#RecSys#Reproducibility
Issue Date: 2025-05-16 Neural Collaborative Filtering vs. Matrix Factorization Revisited, Steffen Rendle+, RecSys20

#RecommenderSystems#RecSys#Reproducibility
Issue Date: 2025-05-14 Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison, Zun+, RecSys20 Comment: Explanation in Japanese: https://qiita.com/smochi/items/c4cecc48e4aba0071ead ...

#NeuralNetwork#ComputerVision#MachineLearning#Pocket#NLP#ICLR#KnowledgeEditing
Issue Date: 2025-05-07 Editable Neural Networks, Anton Sinitsin+, ICLR20 Comment: (Probably) the first study to propose knowledge editing. OpenReview: https://openreview.net/forum?id=HJedXaEtvS ...

#RecommenderSystems#Pocket#Reproducibility
Issue Date: 2025-05-14 On the Difficulty of Evaluating Baselines: A Study on Recommender Systems, Steffen Rendle+, arXiv19

#Article#MachineLearning#Pocket#LanguageModel#Reasoning#GRPO
Issue Date: 2025-03-22 Understanding R1-Zero-Like Training: A Critical Perspective, 2025.03 Comment: Related work: #1815 Explanation post: https://x.com/wenhuchen/status/1903464313391624668?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Reading the explanation post: something like DAPO's token-level policy update, with respect to length ...