Scaling Laws

#ComputerVision #Analysis #Pocket #pretrained-LM #TMLR
Issue Date: 2025-06-26 An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration, Hiroki Naganuma+, TMLR25
Comment: OpenReview: https://openreview.net/forum?id=tYjoHjShxF
Original post: https://x.com/_hiroki11x/status/1938052113466323134?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Pocket #NLP #LanguageModel #MoE(Mixture-of-Experts) #ICML
Issue Date: 2025-06-21 Scaling Laws for Upcycling Mixture-of-Experts Language Models, Seng Pei Liew+, ICML25
Comment: Original post: https://x.com/sbintuitions/status/1935970879923540248?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
OpenReview: https://openreview.net/forum?id=ZBBo19jldX
Related: #1546 ...

#Pocket #NLP #LanguageModel #Distillation
Issue Date: 2025-05-29 Distillation Scaling Laws, Dan Busbridge+, arXiv25

#EfficiencyImprovement #Pretraining #Pocket #NLP #LanguageModel
Issue Date: 2025-05-21 Parallel Scaling Law for Language Models, Mouxiang Chen+, arXiv25
Comment: Original post: https://x.com/hillbig/status/1924959706331939099?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
The idea is similar to #405 ...

#Pretraining #Pocket #NLP #LanguageModel
Issue Date: 2025-03-23 Compute Optimal Scaling of Skills: Knowledge vs Reasoning, Nicholas Roberts+, arXiv25
Comment: Original post: https://x.com/dair_ai/status/1903843682509312218?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
Knowledge-probing tasks such as QA require more model parameters, whereas reasoning-based tasks such as coding require more data ...

#EfficiencyImprovement #Pocket #NLP #LanguageModel #read-later
Issue Date: 2025-05-27 Densing Law of LLMs, Chaojun Xiao+, arXiv24
Comment: Original post: https://x.com/hillbig/status/1926785750277693859?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
![image](https://github.com/user-attachments/assets/8cdcfe78-6682-4 ...

#MachineLearning #Pocket #NLP #LanguageModel #NeurIPS #read-later
Issue Date: 2025-03-23 Scaling Data-Constrained Language Models, Niklas Muennighoff+, NeurIPS23
Comment: OpenReview: https://openreview.net/forum?id=j5BuTrEj35
Scaling laws such as the Chinchilla law are premised on scaling both parameters and data, and on all training data being unique; given the current situation where data exhaustion is a concern ...
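
For context, the paper replaces raw token counts with an "effective data" term in which repeated epochs contribute with exponentially decaying value. A rough sketch of that form (fitted constants omitted; treat the exact parametrization as approximate):

```latex
% Schematic data-constrained scaling law (Muennighoff+ 2023), constants omitted.
% U_D = unique tokens, R_D = number of repetitions beyond the first epoch,
% R_D^* = fitted decay constant; repeated tokens add exponentially less effective data.
\[
  D' \;=\; U_D + U_D \, R_D^{*} \left(1 - e^{-R_D / R_D^{*}}\right),
  \qquad
  L(N, D') \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{(D')^{\beta}} .
\]
```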

#MachineLearning #Pocket #NLP #LanguageModel #NeurIPS #Admin'sPick
Issue Date: 2025-03-23 Training Compute-Optimal Large Language Models, Jordan Hoffmann+, NeurIPS22
Comment: OpenReview: https://openreview.net/forum?id=iBBcRUlOAPR
The Chinchilla scaling law ...
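
As a quick reminder of the result, the parametric loss fit and the compute-optimal allocation are usually quoted as follows (constants are the commonly cited fitted values; treat them as approximate):

```latex
% Chinchilla parametric loss fit (commonly quoted constants; approximate):
\[
  L(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad E \approx 1.69,\; A \approx 406.4,\; B \approx 410.7,\; \alpha \approx 0.34,\; \beta \approx 0.28 .
\]
% Minimizing L under a fixed compute budget C \approx 6ND gives
\[
  N_{\mathrm{opt}} \propto C^{a}, \qquad D_{\mathrm{opt}} \propto C^{b}, \qquad a \approx b \approx 0.5,
\]
% i.e. parameters and training tokens should be scaled together, roughly 20 tokens per parameter.
```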

#Pocket #NLP #LanguageModel
Issue Date: 2025-05-31 Scaling Laws for Autoregressive Generative Modeling, Tom Henighan+, arXiv20

#MachineLearning #Pocket #NLP #LanguageModel
Issue Date: 2025-03-23 Scaling Laws for Neural Language Models, Jared Kaplan+, arXiv20
Comment: Japanese-language explanation: https://www.slideshare.net/slideshow/dlscaling-laws-for-neural-language-models/243005067 ...
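
For reference, the headline result is that test loss follows a power law in each of non-embedding parameters N, dataset size D, and compute C, as long as the other two factors are not the bottleneck (exponents as commonly cited; approximate):

```latex
% Kaplan+ 2020 power laws (exponents as commonly cited; approximate):
\[
  L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},\quad
  L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},\quad
  L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C},
\]
% with \alpha_N \approx 0.076, \alpha_D \approx 0.095, \alpha_C \approx 0.050.
```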

#NeuralNetwork #ComputerVision #EfficiencyImprovement #Pocket #ICML #Admin'sPick
Issue Date: 2025-05-12 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Mingxing Tan+, ICML19
Comment: Added because the original paper had not been noted here yet. See also #346. ...
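
The scaling rule in question is EfficientNet's compound scaling, which ties depth, width, and input resolution to a single coefficient φ (base coefficients quoted for EfficientNet-B0 as commonly cited; approximate):

```latex
% Compound scaling (Tan & Le 2019): scale depth d, width w, resolution r with one coefficient \phi,
% constrained so that each unit increase in \phi roughly doubles FLOPs.
\[
  d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
  \qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\; \alpha, \beta, \gamma \ge 1,
\]
% with grid-searched base values around \alpha \approx 1.2, \beta \approx 1.1, \gamma \approx 1.15.
```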

#Article #Tutorial #Pretraining #MachineLearning #NLP #LanguageModel #Transformer #Chain-of-Thought #In-ContextLearning #Attention #DiffusionModel #SSM (StateSpaceModel) #PostTraining
Issue Date: 2025-05-31 JSAI 2025 Annual Conference tutorial lecture "深層基盤モデルの数理" (Mathematics of Deep Foundation Models), Taiji Suzuki, 2025.05
Comment: Original post: https://x.com/btreetaiji/status/1927678122817921442?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...