Scaling Laws
#ComputerVision#Analysis#Pocket#pretrained-LM#TMLR
Issue Date: 2025-06-26 An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration, Hiroki Naganuma+, TMLR25 Comment OpenReview: https://openreview.net/forum?id=tYjoHjShxF Original post: https://x.com/_hiroki11x/status/1938052113466323134?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Pocket#NLP#LanguageModel#MoE(Mixture-of-Experts)#ICML
Issue Date: 2025-06-21 Scaling Laws for Upcycling Mixture-of-Experts Language Models, Seng Pei Liew+, ICML25 Comment Original post: https://x.com/sbintuitions/status/1935970879923540248?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q OpenReview: https://openreview.net/forum?id=ZBBo19jldX Related: #1546 ...
#Pocket#NLP#LanguageModel#Distillation
Issue Date: 2025-05-29 Distillation Scaling Laws, Dan Busbridge+, arXiv25
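For context on what papers like these fit: scaling-law studies typically regress loss against a saturating power law L(C) = E + A·C^(−α). The snippet below is a generic sketch of such a fit on synthetic data; nothing here is taken from the papers above, and all constants are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(C, E, A, alpha):
    # Irreducible loss E plus a power-law term that decays with compute C.
    return E + A * C ** (-alpha)

rng = np.random.default_rng(0)
C = np.logspace(0, 6, 12)        # compute budgets (arbitrary units, illustrative)
true_params = (1.7, 4.0, 0.3)    # made-up "ground truth" (E, A, alpha)
L = scaling_law(C, *true_params) * (1 + 0.01 * rng.standard_normal(C.size))

# Recover the parameters from noisy (compute, loss) measurements.
popt, _ = curve_fit(scaling_law, C, L, p0=(1.0, 1.0, 0.2))
print("fitted: E=%.3f A=%.3f alpha=%.3f" % tuple(popt))
```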
#EfficiencyImprovement#Pretraining#Pocket#NLP#LanguageModel
Issue Date: 2025-05-21 Parallel Scaling Law for Language Models, Mouxiang Chen+, arXiv25 Comment Original post: https://x.com/hillbig/status/1924959706331939099?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Similar in approach to #405 ...
#Pretraining#Pocket#NLP#LanguageModel
Issue Date: 2025-03-23 Compute Optimal Scaling of Skills: Knowledge vs Reasoning, Nicholas Roberts+, arXiv25 Comment Original post: https://x.com/dair_ai/status/1903843682509312218?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Knowledge-probing tasks such as QA mainly require model parameters, whereas reasoning-based tasks such as coding mainly require data volume ...
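One way such a parameters-vs-data split can show up is in a Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β under the usual compute constraint C ≈ 6·N·D. The toy sketch below is mine, not code or constants from Roberts+ 2025: a skill whose loss decays slowly in N (small α) is compute-optimal at a larger model with fewer tokens, while one that decays slowly in D is optimal with more tokens.

```python
import numpy as np

def loss(N, D, E=1.69, A=406.0, B=411.0, alpha=0.34, beta=0.28):
    # Chinchilla-style parametric loss; default constants are only
    # loosely Chinchilla-like in magnitude, for illustration.
    return E + A / N**alpha + B / D**beta

C = 1e21                      # fixed training budget in FLOPs (illustrative)
N = np.logspace(7, 12, 1000)  # candidate parameter counts
D = C / (6.0 * N)             # tokens implied by C ~ 6*N*D

# Hypothetical exponent profiles: slow decay in N = "parameter-hungry",
# slow decay in D = "data-hungry". Values are made up.
profiles = {
    "knowledge-like (parameter-hungry)": dict(alpha=0.28, beta=0.34),
    "reasoning-like (data-hungry)": dict(alpha=0.40, beta=0.22),
}
for name, exps in profiles.items():
    curve = loss(N, D, **exps)
    i = int(np.argmin(curve))
    print(f"{name}: optimal N ~ {N[i]:.1e} params, D ~ {D[i]:.1e} tokens")
```

Running this puts the knowledge-like optimum at a much larger N (and fewer tokens) than the reasoning-like one at the same budget, which is the shape of the trade-off the note above describes.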
#EfficiencyImprovement#Pocket#NLP#LanguageModel#read-later
Issue Date: 2025-05-27 Densing Law of LLMs, Chaojun Xiao+, arXiv24 Comment Original post: https://x.com/hillbig/status/1926785750277693859?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
#PostTraining
Issue Date: 2025-05-31 Tutorial lecture at the 2025 JSAI (Japanese Society for Artificial Intelligence) Annual Conference, "Mathematical Foundations of Deep Foundation Models" (深層基盤モデルの数理), Taiji Suzuki, 2025.05 Comment Original post: https://x.com/btreetaiji/status/1927678122817921442?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...