DPO
#Pretraining#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#Safety#Toxicity#ITI (Inference Time Intervention)
Issue Date: 2025-05-09 When Bad Data Leads to Good Models, Kenneth Li+, arXiv25 Comment: Original post: https://x.com/ke_li_2021/status/1920646069613957606?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q This looks interesting. When using web corpora and the like for pretraining, the conventional wisdom is that it is better to filter down to high-quality data before training, but data like 4chan ...
#Analysis#MachineLearning#Pocket#NLP#LanguageModel#Alignment#Hallucination#ICLR#Repetition
Issue Date: 2025-04-18 Learning Dynamics of LLM Finetuning, Yi Ren+, ICLR25 Comment: Original post: https://x.com/joshuarenyi/status/1913033476275925414?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Explanation post: https://x.com/hillbig/status/1917189793588613299?s=46&t=Y ...
#Pocket#NLP#LanguageModel#Alignment#ICLR#PostTraining#Diversity
Issue Date: 2025-02-01 Diverse Preference Optimization, Jack Lanchantin+, ICLR25 Comment: Original post: https://x.com/jaseweston/status/1885399530419450257?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q OpenReview: https://openreview.net/forum?id=pOq9vDIYev Uses the same optimization method as DPO ...
#NLP#LanguageModel#Alignment#PostTraining#read-later#Admin'sPick
Issue Date: 2024-09-25 Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafael Rafailov+, N_A, NeurIPS24 Comment: The paper that proposed DPO. ...
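For reference, the objective this paper optimizes over preference pairs, written out as a minimal sketch (notation: $\pi_\theta$ is the policy being fine-tuned, $\pi_{\mathrm{ref}}$ the frozen reference model, $(x, y_w, y_l)$ a prompt with its chosen and rejected responses, $\beta$ a temperature hyperparameter, $\sigma$ the sigmoid):

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

This lets the preference data be fit directly with a classification-style loss, without training a separate reward model or running an RL loop.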
#Pocket#NLP#LanguageModel#Alignment#Supervised-FineTuning (SFT)#Safety#PostTraining
Issue Date: 2024-09-24 Backtracking Improves Generation Safety, Yiming Zhang+, N_A, arXiv24 Comment: Original post: https://x.com/jaseweston/status/1838415378529112330?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Article#NLP#LanguageModel#Alignment#Supervised-FineTuning (SFT)#Article#PostTraining
Issue Date: 2025-01-25 How to align open LLMs in 2025 with DPO & synthetic data, PHILSCHMID, 2025.01 Comment: Original post: https://x.com/_philschmid/status/1882428447877705908?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q An overview of DPO and its advantages over RLHF; on-policy preference data built with rule-based checks or LLM-as-a-Judge ...
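The pipeline described in that post (sample completions on-policy, rank them with a rule or a judge model, keep the best and worst as a preference pair, then run DPO on the resulting dataset) can be illustrated roughly as below. This is a minimal sketch, not the post's actual code: sample_completions, rule_based_score, and the toy length rule are placeholder assumptions standing in for real model generation and a real judge.

```python
import random

def sample_completions(prompt: str, n: int = 4) -> list[str]:
    """Placeholder for on-policy sampling from the model being aligned."""
    return [
        f"{prompt} candidate {i}: " + "detail " * random.randint(1, 20)
        for i in range(n)
    ]

def rule_based_score(completion: str) -> float:
    """Toy rule-based judge: prefer answers of moderate length.
    A real pipeline would use task-specific checks or an LLM-as-a-Judge."""
    return -abs(len(completion.split()) - 12)

def build_preference_pairs(prompts: list[str]) -> list[dict]:
    """Turn on-policy samples into DPO-style (prompt, chosen, rejected) records."""
    pairs = []
    for prompt in prompts:
        candidates = sorted(sample_completions(prompt), key=rule_based_score)
        pairs.append({
            "prompt": prompt,
            "chosen": candidates[-1],   # highest-scoring sample
            "rejected": candidates[0],  # lowest-scoring sample
        })
    return pairs

if __name__ == "__main__":
    dataset = build_preference_pairs(["Explain DPO in one sentence."])
    print(dataset[0])
    # Records with prompt/chosen/rejected keys match the preference-dataset
    # format that DPO trainers (e.g., TRL's DPOTrainer) typically expect.
```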
#Article#MachineLearning#NLP#LanguageModel#Alignment#RLHF#Article
Issue Date: 2024-12-18 RLHF_DPO 小話, 和地瞭良 (Akifumi Wachi), 2024.04 Comment: Extremely educational… ...