Safety

#Pretraining #Pocket #NLP #LanguageModel #Supervised-FineTuning (SFT) #DPO #Toxicity #ITI (Inference Time Intervention)
Issue Date: 2025-05-09
When Bad Data Leads to Good Models, Kenneth Li+, arXiv25
Comment: Original post: https://x.com/ke_li_2021/status/1920646069613957606?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
This looks interesting. When using web corpora and similar sources for pretraining, the conventional wisdom is that it is better to keep only high-quality data for training, but, like 4chan ...

#NLP #LanguageModel #Alignment #Supervised-FineTuning (SFT)
Issue Date: 2025-04-29
Safety Alignment Should Be Made More Than Just a Few Tokens Deep, Xiangyu Qi+, arXiv24
Comment: Original post: https://x.com/hillbig/status/1917006979836612640?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q
OpenReview: https://openreview.net/forum?id=6Mxhg9PtDE
Safety Alignme ...

#Pocket #NLP #LanguageModel #Alignment #Supervised-FineTuning (SFT) #DPO #PostTraining
Issue Date: 2024-09-24
Backtracking Improves Generation Safety, Yiming Zhang+, N_A, arXiv24
Comment: Original post: https://x.com/jaseweston/status/1838415378529112330?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...