Safety

#Pocket#NLP#Dataset#LanguageModel#Alignment#Japanese#PostTraining
Issue Date: 2025-06-25 AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output, Hisami Suzuki+, arXiv25 Comment Blog: https://llmc.nii.ac.jp/answercarefully-dataset/ ...

#EfficiencyImprovement#Pocket#NLP#LanguageModel#Alignment#ReinforcementLearning
Issue Date: 2025-06-11 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance, Ruizhong Qiu+, arXiv25 Comment Original post: https://x.com/gaotangli/status/1932289294657626189?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Pretraining#Pocket#NLP#LanguageModel#Supervised-FineTuning (SFT)#DPO#Toxicity#ITI (Inference Time Intervention)
Issue Date: 2025-05-09 When Bad Data Leads to Good Models, Kenneth Li+, arXiv25 Comment Original post: https://x.com/ke_li_2021/status/1920646069613957606?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q This looks interesting. When using web corpora for pretraining, the conventional wisdom is that keeping only high-quality data yields better models, but data like 4chan's ...

#NLP#LanguageModel#Alignment#Supervised-FineTuning (SFT)
Issue Date: 2025-04-29 Safety Alignment Should Be Made More Than Just a Few Tokens Deep, Xiangyu Qi+, arXiv24 Comment Original post: https://x.com/hillbig/status/1917006979836612640?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q OpenReview: https://openreview.net/forum?id=6Mxhg9PtDE Safety Alignme ...

#Pocket#NLP#LanguageModel#Alignment#Supervised-FineTuning (SFT)#DPO#PostTraining
Issue Date: 2024-09-24 Backtracking Improves Generation Safety, Yiming Zhang+, N_A, arXiv24 Comment Original post: https://x.com/jaseweston/status/1838415378529112330?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...