Subword

#Pocket#NLP#LanguageModel
Issue Date: 2025-06-11 StochasTok: Improving Fine-Grained Subword Understanding in LLMs, Anya Sims+, arXiv25 Comment: Original post: https://x.com/cong_ml/status/1932369418534760554?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Looks interesting. ...
#Pretraining#MachineLearning#Pocket#NLP#LanguageModel#Tokenizer
Issue Date: 2024-11-12 LBPE: Long-token-first Tokenization to Improve Large Language Models, Haoran Lian+, arXiv24 Comment: Unlike BPE, this method determines the final tokens by performing merges that give priority to token length. ![image](https://github.com/user-attachments/assets/99b91472-88d8-4792-bf04-acc67956e4f5) ...
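To make the contrast with BPE concrete, here is a minimal sketch that encodes a word with the same vocabulary under two priority rules: standard BPE-style encoding (lowest merge rank wins) and a long-token-first rule (longest resulting token wins, ties broken by merge rank). The toy vocabulary, the function names `bpe_encode` and `long_first_encode`, the tie-breaking rule, and the simplification of looking up merges by concatenated string are all assumptions for illustration, not the paper's actual implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary: merged token string -> merge rank (lower = learned earlier).
# Single characters are always available and are not listed.
Vocab = Dict[str, int]


def _encode(word: str, vocab: Vocab, priority: Callable[[str], Tuple[int, ...]]) -> List[str]:
    """Repeatedly merge the adjacent pair whose concatenation is in `vocab`
    and has the smallest priority key; stop when no pair can be merged."""
    tokens = list(word)
    while True:
        best_i, best_key = -1, None
        for i in range(len(tokens) - 1):
            merged = tokens[i] + tokens[i + 1]
            if merged not in vocab:
                continue
            key = priority(merged)
            # Strict '<' keeps the leftmost pair among equal keys.
            if best_key is None or key < best_key:
                best_i, best_key = i, key
        if best_i < 0:
            return tokens
        # Replace the two adjacent tokens with their concatenation.
        tokens[best_i:best_i + 2] = [tokens[best_i] + tokens[best_i + 1]]


def bpe_encode(word: str, vocab: Vocab) -> List[str]:
    # BPE-style priority: the earliest-learned merge (lowest rank) wins.
    return _encode(word, vocab, lambda t: (vocab[t],))


def long_first_encode(word: str, vocab: Vocab) -> List[str]:
    # Assumed long-token-first priority: the longest merged token wins,
    # ties fall back to merge rank.
    return _encode(word, vocab, lambda t: (-len(t), vocab[t]))


if __name__ == "__main__":
    toy_vocab = {"cd": 0, "ab": 1, "bc": 2, "bcd": 3}
    print(bpe_encode("abcd", toy_vocab))         # ['ab', 'cd']
    print(long_first_encode("abcd", toy_vocab))  # ['a', 'bcd']
```

Both rules merge `cd` first, but afterwards BPE picks the lower-ranked `ab` while the length-first rule picks the longer `bcd`, so the same vocabulary yields a segmentation with a longer token.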