Tokenizer

#ComputerVision#Pocket#NLP#LanguageModel#MultiModal
Issue Date: 2025-06-24 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations, Jiaming Han+, arXiv25 Comment: Original post: https://x.com/_akhaliq/status/1937345768223859139?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Represents the text modality and the vision modality in a shared space. ![image](https://github.co ...
#Pretraining#Pocket#NLP#LanguageModel
Issue Date: 2025-06-23 From Bytes to Ideas: Language Modeling with Autoregressive U-Nets, Mathurin Videau+, arXiv25 Comment: Original post: https://x.com/dair_ai/status/1936825784473096335?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...
#Pocket#NLP#LanguageModel
Issue Date: 2025-01-02 Byte Latent Transformer: Patches Scale Better Than Tokens, Artidoro Pagnoni+, arXiv24 Comment: I have only looked at the interesting figures, but the approach appears to train transformers that encode and decode byte sequences, group multiple bytes into patches (regions with higher entropy have their bytes grouped into larger patches), enable byte-sequence generation from patches, and train a Latent Transformer to transform the patches ...

#Pretraining#MachineLearning#Pocket#NLP#LanguageModel#Subword
Issue Date: 2024-11-12 LBPE: Long-token-first Tokenization to Improve Large Language Models, Haoran Lian+, arXiv24 Comment: Unlike BPE, this method determines the final tokens by performing merges that prioritize token length, ![image](https://github.com/user-attachments/assets/99b91472-88d8-4792-bf04-acc67956e4f5)![image]( ...
#Article#Sentence#NLP#LanguageModel
Issue Date: 2024-12-24 Large Concept Models: Language Modeling in a Sentence Representation Space, Meta, 2024.12 Comment: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology ...