Tokenizer
#ComputerVision#Pocket#NLP#LanguageModel#MultiModal
Issue Date: 2025-06-24 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations, Jiaming Han+, arXiv25 Comment: Original post: https://x.com/_akhaliq/status/1937345768223859139?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Represents the text modality and the vision modality in a shared space, enables byte-sequence generation from patches, and trains a Latent Transformer to transform the patches ...
#Pretraining#MachineLearning#Pocket#NLP#LanguageModel#Subword
Issue Date: 2024-11-12 LBPE: Long-token-first Tokenization to Improve Large Language Models, Haoran Lian+, arXiv24 Comment: Unlike BPE, this method determines the final tokens by prioritizing token length when performing merges ...
#Article#Sentence#NLP#LanguageModel
Issue Date: 2024-12-24 Large Concept Models: Language Modeling in a Sentence Representation Space, Meta, 2024.12 Comment: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology ...