AttentionSinks

#Pocket #NLP #LanguageModel #Attention
Issue Date: 2025-04-09 Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs, Pedro Sandoval-Segura+, arXiv25 Comment: Original post: https://x.com/psandovalsegura/status/1909652533334712691?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Pocket #NLP #LanguageModel #Attention #ICLR
Issue Date: 2025-04-05 When Attention Sink Emerges in Language Models: An Empirical View, Xiangming Gu+, ICLR25 Comment: Proposes a metric called Sink Rate: of the attention scores that every head places on the first token (one per layer-head pair, i.e. l layers × h heads of them in total), the proportion that exceeds a threshold. Prior work to #1860. ...
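
A minimal sketch of how such a Sink Rate can be computed. It assumes attention weights are available as a [layers, heads, queries, keys] array, that each head's score on the first token is averaged over query positions, and an illustrative threshold value; the paper's exact definition may differ in these details.

```python
# Sketch of a Sink-Rate-style metric (illustrative, not the paper's reference code).
import numpy as np

def sink_rate(attn: np.ndarray, eps: float = 0.3) -> float:
    """Fraction of (layer, head) pairs whose mean attention on the first token exceeds eps.

    attn: softmax-normalized attention weights of shape [L, H, Q, K] for one sequence.
    """
    first_token_attn = attn[..., 0]                   # attention each query pays to token 0: [L, H, Q]
    per_head_score = first_token_attn.mean(axis=-1)   # average over query positions: [L, H]
    return float((per_head_score > eps).mean())       # share of layer-head pairs above eps

# Toy usage with random weights for a 2-layer, 4-head model over 8 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 8, 8))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(f"Sink rate: {sink_rate(attn):.2f}")
```
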
Issue Date: 2025-04-05 Why do LLMs attend to the first token?, Federico Barbero+, arXiv25 Comment: Original post: https://x.com/omarsar0/status/1908187563422261411?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Attention sinks suppress over-mixing of token information, and for decoder-only LLMs ...

#Pocket #Attention #ICLR
Issue Date: 2025-04-05 Efficient Streaming Language Models with Attention Sinks, Guangxuan Xiao+, ICLR24 Comment: The study that coined the term "Attention Sink". Prior work to #1860. ...
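
The cache-retention idea the title refers to, as commonly described: during streaming generation, the KV cache keeps the first few "sink" tokens plus a rolling window of the most recent tokens, so attention always has the sink positions to place excess probability mass on. Below is a minimal sketch of that retention policy; the function name, sink count, and window size are illustrative assumptions, not the paper's reference implementation.

```python
# Sketch of sink-plus-sliding-window cache retention (illustrative parameters).
def positions_to_keep(seq_len: int, num_sinks: int = 4, window: int = 1020) -> list[int]:
    """Indices of cached tokens retained once the sequence exceeds the cache budget."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))                      # everything still fits
    sinks = list(range(num_sinks))                       # always keep the initial sink tokens
    recent = list(range(seq_len - window, seq_len))      # plus the most recent tokens
    return sinks + recent

# Example: after 5000 tokens, the cache holds tokens 0-3 and the last 1020 tokens.
kept = positions_to_keep(5000)
print(len(kept), kept[:6], kept[-3:])
```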