List of notes on papers and technical articles about OpenWeightLLM

OpenWeightLLM

#ComputerVision #Pocket #Transformer #FoundationModel #CVPR
Issue Date: 2025-04-11 AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One, Mike Ranzinger+, CVPR25 Comment Original post: https://x.com/pavlomolchanov/status/1910391609927360831?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Vision foundation models have each been trained with a different objective function (CLIP uses contrastive learning #55 ... #ComputerVision #Pocket #NLP #LanguageModel #MulltiModal #SpeechProcessing #Video
Issue Date: 2025-03-31 Qwen2.5-Omni Technical Report, Jin Xu+, arXiv25 Comment A multimodal LLM from the Qwen Team. It takes text, images, video, and audio as input and outputs text and speech. ![image](https://github.com/user-attachments/assets/03e54fd7-2011-4069-aa1b-38d1610 Original ... #NLP #LanguageModel #SyntheticData #OpenSource
Issue Date: 2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent, Xingwu Sun+, arXiv24 Comment Tencent's open-source LLM. Its total parameter count is 389B, on par with Llama-3.1-405B, but thanks to MoE it achieves SoTA with 52B active parameters. It uses a large amount of synthetic data. ...

#EfficiencyImprovement #Pocket #NLP #LanguageModel
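Issue Date: 2024-04-23 Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, Marah Abdin+, N_A, arXiv24 Summary phi-3-mini is a 3.8-billion-parameter language model trained on 3.3 trillion tokens. Its overall performance rivals that of large models such as Mixtral 8x7B and GPT-3.5, yet it is small enough to deploy on a smartphone. Its training data consists of heavily filtered web data and synthetic data, and the model is aligned for robustness, safety, and chat format. Larger models, phi-3-small and phi-3-medium, are also introduced. Comment The entry after next following #1039 (I did not take notes on Phi2.0). It reportedly achieves roughly GPT3.5Turbo-level performance at a size that can be deployed on a smartphone. It uses the same blocks as Llama2, so the architecture is shared with Llama2. ... #Pocket #NLP #LanguageModel #OpenSource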
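Issue Date: 2024-03-05 OLMo: Accelerating the Science of Language Models, Dirk Groeneveld+, N_A, arXiv24 Summary As the commercial importance of LMs grows, the most powerful models have become closed, with their details undisclosed. This technical report therefore details the first release of OLMo, a truly open language model, and its framework for building and studying the science of language modeling. OLMo releases not only the model weights but the whole framework, including training data and training and evaluation code, with the aim of strengthening the open research community and fostering new innovation. Comment A truly Open Language Model that releases not only the model weights but also the training/evaluation code and the data. AllenAI ... #Pocket #NLP #LanguageModel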
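Issue Date: 2024-01-09 Mixtral of Experts, Albert Q. Jiang+, N_A, arXiv24 Summary Mixtral is a Sparse Mixture of Experts (SMoE) language model in which each layer consists of 8 feed-forward blocks. For each token, Mixtral selects two experts and combines their outputs. Mixtral outperforms Llama 2 70B and GPT-3.5 and is particularly strong on mathematics, code generation, and multilingual benchmarks. An instruction-following model, Mixtral 8x7B Instruct, is also provided; it surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human evaluation benchmarks. Comment Mixture of Experts layer: a router receives the input, selects 2 of the 8 experts, and runs the forward pass through them. The final output is the weighted average of the two experts' outputs. ![image](https://github.com/user-attachm ... #ComputerVision #Pocket #NLP #LanguageModel #MulltiModal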
Issue Date: 2025-04-11 PaLI-3 Vision Language Models: Smaller, Faster, Stronger, Xi Chen+, arXiv23 #NLP #LanguageModel #FoundationModel
Issue Date: 2023-07-22 Llama 2: Open Foundation and Fine-Tuned Chat Models, Hugo Touvron+, N_A, arXiv23 Summary This work develops and fine-tunes Llama 2, a large language model. Llama 2-Chat is specialized for dialogue and outperforms open-source chat models. The authors also work on safety improvements and aim to contribute to responsible development. Comment Reference: https://twitter.com/hillbig/status/1681436336451125257?s=46&t=LJIgfuO352oK3zU2FKFpNA In Llama and Llama2, unlike a typical Transformer Decoder, the linear layer's ” ...
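
The routing described in the Mixtral of Experts entry above (a router picks 2 of 8 experts and the layer returns a weighted average of their outputs) can be sketched for a single token as follows. This is a minimal illustration in plain Python, not Mixtral's implementation: the toy ReLU experts, the random weights, the dimensions, and all function names are assumptions made for the example, and the gate weights are assumed to come from a softmax over the two selected router scores.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim, hidden):
    """Build a toy two-layer ReLU feed-forward expert (illustrative only)."""
    w_in = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w_out = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        hidden_act = [max(0.0, v) for v in matvec(w_in, x)]
        return matvec(w_out, hidden_act)

    return expert


def moe_layer(x, router_weights, experts, top_k=2):
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = matvec(router_weights, x)  # one router score per expert
    ranked = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Assumption: gate weights are a softmax over the selected scores only.
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](x)  # only the selected experts are evaluated
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen, gates


rng = random.Random(0)
DIM, HIDDEN, NUM_EXPERTS = 4, 8, 8
experts = [make_expert(rng, DIM, HIDDEN) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen, gates = moe_layer(token, router_weights, experts)
print("selected experts:", chosen)
print("gate weights:", [round(g, 3) for g in gates])
```

Because only the two selected experts run for each token, the compute per token tracks the active parameters rather than the total parameter count, which is the same distinction the Hunyuan-Large entry draws between 389B total and 52B active parameters.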

#Article #NLP #Library #Supervised-FineTuning (SFT)#Article #MoE(Mixture-of-Experts)#PostTraining
Issue Date: 2025-05-11 ms-swiftによるMegatron-LMベースのQwen3のファインチューニング [Megatron-LM-based fine-tuning of Qwen3 with ms-swift], Aratako, 2025.05 Comment Original post: https://x.com/aratako_lm/status/1921401994532487174?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Explains how to run continued pretraining and SFT of Qwen3 using Megatron-SWIFT, a library made by Alibaba, with best ... #Article #NLP #LanguageModel #Supervised-FineTuning (SFT)#ReinforcementLearning #Reasoning #SmallModel #GRPO
Issue Date: 2025-05-01 Phi-4-reasoning Technical Report, 2025.04 Comment Original post: https://x.com/dimitrispapail/status/1917731614899028190?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q This explainer summarizes it very well: https://x.com/_philschmid/status/1918216 ... #Article #NLP #LanguageModel #Alignment #Supervised-FineTuning (SFT)#ReinforcementLearning #InstructionTuning #Article #LongSequence #MultiLingual #MoE(Mixture-of-Experts)#PostTraining
Issue Date: 2025-04-29 Qwen3, Qwen Team, 2025.04 Comment Supports 119 languages. MoE models #1911: 30B-A3B / 235B-A22B. 128K context window. Qwen2.5 did not adopt MoE, so this is a new architecture. Dense (non-MoE) models are also released. Post on best practices: http ... #Article #ComputerVision #Pocket #NLP #LLMAgent #MulltiModal #Article #Reasoning #x-Use
Issue Date: 2025-04-18 Introducing UI-TARS-1.5, ByteDance, 2025.04 Comment paper: https://arxiv.org/abs/2501.12326 A lot is written, but roughly speaking it is a Computer Use Agent (CUA) from ByteDance, built on a multimodal LLM that takes images and text as input and outputs text. Related: #1794 Original ... #Article #NLP #LanguageModel #Reasoning
Issue Date: 2025-04-12 Seed-Thinking-v1.5, ByteDance, 2025.04 Comment A reasoning model with 200B parameters (20B activated) that beats DeepSeek-R1 on many benchmarks. Lately the text OpenWeightLLM space seems to have four leaders: Alibaba, DeepSeek, ByteDance, and Nvidia...? (Eventually OpenAI ... #Article #NLP #LanguageModel #Alignment #Supervised-FineTuning (SFT)#ReinforcementLearning #InstructionTuning #Pruning #Reasoning
Issue Date: 2025-04-08 Llama-3_1-Nemotron-Ultra-253B-v1, Nvidia, 2025.04 Comment Beats DeepSeek-R1 on GPQA Diamond #1155 and AIME2024/2025, and beats Llama4 Maverick on BFCLv2 (Tool Calling, #1875) and IFEval #1137; on the rest it is on par with DeepSeekR1, except for ArenaHard. ![image Original ... #Article #NLP #LanguageModel #DiffusionModel
Issue Date: 2025-04-08 Dream-v0-Instruct-7B, Dream-org, 2025.04 Comment An OpenWeight diffusion language model. Original post: https://x.com/curveweb/status/1909551257725133132?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Related: #1776 ... #Article #ComputerVision #NLP #LanguageModel #MulltiModal
Issue Date: 2025-04-05 Llama 4 Series, Meta, 2025.04 Comment Downloads: https://www.llama.com/?utm_source=twitter&utm_medium=organic_social&utm_content=image&utm_campaign=llama4 Huggingface: https://huggingface.co/ ... #Article #NLP #LanguageModel #SoftwareEngineering
Issue Date: 2025-04-02 openhands-lm-32b-v0.1, all-hands, 2025.03 Comment A state-of-the-art model, based on Qwen Coder 2.5 Instruct 32B, that can carry out SWE tasks ... #Article #ComputerVision #NLP #LanguageModel #MulltiModal
Issue Date: 2025-03-25 Qwen2.5-VL-32B-Instruct, Qwen Team, 2025.03 Comment Original post: https://x.com/alibaba_qwen/status/1904227859616641534?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ... #Article #NLP #LanguageModel #Reasoning
Issue Date: 2025-03-19 Llama Nemotron, Nvidia, 2025.03 Comment Nvidia's first reasoning model. Original post: https://x.com/kuchaev/status/1902078122792775771?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Benchmarks by Artificial Analysis: https://x ... #Article #NLP #LanguageModel #Reasoning
Issue Date: 2025-03-18 EXAONE-Deep-32B, LG AI Research, 2025.03 Comment Original post: https://x.com/ai_for_success/status/1901908168805912602?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q EXAONE AI Model License Agreement 1.1 NC; commercial use is not permitted ... #Article #ComputerVision #NLP #LanguageModel #MulltiModal
Issue Date: 2025-03-18 SmolDocling-256M, IBM Research, 2025.03 Comment Original post: https://www.linkedin.com/posts/andimarafioti_we-just-dropped-%F0%9D%97%A6%F0%9D%97%BA%F0%9D%97%BC%F0%9D%97%B9%F0%9D%97%97%F0%9D%97%BC%F0%9D%97%B0 ... #Article #ComputerVision #NLP #LanguageModel #MulltiModal
Issue Date: 2025-03-17 sarashina2-vision-{8b, 14b}, SB Intuitions, 2025.03 Comment Original post: https://x.com/sei_shinagawa/status/1901467733331701966?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q A VLM. Judging from the trial examples scattered around X, its Japanese reading performance looks quite high. Model architecture, training details, and evaluation: http ... #Article #NLP #LanguageModel
Issue Date: 2025-03-12 Introducing Gemma 3: The most capable model you can run on a single GPU or TPU, Google, 2025.03 Comment Google's new SLM, a lightweight model that can run on devices and laptops. It handles not only text but also image and short-video recognition, and supports 140 languages. On top of that, the 27B model beats Llama3-405B, DeepSeek-V3, and o3-mini on the ChatbotArena leaderboard, 12 ... #Article #NLP #LanguageModel #Reasoning #MultiLingual
Issue Date: 2025-03-12 Reasoning with Reka Flash, Reka, 2025.03 Comment Weights: https://huggingface.co/RekaAI/reka-flash-3 Apache-2.0 Reasoning can reportedly be interrupted by forcing the model to output < /reasoning >, which makes it possible to control the reasoning budget ... #Article #NLP #LanguageModel #ReinforcementLearning #Reasoning
Issue Date: 2025-03-06 QwQ-32B: Embracing the Power of Reinforcement Learning, Qwen Team, 2025.03 Comment Original post: https://x.com/hillbig/status/1897426898642460724?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q #1787 Benchmark scores by Artificial Analysis: https://x.com/artificialanlys/ ... #Article #NLP #LanguageModel
Issue Date: 2025-03-04 microsoft_Phi-4-multimodal-instruct, Microsoft, 2025.02 Comment Original post: https://www.linkedin.com/posts/vaibhavs10_holy-shitt-microsoft-dropped-an-open-source-activity-7300755229635944449-mQP8?utm_medium=ios_app&rcm=AC ... #Article #NLP #LanguageModel #Reasoning
Issue Date: 2025-02-17 Mistral-24B-Reasoning, yentinglin, 2025.02 CommentApache-2.0 ... #Article #ComputerVision #NLP #LanguageModel #MulltiModal
Issue Date: 2025-01-28 Janus-Series: Unified Multimodal Understanding and Generation Models, DeepSeek, 2025.01 Comment Janus-Pro, a new VLM from DeepSeek, was released today. MIT License. Janus-Pro's performance, quoted from the performance chart on github: it beats LLaVA on multimodal (text + image) understanding benchmarks. Benchmarks called GenEval and DPG Bench ... #Article #NLP #LanguageModel
Issue Date: 2025-01-21 DeepSeek-R1-Distill-Qwen, DeepSeek, 2025.01 Comment MIT License ... #Article #NLP #LanguageModel
Issue Date: 2025-01-21 DeepSeek-R1, DeepSeek, 2025.01 Comment Reference: https://x.com/icoxfog417/status/1883339727446974616?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Reference: https://horomary.hatenablog.com/entry/2025/01/26/204545 DeepSeek ... #Article #Survey #ComputerVision #NLP #LanguageModel #ProprietaryLLM
Issue Date: 2025-01-02 2024-ai-timeline, reach-vb, 2025.01 Comment A month-by-month timeline of the major LLMs (including multimodal LLMs) released in 2024. Each entry is also tagged as API Only (proprietary) or OpenWeight. ... #Article #Pocket #NLP #LanguageModel
Issue Date: 2024-12-28 Deep-seek-v3, deepseek-ai, 2024.12 Comment Reference (diagram of the model): https://x.com/vtabbott_/status/1874449446056177717?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Reference: https://x.com/hillbig/status/1876397959841186148?s=46&t= ... #Article #Tools #NLP #Dataset #LanguageModel #Article
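Issue Date: 2024-12-24 完全にオープンな約1,720億パラメータ（GPT-3級）の大規模言語モデル「llm-jp-3-172b-instruct3」を一般公開～GPT-3.5を超える性能を達成～ [Public release of "llm-jp-3-172b-instruct3", a fully open large language model with about 172 billion parameters (GPT-3 class): performance exceeding GPT-3.5], NII, 2024.12 Comment Releases everything (corpus, model, and tools) for a model with a parameter count comparable to GPT3.5. Reportedly the world's largest model that is open down to its training data. As for the instruction-tuned model, reading the license, anyone who complies with its terms (with conditions such as being 18 or older for Japanese nationals) can acce ... #Article #NLP #LanguageModel #SpokenLanguageProcessing #OpenSource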
Issue Date: 2024-12-13 LLaMA-Omni: Seamless Speech Interaction with Large Language Models, Meta, 2024.09 Comment An open-source multimodal model for speech and text. The input appears to be speech only(?), but it can produce both text and speech as output. The about section says it aims for GPT-4o-level speech capability. Interesting. In the install instructions, `Whisper-large-v3#1 ... #Article #NLP #LanguageModel
Issue Date: 2024-12-06 Llama3.3-70B, Meta, 2024.12 Comment Improves on 3.1-70B and comes closer to the performance of 3.1-405B. (Image quoted from the original post.) ![image](https://github.com/user-attachments/assets/07fb3043-131a-4564-be70-d34b70c31cca) ... #Article #Survey #NLP #Dataset #LanguageModel #Evaluation #Repository #Japanese #OpenSource
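Issue Date: 2024-12-02 日本語LLMまとめ [Overview of Japanese LLMs], LLM-jp, 2024.12 Comment An overview by LLM-jp of Japanese LLMs (including Encoder-Decoder models, BERT-style models, Bi-Encoders, and Cross-Encoders). Covers models for text generation, models for processing input text, models specialized for creating embeddings, vision-language models, speech-language models, Japanese LLM evaluation ... #Article #Pretraining #NLP #LanguageModel #Japanese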
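Issue Date: 2024-11-25 Sarashina2-8x70Bの公開 [Release of Sarashina2-8x70B], SB Intuitions, 2024.11 Comment Explains the MoE layer and Sparse Upcycling; reports that when training the MoE model, too large a learning rate caused the loss to increase early on, while too small a learning rate prevented that increase but yielded smaller performance gains over a long run; and, so as not to damage the original model's parameters, Upcycling ... #Article #Survey #NLP #LanguageModel #Article #OpenSource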
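Issue Date: 2024-11-15 ローカルLLMのリリース年表 [Release timeline of local LLMs], npaka, continuously updated, 2024.11 Comment A chronology of release dates for OpenLLMs, including local LLMs, which appears to be updated continuously. Impressive. ... #Article #NLP #LanguageModel #Japanese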
Issue Date: 2024-11-09 sarashina2-8x70B, SBIntuitions, 2024.11 Comment Press release: https://www.sbintuitions.co.jp/news/press/20241108_01/ Note that commercial use is not permitted. The architecture is a Mixture of Experts (MoE) of 8 x 70B models. According to the model card, for inference in BF16, A100 80G ... #Article #NLP #MultiLingual
Issue Date: 2024-10-24 Aya Expanse, Cohere, 2024.10 Comment A multilingual LLM from Cohere, available in 8B and 32B sizes. ArenaHard evaluation of the 8B model: ![image](https://github.com/user-attachments/assets/c52678fd-b1a4-40ed-b6b9-7cc7d1096ff0) ... #Article #NLP
Issue Date: 2024-10-17 Llama-3.1-Nemotron-70B-Instruct, Nvidia, 2024.10 Comment paper: https://arxiv.org/abs/2410.01257 Outperforms GPT4o-20240513 and Claude-3.5-sonnet-20240620 on MTBench and Arena Hard. The average response length appears to be long. ![image](https ... #Article #ComputerVision #GenerativeAI
Issue Date: 2024-10-05 MovieGen, Meta, 2024.10 #Article #NLP #LanguageModel #Japanese
Issue Date: 2024-10-04 Gemma-2-Baku, 2024.10 #Article #NLP #LanguageModel #Japanese
Issue Date: 2024-10-04 Gemma-2-JPN, 2024.10 Comment Gemma2 fine-tuned on Japanese data ... #Article #ComputerVision #NLP #LanguageModel
Issue Date: 2024-09-27 Molmo, AI2, 2024.09 Comment Molmo is a family of open state-of-the-art multimodal AI models. Our most powerful model closes the gap between open and proprietary systems across a ... #Article #ComputerVision #NLP #LanguageModel #Article
Issue Date: 2024-09-25 Llama 3.2: Revolutionizing edge AI and vision with open, customizable models, Meta, 2024.09 Comment Announces 11B and 90B VLMs and 1B and 3B SLMs for edge devices. ![image](https://github.com/user-attachments/assets/13c4af37-19bd-4de7-b501-eb48f955af0c)![image](https://githuLl ... #Article #NLP #LanguageModel #Japanese
Issue Date: 2024-09-25 LLM-jp-3 1.8B・3.7B・13B の公開 [Release of LLM-jp-3 1.8B, 3.7B, and 13B], LLM.jp, 2024.09 Comment LLM-JP-Eval results are here: https://huggingface.co/llm-jp/llm-jp-3-1.8b The 1.8B model reportedly performs very well for its size (indeed, there seems to be little gap to the 3.8B model). Original post: https://x.com/odashi ... #Article #NLP #LanguageModel #InstructionTuning #SelfCorrection #PostTraining
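Issue Date: 2024-09-06 Reflection 70B, GlaiveAI, 2024.09 Comment That said, suppose the same input was used. If the argument then becomes that the prompting is the same (what text the model generates and how it reasons is outside the scope of prompting), so the input is the same and the comparison is therefore fair, then in the first place, under what settings the comparison experiment ... #Article #Analysis #LanguageModel #Slide #Japanese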
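Issue Date: 2024-09-03 LLMに日本語テキストを学習させる意義 [The significance of training LLMs on Japanese text], Koshiro Saito+, 第261回自然言語処理研究発表会 [261st Natural Language Processing Research Meeting], 2024.08 Comment It appears to suggest that training on Japanese data helps with English-Japanese translation and with QA that requires Japan-specific knowledge. For example, as shown in #1359, Japanese accounts for only about 0.2% of Llama2's data, which shows how small the share of Japanese data is in OpenLLMs from the English-speaking world. ... #Article #Tutorial #NLP #LanguageModel #Slide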
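Issue Date: 2024-08-26 論文紹介 _ The Llama 3 Herd of Models [Paper introduction: The Llama 3 Herd of Models], 2024.08 Comment Packed with know-how on Llama3 pretraining and post-training (including safety), with the elements needed for LLM training illustrated in diagrams; very easy to follow. For example, the figure below (quoted from the slides) looks useful for explaining the LLM training process. ![image](https://github.com/useLLM ... #Article #NLP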
Issue Date: 2024-08-24 Phi 3.5, Microsoft, 2024.08 CommentThe [Phi-3 model collection](https://ai.azure.com/explore/models?selectedCollection=phi) is the latest in Microsoft's family of Small Language Models ... #Article #NLP #Quantization
Issue Date: 2024-08-20 4-bit Llama 3.1, NeuralMagic, 2024.08 #Article #EfficiencyImprovement #Library #Article #LLMServing
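Issue Date: 2024-08-05 DeepSpeed, vLLM, CTranslate2 で rinna 3.6b の生成速度を比較する [Comparing the generation speed of rinna 3.6b across DeepSpeed, vLLM, and CTranslate2], 2024.06 Comment Using [vllm](https://github.com/vllm-project/vllm) looks like the easiest option, and its inference speed appears to be fast. It seems to gain its speed from a caching scheme called PagedAttention. (Figure quoted from the blog.) ![image](https://git Here ... #Article #NLP #Library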
Issue Date: 2024-08-01 OpenLLM: Self-Hosting LLMs Made Easy Comment A library that, when self-hosting OpenLLMs, provides an API and chat with the same interface as OpenAI and others ... #Article #NLP
Issue Date: 2024-07-30 Gemma2, Google Deepmind, 2024 Comment Strong at reasoning, math, and code generation. ![image](https://github.com/user-attachments/assets/b7f58129-1235-4812-9c5e-0607aa1bea66) ![image](https://github.co ... #Article #NLP #LanguageModel
Issue Date: 2024-07-25 Llama 3.1, 2024.07 Comment A recipe for training Llama-family models in FP8: https://x.com/thom_wolf/status/1826924774997532799?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ... #Article #NLP #LanguageModel
Issue Date: 2024-07-11 大規模言語モデルの開発 [Development of large language models], 2024 #Article #NLP #LanguageModel
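Issue Date: 2024-07-09 calm3-22B, 2024 Comment > On the Nejumi LLM Leaderboard 3, which evaluates the Japanese ability of LLMs, it performs on par with the 70-billion-parameter Meta-Llama-3-70B-Instruct, making it top-class among open Japanese LLMs developed from scratch (as of July 2024). The model is under the commercially usable A ... #Article #NLP #LanguageModel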
Issue Date: 2024-07-03 Llama 3 Swallow #Article #NLP #LanguageModel
Issue Date: 2024-04-18 LLaMA3, Apr, 2024 Comment According to the license, any model built using LLaMA3 must apparently always carry Llama3 as a prefix. Original tweet: https://x.com/gneubig/status/1781083579273089442?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q LLaMA ...

#Article #NLP #LanguageModel
Issue Date: 2024-04-10 Mixtral-8x22B-v0.1, 2024 Comment Apache-2.0 license; no Japanese support ... #Article #NLP #LanguageModel #ProprietaryLLM
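Issue Date: 2024-04-10 Command R+, Cohere, 2024 Comment Achieved an Elo rating on par with GPT-4-0314 in Chatbot Arena (as of 20240410) and supports 10 languages including Japanese. The context window size is 128k. Commercial use is via the API; for research purposes it is available from HuggingFace. ...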

#Article #NLP #LanguageModel
Issue Date: 2024-04-08 Gemma: Open Models Based on Gemini Research and Technology, 2024 Comment The architecture uses a Transformer Decoder. Model sizes are 2B and 7B. It makes the following improvements over the original Transformer Decoder architecture: uses Multi Query Attention #1272; RoPE Embedding #1Mistral ...
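
As a rough illustration of the Multi Query Attention mentioned in the Gemma entry above: several query heads share a single key head and a single value head, so only one set of keys and values needs to be kept per position. The sketch below is plain Python under stated assumptions (causal masking, scaled dot-product scores, tiny hand-picked tensors, no projections or RoPE); the function and variable names are hypothetical and it is not Gemma's implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_query_attention(queries, keys, values):
    """Causal multi-query attention.

    queries: [num_heads][seq_len][head_dim], one set of queries per head.
    keys, values: [seq_len][head_dim], shared by every query head.
    Returns [num_heads][seq_len][head_dim].
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_queries in queries:
        head_out = []
        for pos, query in enumerate(head_queries):
            # Causal mask: position `pos` attends only to positions 0..pos.
            scores = [dot(query, keys[j]) * scale for j in range(pos + 1)]
            weights = softmax(scores)
            mixed = [
                sum(w * values[j][d] for j, w in enumerate(weights))
                for d in range(head_dim)
            ]
            head_out.append(mixed)
        outputs.append(head_out)
    return outputs


# Two query heads, three positions, head dimension 2 (all values illustrative).
queries = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [-1.0, 1.0]],
]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]

attended = multi_query_attention(queries, keys, values)
for head_index, head_out in enumerate(attended):
    print("head", head_index, [[round(v, 3) for v in row] for row in head_out])
```

In standard multi-head attention each head would have its own `keys` and `values`; sharing them, as here, shrinks the key/value cache by a factor equal to the number of heads, which is the usual motivation for the technique.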