List of notes on papers and technical articles about DiffusionModel

DiffusionModel

#EfficiencyImprovement #Pocket #NLP #LanguageModel
Issue Date: 2025-06-25 Mercury: Ultra-Fast Language Models Based on Diffusion, Inception Labs+, arXiv25 Comment: Original post: https://x.com/arankomatsuzaki/status/1937360864262389786?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q A dLLM whose throughput (the model's token generation speed) appears to be SoTA. Model explainer: https://x.com/hillbi ...

#ComputerVision #Pocket #Transformer #VideoGeneration/Understandings
Issue Date: 2025-06-13 Seedance 1.0: Exploring the Boundaries of Video Generation Models, Yu Gao+, arXiv25 Comment: Original post: https://x.com/scaling01/status/1933048431775527006?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#ComputerVision #Pocket #CVPR
Issue Date: 2025-06-06 Generative Omnimatte: Learning to Decompose Video into Layers, Yao-Chih Lee+, CVPR25 Comment: Original post: https://x.com/yaochihlee/status/1930473521081397253?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q I have only skimmed it, but the input is a video and a mask (white: keep, black: remove, gray: uncertain objects or effects ...

#ComputerVision #Pocket #NLP #LanguageModel #MulltiModal
Issue Date: 2025-05-24 LaViDa: A Large Diffusion Language Model for Multimodal Understanding, Shufan Li+, arXiv25 Comment: Original post: https://x.com/iscienceluvr/status/1925749919312159167?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q The diffusion model wave has arrived. Outperforms AR models of comparable size [^1]![image](http ...

#EfficiencyImprovement #Pocket #NLP #LanguageModel
Issue Date: 2025-05-24 dKV-Cache: The Cache for Diffusion Language Models, Xinyin Ma+, arXiv25 Comment: Original post: https://x.com/arankomatsuzaki/status/1925384029718946177?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q With the proposed method applied, how large the actual difference in decoding speed is between AR models and Diffusion Models ...

#Embeddings #Pocket #NLP #LanguageModel
Issue Date: 2025-05-24 Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective, Siyue Zhang+, arXiv25 Comment: Original post: https://x.com/trtd6trtd/status/1925775950500806742?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Pocket #NLP #LanguageModel #Supervised-FineTuning (SFT) #ReinforcementLearning #Reasoning #PostTraining #GRPO
Issue Date: 2025-04-18 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning, Siyan Zhao+, arXiv25 Comment: Original post: https://x.com/iscienceluvr/status/1912785180504535121?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Proposes a method (diffu-GRPO) for applying GRPO to dLLMs. After SFT on long CoT data, reasoni ...

#ComputerVision #Pocket #NLP #LanguageModel
Issue Date: 2025-03-02 Large Language Diffusion Models, Shen Nie+, arXiv25 Comment: Original post: https://x.com/dair_ai/status/1893698288328602022?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Reference: https://x.com/karpathy/status/1894923254864978091 ...

#Tutorial #ComputerVision #Pocket
Issue Date: 2024-11-17 Tutorial on Diffusion Models for Imaging and Vision, Stanley H. Chan, arXiv24 Comment: I need to read this someday ...

#RecommenderSystems #Tutorial #LanguageModel #GenerativeAI
Issue Date: 2024-09-24 Recommendation with Generative Models, Yashar Deldjoo+, N_A, arXiv24 Comment: A textbook on RecSys with generative models and GenerativeAI ![image](https://github.com/user-attachments/assets/a76e5fd2-cd82-43f9-ac64-bb33c5fe1dc2) ...

#ComputerVision #Pocket
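Issue Date: 2024-09-01 Diffusion Models Are Real-Time Game Engines, Dani Valevski+, N_A, arXiv24 Summary: GameNGen is a game engine powered entirely by a neural model, enabling real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next-frame prediction reaches a PSNR of 29.4, comparable to lossy JPEG compression. GameNGen is trained in two phases: (1) an RL agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to generate the next frame conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable autoregressive generation over long trajectories. Comment: Apparently an effort to generate game footage with a Diffusion Model. It seems to treat the sequence of user actions and frames in the game environment as an episode and generate from that? A demo is available on the project page: https://gamengen.github.io/ ...

#ComputerVision #Pocket #Personalization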
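Issue Date: 2023-07-22 FABRIC: Personalizing Diffusion Models with Iterative Feedback, Dimitri von Rütte+, N_A, arXiv23 Summary: This work proposes a strategy for incorporating human feedback into diffusion-based text-to-image models. It proposes FABRIC, a training-free approach that exploits the self-attention layers, and shows that it is applicable to a wide range of diffusion models. It also introduces a comprehensive evaluation methodology, providing a robust mechanism for quantifying the performance of generative visual models that integrate human feedback. Thorough analysis shows that generation results improve over multiple rounds of iterative feedback, which opens up applications in areas such as personalized content creation and customization. Comment: A method that takes upvote/downvote feedback and improves a Diffusion model's outputs in an iterative manner. Applicable to many diffusion-based models. Demo: https://huggingface.co/spaces/dvruette/fabric ...

#ComputerVision #NaturalLanguageGeneration #NLP #MulltiModal #TextToImageGeneration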
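Issue Date: 2023-07-15 Learning to Imagine: Visually-Augmented Natural Language Generation, ACL23 Summary: This work proposes LIVE, a method for natural language generation that leverages visual information. LIVE lets a pretrained language model imagine a scene based on the text and synthesizes high-quality images for it. It also uses CLIP to assess how well the text evokes imagination, and it generates an image for each sentence rather than a single image per paragraph. Various experiments demonstrate the effectiveness of LIVE. The code, models, and data are publicly available. Comment: > First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images from the input text. Next, we use CLIP to judge, post hoc, whether the text can evoke imagination. Finally, our imagination is dynamic, so rather than generating a single image for the whole paragraph, we synthesize for each sentence ...

#ComputerVision #Pocket #NLP #Personalization #TextToImageGeneration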
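Issue Date: 2023-06-16 ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation, Shaozhe Hao+, N_A, arXiv23 Summary: Proposes ViCo, a fast and lightweight plug-in method for personalized image generation with diffusion models. It introduces an attention module and uses an attention-based object mask to mitigate the degradation commonly caused by overfitting. Without fine-tuning the parameters of the original diffusion model, and with only lightweight parameter training, it achieves performance on par with or better than state-of-the-art models.

#Article #Tutorial #Pretraining #MachineLearning #NLP #LanguageModel #Transformer #Chain-of-Thought #In-ContextLearning #Attention #SSM (StateSpaceModel) #Scaling Laws #PostTraining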
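Issue Date: 2025-05-31 Tutorial lecture at the 2025 Annual Conference of the Japanese Society for Artificial Intelligence, "Mathematics of Deep Foundation Models" (深層基盤モデルの数理), Taiji Suzuki, 2025.05 Comment: Original post: https://x.com/btreetaiji/status/1927678122817921442?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q ...

#Article #Tutorial #ComputerVision #NLP #LanguageModel #Slide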
Issue Date: 2025-05-24 [DL Reading Group] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models, Deep Learning JP, 2025.05 Comment: Original post: https://x.com/kym384/status/1925852937835737569?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q It seems best to get a rough grasp of the literature with #1986 first and then read this one. ...

#Article #Tutorial #ComputerVision #NLP #LanguageModel #Slide
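Issue Date: 2025-05-24 Advances in Masked Diffusion Models (Masked Diffusion Modelの進展), Deep Learning JP, 2025.03 Comment: Original post: https://x.com/kym384/status/1925852884656099572?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q #1984 is work that addresses the problem raised in the slides that, unlike AR models, the KV Cache cannot be used. If dLLMs prove promising, this area will keep evolving rapidly ...

#Article #NLP #LanguageModel #OpenWeight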
Issue Date: 2025-04-08 Dream-v0-Instruct-7B, Dream-org, 2025.04 Comment: An OpenWeight diffusion language model. Original post: https://x.com/curveweb/status/1909551257725133132?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q Related: #1776 ...

#Article #Survey #ComputerVision #NaturalLanguageGeneration #NLP #LanguageModel #ImageCaptioning
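Issue Date: 2023-11-02 A Comprehensive Survey of Zero-shot Learning: The New World of Vision & Language Opened Up by CLIP Comment: This is an impressive roundup. I have only read part of it so far. Starting from CLIP, it summarizes the important papers that cite CLIP, each with an overview. ...

#Article #NeuralNetwork #ComputerVision #EfficiencyImprovement #NLP #LanguageModel #Blog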
Issue Date: 2023-10-29 Assorted Techniques for Reducing GPU Memory in StableDiffusion and LLMs Comment: The explanations of Gradient Accumulation and Gradient Checkpointing were thorough and easy to understand. ...
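
As a memo for the Gradient Accumulation mentioned in the last entry: a minimal sketch of the idea, not taken from the linked article. It assumes a toy one-parameter linear model `y = w * x` with mean-squared-error loss, and the function names `grad_mse` and `accumulated_grad` are hypothetical. The point is only that averaging the gradients of equally sized micro-batches gives the same gradient as one large batch, so a large effective batch can be processed while holding only a small batch in memory at a time.

```python
# Toy illustration of gradient accumulation (pure Python, no framework).
# Assumption: model y = w * x with MSE loss; all names are illustrative.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients into one full-batch gradient.

    Requires len(xs) to be divisible by micro_batch_size so that every
    micro-batch carries equal weight in the average.
    """
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch size must be divisible by micro_batch_size")
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient by 1/num_steps before adding it,
        # which mirrors dividing the loss by the number of accumulation steps.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                             # one batch of 4
acc = accumulated_grad(w, xs, ys, micro_batch_size=2)  # two micro-batches of 2
print(full, acc)
```

In a real training loop the optimizer step would run once after the accumulation loop rather than once per micro-batch. Gradient Checkpointing is a separate technique (recomputing activations during the backward pass) and is not shown here.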