My Study Notes

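The blog of someone who keeps notes on papers, technologies, and other things they have studied in GitHub Issues. The notes had piled up to a fair amount, so I wanted to organize them and decided to start this blog! Most of the notes are probably in areas such as natural language processing (NLP), recommender systems (RecommenderSystem), Educational Data Mining (EDM), and Learning Analytics (LA). Lately I have been studying LLMs in particular :)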

May 06, 2026 AkihikoWATANABE About 4 mins

MultiView

[Paper Note] From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models, Masanari Oi+, arXiv'26, 2026.02

Paper/Blog Link My Issue
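#Supervised-FineTuning (SFT) #MultiModal #Reasoning #ICML #PostTraining #GRPO #VisionLanguageModel #SpatialUnderstanding #Author Thread-Post Issue Date: 2026-05-06 GPT Summary- To strengthen cross-view correspondence and step-by-step viewpoint transformation, the authors propose HATCH (Human-Aware Training for Cross-view correspondence and viewpoint cHange). It promotes spatial consistency and generates viewpoint-transition actions to improve reasoning. In experiments, HATCH outperformed models of comparable size and was competitive with larger models. Comment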

Original post:

Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning, Google DeepMind, 2026.04

Paper/Blog Link My Issue
#Article #ComputerVision #NLP #Reasoning #Proprietary #Robotics #VisionLanguageActionModel #SpatialUnderstanding #Reference Collection #Initial Impression Notes #Author Thread-Post Issue Date: 2026-04-15 Comment

Original post:

Oh, DeepMind has finally released a VLA. It is a proprietary model.

It turns out earlier versions had already been released and I simply was not aware of them:
- Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5, Google, 2025.09
- https://developers.googleblog.com/en/building-the-next-generation-of-physical-agents-with-gemini-robotics-er-15/

Key points explained:
