Notes on papers and technical articles about Monitorability

Monitorability

[Paper Note] MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems, Vishal Venkataramani+, ICML'26, 2026.02

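#NLP #LanguageModel #AIAgents #ICML #read-later #Selected Papers/Blogs #Verification #Author Thread-Post Issue Date: 2026-05-09 GPT Summary- MAS-ProVe presents a systematic study of process verification in multi-agent systems (MAS), using LLMs to evaluate how effective verification is. It evaluates at both the agent level and the iteration level and examines five verification methods. The results show that process-level verification does not necessarily improve performance and exhibits high variance. LLM-as-a-judge approaches are effective, but a trade-off between context length and performance is also observed. The study indicates that robust verification for MAS needs further progress. Comment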

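According to the paper, introducing process-level verification into a MAS does not consistently improve performance, and the variance is high (judging the validity of intermediate reasoning is a hard task, so the various existing verification methods have limits). Complex systems like MAS seem to need process-level scrutiny through verification, so this looks like an important study in that it shows where the current limits are.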

[Paper Note] Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory, Usman Anwar+, arXiv'26, 2026.02

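#NLP #LanguageModel #AIAgents #Chain-of-Thought #Reasoning #Safety Issue Date: 2026-02-24 GPT Summary- A CoT monitor analyzes reasoning traces to detect attributes of interest in the outputs of LLM-based systems. This paper shows that mutual information between the CoT and the output is a necessary condition for monitorability, and identifies two sources of error that degrade performance. The information gap measures how much information can be extracted, and the elicitation error measures how closely the monitor approximates the monitoring function. It proposes two complementary approaches that improve CoT monitorability by optimizing the training objective: an oracle-based method and maximization of conditional mutual information. With these, it demonstrates improved monitor accuracy and mitigation of reward hacking. Comment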
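
The mutual-information condition in the summary can be illustrated on toy data. The sketch below is not the paper's method: it only computes the empirical mutual information between a hypothetical binary CoT feature `z` (for example, "the CoT mentions a hack") and a binary output attribute `y` (for example, "the output is a hack"). The function name and both datasets are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(Z; Y) in bits for a list of (z, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (z, y) combination
    pz = Counter(z for z, _ in pairs)      # marginal counts of z
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (z, y), count in joint.items():
        p_zy = count / n
        # p(z, y) / (p(z) * p(y)) written with raw counts
        mi += p_zy * math.log2(p_zy * n * n / (pz[z] * py[y]))
    return mi

# Hypothetical data: the CoT feature fully determines the output attribute.
informative = [(1, 1)] * 50 + [(0, 0)] * 50
# Hypothetical data: the CoT feature is independent of the output attribute.
uninformative = [(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25

print(mutual_information(informative))
print(mutual_information(uninformative))
```

In the second dataset the feature carries no information about the output, so a monitor that sees only this feature cannot do better than guessing, however capable it is.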

Investigating the consequences of accidentally grading CoT during RL, OpenAI, 2026.05

#Article #NLP #LanguageModel #Chain-of-Thought #Author Thread-Post Issue Date: 2026-05-12 Comment

Evaluating chain-of-thought monitorability, OpenAI, 2025.12

#Article #NLP #LanguageModel #AIAgents #Chain-of-Thought #Evaluation #Reasoning #Safety #KeyPoint Notes Issue Date: 2025-12-19 Comment

The post gives a definition of monitorability, which is probably worth remembering.
In a word, monitorability seems to be "the ability to accurately predict important properties of an agent's behavior."

Monitorability has the following two aspects:
>- Monitors. We could make monitors smarter in various ways, and we would like our evaluations to give higher scores to monitors that can make better predictions about an agent’s behavior.
>- Agents. There are many observations of the agent that could be given to the monitor, including the agent’s actions, outputs, neural activations, and recently with the advent of reasoning models, the chain-of-thought. It’s possible for some agents to have thoughts that are harder to understand than others, and so we would like our monitorability evaluations to be able to compare agents and not just monitors.

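Monitors differ in how smart they are, and the agents being monitored also differ in how easy they are to understand, so the evaluation has to cover the agent itself and not only the monitor's performance.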
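
One way to picture an evaluation that compares both monitors and agents is a score per (agent, monitor) pair. The sketch below is only an illustration under assumptions: plain accuracy stands in for whatever metric the post actually uses, and the agent names, monitor names, and records are all hypothetical.

```python
from collections import defaultdict

def accuracy_grid(records):
    """Return accuracy per (agent, monitor) pair.

    records: iterable of (agent, monitor, predicted, actual) tuples, where
    `predicted` is the monitor's guess about a behavior property and
    `actual` is the ground-truth label.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for agent, monitor, predicted, actual in records:
        totals[(agent, monitor)] += 1
        hits[(agent, monitor)] += int(predicted == actual)
    return {key: hits[key] / totals[key] for key in totals}

# Hypothetical records: every episode actually contains the behavior (actual=True).
records = [
    ("agent_a", "weak_monitor", True, True),
    ("agent_a", "weak_monitor", False, True),
    ("agent_a", "strong_monitor", True, True),
    ("agent_a", "strong_monitor", True, True),
    ("agent_b", "weak_monitor", False, True),
    ("agent_b", "weak_monitor", False, True),
    ("agent_b", "strong_monitor", True, True),
    ("agent_b", "strong_monitor", False, True),
]

grid = accuracy_grid(records)
for (agent, monitor), acc in sorted(grid.items()):
    print(agent, monitor, acc)
```

Reading the grid along one agent compares monitors; reading it along one monitor compares agents, which is the second aspect the quoted passage asks for.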

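The following are given as samples of monitorable / unmonitorable CoT. Apparently the model is given a coding task that is actually impossible to solve. On the left, the tests pass (meaning it cheated in some way), but the CoT says nothing about this. On the right, the CoT shows the model considering the cheat, so a user who reads it can notice the model's undesirable behavior.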