HATT v1.0: Pair-wise Encoding and Decoding

Core idea: bind each dimension of the logits explicitly to one agent-task edge.

Problems with the Legacy GAT

Node-level aggregation:

x_i → GATConv → h_i^(2)
        ↓
    GlobalPool
        ↓
       MLP → logits_i

logits_i[m] ≠ f(agent_i, task_m)

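The output is a node embedding, not an edge score.
GlobalPool is invariant to task permutations, so the information about which task is which is discarded before the MLP.
As a result, task-index equivariance does not hold.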
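
The effect of pooling can be checked numerically. The following is a minimal sketch, not the legacy model itself: it assumes mean pooling as the GlobalPool, a single linear layer as the MLP head, and random stand-in node embeddings (`node_h` and `head` are hypothetical names). Swapping two task nodes leaves the logits unchanged, whereas an equivariant model would swap the two corresponding entries.

```python
import random

def mean_pool(rows):
    """Average equal-length vectors; the result ignores row order."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def linear(weight, vec):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

rng = random.Random(0)
num_tasks, hidden = 4, 3

# Stand-in node embeddings, as if produced by GATConv (assumption).
node_h = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(num_tasks)]
# Unconstrained head: output slot m is not tied to task m in any way.
head = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(num_tasks)]

logits = linear(head, mean_pool(node_h))

# Swap task 0 and task 2 in the input.
swapped = list(node_h)
swapped[0], swapped[2] = swapped[2], swapped[0]
logits_swapped = linear(head, mean_pool(swapped))

# The pooled vector is the same, so the logits do not move at all.
unchanged = all(abs(a - b) < 1e-12 for a, b in zip(logits, logits_swapped))
print("logits unchanged after swapping tasks:", unchanged)
```

Because the logits stay put while the tasks move, slot `m` of the output cannot be read as a score for task `m`.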

Improvements in HATT v1.0

Edge-level decoding:

agent_i ──┐
          ├──→ Decoder → score_{i,j}
task_j  ──┘       ↑
              edge_{ij}

logits_i = [score_{i,1}, ..., score_{i,M}]

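Each agent-task edge is scored independently.
logits_i[m] corresponds explicitly to pair[i,m].
Swapping task_p and task_q in the input swaps the corresponding logits automatically.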

Key shift: from "representation learning for nodes in a graph" to "explicit scoring of every candidate edge in the action space".
    

HATT v1.0 architecture: three encoders + two attention modules + an edge-level decoder

Input (Dict Obs)     Encoders                      Relation attention
┌──────────┐
│ ego_i    │──→ Self Encoder  ──→ self_h  ─┐
└──────────┘                               │
┌──────────┐                               ├──→ aa_attention ──→ aa_context
│ others_i │──→ Other Encoder ──→ other_h ─┤
└──────────┘                               │
┌──────────┐                               ├──→ ta_attention ──→ ta_context
│ tasks_i  │──→ Task Encoder  ──→ task_h  ─┘
└──────────┘
                      ↓
     fusion([self_h, aa_context, ta_context])
                      ↓
                agent_context
                      ↓
Edge-level decoding: TaskActionDecoder(agent_context, task_h, edge_features)
                      ↓
                logits_i ∈ ℝ^M

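Dimensions at a glance (default configuration)

Variable	Shape	Description
`ego_i`	$\mathbb{R}^{11}$	The agent's own observation
`others_i`	$\mathbb{R}^{12 \times 11}$	Observations of the other agents ($N_a = 12$)
`tasks_i`	$\mathbb{R}^{10 \times 8}$	Task observations ($M = 10$)
`self_h / other_h / task_h`	$\mathbb{R}^{64}$	Encoded hidden vector (one per entity)
`aa_context / ta_context`	$\mathbb{R}^{64}$	Relation context
`logits_i`	$\mathbb{R}^{10}$	One score per agent-task edge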
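
To make the shapes concrete, here is a small sketch that builds an all-zero Dict observation with the default sizes from the table. The dictionary keys (`"ego"`, `"others"`, `"tasks"`) and the helper names are assumptions for illustration; the source only gives the variable names `ego_i`, `others_i`, and `tasks_i`.

```python
# Default sizes taken from the table; the key names are hypothetical.
DEFAULT_DIMS = {
    "ego_dim": 11,      # length of ego_i
    "num_others": 12,   # rows of others_i (N_a)
    "other_dim": 11,    # columns of others_i
    "num_tasks": 10,    # rows of tasks_i (M), also the length of logits_i
    "task_dim": 8,      # columns of tasks_i
    "hidden_dim": 64,   # size of every encoded vector and context
}

def zero_observation(dims):
    """Build an all-zero Dict observation with the given sizes."""
    return {
        "ego": [0.0] * dims["ego_dim"],
        "others": [[0.0] * dims["other_dim"] for _ in range(dims["num_others"])],
        "tasks": [[0.0] * dims["task_dim"] for _ in range(dims["num_tasks"])],
    }

def shape(value):
    """Return the shape of a non-empty nested list as a tuple."""
    sizes = []
    while isinstance(value, list):
        sizes.append(len(value))
        value = value[0]
    return tuple(sizes)

obs = zero_observation(DEFAULT_DIMS)
shapes = {key: shape(val) for key, val in obs.items()}
print(shapes)
```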

Formulas: from $o_i$ to $\text{logits}_i$

1. Three encoders

$$\text{self}_h = \text{MLP}_{\text{self}}(\text{ego}_i) \in \mathbb{R}^{64}$$ $$\text{other}_h^{(n)} = \text{MLP}_{\text{other}}(\text{others}_{i,n}) \in \mathbb{R}^{64}, \quad n=1,\dots,N_a$$ $$\text{task}_h^{(m)} = \text{MLP}_{\text{task}}(\text{tasks}_{i,m}) \in \mathbb{R}^{64}, \quad m=1,\dots,M$$
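
The three encoders can be sketched as follows. This is an illustration under assumptions: the source does not state the depth or activation of each MLP, so the sketch uses two linear layers with a ReLU in between, random weights, and plain Python lists to stay dependency-free. The point it shows is weight sharing: one `mlp_other` is applied to every row of `others_i`, and one `mlp_task` to every row of `tasks_i`.

```python
import random

def make_linear(rng, in_dim, out_dim):
    """Create one linear layer with small random weights and zero bias."""
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return {"w": weight, "b": [0.0] * out_dim}

def mlp(layers, x):
    """Apply Linear -> ReLU -> ... -> Linear (no activation after the last)."""
    for index, layer in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(layer["w"], layer["b"])]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

rng = random.Random(0)
HIDDEN = 64

# One MLP per entity type (layer count is an assumption).
mlp_self = [make_linear(rng, 11, HIDDEN), make_linear(rng, HIDDEN, HIDDEN)]
mlp_other = [make_linear(rng, 11, HIDDEN), make_linear(rng, HIDDEN, HIDDEN)]
mlp_task = [make_linear(rng, 8, HIDDEN), make_linear(rng, HIDDEN, HIDDEN)]

# Random stand-in observations with the default shapes.
ego = [rng.uniform(-1, 1) for _ in range(11)]
others = [[rng.uniform(-1, 1) for _ in range(11)] for _ in range(12)]
tasks = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]

self_h = mlp(mlp_self, ego)                         # one vector of size 64
other_h = [mlp(mlp_other, row) for row in others]   # 12 vectors of size 64
task_h = [mlp(mlp_task, row) for row in tasks]      # 10 vectors of size 64
print(len(self_h), len(other_h), len(task_h))
```

Since `task_h[m]` depends only on `tasks[m]`, reordering the task rows reorders the encoded vectors in exactly the same way.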

2. Relation attention (RelationAttention)

For agent-agent relations:

$$\text{aa\_context}_i = \text{RelationAttention}(\text{query}=\text{self}_h, \; \text{neighbors}=\{\text{other}_h^{(n)}\}_{n=1}^{N_a}, \; \text{edge}=\text{aa\_features})$$

For task-agent relations:

$$\text{ta\_context}_i = \text{RelationAttention}(\text{query}=\text{self}_h, \; \text{neighbors}=\{\text{task}_h^{(m)}\}_{m=1}^{M}, \; \text{edge}=\text{at\_features})$$

The edge features have only 3 dimensions: a 2D relative position plus one relation flag.
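
The source gives only the interface of RelationAttention (query, neighbors, edge), not its internals. The sketch below is therefore one plausible form, stated as an assumption: a scaled dot product between the query and each neighbor, plus a linear term on the 3-dimensional edge feature, followed by a softmax and a weighted sum. Learned projection matrices are left out to keep it short.

```python
import math
import random

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def relation_attention(query, neighbors, edges, edge_weight):
    """Attend from one query vector to a set of neighbors.

    Assumed scoring rule: dot(query, neighbor) / sqrt(d) + dot(edge_weight, edge).
    Returns the context vector and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, neighbor)) / math.sqrt(d)
        + sum(w * e for w, e in zip(edge_weight, edge))
        for neighbor, edge in zip(neighbors, edges)
    ]
    alpha = softmax(scores)
    context = [sum(a * neighbor[j] for a, neighbor in zip(alpha, neighbors))
               for j in range(d)]
    return context, alpha

rng = random.Random(0)
d, num_neighbors = 4, 3
self_h = [rng.uniform(-1, 1) for _ in range(d)]
other_h = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(num_neighbors)]
# 3-dim edge features: relative position (2D) plus one relation flag.
aa_features = [[rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
               for _ in range(num_neighbors)]
edge_weight = [rng.uniform(-1, 1) for _ in range(3)]

aa_context, alpha = relation_attention(self_h, other_h, aa_features, edge_weight)
print(len(aa_context), round(sum(alpha), 6))
```

Under this form the context is a weighted sum over neighbors, so it does not depend on the order in which the neighbors are listed, as long as each edge feature stays attached to its neighbor.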

3. Fusion: combining the self representation with the relation contexts

$$\text{agent\_context}_i = \text{LayerNorm}\left( \text{self}_h + \text{MLP}_{\text{fusion}}\big([\text{self}_h \;\|\; \text{aa\_context}_i \;\|\; \text{ta\_context}_i]\big) \right)$$
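
The fusion step is a residual update followed by LayerNorm. A minimal sketch, assuming a two-layer ReLU MLP for $\text{MLP}_{\text{fusion}}$ and a LayerNorm without learnable scale and shift (neither detail is given in the source), with a small hidden size for readability:

```python
import math
import random

def linear(weight, bias, x):
    """Affine map: weight is a list of rows, bias a list of floats."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and (almost) unit variance, no affine terms."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def fuse(self_h, aa_context, ta_context, params):
    """agent_context = LayerNorm(self_h + MLP_fusion([self_h || aa || ta]))."""
    concat = self_h + aa_context + ta_context          # list concatenation, size 3d
    hidden = [max(0.0, v) for v in linear(params["w1"], params["b1"], concat)]
    update = linear(params["w2"], params["b2"], hidden)
    residual = [s + u for s, u in zip(self_h, update)]  # skip connection on self_h
    return layer_norm(residual)

rng = random.Random(0)
d = 8

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

params = {
    "w1": rand_matrix(d, 3 * d), "b1": [0.0] * d,
    "w2": rand_matrix(d, d), "b2": [0.0] * d,
}
self_h = [rng.uniform(-1, 1) for _ in range(d)]
aa_context = [rng.uniform(-1, 1) for _ in range(d)]
ta_context = [rng.uniform(-1, 1) for _ in range(d)]

agent_context = fuse(self_h, aa_context, ta_context, params)
print(len(agent_context))
```

The skip connection keeps the agent's own encoding as the base signal, and the MLP only adds a correction computed from the two relation contexts.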

4. TaskActionDecoder: edge-level decoding

$$\text{score}_{i,m} = \underbrace{\frac{\text{agent\_context}_i^\top W_q^\top W_k \, \text{task}_h^{(m)}}{\sqrt{64}}}_{\text{dot-product attention}} + \underbrace{\text{MLP}_{\text{bias}}(\text{at\_features}_{i,m})}_{\text{edge-feature bias}}$$
$$\text{logits}_i = [\text{score}_{i,1}, \; \text{score}_{i,2}, \; \dots, \; \text{score}_{i,M}] \in \mathbb{R}^{M}$$
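
The decoder formula translates almost line by line into code. The sketch below uses random weights and assumes a small Linear, ReLU, Linear network for $\text{MLP}_{\text{bias}}$ (its size is not given in the source; `BIAS_HIDDEN = 16` and the parameter names are hypothetical). Each score is computed from one task vector and one edge feature only, which is what ties `logits_i[m]` to `pair[i,m]`.

```python
import math
import random

def matvec(weight, vec):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def edge_bias(edge, w_in, w_out):
    """Stand-in for MLP_bias: Linear -> ReLU -> Linear to a scalar."""
    hidden = [max(0.0, h) for h in matvec(w_in, edge)]
    return sum(w * h for w, h in zip(w_out, hidden))

def task_action_decoder(agent_context, task_h, at_features, params):
    """Score every agent-task edge independently and return logits_i."""
    query = matvec(params["w_q"], agent_context)      # W_q agent_context
    scale = math.sqrt(len(query))                      # sqrt(64) by default
    logits = []
    for h_m, edge_m in zip(task_h, at_features):
        key = matvec(params["w_k"], h_m)               # W_k task_h^(m)
        attention = sum(q * k for q, k in zip(query, key)) / scale
        logits.append(attention + edge_bias(edge_m, params["b_in"], params["b_out"]))
    return logits

rng = random.Random(0)
D, M, EDGE_DIM, BIAS_HIDDEN = 64, 10, 3, 16

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

params = {
    "w_q": rand_matrix(D, D),
    "w_k": rand_matrix(D, D),
    "b_in": rand_matrix(BIAS_HIDDEN, EDGE_DIM),
    "b_out": rand_matrix(1, BIAS_HIDDEN)[0],
}
agent_context = rand_matrix(1, D)[0]
task_h = rand_matrix(M, D)
at_features = rand_matrix(M, EDGE_DIM)

logits_i = task_action_decoder(agent_context, task_h, at_features, params)
print(len(logits_i))
```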

5. Action distribution

$$\pi_i = \text{softmax}(\text{logits}_i) \in \mathbb{R}^{M}$$
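
A short sketch of this last step: a numerically stable softmax over the logits, followed by sampling one task index. The example logits and the fixed seed are made up for illustration.

```python
import math
import random

def softmax(logits):
    """Softmax with the max subtracted first to avoid overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits_i = [2.0, 0.5, -1.0, 0.0]   # illustrative values, one per task
pi_i = softmax(logits_i)

# Sample a task index according to pi_i (seeded for reproducibility).
rng = random.Random(0)
action = rng.choices(range(len(pi_i)), weights=pi_i, k=1)[0]
print([round(p, 3) for p in pi_i], action)
```

Because softmax acts elementwise up to a shared normalizer, permuting the logits permutes the probabilities in the same way, so the policy inherits the index correspondence of the logits.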

Why does HATT v1.0 satisfy task-index equivariance?

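Suppose we swap $\text{task}_p$ and $\text{task}_q$ in the input, together with their edge features $\text{at\_features}_{i,p}$ and $\text{at\_features}_{i,q}$:

$$ \text{tasks}_i = [\dots, \text{task}_p, \dots, \text{task}_q, \dots] \;\xrightarrow{\text{swap}}\; \text{tasks}_i' = [\dots, \text{task}_q, \dots, \text{task}_p, \dots] $$

Because the Task Encoder is shared across tasks, the encoded hidden vectors simply trade places:

$$ [\dots, \text{task}_h^{(p)}, \dots, \text{task}_h^{(q)}, \dots] \;\xrightarrow{\text{swap}}\; [\dots, \text{task}_h^{(q)}, \dots, \text{task}_h^{(p)}, \dots] $$

$\text{agent\_context}_i$ is unchanged by the swap, provided RelationAttention aggregates its neighbors with an order-independent weighted sum, as attention pooling normally does. The Decoder computes $\text{score}_{i,m} = f(\text{agent\_context}_i, \text{task}_h^{(m)}, \text{at\_features}_{i,m})$ at each position $m$, so the output logits are swapped accordingly:

$$\text{logits}_i = [\dots, \text{score}_{i,p}, \dots, \text{score}_{i,q}, \dots]$$
$$\text{logits}_i' = [\dots, \text{score}_{i,q}, \dots, \text{score}_{i,p}, \dots]$$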

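Conclusion: HATT v1.0 satisfies task-index equivariance by construction, through the shared Task Encoder and the position-aligned edge-level Decoder. $\text{logits}_i[m]$ corresponds explicitly to $\text{pair}[i,m]$.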
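
The equivariance claim can be checked numerically on a reduced model. The sketch below is a simplification under stated assumptions, not the full architecture: the task encoder is a single shared linear layer, the fusion is a plain residual sum without the MLP or LayerNorm, the agent-agent branch is omitted, and the edge bias is linear. All weights and inputs are random, and every name is hypothetical. It swaps two tasks (with their edge features) and compares the new logits against the old logits with the same two entries swapped.

```python
import math
import random

rng = random.Random(0)
D, M, TASK_DIM, EDGE_DIM = 8, 5, 4, 3

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(weight, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

W_TASK = rand_matrix(D, TASK_DIM)              # shared task encoder (one layer)
W_Q, W_K = rand_matrix(D, D), rand_matrix(D, D)
W_EDGE_ATT = rand_matrix(1, EDGE_DIM)[0]       # edge term inside the attention
W_EDGE_DEC = rand_matrix(1, EDGE_DIM)[0]       # linear stand-in for MLP_bias

def forward(self_h, tasks, at_features):
    """Reduced HATT-style pass: encode tasks, pool a context, score each edge."""
    task_h = [matvec(W_TASK, t) for t in tasks]
    # ta_context: softmax-weighted sum over tasks, independent of task order.
    alpha = softmax([dot(self_h, h) / math.sqrt(D) + dot(W_EDGE_ATT, e)
                     for h, e in zip(task_h, at_features)])
    ta_context = [sum(a * h[j] for a, h in zip(alpha, task_h)) for j in range(D)]
    agent_context = [s + c for s, c in zip(self_h, ta_context)]
    query = matvec(W_Q, agent_context)
    # One score per agent-task edge, computed from that edge's inputs only.
    return [dot(query, matvec(W_K, h)) / math.sqrt(D) + dot(W_EDGE_DEC, e)
            for h, e in zip(task_h, at_features)]

def swap(seq, p, q):
    """Return a copy of seq with positions p and q exchanged."""
    out = list(seq)
    out[p], out[q] = out[q], out[p]
    return out

self_h = rand_matrix(1, D)[0]
tasks = rand_matrix(M, TASK_DIM)
at_features = rand_matrix(M, EDGE_DIM)
p, q = 1, 3

logits = forward(self_h, tasks, at_features)
logits_swapped = forward(self_h, swap(tasks, p, q), swap(at_features, p, q))

# Equivariance: swapping the inputs should swap the same two logits.
max_err = max(abs(a - b) for a, b in zip(logits_swapped, swap(logits, p, q)))
print("equivariant up to rounding:", max_err < 1e-9)
```

The small residual error comes only from floating-point summation order inside the pooled context; the per-edge scores themselves use identical arithmetic before and after the swap.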
    

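Summary: what does HATT v1.0 improve?

Legacy GAT	HATT v1.0
Builds one global graph and processes all nodes uniformly	Centers on the focal agent and encodes ego/others/tasks separately
GATConv outputs node embeddings	RelationAttention outputs relation contexts
GlobalPool followed by an unconstrained MLP	TaskActionDecoder scores each agent-task edge explicitly
No explicit correspondence between logits and tasks	logits[m] corresponds explicitly to pair[i,m]
Does not satisfy task-index equivariance	Satisfies task-index equivariance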

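Remaining limitation: the task representation in HATT v1.0 carries no coalition information, i.e. it does not know which agents have already joined a given task. HATT v2.0 addresses this with Coalition Aggregation.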

HATT v1.0: from node aggregation to pair-wise encoding and decoding