(1) The optimization objectives of pre-training and downstream tasks diverge too much; ➡️ "mimics the fine-tuning step within the pre-training step, and thus learns how to fine-tune during the pre-training process itself."
(2) How to carry out both node-level and graph-level self-supervised learning during pre-training (note that both levels must be self-supervised, requiring no extra labels); ➡️ "propose a self-supervised strategy with a dual adaptation mechanism, which is equipped with both node- and graph-level adaptations," where the graph-level signal is that "a sub-structure should be close to the whole graph in the representation space."
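To make the graph-level signal concrete, below is a minimal sketch of such a self-supervised objective, assuming a simple in-batch contrastive formulation; the function name `graph_level_loss` and the batching scheme are illustrative assumptions, not the paper's exact implementation:

```python
# A minimal sketch (not the paper's exact loss) of the graph-level
# self-supervised idea: a sub-structure's representation should be close to
# the representation of the whole graph it came from, and far from others.
import torch
import torch.nn.functional as F

def graph_level_loss(h_sub: torch.Tensor, h_graph: torch.Tensor) -> torch.Tensor:
    """h_sub: [B, d] pooled sub-structure embeddings;
    h_graph: [B, d] pooled whole-graph embeddings (row i is sub i's graph)."""
    scores = h_sub @ h_graph.t()                          # [B, B] similarities
    labels = torch.arange(scores.size(0), device=scores.device)  # diagonal = positives
    # Cross-entropy pulls each sub-structure toward its own graph and pushes it
    # away from the other graphs in the batch (which serve as negatives).
    return F.cross_entropy(scores, labels)
```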
Suppose the pre-training objective ($L^{pre}$) and the fine-tuning objective ($L^{fine}$) have already been specified:
(1) Pre-training (self-supervised):
$$ \theta_0 = \arg\min_{\theta}L^{pre}(f_{\theta};\mathcal{D}^{pre}) $$
(2) Fine-tuning (supervised):
$$ \theta_1 = \theta_0-\eta\nabla_{\theta_0}L^{fine}(f_{\theta_0};\mathcal{D}^{tr}) $$
This formula says that the model $f_{\theta_0}$ performs one gradient-descent update on the dataset $\mathcal{D}^{tr}$, with $L^{fine}$ as the objective and $\eta$ as the learning rate, yielding the new parameters $\theta_1$;
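The two formulas together describe the conventional "pre-train, then fine-tune" recipe. A minimal PyTorch sketch of that recipe, assuming placeholder callables `L_pre` and `L_fine` for the two objectives (these names and the hyperparameters are illustrative, not from the paper):

```python
# A minimal sketch of the conventional pre-train / fine-tune pipeline that the
# two formulas above describe. `model`, `L_pre`, `L_fine`, `D_pre`, and `D_tr`
# are assumed stand-ins for an arbitrary GNN, the objectives, and the datasets.
import torch

def pretrain_then_finetune(model, L_pre, L_fine, D_pre, D_tr, eta=1e-2, pre_steps=100):
    # (1) Pre-training: approximate theta_0 = argmin_theta L_pre(f_theta; D_pre)
    #     by plain gradient descent on the self-supervised objective.
    opt = torch.optim.SGD(model.parameters(), lr=eta)
    for _ in range(pre_steps):
        opt.zero_grad()
        L_pre(model, D_pre).backward()
        opt.step()                                # parameters approach theta_0

    # (2) Fine-tuning: theta_1 = theta_0 - eta * grad L_fine(f_theta0; D_tr),
    #     i.e. one explicit gradient-descent step on the supervised objective.
    loss = L_fine(model, D_tr)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= eta * g                          # theta_1
    return model
```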
(1) Pre-training:
From the pre-training dataset $\mathcal{D}^{pre}$, sample a training set $\mathcal{D}^{tr}_{\mathcal{T_G}}$ (also called the support set in the MAML framework) and a test set $\mathcal{D}^{te}_{\mathcal{T_G}}$ (the query set in MAML) that simulate a fine-tuning task;
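A minimal sketch of how such a simulated task could be drawn, assuming the self-supervised examples of one graph (e.g. edges for link reconstruction) are given as a flat list; the helper `sample_task` and the 50/50 split are illustrative assumptions, not the paper's procedure:

```python
# A minimal sketch of MAML-style task construction from unlabeled data:
# randomly split one graph's self-supervised examples into a support set
# (D_tr) and a query set (D_te) that together simulate a fine-tuning task.
import random

def sample_task(graph_samples, support_ratio=0.5):
    """graph_samples: list of self-supervised examples from one graph.
    Returns (support_set, query_set)."""
    shuffled = random.sample(graph_samples, len(graph_samples))  # shuffled copy
    k = int(len(shuffled) * support_ratio)
    return shuffled[:k], shuffled[k:]             # D_tr (support), D_te (query)
```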
First, on the simulated task's training set $\mathcal{D}^{tr}_{\mathcal{T_G}}$, perform "fine-tuning with the pre-training objective $L^{pre}$", obtaining the following model: