update

Commit 922cbbb7 authored Dec 14, 2020 by 曹润柘
--- a/Chapter16/Figures/figure-bilingual-dictionary-Induction.tex
+++ b/Chapter16/Figures/figure-bilingual-dictionary-Induction.tex
@@ -9,7 +9,7 @@
 \draw [-,thick] (-0.7,1.0)--(-0.7,-1.0);

 \node [anchor=center](c1) at (-0.1,0){\tiny{$\mathbi{Y}$}};
-\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
+\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W} \mathbi{X}$}};
 \node [anchor=center,red!70](cr1) at (0.65,-0.65){\scriptsize{$\bullet$}}; 
 \node [anchor=center,ublue](cb1) at (0.6,-0.5){\scriptsize{$\bullet$}};
 \node [anchor=center,red!70](cr2) at (1.65,-0.65){\scriptsize{$\bullet$}}; 
@@ -30,7 +30,7 @@
 \draw [-,thick] (-0.7,1.0)--(-0.7,-1.0);

 \node [anchor=center](c1) at (-0.1,0){\tiny{$\mathbi{Y}$}};
-\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
+\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W} \mathbi{X}$}};
 \node [anchor=center,red!70](cr1) at (0.65,-0.65){\scriptsize{$\bullet$}}; 
 \node [anchor=center,ublue](cb1) at (0.6,-0.5){\scriptsize{$\bullet$}};
 \node [anchor=center,red!70](cr2) at (1.65,-0.65){\scriptsize{$\bullet$}}; 
@@ -136,7 +136,7 @@
 \draw [-,thick] (-0.8,1.0)--(-0.8,-1.0);

 \node [anchor=center](c1) at (0.1,0.6){\tiny{$\mathbi{Y}$}};
-\node [anchor=center](c2) at (-0.45,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
+\node [anchor=center](c2) at (-0.45,-0.7){\tiny{$\mathbi{W} \mathbi{X}$}};

 \node [anchor=center,red!70](cr1) at (0.2,-0.35){\scriptsize{$\bullet$}};
 \node [anchor=center,red!70](cr2) at (1.58,-0.78){\scriptsize{$\bullet$}};

--- a/Chapter16/Figures/figure-shared-space-inductive-bilingual-dictionary.tex
+++ b/Chapter16/Figures/figure-shared-space-inductive-bilingual-dictionary.tex
@@ -88,8 +88,8 @@

 \node [anchor=north](part1) at ([yshift=0.5em]circle1.south){\small{$\mathbi{X}$}};
 \node [anchor=west](part2) at ([xshift=6em]part1.east){\small{$\mathbi{Y}$}};
-\node [anchor=west](part3) at ([xshift=8.2em]part2.east){\small{$\mathbi{X}\cdot \mathbi{W}$}};
-\node [anchor=west](part3) at ([xshift=15.0em]part2.east){\small{$\mathbi{X}\cdot \mathbi{W}$ and $\mathbi{Y}$ lie in the same space}};
+\node [anchor=west](part3) at ([xshift=8.5em]part2.east){\small{$\mathbi{X} \mathbi{W}$}};
+\node [anchor=west](part3) at ([xshift=15.0em]part2.east){\small{$\mathbi{X} \mathbi{W}$ and $\mathbi{Y}$ lie in the same space}};

 \node [anchor=center](c1) at (5.4,-1.0){\small{$\mathbi{W}$}};


--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
@@ -595,19 +595,19 @@

 \begin{itemize}
 \vspace{0.5em}
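-\item GAN-based methods\upcite{DBLP:conf/iclr/LampleCRDJ18,DBLP:conf/acl/ZhangLLS17,DBLP:conf/emnlp/XuYOW18,DBLP:conf/naacl/MohiuddinJ19}. In this approach, a generator produces the mapping $\mathbi{W}$, and a discriminator tries to distinguish randomly sampled elements of $\mathbi{W}\cdot \mathbi{X}$ from those of $\mathbi{Y}$. Once the two have been jointly optimized to convergence, the mapping $\mathbi{W}$ is obtained.
+\item GAN-based methods\upcite{DBLP:conf/iclr/LampleCRDJ18,DBLP:conf/acl/ZhangLLS17,DBLP:conf/emnlp/XuYOW18,DBLP:conf/naacl/MohiuddinJ19}. In this approach, a generator produces the mapping $\mathbi{W}$, and a discriminator tries to distinguish randomly sampled elements of $\mathbi{W} \mathbi{X}$ from those of $\mathbi{Y}$. Once the two have been jointly optimized to convergence, the mapping $\mathbi{W}$ is obtained.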
 \vspace{0.5em}
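 \item Gromov-Wasserstein-based methods\upcite{DBLP:conf/emnlp/Alvarez-MelisJ18,DBLP:conf/lrec/GarneauGBDL20,DBLP:journals/corr/abs-1811-01124,DBLP:conf/emnlp/XuYOW18}. The Wasserstein distance is a function that defines the distance between two probability distributions on a metric space. In this task it is used to measure the similarity between word pairs in different languages. By exploiting the approximate isomorphism of the embedding spaces, one can define objective functions whose optimization also yields the mapping $\mathbi{W}$.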
 \vspace{0.5em}
 \end{itemize}

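-\parinterval Once the mapping $\mathbi{W}$ has been obtained, any word $x_{i}$ in $\mathbi{X}$ is mapped into the space $\mathbi{Y}$ by $\mathbi{W}\cdot \mathbi{E}({x}_{i})$ (where $\mathbi{E}({x}_{i})$ denotes the word embedding of $x_{i}$), and the nearest neighbor $y_{j}$ of the mapped point in $\mathbi{Y}$ is taken as the translation of $x_{i}$. Repeating this process induces the seed dictionary $D$, which ends the first stage. Because the first stage has no supervision signal, the seed dictionary $D$ contains a great deal of noise and its quality is limited, so further fine-tuning is needed.
+\parinterval Once the mapping $\mathbi{W}$ has been obtained, any word $x_{i}$ in $\mathbi{X}$ is mapped into the space $\mathbi{Y}$ by $\mathbi{W} \mathbi{E}({x}_{i})$ (where $\mathbi{E}({x}_{i})$ denotes the word embedding of $x_{i}$), and the nearest neighbor $y_{j}$ of the mapped point in $\mathbi{Y}$ is taken as the translation of $x_{i}$. Repeating this process induces the seed dictionary $D$, which ends the first stage. Because the first stage has no supervision signal, the seed dictionary $D$ contains a great deal of noise and its quality is limited, so further fine-tuning is needed.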

-\parinterval Fine-tuning is generally based on Procrustes analysis\upcite{DBLP:journals/corr/MikolovLS13}. Suppose we have a seed dictionary $D=\left\{(x_{i}, y_{i})\right\}$ with $i \in \{1,\ldots,n\}$ and two monolingual word embeddings $\mathbi{X}$ and $\mathbi{Y}$. $D$ can then serve as {\small\bfnew{mapping anchors}}\index{Anchor} for learning a transformation matrix $\mathbi{W}$ such that the two spaces $\mathbi{W}\cdot \mathbi{X}$ and $\mathbi{Y}$ are as close as possible. Imposing an orthogonality constraint on $\mathbi{W}$ also improves performance significantly\upcite{DBLP:conf/naacl/XingWLL15}, which turns the optimization into the {\small\bfnew{Procrustes problem}}\index{Procrustes Problem}\upcite{DBLP:conf/iclr/SmithTHH17}. It can be solved in closed form by {\small\bfnew{singular value decomposition}} (SVD)\index{Singular Value Decomposition, SVD}:
+\parinterval Fine-tuning is generally based on Procrustes analysis\upcite{DBLP:journals/corr/MikolovLS13}. Suppose we have a seed dictionary $D=\left\{(x_{i}, y_{i})\right\}$ with $i \in \{1,\ldots,n\}$ and two monolingual word embeddings $\mathbi{X}$ and $\mathbi{Y}$. $D$ can then serve as {\small\bfnew{mapping anchors}}\index{Anchor} for learning a transformation matrix $\mathbi{W}$ such that the two spaces $\mathbi{W} \mathbi{X}$ and $\mathbi{Y}$ are as close as possible. Imposing an orthogonality constraint on $\mathbi{W}$ also improves performance significantly\upcite{DBLP:conf/naacl/XingWLL15}, which turns the optimization into the {\small\bfnew{Procrustes problem}}\index{Procrustes Problem}\upcite{DBLP:conf/iclr/SmithTHH17}. It can be solved in closed form by {\small\bfnew{singular value decomposition}} (SVD)\index{Singular Value Decomposition, SVD}:

 \begin{eqnarray}
-\mathbi{W}^{\star} & = &\underset{\mathbi{W} \in O_{d}(\mathbb{R})}{\operatorname{argmin}}\|\mathbi{W}\cdot \mathbi{X}'- \mathbi{Y}' \|_{\mathrm{F}}=\mathbi{U}\cdot \mathbi{V}^{\rm{T}} \label{eq:16-9} \\
-\textrm{s.t.\ \ \ \ } \mathbi{U} \Sigma \mathbi{V}^{\rm{T}} &= &\operatorname{SVD}\left(\mathbi{Y}'\cdot \mathbi{X}'^{\rm{T}}\right)
+\mathbi{W}^{\star} & = &\underset{\mathbi{W} \in O_{d}(\mathbb{R})}{\operatorname{argmin}}\|\mathbi{W} \mathbi{X}'- \mathbi{Y}' \|_{\mathrm{F}}=\mathbi{U} \mathbi{V}^{\rm{T}} \label{eq:16-9} \\
+\textrm{s.t.\ \ \ \ } \mathbi{U} \Sigma \mathbi{V}^{\rm{T}} &= &\operatorname{SVD}\left(\mathbi{Y}' \mathbi{X}'^{\rm{T}}\right)
 \label{eq:16-10}
 \end{eqnarray}
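
Note on the `chapter16.tex` hunk: the seed-dictionary induction step it describes (map each source embedding with $\mathbi{W}$, then take the nearest neighbor in $\mathbi{Y}$) can be illustrated with a minimal Python sketch. This is not code from the book. The function names, the toy 2-D embeddings, the 90-degree rotation used as `W`, and the choice of cosine similarity as the nearest-neighbor criterion are all assumptions made for illustration.

```python
import math


def apply_map(W, v):
    """Return the matrix-vector product W v (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_dictionary(W, src_emb, tgt_emb):
    """Pair every source word with the target word closest to W E(x).

    Assumption: "nearest neighbor" is measured by cosine similarity.
    """
    dictionary = {}
    for x, e_x in src_emb.items():
        mapped = apply_map(W, e_x)
        dictionary[x] = max(tgt_emb, key=lambda y: cosine(mapped, tgt_emb[y]))
    return dictionary


# Hypothetical toy data: the target space is roughly the source space
# rotated by 90 degrees, so W is that rotation matrix.
W = [[0.0, -1.0],
     [1.0, 0.0]]
src_emb = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "bird": [1.0, 1.0]}
tgt_emb = {"chat": [0.1, 1.0], "chien": [-1.0, 0.05], "oiseau": [-0.9, 1.1]}

seed_dictionary = induce_dictionary(W, src_emb, tgt_emb)
print(seed_dictionary)
```

With real embeddings the induced pairs would be noisy, which is why the text follows this step with Procrustes-based fine-tuning.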