Merge branch 'shanweiqiao' into 'caorunzhe'

Chapter 15 text

See merge request !939

Commit 24236d9b authored Jan 18, 2021 by 单韦乔
--- a/Chapter13/Figures/figure-bpe.tex
+++ b/Chapter13/Figures/figure-bpe.tex
@@ -11,11 +11,11 @@

 \node [anchor=west,tnode] (n3) at ([xshift=5em,yshift=0em]n2.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n31) at ([xshift=0.2em,yshift=-0.2em]n3.north west) {{\small 词表}\\low\\lower\\newest\\widest\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n32) at ([xshift=-0.2em,yshift=-0.2em]n3.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n32) at ([xshift=-0.2em,yshift=-0.2em]n3.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};

 \node [anchor=west,tnode] (n4) at ([xshift=5em,yshift=0em]n3.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n41) at ([xshift=0.2em,yshift=-0.2em]n4.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/e/s/t\\w/i/d/e/s/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n42) at ([xshift=-0.2em,yshift=-0.2em]n4.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n42) at ([xshift=-0.2em,yshift=-0.2em]n4.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};


 \begin{pgfonlayer}{background}
@@ -32,13 +32,13 @@

 \node [anchor=north west,tnode] (n6) at ([xshift=0em,yshift=-0.5em]n5.south west) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n61) at ([xshift=0.2em,yshift=-0.2em]n6.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/e/s/t\\w/i/d/e/s/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n62) at ([xshift=-0.2em,yshift=-0.2em]n6.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n62) at ([xshift=-0.2em,yshift=-0.2em]n6.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};

 \draw [->,thick,ublue] ([xshift=-0em,yshift=-0em]n4.south) .. controls +(south:4em) and +(north:4em) .. ([xshift=1em,yshift=-0em]n6.north);

 \node [anchor=west,pnode] (n7) at ([xshift=5em,yshift=0em]n6.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n71) at ([xshift=0.2em,yshift=-0.2em]n7.north west) {{\small 二元组}\\(e,s)\\(s,t)\\(l,o)\\(o,w)\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n72) at ([xshift=-0.2em,yshift=-0.2em]n7.north east) {{\small 频率}\\9\\9\\8\\8\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n72) at ([xshift=-0.2em,yshift=-0.2em]n7.north east) {{\small 频次}\\9\\9\\8\\8\\$\ldots$};

 \node [anchor=west,mnode] (n8) at ([xshift=5em,yshift=0em]n7.east) {};
 \node [anchor=north,align=center,font=\footnotesize] (n81) at ([xshift=0em,yshift=-0.2em]n8.north) {{\small 符号合并表}\\(e,s)};
@@ -51,14 +51,14 @@
 %第三排
 \node [anchor=north,tnode] (n9) at ([xshift=0em,yshift=-2.5em]n6.south) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n91) at ([xshift=0.2em,yshift=-0.2em]n9.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/{\red es}/t\\w/i/d/{\red es}/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n92) at ([xshift=-0.2em,yshift=-0.2em]n9.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n92) at ([xshift=-0.2em,yshift=-0.2em]n9.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};

 \draw [->,thick,ublue] ([xshift=-0em,yshift=-0em]n8.south) .. controls +(south:3em) and +(north:3em) .. ([xshift=1em,yshift=-0em]n9.north);
 \node [anchor=north west,ublue,font=\footnotesize,align=left] (l1) at ([xshift=1em,yshift=-0em]n7.south east) {在词表中\\[0.8ex]合并(e,s)};

 \node [anchor=west,pnode] (n10) at ([xshift=5em,yshift=0em]n9.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n101) at ([xshift=0.2em,yshift=-0.2em]n10.north west) {{\small 二元组}\\(es,t)\\(l,o)\\(o,w)\\(n,e)\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n102) at ([xshift=-0.2em,yshift=-0.2em]n10.north east) {{\small 频率}\\9\\8\\8\\6\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n102) at ([xshift=-0.2em,yshift=-0.2em]n10.north east) {{\small 频次}\\9\\8\\8\\6\\$\ldots$};

 \node [anchor=west,mnode] (n11) at ([xshift=5em,yshift=0em]n10.east) {};
 \node [anchor=north,align=center,font=\footnotesize] (n111) at ([xshift=0em,yshift=-0.2em]n11.north) {{\small 符号合并表}\\(e,s)\\(es,t)};

--- a/Chapter13/chapter13.tex
+++ b/Chapter13/chapter13.tex
@@ -109,7 +109,7 @@

 \parinterval 字节对编码或双字节编码（BPE）是一种常用的子词词表构建方法。BPE方法最早用于数据压缩，该方法将数据中常见的连续字符串替换为一个不存在的字符，之后通过构建一个替换关系的对应表，对压缩后的数据进行还原\upcite{Gage1994ANA}。机器翻译借用了这种思想，把子词切分看作是学习对自然语言句子进行压缩编码表示的问题\upcite{DBLP:conf/acl/SennrichHB16a}。其目的是，保证编码（即子词切分）后的结果占用的字节尽可能少。这样，子词单元会尽可能被不同单词复用，同时又不会因为使用过小的单元造成子词切分序列过长。

-\parinterval 使用BPE算法进行子词切分包含两个步骤。首先，通过统计的方法构造符号合并表，图\ref{fig:13-3}给出了BPE算法中符号合并表的构造过程。在得到了符号合并表后，使用符号合并表对用字符表示的单词进行合并，得到以子词形式表示的文本。BPE算法最开始将单词切分为以字符表示的符号序列，并在尾部加上终结符（为了便于理解，图\ref{fig:13-3}中没有包含终结符）。然后按照符号合并表的顺序依次遍历，找到存在于符号合并表的2-gram符号组合，则对其进行合并，直至遍历结束。图\ref{fig:13-4}给出了一个使用字符合并表对单词进行子词切分的实例。红色单元为每次合并后得到的新符号，直至无法合并，或遍历结束，得到最终的合并结果。其中每一个单元为一个子词。
+\parinterval 使用BPE算法进行子词切分包含两个步骤。首先，通过统计的方法构造符号合并表，具体的方式为：先对分过词的文本进行统计，得到词表和词频，同时将词表中的单词分割为字符表示；其次统计词表中所有出现的二元组的频次，选择当前频次最高的二元组加入符号合并表，并将所有词表中出现的该二元组合并为一个单元；不断地重复上述过程，直到合并表的大小达到预先设定的大小，或者无法继续合并。图\ref{fig:13-4}给出了一个使用字符合并表对单词进行子词切分的实例。红色单元为每次合并后得到的新符号，直至无法合并，或遍历结束，得到最终的合并结果。其中每一个单元为一个子词。

 %----------------------------------------------
 \begin{figure}[htp]
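
The merge-table construction described in the revised Chapter 13 paragraph (count adjacent symbol pairs weighted by word count, merge the most frequent pair, repeat) can be sketched in Python as below. This is a minimal illustration, not the book's reference implementation: the function names `count_pairs`, `merge_pair`, and `learn_merges` are hypothetical, the end-of-word marker is omitted as in the figure, and ties are broken by first occurrence, which is an assumption the text does not specify.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by the count of each word."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` in the vocabulary with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(word_freqs, num_merges):
    """Build the symbol merge table from a word -> count dictionary."""
    # Start from character-level symbols (assumption: no end-of-word marker).
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:  # nothing left to merge
            break
        # Assumption: ties go to the pair seen first during counting.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


# Word counts taken from the vocabulary shown in figure-bpe.tex.
word_freqs = {"low": 6, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_merges(word_freqs, 2)
```

With the four words from `figure-bpe.tex`, the first two learned merges are `(e, s)` and `(es, t)`, in line with the merge table drawn in the figure. The stopping rule here is a fixed number of merges; the paragraph also allows stopping when no further merge is possible, which the empty-`pairs` check covers.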

--- a/Chapter15/Figures/figure-relative-position-weight.tex
+++ b/Chapter15/Figures/figure-relative-position-weight.tex
 \begin{tikzpicture}

-\tikzstyle{node1} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!80]
-\tikzstyle{node2} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!50]
-\tikzstyle{node3} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!20]
+\tikzstyle{node1} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!80]
+\tikzstyle{node2} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!50]
+\tikzstyle{node3} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!20]
 \tikzstyle{node4} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt]
-\tikzstyle{node5} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!20]
-\tikzstyle{node6} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!50]
-\tikzstyle{node7} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!80]
+\tikzstyle{node5} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!20]
+\tikzstyle{node6} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!50]
+\tikzstyle{node7} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!80]

 \begin{scope}[scale=1.0]
 \foreach \i / \j / \k / \z in

--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
@@ -113,7 +113,7 @@
 \begin{figure}[htp]
 \centering
 \input{./Chapter15/Figures/figure-relative-position-weight}
-\caption{相对位置权重$\mathbi{a}_{ij}$}
+\caption{相对位置权重$\mathbi{a}_{ij}$\upcite{DBLP:conf/emnlp/HuangLXX20}}
 \label{fig:15-2}
 \end{figure}
 %-------------------------------------------
@@ -168,11 +168,7 @@ A_{ij}^{\rm rel} &=& \underbrace{\mathbi{E}_{x_i}\mathbi{W}_Q\mathbi{W}_{K}^{T}\

 \begin{itemize}
 \vspace{0.5em}
-\item {\small\bfnew{引入高斯约束}}\upcite{Yang2018ModelingLF}。如图\ref{fig:15-3}所示，这类方法的核心思想是引入可学习的高斯分布$\mathbi{G}$作为局部约束，与注意力权重进行融合，具体的形式如下：
-\begin{eqnarray}
-\mathbi{e}_{ij} &=& \frac{(\mathbi{x}_i \mathbi{W}_Q){(\mathbi{x}_j \mathbi{W}_K)}^{T}}{\sqrt{d_k}} + \mathbi{G}
-\label{eq:15-15}
-\end{eqnarray}
+\item {\small\bfnew{引入高斯约束}}\upcite{Yang2018ModelingLF}。如图\ref{fig:15-3}所示，这类方法的核心思想是引入可学习的高斯分布$\mathbi{G}$作为局部约束，与注意力权重进行融合。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -183,6 +179,12 @@ A_{ij}^{\rm rel} &=& \underbrace{\mathbi{E}_{x_i}\mathbi{W}_Q\mathbi{W}_{K}^{T}\
 \end{figure}
 %-------------------------------------------

+\noindent 具体的形式如下：
+\begin{eqnarray}
+\mathbi{e}_{ij} &=& \frac{(\mathbi{x}_i \mathbi{W}_Q){(\mathbi{x}_j \mathbi{W}_K)}^{T}}{\sqrt{d_k}} + \mathbi{G}
+\label{eq:15-15}
+\end{eqnarray}
+
 \noindent 其中，$\mathbi{G} \in \mathbb{R}^{m\times m}$。$\mathbi{G}$中的每个元素$G_{ij}$表示位置$j$和预测的中心位置$P_i$之间的关联程度，计算公式如下：
 \begin{eqnarray}
 G_{ij} &=& - \frac{{(j - P_i)}^2}{2\sigma_i^2}
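
The Gaussian bias in the two equations above (the term $G_{ij}$ added to the scaled dot-product scores) can be made concrete with a small Python sketch. It is illustrative only: `gaussian_bias` and `add_bias` are hypothetical names, positions are assumed to be 0-indexed, and the centers $P_i$ and widths $\sigma_i$ are passed in as plain numbers, whereas in the cited method they are predicted by the model.

```python
def gaussian_bias(centers, sigmas):
    """Return the m x m matrix G with G[i][j] = -(j - P_i)^2 / (2 * sigma_i^2)."""
    m = len(centers)
    return [
        [-((j - p) ** 2) / (2.0 * s ** 2) for j in range(m)]
        for p, s in zip(centers, sigmas)
    ]


def add_bias(scores, bias):
    """Add the bias elementwise to the (already scaled) attention scores e_ij."""
    return [
        [e + g for e, g in zip(score_row, bias_row)]
        for score_row, bias_row in zip(scores, bias)
    ]


# Three positions; each query is centered on its own position with sigma = 1.
G = gaussian_bias(centers=[0.0, 1.0, 2.0], sigmas=[1.0, 1.0, 1.0])
biased = add_bias([[0.0] * 3 for _ in range(3)], G)
```

Every entry of `G` is at most zero and equals zero only at the predicted center, so adding it before the softmax leaves the score at the center unchanged and lowers the scores of positions farther away, which is the local constraint the text describes.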
@@ -719,7 +721,6 @@ C(\mathbi{x}_j \mathbi{W}_K,\omega) &=& (\mathbi{x}_{j-\omega},\ldots,\mathbi{x}
 {\bm \omega_{l+1}} &=& \sqrt{\sum_{j<l}\textrm{Var}[F_{l+1}(\mathbi{x}_l)]}
 \label{eq:15-48}
 \end{eqnarray}
-\vspace{0.5em}
 \end{itemize}

 \parinterval 这种动态的参数初始化方法不受限于具体的模型结构，方法有较好的通用性。
@@ -937,7 +938,7 @@ lr &=& d_{\textrm{model}}^{-0.5}\cdot step\_num^{-0.5}

 \noindent 可以翻译成：
 \begin{equation}
-\textrm{“私は緑茶を飲んでいます。”} （{\color{red} 日语单词没有切分？？？可以问一下张妍}）\nonumber
+\textrm{“私/は/緑茶/を/飲んでいます。”} \nonumber
 \end{equation}

 \parinterval 在标准的英语到日语的翻译中，英语短语“a cup of green tea”只会被翻译为“緑茶”一词。在加入句法树后，“a cup of green tea”会作为树中一个节点，这样可以更容易把英语短语作为一个整体进行翻译。

--- a/bibliography.bib
+++ b/bibliography.bib
@@ -9345,6 +9345,16 @@ author    = {Zhuang Liu and
  publisher={arXiv preprint arXiv:2002.11794},
  year={2020}
 }
+@inproceedings{DBLP:conf/emnlp/HuangLXX20,
+  author    = {Zhiheng Huang and
+               Davis Liang and
+               Peng Xu and
+               Bing Xiang},
+  title     = {Improve Transformer Models with Better Relative Position Embeddings},
+  pages     = {3327--3335},
+  publisher = {Conference on Empirical Methods in Natural Language Processing},
+  year      = {2020}
+}
 %%%%% chapter 15------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%