Merge branch 'caorunzhe' into 'master'

Caorunzhe (see merge request !940)

0f75ac50 · 曹润柘 · 78030d08 · 66284ba4 · 0f75ac50 · 0f75ac50
Commit 0f75ac50 authored Jan 18, 2021 by 曹润柘
--- a/Chapter13/Figures/figure-bpe.tex
+++ b/Chapter13/Figures/figure-bpe.tex
@@ -11,11 +11,11 @@
 \node [anchor=west,tnode] (n3) at ([xshift=5em,yshift=0em]n2.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n31) at ([xshift=0.2em,yshift=-0.2em]n3.north west) {{\small 词表}\\low\\lower\\newest\\widest\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n32) at ([xshift=-0.2em,yshift=-0.2em]n3.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n32) at ([xshift=-0.2em,yshift=-0.2em]n3.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};
 \node [anchor=west,tnode] (n4) at ([xshift=5em,yshift=0em]n3.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n41) at ([xshift=0.2em,yshift=-0.2em]n4.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/e/s/t\\w/i/d/e/s/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n42) at ([xshift=-0.2em,yshift=-0.2em]n4.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n42) at ([xshift=-0.2em,yshift=-0.2em]n4.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};
 \begin{pgfonlayer}{background}
@@ -32,13 +32,13 @@
 \node [anchor=north west,tnode] (n6) at ([xshift=0em,yshift=-0.5em]n5.south west) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n61) at ([xshift=0.2em,yshift=-0.2em]n6.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/e/s/t\\w/i/d/e/s/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n62) at ([xshift=-0.2em,yshift=-0.2em]n6.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n62) at ([xshift=-0.2em,yshift=-0.2em]n6.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};
 \draw [->,thick,ublue] ([xshift=-0em,yshift=-0em]n4.south) .. controls +(south:4em) and +(north:4em) .. ([xshift=1em,yshift=-0em]n6.north);
 \node [anchor=west,pnode] (n7) at ([xshift=5em,yshift=0em]n6.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n71) at ([xshift=0.2em,yshift=-0.2em]n7.north west) {{\small 二元组}\\(e,s)\\(s,t)\\(l,o)\\(o,w)\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n72) at ([xshift=-0.2em,yshift=-0.2em]n7.north east) {{\small 频率}\\9\\9\\8\\8\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n72) at ([xshift=-0.2em,yshift=-0.2em]n7.north east) {{\small 频次}\\9\\9\\8\\8\\$\ldots$};
 \node [anchor=west,mnode] (n8) at ([xshift=5em,yshift=0em]n7.east) {};
 \node [anchor=north,align=center,font=\footnotesize] (n81) at ([xshift=0em,yshift=-0.2em]n8.north) {{\small 符号合并表}\\(e,s)};
@@ -51,14 +51,14 @@
 %第三排
 \node [anchor=north,tnode] (n9) at ([xshift=0em,yshift=-2.5em]n6.south) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n91) at ([xshift=0.2em,yshift=-0.2em]n9.north west) {{\small 词表}\\l/o/w\\l/o/w/e/r\\n/e/w/{\red es}/t\\w/i/d/{\red es}/t\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n92) at ([xshift=-0.2em,yshift=-0.2em]n9.north east) {{\small 频率}\\6\\2\\6\\3\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n92) at ([xshift=-0.2em,yshift=-0.2em]n9.north east) {{\small 频次}\\6\\2\\6\\3\\$\ldots$};
 \draw [->,thick,ublue] ([xshift=-0em,yshift=-0em]n8.south) .. controls +(south:3em) and +(north:3em) .. ([xshift=1em,yshift=-0em]n9.north);
 \node [anchor=north west,ublue,font=\footnotesize,align=left] (l1) at ([xshift=1em,yshift=-0em]n7.south east) {在词表中\\[0.8ex]合并(e,s)};
 \node [anchor=west,pnode] (n10) at ([xshift=5em,yshift=0em]n9.east) {};
 \node [anchor=north west,align=left,font=\footnotesize] (n101) at ([xshift=0.2em,yshift=-0.2em]n10.north west) {{\small 二元组}\\(es,t)\\(l,o)\\(o,w)\\(n,e)\\$\ldots$};
-\node [anchor=north east,align=center,font=\footnotesize] (n102) at ([xshift=-0.2em,yshift=-0.2em]n10.north east) {{\small 频率}\\9\\8\\8\\6\\$\ldots$};
+\node [anchor=north east,align=center,font=\footnotesize] (n102) at ([xshift=-0.2em,yshift=-0.2em]n10.north east) {{\small 频次}\\9\\8\\8\\6\\$\ldots$};
 \node [anchor=west,mnode] (n11) at ([xshift=5em,yshift=0em]n10.east) {};
 \node [anchor=north,align=center,font=\footnotesize] (n111) at ([xshift=0em,yshift=-0.2em]n11.north) {{\small 符号合并表}\\(e,s)\\(es,t)};

--- a/Chapter13/chapter13.tex
+++ b/Chapter13/chapter13.tex
@@ -109,7 +109,7 @@
 \parinterval 字节对编码或双字节编码（BPE）是一种常用的子词词表构建方法。BPE方法最早用于数据压缩，该方法将数据中常见的连续字符串替换为一个不存在的字符，之后通过构建一个替换关系的对应表，对压缩后的数据进行还原\upcite{Gage1994ANA}。机器翻译借用了这种思想，把子词切分看作是学习对自然语言句子进行压缩编码表示的问题\upcite{DBLP:conf/acl/SennrichHB16a}。其目的是，保证编码（即子词切分）后的结果占用的字节尽可能少。这样，子词单元会尽可能被不同单词复用，同时又不会因为使用过小的单元造成子词切分序列过长。
-\parinterval 使用BPE算法进行子词切分包含两个步骤。首先，通过统计的方法构造符号合并表，图\ref{fig:13-3}给出了BPE算法中符号合并表的构造过程。在得到了符号合并表后，使用符号合并表对用字符表示的单词进行合并，得到以子词形式表示的文本。BPE算法最开始将单词切分为以字符表示的符号序列，并在尾部加上终结符（为了便于理解，图\ref{fig:13-3}中没有包含终结符）。然后按照符号合并表的顺序依次遍历，找到存在于符号合并表的2-gram符号组合，则对其进行合并，直至遍历结束。图\ref{fig:13-4}给出了一个使用字符合并表对单词进行子词切分的实例。红色单元为每次合并后得到的新符号，直至无法合并，或遍历结束，得到最终的合并结果。其中每一个单元为一个子词。
+\parinterval 使用BPE算法进行子词切分包含两个步骤。首先，通过统计的方法构造符号合并表（见图\ref{fig:13-3}），具体的方式为：先对分过词的文本进行统计，得到词表和词频，同时将词表中的单词分割为字符表示；其次统计词表中所有出现的二元组的频次，选择当前频次最高的二元组加入符号合并表，并将所有词表中出现的该二元组合并为一个单元；不断地重复上述过程，直到合并表的大小达到预先设定的大小，或者无法继续合并。然后，在得到了符号合并表后，使用符号合并表对用字符表示的单词进行合并，得到以子词形式表示的文本。图\ref{fig:13-4}给出了一个使用符号合并表对单词进行子词切分的实例。红色单元为每次合并后得到的新符号，直至无法合并，或遍历结束，得到最终的合并结果。其中每一个单元为一个子词。
 %----------------------------------------------
 \begin{figure}[htp]
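
The merge-table construction described in the rewritten paragraph above (count symbol pairs, merge the most frequent one, repeat) can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the function name `learn_bpe`, the dictionary input format, and the tie-breaking rule (first pair encountered wins) are assumptions, and, like figure-bpe.tex, the sketch omits the end-of-word symbol.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn a symbol merge table from a {word: count} dictionary.

    Hypothetical helper for illustration; returns (merges, vocab).
    """
    # Split every word into a tuple of single-character symbols.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by the word count.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        # Assumption: ties are broken in favour of the first pair seen.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace each occurrence of the best pair with one merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Word counts taken from the figure: low 6, lower 2, newest 6, widest 3.
merges, vocab = learn_bpe({"low": 6, "lower": 2, "newest": 6, "widest": 3}, 2)
print(merges)
```

With the four words from the figure, the first two merges are (e,s) and (es,t), matching the symbol merge table drawn in the third row of the figure.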

--- a/Chapter15/Figures/figure-relative-position-weight.tex
+++ b/Chapter15/Figures/figure-relative-position-weight.tex
 \begin{tikzpicture}
-\tikzstyle{node1} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!80]
+\tikzstyle{node1} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!80]
-\tikzstyle{node2} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!50]
+\tikzstyle{node2} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!50]
-\tikzstyle{node3} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=green!20]
+\tikzstyle{node3} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=ugreen!20]
 \tikzstyle{node4} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt]
-\tikzstyle{node5} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!20]
+\tikzstyle{node5} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!20]
-\tikzstyle{node6} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!50]
+\tikzstyle{node6} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!50]
-\tikzstyle{node7} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=red!80]
+\tikzstyle{node7} = [anchor=center,draw,minimum height=2em,minimum width=2em,inner sep=0pt,fill=teal!80]
 \begin{scope}[scale=1.0]
 \foreach \i / \j / \k / \z in

--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
@@ -113,7 +113,7 @@
 \begin{figure}[htp]
 \centering
 \input{./Chapter15/Figures/figure-relative-position-weight}
-\caption{相对位置权重$\mathbi{a}_{ij}$}
+\caption{相对位置权重$\mathbi{a}_{ij}$\upcite{DBLP:conf/emnlp/HuangLXX20}}
 \label{fig:15-2}
 \end{figure}
 %-------------------------------------------
@@ -168,11 +168,7 @@ A_{ij}^{\rm rel} &=& \underbrace{\mathbi{E}_{x_i}\mathbi{W}_Q\mathbi{W}_{K}^{T}\
 \begin{itemize}
 \vspace{0.5em}
-\item {\small\bfnew{引入高斯约束}}\upcite{Yang2018ModelingLF}。如图\ref{fig:15-3}所示，这类方法的核心思想是引入可学习的高斯分布$\mathbi{G}$作为局部约束，与注意力权重进行融合，具体的形式如下：
+\item {\small\bfnew{引入高斯约束}}\upcite{Yang2018ModelingLF}。如图\ref{fig:15-3}所示，这类方法的核心思想是引入可学习的高斯分布$\mathbi{G}$作为局部约束，与注意力权重进行融合。
-\begin{eqnarray}
-\mathbi{e}_{ij} &=& \frac{(\mathbi{x}_i \mathbi{W}_Q){(\mathbi{x}_j \mathbi{W}_K)}^{T}}{\sqrt{d_k}} + \mathbi{G}
-\label{eq:15-15}
-\end{eqnarray}
 %----------------------------------------------
 \begin{figure}[htp]
@@ -183,6 +179,12 @@ A_{ij}^{\rm rel} &=& \underbrace{\mathbi{E}_{x_i}\mathbi{W}_Q\mathbi{W}_{K}^{T}\
 \end{figure}
 %-------------------------------------------
+\noindent 具体的形式如下：
+\begin{eqnarray}
+\mathbi{e}_{ij} &=& \frac{(\mathbi{x}_i \mathbi{W}_Q){(\mathbi{x}_j \mathbi{W}_K)}^{T}}{\sqrt{d_k}} + \mathbi{G}
+\label{eq:15-15}
+\end{eqnarray}
 \noindent 其中，$\mathbi{G} \in \mathbb{R}^{m\times m}$。$\mathbi{G}$中的每个元素$G_{ij}$表示位置$j$和预测的中心位置$P_i$之间的关联程度，计算公式如下：
 \begin{eqnarray}
 G_{ij} &=& - \frac{{(j - P_i)}^2}{2\sigma_i^2}
@@ -719,7 +721,6 @@ C(\mathbi{x}_j \mathbi{W}_K,\omega) &=& (\mathbi{x}_{j-\omega},\ldots,\mathbi{x}
 {\bm \omega_{l+1}} &=& \sqrt{\sum_{j<l}\textrm{Var}[F_{l+1}(\mathbi{x}_l)]}
 \label{eq:15-48}
 \end{eqnarray}
-\vspace{0.5em}
 \end{itemize}
 \parinterval 这种动态的参数初始化方法不受限于具体的模型结构，方法有较好的通用性。
@@ -937,7 +938,7 @@ lr &=& d_{\textrm{model}}^{-0.5}\cdot step\_num^{-0.5}
 \noindent 可以翻译成：
 \begin{equation}
-\textrm{“私は緑茶を飲んでいます。”} （{\color{red} 日语单词没有切分？？？可以问一下张妍}）\nonumber
+\textrm{“私/は/緑茶/を/飲んでいます。”} \nonumber
 \end{equation}
 \parinterval 在标准的英语到日语的翻译中，英语短语“a cup of green tea”只会被翻译为“緑茶”一词。在加入句法树后，“a cup of green tea”会作为树中一个节点，这样可以更容易把英语短语作为一个整体进行翻译。
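
For the Gaussian constraint moved in the hunk above (eq:15-15 and the definition of $G_{ij}$), a small Python sketch may help make the effect of the bias concrete. It is only an illustration under stated assumptions: in the text $P_i$ and $\sigma_i$ are predicted by the model, whereas here they are passed in as plain numbers; the function names are hypothetical; and the softmax step is the usual attention normalisation, which is assumed here rather than shown in this diff.

```python
import math


def gaussian_bias(centers, sigmas, m):
    """Build the m x m matrix G with G[i][j] = -(j - P_i)^2 / (2 * sigma_i^2).

    centers[i] plays the role of P_i and sigmas[i] the role of sigma_i.
    """
    return [
        [-((j - centers[i]) ** 2) / (2.0 * sigmas[i] ** 2) for j in range(m)]
        for i in range(m)
    ]


def add_bias_and_softmax(scores, bias):
    """Add G to the scaled dot-product scores e_ij, then normalise each row."""
    weights = []
    for e_row, g_row in zip(scores, bias):
        logits = [e + g for e, g in zip(e_row, g_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights.append([v / total for v in exps])
    return weights


# Three positions, every query centred on position 1 with sigma = 1.
G = gaussian_bias([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 3)
flat_scores = [[0.0, 0.0, 0.0] for _ in range(3)]
attention = add_bias_and_softmax(flat_scores, G)
print(G[0], attention[0])
```

Because $G_{ij}$ is zero at the centre and increasingly negative away from it, uniform scores end up concentrated around $P_i$ after normalisation.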

--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -75,7 +75,7 @@
 \parinterval 经过上面的描述可以看出，音频的表示实际上是一个非常长的采样点序列，这导致了直接使用现有的深度学习技术处理音频序列较为困难。并且，原始的音频信号中可能包含着较多的噪声、环境声或冗余信息，也会对模型产生干扰。因此，一般会对音频序列进行处理来提取声学特征，具体为将长序列的采样点序列转换为短序列的特征向量序列，再用于下游系统。虽然已有一些工作不依赖特征提取，直接在原始的采样点序列上进行声学建模和模型训练\upcite{DBLP:conf/interspeech/SainathWSWV15}，但目前的主流方法仍然是基于声学特征进行建模\upcite{DBLP:conf/icassp/MohamedHP12}。
-\parinterval 声学特征提取的第一步是预处理。其流程主要是对音频进行预加重、分帧和加窗。预加重是通过增强音频信号中的高频部分来减弱语音中对高频信号的抑制，使频谱更加顺滑。分帧（原理如图\ref{fig:17-3}所示）是基于短时平稳假设，即根据生物学特征，语音信号是一个缓慢变化的过程，10ms$\thicksim$30ms的信号片段是相对平稳的。基于这个假设，一般将每25ms作为一帧来提取特征，这个时间称为{\small\bfnew{帧长}}\index{帧长}（Frame Length）\index{Frame Length}。同时，为了保证不同帧之间的信号平滑性，使每两个相邻帧之间存在一定的重合部分。一般每隔10ms取一帧，这个时长称为{\small\bfnew{帧移}}\index{帧移}（Frame Shift）\index{Frame Shift}。为了缓解分帧带来的频谱泄漏问题，需要对每帧的信号进行加窗处理使其幅度在两段渐变到0，一般采用的是{\small\bfnew{汉明窗}}\index{汉明窗}（Hamming）\index{Hamming}\upcite{洪青阳2020语音识别原理与应用}。
+\parinterval 声学特征提取的第一步是预处理。其流程主要是对音频进行{\small\bfnew{预加重}}（Pre-emphasis）\index{预加重}\index{Pre-emphasis}、{\small\bfnew{分帧}}\index{分帧}（Framing）\index{Framing}和{\small\bfnew{加窗}}\index{加窗}（Windowing）\index{Windowing}。预加重是通过增强音频信号中的高频部分来减弱语音中对高频信号的抑制，使频谱更加顺滑。分帧（原理如图\ref{fig:17-3}所示）是基于短时平稳假设，即根据生物学特征，语音信号是一个缓慢变化的过程，10ms$\thicksim$30ms的信号片段是相对平稳的。基于这个假设，一般将每25ms作为一帧来提取特征，这个时间称为{\small\bfnew{帧长}}\index{帧长}（Frame Length）\index{Frame Length}。同时，为了保证不同帧之间的信号平滑性，使每两个相邻帧之间存在一定的重合部分。一般每隔10ms取一帧，这个时长称为{\small\bfnew{帧移}}\index{帧移}（Frame Shift）\index{Frame Shift}。为了缓解分帧带来的频谱泄漏问题，需要对每帧的信号进行加窗处理使其幅度在两端渐变到0，一般采用的是{\small\bfnew{汉明窗}}\index{汉明窗}（Hamming）\index{Hamming}\upcite{洪青阳2020语音识别原理与应用}。
 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
 \centering
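
The preprocessing pipeline named in the paragraph above (pre-emphasis, framing with a 25ms frame length and 10ms frame shift, Hamming windowing) can be sketched in plain Python. The 16 kHz sample rate, the pre-emphasis coefficient 0.97, and all function names are assumptions chosen for illustration; none of them appear in the text.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[t] = x[t] - alpha * x[t-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[t] - alpha * signal[t - 1] for t in range(1, len(signal))
    ]


def hamming(n):
    """Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Cut the signal into overlapping frames and apply a Hamming window."""
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000      # 160 samples at 16 kHz
    window = hamming(frame_len)
    frames = []
    # Only full frames are kept; trailing samples that do not fill a frame
    # are dropped in this sketch.
    for start in range(0, len(signal) - frame_len + 1, shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# One second of a constant signal at the assumed 16 kHz sample rate.
frames = frame_signal([1.0] * 16000)
print(len(frames), len(frames[0]))
```

Note that the Hamming window tapers each frame to about 0.08 of its amplitude at the edges rather than exactly to zero; adjacent frames overlap by 15ms under these settings.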

--- a/bibliography.bib
+++ b/bibliography.bib
@@ -9345,6 +9345,16 @@ author    = {Zhuang Liu and
  publisher={arXiv preprint arXiv:2002.11794},
  year={2020}
 }
+@inproceedings{DBLP:conf/emnlp/HuangLXX20,
+  author    = {Zhiheng Huang and
+               Davis Liang and
+               Peng Xu and
+               Bing Xiang},
+  title     = {Improve Transformer Models with Better Relative Position Embeddings},
+  pages     = {3327--3335},
+  publisher = {Conference on Empirical Methods in Natural Language Processing},
+  year      = {2020}
+}
 %%%%% chapter 15------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

--- a/structure.tex
+++ b/structure.tex
@@ -628,6 +628,8 @@ addtohook={%
 %\usetikzlibrary{arrows}
 %\usetikzlibrary{decorations}
 \usetikzlibrary{arrows,shapes}
+\usepackage{xeCJK}
+\newfontfamily{\yh}{微软雅黑}
 %%%%%%%%%%%chapter5图片等---------------------------------------
 \usepackage{tikz-3dplot}