Commit a3a098db by 孟霞

Merge branch 'master' into 'mengxia'

Master

See merge request !96
parents 317d5605 b1ec1728
......@@ -725,7 +725,9 @@ His house is on the south bank of the river.
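\parinterval 《Neural Network Methods for Natural Language Processing》\cite{goldberg2017neural}, written by Yoav Goldberg, is a reference book on deep learning for natural language processing. Compared with 《Deep Learning》, it focuses on deep learning methods in natural language processing and is easier to read, which makes it well suited to readers who are new to natural language processing and applied deep learning.
\parinterval 《机器学习》 (Machine Learning)\cite{周志华2016机器学习} was written by Professor Zhou Zhihua (周志华) of Nanjing University. As an introductory textbook for the field, it covers as many aspects of the fundamentals of machine learning as possible and tries to present machine learning methods and ideas with as little mathematics as possible. Many of the machine learning concepts and methods used in machine translation can be learned from this book.
\parinterval 《机器学习》 (Machine Learning)\cite{周志华2016机器学习} was written by Professor Zhou Zhihua (周志华) of Nanjing University. As an introductory textbook for the field, it covers as many aspects of the fundamentals of machine learning as possible and tries to present machine learning methods and ideas with as little mathematics as possible.
\parinterval 《统计学习方法》 (Statistical Learning Methods) ({\red reference}), by Dr. Li Hang (李航), gives a comprehensive and systematic introduction to supervised and unsupervised machine learning methods. It can serve as a reference for organizing one's knowledge of machine learning and for learning the related basic concepts.
\parinterval 《神经网络与深度学习》 (Neural Networks and Deep Learning)\cite{邱锡鹏2020神经网络与深度学习}, by Professor Qiu Xipeng (邱锡鹏) of Fudan University, gives a comprehensive introduction to the basic concepts and common techniques of neural networks and deep learning, and also covers many cutting-edge deep learning methods. It is suitable for beginners while also serving as a reference for professionals.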
......
......@@ -225,7 +225,7 @@
\parinterval Chinese: 今天\ \ 天气\ \ 不错\ \
\parinterval English: Let's\ \ go\ \ !
\parinterval English: Let's\ \ go\ \ !
\vspace{1em}
......@@ -521,7 +521,7 @@ y_{j}^{ls}=(1-\alpha) \cdot \tilde{y}_j + \alpha \cdot q
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Layer Dropout}
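\parinterval As the number of network layers grows, co-adaptation also appears between different layers. In particular, once residual connections are introduced, the outputs of different layers can be linearly combined, so the layers influence one another more directly. To address this, the idea of Dropout can also be used to mask entire layers. For example, a switch can control whether a layer takes effect; the switch is randomly turned off with probability $p$, i.e., the layer is inactive with probability $p$. Figure \ref{fig:7-15} shows a multi-layer Transformer network before and after introducing Layer Dropout. With Layer Dropout, the switch M is randomly turned on or off so as to mask the computation of a layer. Because residual connections are used, turning off any layer amounts to ``skipping'' that layer, so Layer Dropout does not interrupt the flow of data through the network.
\parinterval As the number of network layers grows, co-adaptation also appears between different layers. In particular, once residual connections are introduced, the outputs of different layers can be linearly combined, so the layers influence one another more directly. To address this, the idea of Dropout can also be used to mask entire layers. For example, a switch can control whether a layer takes effect; the switch is randomly turned off with probability $p$, i.e., the layer is inactive with probability $p$. Figure \ref{fig:7-15} shows a multi-layer Transformer network before and after introducing Layer Dropout. With Layer Dropout, the switch M is randomly turned on or off so as to mask the computation of a layer. Because residual connections are used, turning off any layer amounts to ``skipping'' that layer, so Layer Dropout does not interrupt the flow of data through the network.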
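\parinterval The switch mechanism described above can be written as a short formula. This is only a sketch under assumptions that the original text does not state: $x_l$ denotes the input of layer $l$, $F_l(\cdot)$ the computation performed by that layer, and $M_l \in \{0,1\}$ the switch of layer $l$. These symbols are introduced here purely for illustration, and any normalization inside the layer is ignored.

```latex
\begin{eqnarray}
x_{l+1} & = & x_l + M_l \cdot F_l(x_l) \\
\textrm{P}(M_l = 0) & = & p, \qquad \textrm{P}(M_l = 1) = 1 - p
\end{eqnarray}
```

\parinterval When $M_l = 0$, the first line reduces to $x_{l+1} = x_l$: the residual connection passes the input through unchanged and the layer is ``skipped''. When $M_l = 1$, it is an ordinary residual layer.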
%----------------------------------------------
% 图7.
......@@ -989,7 +989,7 @@ y_{j}^{ls}=(1-\alpha) \cdot \tilde{y}_j + \alpha \cdot q
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{长度惩罚因子}
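\parinterval The most common method is to normalize the translation probability directly, that is, to normalize it by the length of the translation. Let the source sentence be $\mathbf{x}=\{ x_1, ...,x_m \}$ and the translation be $\mathbf{y}=\{ y_1,...,y_n\}$; the translation model score $\textrm{score}(\mathbf{x},\mathbf{y})$ can then be defined as:
\parinterval The most common method is to normalize the translation probability directly, that is, to normalize it by the length of the translation. Chapter 6 has already introduced length normalization; for the sake of continuity, the relevant content is briefly reviewed here. Let the source sentence be $\mathbf{x}=\{ x_1, ...,x_m \}$ and the translation be $\mathbf{y}=\{ y_1,...,y_n\}$; the translation model score $\textrm{score}(\mathbf{x},\mathbf{y})$ can then be defined as: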
\begin{eqnarray}
\textrm{score}(\mathbf{x},\mathbf{y}) = \textrm{log}(\textrm{P}(\mathbf{y} | \mathbf{x}))
\label{eq:7-8}
......@@ -1539,7 +1539,7 @@ p_l=\frac{l}{2L}\cdot \varphi
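\parinterval Beyond multi-task learning, some methods train the forward and backward models together, using monolingual data on both the source and target sides during training to improve model performance. This bidirectional training is described in Section \ref{subsection-7.5.4}.
%--7.5.3 Knowledge Distillation---------------------
\subsection{Knowledge Distillation}
\subsection{Knowledge Distillation}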
\label{subsection-7.5.3}
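\parinterval An ideal machine translation system should offer high quality, high speed, and a small storage footprint. In practice, however, machine translation systems often have to trade running speed and storage space for translation quality. For example, the approach of increasing model capacity mentioned in Section \ref{subsection-7.3.2} achieves a better function fit by increasing the number of model parameters, but it also makes the system more cumbersome. In many scenarios such models cannot be used at all. For instance, ``big'' models such as Transformer-Big usually run on dedicated GPU servers and remain hard to deploy in constrained environments such as mobile phones.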
......
\indexentry{流畅度|hyperpage}{12}
\indexentry{Fluency|hyperpage}{12}
\indexentry{准确性|hyperpage}{12}
\indexentry{Accuracy|hyperpage}{12}
\indexentry{充分性|hyperpage}{12}
\indexentry{Adequacy|hyperpage}{12}
\indexentry{翻译候选|hyperpage}{13}
\indexentry{Translation Candidate|hyperpage}{13}
\indexentry{训练|hyperpage}{15}
\indexentry{Training|hyperpage}{15}
\indexentry{解码|hyperpage}{15}
\indexentry{Decoding|hyperpage}{15}
\indexentry{推断|hyperpage}{15}
\indexentry{Inference|hyperpage}{15}
\indexentry{词对齐|hyperpage}{20}
\indexentry{Word Alignment|hyperpage}{20}
\indexentry{词对齐连接|hyperpage}{20}
\indexentry{解码|hyperpage}{23}
\indexentry{Decoding|hyperpage}{23}
\indexentry{噪声信道模型|hyperpage}{26}
\indexentry{Noisy Channel Model|hyperpage}{26}
\indexentry{词对齐|hyperpage}{29}
\indexentry{Word Alignment|hyperpage}{29}
\indexentry{非对称的词对齐|hyperpage}{29}
\indexentry{Asymmetric Word Alignment|hyperpage}{29}
\indexentry{空对齐|hyperpage}{29}
\indexentry{拉格朗日乘数法|hyperpage}{37}
\indexentry{The Lagrange Multiplier Method|hyperpage}{37}
\indexentry{期望最大化|hyperpage}{40}
\indexentry{Expectation Maximization|hyperpage}{40}
\indexentry{期望频次|hyperpage}{40}
\indexentry{Expected Count|hyperpage}{41}
\indexentry{产出率|hyperpage}{44}
\indexentry{繁衍率|hyperpage}{44}
\indexentry{Fertility|hyperpage}{44}
\indexentry{扭曲度|hyperpage}{46}
\indexentry{Distortion|hyperpage}{46}
\indexentry{概念单元|hyperpage}{48}
\indexentry{概念|hyperpage}{48}
\indexentry{Concept|hyperpage}{48}
\indexentry{缺陷|hyperpage}{49}
\indexentry{Deficiency|hyperpage}{49}
\indexentry{凸函数|hyperpage}{54}
\indexentry{Convex Function|hyperpage}{54}
\indexentry{对称化|hyperpage}{55}
\indexentry{Symmetrization|hyperpage}{55}
\indexentry{系统偏置|hyperpage}{56}
\indexentry{System Bias|hyperpage}{56}
\boolfalse {citerequest}\boolfalse {citetracker}\boolfalse {pagetracker}\boolfalse {backtracker}\relax
\babel@toc {english}{}
\defcounter {refsection}{0}\relax
\select@language {english}
\defcounter {refsection}{0}\relax
\contentsline {part}{\@mypartnumtocformat {I}{机器翻译基础}}{15}{part.1}
\contentsline {part}{\@mypartnumtocformat {I}{统计机器翻译}}{9}{part.1}%
\ttl@starttoc {default@1}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {1}机器翻译简介}{17}{chapter.1}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.1}机器翻译的概念}{17}{section.1.1}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.2}机器翻译简史}{20}{section.1.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.2.1}人工翻译}{20}{subsection.1.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.2.2}机器翻译的萌芽}{21}{subsection.1.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.2.3}机器翻译的受挫}{22}{subsection.1.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.2.4}机器翻译的快速成长}{23}{subsection.1.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.2.5}机器翻译的爆发}{24}{subsection.1.2.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.3}机器翻译现状}{25}{section.1.3}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.4}机器翻译方法}{26}{section.1.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.4.1}基于规则的机器翻译}{28}{subsection.1.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.4.2}基于实例的机器翻译}{28}{subsection.1.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.4.3}统计机器翻译}{29}{subsection.1.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.4.4}神经机器翻译}{30}{subsection.1.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.4.5}对比分析}{31}{subsection.1.4.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.5}翻译质量评价}{32}{section.1.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.5.1}人工评价}{32}{subsection.1.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.5.2}自动评价}{33}{subsection.1.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{BLEU}{33}{section*.16}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{TER}{35}{section*.17}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于检测点的评价}{35}{section*.18}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.6}机器翻译应用}{36}{section.1.6}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.7}开源项目与评测}{38}{section.1.7}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.7.1}开源机器翻译系统}{38}{subsection.1.7.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{统计机器翻译开源系统}{39}{section*.20}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{神经机器翻译开源系统}{40}{section*.21}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {1.7.2}常用数据集及公开评测任务}{42}{subsection.1.7.2}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {1.8}推荐学习资源}{44}{section.1.8}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {2}词法、语法及统计建模基础}{49}{chapter.2}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.1}问题概述 }{50}{section.2.1}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.2}概率论基础}{51}{section.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.2.1}随机变量和概率}{52}{subsection.2.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.2.2}联合概率、条件概率和边缘概率}{53}{subsection.2.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.2.3}链式法则}{54}{subsection.2.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.2.4}贝叶斯法则}{55}{subsection.2.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.2.5}KL距离和熵}{57}{subsection.2.2.5}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{信息熵}{57}{section*.28}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{KL距离}{58}{section*.30}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{交叉熵}{58}{section*.31}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.3}中文分词}{59}{section.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.3.1}基于词典的分词方法}{60}{subsection.2.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.3.2}基于统计的分词方法}{61}{subsection.2.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{统计模型的学习与推断}{61}{section*.35}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{掷骰子游戏}{62}{section*.37}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{全概率分词方法}{64}{section*.41}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.4}$n$-gram语言模型 }{66}{section.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.4.1}建模}{67}{subsection.2.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.4.2}未登录词和平滑算法}{69}{subsection.2.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{加法平滑方法}{70}{section*.47}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{古德-图灵估计法}{71}{section*.49}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{Kneser-Ney平滑方法}{72}{section*.51}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.5}句法分析(短语结构分析)}{74}{section.2.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.5.1}句子的句法树表示}{74}{subsection.2.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.5.2}上下文无关文法}{76}{subsection.2.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {2.5.3}规则和推导的概率}{80}{subsection.2.5.3}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {2.6}小结及深入阅读}{82}{section.2.6}
\defcounter {refsection}{0}\relax
\contentsline {part}{\@mypartnumtocformat {II}{统计机器翻译}}{85}{part.2}
\ttl@stoptoc {default@1}
\ttl@starttoc {default@2}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {3}基于词的机器翻译模型}{87}{chapter.3}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.1}什么是基于词的翻译模型}{87}{section.3.1}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.2}构建一个简单的机器翻译系统}{89}{section.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.2.1}如何进行翻译?}{89}{subsection.3.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{机器翻译流程}{90}{section*.64}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{人工翻译 vs. 机器翻译}{91}{section*.66}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.2.2}基本框架}{91}{subsection.3.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.2.3}单词翻译概率}{92}{subsection.3.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{什么是单词翻译概率?}{92}{section*.68}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{如何从一个双语平行数据中学习?}{92}{section*.70}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{如何从大量的双语平行数据中学习?}{94}{section*.71}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.2.4}句子级翻译模型}{95}{subsection.3.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基础模型}{95}{section*.73}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{生成流畅的译文}{97}{section*.75}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.2.5}解码}{99}{subsection.3.2.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.3}基于词的翻译建模}{102}{section.3.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.3.1}噪声信道模型}{102}{subsection.3.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.3.2}统计机器翻译的三个基本问题}{104}{subsection.3.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{词对齐}{105}{section*.84}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于词对齐的翻译模型}{105}{section*.87}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于词对齐的翻译实例}{107}{section*.89}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.4}IBM模型1-2}{108}{section.3.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.4.1}IBM模型1}{108}{subsection.3.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.4.2}IBM模型2}{110}{subsection.3.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.4.3}解码及计算优化}{111}{subsection.3.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.4.4}训练}{112}{subsection.3.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{目标函数}{112}{section*.94}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{优化}{113}{section*.96}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.5}IBM模型3-5及隐马尔可夫模型}{119}{section.3.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.1}基于产出率的翻译模型}{119}{subsection.3.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.2}IBM 模型3}{122}{subsection.3.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.3}IBM 模型4}{123}{subsection.3.5.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.4} IBM 模型5}{125}{subsection.3.5.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.5}隐马尔可夫模型}{126}{subsection.3.5.5}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{隐马尔可夫模型}{127}{section*.108}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{词对齐模型}{128}{section*.110}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.5.6}解码和训练}{129}{subsection.3.5.6}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.6}问题分析}{129}{section.3.6}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.6.1}词对齐及对称化}{129}{subsection.3.6.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.6.2}Deficiency}{130}{subsection.3.6.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.6.3}句子长度}{131}{subsection.3.6.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {3.6.4}其他问题}{132}{subsection.3.6.4}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {3.7}小结及深入阅读}{132}{section.3.7}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {4}基于短语和句法的机器翻译模型}{135}{chapter.4}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {4.1}翻译中的结构信息}{135}{section.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.1.1}更大粒度的翻译单元}{136}{subsection.4.1.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.1.2}句子的结构信息}{138}{subsection.4.1.2}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {4.2}基于短语的翻译模型}{140}{section.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.1}机器翻译中的短语}{140}{subsection.4.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.2}数学建模及判别式模型}{143}{subsection.4.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于翻译推导的建模}{143}{section*.122}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{对数线性模型}{144}{section*.123}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{搭建模型的基本流程}{145}{section*.124}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.3}短语抽取}{146}{subsection.4.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{与词对齐一致的短语}{147}{section*.127}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{获取词对齐}{148}{section*.131}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{度量双语短语质量}{149}{section*.133}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.4}调序}{150}{subsection.4.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于距离的调序}{150}{section*.137}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于方向的调序}{151}{section*.139}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于分类的调序}{153}{section*.142}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.5}特征}{153}{subsection.4.2.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.6}最小错误率训练}{154}{subsection.4.2.6}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.2.7}栈解码}{157}{subsection.4.2.7}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{翻译候选匹配}{158}{section*.147}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{翻译假设扩展}{158}{section*.149}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{剪枝}{159}{section*.151}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{解码中的栈结构}{161}{section*.153}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {4.3}基于层次短语的模型}{162}{section.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.3.1}同步上下文无关文法}{165}{subsection.4.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{文法定义}{165}{section*.158}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{推导}{166}{section*.159}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{胶水规则}{167}{section*.160}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{处理流程}{168}{section*.161}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.3.2}层次短语规则抽取}{168}{subsection.4.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.3.3}翻译模型及特征}{170}{subsection.4.3.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.3.4}CYK解码}{171}{subsection.4.3.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.3.5}立方剪枝}{174}{subsection.4.3.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {4.4}基于语言学句法的模型}{177}{section.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.1}基于句法的翻译模型分类}{179}{subsection.4.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.2}基于树结构的文法}{179}{subsection.4.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{树到树翻译规则}{181}{section*.177}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于树结构的翻译推导}{183}{section*.179}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{树到串翻译规则}{185}{section*.182}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.3}树到串翻译规则抽取}{186}{subsection.4.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{树的切割与最小规则}{187}{section*.184}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{空对齐处理}{190}{section*.190}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{组合规则}{191}{section*.192}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{SPMT规则}{192}{section*.194}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{句法树二叉化}{193}{section*.196}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.4}树到树翻译规则抽取}{194}{subsection.4.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于节点对齐的规则抽取}{195}{section*.200}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于对齐矩阵的规则抽取}{196}{section*.203}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.5}句法翻译模型的特征}{198}{subsection.4.4.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.6}基于超图的推导空间表示}{199}{subsection.4.4.6}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {4.4.7}基于树的解码 vs 基于串的解码}{201}{subsection.4.4.7}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于树的解码}{203}{section*.210}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于串的解码}{204}{section*.213}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {4.5}小结及深入阅读}{206}{section.4.5}
\defcounter {refsection}{0}\relax
\contentsline {part}{\@mypartnumtocformat {III}{神经机器翻译}}{209}{part.3}
\ttl@stoptoc {default@2}
\ttl@starttoc {default@3}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {5}人工神经网络和神经语言建模}{211}{chapter.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.1}深度学习与人工神经网络}{212}{section.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.1.1}发展简史}{212}{subsection.5.1.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{早期的人工神经网络和第一次寒冬}{212}{section*.215}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{神经网络的第二次高潮和第二次寒冬}{213}{section*.216}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{深度学习和神经网络方法的崛起}{214}{section*.217}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.1.2}为什么需要深度学习}{215}{subsection.5.1.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{端到端学习和表示学习}{215}{section*.219}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{深度学习的效果}{216}{section*.221}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.2}神经网络基础}{216}{section.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.2.1}线性代数基础}{216}{subsection.5.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{标量、向量和矩阵}{217}{section*.223}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{矩阵的转置}{218}{section*.224}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{矩阵加法和数乘}{218}{section*.225}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{矩阵乘法和矩阵点乘}{219}{section*.226}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{线性映射}{220}{section*.227}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{范数}{221}{section*.228}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.2.2}人工神经元和感知机}{222}{subsection.5.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{感知机\ \raisebox {0.5mm}{------}\ 最简单的人工神经元模型}{223}{section*.231}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{神经元内部权重}{224}{section*.234}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{神经元的输入\ \raisebox {0.5mm}{------}\ 离散 vs 连续}{225}{section*.236}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{神经元内部的参数学习}{225}{section*.238}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.2.3}多层神经网络}{226}{subsection.5.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{线性变换和激活函数}{226}{section*.240}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{单层神经网络$\rightarrow $多层神经网络}{228}{section*.247}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.2.4}函数拟合能力}{229}{subsection.5.2.4}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.3}神经网络的张量实现}{233}{section.5.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.3.1} 张量及其计算}{234}{subsection.5.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{张量}{234}{section*.257}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{张量的矩阵乘法}{236}{section*.260}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{张量的单元操作}{237}{section*.262}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.3.2}张量的物理存储形式}{238}{subsection.5.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.3.3}使用开源框架实现张量计算}{238}{subsection.5.3.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.3.4}前向传播与计算图}{240}{subsection.5.3.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.3.5}神经网络实例}{243}{subsection.5.3.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.4}神经网络的参数训练}{244}{section.5.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.1}损失函数}{245}{subsection.5.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.2}基于梯度的参数优化}{245}{subsection.5.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{梯度下降}{246}{section*.280}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{梯度获取}{248}{section*.282}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于梯度的方法的变种和改进}{251}{section*.286}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.3}参数更新的并行化策略}{254}{subsection.5.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.4}梯度消失、梯度爆炸和稳定性训练}{256}{subsection.5.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{易于优化的激活函数}{256}{section*.289}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{梯度裁剪}{257}{section*.293}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{稳定性训练}{258}{section*.294}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.5}过拟合}{259}{subsection.5.4.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.4.6}反向传播}{260}{subsection.5.4.6}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{输出层的反向传播}{261}{section*.297}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{隐藏层的反向传播}{263}{section*.301}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{程序实现}{264}{section*.304}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.5}神经语言模型}{266}{section.5.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.5.1}基于神经网络的语言建模}{266}{subsection.5.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于前馈神经网络的语言模型}{267}{section*.307}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于循环神经网络的语言模型}{269}{section*.310}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{基于自注意力机制的语言模型}{270}{section*.312}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{语言模型的评价}{271}{section*.314}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.5.2}单词表示模型}{272}{subsection.5.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{One-hot编码}{272}{section*.315}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{分布式表示}{272}{section*.317}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {5.5.3}句子表示模型及预训练}{274}{subsection.5.5.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{简单的上下文表示模型}{274}{section*.321}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{ELMO模型}{276}{section*.324}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{GPT模型}{276}{section*.326}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{BERT模型}{277}{section*.328}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{为什么要预训练?}{278}{section*.330}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {5.6}小结及深入阅读}{279}{section.5.6}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {6}神经机器翻译模型}{281}{chapter.6}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.1}神经机器翻译的发展简史}{281}{section.6.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.1.1}神经机器翻译的起源}{283}{subsection.6.1.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.1.2}神经机器翻译的品质 }{285}{subsection.6.1.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.1.3}神经机器翻译的优势 }{288}{subsection.6.1.3}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.2}编码器-解码器框架}{290}{section.6.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.2.1}框架结构}{290}{subsection.6.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.2.2}表示学习}{291}{subsection.6.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.2.3}简单的运行实例}{292}{subsection.6.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.2.4}机器翻译范式的对比}{293}{subsection.6.2.4}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.3}基于循环神经网络的翻译模型及注意力机制}{294}{section.6.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.1}建模}{294}{subsection.6.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.2}输入(词嵌入)及输出(Softmax)}{298}{subsection.6.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.3}循环神经网络结构}{302}{subsection.6.3.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{循环神经单元(RNN)}{302}{section*.352}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{长短时记忆网络(LSTM)}{302}{section*.353}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{门控循环单元(GRU)}{304}{section*.356}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{双向模型}{306}{section*.358}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{多层循环神经网络}{306}{section*.360}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.4}注意力机制}{307}{subsection.6.3.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{翻译中的注意力机制}{308}{section*.363}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{上下文向量的计算}{309}{section*.366}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{注意力机制的解读}{312}{section*.371}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.5}训练}{314}{subsection.6.3.5}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{损失函数}{314}{section*.374}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{参数初始化}{315}{section*.375}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{优化策略}{316}{section*.376}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{梯度裁剪}{316}{section*.378}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{学习率策略}{316}{section*.379}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{并行训练}{318}{section*.382}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.6}推断}{319}{subsection.6.3.6}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{贪婪搜索}{321}{section*.386}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{束搜索}{322}{section*.389}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{长度惩罚}{323}{section*.391}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.3.7}实例-GNMT}{324}{subsection.6.3.7}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.4}Transformer}{325}{section.6.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.1}自注意力模型}{327}{subsection.6.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.2}Transformer架构}{328}{subsection.6.4.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.3}位置编码}{330}{subsection.6.4.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.4}基于点乘的注意力机制}{333}{subsection.6.4.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.5}掩码操作}{335}{subsection.6.4.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.6}多头注意力}{336}{subsection.6.4.6}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.7}残差网络和层正则化}{337}{subsection.6.4.7}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.8}前馈全连接网络子层}{338}{subsection.6.4.8}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.9}训练}{339}{subsection.6.4.9}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.4.10}推断}{342}{subsection.6.4.10}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.5}序列到序列问题及应用}{342}{section.6.5}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.5.1}自动问答}{343}{subsection.6.5.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.5.2}自动文摘}{343}{subsection.6.5.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.5.3}文言文翻译}{344}{subsection.6.5.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.5.4}对联生成}{344}{subsection.6.5.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {6.5.5}古诗生成}{345}{subsection.6.5.5}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {6.6}小结及深入阅读}{346}{section.6.6}
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {7}神经机器翻译实战 \ \raisebox {0.5mm}{------}\ 参加一次比赛}{349}{chapter.7}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.1}神经机器翻译并不简单}{349}{section.7.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.1.1}影响神经机器翻译性能的因素}{350}{subsection.7.1.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.1.2}搭建神经机器翻译系统的步骤 }{351}{subsection.7.1.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.1.3}架构选择 }{352}{subsection.7.1.3}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.2}数据处理}{352}{section.7.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.2.1}分词}{353}{subsection.7.2.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.2.2}标准化}{354}{subsection.7.2.2}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.2.3}数据清洗}{355}{subsection.7.2.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.2.4}子词切分}{357}{subsection.7.2.4}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{大词表和OOV问题}{358}{section*.429}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{子词}{358}{section*.431}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{双字节编码(BPE)}{359}{section*.433}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{其他方法}{362}{section*.436}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.3}建模与训练}{362}{section.7.3}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.3.1}正则化}{362}{subsection.7.3.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{L1/L2正则化}{364}{section*.438}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{标签平滑}{365}{section*.439}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{Dropout}{366}{section*.441}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{Layer Dropout}{367}{section*.444}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.3.2}增大模型容量}{368}{subsection.7.3.2}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{宽网络}{368}{section*.446}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{深网络}{369}{section*.448}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{增大输入层和输出层表示能力}{371}{section*.450}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{大模型的分布式计算}{371}{section*.451}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.3.3}大批量训练}{371}{subsection.7.3.3}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{为什么需要大批量训练}{372}{section*.452}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{如何构建批次}{373}{section*.455}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.4}推断}{374}{section.7.4}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.4.1}推断优化}{374}{subsection.7.4.1}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{推断系统的架构}{375}{section*.457}
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{自左向右推断 vs 自右向左推断}{376}{section*.459}
\contentsline {chapter}{\numberline {1}基于词的机器翻译模型}{11}{chapter.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{推断加速}{376}{section*.460}
\contentsline {section}{\numberline {1.1}什么是基于词的翻译模型}{11}{section.1.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.4.2}译文长度控制}{383}{subsection.7.4.2}
\contentsline {section}{\numberline {1.2}构建一个简单的机器翻译系统}{13}{section.1.2}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{长度惩罚因子}{384}{section*.466}
\contentsline {subsection}{\numberline {1.2.1}如何进行翻译?}{13}{subsection.1.2.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{译文长度范围约束}{385}{section*.468}
\contentsline {subsubsection}{机器翻译流程}{14}{section*.7}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{覆盖度模型}{385}{section*.469}
\contentsline {subsubsection}{人工翻译 vs. 机器翻译}{15}{section*.9}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.4.3}多模型集成}{386}{subsection.7.4.3}
\contentsline {subsection}{\numberline {1.2.2}基本框架}{15}{subsection.1.2.2}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{假设选择}{387}{section*.470}
\contentsline {subsection}{\numberline {1.2.3}单词翻译概率}{16}{subsection.1.2.3}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{局部预测融合}{388}{section*.472}
\contentsline {subsubsection}{什么是单词翻译概率?}{16}{section*.11}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{译文重组}{389}{section*.474}
\contentsline {subsubsection}{如何从一个双语平行数据中学习?}{17}{section*.13}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.5}进阶技术}{390}{section.7.5}
\contentsline {subsubsection}{如何从大量的双语平行数据中学习?}{18}{section*.14}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.5.1}深层模型}{390}{subsection.7.5.1}
\contentsline {subsection}{\numberline {1.2.4}句子级翻译模型}{19}{subsection.1.2.4}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{Post-Norm vs Pre-Norm}{391}{section*.477}
\contentsline {subsubsection}{基础模型}{19}{section*.16}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{层聚合}{393}{section*.480}
\contentsline {subsubsection}{生成流畅的译文}{21}{section*.18}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{深层模型的训练加速}{394}{section*.482}
\contentsline {subsection}{\numberline {1.2.5}解码}{23}{subsection.1.2.5}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{渐进式训练}{394}{section*.483}
\contentsline {section}{\numberline {1.3}基于词的翻译建模}{26}{section.1.3}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{分组稠密连接}{394}{section*.485}
\contentsline {subsection}{\numberline {1.3.1}噪声信道模型}{26}{subsection.1.3.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{学习率重置策略}{396}{section*.487}
\contentsline {subsection}{\numberline {1.3.2}统计机器翻译的三个基本问题}{28}{subsection.1.3.2}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{深层模型的鲁棒性训练}{397}{section*.489}
\contentsline {subsubsection}{词对齐}{29}{section*.27}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.5.2}单语数据的使用}{398}{subsection.7.5.2}
\contentsline {subsubsection}{基于词对齐的翻译模型}{30}{section*.30}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{伪数据}{399}{section*.492}
\contentsline {subsubsection}{基于词对齐的翻译实例}{31}{section*.32}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{预训练}{401}{section*.495}
\contentsline {section}{\numberline {1.4}IBM模型1-2}{32}{section.1.4}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{联合训练}{402}{section*.498}
\contentsline {subsection}{\numberline {1.4.1}IBM模型1}{32}{subsection.1.4.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.5.3}知识精炼}{403}{subsection.7.5.3}
\contentsline {subsection}{\numberline {1.4.2}IBM模型2}{34}{subsection.1.4.2}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{什么是知识精炼}{404}{section*.500}
\contentsline {subsection}{\numberline {1.4.3}解码及计算优化}{35}{subsection.1.4.3}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{知识精炼的基本方法}{405}{section*.501}
\contentsline {subsection}{\numberline {1.4.4}训练}{36}{subsection.1.4.4}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{机器翻译中的知识精炼}{406}{section*.503}
\contentsline {subsubsection}{目标函数}{36}{section*.37}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {7.5.4}双向训练}{407}{subsection.7.5.4}
\contentsline {subsubsection}{优化}{37}{section*.39}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{有监督对偶学习}{408}{section*.505}
\contentsline {section}{\numberline {1.5}IBM模型3-5及隐马尔可夫模型}{42}{section.1.5}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{无监督对偶学习}{409}{section*.506}
\contentsline {subsection}{\numberline {1.5.1}基于产出率的翻译模型}{44}{subsection.1.5.1}%
\defcounter {refsection}{0}\relax
\contentsline {subsubsection}{翻译中回译}{410}{section*.508}
\contentsline {subsection}{\numberline {1.5.2}IBM 模型3}{46}{subsection.1.5.2}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {7.6}小结及深入阅读}{410}{section.7.6}
\contentsline {subsection}{\numberline {1.5.3}IBM 模型4}{48}{subsection.1.5.3}%
\defcounter {refsection}{0}\relax
\contentsline {part}{\@mypartnumtocformat {IV}{附录}}{415}{part.4}
\ttl@stoptoc {default@3}
\ttl@starttoc {default@4}
\contentsline {subsection}{\numberline {1.5.4} IBM 模型5}{49}{subsection.1.5.4}%
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {A}附录A}{417}{Appendix.1.A}
\contentsline {subsection}{\numberline {1.5.5}隐马尔可夫模型}{51}{subsection.1.5.5}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {A.1}基准数据集}{417}{section.1.A.1}
\contentsline {subsubsection}{隐马尔可夫模型}{51}{section*.51}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {A.2}平行语料}{418}{section.1.A.2}
\contentsline {subsubsection}{词对齐模型}{52}{section*.53}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {A.3}相关工具}{419}{section.1.A.3}
\contentsline {subsection}{\numberline {1.5.6}解码和训练}{53}{subsection.1.5.6}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {A.3.1}数据预处理工具}{419}{subsection.1.A.3.1}
\contentsline {section}{\numberline {1.6}问题分析}{54}{section.1.6}%
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {A.3.2}评价工具}{420}{subsection.1.A.3.2}
\contentsline {subsection}{\numberline {1.6.1}词对齐及对称化}{54}{subsection.1.6.1}%
\defcounter {refsection}{0}\relax
\contentsline {chapter}{\numberline {B}附录B}{421}{Appendix.2.B}
\contentsline {subsection}{\numberline {1.6.2}Deficiency}{55}{subsection.1.6.2}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {B.1}IBM模型3训练方法}{421}{section.2.B.1}
\contentsline {subsection}{\numberline {1.6.3}句子长度}{56}{subsection.1.6.3}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {B.2}IBM模型4训练方法}{423}{section.2.B.2}
\contentsline {subsection}{\numberline {1.6.4}其他问题}{56}{subsection.1.6.4}%
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {B.3}IBM模型5训练方法}{425}{section.2.B.3}
\contentsline {section}{\numberline {1.7}小结及深入阅读}{57}{section.1.7}%
\contentsfinish
......@@ -80,6 +80,7 @@
\noindent \textsc{东北大学自然语言处理实验室\ $\cdot$\ 小牛翻译}\\
\noindent \textsc{\url{https://opensource.niutrans.com/mtbook/index.html}}\\
\noindent \textsc{\url{https://github.com/NiuTrans/MTBook}}\\
\noindent {\red{Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (the ``License''). You may not use this file except in compliance with the License. You may obtain a copy of the License at \url{http://creativecommons.org/licenses/by-nc/4.0}. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \textsc{``as is'' basis, without warranties or conditions of any kind}, either express or implied. See the License for the specific language governing permissions and limitations under the License.}}\\
......@@ -121,9 +122,9 @@
% CHAPTERS
%----------------------------------------------------------------------------------------
%\include{Chapter1/chapter1}
\include{Chapter1/chapter1}
%\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
......