Commit 80ea32a3 by 曹润柘

合并分支 'caorunzhe' 到 'master'

Caorunzhe

查看合并请求 !545
parents e29cb417 1633a63d
\begin{tabular}{c c}
\begin{tikzpicture}
\begin{scope}
% Styles defined with \tikzset (the \tikzstyle macro is deprecated).
% The node style is named "box" instead of "circle": a style named
% "circle" would silently override the built-in /tikz/circle shape key.
% Visual result is unchanged: a rounded-corner rectangle.
\tikzset{
  box/.style  = {draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,minimum width=2em},
  word/.style = {inner sep=3.5pt}
}
\node[box](data)  at (0,0) {数据};
\node[box](model) at ([xshift=5em]data.east) {模型};
% Initialization enters the cycle from the data side (idea 1).
\node[word] (init) at ([xshift=-5em]data.west){初始化};
\draw[->,very thick] (init.east) -- ([xshift=-0.2em]data.west);
% Upper arc: data -> model, labeled "parameter optimization".
\draw [->,very thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {参数优化};
% Lower arc: model -> data, labeled "data optimization".
\draw [->,very thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {数据优化};
\node[word] at ([yshift=-5em]data.south){(a)思路1};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}
\begin{scope}
% Same styles as panel (a); kept local to this tikzpicture.
\tikzset{
  box/.style  = {draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,minimum width=2em},
  word/.style = {inner sep=3.5pt}
}
\node[box](data)  at (0,0) {数据};
\node[box](model) at ([xshift=5em]data.east) {模型};
% Initialization enters the cycle from the model side (idea 2).
\node[word] (init) at ([xshift=5em]model.east){初始化};
\draw[->,very thick] (init.west) -- ([xshift=0.2em]model.east);
\draw [->,very thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {参数优化};
\draw [->,very thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {数据优化};
\node[word] at ([yshift=-5em]model.south){(b)思路2};
\end{scope}
\end{tikzpicture}
\end{tabular}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
% Styles defined with \tikzset (the \tikzstyle macro is deprecated).
% The node style is named "box" instead of "circle": a style named
% "circle" would silently override the built-in /tikz/circle shape key.
% align=center allows \\ line breaks inside the nodes below.
\tikzset{
  box/.style  = {draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,minimum width=2em,align=center},
  word/.style = {inner sep=3.5pt}
}
% Central node: the two translation models (s->t and t->s) side by side.
\node[box](center) at (0,0) {
\begin{tabular}{c | c}
$s\rightarrow t$ & $t\rightarrow s$ \\
模型 & 模型
\end{tabular}
};
% Monolingual data for each direction, plus the joint bilingual data below.
\node[box] (left)  at ([xshift=-9em]center.west) {$s\rightarrow t$ \\ 数据};
\node[box] (right) at ([xshift=9em]center.east)  {$t\rightarrow s$ \\ 数据};
\node[word] (init) at ([yshift=6em]center.north){初始化};
\node[box] (down)  at ([yshift=-8em]center.south) {$s,t$ \\ 数据};
\draw[->,very thick] (init.south) -- ([yshift=0.2em]center.north);
% "midway" already means pos=0.5; the original's extra pos=.44 was
% overridden by it and has been dropped.
\draw[->,very thick] ([yshift=0.2em]down.north) -- ([yshift=-0.2em]center.south) node[midway,align=center] {语言模型\\目标函数\\(模型优化)};
% Top arcs: s->t data trains the model (model optimization), then the
% model back-translates to produce t->s data (data optimization).
\draw[->,very thick] ([yshift=1pt]left.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt,xshift=-2.2em]center.north) node[above,midway,align=center] {正常MT目标函数\\(模型优化)};
\draw[->,very thick] ([yshift=1pt,xshift=-1.8em]center.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]right.north) node[above,pos=0.6,align=center] {回译\\(数据优化)};
% Bottom arcs: the symmetric t->s half of the cycle.
\draw [->,very thick] ([yshift=1pt]right.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt,xshift=2.2em]center.south) node[below,midway,align=center] {正常MT目标函数\\(模型优化)};
\draw [->,very thick] ([yshift=1pt,xshift=1.8em]center.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]left.south) node[below,pos=0.6,align=center] {回译\\(数据优化)};
\end{scope}
\end{tikzpicture}
\relax
\providecommand\zref@newlabel[2]{}
\providecommand\hyper@newdestlabel[2]{}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {chapter}{\numberline {1}低资源神经机器翻译}{11}{chapter.1}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\addvspace {10\p@ }}
\@writefile{lot}{\defcounter {refsection}{0}\relax }\@writefile{lot}{\addvspace {10\p@ }}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.1}数据的有效使用}{11}{section.1.1}\protected@file@percent }
\newlabel{effective-use-of-data}{{1.1}{11}{数据的有效使用}{section.1.1}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.1.1}数据增强}{12}{subsection.1.1.1}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 回译}{12}{section*.3}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.1}{\ignorespaces \color {red}{回译方法的流程(新)} {\color {blue} 图比以前清晰了,但是还是有些乱,可能你陷入到固有思维里了,可以找我再讨论下!}\relax }}{12}{figure.caption.4}\protected@file@percent }
\providecommand*\caption@xref[2]{\@setref\relax\@undefined{#1}}
\newlabel{fig:16-1-xc}{{1.1}{12}{\red {回译方法的流程(新)} {\color {blue} 图比以前清晰了,但是还是有些乱,可能你陷入到固有思维里了,可以找我再讨论下!}\relax }{figure.caption.4}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.2}{\ignorespaces \color {red}{迭代式回译方法的流程,未修改} {\color {blue} 这个图的逻辑我觉得是ok的,主要是这些线和过程需要再清晰一下,再找我讨论下!}\relax }}{13}{figure.caption.5}\protected@file@percent }
\newlabel{fig:16-2-xc}{{1.2}{13}{\red {迭代式回译方法的流程,未修改} {\color {blue} 这个图的逻辑我觉得是ok的,主要是这些线和过程需要再清晰一下,再找我讨论下!}\relax }{figure.caption.5}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 修改双语数据}{14}{section*.6}\protected@file@percent }
\newlabel{add-noise}{{1.1.1}{14}{2. 修改双语数据}{section*.6}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.3}{\ignorespaces 三种加噪方法\relax }}{15}{figure.caption.7}\protected@file@percent }
\newlabel{fig:16-4-xc}{{1.3}{15}{三种加噪方法\relax }{figure.caption.7}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 双语句对挖掘}{16}{section*.8}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.4}{\ignorespaces 维基百科中的可比语料\relax }}{17}{figure.caption.9}\protected@file@percent }
\newlabel{fig:16-5-xc}{{1.4}{17}{维基百科中的可比语料\relax }{figure.caption.9}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.1.2}基于语言模型的方法}{17}{subsection.1.1.2}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 语言模型在目标端的融合}{18}{section*.10}\protected@file@percent }
\newlabel{eq:16-1-xc}{{1.1}{18}{1. 语言模型在目标端的融合}{equation.1.1.1}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.5}{\ignorespaces \color {red}{语言模型的浅融合与深融合,未修改} {\color {blue} 图可以考虑删除了,要不也增加阅读的负担!}\relax }}{18}{figure.caption.11}\protected@file@percent }
\newlabel{fig:16-6-xc}{{1.5}{18}{\red {语言模型的浅融合与深融合,未修改} {\color {blue} 图可以考虑删除了,要不也增加阅读的负担!}\relax }{figure.caption.11}{}}
\newlabel{eq:16-2-xc}{{1.2}{18}{1. 语言模型在目标端的融合}{equation.1.1.2}{}}
\newlabel{eq:16-3-xc}{{1.3}{19}{1. 语言模型在目标端的融合}{equation.1.1.3}{}}
\newlabel{eq:16-4-xc}{{1.4}{19}{1. 语言模型在目标端的融合}{equation.1.1.4}{}}
\newlabel{eq:16-5-xc}{{1.5}{19}{1. 语言模型在目标端的融合}{equation.1.1.5}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 预训练词嵌入}{19}{section*.12}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 预训练模型}{21}{section*.13}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.6}{\ignorespaces \color {red}{MASS 预训练方法,重画}\relax }}{22}{figure.caption.14}\protected@file@percent }
\newlabel{fig:16-8-xc}{{1.6}{22}{\red {MASS 预训练方法,重画}\relax }{figure.caption.14}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{4. 多任务学习}{23}{section*.15}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.7}{\ignorespaces \color {red}{机器翻译中的多任务学习,重画}\relax }}{24}{figure.caption.16}\protected@file@percent }
\newlabel{fig:16-9-xc}{{1.7}{24}{\red {机器翻译中的多任务学习,重画}\relax }{figure.caption.16}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.2}双向翻译模型}{24}{section.1.2}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.2.1}双向训练}{24}{subsection.1.2.1}\protected@file@percent }
\newlabel{eq:16-6-xc}{{1.6}{24}{双向训练}{equation.1.2.6}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.2.2}对偶学习}{25}{subsection.1.2.2}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 有监督对偶学习}{25}{section*.18}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.8}{\ignorespaces 双向训练的迭代过程\relax }}{26}{figure.caption.17}\protected@file@percent }
\newlabel{fig:16-1-fk}{{1.8}{26}{双向训练的迭代过程\relax }{figure.caption.17}{}}
\newlabel{eq:16-7-xc}{{1.7}{26}{1. 有监督对偶学习}{equation.1.2.7}{}}
\newlabel{eq:16-8-xc}{{1.8}{26}{1. 有监督对偶学习}{equation.1.2.8}{}}
\newlabel{eq:16-2-fk}{{1.9}{27}{1. 有监督对偶学习}{equation.1.2.9}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 无监督对偶学习}{27}{section*.19}\protected@file@percent }
\newlabel{eq:16-9-xc}{{1.10}{27}{2. 无监督对偶学习}{equation.1.2.10}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.9}{\ignorespaces 无监督对偶学习流程\relax }}{28}{figure.caption.20}\protected@file@percent }
\newlabel{fig:16-10-xc}{{1.9}{28}{无监督对偶学习流程\relax }{figure.caption.20}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.3}多语言翻译模型}{28}{section.1.3}\protected@file@percent }
\newlabel{multilingual-translation-model}{{1.3}{28}{多语言翻译模型}{section.1.3}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.3.1}基于枢轴语言的方法}{29}{subsection.1.3.1}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.10}{\ignorespaces 基于枢轴语言的翻译过程\relax }}{29}{figure.caption.21}\protected@file@percent }
\newlabel{fig:16-1-ll}{{1.10}{29}{基于枢轴语言的翻译过程\relax }{figure.caption.21}{}}
\newlabel{eq:ll-1}{{1.11}{29}{基于枢轴语言的方法}{equation.1.3.11}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.3.2}基于知识蒸馏的方法}{30}{subsection.1.3.2}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.11}{\ignorespaces 基于知识蒸馏的翻译过程\relax }}{30}{figure.caption.22}\protected@file@percent }
\newlabel{fig:16-2-ll}{{1.11}{30}{基于知识蒸馏的翻译过程\relax }{figure.caption.22}{}}
\newlabel{eq:ll-2}{{1.12}{30}{基于知识蒸馏的方法}{equation.1.3.12}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.3.3}基于迁移学习的方法}{31}{subsection.1.3.3}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.12}{\ignorespaces 传统机器学习\&迁移学习对比\relax }}{31}{figure.caption.23}\protected@file@percent }
\newlabel{fig:16-3-ll}{{1.12}{31}{传统机器学习\&迁移学习对比\relax }{figure.caption.23}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 参数初始化方法}{32}{section*.24}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.13}{\ignorespaces 参数初始化方法图\relax }}{32}{figure.caption.25}\protected@file@percent }
\newlabel{fig:16-4-ll}{{1.13}{32}{参数初始化方法图\relax }{figure.caption.25}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 多语言单模型系统}{32}{section*.26}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.14}{\ignorespaces 参数初始化方法图\relax }}{33}{figure.caption.27}\protected@file@percent }
\newlabel{fig:16-5-ll}{{1.14}{33}{参数初始化方法图\relax }{figure.caption.27}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 零资源翻译}{33}{section*.28}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.4}无监督机器翻译}{34}{section.1.4}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.4.1}无监督词典归纳}{35}{subsection.1.4.1}\protected@file@percent }
\newlabel{unsupervised-dictionary-induction}{{1.4.1}{35}{无监督词典归纳}{subsection.1.4.1}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.15}{\ignorespaces 词典归纳原理图\relax }}{35}{figure.caption.29}\protected@file@percent }
\newlabel{fig:16-1-lyf}{{1.15}{35}{词典归纳原理图\relax }{figure.caption.29}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 方法框架}{35}{section*.30}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.16}{\ignorespaces 无监督词典归纳流程图({\color {red} A->a}\textsuperscript {\textsuperscript {\cite {DBLP:conf/iclr/LampleCRDJ18}}}\relax }}{36}{figure.caption.31}\protected@file@percent }
\newlabel{fig:16-2-lyf}{{1.16}{36}{无监督词典归纳流程图({\color {red} A->a}\upcite {DBLP:conf/iclr/LampleCRDJ18}\relax }{figure.caption.31}{}}
\newlabel{eq:16-1}{{1.14}{37}{1. 方法框架}{equation.1.4.13}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 鲁棒性问题}{37}{section*.32}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.4.2}无监督统计机器翻译}{38}{subsection.1.4.2}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 无监督短语归纳}{38}{section*.33}\protected@file@percent }
\newlabel{eq:16-2}{{1.15}{38}{1. 无监督短语归纳}{equation.1.4.15}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 无监督权重调优}{39}{section*.34}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.4.3}无监督神经机器翻译}{39}{subsection.1.4.3}\protected@file@percent }
\newlabel{unsupervised-NMT}{{1.4.3}{39}{无监督神经机器翻译}{subsection.1.4.3}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 基于无监督统计机器翻译的方法}{39}{section*.35}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.17}{\ignorespaces 用无监督统计机器翻译训练神经机器翻译\relax }}{40}{figure.caption.36}\protected@file@percent }
\newlabel{fig:16-1}{{1.17}{40}{用无监督统计机器翻译训练神经机器翻译\relax }{figure.caption.36}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 基于无监督词典归纳的方法}{40}{section*.37}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 更深层的融合}{40}{section*.39}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.18}{\ignorespaces 基于无监督词典归纳的方法\relax }}{41}{figure.caption.38}\protected@file@percent }
\newlabel{fig:16-2}{{1.18}{41}{基于无监督词典归纳的方法\relax }{figure.caption.38}{}}
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.19}{\ignorespaces 模型初始化方法的优化\relax }}{41}{figure.caption.40}\protected@file@percent }
\newlabel{fig:16-3}{{1.19}{41}{模型初始化方法的优化\relax }{figure.caption.40}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{4. 其它问题}{41}{section*.41}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.20}{\ignorespaces 无监督神经机器翻译模型训练流程\relax }}{43}{figure.caption.42}\protected@file@percent }
\newlabel{fig:16-4}{{1.20}{43}{无监督神经机器翻译模型训练流程\relax }{figure.caption.42}{}}
\@writefile{lot}{\defcounter {refsection}{0}\relax }\@writefile{lot}{\contentsline {table}{\numberline {1.1}{\ignorespaces 三种噪声函数(原句为``我\ 喜欢\ \ 苹果\ 。'')。\relax }}{44}{table.caption.43}\protected@file@percent }
\newlabel{tab:16-1}{{1.1}{44}{三种噪声函数(原句为``我\ 喜欢\ \ 苹果\ 。'')。\relax }{table.caption.43}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.5}领域适应}{44}{section.1.5}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.21}{\ignorespaces 单词pitch(图里标红)在不同领域的不同词义实例\relax }}{44}{figure.caption.44}\protected@file@percent }
\newlabel{fig:16-1-wbh}{{1.21}{44}{单词pitch(图里标红)在不同领域的不同词义实例\relax }{figure.caption.44}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.5.1}统计机器翻译中的领域适应}{45}{subsection.1.5.1}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 基于混合模型的方法}{45}{section*.45}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 基于数据加权的方法}{45}{section*.46}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 基于数据选择的方法}{46}{section*.47}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{4. 基于伪数据的方法}{46}{section*.48}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.5.2}基于数据的神经机器翻译领域适应}{46}{subsection.1.5.2}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 基于多领域数据的方法}{46}{section*.49}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 基于数据选择的方法}{47}{section*.50}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 基于单语数据的方法}{47}{section*.51}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {1.5.3}基于模型的神经机器翻译领域适应}{48}{subsection.1.5.3}\protected@file@percent }
\newlabel{modeling-methods-in neural-machine-translation}{{1.5.3}{48}{基于模型的神经机器翻译领域适应}{subsection.1.5.3}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{1. 基于模型结构的方法}{48}{section*.52}\protected@file@percent }
\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {1.22}{\ignorespaces 领域判别器示意图\relax }}{48}{figure.caption.53}\protected@file@percent }
\newlabel{fig:16-2-wbh}{{1.22}{48}{领域判别器示意图\relax }{figure.caption.53}{}}
\newlabel{eq:16-1-wbh}{{1.16}{48}{1. 基于模型结构的方法}{equation.1.5.16}{}}
\newlabel{eq:16-2-wbh}{{1.17}{48}{1. 基于模型结构的方法}{equation.1.5.17}{}}
\newlabel{eq:16-3-wbh}{{1.18}{49}{1. 基于模型结构的方法}{equation.1.5.18}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{2. 基于训练策略的方法}{49}{section*.54}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsubsection}{3. 基于模型推断的方法}{50}{section*.55}\protected@file@percent }
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {1.6}小结及扩展阅读}{50}{section.1.6}\protected@file@percent }
\@setckpt{Chapter16/chapter16}{
\setcounter{page}{52}
\setcounter{equation}{18}
\setcounter{enumi}{0}
\setcounter{enumii}{0}
\setcounter{enumiii}{0}
\setcounter{enumiv}{0}
\setcounter{footnote}{0}
\setcounter{mpfootnote}{0}
\setcounter{part}{0}
\setcounter{chapter}{1}
\setcounter{section}{6}
\setcounter{subsection}{0}
\setcounter{subsubsection}{0}
\setcounter{paragraph}{0}
\setcounter{subparagraph}{0}
\setcounter{figure}{22}
\setcounter{table}{1}
\setcounter{tabx@nest}{0}
\setcounter{listtotal}{0}
\setcounter{listcount}{0}
\setcounter{liststart}{0}
\setcounter{liststop}{0}
\setcounter{citecount}{0}
\setcounter{citetotal}{0}
\setcounter{multicitecount}{0}
\setcounter{multicitetotal}{0}
\setcounter{instcount}{348}
\setcounter{maxnames}{3}
\setcounter{minnames}{1}
\setcounter{maxitems}{3}
\setcounter{minitems}{1}
\setcounter{citecounter}{0}
\setcounter{maxcitecounter}{0}
\setcounter{savedcitecounter}{0}
\setcounter{uniquelist}{0}
\setcounter{uniquename}{0}
\setcounter{refsection}{0}
\setcounter{refsegment}{0}
\setcounter{maxextratitle}{0}
\setcounter{maxextratitleyear}{0}
\setcounter{maxextraname}{10}
\setcounter{maxextradate}{0}
\setcounter{maxextraalpha}{0}
\setcounter{abbrvpenalty}{50}
\setcounter{highnamepenalty}{50}
\setcounter{lownamepenalty}{25}
\setcounter{maxparens}{3}
\setcounter{parenlevel}{0}
\setcounter{mincomprange}{10}
\setcounter{maxcomprange}{100000}
\setcounter{mincompwidth}{1}
\setcounter{afterword}{0}
\setcounter{savedafterword}{0}
\setcounter{annotator}{0}
\setcounter{savedannotator}{0}
\setcounter{author}{0}
\setcounter{savedauthor}{0}
\setcounter{bookauthor}{0}
\setcounter{savedbookauthor}{0}
\setcounter{commentator}{0}
\setcounter{savedcommentator}{0}
\setcounter{editor}{0}
\setcounter{savededitor}{0}
\setcounter{editora}{0}
\setcounter{savededitora}{0}
\setcounter{editorb}{0}
\setcounter{savededitorb}{0}
\setcounter{editorc}{0}
\setcounter{savededitorc}{0}
\setcounter{foreword}{0}
\setcounter{savedforeword}{0}
\setcounter{holder}{0}
\setcounter{savedholder}{0}
\setcounter{introduction}{0}
\setcounter{savedintroduction}{0}
\setcounter{namea}{0}
\setcounter{savednamea}{0}
\setcounter{nameb}{0}
\setcounter{savednameb}{0}
\setcounter{namec}{0}
\setcounter{savednamec}{0}
\setcounter{translator}{0}
\setcounter{savedtranslator}{0}
\setcounter{shortauthor}{0}
\setcounter{savedshortauthor}{0}
\setcounter{shorteditor}{0}
\setcounter{savedshorteditor}{0}
\setcounter{labelname}{0}
\setcounter{savedlabelname}{0}
\setcounter{institution}{0}
\setcounter{savedinstitution}{0}
\setcounter{lista}{0}
\setcounter{savedlista}{0}
\setcounter{listb}{0}
\setcounter{savedlistb}{0}
\setcounter{listc}{0}
\setcounter{savedlistc}{0}
\setcounter{listd}{0}
\setcounter{savedlistd}{0}
\setcounter{liste}{0}
\setcounter{savedliste}{0}
\setcounter{listf}{0}
\setcounter{savedlistf}{0}
\setcounter{location}{0}
\setcounter{savedlocation}{0}
\setcounter{organization}{0}
\setcounter{savedorganization}{0}
\setcounter{origlocation}{0}
\setcounter{savedoriglocation}{0}
\setcounter{origpublisher}{0}
\setcounter{savedorigpublisher}{0}
\setcounter{publisher}{0}
\setcounter{savedpublisher}{0}
\setcounter{language}{0}
\setcounter{savedlanguage}{0}
\setcounter{origlanguage}{0}
\setcounter{savedoriglanguage}{0}
\setcounter{pageref}{0}
\setcounter{savedpageref}{0}
\setcounter{textcitecount}{0}
\setcounter{textcitetotal}{0}
\setcounter{textcitemaxnames}{0}
\setcounter{biburlbigbreakpenalty}{100}
\setcounter{biburlbreakpenalty}{200}
\setcounter{biburlnumpenalty}{0}
\setcounter{biburlucpenalty}{0}
\setcounter{biburllcpenalty}{0}
\setcounter{smartand}{1}
\setcounter{bbx:relatedcount}{0}
\setcounter{bbx:relatedtotal}{0}
\setcounter{parentequation}{0}
\setcounter{notation}{0}
\setcounter{dummy}{0}
\setcounter{problem}{0}
\setcounter{exerciseT}{0}
\setcounter{exampleT}{0}
\setcounter{vocabulary}{0}
\setcounter{definitionT}{0}
\setcounter{mdf@globalstyle@cnt}{0}
\setcounter{mdfcountframes}{0}
\setcounter{mdf@env@i}{0}
\setcounter{mdf@env@ii}{0}
\setcounter{mdf@zref@counter}{0}
\setcounter{Item}{0}
\setcounter{Hfootnote}{0}
\setcounter{Hy@AnnotLevel}{0}
\setcounter{bookmark@seq@number}{0}
\setcounter{caption@flags}{0}
\setcounter{continuedfloat}{0}
\setcounter{cp@cnt}{0}
\setcounter{cp@tempcnt}{0}
\setcounter{subfigure}{0}
\setcounter{lofdepth}{1}
\setcounter{subtable}{0}
\setcounter{lotdepth}{1}
\setcounter{@pps}{0}
\setcounter{@ppsavesec}{0}
\setcounter{@ppsaveapp}{0}
\setcounter{tcbbreakpart}{0}
\setcounter{tcblayer}{0}
\setcounter{tcolorbox@number}{0}
\setcounter{section@level}{1}
}
......@@ -301,7 +301,7 @@ g_{t}& = & \sigma (w^{T}s_{t}^{TM} + b)
\section{双向翻译模型}
\parinterval 机器翻译主要是通过双语数据训练一种语言到另外一种语言的翻译。显然这是一种双向任务。对于给定的双语数据,可以同时学习源语言到目标语言和目标语言到源语言的翻译模型。那么,两个方向的翻译模型能否联合起来,相辅相成呢?下面从双向训练和对偶学习两方面对双向翻译模型进行介绍。这些方法大量使用在低资源翻译系统中,比如,可以用双向翻译模型反复迭代构造伪数据。
\parinterval 机器翻译主要是通过双语数据训练一种语言到另外一种语言的翻译。显然这是一种双向任务。对于给定的双语数据,可以同时学习源语言到目标语言和目标语言到源语言的翻译模型。那么,两个方向的翻译模型能否联合起来,相辅相成呢?下面从双向训练和对偶学习两方面对双向翻译模型进行介绍。这些方法大量使用在低资源翻译系统中,比如,可以用双向翻译模型反复迭代构造伪数据。
%----------------------------------------------------------------------------------------
% NEW SUBSUB-SECTION
......@@ -309,7 +309,9 @@ g_{t}& = & \sigma (w^{T}s_{t}^{TM} + b)
\subsection{双向训练}
\parinterval 回顾神经机器翻译系统的建模过程,给定一个互译的句对$(\seq{x},\seq{y})$,一个从源语言句子$\seq{x}$到目标语言句子$\seq{y}$的翻译被表示为求条件概率$\funp{P}(\seq{y}|\seq{x})$的问题。类似地,一个从目标语言句子$\seq{y}$到源语言句子$\seq{x}$的翻译可以表示为$\funp{P}(\seq{x}|\seq{y})$。通常来说,神经机器翻译的训练一次只得到一个方向的模型,也就是$\funp{P}(\seq{y}|\seq{x})$或者$\funp{P}(\seq{x}|\seq{y})$。这意味着$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$之间是互相独立的。$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$是否真的没有关系呢?比如,假设$\seq{x}$$\seq{y}$是相同大小的向量,且$\seq{x}$$\seq{y}$的变换是一个线性变换,也就是与一个方阵$\seq{W}$做矩阵乘法:
\parinterval 回顾神经机器翻译系统的建模过程,给定一个互译的句对$(\seq{x},\seq{y})$,一个从源语言句子$\seq{x}$到目标语言句子$\seq{y}$的翻译被表示为求条件概率$\funp{P}(\seq{y}|\seq{x})$的问题。类似地,一个从目标语言句子$\seq{y}$到源语言句子$\seq{x}$的翻译可以表示为$\funp{P}(\seq{x}|\seq{y})$。通常来说,神经机器翻译的训练一次只得到一个方向的模型,也就是$\funp{P}(\seq{y}|\seq{x})$或者$\funp{P}(\seq{x}|\seq{y})$。这意味着$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$之间是互相独立的。但
$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$是否真的没有关系呢?比如,假设$\seq{x}$$\seq{y}$是相同大小的向量,且$\seq{x}$$\seq{y}$的变换是一个线性变换,也就是与一个方阵$\seq{W}$做矩阵乘法:
\begin{eqnarray}
\seq{y} & = & \seq{x} \cdot \seq{W}
......@@ -318,15 +320,15 @@ g_{t}& = & \sigma (w^{T}s_{t}^{TM} + b)
\parinterval 这里可以把$\seq{x}$$\seq{y}$都看作分布式的向量表示;$\seq{W}$应当是一个满秩矩阵,否则对于任意一个$\seq{x}$经过$\seq{W}$变换得到的$\seq{y}$只落在所有可能的$\seq{y}$的一个子空间内,即在给定$\seq{W}$的情况下有些$\seq{y}$不能被任何一个$\seq{x}$表达,而这不符合常识,因为不管是什么句子,我们总能找到它的一种译文。若$\seq{W}$是满秩矩阵说明$\seq{W}$可逆,也就是给定$\seq{x}$$\seq{y}$的变换$\seq{W}$下,$\seq{y}$$\seq{x}$的变换必然是$\seq{W}$的逆而不是其他矩阵。
\parinterval 这个例子说明$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$直觉上应当存在联系。当然,$\seq{x}$$\seq{y}$之间是否存在简单的线性变换关系并没有结论,但是上面的例子给出了一种对源语言句子和目标语言句子进行相互转化的思路。实际上,研究人员已经通过一些数学技巧用目标函数来把$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$联系起来,这样训练神经机器翻译系统一次就可以同时得到两个方向的翻译模型,使得训练变得更加高效\upcite{Hassan2018AchievingHP,DBLP:conf/aaai/Zhang0LZC18,DBLP:conf/wmt/SunJXHWW19}。双向联合训练的基本思想是:使用两个方向的翻译模型对单语数据进行解码,之后用解码后的翻译与原始的单语数据作为训练语料,通过多次迭代更新两个方向上的机器翻译模型。
\parinterval 这个例子说明$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$直觉上应当存在联系。当然,$\seq{x}$$\seq{y}$之间是否存在简单的线性变换关系并没有结论,但是上面的例子给出了一种对源语言句子和目标语言句子进行相互转化的思路。实际上,研究人员已经通过一些数学技巧用目标函数来把$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$联系起来,这样训练神经机器翻译系统一次就可以同时得到两个方向的翻译模型,使得训练变得更加高效\upcite{Hassan2018AchievingHP,DBLP:conf/aaai/Zhang0LZC18,DBLP:conf/wmt/SunJXHWW19}。双向联合训练的基本思想是:使用两个方向的翻译模型对单语数据进行解码,之后用解码后的翻译结果与原始的单语数据作为训练语料,通过多次迭代更新两个方向上的机器翻译模型。
\parinterval\ref{fig:16-1-fk}给出了一个双向训练的详细流程,其中$M_{x \rightarrow y}^{k}$表示第$k$轮得到的$x$$y$的翻译模型,$M_{y \rightarrow x}^{k}$表示第$k$轮得到的$y$$x$的翻译模型。这里只展示了前两轮迭代。在第一次迭代开始之前,首先使用双语数据对两个初始翻译模型执行预训练。为了保持一致性,这里称之为第0 轮迭代。在第一轮迭代中,首先使用这两个翻译模型$M_{x \rightarrow y}^{0}$$M_{y \rightarrow x}^{0}$ 翻译单语数据$X=\{ x_i \}$$Y= \{ y_i \}$ 后得到译文$\{\hat{y}_i^{0} \}$$\{ \hat{x}_i^{0}\}$。进一步,构建伪训练数据集$\{ x_i,\hat{y}_i^{0}\}$$\{ \hat{x}_i^{0},y_i \}$。然后,模型$M_{x \rightarrow y}^{1}$$M_{y \rightarrow x}^{1}$使用上面的两个伪训练集和原始双语数据混合进行训练并执行参数更新,即用$\{ x_i,\hat{y}_i^{0}\} \bigcup \{ x_i,y_i\}$训练$M_{x \rightarrow y}^{1}$,用$\{ y_i,\hat{x}_i^{0}\} \bigcup \{ y_i,x_i\}$训练$M_{y \rightarrow x}^{1}$。第二轮迭代继续重复上述过程,使用更新参数后的翻译模型$M_{x \rightarrow y}^{1}$$M_{y \rightarrow x}^{1}$ 得到新的伪数据集$\{ x_i,\hat{y}_i^{1}\}$$\{ \hat{x}_i^{1},y_i \}$。然后,进一步得到翻译模型$M_{x \rightarrow y}^{2}$$M_{y \rightarrow x}^{2}$。这种方式本质上也是一种自学习的过程,通过逐步生成更好的伪数据提升模型质量。
\parinterval\ref{fig:16-1-fk}给出了一个双向训练的详细流程,其中$M_{x \rightarrow y}^{k}$表示第$k$轮得到的$x$$y$的翻译模型,$M_{y \rightarrow x}^{k}$表示第$k$轮得到的$y$$x$的翻译模型。这里只展示了前两轮迭代。在第一次迭代开始之前,首先使用双语数据对两个初始翻译模型执行预训练。为了保持一致性,这里称之为第0 轮迭代。在第一轮迭代中,首先使用这两个翻译模型$M_{x \rightarrow y}^{0}$$M_{y \rightarrow x}^{0}$ 翻译单语数据$X=\{ x_i \}$$Y= \{ y_i \}$ 后得到译文$\{\hat{y}_i^{0} \}$$\{ \hat{x}_i^{0}\}$。进一步,构建伪训练数据集$\{ x_i,\hat{y}_i^{0}\}$$\{ \hat{x}_i^{0},y_i \}$。然后使用上面的两个伪训练集和原始双语数据混合训练得到模型$M_{x \rightarrow y}^{1}$$M_{y \rightarrow x}^{1}$并执行参数更新,即用$\{ x_i,\hat{y}_i^{0}\} \bigcup \{ x_i,y_i\}$训练$M_{x \rightarrow y}^{1}$,用$\{ y_i,\hat{x}_i^{0}\} \bigcup \{ y_i,x_i\}$训练$M_{y \rightarrow x}^{1}$。第二轮迭代继续重复上述过程,使用更新参数后的翻译模型$M_{x \rightarrow y}^{1}$$M_{y \rightarrow x}^{1}$ 得到新的伪数据集$\{ x_i,\hat{y}_i^{1}\}$$\{ \hat{x}_i^{1},y_i \}$。然后,进一步得到翻译模型$M_{x \rightarrow y}^{2}$$M_{y \rightarrow x}^{2}$。这种方式本质上也是一种自学习的过程,通过逐步生成更好的伪数据提升模型质量。
%----------------------------------------------
\begin{figure}[h]
\centering
\includegraphics[scale=0.7]{Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.jpg}
\caption{双向训练的迭代过程{\color{red} 图需要修改!}}
\includegraphics[scale=0.7]{Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.png}
\caption{双向训练的迭代过程}
\label{fig:16-1-fk}
\end{figure}
%----------------------------------------------
......@@ -354,13 +356,13 @@ g_{t}& = & \sigma (w^{T}s_{t}^{TM} + b)
\parinterval 公式\ref{eq:16-7-xc}很自然地把两个方向的翻译模型$\funp{P}(\seq{y}|\seq{x})$$\funp{P}(\seq{x}|\seq{y})$以及两个语言模型$\funp{P}(\seq{x})$$\funp{P}(\seq{y})$联系起来:$\funp{P}(\seq{x})\funp{P}(\seq{y}|\seq{x})$应该与$\funp{P}(\seq{y})\funp{P}(\seq{x}|\seq{y})$接近,因为它们都表达了同一个联合分布$\funp{P}(\seq{x},\seq{y})$。因此,在构建训练两个方向的翻译模型的目标函数时,除了它们单独训练时各自使用的极大似然估计目标函数,可以额外增加一个目标项来鼓励两个方向的翻译模型:
\begin{eqnarray}
\mathcal{L}_{\rm{dual}} & = & (\log{\funp{P}(\seq{x})} + \log{\funp{P}(\seq{y}|\seq{x})} - \log{\funp{P}(\seq{y})} - \log{\funp{P}(\seq{x}|\seq{y}))^{2}}
{L}_{\rm{dual}} & = & (\log{\funp{P}(\seq{x})} + \log{\funp{P}(\seq{y}|\seq{x})} - \log{\funp{P}(\seq{y})} - \log{\funp{P}(\seq{x}|\seq{y})})^{2}
\label{eq:16-8-xc}
\end{eqnarray}
\parinterval 通过该正则化项,我们将互为对偶的两个任务放在一块学习,通过任务对偶性加强监督学习的过程,就是有监督对偶学习\upcite{DBLP:conf/icml/XiaQCBYL17,qin2020dual}。这里,$\funp{P}(\seq{x})$$\funp{P}(\seq{y})$这两个语言模型是预先训练好的,并不参与翻译模型的训练。可以看到,对于单独的一个模型来说,其目标函数增加了与另外一个方向的模型相关的项。这样的形式与L1/L2正则化非常类似(见{\chapternine}),因此可以把这个方法看作是一种任务特定的正则化的手段(由翻译任务本身的性质所启发而来)。有监督对偶学习实际上要优化下面这个损失函数:
\begin{eqnarray}
\mathcal{L} & = & \log{\funp{P}(\seq{y}|\seq{x})}+\log{\funp{P}(\seq{x}|\seq{y})}+\mathcal{L}_{\rm{dual}}
{L} & = & \log{\funp{P}(\seq{y}|\seq{x})}+\log{\funp{P}(\seq{x}|\seq{y})}+{L}_{\rm{dual}}
\label{eq:16-2-fk}
\end{eqnarray}
......@@ -388,8 +390,8 @@ g_{t}& = & \sigma (w^{T}s_{t}^{TM} + b)
%----------------------------------------------
\begin{figure}[htp]
\centering
\includegraphics[scale=0.4]{./Chapter16/Figures/figure-unsupervised-dual-learning-process.jpg}
\caption{无监督对偶学习流程{\color{red} 图要改!}}
\includegraphics[scale=0.4]{./Chapter16/Figures/figure-unsupervised-dual-learning-process.png}
\caption{无监督对偶学习流程}
\label{fig:16-10-xc}
\end{figure}
%----------------------------------------------
......@@ -769,7 +771,7 @@ P(\mathbi{y}|\mathbi{x}) & = & \frac{\mathrm{cos}(\mathbi{x},\mathbi{y})/\tau}{\
\begin{figure}[h]
\centering
\includegraphics[scale=0.2,angle=90]{Chapter16/Figures/figure-unmt-idea3.jpg}
\input{Chapter16/Figures/figure-optimization-of-the-model-initialization-method}
\caption{模型初始化方法的优化}
\label{fig:16-3}
\end{figure}
......@@ -798,8 +800,8 @@ P(\mathbi{y}|\mathbi{x}) & = & \frac{\mathrm{cos}(\mathbi{x},\mathbi{y})/\tau}{\
\begin{figure}[h]
\centering
\includegraphics[scale=0.2,angle=90]{Chapter16/Figures/figure-unmt-process.jpg}
\caption{无监督神经机器翻译模型训练流程}
\input{Chapter16/Figures/figure-unmt-process}
\caption{无监督神经机器翻译模型训练流程}
\label{fig:16-4}
\end{figure}
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论