Commit 7b9c2fa5 by 孟霞

Merge branch 'master' into 'mengxia'

Master

See merge request !61
parents 3bf6c3ec 09e9367f
......@@ -21,8 +21,8 @@
{\footnotesize
\node [anchor=west] (n11) at ([xshift=-13em,yshift=2em]n1.west) {对训练和测试数据进行};
\node [anchor=west] (n12) at ([xshift=0em,yshift=-1.5em]n11.west) {处理,包括:数据清洗、};
\node [anchor=west] (n13) at ([xshift=0em,yshift=-1.5em]n12.west) {翻译单元(字词)切分、};
\node [anchor=west] (n14) at ([xshift=0em,yshift=-1.5em]n13.west) {译文后处理};
\node [anchor=west] (n13) at ([xshift=0em,yshift=-1.5em]n12.west) {子词切分、译文后处理};
\node [anchor=west] (n14) at ([xshift=0em,yshift=-1.5em]n13.west) {};
\node [anchor=west] (n31) at ([xshift=2em,yshift=0em]n3.north east) {神经网络模型设计,包括};
......
\begin{tikzpicture}
\tikzstyle{op} =[rounded corners=1pt,thick,minimum width=4.0em,minimum height=3.0em,draw,fill=red!5!white,font=\scriptsize]
\tikzstyle{data} = [cylinder,draw=black,thick,minimum height=3em,minimum width=3em,shape border rotate=0,cylinder uses custom fill, cylinder body fill=blue!10,cylinder end fill=blue!5,anchor = east,font=\scriptsize]
\tikzstyle{data} = [cylinder,draw=black,thick,minimum height=2.5em,minimum width=3em,shape border rotate=0,cylinder uses custom fill, cylinder body fill=blue!10,cylinder end fill=blue!5,anchor = east,font=\scriptsize]
\node[op] (node1) at (0,0) {分词};
\node[op,anchor = west] (node2) at ([xshift = 2.0em]node1.east) {符号标准化};
\node[op,anchor = west] (node3) at ([xshift = 2.0em]node2.east) {数据过滤};
\node[op,anchor = west] (node4) at ([xshift = 2.0em]node3.east) {子词切分};
\node [data,anchor = east] (data1) at ([xshift = -2.0em]node1.west){原始数据};
\node [data,anchor = west] (data2) at ([xshift = 2.0em]node3.east){训练数据};
\node [data,anchor = west] (data2) at ([xshift = 2.0em]node4.east){训练数据};
\draw[-stealth,line width=.05cm] ([xshift=0.25em]data1.east) -- ([xshift=-0.25em]node1.west);
\draw[-stealth,line width=.05cm] ([xshift=0.25em]node1.east) -- ([xshift=-0.25em]node2.west);
\draw[-stealth,line width=.05cm] ([xshift=0.25em]node2.east) -- ([xshift=-0.25em]node3.west);
\draw[-stealth,line width=.05cm] ([xshift=0.25em]node3.east) -- ([xshift=-0.25em]data2.west);
\draw[-stealth,line width=.05cm] ([xshift=0.25em]node3.east) -- ([xshift=-0.25em]node4.west);
\draw[-stealth,line width=.05cm] ([xshift=0.25em]node4.east) -- ([xshift=-0.25em]data2.west);
\end{tikzpicture}
\ No newline at end of file
......@@ -16,11 +16,11 @@
\begin{spacing}{1.2}
让计算机进行自然语言的翻译是人类长期的梦想,也是人工智能的终极目标之一。自上世纪九十年代起,机器翻译也迈入了基于统计建模的时代,而发展到今天,深度学习等机器学习方法已经在机器翻译中得到了大量的应用,取得了令人瞩目的进步。
让计算机进行自然语言的翻译是人类长期的梦想,也是人工智能的终极目标之一。自上世纪九十年代起,机器翻译迈入了基于统计建模的时代,发展到今天,深度学习等机器学习方法已经在机器翻译中得到了大量的应用,取得了令人瞩目的进步。
在这个时代背景下,对机器翻译的模型、方法、实现技术进行深入了解是自然语言处理领域研究者和实践者所渴望的。本书全面的回顾了近三十年内机器翻译的技术发展历程,并围绕统计建模和深度学习两个主题对机器翻译的技术方法进行了全面介绍。在写作中,笔者力求用朴实的语言和实例阐述机器翻译的基本模型和方法,同时对相关的技术前沿进行讨论。本书可以供计算机相关专业高年级本科生及研究生学习之用,也可以作为自然语言处理,特别是机器翻译相关研究人员的参考资料。
在这个时代背景下,对机器翻译的模型、方法和实现技术进行深入了解是自然语言处理领域研究者和实践者所渴望的。本书全面回顾了近三十年内机器翻译的技术发展历程,并围绕统计建模和深度学习两个主题对机器翻译的技术方法进行了全面介绍。在写作中,笔者力求用朴实的语言和简洁的实例阐述机器翻译的基本模型和方法,同时对相关的技术前沿进行讨论。本书可以供计算机相关专业高年级本科生及研究生学习之用,也可以作为自然语言处理,特别是机器翻译领域相关研究人员的参考资料。
本书共分为七个章节章节的顺序参考了机器翻译技术发展的时间脉络,同时兼顾了机器翻译知识体系的内在逻辑。各章节的主要内容包括:
本书共分为七个章节章节的顺序参考了机器翻译技术发展的时间脉络,同时兼顾了机器翻译知识体系的内在逻辑。各章节的主要内容包括:
\begin{itemize}
\vspace{0.4em}
......@@ -40,9 +40,9 @@
\vspace{0.4em}
\end{itemize}
其中,第一章是对机器翻译的整体介绍。第二章和第五章是对统计建模和深度学习方法的介绍,分别建立了两个机器翻译范式的基础知识体系 \ \dash \ 统计机器翻译和神经机器翻译。统计机器翻译部分(第三、四章)涉及早期的基于单词的翻译模型,以及本世纪初流行的基于短语和句法的翻译模型;神经机器翻译(第六、七章)代表了当今机器翻译的前沿,内容主要涉及了基于端到端表示学习的机器翻译建模方法。特别是,第七章对一些最新的神经机器翻译方法进行了讨论,为相关科学问题的研究和实用系统的开发提供了可落地的思路。图\ref{fig:preface}展示了本书各个章节及核心概念之间的关系。
其中,第一章是对机器翻译的整体介绍。第二章和第五章是对统计建模和深度学习方法的介绍,分别建立了两个机器翻译范式的基础知识体系 \ \dash \ 统计机器翻译和神经机器翻译。统计机器翻译部分(第三、四章)涉及早期的基于单词的翻译模型,以及本世纪初流行的基于短语和句法的翻译模型。神经机器翻译(第六、七章)代表了当今机器翻译的前沿,内容主要涉及了基于端到端表示学习的机器翻译建模方法。特别的,第七章对一些最新的神经机器翻译方法进行了讨论,为相关科学问题的研究和实用系统的开发提供了可落地的思路。图\ref{fig:preface}展示了本书各个章节及核心概念之间的关系。
{\red 用最简单的方式阐述机器翻译的基本思想}是笔者所期望达到的目标。但是,书中不可避免会使用一些形式化定义和算法的抽象描述。这时,笔者尽所能通过图例进行解释(本书共320张插图)。不过,本书所包含的内容较为广泛,难免会有疏漏,望读者海涵,并指出不当之处。
{\red 用最简单的方式阐述机器翻译的基本思想}是笔者所期望达到的目标。但是,书中不可避免会使用一些形式化定义和算法的抽象描述,因此,笔者尽所能通过图例进行解释(本书共320张插图)。不过,本书所包含的内容较为广泛,难免会有疏漏,望读者海涵,并指出不当之处。
\begin{figure}[htp]
\centering
......
......@@ -6345,4 +6345,69 @@ year = {2020},
author={Nepveu, Laurent and Lapalme, Guy and Langlais, Philippe and Foster, George F.},
booktitle={Conference on Empirical Methods in Natural Language Processing},
year={2004},
}
@inproceedings{wang-etal-2018-tencent,
title = "Tencent Neural Machine Translation Systems for {WMT}18",
author = "Wang, Mingxuan and
Gong, Li and
Zhu, Wenhuan and
Xie, Jun and
Bian, Chao",
booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers",
month = oct,
year = "2018",
address = "Belgium, Brussels",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W18-6429",
doi = "10.18653/v1/W18-6429",
pages = "522--527",
abstract = "We participated in the WMT 2018 shared news translation task on English↔Chinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve the translation performance. The best translation result is obtained with ensemble and reranking techniques. Our Chinese→English system achieved the highest cased BLEU score among all 16 submitted systems, and our English→Chinese system ranked the third out of 18 submitted systems.",
}
@article{DBLP:journals/corr/LeeCH16,
author = {Jason Lee and
Kyunghyun Cho and
Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit
Segmentation},
journal = {CoRR},
volume = {abs/1610.03017},
year = {2016},
url = {http://arxiv.org/abs/1610.03017},
archivePrefix = {arXiv},
eprint = {1610.03017},
timestamp = {Mon, 13 Aug 2018 16:47:21 +0200},
biburl = {https://dblp.org/rec/journals/corr/LeeCH16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@INPROCEEDINGS{6289079,
author={M. {Schuster} and K. {Nakajima}},
booktitle={2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, title={Japanese and Korean voice search},
year={2012},
volume={},
number={},
pages={5149-5152}
}
@ARTICLE{61115,
author={J. {Lin}},
journal={IEEE Transactions on Information Theory},
title={Divergence measures based on the Shannon entropy},
year={1991},
volume={37},
number={1},
pages={145-151},}
@article{Mengzhou2019Graph,
title={Graph Based Translation Memory for Neural Machine Translation},
author={Mengzhou Xia and Guoping Huang and Lemao Liu and Shuming Shi},
year={2019},
}
@book{Qiuxiang2019Word,
title={Word Position Aware Translation Memory for Neural Machine Translation},
author={Qiuxiang He and Guoping Huang and Lemao Liu and Li Li},
year={2019},
}
\ No newline at end of file
......@@ -76,15 +76,15 @@
~\vfill
\thispagestyle{empty}
\noindent Copyright \copyright\ 2020 肖桐\ \ 朱靖波\\
\noindent Copyright \copyright\ 2020 肖桐\ \ 朱靖波\\
\noindent \textsc{东北大学自然语言处理实验室\ /\ 小牛翻译}\\
\noindent \textsc{东北大学自然语言处理实验室\ $\cdot$\ 小牛翻译}\\
\noindent \textsc{\url{https://github.com/NiuTrans/MTBook}}\\
\noindent \textsc{\url{https://github.com/NiuTrans/MTBook}}\\
\noindent {\red{Licensed under the Creative Commons Attribution-NonCommercial 4.0 Unported License (the ``License''). You may not use this file except in compliance with the License. You may obtain a copy of the License at \url{http://creativecommons.org/licenses/by-nc/4.0}. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \textsc{``as is'' basis, without warranties or conditions of any kind}, either express or implied. See the License for the specific language governing permissions and limitations under the License.}}\\
\noindent {\red{Licensed under the Creative Commons Attribution-NonCommercial 4.0 Unported License (the ``License''). You may not use this file except in compliance with the License. You may obtain a copy of the License at \url{http://creativecommons.org/licenses/by-nc/4.0}. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \textsc{``as is'' basis, without warranties or conditions of any kind}, either express or implied. See the License for the specific language governing permissions and limitations under the License.}}\\
\noindent \textit{First Edition, April 2020}
\noindent \textit{\today}
%----------------------------------------------------------------------------------------
% ACKNOWLEDGE PAGE
......@@ -121,14 +121,14 @@
% CHAPTERS
%----------------------------------------------------------------------------------------
\include{Chapter1/chapter1}
\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
\include{Chapter4/chapter4}
\include{Chapter5/chapter5}
\include{Chapter6/chapter6}
\include{Chapter7/chapter7}
\include{ChapterAppend/chapterappend}
%\include{Chapter1/chapter1}
%\include{Chapter2/chapter2}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
%\include{Chapter7/chapter7}
%\include{ChapterAppend/chapterappend}
%----------------------------------------------------------------------------------------
......