\parinterval 与基于统计的BPE算法不同,基于Word Piece和1-gram Language Model(ULM)的方法则是利用语言模型进行子词词表的构造\upcite{DBLP:journals/corr/abs-1804-10959}。本质上,基于语言模型的方法和基于BPE的方法的思路是一样的,即通过合并字符和子词不断生成新的子词。它们的区别仅在于合并子词的方式不同。基于BPE的方法选择出现频次最高的连续字符2-gram合并为新的子词,而基于语言模型的方法中则是根据语言模型概率选择要合并哪些子词。
abstract = "We participated in the WMT 2018 shared news translation task on English鈫擟hinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve the translation performance. The best translation result is obtained with ensemble and reranking techniques. Our Chinese鈫扙nglish system achieved the highest cased BLEU score among all 16 submitted systems, and our English鈫扖hinese system ranked the third out of 18 submitted systems.",
}
@article{DBLP:journals/corr/LeeCH16,
author = {Jason Lee and
Kyunghyun Cho and
Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit