Commit fa451fc4 by libei

New NMT version based on Tensor2Tensor-1.6.5

Support transformer_dla
Support transformer
# T2T: Tensor2Tensor Transformers
[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[T2T](https://github.com/tensorflow/tensor2tensor) is a modular and extensible
library and binaries for supervised learning with TensorFlow and with support
for sequence tasks. It is actively used and maintained by researchers and
engineers within the Google Brain team. You can read more about Tensor2Tensor in
the recent [Google Research Blog post introducing
it](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
We're eager to collaborate with you on extending T2T, so please feel
free to [open an issue on
GitHub](https://github.com/tensorflow/tensor2tensor/issues) or
send along a pull request to add your dataset or model.
See [our contribution
doc](CONTRIBUTING.md) for details and our [open
issues](https://github.com/tensorflow/tensor2tensor/issues).
And chat with us and other users on
[Gitter](https://gitter.im/tensor2tensor/Lobby).
### Contents
* [Walkthrough](#walkthrough)
* [Installation](#installation)
* [Features](#features)
* [T2T Overview](#t2t-overview)
* [Datasets](#datasets)
* [Problems and Modalities](#problems-and-modalities)
* [Models](#models)
* [Hyperparameter Sets](#hyperparameter-sets)
* [Trainer](#trainer)
* [Adding your own components](#adding-your-own-components)
* [Adding a dataset](#adding-a-dataset)
---
## Walkthrough
Here's a walkthrough training a good English-to-German translation
model using the Transformer model from [*Attention Is All You
Need*](https://arxiv.org/abs/1706.03762) on WMT data.
```
pip install tensor2tensor
# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help
PROBLEM=wmt_ende_tokens_32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
# Generate data
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--num_shards=100 \
--problem=$PROBLEM
cp $TMP_DIR/tokens.vocab.* $DATA_DIR
# Train
# * If you run out of memory, add --hparams='batch_size=2048' or even 1024.
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR
# Decode
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
BEAM_SIZE=4
ALPHA=0.6
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes
```
---
## Installation
```
# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor
# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]
# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]
```
Binaries:
```
# Data generator
t2t-datagen
# Trainer
t2t-trainer --registry_help
```
Library usage:
```
python -c "from tensor2tensor.models.transformer import Transformer"
```
---
## Features
* Many state-of-the-art and baseline models are built-in and new models can be
added easily (open an issue or pull request!).
* Many datasets across modalities - text, audio, image - available for
generation and use, and new ones can be added easily (open an issue or pull
request for public datasets!).
* Models can be used with any dataset and input mode (or even multiple); all
modality-specific processing (e.g. embedding lookups for text tokens) is done
with `Modality` objects, which are specified per-feature in the dataset/task
specification.
* Support for multi-GPU machines and synchronous (1 master, many workers) and
asynchronous (independent workers synchronizing through a parameter server)
distributed training.
* Easily swap amongst datasets and models by command-line flag with the data
generation script `t2t-datagen` and the training script `t2t-trainer`.
---
## T2T Overview
### Datasets
**Datasets** are all standardized on `TFRecord` files with `tensorflow.Example`
protocol buffers. All datasets are registered and generated with the
[data
generator](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t-datagen)
and many common sequence datasets are already available for generation and use.
### Problems and Modalities
**Problems** define training-time hyperparameters for the dataset and task,
mainly by setting input and output **modalities** (e.g. symbol, image, audio,
label) and vocabularies, if applicable. All problems are defined in
[`problem_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem_hparams.py).
**Modalities**, defined in
[`modality.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/modality.py),
abstract away the input and output data types so that **models** may deal with
modality-independent tensors.
### Models
**`T2TModel`s** define the core tensor-to-tensor transformation, independent of
input/output modality or task. Models take dense tensors in and produce dense
tensors that may then be transformed in a final step by a **modality** depending
on the task (e.g. fed through a final linear transform to produce logits for a
softmax over classes). All models are imported in
[`models.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/models.py),
inherit from `T2TModel` - defined in
[`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py)
- and are registered with
[`@registry.register_model`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).
### Hyperparameter Sets
**Hyperparameter sets** are defined and registered in code with
[`@registry.register_hparams`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py)
and are encoded in
[`tf.contrib.training.HParams`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/training/python/training/hparam.py)
objects. The `HParams` are available to both the problem specification and the
model. A basic set of hyperparameters are defined in
[`common_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/common_hparams.py)
and hyperparameter set functions can compose other hyperparameter set functions.
### Trainer
The **trainer** binary is the main entrypoint for training, evaluation, and
inference. Users can easily switch between problems, models, and hyperparameter
sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
related flags control local and distributed training/evaluation
([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/docs/distributed_training.md)).
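For example, individual hyperparameters can be overridden without defining a new set; reusing the variables from the walkthrough above (the override value is illustrative, matching the out-of-memory tip earlier):

```
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='batch_size=2048' \
  --output_dir=$TRAIN_DIR
```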
---
## Adding your own components
T2T's components are registered using a central registration mechanism that
enables easily adding new ones and easily swapping amongst them by command-line
flag. You can add your own components without editing the T2T codebase by
specifying the `--t2t_usr_dir` flag in `t2t-trainer`.
You can currently do so for models, hyperparameter sets, and modalities. Please
do submit a pull request if your component might be useful to others.
Here's an example with a new hyperparameter set:
```python
# In ~/usr/t2t_usr/my_registrations.py
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry
@registry.register_hparams
def transformer_my_very_own_hparams_set():
  hparams = transformer.transformer_base()
  hparams.hidden_size = 1024
  ...
```
```python
# In ~/usr/t2t_usr/__init__.py
from . import my_registrations
```
```
t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
```
You'll see under the registered HParams your
`transformer_my_very_own_hparams_set`, which you can directly use on the command
line with the `--hparams_set` flag.
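For example, reusing the data and output directories from the walkthrough above:

```
t2t-trainer \
  --t2t_usr_dir=~/usr/t2t_usr \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_my_very_own_hparams_set \
  --output_dir=$TRAIN_DIR
```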
## Adding a dataset
See the [data generators
README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).
---
*Note: This is not an official Google product.*
deal-test2bpe.sh
Usage tutorial
Parameters that must be set:
eval_dir= # path to the test-set folder (note the trailing /), e.g. eval_dir=*****/eval/
# the eval folder contains one subfolder per test set (mt06, mt08, etc.)
src_bpe= # path to the source-side BPE vocabulary; e.g. for zh-->en translation, set src_bpe=****/zh.bpe
input=input.token # naming keyword for each test set's source-side file; the test source file must match this keyword, usually left unchanged
output=input.bpe # naming keyword for the BPE file generated from each test set's source-side file, usually left unchanged
PYTHON=python3.6 # the python version; inside a virtual environment this can be set to python
APPLY_BPE=./subword-nmt-master/apply_bpe.py # path to apply_bpe.py
How to run:
sh deal-test2bpe.sh
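A typical configuration, matching the defaults inside deal-test2bpe.sh (the eval_dir path is an illustrative example following this repository's data layout):
eval_dir=../data/zh2en/v4-bpe32k-cwmt/eval/
src_bpe=./bpe0.5/zh.bpe
input=input.token
output=input.bpe
PYTHON=python3.6
APPLY_BPE=./subword-nmt-master/apply_bpe.py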
<!-- toc -->
---
# Environment Setup
## 1. Upgrade to python3.6
**If you are already on 3.6, there is no need to upgrade.**
```bash
sudo yum update
sudo yum install yum-utils
sudo yum groupinstall development
sudo yum install https://centos7.iuscommunity.org/ius-release.rpm
sudo yum install python36u
python3.6 -V
sudo yum install python36u-pip
sudo yum install python36u-devel
```
## 2. Install sqlite3 (Aliyun only)
Required to launch `tensorboard`; otherwise it fails with `No module named _sqlite3`.
```bash
wget https://www.sqlite.org/2017/sqlite-autoconf-3170000.tar.gz --no-check-certificate
tar zxvf sqlite-autoconf-3170000.tar.gz
cd sqlite-autoconf-3170000
./configure --prefix=/usr/local/sqlite3 --disable-static --enable-fts5 --enable-json1 CFLAGS="-g -O2 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_RTREE=1"
make
sudo make install
```
Recompile `python3.6`
```bash
wget https://www.python.org/ftp/python/3.6.4/Python-3.6.4.tgz # slow on Aliyun; you can download it on Windows and upload it to the server instead
tar zxvf Python-3.6.4.tgz
cd Python-3.6.4
LD_RUN_PATH=/usr/local/sqlite3/lib ./configure LDFLAGS="-L/usr/local/sqlite3/lib" CPPFLAGS="-I /usr/local/sqlite3/include"
LD_RUN_PATH=/usr/local/sqlite3/lib make
LD_RUN_PATH=/usr/local/sqlite3/lib sudo make install
```
## 3. Create a virtual environment
```bash
python3.6 -m venv 'your env name'
```
e.g. `python3.6 -m venv env-cwmt` creates an `env-cwmt` directory under the current directory
## 4. Activate the virtual environment
```bash
source 'your env name'/bin/activate
```
e.g. `source env-cwmt/bin/activate`; your shell prompt then changes to `(env-cwmt)***`
## 5. Configure the pip mirror
Switch the package index to the Tsinghua mirror
```shell
mkdir ~/.pip
cd ~/.pip
vi pip.conf
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
:wq
```
## 6. Install tensorflow
```bash
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.3.0
```
On an Aliyun server, use the Tsinghua mirror for speed; `tensorflow` version `1.3.0` is used.
## 7. Install other packages
```bash
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple sympy
```
---
# Directory Structure
* bin/
Contains: model training, decoding, scoring*
> Training script: `train.sh`
>
> Decoding scripts: `decoder.sh`, `translate_dataset.sh`
>
> Scoring script*: **todo**
* data/
Datasets in use; the layout is `language direction`/`version`-`segmentation`-`dataset`, e.g. `zh2en/v4-bpe32k-cwmt` means version `v4` with segmentation `bpe32k` on the `cwmt` dataset
> the `train` directory holds the `tfRecord` files generated by `tgt-gen.py`, plus the BPE results
> the `eval` directory holds the validation set and all test sets; each evaluation-set folder contains `input.bpe` (BPE of the source input), `input.token` (tokenized source input, not actually used), and `ref*` (one or more references)
* output/
Generated models; the layout is `language direction`/`version`-`segmentation`-`dataset`/`tag`
> if ensemble decoding has been used, a directory holding the ensemble model is also created, e.g. `ensemble15`
* tensor2tensor/
Core code
* doc/
Documentation and experiment notes
---
# Training Workflow
## Configuration
Edit `bin/train.sh` to set everything up.
### Hardware
- `dev` is the gpu devices to use, e.g. `dev=0,1,2,3`
- `gpu_fraction` is the fraction of each gpu's memory to occupy, e.g. `gpu_fraction=0.95`; usually left unchanged
### Dataset
- `lang` is the translation direction, e.g. `lang=zh2en`
- `datatype` is the data type in use, e.g. `datatype=v4-bpe32k`
- `dataset` is the training dataset, e.g. `dataset=cwmt`
### Training parameters
- `model` is the registered model to use, e.g. `model=transformer`
- `param` is the registered hyperparameter set, e.g. `param=transformer_base`
- `train_step` is the number of updates to run, e.g. `train_step=103000`, which approximates `10 epochs` on the `cwmt700w` data
- `other_hparams` holds ad-hoc overrides of training parameters; the common use is adjusting the batch_size, e.g. `other_hparams='batch_size=2048'`; without it you would have to register the parameters in code
- `tag` names the current experiment, e.g. `tag=baseline-epoch20`; it is there to help you keep records. **Change it for every new experiment.** A consolidated example follows this list.
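Putting the options above together, a complete `train.sh` configuration for the baseline experiment of this document could look like this (every value is taken from the examples above):
```bash
dev=0,1,2,3
gpu_fraction=0.95
lang=zh2en
datatype=v4-bpe32k
dataset=cwmt
model=transformer
param=transformer_base
train_step=103000
other_hparams='batch_size=2048'
tag=baseline-epoch20
```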
## Usage
After updating the settings, simply run `./train.sh`. The script starts multi-gpu training and also performs automatic validation on the `cpu`.
Then open `tensorboard` and watch the relevant curves.
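To watch the curves, point `tensorboard` at the output directory of the current experiment; the path below is an assumption that follows the directory layout described above:
```bash
tensorboard --logdir=../output/zh2en/v4-bpe32k-cwmt/baseline-epoch20
```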
---
# Decoding Workflow
## Configuration
### Hardware
* `device` is the gpu devices to use, e.g. `device=(0 1 2 3)`
* > Note that `device` here is an array in `shell`, not a string as in `train.sh`; separate multiple values with `spaces`
>
> `device` may list several gpus; each decoding process can only use one gpu, but multiple gpus let us run different experiments in parallel
### Evaluation
* `is_eval` controls whether to evaluate (e.g. compute `BLEU`): `1` means decode + evaluate, `0` means decode only
* > set to `1` in this project
* `eval_tool` is the evaluation tool to use, either `multi-bleu` or `mteval`
* > set to `multi-bleu` in this project
* `lowercase` controls whether evaluation lowercases everything, i.e. is case-insensitive: `1` means case-insensitive, `0` means case-sensitive
* > this option currently only applies to `multi-bleu`; set to `0` in this project
### Dataset / vocabulary
* `lang` is the translation direction, e.g. `lang=zh2en`
* `datatype` is the data type in use, e.g. `datatype=v4-bpe32k`
* `dataset` is the training dataset, e.g. `dataset=cwmt`
* `evalset` is the test sets to be evaluated
* > this option is a `shell` array; set multiple values separated by `spaces`
### Translation model
* `model` is the registered model to use, e.g. `model=transformer`
* `param` is the registered hyperparameter set, e.g. `param=transformer_base`
* `tag` is the experiment name of the model, e.g. `tag=baseline`; **remember to change it**
### Decoding parameters
* `beam_size` is the `beam` width, usually `12`
* `batch_size` is the size of a decoding `batch`, usually `32`; lower it if you hit an `OOM` error
* `alphas` is the coefficient of length normalization (the length penalty)
* > this option is a `shell` array and may hold several values; the program automatically runs a `grid search` to find the best `alpha`
* `ensemble` is the number of checkpoints to average (`checkpoint average`)
* > for a single model, do not set this value; leave it empty
>
> to use an ensemble, e.g. averaging the last `15` checkpoints, set this option to `15`
## Usage
Decoding consists of 4 main steps:
* with the single model, search for the best hyperparameter `alpha` on the `validation set`
* with the single model, run all `test sets` with the best `alpha`
* with the `ensemble` model, search for the best hyperparameter `alpha` on the `validation set`
* with the `ensemble` model, run all `test sets` with the best `alpha`
> the `ensemble` model's `alpha` does not necessarily need a fresh search every time, or the search range can be narrower (it is usually close to the single-model `alpha`, never drastically different)
>
> the `bp` and `ratio` values in the final decoding report tell you whether `alpha` should be tuned further
>
> the multiple decoding results of one run are gathered together and printed when the program finishes
Suppose you have just trained a model on `4 gpus (0,1,2,3)`; set the `dataset` & `translation model` parameters (similar to `train.sh`)
### 1. Single model, hyperparameter search
* set the gpus used for decoding; use as many as possible
```shell
device=(0 1 2 3)
```
* set the validation set
```shell
evalset=(cwmt18-dev)
```
* set the hyperparameters for the grid search
```shell
alphas=(0.9 1.0 1.1 1.2 1.3)
```
five values are used here; there can be more values than gpus, the program schedules them automatically
* set ensemble to empty
```shell
ensemble=
```
### 2. Single model, run the test sets
- set the gpus used for decoding; use as many as possible
```shell
device=(0 1 2 3)
```
- set the test sets, 8 in total
```shell
evalset=(cwmt17-dev wmt17-test mt06 mt08 mt12-nd mt12-nw mt12-wb exact2k)
```
- set the hyperparameter to the best value from the previous step, assumed to be `1.2`
```shell
alphas=(1.2)
```
- set ensemble to empty
```shell
ensemble=
```
### 3. Ensemble model, hyperparameter search
- set the gpus used for decoding; use as many as possible
```shell
device=(0 1 2 3)
```
- set the validation set
```shell
evalset=(cwmt18-dev)
```
- set the hyperparameters for the grid search; **no need for as many values as the single model, just search around the single-model optimum**
```shell
alphas=(1.1 1.2 1.3)
```
- set ensemble; assume `15` checkpoints are averaged
```shell
ensemble=15
```
### 4. Ensemble model, run the test sets
- set the gpus used for decoding; use as many as possible
```shell
device=(0 1 2 3)
```
- set the test sets, 8 in total
```shell
evalset=(cwmt17-dev wmt17-test mt06 mt08 mt12-nd mt12-nw mt12-wb exact2k)
```
- set the hyperparameter to the best value from the previous step, assumed to be `1.2`
```shell
alphas=(1.2)
```
- set ensemble
```shell
ensemble=15
```
tensorflow-gpu==1.6
requests
scipy
sympy
h5py
gym
# encoding=utf-8
import os
import sys

def calBatchNum(srcfile, dstfile, batchsize):
    # Count how many batches a token-based batching scheme needs, where each
    # sentence pair occupies max(len(src), len(tgt)) tokens of a batch.
    if not os.path.isfile(srcfile):
        print('srcfile does not exist!')
        return False
    if not os.path.isfile(dstfile):
        print('dstfile does not exist!')
        return False
    maxlist = []
    with open(srcfile, encoding='utf-8') as srclines, open(dstfile, encoding='utf-8') as dstlines:
        for srcline, dstline in zip(srclines, dstlines):
            srclinenum = len(srcline.split(' '))
            dstlinenum = len(dstline.split(' '))
            maxlist.append(max(srclinenum, dstlinenum))
    batchnum = 1
    batchroom = batchsize
    for i in range(0, len(maxlist)):
        batchroom = batchroom - maxlist[i]
        if batchroom < 0:
            # the current sentence pair does not fit; open a new batch for it
            batchnum = batchnum + 1
            batchroom = batchsize - maxlist[i]
            if batchroom < 0:
                print('can not make room for this sentence', i)
                return False
    print('total batch number is', batchnum)
    return True

if __name__ == "__main__":
    if len(sys.argv) == 4:
        srcfile = sys.argv[1]
        dstfile = sys.argv[2]
        batchsize = int(sys.argv[3])
    else:
        errorInfo = '****************** Error ******************\r\n'
        errorInfo = errorInfo + 'Please input with srcfile path, dstfile path and batchsize\r\n'
        errorInfo = errorInfo + 'e.g.: $ python ./calBatchNum.py srcfile dstfile batchsize\r\n'
        print(errorInfo)
        exit(1)
    calBatchNum(srcfile, dstfile, batchsize)
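# Example usage (the file names are illustrative):
#   $ python ./calBatchNum.py zh.train.bpe en.train.bpe 2048
# The two files are paired line by line and the script reports how many
# token-level batches of size `batchsize` the corpus needs.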
#! /usr/bin/bash
set -e
##################################SET PARAMS#######################################
# eval directory, e.g. ../data/zh2en/v4-bpe32k-cwmt18/eval/
eval_dir=./eval/
# source-side BPE vocabulary path, e.g. ../data/zh2en/v4-bpe32k-cwmt18/eval/src.bpe
src_bpe=./bpe0.5/zh.bpe
# source-side filename keyword: test-set source files must be named with input.token; the corresponding bpe output is input.bpe
input=input.token
output=input.bpe
# python version
PYTHON=python
# path to apply_bpe.py
APPLY_BPE=./subword-nmt-master/apply_bpe.py
################################SET PARAMS#########################################
if [ ! -d "$eval_dir" ]; then
echo "$eval_dir is not exists."
exit 1
fi
echo "######## START RUN ########"
for file in $eval_dir/* ;do
{
if [ -d $file ]; then
flag1=1
for sub_file in $file/*;do
{
if [ -f $sub_file ]; then
if [[ $sub_file =~ $output ]]; then
flag1=2
echo " HAS EXISTS , PROCESS FILE FOLDER : $file "
break
fi
fi
}
done
if [ "$flag1" != "2" ]; then
flag2=3
for sub_file in $file/*;do
{
if [ -f $sub_file ]; then
if [[ $sub_file =~ $input ]]; then
sub_file_bpe=${sub_file/%"token"/"bpe"}
#echo $sub_file_bpe
cmd="$PYTHON $APPLY_BPE -c $src_bpe -i $sub_file -o $sub_file_bpe"
$cmd
echo " CREATE SUCCESSFUL , PROCESS FILE FOLDER : $file "
flag2=4
fi
fi
}
done
if [ "$flag2" != "4" ]; then
echo "WARNING: make sure source filename in the $file contains key, $key "
fi
fi
fi
}
done
echo "######## END OF PROGRAM ########"
###########################################
### configuration file for SMT ###
### ###
### 2013-04-19 ###
###########################################
# punct mapping dictionary for detoken
param="Punct-Mapping-Dict" value="./punctuation.mapping.dat"
# system log path
param="system-log" value="./system.detoken.log"
, , NULL
. 。 NULL
( ( NULL
) ) NULL
; ; NULL
! ! NULL
? ? NULL
' ‘ ’
" “ ”
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
/*
* $Id:
* 0008
*
* $File:
* basic_method.cpp
*
* $Proj:
* Decoder for Statistical Machine Translation
*
* $Func:
* basic method
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2014-04-11,16:56
* 2012-12-04,16:29
*/
#include "basic_method.h"
namespace basic_method {
/*
* $Name: Split
* $Function: Split string with char
* $Date: 2014-04-11
*/
bool BasicMethod::Split(const string &phraseTable, const char &splitchar, vector< string > &dest) {
string::size_type splitPos = phraseTable.find(splitchar);
string::size_type lastSplitPos = 0;
string tempString;
while (splitPos != string::npos) {
tempString = phraseTable.substr(lastSplitPos, splitPos - lastSplitPos);
if (!tempString.empty()) {
dest.push_back(tempString);
}
lastSplitPos = splitPos + 1;
splitPos = phraseTable.find(splitchar, lastSplitPos);
}
if (lastSplitPos < phraseTable.size()) {
tempString = phraseTable.substr(lastSplitPos);
dest.push_back(tempString);
}
if (!dest.empty()) {
return true;
} else {
return false;
}
}
/*
* $Name: SplitWithStr
* $Function: Split string with string
* $Date: 2014-04-11
*/
bool BasicMethod::SplitWithStr(const string &src, const string &separator, vector< string > &dest) {
string str = src;
string substring;
string::size_type start = 0, index = 0;
string::size_type separator_len = separator.size();
while (index != string::npos && start < src.size()) {
index = src.find(separator, start);
if (index == 0) {
start = start + separator_len;
continue;
}
if (index == string::npos) {
dest.push_back(src.substr(start));
break;
}
dest.push_back(src.substr(start,index-start));
start = index + separator_len;
}
return true;
}
bool BasicMethod::Replace_String(string & original , const string & source_str , const string & target_str)
{
string::size_type pos = 0;
string::size_type src_len = source_str.size();
string::size_type tgt_len = target_str.size();
while( (pos = original.find(source_str, pos)) != string::npos)
{
original.replace(pos, src_len, target_str);
pos += tgt_len;
}
return true;
}
int BasicMethod::Get_Word_Count(string & input , char sep)
{
RmEndSpace(input);
int word_count = 0;
string::size_type split_pos = input.find(sep);
string::size_type last_split_pos = 0;
while (split_pos != string::npos)
{
word_count++;
last_split_pos = split_pos + 1;
split_pos = input.find(sep, last_split_pos);
}
return ++word_count;
}
/*
* $Name: size_tToString
* $Function:
* $Date: 2014-04-11
*/
string BasicMethod::size_tToString(size_t &source) {
stringstream oss;
oss << source;
return oss.str();
}
/*
* $Name: intToString
* $Function:
* $Date: 2014-04-11
*/
string BasicMethod::intToString(int &source) {
stringstream oss;
oss << source;
return oss.str();
}
/*
* $Name: ConvertCharToString
* $Function:
* $Date: 2014-04-11
*/
string BasicMethod::ConvertCharToString( char &input_char ) {
stringstream oss;
oss << input_char;
return oss.str();
}
/*
* $Name: ClearIllegalChar
* $Function:
* $Date: 2014-04-11
*/
bool BasicMethod::ClearIllegalChar( string &str ) {
string::size_type pos = 0;
while( ( pos = str.find( "\r", pos ) ) != string::npos ) {
str.replace( pos, 1, "" );
}
pos = 0;
while( ( pos = str.find( "\n", pos ) ) != string::npos ) {
str.replace( pos, 1, "" );
}
return true;
}
/*
* $Name: toUpper
* $Function:
* $Date: 2014-04-11
*/
bool BasicMethod::toUpper( string &str ) {
for( string::size_type i = 0; i < str.size(); ++i ) {
if( islower( ( unsigned char )str.at( i ) ) ) {
str.at( i ) = toupper( ( unsigned char )str.at( i ) );
}
}
return true;
}
/*
* $Name: ToLower
* $Function:
* $Date: 2014-04-11
*/
bool BasicMethod::ToLower(string &str) {
for (string::size_type i = 0; i < str.size(); ++i) {
if (isupper((unsigned char)str.at(i))) {
str.at(i) = tolower((unsigned char)str.at(i));
}
}
return true;
}
/*
* $Name: RmEndSpace
* $Function: no trailing space
* $Date: 2014-04-11
*/
bool BasicMethod::RmEndSpace(string &str) {
if (str != "") {
string tmpStr;
int pos = (int)str.length() - 1;
while (pos >= 0 && str[ pos ] == ' ') {
--pos;
}
tmpStr = str.substr(0, pos + 1);
str = tmpStr;
}
return true;
}
/*
* $Name: RmStartSpace
* $Function: no leading space
* $Date: 2014-04-11
*/
bool BasicMethod::RmStartSpace(string &str) {
string tmpStr;
size_t pos = 0;
for( string::iterator iter = str.begin(); iter != str.end(); ++iter ) {
if( *iter != ' ' ) {
tmpStr = str.substr( pos, str.length() - pos );
break;
} else {
++pos;
}
}
str = tmpStr;
return true;
}
/*
* $Name: RemoveExtraSpace
* $Function: One space only between words
* $Date: 2014-04-11
*/
bool BasicMethod::RemoveExtraSpace( string &input_string, string &output_string ) {
char preceded_char = ' ';
for( string::iterator iter = input_string.begin(); iter != input_string.end(); ++ iter ) {
if( *iter == ' ' && preceded_char == ' ' ) {
continue;
} else {
output_string.push_back( *iter );
preceded_char = *iter;
}
}
return true;
}
/*
* $Name: deleteFileList
* $Function:
* $Date: 2014-04-11
*/
bool BasicMethod::deleteFileList( vector<string> &fileList, SystemCommand &systemCommand ) {
clock_t start,finish;
string command;
for( vector< string >::iterator iter = fileList.begin(); iter != fileList.end(); ++iter ) {
command = systemCommand.delete_command_ + *iter;
start = clock();
cerr<<"Delete\n"
<<" command : "<<command<<"\n"
<<" input : "<<*iter<<"\n"
<<flush;
system( command.c_str() );
finish = clock();
cerr<<" time : "<<(double)(finish-start)/CLOCKS_PER_SEC<<"s\n"
<<flush;
}
fileList.clear();
return true;
}
}
/****************************************************************************
* Project Name : NiuTrans Server Decoder
* File Name : basic_method.h
* Author : Wang Qiang
* Email : wangqiang@zjyatuo.com
* Create Time : 2016/1/15 11:11:35
* Copyright : Copyright (c) 2016 Shenyang YaTrans Network Technology Co., Ltd. All Rights Reserved.
*
* basic toolkit
*
****************************************************************************/
#ifndef DECODER_BASIC_METHOD_H_
#define DECODER_BASIC_METHOD_H_
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <set>
#include <cstdio>
#include <cstdlib>
#include <cctype>
#include <ctime>
using namespace std;
namespace basic_method
{
class SystemCommand
{
public:
string sort_file_;
string delete_command_;
public:
SystemCommand(string &newSortFile, string &newDel) : sort_file_(newSortFile), delete_command_(newDel) {}
};
class BasicMethod
{
public:
typedef string::size_type STRPOS;
public:
bool Split(const string &phraseTable, const char &splitchar, vector< string > &dest);
public:
bool SplitWithStr(const string &src, const string &separator, vector< string > &dest);
public:
bool deleteFileList(vector< string > &fileList, SystemCommand &systemCommand);
public:
bool Replace_String(string & original , const string & source_str , const string & target_str);
int Get_Word_Count(string & input , char sep);
public:
string size_tToString(size_t &source);
string intToString(int &source);
string ConvertCharToString(char &input_char);
public:
bool ClearIllegalChar(string &str);
public:
bool toUpper(string &str);
bool ToLower(string &str);
public:
bool RmEndSpace(string &str);
bool RmStartSpace(string &str);
public:
bool RemoveExtraSpace(string &input_string, string &output_string);
};
}
#endif
/*
* $Id:
* 0033
*
* $File:
* detokenizer.h
*
* $Proj:
* Detokenizer for Statistical Machine Translation
*
* $Func:
* detokenizer
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2013-03-17,20:16
*/
#ifndef DECODER_DETOKENIZER_H_
#define DECODER_DETOKENIZER_H_
#include <iostream>
#include <iomanip>
#include <map>
#include <utility>
#include <string>
#include <cctype>
#include <ctime>
#include "basic_method.h"
#ifndef WIN32
#include <sys/time.h>
#endif
using namespace std;
using namespace basic_method;
namespace decoder_detokenizer
{
class PunctuationMap: public BasicMethod{
public:
map< string, pair< string, string > > punctuation_dictionary_;
// English punctuation set, just for e2c translation
set< string > punctuation_set;
public:
bool LoadPunctuation( string &punctuation_file );
};
class Detokenizer: public BasicMethod {
public:
string language_;
string input_file_;
string output_file_;
public:
Detokenizer(){}
~Detokenizer(){}
public:
// for offline service
bool DetokenizerEn( map< string, string > &parameters );
bool DetokenizerZh( map< string, string > &parameters, PunctuationMap &punctuation_map );
public:
// for online service
bool DetokenizerEn( string &input_sentence, string &output_sentence );
bool DetokenizerZh( PunctuationMap &punctuation_map, string &input_sentence, string &output_sentence );
private:
bool ReplaceSpecChars( string &str );
bool DetokenEnStart( string &str );
bool DetokenZhStart( PunctuationMap &punctuation_map, string &str );
bool isAbbreviation( string &str );
bool isAlphaAndNumber( char character );
bool isLeftDelimiter( string &str );
bool isRightDelimiter( string &str );
bool isQuotMarks( string &str );
bool isHyphen( string &str );
bool isSpaces( string &str );
bool isDefinedMark( string &str );
bool CheckFilesInConf( map< string, string > &param );
bool CheckFileInConf( map< string, string > &param, string &fileKey );
bool PrintConfig();
public:
static bool PrintDetokenizerLogo();
};
}
#endif
/*
* $Id:
* 0004
*
* $File:
* interface.cpp
*
* $Proj:
* DetokenLib for Statistical Machine Translation
*
* $Func:
* interface
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2014-01-12,20:45,
* 2014-01-10,13:14,
*/
#include "interface.h"
using namespace detoken_interface;
bool ParseParameterInConfig( string config, map< string, string > &param );
bool CheckFileInConf( map< string, string > &param, string &fileKey );
bool PrintDetokenLogo();
/*
* $Name: __init
* $Function: Init for Detokenizer
* $Date: 2014-10-10
*/
void* __init(const char* config)
{
PrintDetokenLogo();
cerr<<"Parameters:\n"<<flush;
cerr<<" config_file_ : "<<config<<"\n"<<flush;
map< string, string > parameters_for_config;
ParseParameterInConfig( config, parameters_for_config );
DetokenInterface* interf = new DetokenInterface();
#ifdef SUPPORT_ONLINE_SERVICE_EC_
string key = "Punct-Mapping-Dict";
CheckFileInConf(parameters_for_config, key);
interf->punct_mapping_dict_file_name_ = parameters_for_config[key];
#endif
if( parameters_for_config.find( "system-log" ) == parameters_for_config.end() )
{
cerr<<"[Error] Please add parameter 'system-log' in your config file.\n"<<flush;
exit( 1 );
}
interf->system_log_file_name_ = parameters_for_config[ "system-log" ];
interf->system_log_.open(interf->system_log_file_name_.c_str(), ios::app);
if (!interf->system_log_)
{
cerr<<"ERROR: Please check the log path of \""<<interf->system_log_file_name_<<"\".\n"<<flush;
exit( 1 );
}
#ifdef SUPPORT_ONLINE_SERVICE_EC_
interf->punctuation_map_.LoadPunctuation(parameters_for_config["Punct-Mapping-Dict"]);
#endif
return ( void* ) interf;
}
/*
* $Name: __reload
* $Function: reload model for translation memory
* $Date: 2014-01-12
*/
void __reload ( void* class_handle ) {
cerr<<"Reload Detokn...\n"<<flush;
#ifdef SUPPORT_ONLINE_SERVICE_EC_
DetokenInterface* interf = (DetokenInterface*) class_handle;
interf->punctuation_map_.punctuation_dictionary_.clear();
interf->punctuation_map_.LoadPunctuation(interf->punct_mapping_dict_file_name_);
#endif
return;
}
/*
* $Name: __do_job
* $Function: do job for translation memory
* $Date: 2014-01-12
*/
//char* __do_job( void* class_handle, const char* msg_text, int print_log, const char* log_head )
char* __do_job(void* class_handle, const char* msg_text, const char* decoder_input, int sent_init, int print_log, const char* log_head)
{
#ifndef WIN32
timeval start_time, end_time;
gettimeofday( &start_time, NULL );
clock_t start_time_clock = clock();
clock_t end_time_clock = 0;
#else
clock_t start_time = clock();
clock_t end_time = 0;
#endif
cerr<<"Detokenizer...";
DetokenInterface* interf = (DetokenInterface*) class_handle;
string sentence(msg_text);
string final_translation_result;
#ifdef SUPPORT_ONLINE_SERVICE_CE_
Detokenizer detokenizer_handle;
detokenizer_handle.DetokenizerEn(sentence, final_translation_result);
#elif defined SUPPORT_ONLINE_SERVICE_EC_
Detokenizer detokenizer_handle;
detokenizer_handle.DetokenizerZh(interf->punctuation_map_, sentence, final_translation_result);
#endif
#ifdef WIN32
char* msg_res = new char[ final_translation_result.size() + 1 ];
strcpy_s( msg_res, final_translation_result.size() + 1, final_translation_result.c_str() );
#else
char* msg_res = new char[ final_translation_result.size() + 1 ];
strncpy( msg_res, final_translation_result.c_str(), final_translation_result.size() + 1 );
#endif
#ifndef WIN32
gettimeofday( &end_time, NULL );
double time = ( (double)( end_time.tv_sec - start_time.tv_sec ) * 1000000 + (double)(end_time.tv_usec - start_time.tv_usec) ) / 1000000;
end_time_clock = clock();
double time_clock = ( double )( end_time_clock - start_time_clock )/CLOCKS_PER_SEC;
#else
end_time = clock();
double time = ( double )( end_time - start_time )/CLOCKS_PER_SEC;
#endif
cerr<<"Done!\n"
<<"[INPUT ] "<<sentence<<"\n";
cerr<<"[DETOKEN] "<<final_translation_result<<"\n";
cerr<<"[time="<<time<<"s speed="<<1.000/time<<"sent/s] \n\n";
interf->system_log_<<"[INPUT ] "<<sentence<<"\n"
<<"[DETOKEN] "<<final_translation_result<<"\n";
interf->system_log_<<"[time="<<time<<"s speed="<<1.000/time<<"sent/s] \n\n";
return msg_res;
}
/*
* $Name: __destroy
* $Function: destroy model for translation memory
* $Date: 2014-01-12
*/
void __destroy( void* class_handle ) {
DetokenInterface* interf = (DetokenInterface*) class_handle;
interf->system_log_.clear();
interf->system_log_.close();
delete interf;
}
/*
* $Name: ParseParameterInConfig
* $Function:
* $Date: 2013-05-13
*/
bool ParseParameterInConfig( string config, map< string, string > &param ) {
ifstream inputConfigFile( config.c_str() );
if ( !inputConfigFile ) {
cerr<<"ERROR: Config File does not exist, exit!\n"<<flush;
exit( 1 );
}
string lineOfConfigFile;
while ( getline( inputConfigFile, lineOfConfigFile ) ) {
BasicMethod bm;
bm.ClearIllegalChar( lineOfConfigFile );
bm.RmStartSpace ( lineOfConfigFile );
bm.RmEndSpace ( lineOfConfigFile );
if( lineOfConfigFile == "" || *lineOfConfigFile.begin() == '#' ) {
continue;
} else if ( lineOfConfigFile.find( "param=\"" ) == lineOfConfigFile.npos
|| lineOfConfigFile.find( "value=\"" ) == lineOfConfigFile.npos ) {
continue;
} else {
string::size_type pos = lineOfConfigFile.find( "param=\"" );
pos += 7;
string key;
for ( ; lineOfConfigFile[ pos ] != '\"' && pos < lineOfConfigFile.length(); ++pos ) {
key += lineOfConfigFile[ pos ];
}
if ( lineOfConfigFile[ pos ] != '\"' ) {
continue;
}
pos = lineOfConfigFile.find( "value=\"" );
pos += 7;
string value;
for ( ; lineOfConfigFile[ pos ] != '\"' && pos < lineOfConfigFile.length(); ++pos ) {
value += lineOfConfigFile[ pos ];
}
if ( lineOfConfigFile[ pos ] != '\"' ) {
continue;
}
if ( param.find( key ) == param.end() ) {
param.insert( make_pair( key, value ) );
} else {
param[ key ] = value;
}
}
}
return true;
}
/*
* $Name: CheckFileInConf
* $Function:
* $Date: 2013-05-13
*/
bool CheckFileInConf( map< string, string > &param, string &fileKey ) {
if( param.find( fileKey ) != param.end() ) {
ifstream inFile( param[ fileKey ].c_str() );
if ( !inFile ) {
cerr<<"ERROR: Please check the path of \""<<fileKey<<"\".\n"<<flush;
exit( 1 );
}
inFile.clear();
inFile.close();
} else {
cerr<<"ERROR: Please add parameter \""<<fileKey<<"\" in your config file.\n"<<flush;
exit( 1 );
}
return true;
}
/*
* $Name:
* $Function:
* $Date:
*/
bool PrintDetokenLogo() {
cerr<<"####### SMT ####### SMT ####### SMT ####### SMT ####### SMT #######\n"
<<"# Detokenizer #\n"
<<"# Version 0.0.1 #\n"
<<"# NEUNLPLab/YAYI corp #\n"
<<"# liqiangneu@gmail.com #\n"
<<"####### SMT ####### SMT ####### SMT ####### SMT ####### SMT #######\n"
<<flush;
return true;
}
/*
* $Id:
* 0003
*
* $File:
* interface.h
*
* $Proj:
* DetokenLib for Statistical Machine Translation
*
* $Func:
* header file of interface
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2014-01-10,13:14
*/
#ifndef DETOKENLIB_INTERFACE_H_
#define DETOKENLIB_INTERFACE_H_
#include <iostream>
#include <string>
#include <cstring>
#include "detokenizer.h"
using namespace std;
using namespace decoder_detokenizer;
//#define SUPPORT_ONLINE_SERVICE_CE_
#define SUPPORT_ONLINE_SERVICE_EC_
namespace detoken_interface {
class DetokenInterface {
public:
DetokenInterface(){}
~DetokenInterface(){}
#ifdef SUPPORT_ONLINE_SERVICE_EC_
public:
PunctuationMap punctuation_map_;
#endif
public:
string punct_mapping_dict_file_name_;
public:
string system_log_file_name_;
ofstream system_log_;
};
}
#ifdef WIN32
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif
extern "C" DLLEXPORT void* __init ( const char* config );
//extern "C" DLLEXPORT char* __do_job ( void* class_handle, const char* msg_text, int print_log, const char* log_head );
extern "C" DLLEXPORT char* __do_job(void* class_handle, const char* msg_text, const char* decoder_input, int sent_init, int print_log, const char* log_head);
extern "C" DLLEXPORT void __reload ( void* class_handle );
extern "C" DLLEXPORT void __destroy( void* class_handle );
#endif
/*
* $Id:
* 0002
*
* $File:
* main.cpp
*
* $Proj:
* RecaserLib for Statistical Machine Translation
*
* $Func:
* main function
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2014-01-10,13:14
*/
#include "main.h"
int main( int argc, char * argv[] ) {
if( argc < 4 ) {
cerr<<"[USAGE] EXE CONFIG TEST OUTPUT\n"<<flush;
exit( 1 );
}
string config( argv[ 1 ] );
void* handle = ( void* )__init( config.c_str() );
cerr<<argv[ 2 ]<<"\n"<<flush;
ifstream infile( argv[ 2 ] );
if ( !infile ) {
cerr<<"Can not open file "<<argv[ 1 ]<<"\n"<<flush;
exit( 1 );
}
FILE *outfile = fopen(argv[3] , "w");
string sentence;
int lineNo = 0;
while ( getline( infile, sentence ) ) {
++lineNo;
if( lineNo % 10000 == 0 )
{
fprintf(stderr,"\r\tprocessed %d lines." , lineNo );
}
char * output = __do_job(handle, sentence.c_str(), sentence.c_str(), 0, 1, "");
fprintf( outfile , "%s\n" , output );
delete []output;
}
infile.clear();
infile.close();
fclose(outfile);
__destroy( handle );
return 0;
}
/*
* $Id:
* 0001
*
* $File:
* main.h
*
* $Proj:
* DetokenLib for Statistical Machine Translation
*
* $Func:
* header file of main function
*
* $Version:
* 0.0.1
*
* $Created by:
* Qiang Li
*
* $Email
* liqiangneu@gmail.com
*
* $Last Modified by:
* 2014-10-10,12:34
*/
#ifndef DETOKENLIB_MAIN_H_
#define DETOKENLIB_MAIN_H_
#include <iostream>
#include "interface.h"
#include "detokenizer.h"
using namespace std;
namespace main_func
{
}
#endif
sofile:
	g++ -O2 -o ../lib/libPostProcessing.so *.cpp -fPIC -shared

exe:
	g++ -O2 -o ../bin/Detoken *.cpp
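# Usage sketch (assumes this file is saved as Makefile in the source directory):
#   make sofile    # build ../lib/libPostProcessing.so
#   make exe       # build ../bin/Detoken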
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
I exist only so that an empty folder can be committed.
If you need this folder, just replace me; otherwise please ignore me.
If you still don't have enough folders, go create new ones yourself~
## bpe file path
SRC_TRAIN_PATH=../../zh.dev.bpe
TGT_TRAIN_PATH=../../en.dev.bpe
SRC_DEV_PATH=../../zh.dev.bpe
TGT_DEV_PATH=../../en.dev.bpe
## model saved path
TRAINDATA_PATH=../../0813
##
python3 fine_tuning.py \
--src_trainfile_path=$SRC_TRAIN_PATH \
--tgt_trainfile_path=$TGT_TRAIN_PATH \
--src_devfile_path=$SRC_DEV_PATH \
--tgt_devfile_paths=$TGT_DEV_PATH \
--src_dic_path=../../zh2en_final/source_dic \
--tgt_dic_path=../../zh2en_final/target_dic \
--final_traindata_path=$TRAINDATA_PATH
#usage: model_dir model_num out_dir
import argparse
import os
import re
import tensorflow as tf
import numpy as np
import six
tf.logging.set_verbosity(tf.logging.INFO)
parser = argparse.ArgumentParser()
parser.add_argument('-model_dir', required=True, type=str, help='saved models path')
parser.add_argument('-model_num', required=True, type=int, help='number of checkpoints to average; the last (most recent) checkpoints are used')
parser.add_argument('-out_dir', required=True, type=str, help='output ensembled model path, do not set same as model_dir')
args = parser.parse_args()
assert os.path.exists(args.model_dir), 'check model dir!'
assert args.out_dir != args.model_dir, 'do not set model_dir == output_dir'
root_dir, dir_names, file_names = list(os.walk(args.model_dir))[0]
index_list = []
for file in file_names:
match = re.findall(r'model\.ckpt-(\d+)\.index', file)
if len(match) == 1:
index_list += match
# sort checkpoint indices in descending order so the most recent ones come first
index_list = [int(i) for i in index_list]
index_list = sorted(index_list, reverse=True)
print('found %d model checkpoint indices in total' % len(index_list))
print(index_list)
model_num = args.model_num
if args.model_num > len(index_list):
    print('warning: model_num=%d was requested but only %d checkpoints were found, so model_num is reset to %d' % (args.model_num, len(index_list), len(index_list)))
    model_num = len(index_list)
# keep only the indices of the checkpoints to average
index_list = index_list[:model_num]
print('using the following checkpoint indices')
print(index_list)
if not os.path.exists(args.out_dir):
os.mkdir(args.out_dir)
"""
extract model parameters
"""
tf.logging.info("Reading variables and averaging checkpoints:")
checkpoints = [os.path.join(args.model_dir, 'model.ckpt-{}'.format(index)) for index in index_list]
for c in checkpoints:
tf.logging.info("%s ", c)
var_list = tf.contrib.framework.list_variables(checkpoints[0])
var_values, var_dtypes = {}, {}
for (name, shape) in var_list:
if not name.startswith("global_step"):
var_values[name] = np.zeros(shape)
for checkpoint in checkpoints:
reader = tf.contrib.framework.load_checkpoint(checkpoint)
for name in var_values:
tensor = reader.get_tensor(name)
var_dtypes[name] = tensor.dtype
var_values[name] += tensor
tf.logging.info("Read from checkpoint %s", checkpoint)
for name in var_values: # Average.
var_values[name] /= len(checkpoints)
tf_vars = [
    tf.get_variable(v, shape=var_values[v].shape, dtype=var_dtypes[v])
for v in var_values
]
placeholders = [tf.placeholder(v.dtype, shape=v.shape) for v in tf_vars]
assign_ops = [tf.assign(v, p) for (v, p) in zip(tf_vars, placeholders)]
global_step = tf.Variable(
0, name="global_step", trainable=False, dtype=tf.int64)
saver = tf.train.Saver(tf.all_variables())
# Build a model consisting only of variables, set them to the average values.
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
for p, assign_op, (name, value) in zip(placeholders, assign_ops,
six.iteritems(var_values)):
sess.run(assign_op, {p: value})
# Use the built saver to save the averaged checkpoint.
saver.save(sess, os.path.join(args.out_dir, 'ensemble_%d'%model_num) , global_step=global_step)
tf.logging.info("Averaged checkpoints saved in %s", args.out_dir)
#!/usr/bin/perl -w
##################################################################################
#
# NiuTrans - SMT platform
# Copyright (C) 2011, NEU-NLPLab (http://www.nlplab.com/). All rights reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
#
##################################################################################
#######################################
# version : 1.1.0
# Function : detokenizer
# Author : Qiang Li
# Email : liqiangneu@gmail.com
# Date : 08/06/2012
# Last Modified:
#######################################
use strict;
use Encode;
use utf8;
my $logo = "########### SCRIPT ########### SCRIPT ############ SCRIPT ##########\n".
"# #\n".
"# NiuTrans detokenizer (version 1.1.0) --www.nlplab.com #\n".
"# #\n".
"########### SCRIPT ########### SCRIPT ############ SCRIPT ##########\n";
print STDERR $logo;
my %param;
getParameter( @ARGV );
detokenize();
sub detokenize
{
open( INFILE, "<", $param{ "-in" } ) or die "Error: can not open file $param{ \"-in\" }.\n";
open( OUTPUT, ">", $param{ "-out" } ) or die "Error: can not open file $param{ \"-out\" }.\n";
my $sentNo = 0;
my $inputFileSent;
while( $inputFileSent = <INFILE> )
{
++$sentNo;
$inputFileSent =~ s/[\r\n]//g;
if( $inputFileSent =~ /^<.+>$/ || $inputFileSent =~ /^\s*$/ )
{
print OUTPUT $inputFileSent."\n";
}
else
{
my $detokenizeRes = startDetokenize( $inputFileSent );
print OUTPUT $detokenizeRes."\n";
}
print STDERR "\rProcessed $sentNo lines." if( $sentNo % 100 == 0 );
}
print STDERR "\rProcessed $sentNo lines.\n";
close( INFILE );
close( OUTPUT );
}
sub startDetokenize
{
my $sentence = $_[ 0 ];
$sentence =~ s/ \@\-\@ /-/g; # de-escape special chars
$sentence =~ s/\&bar;/\|/g; # factor separator
$sentence =~ s/\&lt;/\</g; # xml
$sentence =~ s/\&gt;/\>/g; # xml
$sentence =~ s/\&bra;/\[/g; # syntax non-terminal (legacy)
$sentence =~ s/\&ket;/\]/g; # syntax non-terminal (legacy)
$sentence =~ s/\&quot;/\"/g; # xml
$sentence =~ s/\&apos;/\'/g; # xml
$sentence =~ s/\&#91;/\[/g; # syntax non-terminal
$sentence =~ s/\&#93;/\]/g; # syntax non-terminal
$sentence =~ s/\&amp;/\&/g; # escape escape
my @words = split / +/,$sentence;
my $sentenceDetoken = "";
my %quoteCount = ( "\'" => 0, "\"" => 0 );
my $connector = " ";
my $wordCnt = 0;
my $preWord = "";
foreach my $word ( @words )
{
if( $word =~ /^[\p{IsSc}\(\[\{]+$/ )
{
$sentenceDetoken = $sentenceDetoken.$connector.$word;
$connector = "";
}
elsif( $word =~ /^[\,\.\?\!\:\;\\\%\}\]\)]+$/ )
{
$sentenceDetoken = $sentenceDetoken.$word;
$connector = " ";
}
elsif( ( $wordCnt > 0 ) && ( $word =~ /^[\'][\p{IsAlpha}]/ ) && ( $preWord =~ /[\p{IsAlnum}]$/ ) )
{
$sentenceDetoken = $sentenceDetoken.$word;
$connector = " ";
}
elsif( $word =~ /^[\'\"]+$/ )
{
if( !exists $quoteCount{ $word } )
{
$quoteCount{ $word } = 0;
}
if( ( $quoteCount{ $word } % 2 ) eq 0 )
{
if( ( $word eq "'" ) && ( $wordCnt > 0 ) && ( $preWord =~ /[s]$/ ) )
{
$sentenceDetoken = $sentenceDetoken.$word;
$connector = " ";
}
else
{
$sentenceDetoken = $sentenceDetoken.$connector.$word;
$connector = "";
++$quoteCount{ $word };
}
}
else
{
$sentenceDetoken = $sentenceDetoken.$word;
$connector = " ";
++$quoteCount{ $word };
}
}
else
{
$sentenceDetoken = $sentenceDetoken.$connector.$word;
$connector = " ";
}
$preWord = $word;
++$wordCnt;
}
$sentenceDetoken =~ s/ +/ /g;
$sentenceDetoken =~ s/^ //g;
$sentenceDetoken =~ s/ $//g;
$sentenceDetoken =~ s/^([[:punct:]\s]*)([[:alpha:]])(.*)$/$1\U$2\E$3/ if( $param{ "-upcase" } eq 1);
return $sentenceDetoken;
}
sub getParameter
{
if( ( scalar( @_ ) < 4 ) || ( scalar( @_ ) % 2 != 0 ) )
{
print STDERR "[USAGE]\n".
" NiuTrans-detokenizer.pl [OPTIONS]\n".
"[OPTION]\n".
" -in : Input File.\n".
" -out : Output File.\n".
" -upcase : Uppercase the first char [optional]\n".
" Default value is 1.\n".
"[EXAMPLE]\n".
" perl NiuTrans-detokenizer.pl [-in FILE]\n".
" [-out FILE]\n";
exit( 0 );
}
my $pos;
for( $pos = 0; $pos < scalar( @_ ); ++$pos )
{
my $key = $ARGV[ $pos ];
++$pos;
my $value = $ARGV[ $pos ];
$param{ $key } = $value;
}
if( !exists $param{ "-in" } )
{
print STDERR "Error: please assign \"-in\"!\n";
exit( 1 );
}
if( !exists $param{ "-out" } )
{
print STDERR "Error: please assign \"-out\"!\n";
exit( 1 );
}
if( !exists $param{ "-upcase" } )
{
$param{ "-upcase" } = 1;
}
elsif( $param{ "-upcase" } ne 1 )
{
$param{ "-upcase" } = 0;
}
}
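# Example usage, following the USAGE message above (file names are illustrative):
#   perl NiuTrans-detokenizer.pl -in mt06.trans -out mt06.trans.detoken -upcase 1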
#!/usr/bin/python
#coding: utf-8
__author__ = "Summer Rain"
from sys import argv
from time import time
from os import path, system, listdir
program = "mteval_sbp.linux"
include_path = "."
src_path = "src"
lib_path = "."
compile_flag = "-O3"
link_flag = ""
lib_flag = ""
if __name__ == "__main__":
files = [fe for fe in listdir(src_path) if fe.endswith(".cpp") or fe.endswith(".cc")]
ofes = []
for sfe in files:
#print "compiling %s ..." %(sfe)
print "%s %s %s" %("-" * 20, sfe, "-" * 20)
ofe = "." + sfe.replace(".cpp", ".o").replace(".cc", ".o")
time1 = path.getmtime(src_path + "/" + sfe)
if path.isfile(ofe):
time2 = path.getmtime(ofe)
else:
time2 = time1 - 1
if time1 > time2 or len(argv) == 2 and argv[1] == "clean":
cmd = "g++ -c %s/%s %s -o %s -I%s" %(src_path, sfe, compile_flag, ofe, include_path)
print cmd
if system(cmd) != 0:
exit(1)
else:
print "%s is the newest" %(ofe)
ofes.append(ofe)
if files != []:
print "-" * 40
cmd = "g++ %s -o %s %s %s -L%s" %(" ".join(ofes), program, link_flag, lib_flag, lib_path)
print cmd
system(cmd)
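# Example usage (the script name build.py is an assumption):
#   $ python build.py          # incremental build of mteval_sbp.linux
#   $ python build.py clean    # force every source file to be recompiled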
# -*- coding: utf-8 -*-
import codecs, re, sys
import argparse
parser = argparse.ArgumentParser(description='Convert translation results to the xml format that the official eval tools require.')
parser.add_argument('--src_testfile_path', required=True, help="the path of the source test file.")
parser.add_argument('--refs_testfile_path', required=True, help="the paths of the ref files; separate multiple files with whitespace or ','.")
parser.add_argument('--tst_testfile_path', required=True, help="the path of translation of source file.")
parser.add_argument('--output_path', required=True, help="the path of output.")
parser.add_argument('--srclang', required=True)
parser.add_argument('--tgtlang', required=True)
args = parser.parse_args()
src_testfile_path=args.src_testfile_path
refs_testfile_path=args.refs_testfile_path
tst_testfile_path=args.tst_testfile_path
output_path=args.output_path
srclang=args.srclang
tgtlang=args.tgtlang
def get_system_info(organization, system_identify, system_description_info):
system_label = []
system_label.append("<system site=\"" + organization + "\""+ " " + "sysid=\"" + system_identify + "\">")
for info in system_description_info:
system_label.append(info.strip())
system_label.append("</system>")
return system_label
def get_firstLine_info():
return ["<?xml version=\"1.0\" encoding=\"UTF-8\"?>"]
def get_secondLine_info(setclass, setid, srclang, tgtlang):
return ["<" + setclass + " setid=\"" + setid + "\"" + " " + "srclang=\"" + srclang + "\"" + " " + "trglang=\"" + tgtlang +"\">"]
def get_tstTail_info():
return ["</tstset>"]
def get_refTail_info():
return ["</refset>"]
def XMLformat(line):
    # positional mapping: special_char[i] is replaced by replace_char[i]
    special_char = ["&", "<", ">", "\"", r"'"]
    replace_char = [r"&amp;", r"&lt;", r"&gt;", r"&quot;", r"&apos;"]
count = 0
new_line = ""
while count < len(line):
c = line[count]
cc_spec = 0
flag = False
while cc_spec < len(special_char):
if c == special_char[cc_spec]:
flag = True
break
cc_spec += 1
if flag:
            line = line[:count] + replace_char[cc_spec] + line[(count + 1):]
count += 1
return line
def handle_src_file(src_file, srclang, tgtlang, out_path):
    # the xml content returned by this function
    src_xml_content = []
    # header
    src_xml_content.append(get_firstLine_info()[0])
    src_xml_content.append(get_secondLine_info(setclass="srcset", setid= srclang + "_" + tgtlang + "_news_trans", srclang=srclang, tgtlang=tgtlang)[0])
    # body
    src_xml_content.append("<DOC docid=" + "\"news\">")
    count = 0
    src_file_handle = codecs.open(src_file, "r", "utf_8_sig")
    for src_line in src_file_handle.read().strip().split("\n"):
        count += 1
        src_xml_content.append("<seg id=" + "\"" + str(count) + "\">" + XMLformat(src_line.strip()) + "</seg>")
    src_xml_content.append("</DOC>")
    src_file_handle.close()
    # tail
    src_xml_content.append("</srcset>")
    # write the output
    src_file_out = codecs.open(out_path + "src.txt.xml", "w", "utf_8_sig")
    for src_line in src_xml_content:
        src_file_out.write(src_line + "\n")
    src_file_out.close()
    return src_xml_content
def get_src_content(src_file, src_lang, tgt_lang, out_path, occasion):
src_xml_content = []
if occasion == True:
src_xml_content = handle_src_file(src_file, src_lang, tgt_lang, out_path)
else:
src_file_handle = codecs.open(src_file, "r", "utf_8_sig")
for line in src_file_handle.read().strip().split("\n"):
src_xml_content.append(line)
src_file_handle.close()
return src_xml_content[2:][:-1]
def main(src_file, refs_file, tst_file, out_path ,src_lang="zh", tgt_lang="en", occasion=True):
"""
:param src_file:
:param refs_file: list
:param tst_file:
:param out_path:
:param src_lang:
:param tgt_lang:
:param occasion:
:return:
"""
    # convert src_file into the xml format, or just read the existing contents of src_file
    src_xml_content = get_src_content(src_file, src_lang, tgt_lang, out_path, occasion)
    # walk src_xml_content and convert refs_file into the matching format
    refs_xml_content = []
    transorg = 1  # 1 means the first reference, and so on
for ref_file in refs_file:
ref_site = "\"transorg" + str(transorg) + "\""
ref_file_handle = codecs.open(ref_file, "r", "utf_8_sig")
ref_lines = ref_file_handle.read().strip().split("\n")
ref_line_count, count = 0, 0
ref_line = ""
while count < len(src_xml_content):
# src xml line
src_line = src_xml_content[count]
# ref raw line
if ref_line_count < len(ref_lines):
ref_line = ref_lines[ref_line_count]
            # follow the structure of the src xml file
            if "<DOC" in src_line:
                # <DOC docid="news" site="transorg1">
                refs_xml_content.append(src_line.strip()[:-1] + " " + "site=" + ref_site + ">")
            elif "<p>" in src_line or "</p>" in src_line:
                # <p> and </p> occupy their own lines
                refs_xml_content.append(src_line)
            elif "</DOC>" in src_line:
                # </DOC> occupies its own line
                refs_xml_content.append(src_line)
            elif "<seg" in src_line:
                rs = re.match(r"<seg id=\"(.*?)\">", src_line)  # extract the number inside <seg id="402">
if rs:
# rs.group(0) = <seg id=\"(.*?)\">
line = rs.group(0) + XMLformat(ref_line.strip()) + "</seg>"
refs_xml_content.append(line)
ref_line_count += 1
count += 1
transorg += 1
ref_file_handle.close()
# tail
refs_out_contents = get_firstLine_info() + get_secondLine_info("refset", src_lang + "_" + tgt_lang + "_news" , src_lang, tgt_lang) + \
refs_xml_content + get_refTail_info()
    # walk src_xml_content and convert the translation result into xml; note that only one translation result is supported for now
tst_file_handle = codecs.open(tst_file, "r", "utf_8_sig")
tst_lines = tst_file_handle.read().strip().split("\n")
transed_xml_content = []
count, tst_line_no = 0, 0
transed_line = ""
while count < len(src_xml_content):
src_line = src_xml_content[count]
if tst_line_no < len(tst_lines):
transed_line = tst_lines[tst_line_no]
if "<DOC" in src_line:
# <DOC docid="文档名称" sysid="系统标识">
transed_xml_content.append(src_line[:-1] + " sysid=\"" + src_lang + "_" + tgt_lang + "_trans" + "\">")
elif "<p>" in src_line or "</p>" in src_line:
transed_xml_content.append(src_line)
elif "</DOC" in src_line:
transed_xml_content.append(src_line)
elif "<seg" in src_line:
rs = re.match(r"<seg id=\"(.*?)\">", src_line)
if rs:
line = rs.group(0) + XMLformat(transed_line.strip()) + "</seg>"
transed_xml_content.append(line)
tst_line_no += 1
count += 1
tst_file_handle.close()
# <system site="单位名称" sysid="系统标识"> tail
transed_out_contents = get_firstLine_info() + get_secondLine_info("tstset", src_lang + "_" + tgt_lang + "_news", src_lang, tgt_lang) + \
get_system_info("Niu", src_lang + "_" + tgt_lang + "_trans", ["系统描述信息"]) + transed_xml_content + get_tstTail_info()
## 保存
ref_out_file = codecs.open(out_path + "ref.txt.xml", "w", "utf_8_sig")
max_size = len(refs_out_contents)
count = 0
while count < max_size:
out_line = refs_out_contents[count]
ref_out_file.write(out_line)
if count != (max_size - 1):
ref_out_file.write("\n")
count += 1
transed_out_file = codecs.open(out_path + "tst.txt.xml", "w", "utf_8_sig")
max_size = len(transed_out_contents)
count = 0
while count < max_size:
out_line = transed_out_contents[count]
transed_out_file.write(out_line)
if count != (max_size - 1):
transed_out_file.write("\n")
count += 1
ref_out_file.close()
transed_out_file.close()
# main("./eval/ensemble/mt12-wb/input.token", ["./eval/ensemble/mt12-wb/ref0","./eval/ensemble/mt12-wb/ref1", "./eval/ensemble/mt12-wb/ref2", "./eval/ensemble/mt12-wb/ref3"], "./eval/ensemble\mt12-wb/mt12-wb.ensemble", "./eval/ensemble\mt12-wb/")
print ("refs: " + refs_testfile_path)
refs_list = []
for line in refs_testfile_path.strip().split():
refs_list.append(line.strip())
if len(refs_list) == 1 and len(refs_testfile_path.strip().split(",")) > 1:
refs_list = []
for line in refs_testfile_path.strip().split(","):
refs_list.append(line.strip())
main(src_testfile_path, refs_list, tst_testfile_path, output_path, srclang, tgtlang)
print("xml format file has created.")
src_file=./input.token
refs_file=./ref0
tst_file=./mt06.ensemble
output_path=./
srclang=zh
tgtlang=en
python3 mteval_sbp.py --src_testfile_path=$src_file \
--refs_testfile_path=$refs_file \
--tst_testfile_path=$tst_file \
--output_path=$output_path \
--srclang=$srclang \
--tgtlang=$tgtlang
./Tool/mteval_sbp.linux -c -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result"
echo "eval result has been created."
#pragma once
#pragma warning(disable:4503)
#pragma warning(disable:4786)
#include "xmlfunc.h"
#include<string>
#include<vector>
#include<map>
using namespace std;
const int max_Ngram=9;
const int NIST_ORDER=5;
const int BLEU_ORDER=4;
typedef struct {
double cum;
double ind;
} _cum_ind;
typedef vector<_cum_ind> nscore_struct;
typedef basic_string<int> SENT;
typedef map<string,int> VOCAB;
typedef map<SENT,double> GRAMMAP;
typedef map<int,SENT> SEGMAP;
typedef map<string,SEGMAP> DOCMAP;
typedef map<string,DOCMAP> SITEMAP;
typedef pair<double,int> SCORE;
typedef map<int,SCORE> SEGSCORE; // segid, score
typedef map<string,pair<SCORE,SEGSCORE> > DOCSCORE; // docid,score
typedef map<string,pair<SCORE,DOCSCORE> > SITESCORE; // site,score
typedef map<string,double> SCOREMAP;
typedef map<string,nscore_struct> NSCOREMAP;
tstring get_ref_data(const string & setid, SITEMAP & docs, VOCAB & voc, int preserve_case, const tstring & fn);
tstring get_tst_data(const string & setid, SITEMAP & docs, const VOCAB & voc, int preserve_case, const tstring & fn);
tstring get_source_info(DOCMAP & srcs, const tstring & fn);
tstring sgm_get_ref_data(const string & setid, SITEMAP & docs, VOCAB & voc, int preserve_case, const tstring & fn);
tstring sgm_get_tst_data(const string & setid, SITEMAP & docs, const VOCAB & voc, int preserve_case, const tstring & fn);
tstring sgm_get_source_info(DOCMAP & srcs, const tstring & fn);
void compute_ngram_info(const SITEMAP & refs, GRAMMAP & ngram_info);
// Score NIST and BLEU
// parameter: nist: non-zero for NIST, zero for BLEU
void score_system(const SITEMAP & refs, const SITEMAP & tsts, const GRAMMAP & ngram_info,const string & site, NSCOREMAP & SCOREmt,int nist, SITESCORE & score);
double mper_score_system(const SITEMAP & refs, const SITEMAP & tsts, const string & site, SITESCORE & score);
double mwer_score_system(const SITEMAP & refs, const SITEMAP & tsts, const string & site, SITESCORE & score);
double gtm_score_system(const SITEMAP & refs, const SITEMAP & tsts, const string & site, SITESCORE & score);
double ict_score_system(const SITEMAP & refs, const SITEMAP & tsts, const string & site, SITESCORE & score);
//void NormalizeText(string & s, const tstring & lang);
//void makelower(string & s);
//void setdetail(int dt);
//void setcase(int c);
typedef void (* PWORDSEGMENTER)(const string & sent, vector<string> & words, const tstring & lang);
void setwordsegmenter(PWORDSEGMENTER p);
void defaultwordsegmenter(const string & s, vector<string> & words, const tstring & lang);
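The header above only declares the scorers (BLEU up to BLEU_ORDER=4, NIST up to NIST_ORDER=5, per-n statistics up to max_Ngram=9). As a rough illustration of what the BLEU path of score_system computes, here is a minimal single-reference sketch in Python; it is an independent reimplementation for intuition, not the logic of mteval_sbp.linux, which also handles multiple references, document structure, and the SBP variant of the brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_order=4):  # max_order plays the role of BLEU_ORDER
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_order + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Modified precision: clip each n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(cand) - n + 1)
        log_prec += math.log(max(matched, 1e-9) / total)
    # Brevity penalty: exp(1 - r/c) when the candidate is shorter than the reference.
    bp = math.exp(min(0.0, 1.0 - len(ref) / len(cand)))
    return bp * math.exp(log_prec / max_order)

print(round(bleu("the cat sat on the mat", "the cat sat on a mat"), 4))  # ~0.54
```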
MT evaluation scorer began on 2018 Apr 18 at 17:57:59
Evaluation of en-to-zh translation using:
src set (1 docs, 1000 segs)
ref set (1 refs)
tst set (1 systems)
Scores of system:
NIST=6.9923 BLEU=0.2588 BLEU_SBP=0.2452 GTM=0.5959 mWER=0.6470 mPER=0.4637 ICT=0.2380
# ------------------------------------------------------------------------
Individual N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.4436 1.2857 0.2261 0.0303 0.0064 0.0019 0.0005 0.0000 0.0000 ""
BLEU: 0.6085 0.3340 0.2036 0.1286 0.0835 0.0552 0.0375 0.0254 0.0173 ""
# ------------------------------------------------------------------------
Cumulative N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.4436 6.7294 6.9555 6.9859 6.9923 6.9942 6.9946 6.9947 6.9947 ""
BLEU: 0.5832 0.4321 0.3315 0.2588 0.2047 0.1634 0.1316 0.1066 0.0867 ""
BLEU_SBP: 0.5525 0.4093 0.3140 0.2452 0.1939 0.1548 0.1247 0.1010 0.0821 ""
MT evaluation scorer ended on 2018 Apr 18 at 17:58:01
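A sanity check on how the cumulative rows are derived from the individual ones: assuming the Individual BLEU row lists the raw modified n-gram precisions, cumulative BLEU-N is the brevity penalty times the geometric mean of the first N precisions. The numbers in this log bear that out:

```python
import math

p = [0.6085, 0.3340, 0.2036, 0.1286]  # individual 1..4-gram precisions from the log
bp = 0.5832 / p[0]                    # cumulative 1-gram = BP * p1  =>  BP ~= 0.9584
bleu4 = bp * math.exp(sum(math.log(x) for x in p) / 4)
print(round(bleu4, 4))                # 0.2588, matching the cumulative 4-gram above
```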
for args in "$@"
do
    detokenizedfile=$args".detoken"
    perl detoken.perl -l en < $args > $detokenizedfile
done
for args in "$@"
do
    detokenizedfile=$args".detoken"
    perl NiuTrans-detokenizer.pl -in $args -out $detokenizedfile
    # perl detoken.perl -l en < $args > $detokenizedfile
done
# coding=utf-8
# detoken_zzy.py: cleanup applied after detokenization of English output.
# Handles spaced hyphens/dashes ("a - b" -> "a-b") and split decimals
# ("1. 35" -> "1.35").
import re
import sys

if __name__ == '__main__':
    print("usage: python detoken_zzy.py infile outfile")
    file_in = open(sys.argv[1], "r", encoding="UTF-8")
    file_out = open(sys.argv[2], "w", encoding="UTF-8")
    count = 0
    while True:
        line = file_in.readline()
        if len(line) == 0:
            break
        # Remove spaces around hyphens and dashes.
        line = line.replace(" - ", "-")
        line = line.replace(" -- ", "-")
        line = line.replace(" -", "-")
        line = line.replace("- ", "-")
        # Rejoin decimals split as "1. 35" -> "1.35".
        for item in re.findall(r"\d\. \d", line):
            line = line.replace(item, item.replace(". ", "."))
            count += 1
        file_out.write(line)
    file_in.close()
    file_out.close()
    print("dot: " + str(count))
    print("detoken_zzy done!")
MT evaluation scorer began on 2018 Apr 17 at 14:01:04
Evaluation of en-to-zh translation using:
src set (1 docs, 1001 segs)
ref set (1 refs)
tst set (1 systems)
Scores of system:
NIST=7.2479 BLEU=0.3026 BLEU_SBP=0.2835 GTM=0.6093 mWER=0.6603 mPER=0.4596 ICT=0.2502
# ------------------------------------------------------------------------
Individual N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.5411 1.3476 0.2812 0.0625 0.0155 0.0064 0.0029 0.0012 0.0005 ""
BLEU: 0.6222 0.3820 0.2484 0.1683 0.1146 0.0786 0.0544 0.0372 0.0255 ""
# ------------------------------------------------------------------------
Cumulative N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.5411 6.8887 7.1699 7.2324 7.2479 7.2543 7.2572 7.2583 7.2588 ""
BLEU: 0.5964 0.4673 0.3732 0.3026 0.2471 0.2028 0.1670 0.1377 0.1137 ""
BLEU_SBP: 0.5587 0.4378 0.3496 0.2835 0.2315 0.1899 0.1564 0.1290 0.1065 ""
MT evaluation scorer ended on 2018 Apr 17 at 14:01:06
open in,"$ARGV[0]";
while($in=<in>)
{
chomp $in;
$in = lc($in);
print $in."\n";
}
set -e
basePath=./cwmt17-test/
src_file=$basePath"input.token"
refs_file=$basePath"ref.detoken"
tst_file=$basePath"trans"
output_path=$basePath
detoken=./detoken.sh
srclang=en
tgtlang=zh
refs_file_detoken=
IFS=',' arr=($refs_file)
for x in ${arr[@]}; do
    sh $detoken $x
    refs_file_detoken=${refs_file_detoken}${x}".detoken,"
done
refs_file_detoken=${refs_file_detoken%,}   # strip the trailing comma
sh $detoken $tst_file
python3 detoken_zzy.py $tst_file".detoken" $tst_file".detoken.agv"
echo "$refs_file $tst_file , detoken file has finished, output is $refs_file_detoken $tst_file".detoken""
python3 ./Tool/mteval_sbp.py --src_testfile_path=$src_file \
--refs_testfile_path=$refs_file \
--tst_testfile_path=$tst_file".detoken.agv" \
--output_path=$output_path \
--srclang=$srclang \
--tgtlang=$tgtlang
./Tool/mteval_sbp.linux -c -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result-detoken"
echo "eval result has created."
set -e
basePath=./en2zh-cwmt17-test/
src_file=$basePath"input.token"
refs_file=$basePath"ref"
tst_file=$basePath"trans"
output_path=$basePath
detoken=detoken.sh
srclang=en
tgtlang=zh
refs_file_detoken=
IFS=',' arr=($refs_file)
for x in ${arr[@]}; do
    sh $detoken $x
    refs_file_detoken=${refs_file_detoken}${x}".detoken,"
done
refs_file_detoken=${refs_file_detoken%,}   # strip the trailing comma
sh $detoken $tst_file
#python3 detoken_zzy.py $tst_file".detoken" $tst_file".detoken.agv"
echo "$refs_file $tst_file , detoken file has finished, output is $refs_file_detoken $tst_file".detoken""
python3 ./Tool/mteval_sbp.py --src_testfile_path=$src_file \
--refs_testfile_path=$refs_file_detoken \
--tst_testfile_path=$tst_file".detoken" \
--output_path=$output_path \
--srclang=$srclang \
--tgtlang=$tgtlang
./Tool/mteval_sbp.linux -c -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result-detoken"
#perl ./Tool/mteval-v13a_Niu.pl -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result-detoken_m"
echo "eval result has created."
set -e
basePath=./cwmt18-dev/
src_file=$basePath"input.token1000"
#refs_file=$basePath"ref0",$basePath"ref1",$basePath"ref2",$basePath"ref3"
refs_file=$basePath"ref1000"
#refs_file=$basePath"cwmt18.untoken.enpun1000"
tst_file=$basePath"cwmt18.rerank.1000"
output_path=$basePath
detoken=detoken.sh
srclang=zh
tgtlang=en
refs_file_detoken=
IFS=',' arr=($refs_file)
for x in ${arr[@]}; do
    sh $detoken $x
    refs_file_detoken=${refs_file_detoken}${x}".detoken,"
done
refs_file_detoken=${refs_file_detoken%,}   # strip the trailing comma
sh $detoken $tst_file
python3 detoken_zzy.py $tst_file".detoken" $tst_file".detoken.agv"
echo "$refs_file $tst_file , detoken file has finished, output is $refs_file_detoken $tst_file".detoken""
python3 ./Tool/mteval_sbp.py --src_testfile_path=$src_file \
--refs_testfile_path=$refs_file \
--tst_testfile_path=$tst_file \
--output_path=$output_path \
--srclang=$srclang \
--tgtlang=$tgtlang
./Tool/mteval_sbp.linux -c -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result-detoken"
echo "eval result has created."
basePath=./baseline/exact2k/
src_file=$basePath"input.token"
refs_file=$basePath"ref"
tst_file=$basePath"tst.trans"
output_path=$basePath
srclang=zh
tgtlang=en
python3 ./Tool/mteval_sbp.py --src_testfile_path=$src_file \
--refs_testfile_path=$refs_file \
--tst_testfile_path=$tst_file \
--output_path=$output_path \
--srclang=$srclang \
--tgtlang=$tgtlang
./Tool/mteval_sbp.linux -c -r $output_path"ref.txt.xml" -s $output_path"src.txt.xml" -t $output_path"tst.txt.xml" > $output_path"1best_result"
echo "eval result has created."
# coding=utf-8
# detoken_zh.py: cleanup applied after detokenization of Chinese output.
# Rejoins spaced hyphens and split decimals, then converts ASCII punctuation
# (sentence-final period, paired quotes, colon, bullet, comma) to the
# full-width Chinese equivalents.
import re
import sys

if __name__ == '__main__':
    print("usage: python detoken_zh.py infile outfile")
    file_in = open(sys.argv[1], "r", encoding="UTF-8")
    file_out = open(sys.argv[2], "w", encoding="UTF-8")
    count = 0
    while True:
        line = file_in.readline()
        if len(line) == 0:
            break
        # Remove spaces around hyphens and dashes.
        line = line.replace(" - ", "-")
        line = line.replace(" -- ", "-")
        line = line.replace(" -", "-")
        line = line.replace("- ", "-")
        # Rejoin decimals split as "1. 35" -> "1.35".
        for item in re.findall(r"\d\. \d", line):
            line = line.replace(item, item.replace(". ", "."))
            count += 1
        # Sentence-final "." or ".\"" -> Chinese full stop "。".
        for item in re.findall(r"\.$|\.\"$", line):
            line = line.replace(item, item.replace(".", "。"))
            count += 1
        # Paired single quotes: first ' -> ‘, second ' -> ’.
        for item in re.findall(r"\'.+?\'", line):
            line = line.replace(item, item.replace("'", "‘", 1))
            item = item.replace("'", "‘", 1)
            line = line.replace(item, item.replace("'", "’", 1))
            count += 2
        # Paired double quotes: first " -> “, second " -> ”.
        for item in re.findall(r"\".+?\"", line):
            line = line.replace(item, item.replace("\"", "“", 1))
            item = item.replace("\"", "“", 1)
            line = line.replace(item, item.replace("\"", "”", 1))
            count += 2
        # Colon not preceded by a digit -> full-width colon (keeps times like 3:15).
        for item in re.findall(r"[^\d]:", line):
            line = line.replace(item, item.replace(":", ":"))
            count += 1
        # Bullet and comma to full-width equivalents.
        line = line.replace("•", "﹒")
        line = line.replace(",", ",")
        file_out.write(line)
    file_in.close()
    file_out.close()
    print("all: " + str(count))
    print("detoken_zh done!")