Commit 05715480 by huchi

Initial commit

/bin
/build
/out
sample/train/iwlst14de-en.train.log
/models
/source/niutensor/
# Compilation Example
Here are some compilation examples for Linux with MKL, OpenBLAS, or CUDA support.
**Replace the paths below with the ones in your environment.**
## Compile with CUDA support
```bash
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
cmake -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR='/home/nlplab/cuda9.2/' ..
make -j
```
## Compile with CUDA and FP16 support
```bash
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
cmake -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR='/home/nlplab/cuda9.2/' -DUSE_FP16=ON ..
make -j
```
## Compile with MKL support
```bash
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
cmake -DUSE_MKL=ON -DINTEL_ROOT='/home/nlplab/intel/compilers_and_libraries_2020.2.254/linux' ..
make -j
```
## Compile with OpenBLAS support
```bash
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
cmake -DUSE_OPENBLAS=ON -DOPENBLAS_ROOT='/home/nlplab/openblas/' ..
make -j
```
# Training a new model
## IWSLT'14 German to English (Transformer)
The following instructions can train a Transformer model on the [IWSLT'14 German to English dataset](http://workshop2014.iwslt.org/downloads/proceeding.pdf).
Step 1: Prepare the training data:
*We provide the BPE code for better reproducibility. The source and target languages share one vocabulary, built with 10,000 BPE merges.*
```bash
# Extract the data
cd sample/train/
IWSLT_PATH=iwslt14.tokenized.de-en
tar -zxvf $IWSLT_PATH.tar.gz
IWSLT_PATH=sample/train/$IWSLT_PATH
# Binarize the data
cd ../..
python3 tools/GetVocab.py \
-raw $IWSLT_PATH/bpevocab \
-new $IWSLT_PATH/vocab.de
python3 tools/GetVocab.py \
-raw $IWSLT_PATH/bpevocab \
-new $IWSLT_PATH/vocab.en
python3 tools/PrepareParallelData.py \
-src $IWSLT_PATH/train.de -tgt $IWSLT_PATH/train.en \
-src_vocab $IWSLT_PATH/vocab.de -tgt_vocab $IWSLT_PATH/vocab.en \
-output $IWSLT_PATH/train.data
python3 tools/PrepareParallelData.py \
-src $IWSLT_PATH/valid.de -tgt $IWSLT_PATH/valid.en \
-src_vocab $IWSLT_PATH/vocab.de -tgt_vocab $IWSLT_PATH/vocab.en \
-output $IWSLT_PATH/valid.data
```
*You may extract the data manually on Windows.*
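Binarization turns each BPE token into an integer id using the shared vocabulary. The C++ sketch below illustrates only that mapping step; it is not the logic of `tools/PrepareParallelData.py`, and the listing-order id assignment, the unknown-token fallback, and the names `BuildVocab` and `ToIds` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: give each distinct vocabulary entry an id in listing order
// (the first entry gets 0). A repeated entry keeps its first id.
std::unordered_map<std::string, int> BuildVocab(const std::vector<std::string>& entries)
{
    std::unordered_map<std::string, int> vocab;
    for (const std::string& entry : entries)
        vocab.emplace(entry, static_cast<int>(vocab.size()));
    return vocab;
}

// Split a whitespace-tokenized (BPE) sentence and map each token to its id;
// tokens missing from the vocabulary fall back to unkId.
std::vector<int> ToIds(const std::string& sentence,
                       const std::unordered_map<std::string, int>& vocab,
                       int unkId)
{
    assert(unkId >= 0);
    std::vector<int> ids;
    std::istringstream in(sentence);
    std::string token;
    while (in >> token) {
        auto it = vocab.find(token);
        ids.push_back(it != vocab.end() ? it->second : unkId);
    }
    return ids;
}
```

For example, with the vocabulary `<unk> <pad> ich hab@@ e`, the sentence `ich hab@@ e xyz` maps to `2 3 4 0` when `<unk>` (id 0) is the fallback.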
Step 2: Train the model with the default configuration
(6 encoder/decoder layers, model size 512, 50 epochs):
```bash
bin/NiuTrans.NMT \
-dev 0 \
-nepoch 50 \
-model model.bin \
-maxcheckpoint 10 \
-train $IWSLT_PATH/train.data \
-valid $IWSLT_PATH/valid.data
```
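Each `-name value` pair on the command line overrides a built-in default; in `Utility.cpp` this is done by calls such as `LoadParamInt(argsNum, args, "enclayer", &nEncLayer, 6)`. The standalone sketch below imitates that lookup for integer options so the defaults are easy to reason about; `FindIntParam` is a hypothetical name and not part of NiuTensor.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: return the integer that follows "-name" on the
// command line, or defaultValue if the flag is absent or has no value.
int FindIntParam(int argc, const char** argv, const char* name, int defaultValue)
{
    assert(argv != nullptr && name != nullptr);
    for (int i = 0; i + 1 < argc; i++) {
        // flags are written as "-name", so compare after the leading dash
        if (argv[i][0] == '-' && std::strcmp(argv[i] + 1, name) == 0)
            return std::atoi(argv[i + 1]);
    }
    return defaultValue;
}
```

With the command above, `-nepoch 50` is found and returned, while an omitted option such as `-enclayer` falls back to its default of 6. The real loader can also read options from a file passed with `-config`.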
Step 3: Average the last ten checkpoints:
```bash
python3 tools/Ensemble.py -input 'model.bin.*' -output model.ensemble
```
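Checkpoint averaging replaces each parameter with its element-wise mean over the selected checkpoints. A minimal sketch, assuming the checkpoints have already been loaded as flat `std::vector<float>` buffers of identical length; `tools/Ensemble.py` works on the real model files, and `AverageCheckpoints` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise mean of several parameter vectors of equal length.
std::vector<float> AverageCheckpoints(const std::vector<std::vector<float>>& checkpoints)
{
    assert(!checkpoints.empty());
    const std::size_t n = checkpoints.front().size();
    std::vector<double> sum(n, 0.0);  // accumulate in double to limit rounding error
    for (const std::vector<float>& params : checkpoints) {
        assert(params.size() == n && "all checkpoints must have the same shape");
        for (std::size_t j = 0; j < n; j++)
            sum[j] += params[j];
    }
    std::vector<float> avg(n);
    for (std::size_t j = 0; j < n; j++)
        avg[j] = static_cast<float>(sum[j] / checkpoints.size());
    return avg;
}
```

Accumulating in `double` keeps rounding error small when ten large checkpoints are averaged.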
Training takes about 310 s per epoch on a GTX 1080 Ti.
Expected BLEU scores (lenalpha=0.6, maxlenalpha=1.2):
| Model type | Beam Search | Greedy Search |
| --------------- | --------------- | --------------- |
| Single model | 34.05 (beam=4) | 33.35 |
| Ensemble model | 34.48 (beam=4) | 34.01 |
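`lenalpha` and `maxlenalpha` control how hypotheses of different lengths are compared and how long an output may grow. The sketch below assumes the GNMT-style length penalty, lp(L) = ((5 + L) / 6)^alpha, and assumes the maximum output length is `maxlenalpha` times the source length; verify both against the translator source before relying on them. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed GNMT-style length penalty: lp(L) = ((5 + L) / 6)^alpha.
double LengthPenalty(int length, double alpha)
{
    assert(length >= 0);
    return std::pow((5.0 + length) / 6.0, alpha);
}

// Normalized hypothesis score: summed log-probability divided by the penalty.
double NormalizedScore(double logProb, int length, double alpha)
{
    return logProb / LengthPenalty(length, alpha);
}

// Assumption: the output may be at most maxLenAlpha times the source length.
int MaxOutputLength(int srcLength, double maxLenAlpha)
{
    assert(srcLength >= 0 && maxLenAlpha > 0.0);
    return static_cast<int>(srcLength * maxLenAlpha);
}
```

With alpha = 0.6, a 20-token hypothesis has its log-probability divided by (25 / 6)^0.6 ≈ 2.35, which offsets beam search's tendency to prefer short outputs.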
We provide models trained with the default configurations:
[Google Drive](https://drive.google.com/drive/folders/10W89cx60Q7A9nGyg5fwLP21Sg53n6NXV?usp=sharing)
[Baidu Cloud](https://pan.baidu.com/s/1LbkV8kuaDWNunVR2jwOhRg) (password: bdwp)
# Translating with pre-trained models
## IWSLT'14 De-En Models
The following instructions can be used to translate with a pre-trained Transformer model.
You can evaluate models trained in the [training example](../sample/train) in two steps.
Step 1: Translate the IWSLT14 De-En test set (tokenized) on the GPU:
```bash
IWSLT_PATH=sample/train/iwslt14.tokenized.de-en
bin/NiuTrans.NMT \
-dev 0 \
-test $IWSLT_PATH/test.de \
-model model.bin \
-sbatch 64 \
-beamsize 1 \
-srcvocab $IWSLT_PATH/vocab.de \
-tgtvocab $IWSLT_PATH/vocab.en \
-output output.atat
sed -r 's/(@@ )|(@@ ?$)//g' < output.atat > output
```
You can also set `-dev -1` to use the CPU.
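The `sed` command joins BPE pieces back into words by deleting every `@@ ` joiner and any `@@` left at the end of a line. For readers who want the same post-processing inside a C++ pipeline, here is a sketch using `std::regex` with the same pattern; `RemoveBpe` and `RemoveBpeStream` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

// Undo BPE segmentation on one line, like the sed command above:
// drop every "@@ " joiner and a trailing "@@" (optionally followed by a space).
std::string RemoveBpe(const std::string& line)
{
    // "$" must mean end of line, so the input must not contain newlines
    assert(line.find('\n') == std::string::npos);
    static const std::regex joiner("(@@ )|(@@ ?$)");
    return std::regex_replace(line, joiner, "");
}

// Apply RemoveBpe to every line of a stream (e.g. output.atat -> output).
void RemoveBpeStream(std::istream& in, std::ostream& out)
{
    std::string line;
    while (std::getline(in, line))
        out << RemoveBpe(line) << '\n';
}
```

For example, `ich hab@@ e es gese@@ hen` becomes `ich habe es gesehen`.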
Step 2: Check the translation with [multi-bleu](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl):
```bash
perl multi-bleu.perl $IWSLT_PATH/test.en < output
```
Translating test.de (6,750 sentences) with greedy search takes about 15 s on a GTX 1080 Ti.
## WNGT 2020 Models
The models here are the submissions to the [WNGT 2020 efficiency task](https://sites.google.com/view/wngt20/efficiency-task), which focuses on developing efficient MT systems.
The WNGT 2020 efficiency task requires systems to translate 1 million sentences on CPUs and GPUs, following the setup of the [WMT 2019 English-German news](http://statmt.org/wmt19/translation-task.html) translation task.
- For CPUs, the performance was measured on an [AWS c5.metal instance](https://aws.amazon.com/cn/blogs/aws/now-available-new-c5-instance-sizes-and-bare-metal-instances/) with 96 logical Cascade Lake processors and 192 GB memory. We submitted one system (9-1-tiny) running with all CPU cores.
- For GPUs, the performance was measured on an [AWS g4dn.xlarge instance](https://aws.amazon.com/cn/ec2/instance-types/g4/) with an NVIDIA T4 GPU and 16 GB memory. We submitted four systems (9-1, 18-1, 35-1, 35-6) running with FP16.
We list the results of all submissions. See [the official results](https://docs.google.com/spreadsheets/d/1M82S5wPSIM543Gh20d71Zs0FNHJQ3JdiJzDECiYJNlE/edit#gid=0) for more details.
| Model type | Time (s) | File size (MiB) | BLEU | Words per second |
| ---------- | -------- | --------------- | ---- | --------------- |
| 9-1-tiny* | 810 | 66.8 | 27.0 | 18518 |
| 9-1 | 977 | 99.3 | 31.1 | 15353 |
| 18-1 | 1355 | 156.1 | 31.4 | 11070 |
| 35-1 | 2023 | 263.3 | 32.0 | 7418 |
| 35-6 | 3166 | 305.4 | 32.2 | 4738 |
<em>* indicates the model was run on CPUs.</em>
Description:
* `Model type` - Number of encoder and decoder layers, e.g., 9-1 means that the model consists of 9 encoder layers and 1 decoder layer. The model size is 512 except for the *tiny* model, whose size is 256.
* `Time` - Wall-clock time taken to translate the whole test set, which contains about 1 million sentences with ~15 million tokens. The time of the `tiny` model was measured on CPUs, while the other models were measured on GPUs.
* `File size` - All models are stored in FP16 except for the `tiny` model stored in FP32.
* `BLEU` - We report the SacreBLEU score averaged over wmt10 to wmt19, excluding wmt12. Signature: BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt10+tok.13a+version.1.4.9 (for wmt10; the others are analogous).
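The throughput column is simply the number of tokens translated divided by the wall-clock time. As a rough arithmetic check, assuming the test set has exactly 15 million tokens (the text only says ~15 million), with a hypothetical helper `WordsPerSecond`:

```cpp
#include <cassert>

// Throughput in tokens (words) per second.
// Assumption: tokens is the size of the test set, taken as 15 million here.
double WordsPerSecond(double tokens, double seconds)
{
    assert(seconds > 0.0);
    return tokens / seconds;
}
```

For example, 15,000,000 / 810 ≈ 18,518 for 9-1-tiny and 15,000,000 / 977 ≈ 15,353 for 9-1; every row of the table agrees with this formula to within 0.1%.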
All these models and docker images are available at:
[Baidu Cloud](https://pan.baidu.com/s/1J8kRoF3d5P-XA4Qd3YT4ZQ) password: bdwp
[Google Drive](https://drive.google.com/file/d/1tgCUN8TnUsbcI7BCYFQkj30rCvk68YRb) (docker images only)
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-10
*/
//#define CRTDBG_MAP_ALLOC
//#include <stdlib.h>
//#include <crtdbg.h>
#include "./nmt/NMT.h"
#include "niutensor/network/XNoder.h"
#include "niutensor/tensor/XTensor.h"
#include "niutensor/tensor/core/movement/Spread.h"
using namespace nmt;
using namespace nts;
void test() {
XTensor input, node, index;
InitTensor2D(&input, 32, 4);
InitTensor2D(&input, 13, 4);
InitTensor2D(&input, 32, 4);
XNoder::MakeGrad(&input);
XTensor* tmp = NewTensorBufV2(&input, input.devID, input.mem);
_SpreadForGather(tmp, node.grad, &index);
_SumMe(input.grad, tmp);
input.grad->Dump(stderr);
}
int main(int argc, const char** argv)
{
//_CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_LEAK_CHECK_DF);
//_CrtSetBreakAlloc(2708);
NMTMain(argc - 1, argv + 1);
//test();
//_CrtDumpMemoryLeaks();
return 0;
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-10-09
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "Decoder.h"
#include "Utility.h"
#include "layer/LayerNorm.h"
#include "layer/CommonModules.h"
#include "../niutensor/tensor/core/CHeader.h"
namespace nmt
{
/* constructor */
AttDecoder::AttDecoder()
{
selfAtt = NULL;
fnns = NULL;
selfAttLayerNorms = NULL;
fnnLayerNorms = NULL;
enDeAtt = NULL;
enDeAttLayerNorms = NULL;
decoderLayerNorm = NULL;
selfAttCache = NULL;
enDeAttCache = NULL;
}
/* destructor */
AttDecoder::~AttDecoder()
{
delete[] selfAttCache;
delete[] enDeAttCache;
delete[] selfAtt;
delete[] fnns;
delete[] selfAttLayerNorms;
delete[] fnnLayerNorms;
delete[] enDeAtt;
delete[] enDeAttLayerNorms;
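/* decoderLayerNorm stays NULL unless pre-norm is enabled, and deleting NULL is a no-op;
   this avoids reading preNorm, which is uninitialized if InitModel() was never called */
delete decoderLayerNorm;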
}
/*
initialize the model
>> config - configurations of the model
*/
void AttDecoder::InitModel(Config& config)
{
devID = config.devID;
nlayer = config.nDecLayer;
hSize = config.modelSize;
eSize = config.embSize;
vSize = config.tgtVocabSize;
dropoutP = config.dropout;
preNorm = config.preNorm;
CheckNTErrors(nlayer >= 1, "We have one decoding layer at least!");
CheckNTErrors(vSize > 1, "set vocabulary size by \"-vsizetgt\"");
/* embedding model */
embedder.InitModel(config, false);
selfAtt = new Attention[nlayer];
fnns = new FNN[nlayer];
selfAttLayerNorms = new LN[nlayer];
enDeAtt = new Attention[nlayer];
enDeAttLayerNorms = new LN[nlayer];
fnnLayerNorms = new LN[nlayer];
selfAttCache = new Cache[nlayer];
enDeAttCache = new Cache[nlayer];
if (preNorm)
decoderLayerNorm = new LN;
/* initialize the stacked layers */
for (int i = 0; i < nlayer; i++) {
selfAtt[i].InitModel(config);
fnns[i].InitModel(config);
selfAttLayerNorms[i].InitModel(config);
fnnLayerNorms[i].InitModel(config);
enDeAtt[i].InitModel(config);
enDeAttLayerNorms[i].InitModel(config);
selfAttCache[i].enable = true;
enDeAttCache[i].enable = true;
}
if (preNorm)
decoderLayerNorm->InitModel(config);
}
/*
make the decoding network
>> inputDec - the input tensor of the decoder
>> outputEnc - the output tensor of the encoder
>> mask - mask that indicates which position is valid
>> maskEncDec - mask for the encoder-decoder attention
>> nstep - the current length of the decoder input
>> isTraining - indicates whether the model is used for training
<< return - the output tensor of the decoder
*/
XTensor AttDecoder::Make(XTensor& inputDec, XTensor& outputEnc, XTensor* mask,
XTensor* maskEncDec, int nstep, bool isTraining)
{
XTensor x;
x = embedder.Make(inputDec, true, isTraining, nstep);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
for (int i = 0; i < nlayer; i++) {
XTensor att;
XTensor ende;
XTensor fnn;
XTensor res;
XTensor selfAttnBefore;
XTensor selfAttnAfter;
XTensor endeAttnBefore;
XTensor endeAttnAfter;
XTensor fnnBefore;
/* layer normalization with pre-norm for self-attn */
selfAttnBefore = LayerNorm(x, selfAttLayerNorms[i], preNorm, true, false);
/******************/
/* self attention */
att = selfAtt[i].Make(selfAttnBefore, selfAttnBefore, selfAttnBefore,
mask, isTraining, &selfAttCache[i], SELF_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
att = Dropout(att, dropoutP);
/* residual connection */
res = Sum(att, x);
/* layer normalization with post-norm for self-attention */
selfAttnAfter = LayerNorm(res, selfAttLayerNorms[i], preNorm, false, true);
/* layer normalization with pre-norm for encoder-decoder attention */
endeAttnBefore = LayerNorm(selfAttnAfter, enDeAttLayerNorms[i], preNorm, true, false);
/* encoder-decoder attention */
ende = enDeAtt[i].Make(outputEnc, endeAttnBefore, outputEnc, maskEncDec,
isTraining, &enDeAttCache[i], EN_DE_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
ende = Dropout(ende, dropoutP);
/* residual connection */
res = Sum(ende, selfAttnAfter);
/* layer normalization with post-norm for encoder-decoder attention */
endeAttnAfter = LayerNorm(res, enDeAttLayerNorms[i], preNorm, false, true);
/* layer normalization with pre-norm for fnn */
fnnBefore = LayerNorm(endeAttnAfter, fnnLayerNorms[i], preNorm, true, false);
/* fnn */
fnn = fnns[i].Make(fnnBefore, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
fnn = Dropout(fnn, dropoutP);
/* residual connection */
res = Sum(fnn, endeAttnAfter);
/* layer normalization with post-norm for fnn */
x = LayerNorm(res, fnnLayerNorms[i], preNorm, false, true);
}
if (preNorm)
return decoderLayerNorm->Make(x);
return x;
}
/*
make the decoding network (pre-norm only: decoderLayerNorm must have been created by InitModel)
>> inputDec - the input tensor of the decoder
>> outputEnc - the output tensor of the encoder
>> mask - mask that indicates which position is valid
>> maskEncDec - mask for the encoder-decoder attention
>> nstep - the current length of the decoder input
>> isTraining - indicates whether the model is used for training
<< return - the output tensor of the decoder
*/
XTensor AttDecoder::MakeFast(XTensor& inputDec, XTensor& outputEnc, XTensor* mask,
XTensor* maskEncDec, int nstep, bool isTraining)
{
XTensor x;
x = embedder.Make(inputDec, true, isTraining, nstep);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
for (int i = 0; i < nlayer; i++) {
XTensor res;
res = x;
/* layer normalization with pre-norm for self-attn */
x = selfAttLayerNorms[i].Make(x);
/******************/
/* self attention */
x = selfAtt[i].Make(x, x, x, mask, isTraining, &selfAttCache[i], SELF_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
/* residual connection */
x = Sum(res, x);
res = x;
/* layer normalization with pre-norm for encoder-decoder attention */
x = enDeAttLayerNorms[i].Make(x);
/* encoder-decoder attention */
x = enDeAtt[i].Make(outputEnc, x, outputEnc, maskEncDec,
isTraining, &enDeAttCache[i], EN_DE_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
/* residual connection */
x = Sum(res, x);
res = x;
/* layer normalization with pre-norm for fnn */
x = fnnLayerNorms[i].Make(x);
/* fnn */
x = fnns[i].Make(x, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
/* residual connection */
x = Sum(res, x);
}
x = decoderLayerNorm->Make(x);
return x;
}
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __DECODER_H__
#define __DECODER_H__
#include "Encoder.h"
#include "Utility.h"
namespace nmt
{
class AttDecoder
{
public:
/* device id */
int devID;
/* layer number */
int nlayer;
/* hidden size of the model (set from config.modelSize) */
int hSize;
/* embedding size */
int eSize;
/* vocabulary size */
int vSize;
/* dropout probability */
DTYPE dropoutP;
/* embedding of word at each position */
Embedder embedder;
/* FNN model of each layer */
FNN* fnns;
/* attention model of each layer */
Attention* selfAtt;
/* layer normalization for attention */
LN* selfAttLayerNorms;
/* layer normalization for fnn */
LN* fnnLayerNorms;
/* layer normalization for decoder */
LN* decoderLayerNorm;
/* encoder-decoder attention model of each layer */
Attention* enDeAtt;
/* layer normalization for encoder-decoder attention */
LN* enDeAttLayerNorms;
/* cache of each self-attention layer */
Cache* selfAttCache;
/* cache of each encoder-decoder attention layer */
Cache* enDeAttCache;
/* the location of layer normalization */
bool preNorm;
public:
/* constructor */
AttDecoder();
/* destructor */
~AttDecoder();
/* initialize the model */
void InitModel(Config& config);
/* make the decoding network */
XTensor Make(XTensor& inputDec, XTensor& outputEnc, XTensor* mask,
XTensor* maskEncDec, int nstep, bool isTraining);
/* make the decoding network (pre norm) */
XTensor MakeFast(XTensor& inputDec, XTensor& outputEnc, XTensor* mask,
XTensor* maskEncDec, int nstep, bool isTraining);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "Encoder.h"
#include "Utility.h"
#include "layer/LayerNorm.h"
#include "layer/CommonModules.h"
#include "../niutensor/tensor/core/CHeader.h"
namespace nmt
{
/* constructor */
AttEncoder::AttEncoder()
{
selfAtt = NULL;
fnns = NULL;
attLayerNorms = NULL;
fnnLayerNorms = NULL;
encoderLayerNorm = NULL;
}
/* destructor */
AttEncoder::~AttEncoder()
{
delete[] selfAtt;
delete[] fnns;
delete[] attLayerNorms;
delete[] fnnLayerNorms;
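/* encoderLayerNorm stays NULL unless pre-norm is enabled, and deleting NULL is a no-op;
   this avoids reading preNorm, which is uninitialized if InitModel() was never called */
delete encoderLayerNorm;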
}
/*
initialize the model
>> config - configurations for the model
*/
void AttEncoder::InitModel(Config& config)
{
devID = config.devID;
nlayer = config.nEncLayer;
eSize = config.embSize;
hSize = config.modelSize;
vSize = config.srcVocabSize;
preNorm = config.preNorm;
dropoutP = config.dropout;
CheckNTErrors(nlayer >= 1, "We have one encoding layer at least!");
CheckNTErrors(vSize > 1, "Set vocabulary size by \"-vsize\"");
/* embedding model */
embedder.InitModel(config);
selfAtt = new Attention[nlayer];
fnns = new FNN[nlayer];
attLayerNorms = new LN[nlayer];
fnnLayerNorms = new LN[nlayer];
if (preNorm)
encoderLayerNorm = new LN;
/* initialize the stacked layers */
for (int i = 0; i < nlayer; i++) {
selfAtt[i].InitModel(config);
fnns[i].InitModel(config);
attLayerNorms[i].InitModel(config);
fnnLayerNorms[i].InitModel(config);
}
if (preNorm)
encoderLayerNorm->InitModel(config);
}
/*
make the encoding network
>> input - the input tensor of the encoder
>> mask - the mask that indicates which positions are valid
>> maskEncDec - unused
>> isTraining - indicates whether the model is used for training
<< return - the output tensor of the encoder
*/
XTensor AttEncoder::Make(XTensor& input, XTensor* mask, XTensor& maskEncDec, bool isTraining)
{
XTensor x;
x = embedder.Make(input, false, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
for (int i = 0; i < nlayer; i++) {
XTensor att;
XTensor fnn;
XTensor res;
XTensor attnBefore;
XTensor attnAfter;
XTensor fnnBefore;
/* layer normalization with pre-norm for self-attn */
attnBefore = LayerNorm(x, attLayerNorms[i], preNorm, true, false);
/* self attention */
att = selfAtt[i].Make(attnBefore, attnBefore, attnBefore, mask, isTraining, NULL, SELF_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
att = Dropout(att, dropoutP);
/* residual connection */
res = Sum(att, x);
/* layer normalization with post-norm for self-attn */
attnAfter = LayerNorm(res, attLayerNorms[i], preNorm, false, true);
/* layer normalization with pre-norm for fnn */
fnnBefore = LayerNorm(attnAfter, fnnLayerNorms[i], preNorm, true, false);
/* fnn */
fnn = fnns[i].Make(fnnBefore, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
fnn = Dropout(fnn, dropoutP);
/* residual connection */
res = Sum(fnn, attnAfter);
/* layer normalization with post-norm for fnn */
x = LayerNorm(res, fnnLayerNorms[i], preNorm, false, true);
}
if (preNorm)
return encoderLayerNorm->Make(x);
return x;
}
/*
make the encoding network (pre-norm only: encoderLayerNorm must have been created by InitModel)
>> input - the input tensor of the encoder
>> mask - the mask that indicates which positions are valid
>> maskEncDec - unused
>> isTraining - indicates whether the model is used for training
<< return - the output tensor of the encoder
*/
XTensor AttEncoder::MakeFast(XTensor& input, XTensor* mask, XTensor& maskEncDec, bool isTraining)
{
XTensor x;
x = embedder.Make(input, false, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
for (int i = 0; i < nlayer; i++) {
XTensor res;
res = x;
/* layer normalization with pre-norm for self-attn */
x = attLayerNorms[i].Make(x);
/* self attention */
x = selfAtt[i].Make(x, x, x, mask, isTraining, NULL, SELF_ATT);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
/* residual connection */
x = Sum(res, x);
res = x;
/* layer normalization with pre-norm for fnn */
x = fnnLayerNorms[i].Make(x);
/* fnn */
x = fnns[i].Make(x, isTraining);
/* dropout */
if (isTraining && dropoutP > 0)
x = Dropout(x, dropoutP);
/* residual connection */
x = Sum(res, x);
}
x = encoderLayerNorm->Make(x);
return x;
}
/*
make the encoding network (wrapper)
>> input - the input tensor of the encoder
>> mask - the mask that indicates which positions are valid
>> isTraining - indicates whether the model is used for training
<< return - the output tensor of the encoder
*/
XTensor AttEncoder::Make(XTensor& input, XTensor* mask, bool isTraining)
{
XTensor nothing;
return Make(input, mask, nothing, isTraining);
}
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __ENCODER_H__
#define __ENCODER_H__
#include "Utility.h"
#include "layer/FNN.h"
#include "layer/Attention.h"
#include "layer/Embedding.h"
#include "layer/LayerNorm.h"
#include "../niutensor/network/XNet.h"
using namespace nts;
namespace nmt
{
/*
base class of the encoder
*/
class Encoder
{
public:
virtual XTensor Make(XTensor& input, XTensor* mask, XTensor& mask2, bool isTraining) = 0;
};
/*
the encoder based on self-attention
*/
class AttEncoder : Encoder
{
public:
/* device id */
int devID;
/* layer number */
int nlayer;
/* hidden size of the model (set from config.modelSize) */
int hSize;
/* embedding size */
int eSize;
/* vocabulary size */
int vSize;
/* dropout probability */
DTYPE dropoutP;
/* some positions can be ignored in attention. This is useful in language modeling, where the first position needs
special handling in the attention model. */
int ignored;
/* embedding of word at each position */
Embedder embedder;
/* FNN model of each layer */
FNN* fnns;
/* attention model of each layer */
Attention* selfAtt;
/* layer normalizations for attention */
LN* attLayerNorms;
/* layer normalization for fnn */
LN* fnnLayerNorms;
/* layer normalization for encoder */
LN* encoderLayerNorm;
/* the location of layer normalization */
bool preNorm;
public:
/* constructor */
AttEncoder();
/* destructor */
~AttEncoder();
/* initialize the model */
void InitModel(Config& config);
/* make the encoding network */
XTensor Make(XTensor& input, XTensor* mask, XTensor& maskEncDec, bool isTraining);
/* make the encoding network (pre norm) */
XTensor MakeFast(XTensor& input, XTensor* mask, XTensor& maskEncDec, bool isTraining);
/* make the encoding network (wrapper) */
XTensor Make(XTensor& input, XTensor* mask, bool isTraining);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __MODEL_H__
#define __MODEL_H__
#include "Encoder.h"
#include "Decoder.h"
#include "layer/FNN.h"
#include "layer/Output.h"
#include "Utility.h"
#include "layer/Attention.h"
namespace nmt
{
/* an NMT model that keeps the parameters of the encoder,
the decoder and the output layer (softmax). */
class Model
{
public:
/* device id */
int devID;
/* the encoder */
AttEncoder* encoder;
/* the decoder */
AttDecoder* decoder;
/* output layer */
Output* outputLayer;
/* indicates whether the model is running for language modeling */
bool isLM;
/* indicates whether the model is running for machine translation */
bool isMT;
/* indicates whether the model is running with FP16 data type */
bool useFP16;
/* number of heads in the attention model */
int nhead;
/* indicates whether the encoder and decoder share their embeddings */
int shareAllEmbeddings;
/* indicates whether the decoder input embeddings are shared with the output weights */
int shareDecInputOutputWeight;
public:
/* constructor */
Model();
/* destructor */
~Model();
/* initialize the model */
void InitModel(Config& config);
/* print model configurations */
void ShowModelConfig(Config& config);
/* make the encoding network */
XTensor MakeEncoder(XTensor& input, XTensor* mask, bool isTraining);
/* make the decoding network */
XTensor MakeDecoder(XTensor& inputEnc, XTensor& inputDec, XTensor* mask,
XTensor& MaskEncDec, bool isTraining);
/* make the network for language modeling (with the output softmax layer) */
void MakeLM(XTensor& input, XTensor& output, XTensor& padding, bool isTraining);
/* make the network for machine translation (with the output softmax layer) */
void MakeMT(XTensor& inputEnc, XTensor& inputDec, XTensor& output,
XTensor& paddingEnc, XTensor& paddingDec, bool isTraining);
/* make the mask for training MT models */
void MakeMTMask(XTensor& inputEnc, XTensor& inputDec,
XTensor& paddingEnc, XTensor& paddingDec,
XTensor& maskEnc, XTensor& maskDec, XTensor& maskEncDec);
/* make the mask of the encoder */
void MakeMTMaskEnc(XTensor& paddingEnc, XTensor& maskEnc);
/* make the mask of the decoder */
void MakeMTMaskDec(XTensor& paddingEnc, XTensor& paddingDec,
XTensor& maskDec, XTensor& maskEncDec);
/* get parameter matrices */
void GetParams(TensorList& list);
/* dump the model to a file */
void Dump(const char* fn);
/* read the parameters */
void Read(FILE* file);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06, 2020-07
*/
#include <ctime>
#include <cstdlib>
#include <cstring>
#include "NMT.h"
#include "train/Trainer.h"
#include "translate/Translator.h"
namespace nmt
{
int NMTMain(int argc, const char** argv)
{
if (argc == 0)
return 1;
/* load configurations */
Config config(argc, argv);
srand(1);
/* training */
if (strcmp(config.trainFN, "") != 0) {
Model model;
model.InitModel(config);
Trainer trainer;
trainer.Init(config);
trainer.Train(config.trainFN, config.validFN, config.modelFN, &model);
}
/* translating */
if (strcmp(config.testFN, "") != 0 && strcmp(config.outputFN, "") != 0) {
/* disable grad flow */
DISABLE_GRAD;
Model model;
model.InitModel(config);
Translator translator;
translator.Init(config);
translator.Translate(config.testFN, config.srcVocabFN,
config.tgtVocabFN, config.outputFN, &model);
}
return 0;
}
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* An implementation of the NMT system.
*/
#ifndef __NMT_H__
#define __NMT_H__
namespace nmt
{
/* entrance of the program */
int NMTMain(int argc, const char** argv);
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04, 2020-06
*/
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fstream>
#include <sstream>
#include "Utility.h"
#include "../niutensor/tensor/XGlobal.h"
using namespace nts;
using namespace std;
namespace nmt
{
/*
load configurations from the command line
>> argc - number of arguments
>> argv - the list of arguments
*/
Config::Config(int argc, const char** argv)
{
/* allocate a buffer for every slot: LoadFromFile() may fill more slots
than argc, with strings of up to 1023 characters */
CheckNTErrors(argc <= MAX_PARAM_NUM, "too many arguments");
char** args = new char* [MAX_PARAM_NUM];
for (int i = 0; i < MAX_PARAM_NUM; i++) {
size_t len = (i < argc) ? strlen(argv[i]) + 1 : 0;
args[i] = new char[len > 1024 ? len : 1024];
if (i < argc)
strcpy(args[i], argv[i]);
else
args[i][0] = '\0';
}
char* configFN = new char[1024];
LoadParamString(argc, args, "config", configFN, "");
int argsNum = argc;
/* load configurations from a file */
if (strcmp(configFN, "") != 0)
argsNum = LoadFromFile(configFN, args);
ShowParams(argsNum, args);
/* options for the model */
LoadParamInt(argsNum, args, "nhead", &nhead, 4);
LoadParamInt(argsNum, args, "enclayer", &nEncLayer, 6);
LoadParamInt(argsNum, args, "declayer", &nDecLayer, 6);
LoadParamInt(argsNum, args, "maxrp", &maxRP, 8);
LoadParamInt(argsNum, args, "embsize", &embSize, 512);
LoadParamInt(argsNum, args, "modelsize", &modelSize, 512);
LoadParamInt(argsNum, args, "maxpos", &maxPosLen, 1024);
LoadParamInt(argsNum, args, "fnnhidden", &fnnHiddenSize, modelSize * 2);
LoadParamInt(argsNum, args, "vsize", &srcVocabSize, 10152);
LoadParamInt(argsNum, args, "vsizetgt", &tgtVocabSize, 10152);
LoadParamInt(argsNum, args, "padid", &padID, 1);
LoadParamInt(argsNum, args, "startid", &startID, 2);
LoadParamInt(argsNum, args, "endid", &endID, 2);
LoadParamBool(argsNum, args, "rpr", &useRPR, false);
LoadParamBool(argsNum, args, "prenorm", &preNorm, true);
// TODO: refactor the parameters type to support weight sharing during training
LoadParamInt(argsNum, args, "shareemb", &shareAllEmbeddings, 0);
LoadParamInt(argsNum, args, "sharedec", &shareDecInputOutputWeight, 0);
LoadParamString(argsNum, args, "model", modelFN, "");
LoadParamString(argsNum, args, "srcvocab", srcVocabFN, "vocab.src");
LoadParamString(argsNum, args, "tgtvocab", tgtVocabFN, "vocab.tgt");
/* options for training */
LoadParamString(argsNum, args, "train", trainFN, "");
LoadParamString(argsNum, args, "valid", validFN, "");
LoadParamInt(argsNum, args, "dev", &devID, 0);
LoadParamInt(argsNum, args, "wbatch", &wBatchSize, 4096);
LoadParamInt(argsNum, args, "sbatch", &sBatchSize, 8);
isTraining = strcmp(trainFN, "") != 0;
LoadParamBool(argsNum, args, "mt", &isMT, true);
LoadParamFloat(argsNum, args, "dropout", &dropout, 0.3);
LoadParamFloat(argsNum, args, "fnndrop", &fnnDropout, 0.1);
LoadParamFloat(argsNum, args, "attdrop", &attDropout, 0.1);
LoadParamFloat(argsNum, args, "lrate", &lrate, 0.0015F);
LoadParamFloat(argsNum, args, "lrbias", &lrbias, 0);
LoadParamInt(argsNum, args, "nepoch", &nepoch, 50);
LoadParamInt(argsNum, args, "maxcheckpoint", &maxCheckpoint, 10);
LoadParamInt(argsNum, args, "nstep", &nstep, 100000);
LoadParamInt(argsNum, args, "nwarmup", &nwarmup, 8000);
LoadParamBool(argsNum, args, "adam", &useAdam, true);
LoadParamFloat(argsNum, args, "adambeta1", &adamBeta1, 0.9F);
LoadParamFloat(argsNum, args, "adambeta2", &adamBeta2, 0.98F);
LoadParamFloat(argsNum, args, "adamdelta", &adamDelta, 1e-9F);
LoadParamBool(argsNum, args, "shuffled", &isShuffled, true);
LoadParamFloat(argsNum, args, "labelsmoothing", &labelSmoothingP, 0.1);
LoadParamInt(argsNum, args, "nstepcheckpoint", &nStepCheckpoint, -1);
LoadParamBool(argsNum, args, "epochcheckpoint", &useEpochCheckpoint, true);
LoadParamInt(argsNum, args, "updatestep", &updateStep, 1);
LoadParamBool(argsNum, args, "sorted", &isLenSorted, false);
LoadParamInt(argsNum, args, "bufsize", &bufSize, 50000);
LoadParamBool(argsNum, args, "doubledend", &isDoubledEnd, false);
LoadParamBool(argsNum, args, "smallbatch", &isSmallBatch, true);
LoadParamBool(argsNum, args, "bigbatch", &isBigBatch, false);
LoadParamBool(argsNum, args, "randbatch", &isRandomBatch, false);
LoadParamInt(argsNum, args, "bucketsize", &bucketSize, wBatchSize * 10);
/* options for translating */
LoadParamString(argsNum, args, "test", testFN, "");
LoadParamString(argsNum, args, "output", outputFN, "");
LoadParamInt(argsNum, args, "beamsize", &beamSize, 1);
LoadParamBool(argsNum, args, "fp16", &useFP16, false);
LoadParamFloat(argsNum, args, "lenalpha", &lenAlpha, 0.6);
LoadParamFloat(argsNum, args, "maxlenalpha", &maxLenAlpha, 1.2);
/* every slot was allocated above, so free all of them */
for (int i = 0; i < MAX_PARAM_NUM; i++)
delete[] args[i];
delete[] args;
delete[] configFN;
}
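/*
load configurations from a file
>> configFN - path to the configuration file
>> args - the list to store the configurations
format: one option per line, written as the option name (without the
leading '-') and its value, separated by a blank or a tab
*/
int Config::LoadFromFile(const char* configFN, char** args) {
ifstream f(configFN, ios::in);
CheckNTErrors(f.is_open(), "unable to open the config file");
int argsNum = 0;
/* parse arguments: each "name value" pair becomes "-name" "value" */
string key, value;
while (f >> key >> value) {
CheckNTErrors(argsNum + 2 <= MAX_PARAM_NUM, "too many options in the config file");
CheckNTErrors(key.size() + 1 < 1024 && value.size() < 1024, "option too long in the config file");
key = "-" + key;
strcpy(args[argsNum++], key.c_str());
strcpy(args[argsNum++], value.c_str());
}
/* record the number of arguments */
return argsNum;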
}
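The config-file format above is meant to produce the same `-name value` tokens that the command-line parser reads. A minimal standard-library sketch of that conversion, assuming whitespace-separated pairs; `ParseConfigText` is a hypothetical helper for illustration, not part of NiuTrans.NMT:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

/* hypothetical helper: turn "name value" pairs into "-name", "value" tokens */
std::vector<std::string> ParseConfigText(const std::string& text)
{
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string key, value;
    /* operator>> skips blanks, tabs and newlines alike */
    while (in >> key >> value) {
        tokens.push_back("-" + key);   /* the option name gets a leading dash */
        tokens.push_back(value);
    }
    return tokens;
}
```

For example, the text `"nhead 8\nenclayer\t12\n"` yields `-nhead 8 -enclayer 12`, which `LoadParamInt` can then match.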
void LoadParamString(int argc, char** argv, const char* name, char* p, const char* defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for (int i = 0; i < argc; i++) {
if (!strcmp(argv[i], vname) && i + 1 < argc) {
strcpy(p, argv[i + 1]);
hit = true;
break;
}
}
if (!hit)
strcpy(p, defaultP);
}
void LoadParamInt(int argc, char** argv, const char* name, int* p, int defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for (int i = 0; i < argc; i++) {
if (!strcmp(argv[i], vname) && i + 1 < argc) {
*(int*)p = atoi(argv[i + 1]);
hit = true;
break;
}
}
if (!hit)
*p = defaultP;
}
void LoadParamBool(int argc, char** argv, const char* name, bool* p, bool defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for (int i = 0; i < argc; i++) {
if (!strcmp(argv[i], vname)) {
*(bool*)p = true;
hit = true;
break;
}
}
if (!hit)
*p = defaultP;
}
void LoadParamFloat(int argc, char** argv, const char* name, float* p, float defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for (int i = 0; i < argc; i++) {
if (!strcmp(argv[i], vname) && i + 1 < argc) {
*p = (float)atof(argv[i + 1]);
hit = true;
break;
}
}
if (!hit)
*p = defaultP;
}
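The loaders above share one lookup rule: find `-name` and read the token after it, or fall back to the default. `LoadParamBool` differs in that the mere presence of `-name` sets the option to `true`, so a boolean whose default is already `true` (e.g. `-prenorm` or `-adam`) cannot be switched off from the command line. A sketch of both rules with standard containers; `FindIntParam` and `HasFlag` are hypothetical names:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

/* hypothetical helper: value after "-name", or the default if the option is
   absent or is the last token (mirrors the "i + 1 < argc" check above) */
int FindIntParam(const std::vector<std::string>& args, const std::string& name, int defaultValue)
{
    const std::string flag = "-" + name;
    for (size_t i = 0; i + 1 < args.size(); i++) {
        if (args[i] == flag)
            return std::atoi(args[i + 1].c_str());
    }
    return defaultValue;
}

/* hypothetical helper: a boolean option becomes true as soon as "-name"
   appears; any value written after it is not inspected */
bool HasFlag(const std::vector<std::string>& args, const std::string& name, bool defaultValue)
{
    const std::string flag = "-" + name;
    for (const std::string& a : args) {
        if (a == flag)
            return true;
    }
    return defaultValue;
}
```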
void ShowParams(int argc, char** argv)
{
fprintf(stderr, "args:\n");
for (int i = 0; i < argc; i++) {
if (argv[i][0] == 0 || argv[i][1] == 0)
continue;
if (argv[i][0] == '-' && (argv[i][1] < '1' || argv[i][1] > '9')) {
if (i + 1 < argc && argv[i + 1][0] != '-')
fprintf(stderr, " %s=%s\n", argv[i], argv[i + 1]);
else
fprintf(stderr, " %s=yes\n", argv[i]);
}
}
fprintf(stderr, "\n");
}
#define MAX_WORD_NUM 120
/*
split a string by a delimiter and return the start positions of all non-empty sub-strings
>> s - the original string
>> delimiter - the delimiter
<< indices - start positions of all sub-strings
*/
UInt64List SplitToPos(const string& s, const string& delimiter)
{
UInt64List indices;
if (delimiter.length() == 0) {
/* an empty delimiter would make find() return start forever */
indices.Add(0);
return indices;
}
size_t pos = 0;
uint64_t start = 0;
while ((pos = s.find(delimiter, start)) != string::npos) {
if (pos != start) {
indices.Add(start);
}
start = pos + delimiter.length();
}
if (start != s.length()) {
indices.Add(start);
}
return indices;
}
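The same position-splitting logic written with `std::vector<size_t>` instead of `UInt64List`, as a sketch for checking its behaviour (`SplitPositions` is a hypothetical name). Repeated delimiters produce no empty pieces, and an empty delimiter returns immediately instead of looping:

```cpp
#include <cassert>
#include <string>
#include <vector>

/* start offsets of the non-empty pieces of s separated by delimiter */
std::vector<size_t> SplitPositions(const std::string& s, const std::string& delimiter)
{
    std::vector<size_t> indices;
    if (delimiter.empty()) {
        indices.push_back(0);   /* the whole string is one piece */
        return indices;         /* returning here avoids an endless find("") loop */
    }
    size_t start = 0, pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        if (pos != start)       /* skip empty pieces between repeated delimiters */
            indices.push_back(start);
        start = pos + delimiter.size();
    }
    if (start != s.size())
        indices.push_back(start);
    return indices;
}
```

`SplitInt` and `SplitFloat` below then parse a number at each offset; `strtol` and `strtof` stop at the first character that cannot belong to the number, so a blank or comma delimiter ends each value.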
/* split a string into a list of integers */
IntList SplitInt(const string& s, const string& delimiter)
{
IntList values;
auto indices = SplitToPos(s, delimiter);
for (int i = 0; i < indices.Size(); i++) {
values.Add(strtol(s.data() + indices[i], nullptr, 10));
}
return values;
}
/* split a string into a list of floats */
FloatList SplitFloat(const string& s, const string& delimiter)
{
FloatList values;
auto indices = SplitToPos(s, delimiter);
for (int i = 0; i < indices.Size(); i++) {
values.Add(strtof(s.data() + indices[i], nullptr));
}
return values;
}
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06
*/
#ifndef __UTILITY_H__
#define __UTILITY_H__
#include <string>
#include <cstdio>
#include "../niutensor/tensor/XList.h"
using namespace std;
using namespace nts;
namespace nmt
{
#define MAX_PARAM_NUM 100
/* load arguments */
void LoadParamInt(int argc, char** argv, const char* name, int* p, int defaultP);
void LoadParamBool(int argc, char** argv, const char* name, bool* p, bool defaultP);
void LoadParamFloat(int argc, char** argv, const char* name, float* p, float defaultP);
void LoadParamString(int argc, char** argv, const char* name, char* p, const char* defaultP);
/* show arguments */
void ShowParams(int argc, char** argv);
/* split string */
IntList SplitInt(const string& s, const string& delimiter);
FloatList SplitFloat(const string& s, const string& delimiter);
UInt64List SplitToPos(const string& s, const string& delimiter);
/* configurations for the model, training and translation */
class Config {
public:
/* path to the model */
char modelFN[1024];
/* path to the source vocab */
char srcVocabFN[1024];
/* path to the target vocab */
char tgtVocabFN[1024];
/* path to the input file (for inference) */
char testFN[1024];
/* path to the output file (for inference) */
char outputFN[1024];
/* path to the training file */
char trainFN[1024];
/* path to the validation file */
char validFN[1024];
/* device id */
int devID;
/* beam size */
int beamSize;
/* word batch size */
int wBatchSize;
/* sentence batch size */
int sBatchSize;
/* number of heads in attention */
int nhead;
/* number of encoder layers */
int nEncLayer;
/* number of decoder layers */
int nDecLayer;
/* the maximum relative position in RPR attentions */
int maxRP;
/* the dimension of embeddings */
int embSize;
/* the dimension of hidden layer */
int modelSize;
/* the maximum length in positional embedding */
int maxPosLen;
/* the dimension of fnn hidden layer */
int fnnHiddenSize;
/* the vocab size of source sequence */
int srcVocabSize;
/* the vocab size of target sequence */
int tgtVocabSize;
/* the padding id */
int padID;
/* start symbol */
int startID;
/* end symbol */
int endID;
/* indicates whether the model uses pre-norm */
bool preNorm;
/* indicates whether the model is running for machine translation */
bool isMT;
/* indicates whether the encoder and decoder share embeddings */
int shareAllEmbeddings;
/* indicates whether the decoder shares its embeddings with the output weights */
int shareDecInputOutputWeight;
/* indicates whether the model is running with FP16 data type */
bool useFP16;
/* indicates whether we use the RPR attention */
bool useRPR;
/* indicates whether we train the model */
bool isTraining;
/* dropout rate for the model */
float dropout;
/* dropout rate for fnn layers */
float fnnDropout;
/* dropout rate for attention layers */
float attDropout;
/* the alpha parameter controls the length preference */
float lenAlpha;
/* scaling factor on the input length that bounds the number of search steps */
float maxLenAlpha;
/* learning rate */
float lrate;
/* the parameter that controls the maximum learning rate in training */
float lrbias;
/* training epoch number */
int nepoch;
/* training step number */
int nstep;
/* the maximum number of saved checkpoints */
int maxCheckpoint;
/* indicates whether we use Adam */
bool useAdam;
/* hyper parameters of Adam */
float adamBeta1;
float adamBeta2;
float adamDelta;
/* step number of warm-up for training */
int nwarmup;
/* indicates whether the data file is shuffled for training */
bool isShuffled;
/* the factor of label smoothing */
float labelSmoothingP;
/* number of steps after which we make a checkpoint */
int nStepCheckpoint;
/* indicates whether we make a checkpoint after each training epoch */
bool useEpochCheckpoint;
/* number of batches on which we do model update */
int updateStep;
/* indicates whether the sequence is sorted by length */
bool isLenSorted;
/* buffer size */
int bufSize;
/* indicates whether we double the </s> symbol for the output of LM */
bool isDoubledEnd;
/* indicates whether we measure the batch size as max * sc
rather than as the number of words, where max is the maximum
length and sc is the number of sentences */
bool isSmallBatch;
/* counterpart of "isSmallBatch" */
bool isBigBatch;
/* randomize batches */
bool isRandomBatch;
/* bucket size */
int bucketSize;
public:
/* load configurations from the command line */
Config(int argc, const char** argv);
/* load configurations from a file */
int LoadFromFile(const char* configFN, char** args);
};
}
#endif
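The `isSmallBatch` comment contrasts two ways of measuring a batch: the number of real words, or the padded size `max * sc`. A minimal sketch of both measures for a batch given by its sentence lengths; the helper names are hypothetical, and how the trainer compares either value with `wBatchSize` is not shown in this section:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/* number of real (non-padding) tokens in a batch */
int WordCount(const std::vector<int>& lengths)
{
    int n = 0;
    for (int len : lengths)
        n += len;
    return n;
}

/* size of the padded batch: longest sentence times sentence count */
int PaddedCount(const std::vector<int>& lengths)
{
    if (lengths.empty())
        return 0;
    int maxLen = *std::max_element(lengths.begin(), lengths.end());
    return maxLen * (int)lengths.size();
}
```

For lengths `{3, 5, 2}` the word count is 10 while the padded size is 15, so the padded measure fills a budget sooner when sentence lengths vary.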
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04, 2020-06
*/
#ifndef __ATTENTION_H__
#define __ATTENTION_H__
#include "NNUtil.h"
#include "../Utility.h"
#include "../../niutensor/network/XNet.h"
#include "../../niutensor/tensor/core/CHeader.h"
using namespace nts;
namespace nmt
{
/* attention type */
enum { NONE, SELF_ATT, EN_DE_ATT };
/* layer cache for keys and values */
class Cache
{
public:
/* cache for keys, (B, L, H) */
XTensor key;
/* cache for values, (B, L, H) */
XTensor value;
public:
/* indicates cache miss if 'true' */
bool miss;
/* indicates whether we use cache */
bool enable;
/* constructor */
Cache();
/* update the states cache */
void Update(XTensor&& k, XTensor&& v);
/* keep alive states */
void KeepAlive(XTensor& aliveIdx);
/* reorder alive states */
void Reorder(XTensor& reorder);
};
/* multi-head attention */
class Attention
{
public:
/* device id */
int devID;
/* head number */
int nhead;
/* transformation matrix for Q */
XTensor weightQ;
/* bias for Q */
XTensor biasQ;
/* transformation matrix for K */
XTensor weightK;
/* bias for K */
XTensor biasK;
/* transformation matrix for V */
XTensor weightV;
/* bias for V */
XTensor biasV;
XTensor wBig;
XTensor bBig;
/* RPR emb */
XTensor RPEmbK;
/* transformation after dot-product attention */
XTensor weightO;
/* bias after dot-product attention */
XTensor biasO;
/* size of transformed Q and K */
int dk;
/* size of transformed V */
int dv;
/* size of input Q, K and V */
int d;
/* indicates whether we use the RPR attention */
bool useRPR;
/* dropout probability */
DTYPE dropoutP;
/* the maximum relative window size */
int maxRP;
public:
/* constructor */
Attention();
/* de-constructor */
~Attention();
/* initialize the model */
void InitModel(Config& config);
/* make the network */
XTensor Make(XTensor& k, XTensor& q, XTensor& v,
XTensor* mask, bool isTraining,
Cache* cache, int cacheType);
/* make the attention network given keys, queries and values (after linear transformation) */
XTensor MakeAttention(XTensor& k, XTensor& q, XTensor& v,
XTensor* mask, bool isTraining);
/* make the RPR attention network given keys, queries and values (after linear transformation) */
XTensor MakeRPRAttention(XTensor& k, XTensor& q, XTensor& v,
XTensor* mask, bool isTraining, bool isEnc);
/* generate relative position embeddings */
XTensor GetRPEmbedding(const int lenQ, const int lenKV, const int maxRelativeLen, const bool isEnc);
/* relative position-aware dot-product attention inner calculation */
XTensor RPDotProduct(XTensor& x, XTensor& y, XTensor& z, const bool is_key);
};
}
#endif
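As a reference for the per-head computation that `MakeAttention` is declared to perform, here is a plain-loop sketch of scaled dot-product attention, softmax(q kᵀ / √dk) v. It assumes the standard formulation and simplifies the mask to booleans; the Q/K/V projections, head splitting, dropout and the `Cache` are left out, and `DotProductAttention` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

/* softmax(q * k^T / sqrt(dk)) * v for one head; mask[i][j] == false hides
   key j from query i (assumes every query can see at least one key) */
Matrix DotProductAttention(const Matrix& q, const Matrix& k, const Matrix& v,
                           const std::vector<std::vector<bool>>& mask)
{
    const size_t lenQ = q.size(), lenK = k.size(), dk = q[0].size(), dv = v[0].size();
    const float scale = 1.0f / std::sqrt((float)dk);
    Matrix out(lenQ, std::vector<float>(dv, 0.0f));
    for (size_t i = 0; i < lenQ; i++) {
        std::vector<float> score(lenK, 0.0f);
        float maxScore = -std::numeric_limits<float>::infinity();
        for (size_t j = 0; j < lenK; j++) {
            if (!mask[i][j])
                continue;
            float s = 0.0f;
            for (size_t d = 0; d < dk; d++)
                s += q[i][d] * k[j][d];
            score[j] = s * scale;
            maxScore = std::max(maxScore, score[j]);
        }
        /* numerically stable softmax over the visible keys */
        float sum = 0.0f;
        for (size_t j = 0; j < lenK; j++) {
            score[j] = mask[i][j] ? std::exp(score[j] - maxScore) : 0.0f;
            sum += score[j];
        }
        /* weighted sum of the values */
        for (size_t j = 0; j < lenK; j++)
            for (size_t d = 0; d < dv; d++)
                out[i][d] += score[j] / sum * v[j][d];
    }
    return out;
}
```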
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-05
* This file includes some common modules of the Transformer model
*/
#include "CommonModules.h"
#include "../../niutensor/tensor/core/CHeader.h"
#include "../../niutensor/tensor/function/FHeader.h"
namespace nmt
{
/*
flexible layer normalization for the Transformer
>> input - input tensor
>> ln - the layernorm network
>> prenorm - whether we use prenorm or not
>> before - whether we use layernorm before attention/fnn (unused: the decision depends only on after and prenorm)
>> after - whether we use layernorm after attention/fnn
*/
XTensor LayerNorm(XTensor& input, LN& ln, bool prenorm, bool before, bool after)
{
if (after ^ prenorm)
return ln.Make(input);
else
return input;
}
}
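`LayerNorm()` normalizes only when `after ^ prenorm` holds: with pre-norm it acts at the "before" call site, with post-norm at the "after" one. The sketch below shows where that places the normalization in a residual sub-layer, assuming callers invoke it once before the sub-layer and once after the residual sum. Scalars stand in for tensors, and `norm`/`sublayer` are placeholders for `LN::Make` and the attention or FNN block:

```cpp
#include <cassert>
#include <functional>

/* same decision as "after ^ prenorm" in LayerNorm() above */
bool NormalizeHere(bool prenorm, bool after)
{
    return after ^ prenorm;
}

/* a residual sub-layer that calls the decision at both sites */
float ResidualBlock(float x, bool prenorm,
                    const std::function<float(float)>& norm,
                    const std::function<float(float)>& sublayer)
{
    float in = NormalizeHere(prenorm, false) ? norm(x) : x;   /* "before" call site */
    float y = x + sublayer(in);                               /* residual connection */
    return NormalizeHere(prenorm, true) ? norm(y) : y;        /* "after" call site */
}
```

Pre-norm therefore computes x + F(LN(x)), and post-norm computes LN(x + F(x)).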
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-03
*/
#ifndef __COMMONMODULE_H__
#define __COMMONMODULE_H__
#include "LayerNorm.h"
using namespace nts;
namespace nmt
{
/* the layer normalization module that controls pre-norm or post-norm */
XTensor LayerNorm(XTensor& input, LN& ln, bool prenorm, bool before, bool after);
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-01
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-07
*/
#include "Embedding.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
namespace nmt
{
/* constructor */
Embedder::Embedder()
{
devID = -1;
vSize = -1;
maxLength = -1;
}
/* de-constructor */
Embedder::~Embedder()
{
}
/*
initialize the model
>> config - configurations of the model
>> isEnc - indicates if it is used for the encoder
*/
void Embedder::InitModel(Config& config, bool isEnc)
{
devID = config.devID;
d = config.modelSize;
padIdx = config.padID;
eSize = config.embSize;
maxLength = config.maxPosLen;
vSize = (isEnc) ? config.srcVocabSize : config.tgtVocabSize;
InitTensor2D(&w, vSize, eSize, X_FLOAT, devID);
/* Make() shifts positions by padIdx + 1, so reserve padIdx + 1 extra rows */
maxLength = maxLength + padIdx + 1;
DTYPE v = 1.0F / (float)sqrt((float)eSize);
w.SetDataRandn(0, v);
/* create the positional embedding matrix */
MakePosEmbedding(maxLength);
}
/*
make positional embeddings (of size eSize * length)
>> length - length of the sequence
*/
void Embedder::MakePosEmbedding(int length)
{
InitTensor2D(&posEmbeddingBase, length, eSize, X_FLOAT, devID);
float* data = new float[posEmbeddingBase.unitNum];
for (int pos = 0; pos < length; pos++) {
float* dp = data + pos * eSize;
int channelSize = eSize / 2;
int offset = 0;
for (int i = 0; i < channelSize; i++) {
dp[offset++] = (float)sin(pos * exp(-i * log(10000.0F) / (channelSize - 1)));
}
for (int i = 0; i < channelSize; i++) {
dp[offset++] = (float)cos(pos * exp(-i * log(10000.0F) / (channelSize - 1)));
}
}
/* padding zeros */
int padStart = padIdx * eSize;
for (int i = padStart; i < padStart + eSize; i++)
data[i] = 0.F;
posEmbeddingBase.SetData(data, posEmbeddingBase.unitNum);
if (w.dataType != posEmbeddingBase.dataType)
posEmbeddingBase = ConvertDataType(posEmbeddingBase, w.dataType);
delete[] data;
}
/*
make the network
>> input - the word indices
>> isDec - indicates whether it is decoder
>> isTraining - indicates whether it is training
>> nstep - the number of tokens decoded so far (used when decoding one step at a time)
<< return - word & position embeddings of the input
*/
XTensor Embedder::Make(XTensor& input, bool isDec, bool isTraining, int nstep)
{
/* sanity checks on the input and the configuration */
CheckNTErrors(input.order > 1, "Wrong input tensor size!");
/* the largest position id is (length - 1) + padIdx + 1, which must be a row of posEmbeddingBase */
CheckNTErrors(input.dimSize[input.order - 1] + padIdx < maxLength, "The sequence is too long!");
CheckNTErrors(vSize > 0, "Set vocabulary size by \"-vsize\"");
CheckNTErrors(eSize > 0, "Set embedding size by \"-embsize\"");
XTensor wordEmbedding, position, posEmbedding;
InitTensor1D(&position, input.GetDim(-1), X_INT, devID);
if (!isDec || isTraining || input.GetDim(-1) > 1)
{
position.Range(0, position.unitNum, 1);
// disable grad
ScaleAndShiftMe(position, 1.0F, float(padIdx + 1));
}
else
{
/* decoder embeddings during decoding */
position.SetDataFixed(nstep + padIdx + 1);
}
/* we make positional embeddings first */
XTensor embTMP;
embTMP = Gather(posEmbeddingBase, position);
posEmbedding = Unsqueeze(embTMP, 0, input.GetDim(0));
/* then we make word embeddings */
//w.enableGrad = false;
wordEmbedding = Gather(w, input);
wordEmbedding = Linear(wordEmbedding, (float)sqrt((float)eSize));
/* we sum over the two embeddings */
SumMe(wordEmbedding, posEmbedding);
return wordEmbedding;
}
}
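A standalone sketch of the table that `MakePosEmbedding()` builds (first half sine, second half cosine, frequencies `exp(-i * log(10000) / (channelSize - 1))`, padding row zeroed) and of the position ids that `Make()` looks up. The helper names are hypothetical, and the sketch assumes `eSize >= 4` so the frequency denominator is non-zero:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

/* sinusoidal table, row-major (length x eSize), padIdx row set to zero */
std::vector<float> PosTable(int length, int eSize, int padIdx)
{
    std::vector<float> table(length * eSize, 0.0f);
    const int half = eSize / 2;   /* assumes eSize >= 4 so that half - 1 > 0 */
    for (int pos = 0; pos < length; pos++) {
        float* row = table.data() + pos * eSize;
        for (int i = 0; i < half; i++) {
            double freq = std::exp(-i * std::log(10000.0) / (half - 1));
            row[i] = (float)std::sin(pos * freq);
            row[half + i] = (float)std::cos(pos * freq);
        }
    }
    for (int i = 0; i < eSize; i++)
        table[padIdx * eSize + i] = 0.0f;   /* the padding position carries no signal */
    return table;
}

/* position id used by Make(): the first token maps to padIdx + 1, and in
   step-by-step decoding the current step maps to nstep + padIdx + 1 */
int PositionId(int step, int padIdx)
{
    return step + padIdx + 1;
}
```

Because ids start at `padIdx + 1`, a sequence of length n needs rows up to n + padIdx, which is why the table is sized with extra rows beyond `maxPosLen`.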
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-01
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-07
*/
#ifndef __EMBEDDING_H__
#define __EMBEDDING_H__
#include "../Utility.h"
#include "../../niutensor/network/XNet.h"
using namespace nts;
namespace nmt
{
#define DEFAULT_EMBEDDING_SIZE 512
/*
embedding (of word at position i):
word embedding + positional embedding
*/
class Embedder
{
public:
/* device id */
int devID;
/* vocabulary size */
int vSize;
/* embedding size */
int eSize;
/* maximum length of the sequence */
int maxLength;
/* dimension size of the hidden layers in the model */
int d;
/* padding index */
int padIdx;
/* word embedding matrix */
XTensor w;
/* predefined positional embeddings. Computing them once and reusing
them speeds up the embedding process. */
XTensor posEmbeddingBase;
public:
/* constructor */
Embedder();
/* de-constructor */
~Embedder();
/* initialize the model */
void InitModel(Config& config, bool isEnc = true);
/* make positional embeddings */
void MakePosEmbedding(int length);
/* make the network */
XTensor Make(XTensor& input, bool isDec, bool isTraining, int nstep = 0);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "FNN.h"
#include "Embedding.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
#include "../../niutensor/tensor/function/FHeader.h"
namespace nmt
{
/* constructor */
FNN::FNN()
{
inSize = -1;
outSize = -1;
hSize = -1;
}
/* de-constructor */
FNN::~FNN()
{
}
/*
initialize the model
>> config - configurations of the model
*/
void FNN::InitModel(Config& config)
{
devID = config.devID;
inSize = config.modelSize;
outSize = config.modelSize;
hSize = config.fnnHiddenSize;
dropoutP = config.fnnDropout;
InitTensor2D(&w1, inSize, hSize, X_FLOAT, devID);
InitTensor1D(&b1, hSize, X_FLOAT, devID);
InitTensor2D(&w2, hSize, outSize, X_FLOAT, devID);
InitTensor1D(&b2, outSize, X_FLOAT, devID);
float scale = 1.0F;
_SetDataFanInOut(&w1, scale);
_SetDataFanInOut(&w2, scale);
w1.SetDataRand(-(DTYPE)sqrt(6.0F / inSize), (DTYPE)sqrt(6.0F / inSize));
w2.SetDataRand(-(DTYPE)sqrt(6.0F / hSize), (DTYPE)sqrt(6.0F / hSize));
b1.SetZeroAll();
b2.SetZeroAll();
}
/*
make the network
y = max(0, x * w1 + b1) * w2 + b2
>> input - the input tensor
>> isTraining - indicates whether it is training (dropout is applied only then)
<< return - the output tensor
*/
XTensor FNN::Make(XTensor& input, bool isTraining)
{
XTensor t1;
/* t1 = max(0, x * w1 + b1) */
t1 = Rectify(MulAndShift(input, w1, b1));
if (isTraining && dropoutP > 0)
t1 = Dropout(t1, dropoutP);
/* result = t1 * w2 + b2 */
return MulAndShift(t1, w2, b2);
}
}
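`FNN::Make` computes y = max(0, x * w1 + b1) * w2 + b2. A plain-loop sketch of the same computation for one row vector, assuming row-major weights shaped like `w1` (`inSize x hSize`) and `w2` (`hSize x outSize`). Dropout is omitted because `Make` applies it only when `isTraining` is set; the helper names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;   /* Mat[in][out], like w1 and w2 */

/* y = x * w + b for a single row vector x */
Vec Affine(const Vec& x, const Mat& w, const Vec& b)
{
    Vec y(b);
    for (size_t i = 0; i < x.size(); i++)
        for (size_t j = 0; j < y.size(); j++)
            y[j] += x[i] * w[i][j];
    return y;
}

/* y = max(0, x * w1 + b1) * w2 + b2 (inference path, no dropout) */
Vec FeedForward(const Vec& x, const Mat& w1, const Vec& b1, const Mat& w2, const Vec& b2)
{
    Vec h = Affine(x, w1, b1);
    for (float& v : h)
        v = std::max(0.0f, v);   /* ReLU, the role of Rectify() */
    return Affine(h, w2, b2);
}
```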
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __FNN_H__
#define __FNN_H__
#include "LayerNorm.h"
#include "../Utility.h"
#include "../../niutensor/tensor/XTensor.h"
using namespace nts;
namespace nmt
{
/* a fnn: y = max(0, x * w1 + b1) * w2 + b2 */
class FNN
{
public:
/* device id */
int devID;
/* size of input vector */
int inSize;
/* size of output vector */
int outSize;
/* size of hidden layers */
int hSize;
/* matrix of transformation 1 */
XTensor w1;
/* bias of transformation 1 */
XTensor b1;
/* matrix of transformation 2 */
XTensor w2;
/* bias of transformation 2 */
XTensor b2;
/* dropout probability */
DTYPE dropoutP;
public:
/* constructor */
FNN();
/* de-constructor */
~FNN();
/* initialize the model */
void InitModel(Config& config);
/* make the network */
XTensor Make(XTensor& input, bool isTraining);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-03
*/
#include "GLU.h"
#include "Embedding.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
#include "../../niutensor/tensor/function/FHeader.h"
namespace nmt
{
/* constructor */
GLU::GLU()
{
inSize = -1;
outSize = -1;
hSize = -1;
}
/* de-constructor */
GLU::~GLU()
{
}
/*
initialize the model
>> config - configurations of the model
*/
void GLU::InitModel(Config& config)
{
devID = config.devID;
inSize = config.modelSize;
outSize = config.modelSize;
/* Make() splits the input into two halves of size hSize */
hSize = inSize / 2;
InitTensor2D(&w1, hSize, outSize, X_FLOAT, devID);
InitTensor1D(&b1, outSize, X_FLOAT, devID);
InitTensor2D(&w2, hSize, outSize, X_FLOAT, devID);
InitTensor1D(&b2, outSize, X_FLOAT, devID);
}
/*
make the network
y = (x1 * W1 + b1) * sigmoid(x2 * W2 + b2), where x1 and x2 are the two halves of x
>> input - the input tensor, size = 2 * hSize
<< return - the output tensor, size = outSize
*/
XTensor GLU::Make(XTensor& input)
{
XTensor t1;
XTensor t2;
TensorList input_list;
/* split the input into two vectors with the dim hSize */
Split(input, input_list, -1, 2);
/* t1 = W1 * x + b1 */
t1 = MulAndShift(input_list.GetItem(0), w1, b1);
/* t2 = W2 * x + b2 */
t2 = MulAndShift(input_list.GetItem(1), w2, b2);
return t1 * Sigmoid(t2);
}
}
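`GLU::Make` splits its input into two halves and gates one affine projection with the sigmoid of the other. A plain-loop sketch for one row vector, assuming row-major weights shaped `hSize x outSize` as in `InitModel`; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;   /* Mat[in][out], hSize x outSize like w1 and w2 */

/* y = x * w + b for a single row vector x */
static Vec AffineRow(const Vec& x, const Mat& w, const Vec& b)
{
    Vec y(b);
    for (size_t i = 0; i < x.size(); i++)
        for (size_t j = 0; j < y.size(); j++)
            y[j] += x[i] * w[i][j];
    return y;
}

/* split x into halves x1, x2; y = (x1 * w1 + b1) * sigmoid(x2 * w2 + b2) */
Vec GatedLinearUnit(const Vec& x, const Mat& w1, const Vec& b1, const Mat& w2, const Vec& b2)
{
    const size_t h = x.size() / 2;
    Vec x1(x.begin(), x.begin() + h), x2(x.begin() + h, x.end());
    Vec a = AffineRow(x1, w1, b1);
    Vec g = AffineRow(x2, w2, b2);
    for (size_t j = 0; j < a.size(); j++)
        a[j] *= 1.0f / (1.0f + std::exp(-g[j]));   /* element-wise sigmoid gate */
    return a;
}
```

A gate input of 0 halves the signal, while a large positive gate input lets it pass almost unchanged.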
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-03
*/
#ifndef __GLU_H__
#define __GLU_H__
#include "LayerNorm.h"
using namespace nts;
namespace nmt
{
/* a gated linear unit: y = (x1 * w1 + b1) * sigmoid(x2 * w2 + b2), where x1 and x2 are the two halves of x */
class GLU
{
public:
/* device id */
int devID;
/* size of input vector */
int inSize;
/* size of output vector */
int outSize;
/* size of hidden layers */
int hSize;
/* matrix of transformation 1 */
XTensor w1;
/* bias of transformation 1 */
XTensor b1;
/* matrix of transformation 2 */
XTensor w2;
/* bias of transformation 2 */
XTensor b2;
public:
/* constructor */
GLU();
/* de-constructor */
~GLU();
/* initialize the model */
void InitModel(Config& config);
/* make the network */
XTensor Make(XTensor& input);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-03
*/
#include "Embedding.h"
#include "LayerNorm.h"
#include "LayerHistory.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
#define SAFE_DELETE(x) do{ if((x) != NULL){delete (x); (x) = NULL;} } while(false)
#define SAFE_DELETE_ARRAY(x) do{ if((x) != NULL) {delete [] (x); (x)=NULL;} } while(false)
namespace nmt
{
/* constructor */
LayerHistory::LayerHistory()
{
devID = -1;
d = -1;
nlayer = -1;
count = -1;
layerNorms = NULL;
}
/* de-constructor */
LayerHistory::~LayerHistory()
{
history.Clear();
delete[] layerNorms;
}
/*
initialize the model
>> config - configurations of the model
*/
void LayerHistory::InitModel(Config& config)
{
devID = config.devID;
d = config.modelSize;
nlayer = config.nEncLayer;
InitTensor2D(&weight, nlayer + 1, nlayer + 1, X_FLOAT, devID);
layerNorms = new LN[nlayer];
/* initialize the layer normalization of each layer */
for (int i = 0; i < nlayer; i++) {
layerNorms[i].InitModel(config);
}
}
/*
the Add operation
>> tensor - the previous layer output. It might be of size B * L * H
where B = batch size, L = sequence length,
and H = vector size of each position
*/
void LayerHistory::Add(XTensor& tensor)
{
/* the embedding is not normed */
count += 1;
if (history.Size() == 0) {
//sample_ = tensor;
history.Add(&tensor);
return;
}
/* count = 1 for the output of the first layer, which uses layerNorms[0] */
XTensor ln = layerNorms[count - 1].Make(tensor);
history.Add(&ln);
}
/*
compute the input of the current layer: the weighted sum of all
layer outputs stored in the history
*/
XTensor LayerHistory::Pop()
{
/* the number of layer output in the history */
size_t size = history.Size();
TensorList historyList;
for (size_t i = 0; i < size; i++)
historyList.Add(history[i]);
/* stack the tensors along the first dimension */
XTensor stackTensor = Stack(historyList, 0);
XTensor interWeight;
InitTensor2D(&interWeight, 1, weight.dimSize[1], DEFAULT_DTYPE, devID);
XTensor layerWeight;
InitTensor1D(&layerWeight, size, DEFAULT_DTYPE, devID);
/* take row (size - 1) of the weight matrix */
_SelectRange(&weight, &interWeight, 0, size - 1, size);
interWeight.Reshape(interWeight.unitNum);
/* keep its first size entries, one weight per stored layer output */
_SelectRange(&interWeight, &layerWeight, 0, 0, size);
/* scale each layer output by its weight */
MultiplyDimMe(stackTensor, layerWeight, 0);
XTensor result;
/* sum the weighted outputs over the layer dimension */
ReduceSum(stackTensor, result, 0);
return result;
}
/* clear the history and reset the layer counter */
void LayerHistory::ClearHistory()
{
history.Clear();
count = -1;
}
}
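The weighted sum computed by `Pop()` can be sketched without NiuTensor, assuming each layer output is flattened into a `std::vector<float>` and the `(nlayer + 1) x (nlayer + 1)` `weight` matrix is stored row-major. With `size` entries in the history, `Pop()` reads row `size - 1` and uses its first `size` weights, one per stored output. `DlclCombine` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum of the stored layer outputs, following LayerHistory::Pop():
// result = sum_i weight[size - 1][i] * history[i], for i < size.
std::vector<float> DlclCombine(const std::vector<std::vector<float>>& history,
                               const std::vector<float>& weight, // row-major
                               std::size_t weightCols)
{
    const std::size_t size = history.size();
    assert(size > 0 && size <= weightCols);
    assert(weight.size() >= size * weightCols);
    const float* row = &weight[(size - 1) * weightCols];
    std::vector<float> result(history[0].size(), 0.0f);
    for (std::size_t i = 0; i < size; ++i) {
        assert(history[i].size() == result.size());
        for (std::size_t j = 0; j < result.size(); ++j)
            result[j] += row[i] * history[i][j];
    }
    return result;
}
```

Since row `k` only uses its first `k + 1` entries, a lower-triangular `weight` matrix is enough: each layer mixes the embedding and the layers below it.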
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Bei Li (libei_neu@outlook.com) 2020-02-03
*/
#ifndef __LAYERHISTORY_H__
#define __LAYERHISTORY_H__
#include "LayerNorm.h"
#include "../../niutensor/tensor/function/FHeader.h"
using namespace nts;
namespace nmt
{
/*
layer history for the dynamic linear combination of layers (DLCL):
the input of each layer is a weighted sum of the embedding output and
the normalized outputs of all previous layers
*/
class LayerHistory
{
public:
/* device id */
int devID;
/* the triangle weight matrix for dlcl */
XTensor weight;
/* hidden size */
int d;
/* layer number */
int nlayer;
/* current layer number */
int count;
/* a history to store the outputs of intermediate layers */
TensorList history;
/* layer normalization for each intermediate layer */
LN* layerNorms;
public:
/* constructor */
LayerHistory();
/* de-constructor */
~LayerHistory();
/* initialize the model */
void InitModel(Config& config);
/* add the layer output to the history */
void Add(XTensor& tensor);
/* compute the input of the current layer: the weighted sum of all previous (normalized) layer outputs in the history */
XTensor Pop();
/* clear the history */
void ClearHistory();
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "Embedding.h"
#include "LayerNorm.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
namespace nmt
{
/* constructor */
LN::LN()
{
devID = -1;
d = 0;
}
/* de-constructor */
LN::~LN()
{
}
/*
initialize the model
>> config - configurations of the model
*/
void LN::InitModel(Config& config)
{
devID = config.devID;
d = config.modelSize;
InitTensor1D(&w, d, X_FLOAT, devID);
InitTensor1D(&b, d, X_FLOAT, devID);
w.SetDataRand(1.0F, 1.0F);
b.SetZeroAll();
w.SetDataFixed(1);
}
/*
make the network
>> input - the input tensor
>> return - layer normalization output
*/
XTensor LN::Make(XTensor& input)
{
XTensor& x = input;
XTensor xn;
XTensor mean;
XTensor variance;
XTensor standard;
XTensor meanFilled;
XTensor standardFilled;
TENSOR_DATA_TYPE dataType = input.dataType;
if (dataType == X_FLOAT16) {
/* reduce functions can only run with FP32 */
x = ConvertDataType(input, X_FLOAT);
}
/* \mu = (sum_i x_i)/m */
mean = ReduceMean(x, x.order - 1);
/* \sigma^2 = (sum_i (x_i - \mu)^2)/m */
variance = ReduceVariance(x, x.order - 1, mean);
/* standard = sqrt(variance) */
standard = Power(variance, 0.5F);
/* unsqueeze mean and standard deviation to fit them into
the same shape of x */
meanFilled = Unsqueeze(mean, x.order - 1, x.GetDim(-1));
standardFilled = Unsqueeze(standard, x.order - 1, x.GetDim(-1));
/* x' = (x - \mu)/standard */
xn = (x - meanFilled) / standardFilled;
if (dataType != mean.dataType) {
x = ConvertDataType(x, dataType);
xn = ConvertDataType(xn, dataType);
}
/* result = x' * w + b */
return xn * w + b;
}
}
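For a single vector, `LN::Make` reduces to the computation below: subtract the mean, divide by the standard deviation derived from the population variance, then apply the element-wise scale `w` and shift `b`. This is a plain C++ sketch; like the code above it adds no epsilon to the denominator, so it assumes the input is not constant. `LayerNormVec` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y = (x - mean) / sqrt(variance) * w + b, computed over one vector.
std::vector<float> LayerNormVec(const std::vector<float>& x,
                                const std::vector<float>& w,
                                const std::vector<float>& b)
{
    assert(!x.empty() && w.size() == x.size() && b.size() == x.size());
    const float m = static_cast<float>(x.size());
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= m;
    float variance = 0.0f;
    for (float v : x) variance += (v - mean) * (v - mean);
    variance /= m;                    // population variance (divide by m)
    const float standard = std::sqrt(variance);
    assert(standard > 0.0f);          // no epsilon: a constant input would divide by zero
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = (x[i] - mean) / standard * w[i] + b[i];
    return y;
}
```

With `w` set to ones and `b` to zeros (the initialization in `InitModel`), the output has zero mean and unit variance.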
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __LAYERNORMAL_H__
#define __LAYERNORMAL_H__
#include "../Utility.h"
#include "../../niutensor/network/XNet.h"
using namespace nts;
namespace nmt
{
/* layer normalization: y = norm(x) * w + b
where norm(x) = (x - mean)/standardDeviation */
class LN
{
public:
/* device id */
int devID;
/* the transformation matrix w */
XTensor w;
/* the bias term b */
XTensor b;
/* dimension size of the model */
int d;
public:
/* constructor */
LN();
/* de-constructor */
~LN();
/* initialize the model */
void InitModel(Config& config);
/* make the network */
XTensor Make(XTensor& input);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2020-03-21
*/
#include "NNUtil.h"
namespace nmt
{
/*
a wrapper of the gather function for 2-D and 3-D tensors
(gathers along the first dimension)
>> src - the input tensor
>> index - the index tensor
<< return - the gathered tensor
*/
XTensor AutoGather(XTensor& src, XTensor& index)
{
if (src.order == 2)
return Gather(src, index);
else {
CheckNTErrors(src.order == 3, "the source must be 3d");
int order = src.order;
int dimSize[MAX_TENSOR_DIM_NUM];
for (int i = 0; i < src.order; i++) {
dimSize[i] = src.dimSize[i];
}
src.Reshape(src.dimSize[0], src.dimSize[1] * src.dimSize[2]);
XTensor res = Gather(src, index);
src.Reshape(order, dimSize);
dimSize[0] = index.dimSize[0];
dimSize[1] = res.unitNum / (dimSize[0] * dimSize[2]);
res.Reshape(order, dimSize);
return res;
}
}
}
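The 3-D branch of `AutoGather` works because, in row-major layout, gathering along the first dimension of a `[d0, d1, d2]` tensor is the same as gathering whole rows of the `[d0, d1 * d2]` matrix. A plain C++ sketch of that reshaping trick, assuming a 1-D index of row ids (`GatherRows3D` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather slices src[index[k], :, :] of a row-major [d0, d1, d2] tensor by
// viewing it as a [d0, d1 * d2] matrix and copying whole rows.
std::vector<float> GatherRows3D(const std::vector<float>& src,
                                std::size_t d0, std::size_t d1, std::size_t d2,
                                const std::vector<std::size_t>& index)
{
    assert(src.size() == d0 * d1 * d2);
    const std::size_t rowLen = d1 * d2;        // the "reshape" step
    std::vector<float> res;
    res.reserve(index.size() * rowLen);
    for (std::size_t k : index) {
        assert(k < d0);
        res.insert(res.end(), src.begin() + k * rowLen,
                   src.begin() + (k + 1) * rowLen);
    }
    return res;                                // shape [index.size(), d1, d2]
}
```

Reshaping the result back with `dimSize[1] = res.unitNum / (dimSize[0] * dimSize[2])` recovers `d1` when the index is 1-D.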
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2020-03-21
*/
#ifndef __NNUTIL_H__
#define __NNUTIL_H__
#include "../../niutensor/tensor/XGlobal.h"
#include "../../niutensor/tensor/core/CHeader.h"
#include "../../niutensor/tensor/function/FHeader.h"
using namespace nts;
namespace nmt
{
/* the gather function for 2-D and 3-D tensors (along the first dimension) */
XTensor AutoGather(XTensor& src, XTensor& index);
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "Output.h"
#include "Embedding.h"
#include "../Utility.h"
#include "../../niutensor/tensor/core/CHeader.h"
namespace nmt
{
/* constructor */
Output::Output()
{
devID = -1;
vSize = -1;
hSize = -1;
}
/* de-constructor */
Output::~Output()
{
}
/*
initialize the model
>> config - configurations of the model
*/
void Output::InitModel(Config& config)
{
devID = config.devID;
hSize = config.modelSize;
vSize = config.tgtVocabSize;
InitTensor2D(&w, vSize, hSize, X_FLOAT, devID);
DTYPE v = 1.0F / (float)sqrt((float)hSize);
w.SetDataRandn(0, v);
}
/*
make the network (redefined output tensor)
>> input - input tensor
>> output - output tensor
>> isTraining - whether it is used for training
>> normalized - whether to apply log-softmax to the output (ignored when training)
*/
void Output::Make(XTensor& input, XTensor& output, bool isTraining, bool normalized)
{
XTensor& x = input;
output = MMul(x, X_NOTRANS, w, X_TRANS);
/* use softmax for training */
if (isTraining) {
output = Softmax(output, -1);
return;
}
/* normalize the output for beam search */
if (normalized) {
auto dataType = output.dataType;
if (dataType == X_FLOAT16)
output = ConvertDataType(output, X_FLOAT);
output = LogSoftmax(output, -1);
if (output.dataType != dataType)
output = ConvertDataType(output, dataType);
}
}
}
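In inference mode with `normalized` set, `Output::Make` computes `x * w^T` (with `w` stored as `vSize x hSize`) followed by a log-softmax over the vocabulary. A plain C++ sketch for one hidden vector, assuming the usual max-subtraction trick for numerical stability; whether NiuTensor's `LogSoftmax` does exactly this internally is not shown here. `OutputLogProbs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// logits[v] = sum_h x[h] * w[v][h]   (w is row-major, vSize x hSize)
// result[v] = logits[v] - log(sum_u exp(logits[u]))
std::vector<float> OutputLogProbs(const std::vector<float>& x,
                                  const std::vector<float>& w,
                                  std::size_t vSize)
{
    const std::size_t hSize = x.size();
    assert(vSize > 0 && w.size() == vSize * hSize);
    std::vector<float> logits(vSize, 0.0f);
    for (std::size_t v = 0; v < vSize; ++v)
        for (std::size_t h = 0; h < hSize; ++h)
            logits[v] += x[h] * w[v * hSize + h];
    // subtract the maximum before exponentiating to avoid overflow
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (float l : logits) sum += std::exp(l - maxLogit);
    const float logZ = maxLogit + std::log(sum);
    for (float& l : logits) l -= logZ;
    return logits;
}
```

During training the code above applies `Softmax` instead, returning probabilities rather than log-probabilities.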
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __OUTPUT_H__
#define __OUTPUT_H__
#include "../Utility.h"
#include "../../niutensor/tensor/function/FHeader.h"
using namespace nts;
namespace nmt
{
/* output layer */
class Output
{
public:
/* device id */
int devID;
/* vocabulary size */
int vSize;
/* vector size of the linear transformation */
int hSize;
/* transformation matrix */
XTensor w;
public:
/* constructor */
Output();
/* de-constructor */
~Output();
/* initialize the model */
void InitModel(Config& config);
/* make the network (redefined output tensor) */
void Make(XTensor& input, XTensor& output, bool isTraining, bool normalized);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2019-04-03
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06
*/
#ifndef __TRAIN_DATASET_H__
#define __TRAIN_DATASET_H__
#include <cstdio>
#include <vector>
#include <fstream>
#include "../../niutensor/tensor/XList.h"
#include "../../niutensor/tensor/XTensor.h"
#include "../../niutensor/tensor/XGlobal.h"
#define MAX_WORD_NUM 120
using namespace std;
namespace nts {
/* a class of sentence pairs for training */
struct TrainExample {
/* id of the sentence pair */
int id;
/* source language sentence (tokenized) */
IntList srcSent;
/* target language sentence (tokenized) */
IntList tgtSent;
/* the key used to shuffle items in a bucket */
int key;
/* the key used to shuffle buckets */
int bucketKey;
};
/* A `TrainDataSet` is associated with a file which contains training data. */
struct TrainDataSet {
public:
/* the data buffer */
TrainBufferType buffer;
/* a list of empty line number */
IntList emptyLines;
/* the pointer to file stream */
FILE* fp;
/* current index in the buffer */
size_t curIdx;
/* size of used data in the buffer */
size_t bufferUsed;
/* size of the bucket used for grouping sentences */
size_t bucketSize;
/* indicates whether it is used for training */
bool isTraining;
public:
/* sort the input by length (in descending order) */
void SortByLength();
/* sort buckets by key (in descending order) */
void SortBucket();
/* sort the items in [begin, end) by key (in descending order) */
void SortInBucket(int begin, int end);
/* load data from a file to the buffer */
void LoadDataToBuffer();
/* generate a mini-batch */
UInt64List LoadBatch(XTensor* batchEnc, XTensor* paddingEnc,
XTensor* batchDec, XTensor* paddingDec, XTensor* label,
size_t minSentBatch, size_t batchSize, int devID);
/* initialization function */
void Init(const char* dataFile, int bucketSize, bool training);
/* check if the buffer is empty */
bool IsEmpty();
/* reset the buffer */
void ClearBuf();
/* group data into buckets with similar length */
void BuildBucket();
/* de-constructor */
~TrainDataSet();
};
}
#endif // __TRAIN_DATASET_H__
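`BuildBucket()` groups sentences of similar length, but its implementation is not part of this header. A minimal sketch of one common way to do this, assuming `bucketSize` caps the number of tokens per bucket (an assumption; the header only calls it the "size of the bucket"): sort by length in descending order, as `SortByLength()` does, then start a new bucket whenever the next sentence would exceed the cap. `BuildBuckets` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns buckets of sentence indices; lengths[i] is the token count of sentence i.
// Assumption: bucketSize limits the total number of tokens in one bucket.
std::vector<std::vector<std::size_t>> BuildBuckets(
    const std::vector<std::size_t>& lengths, std::size_t bucketSize)
{
    assert(bucketSize > 0);
    std::vector<std::size_t> order(lengths.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    // longest first; stable_sort keeps file order among equal lengths
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return lengths[a] > lengths[b]; });

    std::vector<std::vector<std::size_t>> buckets;
    std::vector<std::size_t> cur;
    std::size_t tokens = 0;
    for (std::size_t idx : order) {
        if (!cur.empty() && tokens + lengths[idx] > bucketSize) {
            buckets.push_back(cur);
            cur.clear();
            tokens = 0;
        }
        cur.push_back(idx);      // an over-long sentence still gets its own bucket
        tokens += lengths[idx];
    }
    if (!cur.empty()) buckets.push_back(cur);
    return buckets;
}
```

Grouping by length keeps padding small inside each bucket; the `key` and `bucketKey` fields of `TrainExample` suggest the real loader also shuffles items and buckets afterwards.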
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-02
*/
#ifndef __TRAINER_H__
#define __TRAINER_H__
#include "../Model.h"
#include "TrainDataSet.h"
using namespace nts;
namespace nmt
{
/* trainer of the model */
class Trainer
{
public:
/* configurations */
Config* cfg;
/* dimension size of each inner layer */
int d;
/* step number of warm-up for training */
int nwarmup;
/* vocabulary size of the source side */
int vSize;
/* vocabulary size of the target side */
int vSizeTgt;
/* learning rate */
float lrate;
/* the parameter that controls the maximum learning rate in training */
float lrbias;
/* sentence batch size */
int sBatchSize;
/* word batch size */
int wBatchSize;
/* size of bucket for grouping data by length */
int bucketSize;
/* training epoch number */
int nepoch;
/* training step number */
int nstep;
/* the maximum number of saved checkpoints */
int maxCheckpoint;
/* indicates whether we use Adam */
bool useAdam;
/* hyper-parameters of Adam */
float adamBeta1;
float adamBeta2;
float adamDelta;
float adamBeta1T;
float adamBeta2T;
/* list of the moment of the parameter matrices */
TensorList moments;
/* list of the 2nd order moment of the parameter matrices */
TensorList moments2nd;
/* indicates whether the data file is shuffled for training */
bool isShuffled;
/* the factor of label smoothing */
DTYPE labelSmoothingP;
/* number of steps after which we make a checkpoint */
int nStepCheckpoint;
/* indicates whether we make a checkpoint after each training epoch */
bool useEpochCheckpoint;
/* number of batches on which we do model update */
int updateStep;
/* indicates whether the sequence is sorted by length */
bool isLenSorted;
/* used for loading batches */
TrainDataSet batchLoader;
public:
/* constructor */
Trainer();
/* de-constructor */
~Trainer();
/* initialize the trainer */
void Init(Config& config);
/* train the model */
void Train(const char* fn, const char* validFN, const char* modelFN, Model* model);
/* test the model */
void Validate(const char* fn, const char* ofn, Model* model);
/* make a checkpoint */
void MakeCheckpoint(Model* model, const char* validFN, const char* modelFN, const char* label, int id);
/* update the model by delta rule */
void Update(Model* model, const float lr);
/* prepare model for training */
void PrepareModel(Model* model);
};
}
#endif
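The Adam fields above (`adamBeta1`, `adamBeta2`, `adamDelta`, the running products `adamBeta1T`/`adamBeta2T`, and the two moment lists) match the standard bias-corrected Adam update. A sketch of that update for one parameter array, assuming `adamBeta1T` and `adamBeta2T` hold beta^t after `t` updates; how `Trainer::Update` actually arranges these operations is not shown in this header. `AdamState` and `AdamStep` are hypothetical names, and the default hyper-parameter values are illustrative, not the trainer's defaults.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct AdamState {
    float beta1 = 0.9f, beta2 = 0.98f, delta = 1e-9f;  // illustrative values
    float beta1T = 1.0f, beta2T = 1.0f;                // beta1^t and beta2^t
    std::vector<float> m, v;                           // 1st and 2nd order moments
};

// One Adam step: m = b1*m + (1-b1)*g, v = b2*v + (1-b2)*g^2,
// p -= lr * (m / (1 - b1^t)) / (sqrt(v / (1 - b2^t)) + delta)
void AdamStep(std::vector<float>& param, const std::vector<float>& grad,
              AdamState& s, float lr)
{
    assert(param.size() == grad.size());
    if (s.m.empty()) {
        s.m.assign(param.size(), 0.0f);
        s.v.assign(param.size(), 0.0f);
    }
    s.beta1T *= s.beta1;
    s.beta2T *= s.beta2;
    for (std::size_t i = 0; i < param.size(); ++i) {
        s.m[i] = s.beta1 * s.m[i] + (1.0f - s.beta1) * grad[i];
        s.v[i] = s.beta2 * s.v[i] + (1.0f - s.beta2) * grad[i] * grad[i];
        const float mHat = s.m[i] / (1.0f - s.beta1T);   // bias correction
        const float vHat = s.v[i] / (1.0f - s.beta2T);
        param[i] -= lr * mHat / (std::sqrt(vHat) + s.delta);
    }
}
```

On the first step the bias correction makes `mHat` and `vHat` equal to `g` and `g^2`, so each parameter moves by roughly `lr` in the direction opposite to its gradient sign.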
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2019-04-03
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06
*/
#include <string>
#include <vector>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <algorithm>
#include "DataSet.h"
#include "../Utility.h"
using namespace nmt;
namespace nts {
/* sort the input by length (in descending order) */
void DataSet::SortInput() {
sort(inputBuffer.items, inputBuffer.items + inputBuffer.count,
[](Example* a, Example* b) {
return a->values.count > b->values.count;
});
}
/* sort the output by id (in ascending order) */
void DataSet::SortOutput() {
sort(outputBuffer.items, outputBuffer.items + outputBuffer.count,
[](Result* a, Result* b) {
return a->id < b->id;
});
}
/*
load data from the file to the buffer
*/
void DataSet::LoadDataToBuffer()
{
string line;
inputBuffer.Clear();
bufferUsed = 0;
int id = 0;
const string tokenDelimiter = " ";
while (getline(*fp, line)) {
IntList values;
/* load words and transform them to ids */
auto indices = SplitToPos(line, tokenDelimiter);
/* keep only the first MAX_WORD_NUM (120) words if the input is too long */
size_t maxLen = indices.Size() > MAX_WORD_NUM ? MAX_WORD_NUM : indices.Size();
for (size_t i = 0; i < maxLen; i++) {
auto offset = (i != (indices.Size() - 1)) ?
indices[i + 1] - indices[i] - tokenDelimiter.size()
: line.size() - indices[i];
string word = line.substr(indices[i], offset);
if (srcVocab.word2id.find(word) == srcVocab.word2id.end())
values.Add(UNK);
else
values.Add(srcVocab.word2id.at(word));
}
/* make sure that the sequence ends with EOS */
if (values.Size() != 0 && values[-1] != EOS)
values.Add(EOS);
if (values.Size() != 0) {
Example* example = new Example;
example->id = id;
example->values = values;
inputBuffer.Add(example);
}
else {
emptyLines.Add(id);
}
id++;
}
fp->close();
SortInput();
XPRINT1(0, stderr, "[INFO] loaded %d sentences\n", id);
}
/*
load a mini-batch to the device (for translating)
>> batchEnc - a tensor to store the batch of input
>> paddingEnc - a tensor to store the batch of paddings
>> minSentBatch - the number of sentences in a batch (fewer if the buffer runs out)
>> batchSize - the maximum number of words in a batch (unused while dynamic batching is disabled)
>> devID - the device id, -1 for the CPU
<< return - ids of the sentences in the batch, followed by the total number of tokens
*/
UInt64List DataSet::LoadBatch(XTensor* batchEnc, XTensor* paddingEnc,
size_t minSentBatch, size_t batchSize, int devID)
{
size_t realBatchSize = minSentBatch;
/* get the maximum sentence length in a mini-batch */
size_t maxLen = inputBuffer[bufferUsed]->values.Size();
/* dynamic batching for sentences */
//while ((realBatchSize < (inputBuffer.Size() - bufferUsed))
// && (realBatchSize * maxLen < batchSize)) {
// realBatchSize++;
//}
/* real batch size */
if ((inputBuffer.Size() - bufferUsed) < realBatchSize) {
realBatchSize = inputBuffer.Size() - bufferUsed;
}
CheckNTErrors(maxLen != 0, "invalid length");
int* batchValues = new int[realBatchSize * maxLen];
float* paddingValues = new float[realBatchSize * maxLen];
for (size_t i = 0; i < realBatchSize * maxLen; i++) {
batchValues[i] = PAD;
paddingValues[i] = 1.0F;
}
size_t curSrc = 0;
/* right padding */
UInt64List infos;
size_t totalLength = 0;
for (size_t i = 0; i < realBatchSize; ++i) {
infos.Add(inputBuffer[bufferUsed + i]->id);
totalLength += inputBuffer[bufferUsed + i]->values.Size();
curSrc = maxLen * i;
for (int j = 0; j < inputBuffer[bufferUsed + i]->values.Size(); j++)
batchValues[curSrc++] = inputBuffer[bufferUsed + i]->values[j];
while (curSrc < maxLen * (i + 1))
paddingValues[curSrc++] = 0;
}
infos.Add(totalLength);
InitTensor2D(batchEnc, realBatchSize, maxLen, X_INT, devID);
InitTensor2D(paddingEnc, realBatchSize, maxLen, X_FLOAT, devID);
bufferUsed += realBatchSize;
batchEnc->SetData(batchValues, batchEnc->unitNum);
paddingEnc->SetData(paddingValues, paddingEnc->unitNum);
delete[] batchValues;
delete[] paddingValues;
return infos;
}
/*
initialize the dataset
>> dataFile - path of the data file
>> srcVocabFN - path of the source vocab file
>> tgtVocabFN - path of the target vocab file
*/
void DataSet::Init(const char* dataFile, const char* srcVocabFN, const char* tgtVocabFN)
{
fp = new ifstream(dataFile);
CheckNTErrors(fp->is_open(), "Cannot open the test data");
bufferUsed = 0;
CheckNTErrors(strcmp(srcVocabFN, "") != 0, "missing source vocab file");
CheckNTErrors(strcmp(tgtVocabFN, "") != 0, "missing target vocab file");
srcVocab.Load(srcVocabFN);
/* share source and target vocabs */
if (strcmp(srcVocabFN, tgtVocabFN) == 0) {
XPRINT(0, stderr, "[INFO] share source and target vocabs \n");
tgtVocab.CopyFrom(srcVocab);
}
else {
tgtVocab.Load(tgtVocabFN);
}
LoadDataToBuffer();
}
/* check if the buffer is empty */
bool DataSet::IsEmpty() {
if (bufferUsed < inputBuffer.Size())
return false;
return true;
}
/* dump the translation to a file */
void DataSet::DumpRes(const char* ofn)
{
ofstream ofile(ofn, ios::out);
for (int t = 0; t < outputBuffer.Size(); t++) {
auto res = outputBuffer[t];
for (int i = 0; i < res->res.Size(); i++) {
/* stop at the first special symbol (ids below 4 are assumed to be reserved) */
if (res->res[i] < 4)
break;
ofile << tgtVocab.id2word[res->res[i]] << " ";
}
ofile << "\n";
}
ofile.close();
}
/* de-constructor */
DataSet::~DataSet()
{
/* release the file */
delete fp;
/* release the input buffer */
for (int i = 0; i < inputBuffer.Size(); i++)
delete inputBuffer[i];
/* release the output buffer */
for (int i = 0; i < outputBuffer.Size(); i++)
delete outputBuffer[i];
}
}
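The right-padding layout built by `DataSet::LoadBatch` can be sketched on plain arrays: every row has `maxLen` slots, token ids are copied to the left, the remaining slots keep the padding id, and the float mask is 1 for real tokens and 0 for padding. The value passed as `padId` is an assumption for illustration; the real `PAD` constant is defined elsewhere in the code base. `PaddedBatch` and `PadBatch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PaddedBatch {
    std::vector<int> ids;      // realBatchSize x maxLen, row-major
    std::vector<float> mask;   // 1.0 for tokens, 0.0 for padding
    std::size_t maxLen = 0;
};

// Sentences are assumed to be sorted by length in descending order, as in
// DataSet::SortInput(), so the first one determines maxLen.
PaddedBatch PadBatch(const std::vector<std::vector<int>>& sents, int padId)
{
    assert(!sents.empty());
    PaddedBatch b;
    b.maxLen = sents[0].size();
    b.ids.assign(sents.size() * b.maxLen, padId);
    b.mask.assign(sents.size() * b.maxLen, 0.0f);
    for (std::size_t i = 0; i < sents.size(); ++i) {
        assert(sents[i].size() <= b.maxLen);
        std::copy(sents[i].begin(), sents[i].end(), b.ids.begin() + i * b.maxLen);
        std::fill(b.mask.begin() + i * b.maxLen,
                  b.mask.begin() + i * b.maxLen + sents[i].size(), 1.0f);
    }
    return b;
}
```

Relying on the length-sorted buffer avoids a separate pass to find the longest sentence in the batch.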
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2019-04-03
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06
*/
#ifndef __DATASET_H__
#define __DATASET_H__
#include <cstdio>
#include <vector>
#include <fstream>
#include "Vocab.h"
#include "../../niutensor/tensor/XList.h"
#include "../../niutensor/tensor/XTensor.h"
#include "../../niutensor/tensor/XGlobal.h"
#define MAX_WORD_NUM 120
using namespace std;
namespace nts {
/* the struct of tokenized input */
struct Example {
int id;
IntList values;
};
/* the struct of tokenized output */
struct Result {
int id;
IntList res;
};
/* A `DataSet` is associated with a file which contains variable length data.*/
struct DataSet {
public:
/* the data buffer */
InputBufferType inputBuffer;
/* a list of empty line number */
IntList emptyLines;
/* the result buffer */
OutputBufferType outputBuffer;
/* the pointer to file stream */
ifstream* fp;
/* size of used data in buffer */
size_t bufferUsed;
/* the source vocabulary */
Vocab srcVocab;
/* the target vocabulary */
Vocab tgtVocab;
public:
/* sort the input by length */
void SortInput();
/* reorder the output by ids */
void SortOutput();
/* load data from a file to the buffer */
void LoadDataToBuffer();
/* generate a mini-batch */
UInt64List LoadBatch(XTensor* batchEnc, XTensor* paddingEnc,
size_t sBatch, size_t wBatch, int devID);
/* initialization function */
void Init(const char* dataFile, const char* srcVocabFN, const char* tgtVocabFN);
/* check if the buffer is empty */
bool IsEmpty();
/* dump the translations to a file */
void DumpRes(const char* ofn);
/* de-constructor */
~DataSet();
};
}
#endif // __DATASET_H__
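`DataSet::LoadDataToBuffer` turns each line into an `Example` by splitting on spaces, mapping unknown words to `UNK`, keeping at most `MAX_WORD_NUM` tokens, and appending `EOS` if the sequence does not already end with it. A standard-library sketch of that conversion, assuming a `std::unordered_map` vocabulary and caller-supplied `unkId`/`eosId` (the real `Vocab` class and the `UNK`/`EOS` constants are defined elsewhere). `LineToIds` is a hypothetical name. Splitting with `std::istringstream` treats any run of whitespace as one delimiter, which may differ from `SplitToPos` on unusual input.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

std::vector<int> LineToIds(const std::string& line,
                           const std::unordered_map<std::string, int>& word2id,
                           int unkId, int eosId, std::size_t maxWords = 120)
{
    assert(maxWords > 0);
    std::vector<int> ids;
    std::istringstream in(line);
    std::string word;
    // keep only the first maxWords tokens, as with MAX_WORD_NUM
    while (ids.size() < maxWords && in >> word) {
        auto it = word2id.find(word);
        ids.push_back(it == word2id.end() ? unkId : it->second);
    }
    // make sure a non-empty sequence ends with EOS
    if (!ids.empty() && ids.back() != eosId)
        ids.push_back(eosId);
    return ids;
}
```

An empty result corresponds to the lines that `LoadDataToBuffer` records in `emptyLines` instead of the input buffer.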
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-04-08
* Start of a new week - I just finished several documents.
* Writing document is harder than writing code :)
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include "LengthPenalty.h"
using namespace nts;
namespace nmt
{
/*
GNMT-like length penalty: lp = ((5 + n)/(5 + 1))^\alpha
where n = length of the sequence
>> length - length of the sequence
>> alpha - the parameter that controls the length preference
<< return - length penalty of the sequence
*/
float LengthPenalizer::GNMT(float length, float alpha)
{
float base;
float lp;
base = (length + 5.0F) / (1.0F + 5.0F);
lp = pow(base, alpha);
return lp;
}
}
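A standalone version of the GNMT penalty, plus one typical way to apply it: divide the accumulated (negative) log-probability of a hypothesis by `lp`, which shrinks the magnitude of long hypotheses' scores more than short ones'. How the search code in this repository combines the two is not shown here, so `GnmtPenalty` and `NormalizedScore` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>

// lp = ((5 + n) / (5 + 1))^alpha, as in LengthPenalizer::GNMT
float GnmtPenalty(float length, float alpha)
{
    assert(length >= 0.0f);
    return std::pow((length + 5.0f) / 6.0f, alpha);
}

// Length-normalized score of a hypothesis with total log-probability logProb.
float NormalizedScore(float logProb, float length, float alpha)
{
    return logProb / GnmtPenalty(length, alpha);
}
```

With `alpha = 0` the penalty is 1 and scores are left unnormalized; for sequences longer than one token, a larger `alpha` favors longer outputs more strongly.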
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-04-08
* Start of a new week - I just finished several documents.
* Writing document is harder than writing code :)
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __LENGTHPENALTY_H__
#define __LENGTHPENALTY_H__
#include "../Utility.h"
#include "../../niutensor/tensor/XTensor.h"
using namespace nts;
namespace nmt
{
/* We penalize short sequences: the score of a sequence is a product of
probability-like terms (each at most 1), so shorter sequences tend to
score higher and have more chances to beat longer ones in search. */
class LengthPenalizer
{
public:
/* GNMT-like length penalty: lp = ((5 + n)/(5 + 1))^\alpha
where n = length of the sequence */
static float GNMT(float length, float alpha);
};
}
#endif
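To make the motivation above concrete, here is a standalone sketch (not NiuTrans code) that evaluates the same formula as `LengthPenalizer::GNMT` and applies it the way the GNMT paper does, dividing a hypothesis's summed log-probability by `lp`. The helper names `GnmtPenalty` and `NormalizedScore` are hypothetical, and whether `BeamSearch::Score` combines the penalty with the path probability in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper with the same formula as LengthPenalizer::GNMT:
// lp(n) = ((5 + n) / (5 + 1))^alpha
float GnmtPenalty(float length, float alpha)
{
    assert(length >= 0.0F);
    float base = (length + 5.0F) / (1.0F + 5.0F);
    return std::pow(base, alpha);
}

// Hypothetical: length-normalized score log P(y) / lp(|y|), as in the GNMT
// paper. Log-probabilities are negative, and lp grows with the length, so
// the division pulls longer sequences closer to 0 than shorter ones.
float NormalizedScore(float logProb, float length, float alpha)
{
    return logProb / GnmtPenalty(length, alpha);
}
```

For example, with `alpha = 1`, a 2-token hypothesis whose tokens each have probability 0.5 has a raw log-probability of about -1.39 and beats a 7-token hypothesis whose tokens each have probability 0.8 (about -1.56). After normalization the scores become about -1.19 and -0.78, so the longer hypothesis wins.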
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-03-13
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#include <iostream>
#include "Predictor.h"
#include "../layer/NNUtil.h"
using namespace nts;
namespace nmt
{
/* constructor */
StateBundle::StateBundle()
{
states = NULL;
isStart = false;
}
/* destructor */
StateBundle::~StateBundle()
{
if (states != NULL)
delete[] states;
}
/*
create states
>> num - number of states
*/
void StateBundle::MakeStates(int num)
{
CheckNTErrors(num > 0, "the number of states must be positive");
if (states != NULL)
delete[] states;
states = new State[num];
for (int i = 0; i < num; i++) {
states[i].prediction = -1;
states[i].pid = _PID_EMPTY;
states[i].isEnd = false;
states[i].isStart = false;
states[i].isCompleted = false;
states[i].prob = 0;
states[i].probPath = 0;
states[i].modelScore = 0;
states[i].nstep = 0;
states[i].last = NULL;
}
stateNum = num;
}
/* constructor */
Predictor::Predictor()
{
startSymbol = 2;
}
/* destructor */
Predictor::~Predictor()
{
}
/*
create an initial state
>> model - the model
>> top - the top-most layer of the network
>> input - input of the network
>> beamSize - beam size
>> state - the state to be initialized
*/
void Predictor::Create(Model* model, XTensor* top, const XTensor* input,
int beamSize, StateBundle* state)
{
int dims[MAX_TENSOR_DIM_NUM];
for (int i = 0; i < input->order - 1; i++)
dims[i] = input->dimSize[i];
dims[input->order - 1] = beamSize;
InitTensor(&state->probPath, input->order, dims, X_FLOAT, input->devID);
InitTensor(&state->endMark, input->order, dims, X_INT, input->devID);
state->probPath.SetZeroAll();
state->nstep = 0.0F;
state->endMark.SetZeroAll();
state->stateNum = 0;
}
/*
set start symbol
>> symbol - the symbol (in integer)
*/
void Predictor::SetStartSymbol(int symbol)
{
startSymbol = symbol;
}
/*
read a state
>> model - the model that keeps the network created so far
>> state - a set of states. It keeps
1) hypotheses (states)
2) probabilities of hypotheses
3) parts of the network for expanding toward the next state
*/
void Predictor::Read(Model* model, StateBundle* state)
{
m = model;
s = state;
}
/*
predict the next state
>> next - next states
>> aliveState - indices of alive states, (B)
>> encoding - encoder output, (B, L, E)
>> inputEnc - input of the encoder, (B, L)
>> paddingEnc - padding of the encoder, (B, L)
>> batchSize - the raw batch size (before any states are pruned)
>> isStart - whether it is the start state or not
>> reorderState - the new order of states
>> needReorder - whether the states need to be reordered
>> nstep - current time step of the target sequence
*/
void Predictor::Predict(StateBundle* next, XTensor& aliveState, XTensor& encoding,
XTensor& inputEnc, XTensor& paddingEnc, int batchSize, bool isStart,
XTensor& reorderState, bool needReorder, int nstep)
{
int dims[MAX_TENSOR_DIM_NUM];
/* word indices of positions up to next state */
XTensor inputDec;
/* the first token */
XTensor first;
InitTensor2D(&first, batchSize, 1, X_INT, inputEnc.devID);
first.SetDataFixed(startSymbol);
/* add a new word into the input sequence of the decoder side */
if (isStart) {
inputDec = Identity(first);
}
else {
/* only pass one step to the decoder */
inputDec = GetLastPrediction(s, inputEnc.devID);
}
/* keep alive states for the decoder */
if (aliveState.dimSize[0] < batchSize) {
/* alive inputs */
inputDec = AutoGather(inputDec, aliveState);
/* alive cache */
for (int i = 0; i < m->decoder->nlayer; i++) {
m->decoder->selfAttCache[i].KeepAlive(aliveState);
m->decoder->enDeAttCache[i].KeepAlive(aliveState);
}
}
if (needReorder) {
for (int i = 0; i < m->decoder->nlayer; i++) {
m->decoder->selfAttCache[i].Reorder(reorderState);
m->decoder->enDeAttCache[i].Reorder(reorderState);
}
}
/* prediction probabilities */
XTensor& output = next->prob;
XTensor decoding;
for (int i = 0; i < inputDec.order - 1; i++)
dims[i] = inputDec.dimSize[i];
dims[inputDec.order - 1] = inputDec.dimSize[inputDec.order - 1];
XTensor paddingDec;
InitTensor(&paddingDec, inputDec.order, dims, X_INT, paddingEnc.devID);
paddingDec.SetDataFixed(1);
XTensor maskDec;
XTensor maskEncDec;
/* decoder mask */
m->MakeMTMaskDec(paddingEnc, paddingDec, maskDec, maskEncDec);
/* make the decoding network */
decoding = m->decoder->Make(inputDec, encoding, NULL, &maskEncDec, nstep, false);
CheckNTErrors(decoding.order >= 2, "The tensor must be of order 2 or larger!");
/* generate the output probabilities */
m->outputLayer->Make(decoding, output, false, true);
}
/*
generate paths up to the states of the current step
>> state - state bundle of the current step
*/
XTensor Predictor::GeneratePaths(StateBundle* state)
{
CheckNTErrors(state->stateNum >= 0, "Illegal state!");
int distance = -1;
for (int i = 0; i < state->stateNum; i++) {
State* cur = state->states + i;
int nsteps = 0;
while (cur != NULL) {
nsteps++;
cur = cur->last;
}
if (nsteps > distance)
distance = nsteps;
}
XTensor path;
InitTensor2D(&path, state->stateNum, distance, X_INT);
path.SetZeroAll();
for (int i = 0; i < state->stateNum; i++) {
State* cur = state->states + i;
int nsteps = 0;
while (cur != NULL) {
nsteps++;
path.Set2DInt(cur->prediction, i, distance - nsteps);
cur = cur->last;
}
}
return path;
}
/*
get the predictions of the previous step
>> state - state bundle of the current step
>> devID - the device id for the predictions
*/
XTensor Predictor::GetLastPrediction(StateBundle* state, int devID)
{
CheckNTErrors(state->stateNum >= 0, "Illegal state!");
IntList last;
for (int i = 0; i < state->stateNum; i++) {
State* cur = state->states + i;
last.Add(cur->prediction);
}
XTensor lastPred;
InitTensor2D(&lastPred, int(last.Size()), 1, X_INT, devID);
lastPred.SetData(last.items, int(last.Size()));
return lastPred;
}
}
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-03-13
* This is the first source file I create in 2019 - new start!
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04
*/
#ifndef __PREDICTOR_H__
#define __PREDICTOR_H__
#include "../Model.h"
#include "LengthPenalty.h"
using namespace std;
namespace nmt
{
#define _PID_EMPTY -1
/* state for search. It keeps the path (back-pointer), prediction distribution,
etc. It can be regarded as a translation hypothesis. */
class State
{
public:
/* we assume that the prediction is an integer */
int prediction;
/* id of the problem. One can regard it as the sentence id when we
translate a number of sentences in the batched manner. The hypothesis
is empty if id = -1 (_PID_EMPTY) */
int pid;
/* indicates whether the state is an end */
bool isEnd;
/* indicates whether the state is the start */
bool isStart;
/* indicates whether the state is completed */
bool isCompleted;
/* probability of every prediction (last state of the path) */
float prob;
/* probability of every path */
float probPath;
/* model score of every path: the path probability adjusted by other terms, such as a length penalty */
float modelScore;
/* number of steps we go over so far */
int nstep;
/* pointer to the previous state */
State* last;
};
/* a bundle of states */
class StateBundle
{
public:
/* predictions */
XTensor prediction;
/* id of the previous state that generates the current one */
XTensor preID;
/* mark that indicates whether each hypothesis is completed */
XTensor endMark;
/* probability of every prediction (last state of the path) */
XTensor prob;
/* probability of every path */
XTensor probPath;
/* model score of every path */
XTensor modelScore;
/* step number of each hypothesis */
float nstep;
/* list of states */
State* states;
/* number of states */
int stateNum;
/* indicates whether it is the first state */
bool isStart;
public:
/* constructor */
StateBundle();
/* destructor */
~StateBundle();
/* create states */
void MakeStates(int num);
};
/* The predictor reads the current state and then predicts the next.
This mirrors MT inference: we take the state of the previous words
and then generate the next word. Here, a state can be regarded as the
representation of words (word indices, hidden states, embeddings, etc.). */
class Predictor
{
private:
/* pointer to the transformer model */
Model* m;
/* current state */
StateBundle* s;
/* start symbol */
int startSymbol;
/* end symbol */
int endSymbol;
public:
/* constructor */
Predictor();
/* destructor */
~Predictor();
/* create an initial state */
void Create(Model* model, XTensor* top, const XTensor* input, int beamSize, StateBundle* state);
/* set the start symbol */
void SetStartSymbol(int symbol);
/* read a state */
void Read(Model* model, StateBundle* state);
/* predict the next state */
void Predict(StateBundle* next, XTensor& aliveState, XTensor& encoding,
XTensor& inputEnc, XTensor& paddingEnc, int batchSize,
bool isStart, XTensor& reorderState, bool needReorder, int nstep);
/* generate paths up to the states of the current step */
XTensor GeneratePaths(StateBundle* state);
/* get the predictions of the previous step */
XTensor GetLastPrediction(StateBundle* state, int devID);
};
}
#endif
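`Predict` trims the decoder caches when some sentences have finished (`KeepAlive(aliveState)`) and reorders them when hypotheses switch parents (`Reorder(reorderState)`). The cache classes are not shown in this section, so the sketch below only illustrates the index-select along the hypothesis dimension that we assume both calls perform. `Cache` and `GatherRows` are hypothetical names, and a real cache would hold key/value tensors rather than nested vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: a per-layer cache stored as one row of floats per hypothesis.
using Cache = std::vector<std::vector<float>>;

// Index-select along the hypothesis dimension: row i of the result is row
// indices[i] of the input. Under our assumption, keeping alive states passes
// the surviving row ids, and beam reordering passes each new hypothesis's
// parent id (so a parent row can be duplicated or dropped).
Cache GatherRows(const Cache& cache, const std::vector<int>& indices)
{
    Cache result;
    result.reserve(indices.size());
    for (int idx : indices) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < cache.size());
        result.push_back(cache[idx]);
    }
    return result;
}
```

Gathering by parent ids keeps each hypothesis's cached keys and values aligned with the hypothesis it now extends, which is why the reorder has to happen before the next decoder step.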
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-03-27
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04, 2020-06
*/
#ifndef __SEARCH_H__
#define __SEARCH_H__
#include "../Model.h"
#include "Predictor.h"
using namespace std;
namespace nmt
{
/* The class organizes the search process. It calls "predictors" to generate
distributions of the predictions and prunes the search space by beam pruning.
This builds a graph where each path represents a translation hypothesis.
The output is typically the path with the highest model score. */
class BeamSearch
{
private:
/* the alpha parameter controls the length preference */
float alpha;
/* predictor */
Predictor predictor;
/* max length of the generated sequence */
int maxLength;
/* beam size */
int beamSize;
/* batch size */
int batchSize;
/* we keep the final hypotheses in a heap for each sentence in the batch. */
XHeap<MIN_HEAP, float>* fullHypos;
/* array of the end symbols */
int* endSymbols;
/* number of the end symbols */
int endSymbolNum;
/* start symbol */
int startSymbol;
/* scaling factor on the input length, used to bound the number of search steps */
float scalarMaxLength;
/* indicate whether the early stop strategy is used */
bool isEarlyStop;
/* pids for alive states */
IntList aliveStatePids;
/* alive sentences */
IntList aliveSentList;
/* whether we need to reorder the states */
bool needReorder;
public:
/* constructor */
BeamSearch();
/* destructor */
~BeamSearch();
/* initialize the model */
void Init(Config& config);
/* search for the most promising states */
void Search(Model* model, XTensor& input, XTensor& padding, IntList* output, XTensor& score);
/* preparation */
void Prepare(int myBatchSize, int myBeamSize);
/* compute the model score for each hypothesis */
void Score(StateBundle* prev, StateBundle* beam);
/* generate token indices via beam pruning */
void Generate(StateBundle* prev, StateBundle* beam);
/* expand the search graph */
void Expand(StateBundle* prev, StateBundle* beam, XTensor& reorderState);
/* collect hypotheses that end with an end symbol */
void Collect(StateBundle* beam);
/* fill the hypotheses heap with incomplete hypotheses */
void FillHeap(StateBundle* beam);
/* save the output sequences and score */
void Dump(IntList* output, XTensor* score);
/* check if the token is an end symbol */
bool IsEnd(int token);
/* check whether all hypotheses are completed */
bool IsAllCompleted(StateBundle* beam);
/* update the beam by pruning finished states */
void RemoveFinishedStates(StateBundle* beam, XTensor& aliveEncoding,
XTensor& aliveInput, XTensor& alivePadding, XTensor& aliveIdx);
/* set end symbols for search */
void SetEnd(const int* tokens, const int tokenNum);
/* make a mask to prevent duplicated entries in beam expansion for the first position */
XTensor MakeFirstMask(StateBundle* beam);
};
class GreedySearch
{
private:
/* predictor */
Predictor predictor;
/* max length of the generated sequence */
int maxLength;
/* batch size */
int batchSize;
/* array of the end symbols */
int* endSymbols;
/* number of the end symbols */
int endSymbolNum;
/* start symbol */
int startSymbol;
/* scaling factor on the input length, used to bound the number of search steps */
float scalarMaxLength;
public:
/* constructor */
GreedySearch();
/* destructor */
~GreedySearch();
/* initialize the model */
void Init(Config& config);
/* search for the most promising states */
void Search(Model* model, XTensor& input, XTensor& padding, IntList* output);
/* preparation */
void Prepare(int myBatchSize);
/* check if the token is an end symbol */
bool IsEnd(int token);
/* set end symbols for search */
void SetEnd(const int* tokens, const int tokenNum);
};
}
#endif
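`BeamSearch` keeps `beamSize` hypotheses per sentence, and `MakeFirstMask` prevents duplicated entries at the first position, where every beam still holds the same start state. The sketch below shows one generic way to run a single step for one sentence: add each candidate token's log-probability to its parent's path score, keep the top-k over the flattened (beam x vocab) grid, and recover parent and token with division and modulo. It is a plain-C++ illustration with hypothetical names (`Candidate`, `BeamStep`). It does not reproduce the NiuTrans tensor code or its length-penalty scoring, and expanding only beam 0 at the first step is our assumed equivalent of the first-position mask.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Candidate {
    int parent;   // index of the hypothesis being extended
    int token;    // predicted token id
    float score;  // parent path log-prob + token log-prob
};

// One beam-search step for a single sentence.
// pathScores: log-probability of each of the beamSize current hypotheses.
// logProbs:   beamSize x vocabSize next-token log-probabilities (row-major).
// isFirst:    at the first step all beams are identical, so only beam 0 is
//             expanded to avoid duplicated candidates.
std::vector<Candidate> BeamStep(const std::vector<float>& pathScores,
                                const std::vector<float>& logProbs,
                                int vocabSize, bool isFirst)
{
    const int beamSize = static_cast<int>(pathScores.size());
    assert(static_cast<int>(logProbs.size()) == beamSize * vocabSize);
    const int activeBeams = isFirst ? 1 : beamSize;
    std::vector<Candidate> all;
    for (int flat = 0; flat < activeBeams * vocabSize; flat++) {
        int parent = flat / vocabSize;  // row in the (beam x vocab) grid
        int token = flat % vocabSize;   // column in the grid
        all.push_back({parent, token, pathScores[parent] + logProbs[flat]});
    }
    // Keep the beamSize best candidates, highest score first.
    int k = std::min(beamSize, static_cast<int>(all.size()));
    std::partial_sort(all.begin(), all.begin() + k, all.end(),
                      [](const Candidate& a, const Candidate& b) { return a.score > b.score; });
    all.resize(k);
    return all;
}
```

Without the first-step restriction, two identical beams would both propose their best token, and the top-k would hold two copies of the same hypothesis.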
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-03-27
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-04, 2020-06
*/
#include "Search.h"
#include "Translator.h"
#include "../Utility.h"
#include "../../niutensor/tensor/XTensor.h"
#include "../../niutensor/tensor/XUtility.h"
#include "../../niutensor/tensor/core/CHeader.h"
using namespace nts;
namespace nmt
{
/* constructor */
Translator::Translator()
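{
/* Init() sets these; initialize them so that the destructor is safe even if Init() is never called */
seacher = NULL;
beamSize = 1;
}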
/* destructor */
Translator::~Translator()
{
if (beamSize > 1)
delete (BeamSearch*)seacher;
else
delete (GreedySearch*)seacher;
}
/* initialize the model */
void Translator::Init(Config& config)
{
beamSize = config.beamSize;
vSize = config.srcVocabSize;
vSizeTgt = config.tgtVocabSize;
sentBatch = config.sBatchSize;
wordBatch = config.wBatchSize;
if (beamSize > 1) {
LOG("translating with beam search (%d)", beamSize);
seacher = new BeamSearch();
((BeamSearch*)seacher)->Init(config);
}
else if (beamSize == 1) {
LOG("translating with greedy search");
seacher = new GreedySearch();
((GreedySearch*)seacher)->Init(config);
}
else {
CheckNTErrors(false, "Invalid beam size\n");
}
}
/*
translate the sentences in an input file
>> ifn - input data file
>> sfn - source vocab file
>> tfn - target vocab file
>> ofn - output data file
>> model - pretrained model
*/
void Translator::Translate(const char* ifn, const char* sfn,
const char* tfn, const char* ofn, Model* model)
{
int wc = 0;
int wordCountTotal = 0;
int sentCount = 0;
int batchCount = 0;
int devID = model->devID;
double startT = GetClockSec();
/* batch of input sequences */
XTensor batchEnc;
/* padding */
XTensor paddingEnc;
batchLoader.Init(ifn, sfn, tfn);
LOG("loaded the input file, elapsed=%.1fs ", GetClockSec() - startT);
int count = 0;
double batchStart = GetClockSec();
while (!batchLoader.IsEmpty())
{
count++;
for (int i = 0; i < model->decoder->nlayer; ++i) {
model->decoder->selfAttCache[i].miss = true;
model->decoder->enDeAttCache[i].miss = true;
}
auto indices = batchLoader.LoadBatch(&batchEnc, &paddingEnc,
sentBatch, wordBatch, devID);
IntList* output = new IntList[indices.Size() - 1];
/* greedy search */
if (beamSize == 1) {
((GreedySearch*)seacher)->Search(model, batchEnc, paddingEnc, output);
}
/* beam search */
else {
XTensor score;
((BeamSearch*)seacher)->Search(model, batchEnc, paddingEnc, output, score);
}
for (int i = 0; i < indices.Size() - 1; ++i) {
Result* res = new Result;
res->id = int(indices[i]);
res->res = output[i];
batchLoader.outputBuffer.Add(res);
}
delete[] output;
wc += int(indices[-1]);
wordCountTotal += int(indices[-1]);
sentCount += int(indices.Size() - 1);
batchCount += 1;
/* report the elapsed time, the fraction of processed sentences,
and the throughput (source words per second) after every batch */
double elapsed = GetClockSec() - batchStart;
batchStart = GetClockSec();
LOG("elapsed=%.1fs, sentence=%f, sword=%.1fw/s",
elapsed, float(sentCount) / float(batchLoader.inputBuffer.Size()),
double(wc) / elapsed);
wc = 0;
}
/* append empty lines to the result */
for (int i = 0; i < batchLoader.emptyLines.Size(); i++) {
Result* emptyRes = new Result;
emptyRes->id = batchLoader.emptyLines[i];
batchLoader.outputBuffer.Add(emptyRes);
}
double startDump = GetClockSec();
/* reorder the result */
batchLoader.SortOutput();
/* print the result to a file */
batchLoader.DumpRes(ofn);
double elapsed = GetClockSec() - startDump;
LOG("translation completed (word=%d, sent=%zu)",
wordCountTotal, batchLoader.inputBuffer.Size() + batchLoader.emptyLines.Size());
}
/*
dump the result into the file
>> file - data file
>> output - output tensor
*/
void Translator::Dump(FILE* file, XTensor* output)
{
if (output != NULL && output->unitNum != 0) {
int seqLength = output->dimSize[output->order - 1];
for (int i = 0; i < output->unitNum; i += seqLength) {
for (int j = 0; j < seqLength; j++) {
int w = output->GetInt(i + j);
if (w < 0 || w == 1 || w == 2)
break;
fprintf(file, "%d ", w);
}
fprintf(file, "\n");
}
}
else
{
fprintf(file, "\n");
}
}
}
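`Translate` tags every result with its input line id, appends the empty input lines with their ids, and calls `SortOutput` before `DumpRes`. The sketch below assumes that `SortOutput` amounts to an ascending sort on that id, which restores the input file's order however the batches were formed (for instance, if the loader groups sentences by length). `MiniResult` and `RestoreOrder` are hypothetical names, not the `DataSet` API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Result records built in Translator::Translate.
struct MiniResult {
    int id;                // position of the sentence in the input file
    std::vector<int> res;  // translated token ids (empty for empty input lines)
};

// Batches may be processed in a different order from the input, so results
// are sorted back by their original line id before being written out.
void RestoreOrder(std::vector<MiniResult>& results)
{
    std::sort(results.begin(), results.end(),
              [](const MiniResult& a, const MiniResult& b) { return a.id < b.id; });
}
```

Because each id is unique, an unstable `std::sort` is sufficient here.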
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2019-03-27
* A week with no trips :)
* $Modified by: HU Chi (huchinlp@gmail.com) 2020-06
*/
#ifndef __TESTER_H__
#define __TESTER_H__
#include "Search.h"
#include "DataSet.h"
namespace nmt
{
/* This class translates test sentences with a trained model. */
class Translator
{
public:
/* vocabulary size of the source side */
int vSize;
/* vocabulary size of the target side */
int vSizeTgt;
/* batch size for sentences */
int sentBatch;
/* batch size for words */
int wordBatch;
/* beam size */
int beamSize;
/* for batching */
DataSet batchLoader;
/* search algorithm for inference (BeamSearch if beamSize > 1, otherwise GreedySearch) */
void* seacher;
public:
/* constructor */
Translator();
/* destructor */
~Translator();
/* initialize the model */
void Init(Config& config);
/* translate the sentences in an input file */
void Translate(const char* ifn, const char* sfn, const char* tfn,
const char* ofn, Model* model);
/* dump the result into the file */
void Dump(FILE* file, XTensor* output);
};
}
#endif
/* NiuTrans.NMT - an open-source neural machine translation system.
* Copyright (C) 2020 NiuTrans Research. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: HU Chi (huchinlp@foxmail.com) 2020-01-03
*/
#include <fstream>
#include "Vocab.h"
#include "../Utility.h"
namespace nts {
/* load a vocabulary from a file */
void Vocab::Load(const string& src)
{
string vsz, sid;
ifstream f(src, ios::in);
CheckNTErrors(f.is_open(), "unable to open the vocabulary file");
/* get the vocab size and the start id */
f >> vsz >> sid;
startID = stol(sid);
vocabSize = stol(vsz);
string word, id;
for (int i = 0; i < vocabSize - startID; i++) {
f >> word >> id;
word2id[word] = stol(id);
id2word[stol(id)] = word;
}
f.close();
}
/* save a vocabulary to a file */
void Vocab::Save(const string& src)
{
ofstream f(src, ios::out);
CheckNTErrors(f.is_open(), "unable to open the vocabulary file");
/* the first line: size of the vocab and the start id */
f << vocabSize << "\t" << startID << "\n";
/* other lines: one word and its index per line */
for (const auto& p : word2id)
f << p.first << "\t" << p.second << "\n";
f.close();
}
/*
copy data from another vocabulary
>> v - the vocabulary to copy from
*/
void Vocab::CopyFrom(const Vocab& v)
{
for (const auto& w2i : v.word2id)
word2id.insert(w2i);
for (const auto& i2w : v.id2word)
id2word.insert(i2w);
}
}
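`Vocab::Load` expects a whitespace-separated file whose first line holds the vocabulary size and the start id, followed by `vocabSize - startID` entries of `word id`. The sketch below round-trips that format through streams and writes a newline after every record so that `>>` parsing can split the fields. `MiniVocab`, `SaveVocab`, and `LoadVocab` are hypothetical names; they use `std::map` and `int` because the real member types are not shown in this section.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical simplified vocabulary with the same fields Vocab::Load fills.
struct MiniVocab {
    int vocabSize = 0;
    int startID = 0;
    std::map<std::string, int> word2id;
    std::map<int, std::string> id2word;
};

// First line: "<vocabSize>\t<startID>", then one "<word>\t<id>" per line.
void SaveVocab(const MiniVocab& v, std::ostream& out)
{
    out << v.vocabSize << "\t" << v.startID << "\n";
    for (const auto& p : v.word2id)
        out << p.first << "\t" << p.second << "\n";
}

// Mirrors Vocab::Load: read the header, then vocabSize - startID entries.
MiniVocab LoadVocab(std::istream& in)
{
    MiniVocab v;
    in >> v.vocabSize >> v.startID;
    for (int i = 0; i < v.vocabSize - v.startID; i++) {
        std::string word;
        int id = 0;
        in >> word >> id;
        assert(in && "truncated vocabulary file");
        v.word2id[word] = id;
        v.id2word[id] = word;
    }
    return v;
}
```

Without the separators, the start id and the first word would be fused into a single token (for example `4hello`), and every later field would be read into the wrong slot.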