Fix a bug in the loss computation of Transformer MT models where the word counts of both the source and target sides were summed
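
For illustration only, a minimal sketch of the kind of bug described, assuming the word count is used as the denominator when averaging the summed token loss. The function and parameter names (`normalized_loss`, `src_lengths`, `tgt_lengths`, `count_source`) are hypothetical and are not taken from the actual code:

```python
def normalized_loss(summed_nll, src_lengths, tgt_lengths, count_source=False):
    """Average a summed negative log-likelihood over a word count.

    summed_nll:   total loss summed over all predicted (target) tokens
    src_lengths:  per-sentence source word counts (hypothetical input)
    tgt_lengths:  per-sentence target word counts (hypothetical input)
    count_source: if True, reproduce the assumed buggy behaviour of
                  adding source words to the denominator
    """
    num_target_words = sum(tgt_lengths)
    if count_source:
        # Assumed buggy variant: source and target word counts are summed.
        denominator = sum(src_lengths) + num_target_words
    else:
        # Assumed fixed variant: only target words, since the loss is
        # accumulated over target tokens only.
        denominator = num_target_words
    return summed_nll / denominator


fixed = normalized_loss(12.0, src_lengths=[3, 5], tgt_lengths=[2, 4])
buggy = normalized_loss(12.0, src_lengths=[3, 5], tgt_lengths=[2, 4],
                        count_source=True)
```

Under this assumption, counting source words makes the reported per-word loss smaller than it should be, and the size of the error varies with the source/target length ratio of each batch.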