Commit a3d277c8 by libei

fix bugs

parent 4b5dcaed
@@ -858,6 +858,7 @@ def transformer_before_shared25():
hparams.learning_rate_warmup_steps = 8000
hparams.optimizer = "MultistepAdam"
hparams.optimizer_multistep_accumulate_steps = 4
hparams.encoder_layers = 25
    # training a deep pre-norm transformer with batch_size 4096 is likely to OOM
hparams.batch_size = 2048
return hparams
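The hparams above pair `MultistepAdam` with `optimizer_multistep_accumulate_steps = 4` while halving `batch_size` to 2048, keeping the effective batch at 4 × 2048 = 8192 tokens at roughly half the activation memory. A minimal sketch (not tensor2tensor code; `grad` and `sgd_step` are hypothetical stand-ins) of why accumulation preserves the update:

```python
# Sketch of the gradient-accumulation idea behind MultistepAdam:
# averaging gradients over N micro-batches before a single optimizer
# step matches one step on the N-times-larger batch, so batch_size can
# be cut to 2048 without changing the effective batch (4 * 2048 = 8192).

def grad(w, batch):
    # gradient of the mean squared error 0.5 * (w*x - y)^2 over a batch
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def sgd_step(w, g, lr=0.1):
    # one plain SGD update (stands in for one Adam step)
    return w - lr * g

data = [(float(i % 7), float(i % 5)) for i in range(8192)]
micro_batches = [data[i:i + 2048] for i in range(0, 8192, 2048)]

# One step on the full batch of 8192 examples.
w_full = sgd_step(1.0, grad(1.0, data))

# Accumulate over 4 micro-batches of 2048, then take one step.
accumulated = sum(grad(1.0, mb) for mb in micro_batches) / len(micro_batches)
w_accum = sgd_step(1.0, accumulated)

print(abs(w_full - w_accum) < 1e-9)
```

The two updates agree up to floating-point summation order, which is why halving `batch_size` while doubling (here quadrupling) the accumulation steps is memory-for-time trade with no change to the training trajectory.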