I optimized the implementation of S2T.
Some problems still confuse me:
1. Whether to scale the input layer (I tried replacing the scaling with layer normalization).
2. The exact setup of weight sharing between the output projection matrix and the embedding matrix in the adapter (I noticed that inconsistent variance leads to bad results).
3. Most confusing of all, the variance grows layer by layer during the forward computation (I am not sure whether this is reasonable; I will compare the behavior against the latest code).
Finally, the implementation details matter a great deal for the final performance, even when the difference is subtle.
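The variance bookkeeping behind points 1 and 2 can be illustrated with a minimal, standard-library-only Python sketch. It is not the code from this commit. It assumes a fairseq-style embedding initialized from N(0, d^-0.5), so each entry has variance 1/d, and the helpers `init_embedding`, `scale_input`, `layer_norm` and `tied_logits` are hypothetical. Scaling by sqrt(d) and a parameter-free layer normalization both bring the input to roughly unit variance. A tied output projection then gives logits of roughly unit variance only when the hidden state has unit variance and the shared matrix keeps its 1/d variance.

```python
import math
import random
import statistics


def init_embedding(vocab, dim, seed=0):
    """Hypothetical fairseq-style init: entries ~ N(0, dim**-0.5), variance 1/dim."""
    rng = random.Random(seed)
    std = dim ** -0.5
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(vocab)]


def variance(rows):
    """Population variance over all entries of a 2-D list."""
    return statistics.pvariance([x for row in rows for x in row])


def scale_input(rows, dim):
    """Option 1: multiply the embeddings by sqrt(dim)."""
    s = math.sqrt(dim)
    return [[x * s for x in row] for row in rows]


def layer_norm(rows, eps=1e-5):
    """Option 2: normalize each row to zero mean, unit variance (no gain/bias)."""
    out = []
    for row in rows:
        mu = statistics.fmean(row)
        var = statistics.pvariance(row, mu)
        out.append([(x - mu) / math.sqrt(var + eps) for x in row])
    return out


def tied_logits(hidden, emb):
    """Output projection that reuses the embedding matrix: logit_v = <hidden, emb[v]>."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in emb]


if __name__ == "__main__":
    dim = 256
    emb = init_embedding(1000, dim)
    print(f"raw embedding variance: {variance(emb):.4f}")  # about 1/dim
    print(f"scaled by sqrt(dim):    {variance(scale_input(emb, dim)):.4f}")
    print(f"layer-normalized:       {variance(layer_norm(emb)):.4f}")
    rng = random.Random(1)
    hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # unit-variance decoder state
    print(f"tied logit variance:    {statistics.pvariance(tied_logits(hidden, emb)):.4f}")
```

With a 1000 x 256 embedding the raw variance is about 1/256, while both input options give about 1.0. Rescaling the shared matrix by a factor c for only one of its two roles multiplies the logit variance by c², which is one possible source of the inconsistency described in point 2.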
Added files (mode 0 → 100644):
egs/aishell/asr/conf/purectc.yaml
egs/tibetan/asr/binary.sh
egs/tibetan/asr/conf/base.yaml
egs/tibetan/asr/conf/basis.yaml
egs/tibetan/asr/conf/big.yaml
egs/tibetan/asr/conf/big_wenet.yaml
egs/tibetan/asr/conf/conformer.yaml
egs/tibetan/asr/conf/ctc.yaml
egs/tibetan/asr/conf/dlcl.yaml
egs/tibetan/asr/conf/inter.yaml
egs/tibetan/asr/conf/local_attn.yaml
egs/tibetan/asr/conf/pds_base.yaml
egs/tibetan/asr/conf/pds_base_16.yaml
egs/tibetan/asr/conf/pds_base_32.yaml
egs/tibetan/asr/conf/pds_base_4.yaml
egs/tibetan/asr/conf/pds_base_8.yaml
egs/tibetan/asr/conf/pds_base_8_grow.yaml
egs/tibetan/asr/conf/pds_big_8.yaml
egs/tibetan/asr/conf/purectc.yaml
egs/tibetan/asr/conf/purectc_pds_base.yaml
egs/tibetan/asr/conf/purectc_pds_base_8.yaml
egs/tibetan/asr/conf/rpr.yaml
egs/tibetan/asr/decode.sh
egs/tibetan/asr/local/monitor.sh
egs/tibetan/asr/local/parse_options.sh
egs/tibetan/asr/local/utils.sh
egs/tibetan/asr/run.sh
egs/tibetan/asr/train.sh
egs/wmt16/mt/conf/deep_ctc.yaml
egs/wmt16/mt/conf/inter.yaml