egs/tibetan/asr/conf/big.yaml · 9fe8cd1e7777df48a3218387868d4868595a2f3e · xuchen / Fairseq-S2T

I optimized the implementation of S2T. · 380d7794

That said, a few issues still puzzle me:
1. Whether to apply scaling in the input layer (I tried replacing it with layer specification);
2. How exactly to set up weight sharing between the output projection matrix and the embedding matrix in the adapter (I noticed that inconsistent variance leads to bad results);
3. Most confusing of all, the variance increases layer by layer through the computation (I am not sure whether this is reasonable; I will compare it against the behavior of the latest code).
Finally, implementation details matter a great deal for the final performance, even when the differences are subtle.
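
To make points 1 and 3 concrete, here is a minimal pure-Python sketch. It is not code from this repository. It assumes embeddings are drawn from a Gaussian with standard deviation dim ** -0.5 and optionally multiplied by sqrt(dim), and it models each layer as a residual branch that adds independent unit-variance noise with no normalization on the residual path. The function names embedding_variance and residual_variances are hypothetical.

```python
import math
import random
import statistics


def embedding_variance(dim, scale, n_tokens=200, seed=0):
    """Variance of toy embeddings, optionally scaled by sqrt(dim).

    Assumption: weights are drawn from N(0, dim ** -1), i.e. std = dim ** -0.5.
    """
    rng = random.Random(seed)
    std = dim ** -0.5
    factor = math.sqrt(dim) if scale else 1.0
    values = [factor * rng.gauss(0.0, std) for _ in range(n_tokens * dim)]
    return statistics.pvariance(values)


def residual_variances(dim, n_layers, seed=0):
    """Variance of a toy residual stream after each layer.

    Assumption: every layer adds an independent unit-variance update
    (x <- x + f(x)), and nothing normalizes the residual path.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    variances = [statistics.pvariance(x)]
    for _ in range(n_layers):
        x = [xi + rng.gauss(0.0, 1.0) for xi in x]
        variances.append(statistics.pvariance(x))
    return variances


if __name__ == "__main__":
    print("unscaled embedding variance:", embedding_variance(256, scale=False))
    print("scaled embedding variance:  ", embedding_variance(256, scale=True))
    print("residual stream variance per layer:", residual_variances(4096, 8))
```

Under these assumptions the unscaled embeddings have variance close to 1 / dim, while the scaled ones are close to 1, which is one way a shared embedding and output projection can end up with inconsistent variance. The residual stream variance grows roughly linearly with depth in this toy model, so some growth across layers is expected when the residual path is not normalized. Whether the growth observed in the real model is of the same kind still needs to be checked against the actual code.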

committed May 24, 2022


big.yaml 728 Bytes