It must be said that some problems still confuse me: 1. whether to scale the embeddings at the input layer (I tried replacing the scaling with layer normalization); 2. the exact setup for weight sharing between the output projection matrix and the embedding matrix in the adapter (I noticed that an inconsistent variance between the two leads to poor results); 3. the biggest puzzle is that the activation variance grows layer by layer through the network (I am not sure whether this behavior is expected; I will compare it against the behavior of the latest code). In the end, implementation details matter a great deal for final performance, even when the differences are subtle.
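To make the three points concrete, here is a minimal PyTorch sketch (my own illustration, not the original implementation) that shows (a) the input-layer choice between multiplying embeddings by sqrt(d_model) and applying LayerNorm instead, (b) tying the output projection to the embedding matrix so both share the same initialization variance, and (c) printing the activation variance after each layer to observe whether it grows. All sizes, flags, and module choices here are assumptions for demonstration.

```python
import math
import torch
import torch.nn as nn

d_model, vocab_size, n_layers = 512, 1000, 6  # hypothetical sizes

embed = nn.Embedding(vocab_size, d_model)
# Small-variance init (std = d_model^-0.5), as is common when tying weights.
nn.init.normal_(embed.weight, mean=0.0, std=d_model ** -0.5)

# (a) Input layer: either rescale by sqrt(d_model) or normalize instead.
use_layernorm_instead_of_scaling = False
input_norm = nn.LayerNorm(d_model)

# (b) Weight tying: the output projection reuses the embedding matrix, so its
# variance is exactly whatever the embedding initialization produced.
out_proj = nn.Linear(d_model, vocab_size, bias=False)
out_proj.weight = embed.weight

layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
     for _ in range(n_layers)]
)

tokens = torch.randint(0, vocab_size, (2, 16))
with torch.no_grad():
    x = embed(tokens)
    x = input_norm(x) if use_layernorm_instead_of_scaling else x * math.sqrt(d_model)

    # (c) Track the per-layer activation variance to see whether it increases.
    print(f"input   var={x.var().item():.4f}")
    for i, layer in enumerate(layers):
        x = layer(x)
        print(f"layer {i} var={x.var().item():.4f}")

    logits = out_proj(x)
```

Flipping `use_layernorm_instead_of_scaling` and rerunning makes it easy to compare how the two input-layer choices affect the variance trajectory; the printed per-layer variances are what I would diff against the latest code.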