A few issues still confuse me:

1. Whether to apply scaling in the input layer (I am experimenting with replacing it with a per-layer specification).
2. The exact setup for weight sharing between the output projection matrix and the embedding matrix in the adapter (I have noticed that inconsistent variance between the two leads to poor results).
3. Most confusingly, the activation variance grows layer by layer through the computation (I am not sure whether this phenomenon is expected; I will compare against the behavior of the latest code).

In short, the implementation details matter enormously for final performance, even when the differences seem subtle.
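On the third point, the layer-by-layer variance growth can be reproduced with a minimal sketch. The snippet below assumes a plain (hypothetical) residual stack with variance-preserving linear branches; it is not the actual model code, just an illustration of why unscaled residual additions make activation variance grow with depth.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ResidualBlock(nn.Module):
    """Hypothetical residual block: out = x + W x, with W initialized so
    that the branch output has roughly the same variance as its input."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim, bias=False)
        # std = dim ** -0.5 keeps Var(W x) close to Var(x)
        nn.init.normal_(self.linear.weight, std=dim ** -0.5)

    def forward(self, x):
        # Residual add: the branch contributes extra, largely independent
        # variance, so Var(out) is roughly Var(x) + Var(branch).
        return x + self.linear(x)

dim, n_layers = 512, 12
x = torch.randn(1024, dim)  # unit-variance input
variances = [x.var().item()]
blocks = [ResidualBlock(dim) for _ in range(n_layers)]
with torch.no_grad():
    for block in blocks:
        x = block(x)
        variances.append(x.var().item())

# Without any downscaling of the residual branch, the variance keeps
# compounding with depth instead of staying near 1.
print(f"input var: {variances[0]:.2f}, output var: {variances[-1]:.2f}")
```

Under these assumptions the growth itself is expected behavior, which is why schemes that rescale the residual branch (or the initialization) exist; whether the growth I observe matches this baseline is what I still need to check against the latest code.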
For reference, the model directory I was browsing contains (commit data did not load):

- `bart/`, `huggingface/`, `nat/`, `roberta/`, `speech_to_text/`, `wav2vec/`
- `__init__.py`
- `composite_encoder.py`
- `distributed_fairseq_model.py`
- `fairseq_decoder.py`
- `fairseq_encoder.py`
- `fairseq_incremental_decoder.py`
- `fairseq_model.py`
- `fconv.py`, `fconv_lm.py`, `fconv_self_att.py`
- `lightconv.py`, `lightconv_lm.py`
- `lstm.py`, `lstm_lm.py`
- `masked_lm.py`
- `model_utils.py`
- `multilingual_transformer.py`
- `transformer.py`, `transformer_align.py`, `transformer_ctc.py`, `transformer_from_pretrained_xlm.py`, `transformer_lm.py`, `transformer_s2.py`