Some issues still confuse me:

1. Whether to apply scaling in the input layer (I tried replacing it with layer normalization).
2. The detailed setup of weight sharing between the output projection matrix and the embedding matrix in the adapter (I noticed that inconsistent variance leads to bad results).
3. Most confusing of all, the variance increases layer by layer as the computation proceeds (I am not sure whether this is reasonable; I will compare it against the behavior of the latest code).

Finally, implementation details matter a great deal for final performance; even a subtle difference can change the results.
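The third point can be illustrated with a toy model. The sketch below is not the repository's code: it assumes a residual stream `x <- x + f(x)` that is not normalized between layers, and it models each sublayer output `f(x)` as independent unit-variance Gaussian noise. The function name `residual_variance_by_layer` and its parameters are hypothetical. Under these assumptions the variance grows roughly linearly with depth, which suggests that some layer-by-layer growth may be expected whenever residual branches are summed without renormalizing the stream.

```python
import random
import statistics


def residual_variance_by_layer(num_layers=6, dim=2048, seed=0):
    """Track the variance of a toy residual stream x <- x + f(x).

    Assumption: each sublayer output f(x) is modeled as independent
    unit-variance Gaussian noise, standing in for a sublayer whose output
    is roughly normalized. This is a toy model, not the repository's code.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # unit-variance input
    variances = [statistics.pvariance(x)]  # variance before any layer
    for _ in range(num_layers):
        branch = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for f(x)
        x = [a + b for a, b in zip(x, branch)]  # residual addition
        variances.append(statistics.pvariance(x))
    return variances


if __name__ == "__main__":
    for layer, var in enumerate(residual_variance_by_layer()):
        print(f"after layer {layer}: variance = {var:.2f}")
```

To check a real model, the same per-layer measurement could be taken on the actual hidden states and compared against this baseline; growth much faster than linear would point to something other than plain residual accumulation.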
| Name |
|---|
| benchmark |
| clib |
| config |
| criterions |
| data |
| dataclass |
| distributed |
| logging |
| model_parallel |
| models |
| modules |
| optim |
| scoring |
| tasks |
| __init__.py |
| binarizer.py |
| checkpoint_utils.py |
| file_io.py |
| file_utils.py |
| hub_utils.py |
| incremental_decoding_utils.py |
| iterative_refinement_generator.py |
| nan_detector.py |
| ngram_repeat_block.py |
| options.py |
| pdb.py |
| quantization_utils.py |
| registry.py |
| search.py |
| sequence_generator.py |
| sequence_scorer.py |
| token_generation_constraints.py |
| tokenizer.py |
| trainer.py |
| utils.py |
| version.txt |