A few points still confuse me:

1. Whether to scale the embeddings in the input layer (I have tried replacing the scaling with layer specification).
2. The exact configuration of weight sharing between the output projection matrix and the embedding matrix in the adapter (I have noticed that inconsistent variance leads to poor results).
3. The biggest confusion: the variance grows layer by layer as the computation proceeds (I am not sure whether this behavior is expected; I will compare against the behavior of the latest code).

In short, implementation details matter greatly for final performance, even seemingly subtle ones.
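To make points 1 and 3 concrete, here is a minimal numpy sketch (not fairseq's actual code, just an illustration of the statistics involved). It assumes the common fairseq-style convention of initializing embeddings with std `d_model ** -0.5` and multiplying the embedding output by `sqrt(d_model)`, and it models a sublayer's output as an independent unit-variance signal to show how residual addition inflates variance with depth.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, batch = 512, 1000, 64

# Point 1: embeddings initialized with std = d_model ** -0.5, then the
# lookup is multiplied by sqrt(d_model), giving roughly unit variance
# at the input of the first layer.
emb = rng.normal(0.0, d_model ** -0.5, size=(vocab, d_model))
tokens = rng.integers(0, vocab, size=batch)
x = emb[tokens] * np.sqrt(d_model)
print(float(x.var()))  # close to 1.0

# Point 3: with residual connections h <- h + f(h), if each sublayer
# output is roughly independent of the stream and has unit variance,
# the stream's variance grows about linearly with depth (absent any
# normalization that rescales it).
h = rng.normal(0.0, 1.0, size=(batch, d_model))
for _ in range(6):
    h = h + rng.normal(0.0, 1.0, size=h.shape)  # stand-in for a sublayer
print(float(h.var()))  # roughly 7 after 6 residual additions
```

This is why the placement of layer normalization (pre-norm vs. post-norm) interacts so strongly with these details: it determines whether the growing residual variance is repeatedly rescaled or allowed to accumulate.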