1. 07 Sep, 2022 1 commit
  2. 06 Sep, 2022 2 commits
  3. 04 Sep, 2022 3 commits
  4. 02 Sep, 2022 1 commit
  5. 31 Aug, 2022 1 commit
  6. 30 Aug, 2022 1 commit
  7. 26 Aug, 2022 1 commit
  8. 22 Aug, 2022 1 commit
  9. 27 Jul, 2022 1 commit
  10. 26 Jul, 2022 1 commit
  11. 25 Jul, 2022 3 commits
  12. 19 Jul, 2022 3 commits
  13. 12 Jul, 2022 2 commits
  14. 01 Jun, 2022 1 commit
  15. 27 May, 2022 1 commit
  16. 25 May, 2022 1 commit
  17. 24 May, 2022 1 commit
    • I optimized the implementation of S2T. · 380d7794
      Some problems still confuse me:
      1. Whether to scale in the input layer (I am trying to replace the scaling with layer normalization);
      2. The detailed setup of weight sharing between the output projection matrix and the embedding matrix in the adapter (I noticed that inconsistent variance leads to bad results);
      3. Most confusing of all, the variance increases layer by layer through the computation (I am not sure whether this is reasonable; I will compare against the behavior of the latest code).
      Finally, implementation details matter a great deal for the final performance; even a subtle difference has an effect.
      xuchen committed
  18. 13 May, 2022 1 commit
  19. 12 May, 2022 1 commit
  20. 06 May, 2022 1 commit
  21. 06 Apr, 2022 1 commit
  22. 30 Mar, 2022 1 commit
  23. 18 Mar, 2022 1 commit
  24. 13 Mar, 2022 1 commit
  25. 10 Mar, 2022 3 commits
  26. 09 Mar, 2022 1 commit
  27. 04 Mar, 2022 1 commit
  28. 01 Mar, 2022 1 commit
  29. 28 Feb, 2022 1 commit
  30. 24 Feb, 2022 1 commit
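
The layer-by-layer variance growth mentioned in the 24 May, 2022 commit (380d7794) can be reproduced in a toy setting. The sketch below is not taken from the repository: it assumes a stack of plain residual layers `x + W x` with no normalization, random weights of standard deviation `dim ** -0.5`, and unit-variance inputs. The function name `residual_variances` and all parameters are hypothetical and chosen only for illustration.

```python
import random
import statistics


def residual_variances(dim=64, tokens=32, layers=4, seed=0):
    """Return the activation variance at the input and after each residual layer.

    Toy model (an assumption, not the repository's S2T code): each layer
    computes x + W x with a fresh random W and no layer normalization.
    """
    rng = random.Random(seed)
    # Unit-variance inputs, standing in for embeddings after input scaling.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]
    variances = [statistics.pvariance([v for row in x for v in row])]
    for _ in range(layers):
        # Standard deviation dim ** -0.5 keeps var(W x) close to var(x).
        w = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
        # Residual connection: the branch output is added to its own input.
        x = [
            [xi + sum(wij * xj for wij, xj in zip(w_row, row))
             for xi, w_row in zip(row, w)]
            for row in x
        ]
        variances.append(statistics.pvariance([v for row in x for v in row]))
    return variances


if __name__ == "__main__":
    for depth, var in enumerate(residual_variances()):
        print(f"after layer {depth}: variance = {var:.3f}")
```

In this toy model the residual branch contributes roughly as much variance as its input, so the total grows with depth. Whether the same mechanism explains the behavior observed in the commit is not established here; it only shows that growth of this kind is expected when residual sums are left unnormalized.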