figure-Structure of the network during Transformer training.tex