I would like you to explain in detail what happens inside the SA (self-attention) layer during error backpropagation, when a GPT model's weights are adjusted by backpropagation. Cl4o: The backpropagation ...
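A minimal NumPy sketch of what the backward pass computes inside a single-head causal self-attention block, assuming "SA layer" means the self-attention sub-layer and ignoring the output projection, multi-head split, and layer norm. All sizes, names, and the random stand-in gradient `d_out` are illustrative, not taken from the conversation above.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

T, d = 4, 8                                   # toy sizes: 4 tokens, model width 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))               # layer input from the residual stream
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# ---- forward ----
Q, K, V = x @ Wq, x @ Wk, x @ Wv
S = Q @ K.T / np.sqrt(d)                      # attention scores
S = np.where(np.tril(np.ones((T, T), bool)), S, -1e9)   # causal mask
A = softmax(S)                                # attention weights
out = A @ V                                   # sub-layer output

# ---- backward: given d_out = dLoss/d_out, push gradients through each op in reverse ----
d_out = rng.standard_normal((T, d))           # stand-in for the gradient arriving from the layers above
dV = A.T @ d_out                              # from out = A @ V
dA = d_out @ V.T
dS = A * (dA - (dA * A).sum(axis=-1, keepdims=True))     # softmax backward, row-wise
dQ = dS @ K / np.sqrt(d)                      # from S = Q K^T / sqrt(d)
dK = dS.T @ Q / np.sqrt(d)

dWq, dWk, dWv = x.T @ dQ, x.T @ dK, x.T @ dV  # weight gradients used by the optimizer step
dx = dQ @ Wq.T + dK @ Wk.T + dV @ Wv.T        # gradient handed back into the residual stream

print(dWq.shape, dx.shape)                    # (8, 8) (4, 8)
```

Note that the backward step reuses the forward activations x, Q, K, V, and A; in a framework these are the tensors autograd keeps around until the backward pass has run.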
In the current GPT architecture, is what the SA layer maintains as short-term memory the K (key) and V (value) layers? Or is it the RC layer immediately before it? The RC layer is what the backpropagation adjustment requires, right?
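A small sketch of the K/V side of this question, assuming "short-term memory" refers to the key/value cache used during autoregressive decoding: at inference time each new token's K and V rows are appended to a cache and reused, which is separate from the activations that training-time backpropagation needs. The function name `decode_step` and all sizes are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

k_cache, v_cache = [], []           # the "short-term memory": one K row and one V row per past token

def decode_step(x_t):
    """Process one new token embedding x_t (shape (d,)) against the cached K/V of all previous tokens."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)        # append this token's key
    v_cache.append(x_t @ Wv)        # append this token's value
    K = np.stack(k_cache)           # (t, d)
    V = np.stack(v_cache)           # (t, d)
    attn = softmax(q @ K.T / np.sqrt(d))   # attend over all cached positions (causal by construction)
    return attn @ V                 # context vector for the new token

for t in range(4):
    y = decode_step(rng.standard_normal(d))
print(len(k_cache), y.shape)        # 4 cached keys/values, output shape (8,)
```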