Please explain in detail what happens inside the SA layer during error backpropagation when a GPT model's weights are adjusted. Cl4o: The backpropagation ...
In the current GPT architecture, is the short-term memory that SA maintains the K and V layers, or is it the RC layer immediately before it? Backpropagation adjustment requires the RC layer, right?