Ideally, tensor.triu_(1) should fill the main diagonal and everything below it with 0, keeping only the elements strictly above the diagonal. However, it fails to do so when the matrix is large. For example:

q_len = 100000
causal_mask ...
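
To make the expected result concrete, here is a minimal pure-Python sketch of the rule that triu with diagonal=1 is documented to follow: an entry at (row, col) is kept only when col - row >= diagonal, and everything else becomes 0. The helper name triu_reference is hypothetical and is not part of PyTorch; it only illustrates what the output should look like on a small matrix, and it does not reproduce the large-matrix failure.

```python
def triu_reference(matrix, diagonal=0):
    """Return a copy of `matrix` keeping entries where col - row >= diagonal.

    Hypothetical reference helper (not a PyTorch API); all other entries
    are replaced with 0, mirroring the documented triu semantics.
    """
    return [
        [value if col - row >= diagonal else 0
         for col, value in enumerate(row_values)]
        for row, row_values in enumerate(matrix)
    ]


# A 4x4 matrix of ones stands in for a small causal mask.
ones = [[1] * 4 for _ in range(4)]

# With diagonal=1 the main diagonal is zeroed as well as the lower triangle.
expected = triu_reference(ones, diagonal=1)
for row in expected:
    print(row)
```

Comparing a small slice of the real tensor against a reference like this is one way to confirm which entries were left non-zero; the same predicate (col - row >= 1) applies at any matrix size.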