Abstract: Parallel implementations of the fast Fourier transform (FFT) share a common communication requirement: transposing matrices one or more times. In this paper, we propose an efficient algorithm ...