A CUDA C/C++ implementation that compares loop unrolling strategies for matrix multiplication on the GPU. The project demonstrates the performance impact of various loop unrolling factors (2, 4, 8, ...
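To make the idea of an unrolling factor concrete, here is a minimal sketch of how the inner product loop of a matrix multiplication might be parameterized by a compile-time factor. It is an illustration under stated assumptions, not this project's actual kernels: the names `HOST_DEVICE`, `dot_unrolled`, `matmul_kernel`, and `matmul_host` are hypothetical, and the matrices are assumed to be square, row-major, and stored as `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro: lets the same helper compile with nvcc (as device code)
// or with a plain C++ compiler (as host code).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Dot product of row `row` of A with column `col` of B (both n x n, row-major).
// The k loop advances in chunks of UNROLL; the inner loop has a compile-time
// trip count, so the compiler can unroll it completely.
template <int UNROLL>
HOST_DEVICE float dot_unrolled(const float* A, const float* B,
                               int n, int row, int col) {
    float acc = 0.0f;
    int k = 0;
    // Main loop: UNROLL multiply-adds per iteration.
    for (; k + UNROLL <= n; k += UNROLL) {
#ifdef __CUDACC__
#pragma unroll
#endif
        for (int u = 0; u < UNROLL; ++u) {
            acc += A[row * n + (k + u)] * B[(k + u) * n + col];
        }
    }
    // Remainder loop: covers the tail when n is not a multiple of UNROLL.
    for (; k < n; ++k) {
        acc += A[row * n + k] * B[k * n + col];
    }
    return acc;
}

#ifdef __CUDACC__
// Assumed kernel layout: one thread computes one element of C.
template <int UNROLL>
__global__ void matmul_kernel(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        C[row * n + col] = dot_unrolled<UNROLL>(A, B, n, row, col);
    }
}
#endif

// Host reference that uses the same helper, useful for checking GPU results.
template <int UNROLL>
std::vector<float> matmul_host(const std::vector<float>& A,
                               const std::vector<float>& B, int n) {
    assert(n >= 0);
    std::vector<float> C(static_cast<std::size_t>(n) * n);
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) {
            C[r * n + c] = dot_unrolled<UNROLL>(A.data(), B.data(), n, r, c);
        }
    }
    return C;
}
```

Under these assumptions, a comparison could instantiate the kernel once per factor, for example `matmul_kernel<4><<<grid, block>>>(dA, dB, dC, n)` with a 2D grid covering the output matrix, and time each instantiation separately. Making the factor a template parameter keeps every variant in one source while still giving the compiler a constant trip count to unroll.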