Abstract: Knowledge Distillation (KD) is a widely used model compression technique that primarily transfers knowledge by aligning the predictions of a student model with those of a teacher model.
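For concreteness, the prediction alignment the abstract refers to is usually implemented as the standard temperature-scaled KD objective of Hinton et al. (2015); a minimal sketch follows, with the temperature `T` and mixing weight `alpha` as illustrative hyperparameters rather than values taken from this paper:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic logit-matching KD loss: soft-target KL + hard-target CE."""
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to offset the temperature softening
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```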