In reading, switching from a sentence that primarily describes information in one modality to text describing information in another modality has been shown to lead to an increase in ...
[Figures 2 and 3 in the paper] Generate multimodal data (data generation inspired from here) and apply cross-modal KD. [Table 2 in the paper] Modify γ in the data and observe the performance differences ...
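
The two steps above (generate multimodal data, apply cross-modal KD, then vary γ) can be illustrated with a minimal NumPy sketch. It is not the repository's code. Everything in it is an assumption made for illustration: the function names `make_multimodal_data`, `fit_logistic`, and `student_accuracy`, the Gaussian features, the logistic-regression teacher and student, and the reading of γ (`gamma`) as the fraction of label-deciding feature dimensions that both modalities share. Temperature scaling is omitted and accuracy is measured on the training data to keep the example short.

```python
import numpy as np


def make_multimodal_data(n, gamma, d_shared=10, d_private=10, d_decisive=10, seed=0):
    """Hypothetical generator for two modalities, a and b.

    Both modalities contain the same `d_shared` features plus their own private
    features. ASSUMPTION: `gamma` in [0, 1] is the fraction of the label-deciding
    dimensions drawn from the shared block; the rest come from modality a's
    private block, which the student (modality b) never sees.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if d_decisive > min(d_shared, d_private):
        raise ValueError("d_decisive must not exceed d_shared or d_private")
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n, d_shared))
    private_a = rng.normal(size=(n, d_private))
    private_b = rng.normal(size=(n, d_private))
    k = int(round(gamma * d_decisive))  # decisive dims taken from the shared block
    score = shared[:, :k].sum(axis=1) + private_a[:, : d_decisive - k].sum(axis=1)
    y = (score > 0).astype(float)
    x_a = np.hstack([shared, private_a])  # teacher modality
    x_b = np.hstack([shared, private_b])  # student modality
    return x_a, x_b, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, targets, lr=0.1, steps=500):
    """Plain gradient descent on cross-entropy; `targets` may be soft probabilities."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        w -= lr * x.T @ (sigmoid(x @ w) - targets) / len(x)
    return w


def student_accuracy(gamma, alpha=0.5, n=2000):
    """Train a teacher on modality a, distill into a student on modality b."""
    x_a, x_b, y = make_multimodal_data(n, gamma)
    w_teacher = fit_logistic(x_a, y)
    soft = sigmoid(x_a @ w_teacher)  # teacher's soft predictions
    # Cross-entropy is linear in the targets, so mixing hard and soft targets
    # equals alpha * CE(y, p) + (1 - alpha) * CE(soft, p).
    w_student = fit_logistic(x_b, alpha * y + (1 - alpha) * soft)
    pred = sigmoid(x_b @ w_student) > 0.5
    return float((pred == (y > 0.5)).mean())


if __name__ == "__main__":
    for gamma in (0.0, 0.5, 1.0):
        print(f"gamma={gamma:.1f}  student accuracy={student_accuracy(gamma):.3f}")
```

Under these assumptions, the loop at the bottom mirrors the "modify γ and observe the performance differences" step: when `gamma` is 0 the label depends only on features the student cannot see, and when it is 1 the label depends only on shared features.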