LanguageBind is a language-centric multimodal pretraining approach that uses language as the bind across different modalities, because the language modality is well explored and contains rich ...
Thanks for releasing the code! I was reading your paper but still have some questions about LanguageBind as used in Video-LLaVA: Were the weights of the image/video encoder initialized from ...