vision-encoder/
├── models/            # encoder loading & config
│   ├── mae.py
│   ├── clip.py
│   ├── dino.py
│   ├── ...
│   └── registry.py    # unified model registry
├── wrappers/          # normalize encoder outputs to common ...
I had another question about the details of the vision encoder. To my understanding, when training LLaVA, you freeze the vision encoder and only train the projection layer that maps the latent feature ...
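The setup described above can be sketched in PyTorch: freeze every parameter of the vision encoder and hand only the projection layer's parameters to the optimizer. This is a minimal illustration, not the actual LLaVA code; the module names and dimensions here are made up for the example.

```python
# Hedged sketch of LLaVA-style stage-1 training: freeze the vision
# encoder, train only the projector. The encoder stand-in and all
# dimensions are illustrative assumptions, not the real architecture.
import torch
import torch.nn as nn

vision_encoder = nn.Sequential(  # stand-in for a pretrained ViT (e.g. CLIP)
    nn.Linear(32, 64),
    nn.GELU(),
)
projector = nn.Linear(64, 128)   # maps vision features into the LLM embedding space

# Freeze every encoder parameter so no gradients are computed or applied to it.
for p in vision_encoder.parameters():
    p.requires_grad = False

# The optimizer sees only the projector's parameters.
opt = torch.optim.AdamW(projector.parameters(), lr=1e-4)

x = torch.randn(4, 32)           # dummy batch of image features
with torch.no_grad():            # encoder runs in inference mode
    feats = vision_encoder(x)
out = projector(feats)
loss = out.pow(2).mean()         # dummy loss for the sketch
loss.backward()
opt.step()

trainable = [n for n, p in vision_encoder.named_parameters() if p.requires_grad]
print(trainable)  # -> [] : the encoder stays frozen
```

Because the encoder's forward pass runs under `torch.no_grad()`, no activations are kept for backprop through it, which is also where the memory savings of freezing come from.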