Mistral.rs attempts to automatically load a chat template from the tokenizer_config.json file. This enables high flexibility across instruction-tuned models and ensures accurate chat templating.
In situ quantization works by quantizing non GGUF or GGML models in-place. This allows you to take advantage of flash attention, and reduces memory footprint when running the model. Currently, all ...