This repository contains the code and scripts needed to deploy a Hugging Face retrieval model such as multilingual-e5-large using NVIDIA's Triton Inference Server. The guide covers every step ...
The first step is to set up an appropriate module structure. In this case, all modules ending in `_api` define abstract interfaces, and their concrete implementations live in the corresponding modules without the `_api` suffix, as sketched below.
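For illustration, a minimal sketch of that convention (the module, class, and method names here are hypothetical examples, not the repository's actual files):

```python
# embedder_api.py -- hypothetical "_api" module holding the abstract interface
from abc import ABC, abstractmethod


class EmbedderApi(ABC):
    """Abstract interface for a text-embedding backend."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""


# embedder.py -- concrete implementation, same name without the "_api" suffix
from sentence_transformers import SentenceTransformer


class Embedder(EmbedderApi):
    def __init__(self, model_name: str = "intfloat/multilingual-e5-large"):
        self.model = SentenceTransformer(model_name)

    def embed(self, texts: list[str]) -> list[list[float]]:
        # encode returns a numpy array; convert to plain lists for callers
        return self.model.encode(texts, normalize_embeddings=True).tolist()
```

Keeping the abstract interface in a separate `_api` module lets callers depend on the interface alone, so the Triton-backed implementation can be swapped for a local or mock embedder in tests without changing client code.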