- Custom inference logic, 2× faster than FastAPI
- Agents, RAG, pipelines, and more
- Custom logic + control
- Any PyTorch model
- Self-host or managed
- Multi-GPU autoscaling
- Batching + streaming
- BYO model or vLLM
- No ...