FriendliAI — founded by the researcher behind continuous batching, the technique at the core of vLLM — is launching ...
Think of continuous batching as the LLM world's turbocharger: it keeps GPUs busy nonstop and can crank out results up to 20x faster. I discussed how PagedAttention cracked the code on LLM memory chaos ...
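To make the turbocharger analogy concrete, here is a toy simulation of the scheduling idea behind continuous batching. This is not vLLM's or FriendliAI's actual scheduler or API; the function, request tuples, and numbers are all illustrative. The key point it demonstrates: when one request in a batch finishes early, its slot is refilled immediately from the waiting queue instead of idling until the whole batch completes.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy simulation (not a real serving API): each step decodes one
    token per active request; finished requests free their slot at once,
    and waiting requests join mid-flight, so the GPU batch never drains."""
    waiting = deque(requests)   # (request_id, tokens_to_generate)
    active = {}                 # request_id -> tokens still to decode
    steps, finished = 0, []
    while waiting or active:
        # Refill freed slots immediately: the core idea of continuous batching.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        steps += 1
        for rid in list(active):
            active[rid] -= 1    # decode one token for each active request
            if active[rid] == 0:
                del active[rid] # slot frees mid-batch, no waiting on stragglers
                finished.append((rid, steps))
    return steps, finished

# Five requests of uneven lengths, batch size 2.
steps, finished = continuous_batching(
    [("a", 2), ("b", 5), ("c", 3), ("d", 1), ("e", 4)], max_batch=2
)
```

With static batching the same workload would pay for each batch's longest request (5 + 3 + 4 = 12 decode steps); the continuous scheduler finishes in 9, which is where the throughput gains come from, even if the "up to 20x" figure depends heavily on workload and hardware.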