What is this? MT-Bench is a benchmark of 80 challenging multi-turn questions spanning 8 categories (writing, roleplay, reasoning, math, coding, extraction, STEM, humanities). This pipeline automates ...
⚠️ Run this in Docker or a virtual machine! You're welcome to disregard this message, but if you do, and the AI decides that the best course of action for its task is to build a command to format your ...