Modelbench Tutorial Texture

Run safety benchmarks against AI models and view detailed reports showing how well they performed.

The current public practice benchmark uses LlamaGuard to evaluate the safety of responses. For now, you will need a Together AI account to use it. For 1.0, we test models on a variety of services; if ...

GitHub

add-a-sut.md

There are several ways to add a new SUT (system under test) to ModelBench; this example shows the easiest. It assumes that you want to create your own SUT -- a process that is ...
