Modelbench Tutorial

Run safety benchmarks against AI models and view detailed reports showing how well they performed.

The current public practice benchmark uses LlamaGuard to evaluate the safety of responses. For now you will need a Together AI account to use it. For 1.0, we test models on a variety of services; if ...

This is an MLCommons project, part of the AI Risk & Reliability Working Group. The project is at an early stage. You can see sample benchmarks here and our 0.5 white paper here. This project now ...