This codebase implements a pipeline that uses Large Language Models (LLMs) to evaluate the quality of Multiple Choice Questions (MCQs) by rating each question against a set of quality criteria.
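As a rough illustration of what rating an MCQ against a set of criteria could look like, here is a minimal sketch. It is not the codebase's actual API: the `MCQ` class, the criterion names, the 1 to 5 scale, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class MCQ:
    """A multiple choice question (hypothetical structure for this sketch)."""
    stem: str
    options: List[str]
    answer_index: int


# Hypothetical criteria; the real pipeline's criteria may differ.
CRITERIA = ("clarity", "single_correct_answer", "plausible_distractors")

LETTERS = "ABCDEFGH"


def build_prompt(mcq: MCQ, criterion: str) -> str:
    """Format one rating request for one criterion (assumed prompt wording)."""
    options = "\n".join(
        f"{LETTERS[i]}. {opt}" for i, opt in enumerate(mcq.options)
    )
    return (
        f"Rate the following multiple choice question on '{criterion}' "
        "from 1 (poor) to 5 (excellent). Reply with a single digit.\n\n"
        f"{mcq.stem}\n{options}\n"
        f"Correct answer: {LETTERS[mcq.answer_index]}"
    )


def parse_rating(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first ASCII digit in the reply that lies in [low, high]."""
    for ch in reply:
        if ch in "0123456789" and low <= int(ch) <= high:
            return int(ch)
    return None  # the reply contained no usable rating


def rate_mcq(
    mcq: MCQ,
    llm: Callable[[str], str],
    criteria: Sequence[str] = CRITERIA,
) -> Dict[str, Optional[int]]:
    """Ask the model for one rating per criterion and collect the results."""
    return {c: parse_rating(llm(build_prompt(mcq, c))) for c in criteria}


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        return "Rating: 4"

    question = MCQ("What is 2 + 2?", ["3", "4", "5"], answer_index=1)
    print(rate_mcq(question, fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; a real client would be wrapped in a function with the same signature.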