Large Language Models Benchmarks

Measuring What Matters in Large Language Model Performance

As large language models (LLMs) gain momentum worldwide, there’s a growing need for reliable ways to measure their performance. Benchmarks that evaluate LLM outputs allow developers to track ...

The Phnom Penh Post

How benchmarks shape AI battlefield—and where South Korea’s models stand

SEOUL – AI has swept across the tech industry, powering chatbots, search engines and productivity tools. OpenAI’s ChatGPT — which first ignited the global buzz in November 2022 — and other big tech ...

Earth.com

AI can feign moral reasoning by repeating online language patterns

Scientists warn that current AI tests reward polite responses rather than real moral reasoning in large language models.

moneycontrol.com

Sarvam AI launches 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash ...

Bengaluru-based AI startup Sarvam AI on February 18 announced the launch of two new large language models, a 30-billion-parameter model and a 105-billion-parameter model, both ...

15 days ago on MSN

Scientists found AI’s fatal flaw—the most advanced models are failing basic logic tests

Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.

15 days ago on MSN

Sarvam unveils two new large language models focused on real-time use, advanced reasoning

The company said the model is optimised for “efficient thinking”, delivering stronger responses while using fewer tokens — a key factor in reducing inference costs in production environments.
