Real-world LLM benchmarks testing Speed, Quality, and Responsiveness (Hard Mode active)
| RK | PROVIDER | MODEL | SCORE | SPEED | TPS (tok/s) | TTFT | COST | JSON | LOGIC | CODE | OUTPUT (PREVIEW) | LINK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
The Braiain Speed Index has evolved. Standard benchmarks have become too easy for SOTA models, resulting in score saturation. Protocol v5.0 ("Hard Mode") introduces strict constraints, RAG-simulation, and data processing tasks designed to break "lazy" models.
Requests begin with a dense, multi-format payload containing server logs interleaved with narrative text. This forces the model to perform "Needle in a Haystack" retrieval and context switching before generation begins.
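To make that retrieval step concrete, here is a minimal sketch of how such a payload could be assembled. The helper name, log prefix, and example needle are illustrative assumptions, not the repo's actual generator:

```python
import random

# Hypothetical "needle": one log line the model must later retrieve.
NEEDLE = "[LOG] ERROR 503: upstream timeout on node-7"

def build_payload(narrative_paragraphs: list[str], log_lines: list[str]) -> str:
    # Interleave prose with log noise, then splice the needle in at a
    # random position so its location never becomes predictable.
    chunks = []
    for para, line in zip(narrative_paragraphs, log_lines):
        chunks.append(para)             # narrative filler
        chunks.append(f"[LOG] {line}")  # log noise
    chunks.insert(random.randrange(len(chunks) + 1), NEEDLE)
    return "\n\n".join(chunks)
```

The model is then questioned about the buried line, so it must switch between prose and log formats before it can answer.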
Models must complete three distinct tasks in a single pass. Unlike previous versions, these tasks now include strict negative constraints ("Simon Says" rules) that stop models from succeeding on memorized answers alone.
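"Simon Says" rules are constraints on what the output must *not* contain. A minimal sketch of how such a checker could work follows; the two rules below are invented examples, and the real rule set lives in the benchmark itself:

```python
import re

# Illustrative negative-constraint checker. These two rules are made-up
# examples of "Simon Says" constraints, not the benchmark's actual rules.
FORBIDDEN = {
    "no_code_fences": re.compile(r"`{3}"),                    # "no markdown fences"
    "no_word_certainly": re.compile(r"\bcertainly\b", re.I),  # "never say 'certainly'"
}

def broken_rules(output: str) -> list[str]:
    """Return the name of every negative constraint the output violates."""
    return [name for name, pattern in FORBIDDEN.items() if pattern.search(output)]
```

Because a memorized, generic answer will almost always trip at least one such rule, the constraints reward instruction-following over recall.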
This entire benchmark is open-source. Clone the repo, add your API keys, and run `python benchmark.py` locally. Compare your results to ours and see if geographic location affects performance.
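The internals of `benchmark.py` are not reproduced here, but for orientation, this is one minimal way to measure the table's two latency columns, TTFT and TPS, against any OpenAI-compatible streaming endpoint. The model name and prompt are placeholders:

```python
import time
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from env

client = OpenAI()

def measure(model: str, prompt: str) -> tuple[float, float]:
    """Return (TTFT in seconds, tokens/second) for one streamed completion."""
    start = time.perf_counter()
    first = None   # assumes at least one content token arrives
    pieces = []
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()  # first token arrives: TTFT endpoint
            pieces.append(chunk.choices[0].delta.content)
    end = time.perf_counter()
    # Chunk count only approximates token count; tokenization varies by model.
    return first - start, len(pieces) / (end - first)

print(measure("gpt-4o-mini", "Reply with a short haiku."))  # placeholder model
```

Running a snippet like this from endpoints in different regions is the quickest way to see the geographic effect mentioned above.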
📦 VIEW ON GITHUB