LLM Test - Pangram Substitution Puzzle
Test and benchmark large language models (LLMs) with our interactive pangram-substitution puzzle. Measure LLM reasoning, prompt handling, and problem-solving skills in real time with a browser-based AI evaluation tool that doubles as a fun logic game.
Jan 13, 2026 v0.1.1
The quick fox pangram lookup test. Create a language that maps real words to words within the target word list. Then ask the LLM to work out the underlying sentence and fill in the missing word(s).
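The idea described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual implementation: the function names (`build_cipher`, `make_puzzle`, `format_prompt`), the contents of `TARGET_WORDS`, the blank marker, and the prompt wording are all assumptions made for the example.

```python
import random

# The well-known pangram the test is named after.
PANGRAM = "the quick brown fox jumps over the lazy dog"

# Hypothetical target word list; the real tool's list is not published here.
TARGET_WORDS = [
    "apple", "river", "candle", "mirror", "garden", "window",
    "pencil", "bridge", "silver", "planet", "forest", "button",
]


def build_cipher(sentence, target_words, seed=0):
    """Map each distinct real word to a distinct word from target_words."""
    unique = sorted(set(sentence.split()))
    if len(target_words) < len(unique):
        raise ValueError("target word list is too short")
    rng = random.Random(seed)  # seeded so a puzzle can be reproduced
    substitutes = rng.sample(target_words, len(unique))
    return dict(zip(unique, substitutes))


def make_puzzle(sentence, cipher, hidden_index):
    """Encode the sentence word by word and blank out one position."""
    encoded = [cipher[word] for word in sentence.split()]
    answer = encoded[hidden_index]
    encoded[hidden_index] = "____"
    return " ".join(encoded), answer


def format_prompt(cipher, puzzle):
    """Build an assumed prompt: the lookup table plus the encoded sentence."""
    lookup = "\n".join(f"{real} -> {sub}" for real, sub in sorted(cipher.items()))
    return (
        f"Lookup table:\n{lookup}\n\n"
        f"Encoded sentence: {puzzle}\n"
        "Work out the original sentence, then give the word that fills the blank."
    )


cipher = build_cipher(PANGRAM, TARGET_WORDS, seed=1)
puzzle, answer = make_puzzle(PANGRAM, cipher, hidden_index=3)  # hides "fox"
print(format_prompt(cipher, puzzle))
```

In this sketch the model has to decode the visible words, recognise the pangram, and then map the missing real word back into the substitution language. The expected reply is the value held in `answer`.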
| Make | Level | | |
|---|---|---|---|
| ChatGPT | 4 | +1 | |
| ChatGPT | 5 | +2 | |
| ChatGPT | 5.1 | +0 | |
| Gemini | 2 | +1 | |
| Gemini | 2.5 | +2 | |
| Gemini | 3 | +2 | |
| Grok | 4 | +3 | |
| Kimi | k2 | +3 | |
| MiniMax | M2.7 | +0 | |
| Sonnet | 4 | Unable | -1 |
Levels measure the reasoning models on the auto setting (no super or pro plans). The score represents total reasoning capacity. It does not measure cost, and it does not account for extra computation: some models may have access to tools while others may not. Test results were taken via the chat app interface.
Email llmtest@snowdon.dev to sign up for the newsletter, which contains periodic results.