LLM Test - Pangram Substitution Puzzle
Benchmark and test large language models (LLMs) with this unique Anti-Cryptography puzzle. Measure LLM reasoning, prompt handling, and problem-solving skills in real time with a browser-based AI evaluation tool. Try it yourself to see how your favorite models stack up against a true logic challenge!
Jun 3, 2026 v0.1.2
Make sure to pick a unique seed number to ensure the LLM has not already solved the puzzle.
Waiting for a response...
The LLM has read the puzzle...
Generating new tests is disabled until the LLM has responded.
Check The Answer
Can the artificial intelligence reason?
Did the AI manage to answer correctly? Remember to give it a unique seed.
The quick fox pangram lookup test. Create a language that maps real words to words within the target word list. Then ask the LLM to figure out the sentence and then fill in the missing word(s).
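The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual generator: the pangram wording, the `TARGET_WORDS` list, the `make_puzzle` and `check_answer` names, and the `____` blank marker are all hypothetical. It shows how a seed could drive both the word mapping and the choice of the missing word, so that a fresh seed yields a puzzle the LLM has not seen.

```python
import random

# Assumed pangram; the source only calls it "the quick fox pangram".
PANGRAM = "the quick brown fox jumps over the lazy dog"

# Hypothetical target word list; the real list is not given in the source.
TARGET_WORDS = ["apple", "river", "stone", "cloud", "maple",
                "ember", "frost", "dune", "lark", "moss"]


def make_puzzle(sentence, target_words, seed):
    """Build a seeded substitution puzzle with one hidden word."""
    rng = random.Random(seed)  # same seed -> same puzzle
    words = sentence.split()
    unique = sorted(set(words))
    # Map each real word to a distinct word from the target list.
    mapping = dict(zip(unique, rng.sample(target_words, len(unique))))
    encoded = [mapping[w] for w in words]
    # Hide one position; the solver must decode the rest and fill it in.
    hidden = rng.randrange(len(words))
    answer = words[hidden]
    encoded[hidden] = "____"
    return {"mapping": mapping, "encoded": " ".join(encoded), "answer": answer}


def check_answer(puzzle, guess):
    """Compare a guess with the hidden real word, ignoring case and spaces."""
    return guess.strip().lower() == puzzle["answer"]


if __name__ == "__main__":
    puzzle = make_puzzle(PANGRAM, TARGET_WORDS, seed=12345)
    print(puzzle["mapping"])
    print(puzzle["encoded"])
```

A solver would invert `mapping`, decode the visible words, recognise the sentence, and supply the word behind the blank.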
| Make | Level | Score | |
|---|---|---|---|
| ChatGPT | 4 | +1 | |
| ChatGPT | 5 | +2 | |
| ChatGPT | 5.1 | +0 | |
| ChatGPT | 5.5 | +0 | |
| Gemini | 2 | +1 | |
| Gemini | 2.5 | +2 | |
| Gemini | 3 | +2 | |
| Gemini | 3.5 | -2 | |
| Grok | 4 | +3 | |
| Kimi | k2 | +3 | |
| MiniMax | M2.7 | +0 | |
| Sonnet | 4 | Unable | -1 |
Levels refer to the reasoning models, tested on the auto setting (no super pro plans).
Score represents total reasoning capacity. It does not measure cost, and it does not account for extra computation: some models may have access to tools while others may not. Test results were taken via the chat app interface.
Now with Web MCP integration. Just ask your Web LLM to solve the current puzzle.
AI Answer Prevented
The LLM/AI tried to answer a question without reading the puzzle.
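The status messages on this page suggest a simple gate: an answer is rejected unless the puzzle was read first, and new tests stay locked until the LLM responds. A minimal Python sketch of that gating follows. The `PuzzleSession` class and its method names are hypothetical, and the exact moment at which the lock begins (here, once the puzzle is read) is an assumption rather than documented behaviour.

```python
class PuzzleSession:
    """Hypothetical session state for one seeded puzzle."""

    def __init__(self, seed):
        self.seed = seed
        self.has_read = False       # set once the LLM fetches the puzzle
        self.has_responded = False  # set once a valid answer attempt arrives

    def read_puzzle(self):
        self.has_read = True
        return "The LLM has read the puzzle..."

    def submit_answer(self, guess, answer):
        # Reject answers given without reading the puzzle first.
        if not self.has_read:
            return "AI Answer Prevented"
        self.has_responded = True
        return "correct" if guess.strip().lower() == answer else "incorrect"

    def can_generate_new_test(self):
        # Assumption: locked between reading the puzzle and responding.
        return not self.has_read or self.has_responded


if __name__ == "__main__":
    session = PuzzleSession(seed=12345)
    print(session.submit_answer("dog", "dog"))
    print(session.read_puzzle())
    print(session.submit_answer("dog", "dog"))
```

Keeping the check on the session side means a guess made before reading never counts as a response, whatever the client sends.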