Large Language Model Testing Tool

The definitive proof AI can't reason. A browser-based tool for testing and evaluating large language models in real time.

Jul 11, 2025

The quick brown fox pangram lookup test. Create a language that maps real words to words drawn from the target word list. Then ask the LLM to work out the sentence and fill in the last word. Pick a unique seed number so that the engine has not already read the solved puzzle.
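The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the names `mulberry32`, `shuffle`, `buildPuzzle`, and `Puzzle` are hypothetical, the choice of PRNG is an assumption, and so is the decision to treat the withheld last real word as the expected answer.

```typescript
// Hypothetical sketch: none of these names come from the node-llm-test repository.

/** Small deterministic PRNG (mulberry32), so the same seed yields the same puzzle. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Fisher-Yates shuffle driven by the seeded generator; returns a new array. */
function shuffle<T>(items: readonly T[], rng: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

interface Puzzle {
  mapping: Map<string, string>; // real word -> substitute word from the list
  encoded: string[];            // the sentence in substitute words, last word withheld
  expected: string;             // the withheld real word (assumed answer format)
}

/** Map each distinct real word to a distinct word from the target word list. */
function buildPuzzle(sentence: string, wordList: readonly string[], seed: number): Puzzle {
  const words = sentence.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) {
    throw new Error("sentence must contain at least one word");
  }
  const unique = Array.from(new Set(words));
  const candidates = Array.from(new Set(wordList)); // duplicates would break the one-to-one mapping
  if (candidates.length < unique.length) {
    throw new Error("word list is too short for this sentence");
  }
  const substitutes = shuffle(candidates, mulberry32(seed));
  const mapping = new Map<string, string>();
  unique.forEach((word, i) => mapping.set(word, substitutes[i] as string));
  const encoded = words.slice(0, -1).map((w) => mapping.get(w) as string);
  return { mapping, encoded, expected: words[words.length - 1] as string };
}
```

Calling `buildPuzzle("The quick brown fox jumps over the lazy dog", wordList, seed)` twice with the same word list and seed returns the same mapping and encoded sentence. Changing either input produces a different puzzle.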

Generate the output, pass it to a large language model, and then check the result.
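Checking the result can be as simple as comparing the model's reply with the expected answer. The sketch below is one lenient way to do that, under the assumptions that the expected answer is a single ASCII word and that the model states it last. `normalize` and `checkAnswer` are hypothetical helpers, not functions from the repository.

```typescript
// Hypothetical helpers for illustration; the real tool may compare answers differently.

/** Lowercase the text, replace punctuation with spaces, and collapse whitespace. */
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .join(" ");
}

/** Accept the reply when its final word matches the expected answer. */
function checkAnswer(reply: string, expected: string): boolean {
  const tokens = normalize(reply).split(" ");
  return tokens[tokens.length - 1] === normalize(expected);
}
```

With these helpers, `checkAnswer("The answer is: Dog.", "dog")` returns `true` and `checkAnswer("cat", "dog")` returns `false`. A stricter check could require the whole reply to equal the expected word.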

While the input and seeding are deterministic, be careful: the word list may change if you reload the page and the server refreshes the list.

Over time, LLMs learn to answer correctly for common inputs. However, they struggle to produce the desired output with less common seeds. Using Deep Research, which incorporates search and tool capabilities, can help the LLM find the answer. For instance, a search based on a phrase like "The quick brown fox..." will directly yield the sentence, indicating that the LLM can retrieve a correct result without requiring significant intelligence. Conversely, "think longer" prompt types, where the LLM is run repeatedly, demand genuine thinking or complex reasoning and therefore often fail.

The code can be found at the repository: github.com/snowdon-dev/node-llm-test