Simple questions too hard?

#22
by urtuuuu - opened

I wonder why this latest top model, which has the top Chatbot Arena score, fails at old, simple questions like:

  1. Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Be concise.
  2. I have 50 books. I read 7 of them, how many books i have left?

That's just two examples from my test questions, but there are probably many more.

Gemini 2.0 Flash can't answer this question either
image.png

I mean, all this time I thought this question wasn't hard at all. Almost all small models can answer it, probably even 3B or 4B models.

Q: Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Reason step by step. Then end the answer with a single digit.

A:
Here's how to solve this problem step-by-step:

  1. Calculate how many candles can be made with the wax: 500 grams of wax / 125 grams/candle = 4 candles
  2. Calculate how many candles can be made with the wicks: 3 wicks / 1 wick/candle = 3 candles
  3. Determine the limiting factor: You can make 4 candles with the wax, but only 3 with the wicks. Therefore, the wicks are the limiting factor.

You can make 3 candles.

3
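
For reference, the arithmetic in the answer above is just a limiting-resource calculation: whichever supply runs out first caps the total. A minimal Python sketch, where `max_candles` and its parameter names are hypothetical and chosen only for illustration:

```python
def max_candles(wax_grams: int, wicks: int,
                wax_per_candle: int = 125, wicks_per_candle: int = 1) -> int:
    """Return how many whole candles can be made from the given supplies."""
    by_wax = wax_grams // wax_per_candle      # 500 // 125 = 4
    by_wicks = wicks // wicks_per_candle      # 3 // 1 = 3
    # The scarcer resource limits the total.
    return min(by_wax, by_wicks)


print(max_candles(500, 3))  # 3
```

With 500 grams of wax and 3 wicks, the wicks are the limiting factor, which matches the step-by-step answer.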

I've just tried the first question on Gemini 2.0 Flash and DeepSeek, and both of them answered it correctly (DeepSeek wasn't concise, though).
The second question sounds ambiguous. As an example, this is an answer from Perplexity:

perplexity.jpg

"Be concise." throws it off; if you don't add that, it answers just fine. This is an example of bad prompting, and there are probably many more bad prompts that could be added to it.


It wasn't me who created this prompt, but all the models I tested in recent years usually answered correctly.
I just asked Phi-4, Llama 3.1 8B, and Qwen 2.5 7B: all answered 3.
For the second question about books, Qwen Turbo, 7B, and 14B all answer that 50 books are left but 43 are unread.
And of course, none of the big models should fail the first question.
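
The two readings of the books question can be written out explicitly. This is a small sketch, and the variable names are only illustrative:

```python
books_owned = 50
books_read = 7

# Reading 1: reading a book does not remove it, so every book is still owned.
books_left = books_owned
# Reading 2: "left" is taken to mean "left to read".
books_unread = books_owned - books_read

print(books_left, books_unread)  # 50 43
```

An answer that gives both numbers, as the Qwen models did, covers both interpretations.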
