Simple questions too hard?

#22

by urtuuuu - opened 3 days ago

Discussion

urtuuuu

3 days ago

I wonder why this latest model, which has the top Chatbot Arena score, fails at old, simple questions like:

Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Be concise.
I have 50 books. I read 7 of them, how many books i have left?

That's just two examples from my test questions, but there are probably many more.

moekuro

2 days ago

Gemini 2.0 Flash can't answer this question either.

urtuuuu

2 days ago

edited 2 days ago

I mean, all this time I thought this question wasn't hard at all. Almost all small models can answer it, probably even 3B or 4B models.

hoogenm

1 day ago

Q: Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Reason step by step. Then end the answer with a single digit.

A:
Here's how to solve this problem step-by-step:

Calculate how many candles can be made with the wax: 500 grams of wax / 125 grams/candle = 4 candles
Calculate how many candles can be made with the wicks: 3 wicks / 1 wick/candle = 3 candles
Determine the limiting factor: You can make 4 candles with the wax, but only 3 with the wicks. Therefore, the wicks are the limiting factor.

You can make 3 candles.
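
The limiting-factor reasoning in the answer above can be checked with a few lines of Python. This is only a minimal sketch: the function name `max_candles` and its parameters are made up for illustration, and it assumes whole candles only (no partial candles from leftover wax).

```python
def max_candles(wax_grams: int, wicks: int,
                wax_per_candle: int = 125, wicks_per_candle: int = 1) -> int:
    """Return how many whole candles can be made from the given supplies."""
    # How many candles each resource could supply on its own.
    by_wax = wax_grams // wax_per_candle      # 500 // 125 = 4
    by_wicks = wicks // wicks_per_candle      # 3 // 1 = 3
    # The scarcer resource is the limiting factor.
    return min(by_wax, by_wicks)


print(max_candles(500, 3))  # 3
```

With 500 grams of wax and 3 wicks, the wicks run out first, so the result is 3 even though the wax alone would cover 4.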

segaa

1 day ago

I've just tried the first question on Gemini 2.0 Flash and DeepSeek; both of them answered it correctly (DeepSeek wasn't concise, though).
The second question sounds ambiguous. This is, as an example, an answer from Perplexity.

Satya93

1 day ago

"Be concise." throws it off; if you don't add that, it answers just fine. This is an example of bad prompting. There are probably many more examples of bad prompts to add to this.

urtuuuu

about 19 hours ago

edited about 19 hours ago

> "Be concise." throws it off; if you don't add that, it answers just fine. This is an example of bad prompting. There are probably many more examples of bad prompts to add to this.

It wasn't me who created this prompt, but all the models I tested in recent years usually answered correctly.
I just asked Phi-4, Llama 3.1 8B, and Qwen 2.5 7B: all answered 3.
For the second question about books, Qwen Turbo, 7B, and 14B all answer that 50 books are left but 43 are unread.
And of course none of the big models fail the first question.
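
The two numbers in the Qwen answers come from two readings of "left" in the book question, which a small sketch makes explicit. The function name `books_after_reading` and the dictionary keys are hypothetical, and it assumes that reading a book does not remove it from the collection.

```python
def books_after_reading(owned: int, read: int) -> dict:
    """Return both plausible readings of 'how many books are left'."""
    return {
        "still_owned": owned,      # reading a book does not remove it
        "unread": owned - read,    # books left to read: 50 - 7 = 43
    }


print(books_after_reading(50, 7))  # {'still_owned': 50, 'unread': 43}
```

Which value counts as correct depends on whether "left" is taken to mean "still in my possession" or "left to read".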
