No. It literally cannot count the number of R letters in strawberry. It says 2, but there are 3. ChatGPT had this problem, but it seems to be fixed. However, if you ask “are you sure?”, it says 2 again.
Ask ChatGPT to make an image of a cat without a tail. Impossible. Odd, I know, but it's one of those weird AI issues.
So… with all the supposed reasoning they can do, and the supposed “extrapolation of knowledge”, they cannot figure out that a tail is part of a cat, and which part it is.
Is this some meme?
Non-thinking prediction models can’t count the r’s in strawberry because of how tokenization works.
However, OpenAI o1 and DeepSeek R1 can both reliably do it correctly.
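To illustrate the tokenization point, here is a rough Python sketch. The subword split and the integer IDs below are made up for illustration and are not the output of any real tokenizer. The idea is just that the model receives a short list of IDs rather than individual letters, while counting characters directly is trivial.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter at the character level, ignoring case."""
    return word.lower().count(letter.lower())


# Hypothetical subword split (an assumption; real tokenizers may split differently).
hypothetical_tokens = ["str", "aw", "berry"]

# Made-up integer IDs standing in for what the model would actually receive.
vocab = {token: index for index, token in enumerate(hypothetical_tokens, start=100)}
token_ids = [vocab[token] for token in hypothetical_tokens]

print(count_letter("strawberry", "r"))  # character-level view: 3
print(token_ids)                        # token-level view: [100, 101, 102]
```

Nothing in the ID list says how many r's each piece contains, so the model has to have learned that from its training data instead of reading it off the input.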
I mean, I tested it out, even though I am sure you're trolling me, and DeepSeek correctly counts the R’s.
Because there aren’t enough pictures of tail-less cats out there to train on.
It’s literally impossible for it to give you a cat with no tail because it can’t find enough to copy and ends up regurgitating cats with tails.
Same for a glass of water spilling over: it can’t show you an overfilled glass of water because there aren’t enough pictures available for it to copy.
This is why telling a chatbot to generate a picture for you will never be a real replacement for an artist who can draw what you ask them to.
The “reasoning” models and the image generation models are not the same technology and shouldn’t be compared against the same baseline.
The “reasoning” you are seeing is it finding human conversations online and summarizing them.
I’m not seeing any reasoning, that was the point of my comment. That’s why I said “supposed”
Oh, that’s another good test. It definitely failed.
There are lots of Manx photos though.
Manx images: https://duckduckgo.com/?q=manx&iax=images&ia=images