- cross-posted to:
- apple_enthusiast@lemmy.world
They are large LANGUAGE models. It’s no surprise that they can’t solve those mathematical problems in the study. They are trained for text production. We already knew that they were no good at counting things.
“You see this fish? Well, it SUCKS at climbing trees.”
That’s not how you sell fish though. You gotta emphasize how at one time we were all basically fish and if you buy my fish for long enough, those fish will eventually evolve hands to climb!
“Premium fish for sale: GUARANTEED to never climb your trees”
The fun part isn’t even what Apple said - that the emperor is naked - but why it’s doing it. It’s a nice bullet against all four of its GAFAM competitors.
They’re a publicly traded company.
Their executives need something to point to so they can push back against pressure to jump on the trend.
This right here, this isn’t conscientious analysis of tech and intellectual honesty or whatever; it’s a calculated shot at its competitors, who are desperately trying to keep the generative AI house of cards from falling.
> The results of this new GSM-Symbolic paper aren’t completely new in the world of AI research. Other recent papers have similarly suggested that LLMs don’t actually perform formal reasoning and instead mimic it with probabilistic pattern-matching of the closest similar data seen in their vast training sets.
WTF kind of reporting is this, though? None of this is recent or new at all, like in the slightest. I am shit at math, but have a high-level understanding of statistical modeling concepts, mostly as of a decade ago, and even I knew this. I recall a stats PhD describing models as “stochastic parrots”: nothing more than probabilistic mimicry. It was obviously no different the instant LLMs came on the scene. If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…
> If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…
This is outrageous! I mean the pure gall of suggesting journalists should be something other than part of a human centipede!
> describing models as “stochastic parrots”
That is SUCH a good description.
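If anyone wants to see the “parrot” part made concrete, here’s a toy sketch of what probabilistic pattern-matching means in the crudest possible form: a bigram model that only ever predicts the next word from frequencies it saw in its training text. (This is obviously not how a transformer works internally, just the principle of next-token prediction shrunk down to something you can read in ten lines.)

```python
# Toy "stochastic parrot": predict the next word purely from how often it
# followed the previous word in the training text. No arithmetic, no logic.
import random
from collections import defaultdict, Counter

corpus = ("two plus two is four . two plus three is five . "
          "three plus three is six .").split()

follows = defaultdict(Counter)          # word -> Counter of words seen after it
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=5):
    out = [start]
    for _ in range(length):
        options = follows[out[-1]]
        if not options:
            break
        words, counts = zip(*options.items())
        # Sample the continuation in proportion to how often it was seen.
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("two"))  # can happily produce "two plus three is four ." - fluent, wrong
```

Scale the same objective up to billions of parameters and a web-sized corpus and the output gets vastly more fluent, but “predict the next token” is still the training goal, which is why something can sound right and still be wrong.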
Are the uncensored models more capable tho?
Given the use cases they were benchmarking I would be very surprised if they were any better.
> statistical engine suggesting words that sound like they’d probably be correct is bad at reasoning
How can this be??
Totally unexpectable!!!
antianticipatable!
astonisurprising!
I feel like a draft landed on Tim’s desk a few weeks ago, which would explain why they suddenly pulled back on OpenAI funding.
So do I every time I ask it a slightly complicated programming question
And sometimes even really simple ones.
Did anyone believe they had the ability to reason?
People are stupid, OK? I’ve had people who think that it can, in fact, do math “better than a calculator”.
Yes
> The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding “seemingly relevant but ultimately inconsequential statements” to the questions.
Good thing they’re being trained on random posts and comments on the internet, which are known for being succinct and accurate.
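For anyone who hasn’t read the paper, the modification is roughly this shape (an invented example in the spirit of the quoted description, not one of the paper’s actual GSM-Symbolic templates):

```python
# Sketch of the perturbation described above: take a grade-school word problem
# and append a detail that *sounds* relevant but changes nothing about the answer.
# (Made-up example for illustration; not the paper's actual template text.)
base = ("Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
        "How many kiwis does Oliver have?")

distractor = ("Five of the kiwis picked on Saturday were a bit smaller "
              "than average. ")

# Insert the no-op sentence right before the question.
perturbed = base.replace("How many", distractor + "How many")

print(perturbed)
# The answer is 44 + 58 = 102 either way. Something that actually reasons ignores
# the extra sentence; a pattern-matcher is tempted to subtract the "smaller" five,
# because that's what similar-looking training examples usually call for.
```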
Here’s the cycle we’ve gone through multiple times and are currently in:
- AI winter (low research funding)
- Incremental scientific advancement
- Breakthrough: multiple incremental advances to the scientific models build on each other and unlock new capabilities (expert systems, neural networks, LLMs, etc.)
- Engineering creates new tech products/frameworks/services based on the new science
- Hype for the new tech creates sales, economic activity, research funding, subsidies, etc.
- People become familiar with the new tech’s capabilities and limitations through use (for LLMs, we’re here)
- The hype spending bubble bursts when the overspend doesn’t pay off in “infinite money, line goes up” returns or new research breakthroughs
- Back to AI winter, and around again…
Someone needs to pull the plug on all of that stuff.