It’s not just a predictive text program. That’s been around for decades. That’s a common misconception.
As I understand it, it uses statistics from the whole text to create new text. It would be very rare to output “cats have feathers” because that phrase doesn’t ever appear in the training data. Both words “have feathers” never follow “cats”.
and that is exactly how a predictive text algorithm works.
some tokens go in
they are processed by a deterministic, static statistical model, and a set of probabilities (always the same, deterministic, remember?) comes out.
pick the word with the highest probability, add it to your initial string and start over.
if you want variety, add some randomness and don’t just always pick the most probable next token.
Coincidentally, this is exactly how llms work. It’s a big markov chain, but with a novel lossy compression algorithm on its state transition table. The last point is also the reason why, if anyone says they can fix llm hallucinations, they’re lying.
because that phrase doesn’t ever appear in the training data.
Eh but LLMs abstract. It has seen “<animal> have feathers” and “<animal> have fur” quite a lot of times. The problem isn’t that LLMs can’t reason at all, the problem is that they do employ techniques used in proper reasoning, in particular tracking context throughout the text (cross-attention) but lack techniques necessary for the whole thing, instead relying on confabulation to sound convincing regardless of the BS they spout. Suffices to emulate an Etonian but that’s not a high standard.
It’s not just a predictive text program. That’s been around for decades. That’s a common misconception.
As I understand it, it uses statistics from the whole text to create new text. It would be very rare to output “cats have feathers” because that phrase doesn’t ever appear in the training data. Both words “have feathers” never follow “cats”.
and that is exactly how a predictive text algorithm works.
some tokens go in
they are processed by a deterministic, static statistical model, and a set of probabilities (always the same, deterministic, remember?) comes out.
pick the word with the highest probability, add it to your initial string and start over.
if you want variety, add some randomness and don’t just always pick the most probable next token.
Coincidentally, this is exactly how llms work. It’s a big markov chain, but with a novel lossy compression algorithm on its state transition table. The last point is also the reason why, if anyone says they can fix llm hallucinations, they’re lying.
Everyone who says this doesn’t actually understand how LLMs work.
Multivector word embeddings create emergent relationships that’s new knowledge that doesn’t exist in the training dataset.
Computerphile did a good video on this well before the LLM craze.
Your “ackshyually” is missing the point.
Eh but LLMs abstract. It has seen “<animal> have feathers” and “<animal> have fur” quite a lot of times. The problem isn’t that LLMs can’t reason at all, the problem is that they do employ techniques used in proper reasoning, in particular tracking context throughout the text (cross-attention) but lack techniques necessary for the whole thing, instead relying on confabulation to sound convincing regardless of the BS they spout. Suffices to emulate an Etonian but that’s not a high standard.