There is some research being done with fine tuning 1-bit quants, and they seem pretty responsive to it. Of course you’ll never get a full generalist model out of it, but there’s some hope for tiny specialized models that can run on CPU for a fraction of the energy bill.
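For anyone curious what "1-bit" means mechanically: the common schemes keep only the sign of each weight plus a shared scale factor (a sketch in that spirit, not any specific paper's exact recipe):

```python
import numpy as np

def one_bit_quantize(W):
    # Sign-based quantization with a per-tensor "absmean" scale,
    # roughly in the style of BitNet-like schemes (illustrative only).
    scale = np.mean(np.abs(W))   # single scale for the whole tensor
    Wq = np.sign(W)              # each weight collapses to -1, 0, or +1
    return Wq, scale

def dequantize(Wq, scale):
    # Reconstruct an approximation of the original weights.
    return Wq * scale

W = np.random.randn(4, 4).astype(np.float32)
Wq, s = one_bit_quantize(W)
W_hat = dequantize(Wq, s)
```

Real implementations use per-group scales and quantization-aware training rather than this one-shot rounding, which is exactly why post-hoc fine-tuning helps so much.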
The big models are great marketing because their verbal output is believable, but they’re grossly overkill for most tasks.
Try using a 1-bit LLM to test the article’s claim.
The perplexity degradation is staggering, something like 75% of the quality lost or more. It effectively turns a 30-billion-parameter model into a 7-billion-parameter one.
Highly recommended that you try to replicate their results.
But since it takes 10% of the space (VRAM, etc.), it sounds like you could just start with a larger model and still come out ahead.
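The space claim is easy to sanity-check with back-of-envelope arithmetic (this ignores activations, the KV cache, and per-group scale factors, which is what pushes the real-world figure toward ~10% rather than the raw bit ratio):

```python
def model_bytes(params, bits):
    # Storage for the weights alone: parameter count times bits, in bytes.
    return params * bits / 8

fp16_30b = model_bytes(30e9, 16)    # 30B params at fp16: 60e9 bytes (~60 GB)
onebit_30b = model_bytes(30e9, 1)   # same model at 1 bit: 3.75e9 bytes (~3.75 GB)
ratio = onebit_30b / fp16_30b       # raw weight-storage ratio: 1/16 = 0.0625
```

So on weights alone, a 1-bit 30B model is smaller than a 7B fp16 model (~14 GB), which is the trade the parent comment is pointing at.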