Obviously there’s not a lot of love here for OpenAI and other corporate generative-AI APIs, but how does the community feel about self-hosted models? Especially stuff like the Linux Foundation’s Open Model Initiative?
I feel like a lot of people just don’t know there are Apache- and CC-BY-NC-licensed “AI” models they can run on sane desktops, right now, that are incredible. I’m thinking of the most recent Command-R, specifically. I can run it on one GPU, and it blows expensive API models away, and it’s mine to use.
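For anyone curious what “run it yourself” actually looks like: here’s a minimal sketch using llama-cpp-python with a quantized GGUF of Command-R, fully offloaded to a single GPU. The model filename and prompt are just illustrative; grab whatever quant actually fits your VRAM.

```python
# Minimal sketch: loading a quantized Command-R GGUF with llama-cpp-python.
# The model path is hypothetical; use whatever quant fits your card.
from llama_cpp import Llama

llm = Llama(
    model_path="./c4ai-command-r-Q4_K_M.gguf",  # example filename
    n_gpu_layers=-1,  # offload every layer to the single GPU
    n_ctx=8192,       # context window; lower it if you run out of VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the fediverse in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```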
And there are efforts to kill the power cost of inference and training with things like matrix-multiplication-free models, open-source and legally licensed datasets, cheap training… and OpenAI and such want to shut all of this down because it breaks their monopoly, where they can just outspend everyone: scaling up, stealing data, and destroying the planet. And it’s actually a threat to them.
Again, I feel like corporate social media vs. the fediverse is a good analogy, where one is kinda destroying the planet and the other, while still niche, problematic, and a WIP, kills a lot of the downsides.
I love the idea; I much prefer it to the mainstream. The problem is, the typical processes for documenting FOSS and self-hosted projects (websites, wikis, mailing lists, etc.) move too slowly and are too cumbersome for how fast things are developing right now. So people are having to invent the new tech and new ways to communicate about it at the same time, and they’re not always making choices that scale or are easy to find and reference.
Okay, since you seem to be so helpful here, I’ll lay out where I’m at. I’ve been using LLMs like ChatGPT, Copilot, and Bard more professionally. I find them equal parts useful, confusing, annoying, and skeevy. I’ve got a lil VPS I run for services; I could put a front end on there easily. I’ve also got an old 8-core Xeon machine with like 48GB RAM and a leftover AMD R9 270 sitting there with Unraid barely installed. I can change the OS, of course, but what am I realistically looking at being able to run locally that won’t go above like 60-75% usage, so I can still eventually get a couple game servers, network storage, and Jellyfin working? I’ll be honest, I don’t care about image generation much, but if I do, I can always look into upgrading.
Honestly, not much. Llama 8B, but very slowly, or maybe DeepSeek-V2 chat, with prompt processing on the 270 via Vulkan but the model mostly running on CPU. And I guess just limit it to 6 threads? I’d host it with kobold.cpp’s Vulkan backend, or maybe the llama.cpp server if there will be multiple users.
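To make that concrete: the llama.cpp server exposes an OpenAI-compatible HTTP API, so once it’s running (launched with your GGUF, a 6-thread cap, and partial Vulkan offload), anything on your network can query it. A rough sketch, assuming the server is on the default port 8080:

```python
# Sketch of querying a local llama.cpp server (OpenAI-compatible endpoint).
# Assumes llama-server is already running on localhost:8080 with your model.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Give me a one-line status check."}
        ],
        "max_tokens": 128,
    },
    timeout=120,  # CPU inference on an old Xeon will be slow
)
print(resp.json()["choices"][0]["message"]["content"])
```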
You can try them to see if they feel OK, but LLMs are just not something that likes old hardware. An RTX 3060 (or a Mac, or a 12GB+ AMD GPU) is considered the bare minimum in the community; a 3090 or 7900 XTX is standard.
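The rough math behind those minimums: a quantized model’s weights take about (parameters × bits per weight ÷ 8) bytes, plus a couple GB for KV cache and overhead. A back-of-the-envelope sketch (the overhead figure is an assumption; real usage varies with context length and quant format):

```python
# Back-of-the-envelope VRAM estimate for a quantized LLM.
# overhead_gb is a rough assumption covering KV cache and runtime buffers.
def est_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB of weights
    return weights_gb + overhead_gb

for name, params in [("Llama 8B", 8), ("Command-R 35B", 35)]:
    print(f"{name}: ~{est_vram_gb(params, 4.5):.1f} GB at ~4.5 bpw")

# Llama 8B: ~6.5 GB  -> fits a 12GB card like a 3060 comfortably
# Command-R 35B: ~21.7 GB -> needs a 24GB card like a 3090 / 7900 XTX
```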