Self-hosting LLMs

GreenSofaBed@lemmy.zip · 11 days ago

The Hobbyist@lemmy.zip · 11 days ago

I didn’t say it can’t, but I’m not sure how well it is optimized for that. In my initial testing it queued queries and submitted them to the model one after another. I haven’t seen it batch-compute the queries, though that may be a setup issue on my side. vLLM, on the other hand, is designed specifically for the multiple-concurrent-users use case and has several optimizations for it.
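For anyone who wants to check this on their own setup, here is a rough sketch of how the queue-versus-batch behaviour could be measured. It only times N requests sent back to back against the same N requests sent at once; it cannot tell true batching apart from other kinds of parallelism. The names `measure_speedup` and `make_http_sender` are made up for this example, and the URL and model name in the comments are placeholders you would need to adapt.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_http_sender(url, payload):
    """Build a zero-argument function that POSTs `payload` as JSON to `url`.

    Hypothetical helper: the URL and payload shape depend on your server.
    """
    body = json.dumps(payload).encode("utf-8")

    def send():
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # wait for the full response

    return send


def measure_speedup(send_request, n_requests=4):
    """Return sequential_time / concurrent_time for n identical requests."""
    # Baseline: one request at a time.
    start = time.perf_counter()
    for _ in range(n_requests):
        send_request()
    sequential = time.perf_counter() - start

    # Same number of requests, all in flight together.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(send_request) for _ in range(n_requests)]
        for future in futures:
            future.result()  # re-raises any request error
    concurrent = time.perf_counter() - start

    return sequential / concurrent


# Example usage (assumed endpoint and model name, adjust to your server):
# send = make_http_sender(
#     "http://localhost:8000/v1/completions",
#     {"model": "your-model-name", "prompt": "Hello", "max_tokens": 64},
# )
# print(measure_speedup(send, n_requests=8))
```

A result close to 1 suggests the server is handling requests one after another, while a result well above 1 suggests they are being processed together. Timings are noisy, so it is worth running it a few times with the same prompt and `max_tokens`.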

Avid Amoeba@lemmy.ca · 10 days ago

I see. Makes sense.