Point of pedantry: the Nano uses a Tegra X1 as its SoC, which has a Maxwell-generation GPU, not Kepler.
The new Jetson Orin Nano uses an Ampere GPU.
It depends.
What is your budget? And what hardware/hypervisor do you have?
And what specifically are you looking to do with “generative AI”? Ugh… I hate that term.
There are two key things to keep in mind about rack-mount GPUs. First, you need servers that were built at the factory specifically to host GPUs. Almost all of NVIDIA’s server-grade GPUs are passively cooled, so the server needs a fan configuration that can cool them. And except for the lowest-end server GPUs (P4/T4/A2/L4, all inference cards and all over $1000 per card), which draw no more than the 75 watts a PCIe slot provides, all of these GPUs draw at least 150 watts and need auxiliary power connectors and higher-wattage power supplies.
Second, some of the software for these GPUs, most notably the vGPU drivers for sharing a card between VMs, is locked behind NVIDIA licensing.
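To make the power arithmetic concrete, here is a rough Python sketch of the rule of thumb above: a full-size PCIe slot supplies 75 W, anything beyond that has to come from auxiliary connectors, and the combined draw has to fit in the PSU budget. The card names and wattages are hypothetical placeholders, not real specs, so check your card's datasheet.

```python
# Rough power-budget check for GPUs in a server. The 75 W slot figure comes from
# the rule of thumb above; the card names and wattages below are hypothetical.
SLOT_WATTS = 75  # power a full-size PCIe slot can supply on its own


def aux_watts_needed(card_watts: int, slot_watts: int = SLOT_WATTS) -> int:
    """Watts the card must pull from auxiliary power connectors."""
    return max(0, card_watts - slot_watts)


def power_summary(cards: dict) -> tuple:
    """Return (total GPU watts, sorted names of cards needing aux power)."""
    total = sum(cards.values())
    needs_aux = sorted(name for name, watts in cards.items()
                       if aux_watts_needed(watts) > 0)
    return total, needs_aux


# Hypothetical cards: two low-power inference cards vs. one 250 W card.
print(power_summary({"inference_a": 70, "inference_b": 70}))  # (140, [])
print(power_summary({"big_card": 250}))                        # (250, ['big_card'])
print(aux_watts_needed(250))                                   # 175
```

The total only covers the GPUs; the PSU still has to carry the CPUs, drives and fans on top of that.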
You’d want something that is at least Pascal-generation, but Turing or newer cards are better.
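If you're not sure which generation a card is, its CUDA compute capability tells you (you can read it from, for example, PyTorch's `torch.cuda.get_device_capability()`). Here's a rough Python sketch of the lookup, assuming the usual major/minor ranges per generation. The table stops at Hopper and lumps anything newer together, so treat it as a quick check rather than a complete list.

```python
# Rough lookup from CUDA compute capability (major, minor) to GPU generation.
# Each entry is the lowest compute capability of that generation, oldest first.
GENERATIONS = [
    ((3, 0), "Kepler"),
    ((5, 0), "Maxwell"),
    ((6, 0), "Pascal"),
    ((7, 0), "Volta"),
    ((7, 5), "Turing"),
    ((8, 0), "Ampere"),
    ((8, 9), "Ada Lovelace"),
    ((9, 0), "Hopper"),
    ((10, 0), "newer than Hopper"),  # not broken out further in this sketch
]


def generation(cc: tuple) -> str:
    """Return the generation name for a (major, minor) compute capability."""
    name = None
    for threshold, gen in GENERATIONS:
        if cc >= threshold:  # tuples compare element by element
            name = gen
    if name is None:
        raise ValueError(f"compute capability {cc} is older than Kepler")
    return name


def meets_minimum(cc: tuple, minimum: str = "Pascal") -> bool:
    """True if the card is at least the given generation."""
    thresholds = {gen: threshold for threshold, gen in GENERATIONS}
    return cc >= thresholds[minimum]


print(generation((5, 3)))     # Maxwell (Tegra X1 in the Jetson Nano)
print(meets_minimum((6, 1)))  # True (Tesla P4)
```

So the original Nano's Maxwell GPU falls below the Pascal floor, while a P4 just clears it.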
Your better bet is to get a rack-mount workstation (which is basically a server anyway) and stick a higher-end Quadro or GeForce 30x0 card in there.
Edit: I never said what I have. It’s an R730, factory-built for GPUs, with a pair of Tesla P4 cards. I originally built it to play with GPUs for VDI.
It’s all of those. I’m not voting because I can’t decide what the biggest reason is, and there is no “two or more of the above” option to pick.
The only one I don’t think applies to me is “Better Services”, since many of the open-source/self-hosted solutions aren’t necessarily better than the commercial options, or they’re missing SSO support in the free/freemium/open-source tier.