Selfhosted LLM (ChatGPT)

autopilot@lemmy.world · edited 1 year ago

CeeBee@lemmy.world · edited 1 year ago

The best/easiest way to get started with a self-hosted LLM is to check out this repo:

https://github.com/oobabooga/text-generation-webui

Its goal is to be the Automatic1111 of text generators, and it does a fair job at it.

A good model that’s said to rival GPT-3.5 is the new Falcon model. The full-sized version is too big to run on a single GPU, but the 7B version “only” needs about 16GB.

https://huggingface.co/tiiuae/falcon-7b

There’s also the Wizard-uncensored model that is popular.

https://huggingface.co/ehartford/Wizard-Vicuna-13B-Uncensored

There are a ton of models out there with new ones popping up every day. You just need to search around. The oobabooga repo has a few models linked in the readme also.

Edit: there’s also h2oGPT, which seems really promising. I’m going to try it out in the next couple of days.

https://github.com/h2oai/h2ogpt

pe1uca@lemmy.pe1uca.dev · 1 year ago

How do you know how much RAM the model needs?

redcalcium@c.calciumlabs.com · edited 1 year ago

The model creator usually mentions it in the readme:

You will need at least 16GB of memory to swiftly run inference with Falcon-7B.

Usually the models support CPU inference. Tremendously slow but works in a pinch.
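
If the readme doesn’t say, you can get a ballpark figure from the parameter count: the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below is only a rule of thumb, not anything from the Falcon docs. It assumes “7B” means about 7 billion parameters, that the weights are stored in a 16-bit format (2 bytes each), and it ignores activations and cache, which need extra room on top. The function name and the precision table are made up for illustration.

```python
# Rough memory estimate for holding model weights, by numeric precision.
# Assumption: weights dominate memory; activations and cache are extra.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Return approximate gigabytes (10^9 bytes) needed for the weights alone."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts are the nominal sizes from the model names (assumed).
    for name, n_params in [("Falcon-7B", 7e9), ("Wizard-Vicuna-13B", 13e9)]:
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(n_params, precision)
            print(f"{name} {precision}: ~{gb:.1f} GB")
```

For a 7B model at 16 bits this comes to about 14 GB for the weights, which lines up with the “at least 16GB” in the readme once you leave some headroom.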

laenurd@lemmy.lemist.de · 1 year ago

Note that when using LLaMA-derived models, such as Vicuna, you are bound by their license to only use them for “research” purposes.

If you want an unrestricted version, go for OpenLLaMA or RedPajama.

Falcon is less restrictive and only wants a cut of profits if they exceed 1 million dollars, but I’d wager that fully unrestricted is the way to go.

beesthetrees@feddit.uk · 1 year ago

Falcon has switched to Apache 2.0 and removed the commercial limit.

laenurd@lemmy.lemist.de · 1 year ago

Sorry, I must’ve missed that somehow. In that case my comment only applies to LLaMA and its direct derivatives.