Hi all, I have never touched any tools for local inference and barely know anything about the landscape. Additionally, the only hardware I have available is an 8C/16T Zen 3 CPU and 48GB of RAM. I have many years' experience running Linux as a daily driver and doing small-network sysadmin work.
I am well aware this is extreme challenge mode, but it’s what I have to work with for now, and my main goal is more to do with learning the ecosystem than with getting highly usable results.
I decided for various reasons that my first project would be to get a model which I can feed an image, and have it output a caption.
If I have to quantize a model to make it fit into my available RAM then I am willing to learn that too.
I am looking for basic pointers on where to get started, such as “read this guide,” “watch this video,” or “look into this software package.”
I am not looking for solutions which involve using an API where inference happens on a machine which is not my own.
Hey there ThorrJo, welcome to our community.
I recommend kobold.cpp as your first inference engine, as it's very easy to get running, especially on Linux. Since you have no GPU, you don't need to worry about CUDA or Vulkan for offloading.
https://github.com/LostRuins/koboldcpp/
Read the kobold wiki section on vision model projection. For the image recognition model itself, I recommend NVIDIA's Cosmos-Reason1 finetune of Qwen2.5-VL. Make sure to load the qwen2.5vl mmproj projector (linked from the kobold repo) alongside the model.
https://github.com/LostRuins/koboldcpp/wiki#what-is-llava-and-mmproj
https://huggingface.co/koboldcpp/mmproj/tree/main
https://huggingface.co/mradermacher/Cosmos-Reason1-7B-i1-GGUF
The GGUFs I linked are already pre-quantized. You should be able to load the biggest quant available, plus the f16 mmproj, in your 48GB of RAM easily, with lots of room left over for context allocation.
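If you'd rather script the downloads than click through the browser, here's a minimal sketch using huggingface_hub. The exact quant filenames below are assumptions on my part; check the Files tab of each repo for the real names.

```python
# Minimal sketch: fetch a pre-quantized GGUF plus the mmproj projector.
# The filenames are guesses at the repos' naming conventions -- verify
# them on each Hugging Face repo's "Files" tab before running.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="mradermacher/Cosmos-Reason1-7B-i1-GGUF",
    filename="Cosmos-Reason1-7B.i1-Q6_K.gguf",  # hypothetical quant name
)
mmproj_path = hf_hub_download(
    repo_id="koboldcpp/mmproj",
    filename="qwen2.5-vl-7b-mmproj-f16.gguf",  # hypothetical filename
)
print(model_path, mmproj_path)
```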
Allocate as much context as you can; larger, higher-resolution images take more input context to process.
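Once you have both files, launching is one command. A minimal sketch, assuming you run KoboldCpp from its source checkout; the flag names are my recollection of its CLI, so verify them against --help, and the paths are hypothetical placeholders:

```python
# Minimal sketch: launch KoboldCpp headless on CPU with a vision model loaded.
# Flag names are assumptions from memory -- check `python koboldcpp.py --help`.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "/models/Cosmos-Reason1-7B.i1-Q6_K.gguf",  # hypothetical path
    "--mmproj", "/models/qwen2.5-vl-7b-mmproj-f16.gguf",  # hypothetical path
    "--contextsize", "16384",  # as large as your 48GB of RAM comfortably allows
    "--threads", "8",          # one per physical core on an 8C/16T CPU
])
```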
For troubleshooting: if its replies are wonky, try changing the chat template first (I forget if it's ChatML or something else). You can try adjusting the sampler settings too.
Kobold.CPP runs a web interface you can connect to through the browser from multiple devices. It also exposes its backend through an OpenAI-compatible API, so you can write your own custom apps to send and receive requests, or pair kobold with other frontend software that's compatible with corporate APIs, like tinychat, if you want to go further.
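To give you an idea of what a custom app looks like, here's a minimal captioning sketch against that OpenAI-compatible endpoint. I'm assuming KoboldCpp's default port (5001) and the standard OpenAI vision message format; adjust the host/port and image filename to your setup.

```python
# Minimal sketch: ask a locally running KoboldCpp instance to caption an image
# via its OpenAI-compatible chat completions endpoint. Port 5001 is KoboldCpp's
# default; change it if you launched with a different one.
import base64
import requests

with open("photo.jpg", "rb") as f:  # any local image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "max_tokens": 200,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=300,  # CPU-only inference can be slow, so be generous here
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The same request works from any OpenAI-compatible client library if you point its base URL at http://localhost:5001/v1.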
If you have any specific questions or need help feel free to reach out :)
Chiming in to say this is a very reasonable starting place, and I wanted to highlight to OP that this solution is 100% self-hosted.
I'm a beginner myself, and while I do have a GPU (unsure how much that speeds things up), I have found Qwen3-Coder to be almost a cheat code when problem-solving the various issues that would otherwise have me searching different forums for hours.
Thank you so much for this detailed starting point!! ❤️