I have a GTX 1660 Super (6 GB).
Right now I have Ollama with:
- deepseek-r1:8b
- qwen2.5-coder:7b
Do you recommend any other local models to play with on my GPU?
DeepSeek is good at reasoning and Qwen is good at programming, but I find llama3.1 8b well suited for creativity, writing, translation, and other tasks that fall outside the scope of your two models. It’s a decent all-rounder, and it’s about 4.9 GB in q4_K_M.
That’s not out of my scope; I’m just learning what I can do locally with my current machine.
Today I read about RAG; maybe I’ll try an easy local setup to chat with a PDF.
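From what I’ve read, a bare-bones version doesn’t need much beyond what I already have: something to pull text out of the PDF, Ollama’s embeddings endpoint for retrieval, and one of my chat models for the answer. Here’s the kind of thing I have in mind (untested sketch; the model names, chunk size, and file name are just placeholders I made up, not anything I’ve benchmarked):

```python
# Minimal local RAG sketch: pypdf for text extraction, Ollama's /api/embeddings
# endpoint for retrieval, and a chat model for the final answer.
# Assumes Ollama is running locally and both models have already been pulled.
import requests
import numpy as np
from pypdf import PdfReader

OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"   # placeholder embedding model
CHAT_MODEL = "qwen2.5-coder:7b"    # any chat model you have pulled

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    return np.array(r.json()["embedding"])

# 1. Read the PDF and split it into rough, fixed-size chunks.
reader = PdfReader("document.pdf")
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [full_text[i:i + 1000] for i in range(0, len(full_text), 1000)]

# 2. Embed every chunk once (a real setup would cache these in a vector store).
chunk_vecs = np.stack([embed(c) for c in chunks])

# 3. For a question, pick the most similar chunks by cosine similarity.
question = "What is the main conclusion of this document?"
q = embed(question)
sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
top = [chunks[i] for i in np.argsort(sims)[-3:]]

# 4. Ask the chat model, passing the retrieved chunks as context.
prompt = ("Answer using only this context:\n\n" + "\n---\n".join(top)
          + f"\n\nQuestion: {question}")
r = requests.post(f"{OLLAMA}/api/chat",
                  json={"model": CHAT_MODEL, "stream": False,
                        "messages": [{"role": "user", "content": prompt}]})
print(r.json()["message"]["content"])
```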
Mistral
I personally run models on my laptop. I have 48 GB of RAM and an i5-12500U. It runs a little slow, but it’s usable.
My gear is an old i7-4790 with 16 GB RAM.
How many tokens per second do you get?
The biggest bottleneck is going to be memory bandwidth. I would just stick with GPU-only inference, since your GPU’s memory has far more bandwidth than system RAM.
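Rough numbers to show why (the bandwidth figures below are ballpark assumptions, not measured on your machine):

```python
# Back-of-envelope check of why a 7-8B q4 model fits on a 6 GB card, and why
# spilling layers to system RAM hurts. All numbers are rough assumptions.
params = 8e9                 # ~8B parameter model
bits_per_weight = 4.5        # q4_K_M averages a bit over 4 bits per weight
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")   # ~4.5 GB, leaves headroom for the KV cache

# Token generation is roughly bandwidth-bound: each token reads all weights once.
gpu_bw = 336   # GB/s, GTX 1660 Super spec sheet figure
ram_bw = 25    # GB/s, assumed effective dual-channel DDR3/DDR4 throughput
print(f"GPU-only upper bound: ~{gpu_bw / weights_gb:.0f} tok/s")
print(f"CPU/RAM upper bound:  ~{ram_bw / weights_gb:.0f} tok/s")
```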