Trying something new, going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and answering.

Depending on activity level I’ll either make a new one once in a while or I’ll just leave this one up forever to be a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps; that should help people give more useful, concise answers!

  • hendrik@palaver.p3x.de · 2 days ago

    From what I know, I’d assume yes: the relationship between model size and speed should be roughly linear. There may be some small additional overhead making it a bit faster or slower than expected. But I’m really not an expert on the maths, so don’t trust me.
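    A rough back-of-the-envelope way to see why (my own sketch, assuming token generation is limited by memory bandwidth rather than compute):

    ```latex
    \text{tokens/s} \approx \frac{\text{memory bandwidth (GB/s)}}{\text{model size in memory (GB)}}
    ```

    Every generated token has to read all the weights once, so a model half the size should run roughly twice as fast, give or take overhead.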

    And maybe have a look at this bug report: https://github.com/ggml-org/llama.cpp/issues/11332
    I think it matches your situation. It gets resolved by adjusting the batch size, and someone in the thread recommends not using Vulkan on an iGPU at all.
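    Untested sketch on my side (flag names as in current llama.cpp builds, and `model.gguf` is a placeholder; check `llama-cli --help` on your version), but the two workarounds from that issue would look something like:

    ```sh
    # shrink the physical batch size, the knob people adjusted in the issue
    llama-cli -m model.gguf -p "Hello" -ub 256

    # or bypass the iGPU entirely and run on the CPU
    llama-cli -m model.gguf -p "Hello" -ngl 0
    ```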