• Aria@lemmygrad.ml · 1 day ago

    I’m really underwhelmed by the 32B Qwen DeepSeek R1 distill, both in its reasoning and in its knowledge. I still haven’t needed to test it on maths, so maybe that’s where it really shines.

  • seang96A · 1 day ago

    With these new models, is it worth trying them without an Nvidia GPU, i.e. running on CPU?

    • chiisana@lemmy.chiisana.net · 17 hours ago

      Depending on what you want to do with it and what your expectations are, the smaller distilled versions could work on CPU, but they will most likely need extra help on top, just like other similarly sized models.

      This being a reasoning model, you might get more well-thought-out results out of it, but at the end of the day a smaller parameter space (easiest to think of as “less vocabulary”) means smaller capabilities.

      If you just want something to chat back and forth with very quickly on a CPU, try IBM’s granite3.1-moe:3b, which is very fast even on a modern CPU but doesn’t really excel at complex problems without additional support (i.e. RAG or tool use).
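
      If you want to poke at it programmatically, a minimal sketch along these lines should work against Ollama’s local HTTP chat API (assuming Ollama is running on its default port 11434 and you’ve already pulled the model):

          import requests

          # Minimal chat loop against a local Ollama instance (default port assumed).
          # Assumes `ollama pull granite3.1-moe:3b` has been run beforehand.
          OLLAMA_URL = "http://localhost:11434/api/chat"

          messages = []
          while True:
              messages.append({"role": "user", "content": input("> ")})
              resp = requests.post(
                  OLLAMA_URL,
                  json={"model": "granite3.1-moe:3b", "messages": messages, "stream": False},
                  timeout=120,
              ).json()
              reply = resp["message"]["content"]
              messages.append({"role": "assistant", "content": reply})
              print(reply)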

      • seang96A · 17 hours ago (edited)

        My primary use would probably be a model for Home Assistant to reason about what was said and what is desired, e.g. saying “turn off all outlets” in the living room would turn off the outlets there, or saying “it’s cold” would adjust the thermostat. I’m also hoping it could work around speech-to-text inaccuracies and still infer what is wanted. I think a smaller model would be an appropriate fit for this too, since response times should be on the quicker side.
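
        As a rough sketch of what I mean (the prompt wording, action names, and JSON shape here are just made-up placeholders), the model could turn the transcript plus the room it was heard in into a small JSON intent that the Home Assistant side then validates and executes:

            import json
            import requests

            OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port assumed

            # Hypothetical prompt: ask for a machine-readable intent, letting the model
            # correct likely speech-to-text errors ("turn of al outlets" -> outlets off).
            PROMPT = """You control a smart home. The user is in the {room}.
            They said (possibly mis-transcribed): "{transcript}"
            Reply with ONLY a JSON object like
            {{"action": "outlets_off|lights_off|set_thermostat", "room": "...", "value": null}}"""

            def extract_intent(transcript: str, room: str) -> dict:
                resp = requests.post(
                    OLLAMA_URL,
                    json={
                        "model": "granite3.1-moe:3b",  # any small local model would do
                        "prompt": PROMPT.format(room=room, transcript=transcript),
                        "stream": False,
                        "format": "json",  # Ollama option constraining output to valid JSON
                    },
                    timeout=60,
                ).json()
                return json.loads(resp["response"])

            print(extract_intent("turn of al outlets", room="living room"))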

        • chiisana@lemmy.chiisana.net · 17 hours ago

          Yep! Give granite a try. I think it would be perfect for this use case, both in terms of being able to answer your queries and doing so quickly, without a GPU, just using a modern CPU. I was getting above 30 tokens per second on my 10th-gen i5, which kind of blew my mind.
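
          If you want to sanity-check the speed on your own hardware, Ollama’s generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds), so a quick sketch like this gives tokens per second (default port assumed):

              import requests

              resp = requests.post(
                  "http://localhost:11434/api/generate",
                  json={
                      "model": "granite3.1-moe:3b",
                      "prompt": "Explain mixture-of-experts models in two sentences.",
                      "stream": False,
                  },
                  timeout=300,
              ).json()

              # eval_duration is in nanoseconds; eval_count is the number of generated tokens.
              print(f'{resp["eval_count"] / (resp["eval_duration"] / 1e9):.1f} tokens/s')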

          Thinking models like R1 will be better at things like troubleshooting a faulty furnace or diagnosing user problems, so there are benefits to pushing those envelopes. However, if all you need is to give basic instructions, have it infer your intent, and finally perform the desired task, then smaller mixture-of-experts models should be passable even without a GPU.

  • Tundra@lemmy.ml · 2 days ago

    I literally downloaded it today and gave it a whirl. It constantly writes out what it’s thinking before giving you a response, which gets annoying after a while.

    • mugdad1@lemm.ee · 21 hours ago

      Me too. I downloaded the 8B R1, and its thinking process is annoying; it also takes time and resources.

    • shootwhatsmyname@lemm.ee · 1 day ago

      It’s a reasoning model, so that’s what it’s supposed to do. It makes it more accurate with math and a slew of other things. The output is usually meant to be post-processed somehow so that only the final answer is shown.
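
      For example, the R1 distills wrap their reasoning in <think>…</think> tags, so a small post-processing sketch like this keeps only the final answer:

          import re

          def final_answer(raw: str) -> str:
              # DeepSeek R1 and its distills emit chain-of-thought inside <think> tags;
              # drop that block and return only what follows it.
              return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

          print(final_answer("<think>User asked 2+2; add them.</think>2 + 2 = 4"))  # -> 2 + 2 = 4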

      You can install n8n and use it with Ollama to create some pretty cool chat workflows.