My primary use would probably be a model for Home Assistant to reason about what was said and work out what's desired: saying "Turn off all outlets" in the living room would turn off the living room's outlets, or saying "it's cold" would adjust the thermostat. I'm also hoping it could work around speech-to-text inaccuracies and still figure out what's wanted. I think a smaller model would be appropriate for this too, since response times should be on the quicker side.
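For what it's worth, here's roughly the flow I have in mind, as a minimal sketch. It assumes a local Ollama-style endpoint on its default port; the model tag, prompt, and action names are placeholders I made up, not a real Home Assistant integration.

```python
import json
import urllib.request

# Hypothetical glue: ask a small local model (served by Ollama) to turn a
# rough voice transcript plus room context into a structured action.
OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "granite3.3:2b"  # placeholder tag; substitute whatever small model you run

PROMPT_TEMPLATE = """You control a smart home. The user is in the {room}.
The speech-to-text transcript (possibly garbled) is: "{transcript}"
Reply with ONLY a JSON object like {{"action": "...", "room": "...", "value": null}}.
Valid actions: turn_off_outlets, turn_on_outlets, set_temperature, none."""

def infer_intent(transcript: str, room: str) -> dict:
    body = json.dumps({
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(room=room, transcript=transcript),
        "stream": False,
        "format": "json",  # ask Ollama to constrain output to valid JSON
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)["response"]
    return json.loads(answer)

if __name__ == "__main__":
    # "its cold" (no apostrophe, as STT might emit it) should map to the thermostat.
    print(infer_intent("its cold", room="living room"))
    print(infer_intent("turn of all outlets", room="living room"))  # STT typo on purpose
```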
Yep! Give Granite a try. I think it would be perfect for this use case, both in terms of being able to answer your queries and doing it quickly, without a GPU, just on a modern CPU. I was getting above 30 tokens per second on my 10th-gen i5, which kind of blew my mind.
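If you want to sanity-check the speed on your own hardware, Ollama reports eval stats with each non-streaming response, so you can compute tokens per second directly. A quick sketch (the model tag is an assumption; substitute whatever you actually pulled):

```python
import json
import urllib.request

# Quick tokens-per-second check against a local Ollama server.
# Ollama's non-streaming response includes eval_count (tokens generated)
# and eval_duration (nanoseconds spent generating them).
body = json.dumps({
    "model": "granite3.3:2b",  # assumption: use your actual model tag
    "prompt": "Briefly: how do I reset a tripped breaker?",
    "stream": False,
}).encode()
req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

tok_per_sec = stats["eval_count"] / (stats["eval_duration"] / 1e9)
print(f"{tok_per_sec:.1f} tokens/sec on this machine")
```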
Thinking models like R1 will be better at things like troubleshooting a faulty furnace or diagnosing user problems, so there are benefits to pushing those envelopes. However, if all you need is to give basic instructions, have it infer your intent, and then perform the desired tasks, smaller mixture-of-experts models should be passable even without a GPU.