☆ Yσɠƚԋσʂ ☆@lemmygrad.ml to Technology@lemmygrad.ml · English · 4 days ago

QwQ-32B, a 32-billion-parameter language model, achieves performance comparable to DeepSeek-R1 (671 billion parameters) by using reinforcement learning for scaling

qwenlm.github.io


QwQ-32B: Embracing the Power of Reinforcement Learning


Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek-R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalability of RL and its impact on enhancing the intelligence of large language models.


marl_karx@lemmygrad.ml · English · 2 days ago
Isn't DeepSeek based on Qwen? At least the distilled models?
- ☆ Yσɠƚԋσʂ ☆@lemmygrad.ml (OP) · 2 days ago
  I think so, but this looks like an update of Qwen with some new tricks.