
QwQ-32B


QwQ-32B is a medium-sized reasoning model in the Qwen series, developed by the Qwen team on the foundation of their Qwen2.5 model family. It is a causal language model with 32.5 billion parameters that undergoes both pretraining and post-training, including supervised fine-tuning and reinforcement learning. It employs a transformer architecture featuring Rotary Position Embedding (RoPE), the SwiGLU activation function, RMSNorm (root mean square layer normalization), and attention QKV bias, and consists of 64 layers with 40 attention heads for queries and 8 for key-value pairs (grouped-query attention). QwQ-32B supports a full context length of 131,072 tokens and is designed to perform competitively against advanced reasoning models such as DeepSeek-R1 and o1-mini.
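As a rough illustration of how such a model is typically used, here is a minimal sketch with Hugging Face transformers, assuming the checkpoint is published under the id "Qwen/QwQ-32B" and that enough GPU memory is available for a 32.5B-parameter model; the prompt and generation settings are illustrative, not prescribed by this listing.

```python
# Minimal sketch: running QwQ-32B via Hugging Face transformers.
# Assumes torch and accelerate are installed and the model id "Qwen/QwQ-32B".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype recorded in the checkpoint config
    device_map="auto",    # shard the model across available GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The model emits its reasoning inside <think>...</think> before the final
# answer, so a generous token budget is advisable.
output_ids = model.generate(**inputs, max_new_tokens=2048)
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)
```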

  • Advanced Reasoning System: Built on architectural components such as RoPE, SwiGLU, RMSNorm, and attention QKV bias, with 64 layers, 40 query (Q) heads, and 8 key-value (KV) heads.
  • Advanced Text Analysis: Supports context lengths up to 131,072 tokens, with YaRN scaling for processing long sequences.
  • Thoughtful Content Creation: Exposes its reasoning process via <think> tags, encouraging high-quality, well-considered responses.
  • Modular Deployment Capabilities: Supports multiple deployment frameworks, including vLLM, and various quantization formats (e.g., GGUF, 4-bit BNB, 16-bit); a minimal vLLM sketch follows this list.
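For the deployment capability above, the following is a hedged sketch using vLLM's offline Python API. The model id "Qwen/QwQ-32B", the context window setting, and the sampling values are assumptions for illustration; per the model's documentation, reaching the full 131,072-token window requires enabling YaRN rope scaling, which is omitted here.

```python
# Minimal sketch: serving QwQ-32B locally with vLLM (assumes vllm is installed).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/QwQ-32B",
    max_model_len=32768,  # extend toward 131,072 with YaRN scaling enabled
)
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

conversation = [{"role": "user", "content": "How many prime numbers are below 50?"}]
outputs = llm.chat(conversation, sampling)

# The response includes the model's <think>...</think> reasoning trace
# followed by its final answer.
print(outputs[0].outputs[0].text)
```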