In a significant move that highlights China's growing leadership in the open-source AI landscape, Z.ai (formerly Zhipu AI) has released GLM-4.5, a new family of large language models (LLMs) poised to challenge Western competitors with a focus on superior performance, efficiency, and agentic capabilities. The model's launch, under the permissive MIT license, marks a major step in making cutting-edge AI technology accessible and affordable to a global community of developers and businesses.
A Hybrid Approach to Intelligence
The GLM-4.5 series introduces a novel "hybrid reasoning" architecture, designed for the age of AI agents. The models feature a dual-mode operation:
Thinking Mode: For complex, multi-step tasks requiring deep reasoning, tool usage, and autonomous project planning.
Non-Thinking Mode: For instant, low-latency responses, making it highly versatile for a wide range of applications from chatbots to rapid code generation.
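The mode switch is exposed per request. A minimal sketch of selecting between the two modes, assuming an OpenAI-compatible chat-completions payload; the `thinking` field name follows the shape Z.ai has documented, but it should be verified against the current API reference before use:

```python
# Sketch: choosing GLM-4.5's reasoning mode per request. The "thinking"
# field is an assumption based on Z.ai's published API shape; check the
# current docs before relying on it.

def build_request(prompt: str, deep_reasoning: bool) -> dict:
    """Return a chat-completion payload for either mode."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking Mode for multi-step reasoning and tool use;
        # Non-Thinking Mode for instant, low-latency replies.
        "thinking": {"type": "enabled" if deep_reasoning else "disabled"},
    }

agent_req = build_request("Plan a data migration step by step.", deep_reasoning=True)
chat_req = build_request("What is the capital of France?", deep_reasoning=False)
```

Because the toggle lives in the request rather than in separate model endpoints, one deployment can serve both agentic workloads and latency-sensitive chat traffic.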
This dual-mode design, built on a Mixture-of-Experts (MoE) architecture, allows GLM-4.5 to achieve exceptional performance with remarkable hardware efficiency. The flagship model, GLM-4.5, boasts 355 billion total parameters with only 32 billion active for any given query.
Its lighter sibling, GLM-4.5-Air, is even more efficient with 106 billion total parameters and 12 billion active, making it capable of running on consumer-grade GPUs with quantization. This focus on parameter efficiency and lean design has a direct impact on operational costs, with Z.ai's API pricing significantly undercutting many industry rivals.
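A back-of-the-envelope sketch makes the efficiency argument concrete, using the parameter counts reported above. In an MoE model only the routed experts run for each token, so per-token compute scales with active rather than total parameters (roughly 2 FLOPs per active parameter per token; the FLOPs rule of thumb is a standard approximation, not a Z.ai figure):

```python
# Rough sketch of MoE sparsity for GLM-4.5 and GLM-4.5-Air, from the
# reported parameter counts. Per-token FLOPs ~ 2 x active params is a
# common approximation for dense matmul-dominated transformers.

def moe_stats(total_b: float, active_b: float) -> dict:
    """Summarize MoE sparsity given parameter counts in billions."""
    return {
        "active_fraction_pct": round(100 * active_b / total_b, 1),
        "flops_per_token_G": 2 * active_b,  # gigaFLOPs per token, approx.
    }

print(moe_stats(355, 32))   # GLM-4.5: ~9% of weights active per token
print(moe_stats(106, 12))   # GLM-4.5-Air: ~11% active per token
```

So although GLM-4.5 stores 355B parameters, each query pays the compute cost of a ~32B dense model, which is where the cost advantage over similarly capable dense models comes from.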
Outperforming on Key Benchmarks
Z.ai's internal evaluations, conducted across 12 industry-standard benchmarks, position GLM-4.5 as a global frontrunner.
Overall Performance: The flagship GLM-4.5 ranks third among all models tested, behind only Grok-4 and o3, and ahead of Claude 4 Opus. The GLM-4.5-Air model ranks sixth overall, outperforming models of comparable scale, including Claude 4 Sonnet.
Agentic Capabilities: GLM-4.5 is specifically engineered for autonomous tasks. It achieved a 26.4% success rate on the challenging BrowseComp web-browsing benchmark, surpassing Claude-4-Opus (18.8%). It also demonstrated a tool-calling success rate of 90.6%, higher than Claude-4-Sonnet and other leading models.
Coding: The model showcases impressive full-stack development capabilities. On the SWE-bench Verified benchmark, GLM-4.5 scored 64.2%, outperforming GPT-4.1. In head-to-head coding evaluations, it achieved a 53.9% win rate against Kimi K2 and an 80.8% win rate over Qwen3-Coder.
Open-Source and Accessible for All
A core tenet of the GLM-4.5 release is its commitment to open-source principles. The models are available under a permissive MIT license, allowing for unrestricted commercial use and secondary development. Z.ai has made multiple variants available on the Hugging Face platform under its zai-org profile, including:
GLM-4.5 and GLM-4.5-Air: The primary, high-performance models.
Base Models: Foundation models for developers to fine-tune for specific use cases.
Quantized Versions: To further enhance accessibility, Z.ai has released quantized versions, including FP8 models, which drastically reduce memory requirements and speed up inference, enabling deployment on more modest hardware.
On disk, the GLM-4.5 Q8 quantization is roughly 385 GB and the Q4 roughly 207 GB, sizes that still call for specialized hardware. GLM-4.5-Air at Q4 is about 67 GB, small enough to run on consumer-grade machines with 96 GB of VRAM or unified memory. By lowering the barrier to entry and fostering an open ecosystem, the GLM-4.5 series is set to accelerate innovation.
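Those file sizes can be sanity-checked with simple arithmetic. Quantized formats store slightly more than their nominal bits per weight because per-block scales add overhead; the bits-per-weight figures below are typical GGUF-style values and are an assumption, not official numbers:

```python
# Rough sanity check on the quantized file sizes above. The effective
# bits-per-weight values (8.5 for Q8-style, ~4.8 for Q4-style) are
# typical GGUF figures and an assumption, not Z.ai's numbers.

def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a given quantization."""
    return round(params_billion * bits_per_weight / 8, 1)

print(quant_size_gb(355, 8.5))  # GLM-4.5 Q8: ~377 GB vs. reported ~385 GB
print(quant_size_gb(355, 4.8))  # GLM-4.5 Q4: ~213 GB vs. reported ~207 GB
print(quant_size_gb(106, 4.8))  # GLM-4.5-Air Q4: ~64 GB vs. reported ~67 GB
```

The estimates land within a few percent of the reported sizes, and the same arithmetic explains the hardware claim: a ~67 GB model plus KV cache and activations fits comfortably in 96 GB of VRAM or unified memory.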