GLM-5: from Vibe Coding to Agentic Engineering

Original authors: GLM-5-Team, :, Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, Chenzheng Zhu, Congfeng Yin, Cunxiang Wang, Gengzheng Pan, Hao ZeGLM-5-Team, :, Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, Chenzheng Zhu, Congfeng Yin, Cunxiang Wang, Gengzheng Pan, Hao Zeng, Haoke Zhang, Haoran Wang, Huilong Chen, Jiajie Zhang, Jian Jiao, Jiaqi Guo, Jingsen Wang, Jingzhao Du, Jinzhu Wu, Kedong Wang, Lei Li, Lin Fan, Lucen Zhong, Mingdao Liu, Mingming Zhao, Pengfan Du, Qian Dong, Rui Lu, Shuang-Li, Shulin Cao, Song Liu, Ting Jiang, Xiaodong Chen, Xiaohan Zhang, Xuancheng Huang, Xuezhen Dong, Yabo Xu, Yao Wei, Yifan An, Yilin Niu, Yitong Zhu, Yuanhao Wen, Yukuo Cen, Yushi Bai, Zhongpei Qiao, Zihan Wang, Zikang Wang, Zilin Zhu, Ziqiang Liu, Zixuan Li, Bojie Wang, Bosi Wen, Can Huang, Changpeng Cai, Chao Yu, Chen Li, Chengwei Hu, Chenhui Zhang, Dan Zhang, Daoyan Lin, Dayong Yang, Di Wang, Ding Ai, Erle Zhu, Fangzhou Yi, Feiyu Chen, Guohong Wen, Hailong Sun, Haisha Zhao, Haiyi Hu, Hanchen Zhang, Hanrui Liu, Hanyu Zhang, Hao Peng, Hao Tai, Haobo Zhang, He Liu, Hongwei Wang, Hongxi Yan, Hongyu Ge, Huan Liu, Huanpeng Chu, Jia'ni Zhao, Jiachen Wang, Jiajing Zhao, Jiamin Ren, Jiapeng Wang, Jiaxin Zhang, Jiayi Gui, Jiayue Zhao, Jijie Li, Jing An, Jing Li, Jingwei Yuan, Jinhua Du, Jinxin Liu, Junkai Zhi, Junwen Duan, Kaiyue Zhou, Kangjian Wei, Ke Wang, Keyun Luo, Laiqiang Zhang, Leigang Sha, Liang Xu, Lindong Wu, Lintao Ding, Lu Chen, Minghao Li, Nianyi Lin, Pan Ta, Qiang Zou, Rongjun Song, Ruiqi Yang, Shangqing Tu, Shangtong Yang, Shaoxiang Wu, Shengyan Zhang, Shijie Li, Shuang Li, Shuyi Fan, Wei Qin, Wei Tian, Weining Zhang, Wenbo Yu, Wenjie Liang, Xiang Kuang, Xiangmeng Cheng, Xiangyang Li, Xiaoquan Yan, Xiaowei Hu, Xiaoying Ling, Xing Fan, Xingye Xia, Xinyuan Zhang, Xinze Zhang, Xirui Pan, Xu Zou, Xunkai Zhang, Yadi Liu, Yandong Wu, Yanfu Li, Yidong Wang, Yifan Zhu, Yijun Tan, Yilin Zhou, Yiming Pan, Ying Zhang, Yinpei Su, Yipeng Geng, Yong Yan, Yonglin Tan, Yuean Bi, Yuhan Shen, Yuhao Yang, Yujiang Li, Yunan Liu, Yunqing Wang, Yuntao Li, Yurong Wu, Yutao Zhang, Yuxi Duan, Yuxuan Zhang, Zezhen Liu, Zhengtao Jiang, Zhenhe Yan, Zheyu Zhang, Zhixiang Wei, Zhuo Chen, Zhuoer Feng, Zijun Yao, Ziwei Chai, Ziyuan Wang, Zuzhou Zhang, Bin Xu, Minlie Huang, Hongning Wang, Juanzi Li, Yuxiao Dong, Jie Tang

Published 2026-02-25

📖 4 min read☕ Coffee break read

View on arXiv ↗PDF ↗

CC BY 4.0

Original authors: GLM-5-Team, :, Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, Chenzheng Zhu, Congfeng Yin, Cunxiang Wang, Gengzheng Pan, Hao Zeng, Haoke Zhang, Haoran Wang, Huilong Chen, Jiajie Zhang, Jian Jiao, Jiaqi Guo, Jingsen Wang, Jingzhao Du, Jinzhu Wu, Kedong Wang, Lei Li, Lin Fan, Lucen Zhong, Mingdao Liu, Mingming Zhao, Pengfan Du, Qian Dong, Rui Lu, Shuang-Li, Shulin Cao, Song Liu, Ting Jiang, Xiaodong Chen, Xiaohan Zhang, Xuancheng Huang, Xuezhen Dong, Yabo Xu, Yao Wei, Yifan An, Yilin Niu, Yitong Zhu, Yuanhao Wen, Yukuo Cen, Yushi Bai, Zhongpei Qiao, Zihan Wang, Zikang Wang, Zilin Zhu, Ziqiang Liu, Zixuan Li, Bojie Wang, Bosi Wen, Can Huang, Changpeng Cai, Chao Yu, Chen Li, Chengwei Hu, Chenhui Zhang, Dan Zhang, Daoyan Lin, Dayong Yang, Di Wang, Ding Ai, Erle Zhu, Fangzhou Yi, Feiyu Chen, Guohong Wen, Hailong Sun, Haisha Zhao, Haiyi Hu, Hanchen Zhang, Hanrui Liu, Hanyu Zhang, Hao Peng, Hao Tai, Haobo Zhang, He Liu, Hongwei Wang, Hongxi Yan, Hongyu Ge, Huan Liu, Huanpeng Chu, Jia'ni Zhao, Jiachen Wang, Jiajing Zhao, Jiamin Ren, Jiapeng Wang, Jiaxin Zhang, Jiayi Gui, Jiayue Zhao, Jijie Li, Jing An, Jing Li, Jingwei Yuan, Jinhua Du, Jinxin Liu, Junkai Zhi, Junwen Duan, Kaiyue Zhou, Kangjian Wei, Ke Wang, Keyun Luo, Laiqiang Zhang, Leigang Sha, Liang Xu, Lindong Wu, Lintao Ding, Lu Chen, Minghao Li, Nianyi Lin, Pan Ta, Qiang Zou, Rongjun Song, Ruiqi Yang, Shangqing Tu, Shangtong Yang, Shaoxiang Wu, Shengyan Zhang, Shijie Li, Shuang Li, Shuyi Fan, Wei Qin, Wei Tian, Weining Zhang, Wenbo Yu, Wenjie Liang, Xiang Kuang, Xiangmeng Cheng, Xiangyang Li, Xiaoquan Yan, Xiaowei Hu, Xiaoying Ling, Xing Fan, Xingye Xia, Xinyuan Zhang, Xinze Zhang, Xirui Pan, Xu Zou, Xunkai Zhang, Yadi Liu, Yandong Wu, Yanfu Li, Yidong Wang, Yifan Zhu, Yijun Tan, Yilin Zhou, Yiming Pan, Ying Zhang, Yinpei Su, Yipeng Geng, Yong Yan, Yonglin Tan, Yuean Bi, Yuhan Shen, Yuhao Yang, Yujiang Li, Yunan Liu, Yunqing Wang, Yuntao Li, Yurong Wu, Yutao Zhang, Yuxi Duan, Yuxuan Zhang, Zezhen Liu, Zhengtao Jiang, Zhenhe Yan, Zheyu Zhang, Zhixiang Wei, Zhuo Chen, Zhuoer Feng, Zijun Yao, Ziwei Chai, Ziyuan Wang, Zuzhou Zhang, Bin Xu, Minlie Huang, Hongning Wang, Juanzi Li, Yuxiao Dong, Jie Tang

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ✨ This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper. Read full disclaimer

Imagine you've been teaching a brilliant but slightly clumsy apprentice how to fix things. For years, you've had to stand over their shoulder, whispering instructions like, "Okay, now type this line of code," or "Wait, that button doesn't work, try clicking the other one." This is what the paper calls "Vibe Coding." It's helpful, but it's slow, and the apprentice can't really work alone.

GLM-5 is the moment that apprentice finally graduates. They don't just listen to your vibes anymore; they become a Master Engineer. They can look at a messy problem, plan the whole project, write the code, fix their own mistakes, and run a business simulation for a year without you needing to say a word.

Here is the simple breakdown of how they did it, using some everyday analogies:

1. The New Brain Architecture: "The Smart Librarian" (DSA)

Previously, to find a specific book in a library of 128,000 pages, the AI had to look at every single page to make sure it didn't miss anything. This was slow and expensive.

The Fix: GLM-5 uses something called DSA (DeepSeek Sparse Attention). Imagine a librarian who doesn't read every book. Instead, they have a super-smart index that instantly knows exactly which 5 pages matter for your question and ignores the other 127,995.
The Result: The AI is now twice as fast and costs half as much to run, but it still remembers everything important.

2. The Training Gym: "The Asynchronous Dojo"

In the past, training AI was like a gym where everyone had to wait for the slowest person to finish a set before the next one could start. If one person took a long time to think, the whole gym stood idle.

The Fix: GLM-5 built a new Asynchronous Infrastructure. Imagine a dojo where the "thinking" (inference) and the "learning" (training) happen in separate rooms. The thinkers generate thousands of scenarios, and the teachers learn from them instantly, without waiting for anyone to finish.
The Result: The AI learns from complex, long-term tasks (like running a business for a year) much faster and more efficiently.

3. The "Thinking" Habits: "The Architect's Blueprint"

Older AIs would often jump straight to the answer, like a student guessing on a test. GLM-5 has learned three new ways to think:

Interleaved Thinking: It pauses to think before every single action, like an architect checking the blueprint before laying a brick.
Preserved Thinking: If you ask it to fix a bug in a huge codebase, it remembers its previous thoughts so it doesn't have to re-derive the whole logic from scratch every time. It keeps a running notebook.
Turn-Level Thinking: You can tell it, "Think hard for this complex math problem, but just give me a quick answer for this simple greeting." It knows when to switch gears.

4. The Real-World Test: "The Internship"

The paper doesn't just show test scores; it shows the AI doing real jobs.

The Vending Machine Test: Imagine giving an AI $1,000 and asking it to run a vending machine business for a year. GLM-5 didn't just survive; it made $4,432. It learned to restock items, fix broken machines, and manage cash flow better than most humans.
The Software Engineer: When asked to fix bugs in real-world software (like the kind used by millions of people), GLM-5 solved more problems than any other open-source model, rivaling the most expensive, secret models from big tech companies.

5. The "Pony Alpha" Surprise

The authors did something bold: they released the model anonymously (calling it "Pony Alpha") on a public platform. They wanted to see if people would like it just for its skills, without knowing it was made by a Chinese team.

The Result: People loved it. They guessed it was from top US labs like Anthropic or Google. When the authors revealed it was GLM-5, it proved that the model's quality spoke for itself, transcending borders and biases.

The Big Picture

GLM-5 isn't just a "smarter chatbot." It represents a shift from asking for help to delegating work.

Before: You are the driver; the AI is the passenger giving directions.
Now: You are the boss; the AI is the project manager who handles the team, the schedule, and the execution.

The paper concludes that we are moving from an era of "Vibe Coding" (guessing and hoping) to "Agentic Engineering" (planning, building, and iterating with precision). GLM-5 is the first open-source model to truly master this new era.

1. Problem Statement

The paper addresses the critical bottlenecks hindering the transition of Large Language Models (LLMs) from passive knowledge repositories to active, autonomous problem solvers, specifically in the realm of complex software engineering ("Agentic Engineering").

Computational Cost vs. Capability: Traditional scaling laws require massive computational resources to improve reasoning and coding capabilities, making long-context, multi-step agent workflows prohibitively expensive.
Real-World Adaptability: Existing models often excel at static benchmarks (e.g., single-turn coding tasks) but fail in dynamic, long-horizon real-world scenarios (e.g., end-to-end software development, business simulation) due to poor planning, context loss, and instability in asynchronous agent loops.
Efficiency of Training/Inference: Standard dense attention mechanisms ( $O(L^2)$ ) and synchronous reinforcement learning (RL) pipelines create severe latency and GPU idle time during long agent rollouts.

2. Methodology

GLM-5 introduces a comprehensive overhaul of the model architecture, training infrastructure, and alignment strategies to achieve state-of-the-art (SOTA) performance with extreme efficiency.

A. Architecture & Pre-Training

Model Scale: GLM-5 scales to 744B total parameters with 40B active parameters (256 experts), doubling the size of its predecessor (GLM-4.5).
DeepSeek Sparse Attention (DSA): To handle long contexts (up to 200K) efficiently, GLM-5 adopts DSA. Unlike fixed sliding windows, DSA dynamically selects relevant tokens based on content importance.
- Training Strategy: A "dense warm-up and sparse adaptation" approach allows the model to transition from a dense base model to a sparse one without retraining from scratch, reducing training costs by ~1.5–2x for long sequences.
Multi-Latent Attention (MLA) Optimization: The authors refined MLA using Muon Split (splitting projection matrices for orthogonalization) to match the performance of Grouped-Query Attention (GQA) while retaining memory efficiency. They also optimized head dimensions (MLA-256) to reduce decoding computational costs.
Data: Trained on 28.5 trillion tokens, with a heavy focus on code, reasoning, and long-context agentic data. The data pipeline includes rigorous filtering for synthetic data and long-tail knowledge.

B. Post-Training & Alignment

Asynchronous Reinforcement Learning (RL) Infrastructure:
- Decoupled Engines: The training and inference engines are fully decoupled. A central Multi-Task Rollout Orchestrator manages diverse agent tasks (coding, search, terminal) asynchronously, eliminating GPU idle time caused by long-horizon rollouts.
- Token-in-Token-out (TITO): To prevent re-tokenization mismatches in asynchronous settings, the system passes raw token IDs and metadata directly from the inference engine to the trainer, ensuring exact action-level correspondence.
- Stability Mechanisms: Implements Direct Double-sided Importance Sampling (clipping log-probabilities) and filters out stale or noisy samples (e.g., environment crashes) to maintain training stability.
RL Algorithm:
- Reasoning RL: Uses a mixed-domain approach (Math, Science, Code, Tool-Integrated Reasoning) with GRPO and IcePop techniques to mitigate training-inference mismatches.
- Agentic RL: Optimized for long-horizon tasks (e.g., SWE, search) with specialized reward systems and environment scaling (10k+ verifiable SWE environments).
- General RL: Decomposes objectives into foundational correctness, emotional intelligence, and task-specific quality, using a hybrid reward system (Rule-based, Outcome Reward Models, Generative Reward Models).
On-Policy Cross-Stage Distillation: A final stage where the model distills knowledge from previous SFT and RL stages to prevent catastrophic forgetting and recover capabilities.

C. Hardware Adaptation

Full-Stack Chinese Chip Support: GLM-5 is fully optimized for seven domestic Chinese chip platforms (Huawei Ascend, Moore Threads, Hygon, etc.).
Optimizations: Includes W4A8 mixed-precision quantization (INT4 for MoE experts, INT8 for others), custom fusion kernels (Lightning Indexer, Sparse Flash Attention), and advanced inference engine scheduling (vLLM/SGLang adaptations) to achieve performance comparable to dual-GPU international clusters on single Chinese nodes.

3. Key Contributions

Paradigm Shift to Agentic Engineering: Moves beyond "vibe coding" (prompt-based generation) to autonomous agents that plan, implement, and iterate on complex software tasks.
DSA Architecture: Successfully integrates DeepSeek Sparse Attention into a massive MoE model, achieving long-context fidelity (200K) with significantly reduced training/inference costs.
Asynchronous RL Infrastructure: A novel system that decouples generation from training, enabling massive-scale exploration of agent trajectories without synchronization bottlenecks.
Advanced Context Management: Introduces Preserved Thinking (retaining reasoning blocks across turns) and Hierarchical Context Management (Keep-recent-k + Discard-all) for search agents, drastically improving performance on long-horizon tasks like BrowseComp.
Comprehensive Evaluation Suite: Proposes CC-Bench-V2, an internal, fully automated benchmark for frontend, backend, and long-horizon tasks, moving beyond static leaderboards to real-world engineering validation.

4. Results

GLM-5 demonstrates SOTA performance across open-weight models and rivals top proprietary systems.

Benchmarks:
- Artificial Analysis Intelligence Index v4.0: Scores 50.4, becoming the first open-weight model to reach this score (up from 42 for GLM-4.7).
- SWE-bench Verified: 77.8% (beating Gemini 3 Pro and approaching Claude Opus 4.5).
- Humanity's Last Exam (HLE): 50.4 (with tools), outperforming Claude Opus 4.5 and Gemini 3 Pro.
- BrowseComp: 75.9 with context management, the highest among open models.
- Vending-Bench 2: Achieves a final balance of $4,432, demonstrating strong long-term planning and resource management.
Real-World Engineering (CC-Bench-V2):
- Frontend: 98% Build Success Rate; competitive with Claude Opus 4.5 in Check-item Success Rate (CSR).
- Backend: Pass@1 of 25.8% on complex engineering tasks, comparable to Claude Opus 4.5 and significantly ahead of GLM-4.7.
- Long-Horizon: Outperforms Claude Opus 4.5 in Repo Exploration (65.6% vs 64.5%), showing superior strategic search capabilities.
Efficiency: Achieves comparable performance to proprietary models while being fully open-weight and optimized for cost-effective deployment on domestic hardware.

5. Significance

Democratization of Agentic AI: GLM-5 proves that open-weight models can now compete with the most advanced proprietary systems in complex, real-world software engineering tasks, lowering the barrier to entry for high-performance AI agents.
Efficiency Breakthrough: The combination of DSA, asynchronous RL, and hardware-specific optimizations (Chinese chips) offers a blueprint for scaling intelligence without linearly scaling costs.
New Evaluation Standards: By introducing CC-Bench-V2 and emphasizing "Agentic Engineering" over static coding benchmarks, the paper shifts the community focus toward evaluating models on their ability to execute end-to-end, multi-step workflows in dynamic environments.
Geopolitical & Ecosystem Impact: The successful adaptation to Chinese hardware ecosystems demonstrates the viability of a self-reliant AI infrastructure, reducing dependency on Western hardware for cutting-edge model deployment.

In conclusion, GLM-5 represents a milestone in the evolution of foundation models, bridging the gap between theoretical reasoning capabilities and practical, efficient, autonomous engineering execution.