
Chinese AI company Zhipu.AI has announced that it is fully open-sourcing its next-generation General Language Model (GLM) family, including the new GLM-Z1 reasoning models.

The headline release is the GLM-Z1 reasoning model, which Zhipu says runs inference up to eight times faster than DeepSeek-R1.

Zhipu claims that by tuning its grouped-query attention (GQA) configuration, applying quantization, and implementing speculative sampling, GLM-Z1-32B-0414 delivers 200 tokens per second on consumer-grade GPUs – roughly 50 times faster than human reading speed.
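Of the techniques Zhipu names, speculative sampling is the one with a well-known published recipe: a cheap draft model proposes tokens, and the large target model accepts or corrects them in a single pass, without changing the output distribution. The article does not describe Zhipu's implementation, so the following is only a minimal, generic sketch of the accept/reject step with toy hand-written distributions (`p`, `q`, and `speculative_step` are illustrative names, not anything from the GLM codebase):

```python
import random

def speculative_step(p_target, q_draft, rng):
    """One accept/reject step of speculative sampling.

    The draft distribution q_draft proposes a token; the target
    distribution p_target accepts it with probability min(1, p/q),
    otherwise resamples from the residual max(0, p - q). The final
    sample is distributed exactly according to p_target.
    """
    vocab = range(len(q_draft))
    x = rng.choices(vocab, weights=q_draft)[0]  # draft model's proposal
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x  # proposal accepted
    # Rejected: resample from the renormalized residual distribution.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    return rng.choices(vocab, weights=residual)[0]

# Toy 4-token vocabulary: the draft only roughly matches the target.
p = [0.5, 0.3, 0.1, 0.1]      # target model's next-token distribution
q = [0.25, 0.25, 0.25, 0.25]  # cheap draft model's distribution
rng = random.Random(0)
samples = [speculative_step(p, q, rng) for _ in range(20000)]
```

The speedup in practice comes from the draft model proposing several tokens per target-model forward pass, so accepted runs cost far less than decoding token by token; the sketch above shows only the correctness-preserving accept/reject core.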

The open-sourced portfolio also includes the foundational GLM-4-32B-0414 model, enhanced for agent capabilities with strong performance in tool use, web search, and code generation.

Zhipu has also open-sourced smaller 9B-parameter versions of both GLM-4 and GLM-Z1, offering an efficient option for resource-constrained environments.

The models are released under the permissive MIT license and can be tried for free at https://chat.z.ai/ or downloaded from Hugging Face's model hub.