Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single zhipu_ai_api_key enables all capabilities.
All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.
Text Chat
{
"model": "glm-5.1",
"zhipu_ai_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
|---|
model | Can be glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, etc. See model codes |
zhipu_ai_api_key | Create one in the Zhipu AI Console |
zhipu_ai_api_base | Optional, defaults to https://open.bigmodel.cn/api/paas/v4 |
Image Understanding
Zhipu’s chat models (glm-5.1, glm-5-turbo, etc.) do not support vision; vision calls are uniformly routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent’s Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.
Speech-to-Text (ASR)
{
"voice_to_text": "zhipu",
"voice_to_text_model": "glm-asr-2512"
}
| Parameter | Description |
|---|
voice_to_text | Set to zhipu to enable Zhipu ASR |
voice_to_text_model | Optional, defaults to glm-asr-2512 |
Credentials are automatically reused from zhipu_ai_api_key. Audio files should be smaller than 25MB; oversized files may be rejected by the server.
Embedding
{
"embedding_provider": "zhipu",
"embedding_model": "embedding-3"
}
Available models: embedding-3, embedding-2. After changing the embedding, run /memory rebuild-index to rebuild the index.