GLM - CowAgent

Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single zhipu_ai_api_key enables all capabilities.

All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.

Text Chat

{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY"
}

Parameter	Description
`model`	Can be `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, etc. See model codes
`zhipu_ai_api_key`	Create one in the Zhipu AI Console
`zhipu_ai_api_base`	Optional, defaults to `https://open.bigmodel.cn/api/paas/v4`

Image Understanding

Zhipu’s chat models (glm-5.1, glm-5-turbo, etc.) do not support vision; vision calls are uniformly routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent’s Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.

Speech-to-Text (ASR)

{
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512"
}

Parameter	Description
`voice_to_text`	Set to `zhipu` to enable Zhipu ASR
`voice_to_text_model`	Optional, defaults to `glm-asr-2512`

Credentials are automatically reused from zhipu_ai_api_key. Audio files should be smaller than 25MB; oversized files may be rejected by the server.

Embedding

{
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}

Available models: embedding-3, embedding-2. After changing the embedding, run /memory rebuild-index to rebuild the index.

​Text Chat

​Image Understanding

​Speech-to-Text (ASR)

​Embedding

Text Chat

Image Understanding

Speech-to-Text (ASR)

Embedding