Google Gemini supports text chat, image understanding, and image generation (Nano Banana series). A single gemini_api_key enables all capabilities.
All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.
Text Chat
{
  "model": "gemini-3.5-flash",
  "gemini_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
|---|---|
| model | Recommended: gemini-3.5-flash; also supports gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview, gemini-3-flash-preview, gemini-3-pro-preview, etc. See official docs |
| gemini_api_key | Create one in Google AI Studio |
| gemini_api_base | Optional, defaults to https://generativelanguage.googleapis.com. Can be changed to a third-party proxy |
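As a sketch of overriding the endpoint, the fragment below adds gemini_api_base to the text chat configuration. The proxy URL is a placeholder, not a real service; replace it with the address your proxy provider gives you. Leaving the key out keeps the default https://generativelanguage.googleapis.com.

```json
{
  "model": "gemini-3.5-flash",
  "gemini_api_key": "YOUR_API_KEY",
  "gemini_api_base": "https://your-proxy.example.com"
}
```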
Image Understanding
All Gemini models natively support vision. Once gemini_api_key is configured, the Agent’s Vision tool automatically uses the main model to recognize images, with no extra setup required.
To manually specify a Vision model:
{
  "tools": {
    "vision": {
      "model": "gemini-3.1-flash-lite-preview"
    }
  }
}
Image Generation
{
  "skills": {
    "image-generation": {
      "model": "gemini-3.1-flash-image-preview"
    }
  }
}
| Model ID | Alias |
|---|---|
| gemini-3.1-flash-image-preview | Nano Banana 2 |
| gemini-3-pro-image-preview | Nano Banana Pro |
| gemini-2.5-flash-image | Nano Banana |
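Putting the three fragments together, a single configuration might look like the following. This is a sketch that assumes the model, tools, and skills keys sit side by side at the top level of the same configuration file, which the separate fragments suggest but do not state explicitly. The tools.vision entry is optional, since the Vision tool falls back to the main model when it is omitted.

```json
{
  "model": "gemini-3.5-flash",
  "gemini_api_key": "YOUR_API_KEY",
  "tools": {
    "vision": {
      "model": "gemini-3.1-flash-lite-preview"
    }
  },
  "skills": {
    "image-generation": {
      "model": "gemini-3.1-flash-image-preview"
    }
  }
}
```

Only one gemini_api_key appears here because the same key covers text chat, image understanding, and image generation.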