Supported Models
Every creative model your Agent needs, accessible through one unified API. We handle routing, format conversion, and failover automatically.
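As a rough illustration of what "routing" and "failover" could mean here, the sketch below picks a backend by modality and falls through to the next candidate when one fails. It is not the platform's actual API: `Backend`, `ProviderError`, `generate`, and the stub provider functions are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a backend cannot serve a request."""


@dataclass
class Backend:
    name: str                     # display name, e.g. "FLUX.1 Schnell"
    modality: str                 # "video", "image", or "audio"
    call: Callable[[str], bytes]  # hypothetical provider client function


def generate(prompt: str, modality: str,
             backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try matching backends in order and return the first successful result."""
    errors = []
    for backend in backends:
        if backend.modality != modality:
            continue  # routing: skip models that serve another modality
        try:
            return backend.name, backend.call(prompt)
        except ProviderError as exc:
            # failover: record the failure and move to the next candidate
            errors.append(f"{backend.name}: {exc}")
    if errors:
        raise ProviderError("all backends failed: " + "; ".join(errors))
    raise ProviderError(f"no backend registered for {modality!r}")


# Stub providers standing in for real network calls (assumptions, not real clients).
def unavailable(prompt: str) -> bytes:
    raise ProviderError("timeout")


def healthy(prompt: str) -> bytes:
    return b"image-bytes"


backends = [
    Backend("FLUX.1 Pro", "image", unavailable),
    Backend("FLUX.1 Schnell", "image", healthy),
]
name, data = generate("a red fox in snow", "image", backends)
```

In this sketch the first image backend fails, so the request is served by the second one; a real service would presumably also normalize request and response formats per provider, which is left out here.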
Sora
OpenAI
World-model video generation up to 60 seconds with realistic physics.
Veo 3
Google DeepMind
Google's video model with native audio generation and up to 4K output.
Veo 3.1
Google DeepMind
Improved consistency and longer outputs, up to 2 minutes.
Veo 3.1 Fast
Google DeepMind
Speed-optimized variant with 4× faster generation at slightly reduced quality.
Kling 1.6
Kuaishou
High-fidelity video generation with strong motion dynamics and coherence.
Kling O1
Kuaishou
Next-gen Kling with chain-of-thought video planning for complex scenes.
Gen-3 Alpha
Runway
Text-to-video and image-to-video generation with cinematic quality and motion control.
Video-01
MiniMax
Hailuo AI video generation with natural motion and character consistency.
Pika 2.0
Pika
Creative video effects and scene transformations from text or image inputs.
Dream Machine Ray2
Luma AI
Fast video generation with strong spatial understanding and 3D consistency.
CogVideoX-5B
Zhipu AI
Open-source 5B-parameter video model for research and self-hosted deployment.
HunyuanVideo
Tencent
Tencent's video foundation model with multi-shot narrative generation.
Wan 2.1
Alibaba
Alibaba's open video model with excellent text and object rendering.
FLUX.1 Pro
Black Forest Labs
State-of-the-art image generation with exceptional detail and composition.
FLUX.1 Dev
Black Forest Labs
Open-weight variant for development and fine-tuning workflows.
FLUX.1 Schnell
Black Forest Labs
Ultra-fast generation in ~1 second. Perfect for real-time applications.
DALL·E 3
OpenAI
Advanced text-to-image with strong prompt understanding and safety features.
Stable Diffusion XL
Stability AI
High-quality 1024×1024 image generation with excellent prompt following and detail.
Stable Diffusion 3.5
Stability AI
Latest SD architecture with improved text rendering and photorealism.
Midjourney v6.1
Midjourney
Industry-leading aesthetic quality for artistic and commercial imagery.
Ideogram 2.0
Ideogram
Best-in-class text rendering inside images such as logos, posters, and signage.
Recraft V3
Recraft
SVG and vector-style illustration generation for design workflows.
Kolors
Kuaishou
Bilingual (Chinese/English) image generation with strong cultural understanding.
Playground v3
Playground
Optimized for graphic design, color palettes, and typography layouts.
HunyuanDiT
Tencent
Tencent's diffusion transformer for high-resolution image synthesis.
Eleven Multilingual v2
ElevenLabs
Industry-leading voice cloning and multilingual TTS in 29 languages.
Eleven Turbo v2.5
ElevenLabs
Ultra-low latency voice synthesis for conversational AI and live use cases.
TTS-1 HD
OpenAI
High-definition text-to-speech with 6 natural voices. Great for narration.
TTS-1
OpenAI
Low-latency TTS optimized for real-time applications and streaming.
CosyVoice 2
Alibaba
Zero-shot voice cloning with emotion and style control. Streaming support.
Fish Speech 1.5
Fish Audio
Low-latency streaming TTS with VQGAN codec. Under 150 ms to first byte.
XTTS v2
Coqui
Open-source multilingual TTS with voice cloning from 6-second samples.
Bark
Suno
Generates speech, music, sound effects, and nonverbal sounds such as laughing and sighing.
Whisper Large v3
OpenAI
State-of-the-art speech-to-text in 100+ languages. Timestamp support.
MusicGen Large
Meta
Text-to-music generation for background tracks, jingles, and audio branding.
Stable Audio 2.0
Stability AI
Generate up to 3 minutes of high-quality audio from text prompts.
Parler TTS
Hugging Face
Describe the voice you want in natural language. Fully open-source.
