37 models available

Supported Models

Every creative model your agent needs, accessible through one unified API. We handle routing, format conversion, and failover automatically.
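To make the failover idea concrete, here is a minimal client-side sketch in Python. It is only an illustration: the page says failover is handled automatically by the service and does not describe how, so the function names (`generate_with_fallback`, `fake_call`) and the simulated outage are hypothetical stand-ins, not part of any documented API. Only the model IDs come from the catalog below.

```python
from typing import Callable, Sequence


def generate_with_fallback(
    prompt: str,
    model_ids: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model ID in order and return (model_id, result) for the first success."""
    errors: list[str] = []
    for model_id in model_ids:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical stand-in for a real API call: one model is simulated as unavailable.
def fake_call(model_id: str, prompt: str) -> str:
    if model_id == "google/veo-3.1":
        raise ConnectionError("upstream unavailable")
    return f"video for {prompt!r} from {model_id}"


used, result = generate_with_fallback(
    "a paper boat in the rain",
    ["google/veo-3.1", "google/veo-3.1-fast"],
    fake_call,
)
print(used)
```

In this sketch the preferred model comes first in the list and the speed-optimized variant acts as the fallback; the caller learns which model actually served the request from the returned ID.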

Sora

OpenAI

Popular

World-model video generation up to 60 seconds with realistic physics.

openai/sora

Veo 3

Google DeepMind

New

Google's latest video model with native audio generation and 8K output.

google/veo-3

Veo 3.1

Google DeepMind

Improved consistency and longer outputs of up to 2 minutes.

google/veo-3.1

Veo 3.1 Fast

Google DeepMind

Fast

Speed-optimized variant with 4× faster generation at slightly reduced quality.

google/veo-3.1-fast

Kling 1.6

Kuaishou

Popular

High-fidelity video generation with strong motion dynamics and coherence.

kuaishou/kling-1.6

Kling O1

Kuaishou

New

Next-gen Kling with chain-of-thought video planning for complex scenes.

kuaishou/kling-o1

Gen-3 Alpha

Runway

Popular

Text and image-to-video generation with cinematic quality and motion control.

runway/gen3-alpha

Video-01

MiniMax

Hailuo AI video generation with natural motion and character consistency.

minimax/video-01

Pika 2.0

Pika

Creative video effects and scene transformations from text or image inputs.

pika/v2

Dream Machine Ray2

Luma AI

Fast video generation with strong spatial understanding and 3D consistency.

luma/ray2

CogVideoX-5B

Zhipu AI

Open-source 5B parameter video model for research and self-hosted deployment.

zhipu/cogvideox-5b

HunyuanVideo

Tencent

Tencent's video foundation model with multi-shot narrative generation.

tencent/hunyuan-video

Wan 2.1

Alibaba

Alibaba's open video model with excellent text and object rendering.

alibaba/wan-2.1

FLUX.1 Pro

Black Forest Labs

Popular

State-of-the-art image generation with exceptional detail and composition.

bfl/flux-1-pro

FLUX.1 Dev

Black Forest Labs

Open-weight variant for development and fine-tuning workflows.

bfl/flux-1-dev

FLUX.1 Schnell

Black Forest Labs

Fast

Ultra-fast generation in ~1 second. Perfect for real-time applications.

bfl/flux-1-schnell

DALL·E 3

OpenAI

Popular

Advanced text-to-image with strong prompt understanding and safety features.

openai/dall-e-3

Stable Diffusion XL

Stability AI

Popular

High-quality 1024×1024 image generation with excellent prompt following and detail.

stability/sdxl-1.0

Stable Diffusion 3.5

Stability AI

New

Latest SD architecture with improved text rendering and photorealism.

stability/sd3.5-large

Midjourney v6.1

Midjourney

Industry-leading aesthetic quality for artistic and commercial imagery.

midjourney/v6.1

Ideogram 2.0

Ideogram

New

Best-in-class text rendering inside images, suited to logos, posters, and signage.

ideogram/v2

Recraft V3

Recraft

SVG and vector-style illustration generation for design workflows.

recraft/v3

Kolors

Kuaishou

Bilingual (Chinese/English) image generation with strong cultural understanding.

kuaishou/kolors

Playground v3

Playground

Optimized for graphic design, color palettes, and typography layouts.

playground/v3

HunyuanDiT

Tencent

Tencent's diffusion transformer for high-resolution image synthesis.

tencent/hunyuan-dit

Eleven Multilingual v2

ElevenLabs

Popular

Industry-leading voice cloning and multilingual TTS in 29 languages.

elevenlabs/multilingual-v2

Eleven Turbo v2.5

ElevenLabs

Fast

Ultra-low latency voice synthesis for conversational AI and live use cases.

elevenlabs/turbo-v2.5

TTS-1 HD

OpenAI

Popular

High-definition text-to-speech with 6 natural voices. Great for narration.

openai/tts-1-hd

TTS-1

OpenAI

Fast

Low-latency TTS optimized for real-time applications and streaming.

openai/tts-1

CosyVoice 2

Alibaba

New

Zero-shot voice cloning with emotion and style control. Streaming support.

alibaba/cosyvoice-2

Fish Speech 1.5

Fish Audio

Low-latency streaming TTS with VQGAN codec. Under 150 ms to first byte.

fish/speech-1.5

XTTS v2

Coqui

Open-source multilingual TTS with voice cloning from 6-second samples.

coqui/xtts-v2

Bark

Suno

Generates speech, music, sound effects, and nonverbal communication.

suno/bark

Whisper Large v3

OpenAI

State-of-the-art speech-to-text in 100+ languages, with timestamp support.

openai/whisper-large-v3

MusicGen Large

Meta

Text-to-music generation for background tracks, jingles, and audio branding.

meta/musicgen-large

Stable Audio 2.0

Stability AI

Generate up to 3 minutes of high-quality audio from text prompts.

stability/stable-audio-2

Parler TTS

Hugging Face

Describe the voice you want in natural language. Fully open-source.

hf/parler-tts-large
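The model IDs listed above all appear to follow a `provider/model` pattern. As a rough sketch, assuming that pattern holds for every ID, the Python below splits an ID into its two parts and groups a sample of IDs by provider prefix. The helper names `split_model_id` and `group_by_provider` are hypothetical and not part of any documented client.

```python
from collections import defaultdict
from typing import Iterable


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model' at the first slash, rejecting malformed IDs."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name


def group_by_provider(model_ids: Iterable[str]) -> dict[str, list[str]]:
    """Map each provider prefix to the model names listed under it, in input order."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for model_id in model_ids:
        provider, name = split_model_id(model_id)
        groups[provider].append(name)
    return dict(groups)


# A small sample of IDs copied from the catalog above.
catalog_sample = [
    "openai/sora",
    "openai/dall-e-3",
    "openai/tts-1",
    "bfl/flux-1-pro",
    "stability/sd3.5-large",
]
by_provider = group_by_provider(catalog_sample)
print(by_provider["openai"])
```

Note that the prefix is a short slug rather than the vendor's display name (for example `bfl` for Black Forest Labs and `hf` for Hugging Face), so it should not be shown to users as-is.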