Content Generation API (Video / Image / Audio / Music)
Generate video, image, audio (TTS) and music with Onysoft AI Gateway. Use 40+ provider models such as Veo, Sora 2, Runway, Kling, Wan, Hailuo, Seedream, Flux, GPT Image, Imagen, Nano Banana, Ideogram, Grok Imagine, ElevenLabs and Suno through a single asynchronous backend.
5 endpoints, 1 shared backend
There are 5 endpoint names for the generation flow, and they all connect to the same backend. Which one you use is purely semantic — it is meant to make your code more readable. The returned task_id is queried the same way for all of them.
Endpoint Names (Aliases)
The 5 endpoints below are fully equivalent. Whichever you send, the same backend runs and the same task_id format is returned. For semantic clarity, prefer the one that fits your model type.
| Endpoint | Recommended Use | Example Models |
|---|---|---|
| POST /v1/video/generate | Video generation | veo3, sora-2, runway, kling, wan, hailuo |
| POST /v1/image/generate | Image generation | seedream, flux, imagen, nano-banana, ideogram, grok-imagine, gpt-image |
| POST /v1/audio/generate | Audio generation (general) | elevenlabs/text-to-speech, sound-effect, audio-isolation, speech-to-text |
| POST /v1/voice/generate | Speech generation (TTS) | elevenlabs/text-to-speech-multilingual-v2, turbo-2-5, v3-text-to-dialogue |
| POST /v1/music/generate | Music generation | suno-v4, ai-music-api/generate, mashup, extend, sounds |
Status Lookup Endpoints
Whichever alias you used for generation, you can use the same alias for status lookup (or any of them — the task_id is global).
- GET /v1/video/status/{task_id}
- GET /v1/image/status/{task_id}
- GET /v1/audio/status/{task_id}
- GET /v1/voice/status/{task_id}
- GET /v1/music/status/{task_id}
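Because every alias reaches the same backend, a client only needs the alias name to build both URLs. A minimal sketch, assuming the https://api.onysoft.com/v1 base URL used in the examples on this page; generate_url and status_url are hypothetical helper names, not part of an official SDK.

```python
from urllib.parse import quote

BASE_URL = "https://api.onysoft.com/v1"

# The five aliases from the table above; they all reach the same backend.
ALIASES = ("video", "image", "audio", "voice", "music")


def _check_alias(alias: str) -> None:
    if alias not in ALIASES:
        raise ValueError(f"unknown alias {alias!r}, expected one of {ALIASES}")


def generate_url(alias: str, base: str = BASE_URL) -> str:
    """URL for POST {base}/{alias}/generate (hypothetical helper)."""
    _check_alias(alias)
    return f"{base}/{alias}/generate"


def status_url(task_id: str, alias: str = "video", base: str = BASE_URL) -> str:
    """URL for GET {base}/{alias}/status/{task_id}.

    Any alias works for any task, because the task_id is global.
    """
    _check_alias(alias)
    # Percent-encode the ID so unexpected characters cannot break the path.
    return f"{base}/{alias}/status/{quote(task_id, safe='')}"
```

For example, a task created through /v1/music/generate can be checked with status_url(task_id, alias="music") or simply with the default "video" alias.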
Asynchronous Design
All of these endpoints work asynchronously. When you send a request you receive a task_id (HTTP 202). You use this ID to query the generation status and check whether your content is ready. Typical duration: image 10-30s, video 30-180s, TTS 5-15s, music 30-90s.
Text/Chat Exception
For text/chat models, use /v1/chat/completions (synchronous, streaming-capable — see Chat Completions). Special case: Google Gemini image models (gemini-*-image) are called through the chat completions endpoint (multimodal response).
Generate Video
POST /v1/video/generate
Starts a new generation task (video, image, audio or music).
Parameters
| Parameter | Description |
|---|---|
| model | The model to use. Example: veo3_fast, kling-2.5, runway_gen4_turbo |
| prompt | Description of the content to generate (English recommended). Detailed, descriptive prompts produce better results. |
| aspect_ratio | Aspect ratio: "16:9", "9:16", "1:1". Default: "16:9" |
| duration | Video length in seconds. Default: 5. Depending on the model, 5, 6, 8 or 10 seconds are supported. |
| image_url | Source image. Required for image-to-video and image-to-image models. Can be a public HTTP URL or a base64 data URI. |
| Base64 image | Alternative to image_url: a base64-encoded image. The system automatically saves it to a temporary file and generates a public URL for it. |
| resolution | Resolution. Accepted values vary by model: "720p", "1080p" (video), "1K", "2K", "4K" (image). The backend selects the correct value automatically. |
| quality | Quality level: "high", "standard", "low". Used only with the gpt-image and seedream models. |
| Negative prompt | Unwanted elements: the objects, styles or colors you do not want in the output. |
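Checking values locally before sending saves a round trip for obvious mistakes. A minimal sketch of a request-body builder that enforces the values listed in the table above; build_generate_payload is a hypothetical helper, and individual models may accept only a subset of these values (Sora 2, for instance, uses "landscape"/"portrait" instead).

```python
from typing import Optional

# Allowed values copied from the parameter table above; a given model may
# support fewer of them.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
DURATIONS = {5, 6, 8, 10}


def build_generate_payload(
    model: str,
    prompt: str,
    aspect_ratio: str = "16:9",
    duration: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Return a JSON-ready body for POST /v1/{alias}/generate (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    payload = {"model": model, "prompt": prompt, "aspect_ratio": aspect_ratio}
    if duration is not None:  # only meaningful for video models
        if duration not in DURATIONS:
            raise ValueError(f"duration must be one of {sorted(DURATIONS)} seconds")
        payload["duration"] = duration
    if image_url is not None:  # source image for i2v / i2i models
        if not image_url.startswith(("http://", "https://", "data:image/")):
            raise ValueError("image_url must be an HTTP(S) URL or a base64 data URI")
        payload["image_url"] = image_url
    return payload
```

The returned dict can be passed directly as the json= argument of an HTTP client, as in the full Python example at the end of this page.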
Example Requests
Video Generation (Text-to-Video)
```bash
curl -X POST https://api.onysoft.com/v1/video/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo3_fast",
    "prompt": "A golden retriever running through a field of sunflowers at sunset",
    "aspect_ratio": "16:9",
    "duration": 5
  }'
```
Image Generation (Text-to-Image)
```bash
curl -X POST https://api.onysoft.com/v1/image/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/imagen4",
    "prompt": "A beautiful sunset over mountains, photorealistic, 4K",
    "aspect_ratio": "16:9"
  }'
```
Image Editing (Image-to-Image)
```bash
curl -X POST https://api.onysoft.com/v1/image/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedream/4.5-edit",
    "prompt": "Change background to a futuristic city",
    "image_url": "https://example.com/input.jpg",
    "aspect_ratio": "1:1"
  }'
```
Music Generation
```bash
curl -X POST https://api.onysoft.com/v1/music/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "suno-v4",
    "prompt": "Upbeat electronic dance music with catchy female vocals about summer"
  }'
```
Example Response
```json
{
  "success": true,
  "data": {
    "task_id": "vid_abc123...",
    "status": "pending",
    "model": "veo3_fast",
    "estimated_cost": {
      "amount": 0.60,
      "currency": "USD"
    }
  }
}
```
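A client typically needs only the task_id and, for budgeting, the estimated cost from this response. A minimal parsing sketch using the standard library; SubmittedTask and parse_submit_response are hypothetical names, and treating "success": false as the rejection signal is an assumption, since the error body is not documented on this page.

```python
import json
from typing import NamedTuple, Optional


class SubmittedTask(NamedTuple):
    task_id: str
    status: str
    cost_amount: Optional[float]
    cost_currency: Optional[str]


def parse_submit_response(raw: str) -> SubmittedTask:
    """Extract the fields a client needs from a generate response body.

    ASSUMPTION: a rejected request comes back with "success": false.
    """
    body = json.loads(raw)
    if not body.get("success"):
        raise RuntimeError(f"generation request rejected: {body!r}")
    data = body["data"]
    # estimated_cost is shown in the example above; tolerate its absence.
    cost = data.get("estimated_cost") or {}
    return SubmittedTask(
        task_id=data["task_id"],
        status=data["status"],
        cost_amount=cost.get("amount"),
        cost_currency=cost.get("currency"),
    )
```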
Status Lookup
GET /v1/video/status/{task_id}
Queries the video generation status.
Status Values
| Status | Description |
|---|---|
| pending | Request received, queued |
| processing | Video generation in progress |
| completed | Video ready, URL available |
| failed | Generation failed |
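Since pending and processing are transient while completed and failed are terminal, a polling loop only needs to stop on the last two, plus an overall deadline so it cannot spin forever. A minimal sketch, assuming a 300-second default (video can take up to about 180 s according to the Asynchronous Design note, plus margin); wait_for_task is a hypothetical helper. The status fetch, sleep and clock are injected so the loop can be tested without network access.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status() until the task is completed or failed, or the deadline passes.

    fetch_status must return the "data" object of a status response.
    """
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        if data.get("status") in TERMINAL_STATUSES:
            return data
        if clock() >= deadline:
            raise TimeoutError(f"task still {data.get('status')!r} after {timeout_s} s")
        sleep(interval_s)
```

In real use, fetch_status would wrap a GET request to the status endpoint and return resp.json()["data"]; the caller then inspects the returned status to distinguish success from failure.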
Example Request
```bash
curl https://api.onysoft.com/v1/video/status/vid_abc123 \
  -H "Authorization: Bearer sk-ony-YOUR-KEY"
```
Example Response (Completed)
```json
{
  "success": true,
  "data": {
    "task_id": "vid_abc123...",
    "status": "completed",
    "video_url": "https://...",
    "thumbnail_url": "https://...",
    "duration": 5,
    "model": "veo3_fast"
  }
}
```
Video List
/v1/video/list
Returns a list of generated videos.
Query Parameters
Page number. Default: 1
Records per page. Default: 20, Maximum: 100
Supported Models
The system provides access to 200+ KieAI models. The model list is dynamic and synced from the admin panel. For the current model list and prices, use the Models page or the GET /v1/models endpoint.
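Because the model list changes, a client may prefer to discover model IDs at runtime instead of hard-coding them. A standard-library sketch; fetch_model_ids and filter_models are hypothetical helpers, and the {"data": [{"id": ...}]} response layout is an assumption that should be checked against a real GET /v1/models response.

```python
import json
import urllib.request
from typing import Iterable, List

BASE_URL = "https://api.onysoft.com/v1"


def fetch_model_ids(api_key: str, base: str = BASE_URL) -> List[str]:
    """Call GET /v1/models and return the model IDs.

    ASSUMPTION: the body looks like {"data": [{"id": ...}, ...]};
    adjust the key names if the real response differs.
    """
    req = urllib.request.Request(
        f"{base}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return [item["id"] for item in body["data"]]


def filter_models(model_ids: Iterable[str], prefix: str) -> List[str]:
    """Keep the IDs of one family, e.g. prefix "kling" or "elevenlabs/", sorted."""
    return sorted(m for m in model_ids if m.startswith(prefix))
```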
Video Models (Featured)
| Model | Provider | Type | Description |
|---|---|---|---|
| veo3 | Google | Video | Veo 3.1 - Highest quality (t2v/i2v) |
| veo3_fast | Google | Video | Veo 3.1 Fast - Fast generation |
| sora-2-text-to-video | OpenAI | Video | Sora 2 text-to-video |
| sora-2-pro-text-to-video | OpenAI | Video | Sora 2 Pro - high quality |
| runway_gen4_turbo | Runway | Video | Gen-4 Turbo |
| runway_aleph | Runway | Video | Aleph (video-to-video) |
| kling/v2-5-turbo-text-to-video-pro | Kling | Video | Kling 2.5 Turbo Pro |
| kling-2.6/text-to-video | Kling | Video | Kling 2.6 |
| wan/2-6-text-to-video | Alibaba | Video | Wan 2.6 |
| hailuo/2-3-image-to-video-pro | Hailuo | Video | Hailuo 2.3 Pro (i2v) |
| grok-imagine/text-to-video | xAI | Video | Grok Imagine Video |
Image Models (Featured)
| Model | Provider | Type | Description |
|---|---|---|---|
| grok-imagine/text-to-image | xAI | t2i | Grok Imagine (most compatible) |
| google/imagen4 | Google | t2i | Imagen 4 Standard |
| google/imagen4-fast | Google | t2i | Imagen 4 Fast |
| google/imagen4-ultra | Google | t2i | Imagen 4 Ultra - Highest quality |
| google/nano-banana-2 | Google | t2i | Nano Banana 2 (1K/2K/4K) |
| ideogram/v3-text-to-image | Ideogram | t2i | Ideogram V3 (text-to-image) |
| ideogram/v3-edit | Ideogram | i2i | Ideogram V3 Edit |
| flux-2/pro-text-to-image | Flux | t2i | Flux 2 Pro |
| flux-2/flex-text-to-image | Flux | t2i | Flux 2 Flex |
| seedream/4.5-text-to-image | ByteDance | t2i | Seedream 4.5 (text-to-image) |
| seedream/4.5-edit | ByteDance | i2i | Seedream 4.5 Edit (image-to-image) |
| gpt-image/1.5-text-to-image | OpenAI | t2i | GPT Image 1.5 |
| wan/2-7-image-pro | Alibaba | t2i | Wan 2.7 Image Pro |
Audio Models (TTS / Sound Effect / Speech)
Audio generation via the ElevenLabs provider. Recommended endpoint: POST /v1/audio/generate or POST /v1/voice/generate.
| Model | Type | Description | Price (USD) |
|---|---|---|---|
| elevenlabs/text-to-speech-multilingual-v2 | TTS | Multilingual TTS (29 languages, including Turkish). Recommended. | $0.060 |
| elevenlabs/text-to-speech-turbo-2-5 | TTS | Low-latency TTS, focused on English. Currently returns an "internal error" on the provider side. | $0.030 |
| elevenlabs/v3-text-to-dialogue | TTS Dialogue | Multi-character dialogue generation (V3 model) | $0.070 |
| elevenlabs/sound-effect-v2 | SFX | Sound effect generation (prompt → audio) | $0.001 |
| elevenlabs/audio-isolation | Audio Filter | Vocal/instrument isolation | $0.001 |
| elevenlabs/speech-to-text | STT | Speech to text (transcription) | $0.0175 |
TTS Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | (none) | The text to be spoken (required) |
| voice | string | "Rachel" | Voice identifier: a value from the list below (case-sensitive). |
| stability | float | 0.5 | Voice consistency (0.0 - 1.0). High = monotone, low = expressive. |
| similarity_boost | float | 0.75 | Voice similarity (0.0 - 1.0). |
| style | float | 0.0 | Style intensity (0.0 - 1.0). Higher = more characterful. |
Supported Voice List (ElevenLabs)
In the ElevenLabs integration via KieAI, only the following 20 voices are available. The values are case-sensitive — the first letter must be capitalized. If you send a voice outside the list, KieAI returns the error "This voice is not within the range of allowed options".
| Female | Male |
|---|---|
| Rachel (default), Alice, Aria, Charlotte, Jessica, Laura, Lily, Matilda, Sarah | Brian, Bill, Callum, Charlie, Chris, Daniel, Eric, George, Liam, Roger, Will |
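Validating the voice and the float settings locally avoids a request that would only come back with "This voice is not within the range of allowed options". A minimal sketch whose defaults mirror the TTS Parameters table; build_tts_payload is a hypothetical helper, not part of the API.

```python
# The 20 voices accepted by the ElevenLabs integration via KieAI (case-sensitive).
ALLOWED_VOICES = frozenset({
    "Rachel", "Alice", "Aria", "Charlotte", "Jessica", "Laura", "Lily", "Matilda", "Sarah",
    "Brian", "Bill", "Callum", "Charlie", "Chris", "Daniel", "Eric", "George", "Liam",
    "Roger", "Will",
})


def build_tts_payload(
    text: str,
    voice: str = "Rachel",
    model: str = "elevenlabs/text-to-speech-multilingual-v2",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
) -> dict:
    """Return a body for POST /v1/voice/generate, rejecting values the provider refuses."""
    if voice not in ALLOWED_VOICES:  # exact match: "rachel" fails, "Rachel" passes
        raise ValueError(f"voice {voice!r} is not in the allowed list")
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "model": model,
        "prompt": text,
        "voice": voice,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
```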
For Turkish TTS
Use the elevenlabs/text-to-speech-multilingual-v2 model: it provides natural narration in 29 languages, including Turkish. The turbo-2-5 model is focused on English and is currently affected by a temporary provider-side issue.
TTS Example (Turkish, multilingual-v2)
```bash
curl -X POST https://api.onysoft.com/v1/voice/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs/text-to-speech-multilingual-v2",
    "prompt": "Merhaba, ben Rachel. Onysoft AI üzerinden konuşuyorum.",
    "voice": "Rachel",
    "stability": 0.6,
    "similarity_boost": 0.8,
    "style": 0.1
  }'
```
Sound Effect Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | (none) | Sound effect description (e.g. "rain on metal roof") |
| duration_seconds | float | 5.0 | Generation length in seconds, between 1.0 and 22.0. |
| prompt_influence | float | 0.3 | Fidelity to the prompt (0.0 - 1.0). |
Sound Effect Example
```bash
curl -X POST https://api.onysoft.com/v1/audio/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs/sound-effect-v2",
    "prompt": "thunder rolling in distant mountains, heavy rain",
    "duration_seconds": 8.0,
    "prompt_influence": 0.5
  }'
```
Music Models
Suno and KieAI ai-music-api models for song/music generation. Recommended endpoint: POST /v1/music/generate.
| Model | Provider | Description |
|---|---|---|
| suno-v4 | Suno | Suno V4: song generation (instrumental/vocal, multiple genres) |
| ai-music-api/generate | KieAI | Create a new song (Suno behind the scenes) |
| ai-music-api/extend | KieAI | Extend an existing song |
| ai-music-api/mashup | KieAI | Combine 2 songs |
| ai-music-api/sounds | KieAI | Sound effects |
| ai-music-api/upload-and-cover-audio | KieAI | Generate a cover from existing audio |
| ai-music-api/separate-vocals | KieAI | Vocal/instrument separation |
| ai-music-api/create-music-video | KieAI | Generate a visual video for music |
| ai-music-api/add-instrumental | KieAI | Add instrumentals to existing vocals |
| ai-music-api/convert-to-wav-format | KieAI | MP3 → WAV conversion |
| suno-add-vocals | KieAI | Add vocals to an instrumental |
| suno-extend-music | KieAI | Extend a Suno track |
| suno-generate-lyrics | KieAI | Lyrics generation |
Current Prices
Prices are pulled dynamically from the KieAI API. They are shown with the markup rate applied to your project/customer. For current pricing: call GET /v1/models or see the models page.
Per-Model Required Parameters
Each provider requires different parameters. The table below lists, per model family, the parameters the backend fills in automatically or requires from you. In most cases you only need to send prompt, optionally aspect_ratio, and, for i2i/i2v models, the source image (image_url or image_urls).
| Provider | Model Family | Parameters (auto-filled or required) |
|---|---|---|
| Google | imagen4 / imagen4-fast / imagen4-ultra | prompt (only) |
| xAI | grok-imagine/* | aspect_ratio |
| Ideogram | ideogram/v3-text-to-image | prompt (only) |
| Ideogram | ideogram/v3-edit, v3-remix | prompt + image_url (i2i) |
| Ideogram | ideogram/character* | reference_image_urls (i2i) |
| Flux | flux-2/* | aspect_ratio, resolution (1K/2K) |
| ByteDance | seedream/* | quality, n, response_format, size, resolution (2K), aspect_ratio, background |
| OpenAI | gpt-image/* | quality, n, response_format, size, resolution, aspect_ratio, background |
| Alibaba | wan/2-7-image-* | aspect_ratio, resolution (2K/4K) |
| Alibaba | wan video models | duration, resolution (720p/1080p) |
| Google | nano-banana-edit | image_urls (i2i, image required) |
i2i / i2v (image-to-image / image-to-video) Models
When sending a source image for image or video generation, follow this rule:
- Single image: send image_url (string)
- Multiple images (reference): send image_urls (array)
- The backend automatically converts these to the parameter name each provider actually expects (imageUrl, input_urls, first_frame_url, image_input, etc.)
- As the value, use a public HTTP URL (https://...) or a base64 data URI (data:image/jpeg;base64,...); base64 values are automatically converted to a public URL
- Maximum total request size: 10 MB
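To send a local file without hosting it first, it can be encoded as a base64 data URI. A standard-library sketch; file_to_data_uri is a hypothetical helper, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption. The check covers only the encoded image, while the documented limit applies to the whole request.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Union

# ASSUMPTION: "10 MB" means 10 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def file_to_data_uri(path: Union[str, Path]) -> str:
    """Encode a local image as data:<mime>;base64,<data> for image_url / image_urls."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    uri = f"data:{mime};base64,{encoded}"
    # Base64 makes the data about a third larger, so check the encoded length.
    # The rest of the JSON body also counts toward the limit.
    if len(uri) > MAX_REQUEST_BYTES:
        raise ValueError(f"encoded image is {len(uri)} bytes, over the request limit")
    return uri
```

The result can be used as the image_url value, or collected into a list for image_urls.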
Provider Parameter Mapping Matrix
The table below shows, for advanced use, the internal parameter names each provider expects. You do not need to know this detail in normal use — sending image_url or image_urls is enough; the backend converts automatically.
| Model Family | Expected Parameter | Type | Note |
|---|---|---|---|
| seedream/*-edit | image_urls | array | Supports multiple references |
| flux-2/*-image-to-image | input_urls | array | Supports resolution 1K/2K |
| gpt-image-2-image-to-image | input_urls | array | aspect_ratio + resolution required |
| google/nano-banana-edit | image_urls | array | output_format (png/jpg) + image_size |
| google/nano-banana-2 / pro | image_input | array | An empty array can also be sent (t2i mode) |
| ideogram/v3-edit / remix | image_url | string | + mask_url (optional) |
| ideogram/character* | reference_image_urls | array | For character reference |
| grok-imagine/image-to-image | image_urls | array | nsfw_checker support |
| grok-imagine/image-to-video | image_urls | array | + duration, resolution, mode |
| kling-*/image-to-video | image_url | string | + aspect_ratio, duration |
| hailuo/*-image-to-video | image_url | string | + duration (6/10) |
| sora-2-image-to-video | image_urls | array | + n_frames, aspect_ratio (landscape/portrait) |
| wan/2-7-image-to-video | first_frame_url | string | + last_frame_url (optional closing frame) |
| wan/2-6-flash-image-to-video | image_urls | array | + audio (bool, required) |
| runway_* | imageUrl | string (camelCase) | Runway internal format |
| veo3* reference-to-video | imageUrls | array (camelCase) | + generationType="REFERENCE_2_VIDEO" |
| topaz/* | image_url | string | Upscaler; + scale (2/4) |
| elevenlabs/audio-isolation | audio_url | string | Audio input, not an image |
| elevenlabs/speech-to-text | audio_url | string | Audio input, not an image |
Sora 2 (OpenAI) Models
OpenAI's new Sora 2 video generation models. High quality, natural motion, with audio.
| Model | Type | Description |
|---|---|---|
| sora-2-text-to-video | t2v | Text-to-video (standard quality) |
| sora-2-image-to-video | i2v | Video from a starting image |
| sora-2-pro-text-to-video | t2v | Sora 2 Pro: high-quality t2v |
| sora-2-pro-image-to-video | i2v | Sora 2 Pro: high-quality i2v |
Sora 2 Parameters
| Parameter | Default | Description |
|---|---|---|
| aspect_ratio | "landscape" | "landscape" or "portrait" |
| n_frames | "10" | Seconds/frame count (string) |
| size | "high" | Pro variant only: "standard" or "high" |
| remove_watermark | false | Watermark removal |
| character_id_list | (none) | Character reference IDs (array) |
| image_urls | (none) | Required for i2v (array) |
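Two details in these tables are easy to get wrong: the model ID depends on both the variant (standard/Pro) and the mode (t2v/i2v), and n_frames is a string. A minimal sketch that derives both; build_sora2_payload is a hypothetical helper, and it omits remove_watermark unless it is enabled, mirroring the example request, which leaves it out.

```python
from typing import Optional, Sequence


def build_sora2_payload(
    prompt: str,
    pro: bool = False,
    image_urls: Optional[Sequence[str]] = None,
    aspect_ratio: str = "landscape",
    n_frames: int = 10,
    size: str = "high",
    remove_watermark: bool = False,
) -> dict:
    """Pick the Sora 2 model ID and build its request body (hypothetical helper)."""
    if aspect_ratio not in ("landscape", "portrait"):
        raise ValueError('Sora 2 aspect_ratio must be "landscape" or "portrait"')
    variant = "sora-2-pro" if pro else "sora-2"
    mode = "image-to-video" if image_urls else "text-to-video"
    payload = {
        "model": f"{variant}-{mode}",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "n_frames": str(n_frames),  # documented as a string, e.g. "10"
    }
    if pro:  # size applies to the Pro variant only
        if size not in ("standard", "high"):
            raise ValueError('size must be "standard" or "high"')
        payload["size"] = size
    if image_urls:  # required for the i2v models
        payload["image_urls"] = list(image_urls)
    if remove_watermark:
        payload["remove_watermark"] = True
    return payload
```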
Sora 2 Example (Pro, landscape)
```bash
curl -X POST https://api.onysoft.com/v1/video/generate \
  -H "Authorization: Bearer sk-ony-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sora-2-pro-text-to-video",
    "prompt": "A drone shot of a serene mountain lake at sunrise, mist rising from water",
    "aspect_ratio": "landscape",
    "n_frames": "10",
    "size": "high"
  }'
```
Note: the prices the API reports to you are customer prices, with your project's or customer's markup rate already applied. Average markup: 50% (projects) and 30% (partner customers).
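As a worked illustration of what a markup rate means, a hypothetical base price of $0.10 with a 50% markup gives $0.15, and with a 30% markup gives $0.13. The sketch below assumes the markup is applied as base × (1 + markup / 100) with rounding to 4 decimal places; the gateway's exact formula and rounding are not documented here, and customer_price is a hypothetical helper. Decimal avoids binary floating-point rounding surprises in money arithmetic.

```python
from decimal import ROUND_HALF_UP, Decimal


def customer_price(base_usd, markup_percent) -> Decimal:
    """Apply a percentage markup to a base price.

    ASSUMPTION: price = base * (1 + markup / 100), rounded half-up to 4 decimals.
    """
    base = Decimal(str(base_usd))
    factor = 1 + Decimal(str(markup_percent)) / 100
    return (base * factor).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```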
Full Python Example (Polling)
```python
import time

import requests

API_KEY = "sk-ony-YOUR-KEY"
BASE = "https://api.onysoft.com/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1. Create the video task (the API answers with HTTP 202 and a task_id)
resp = requests.post(
    f"{BASE}/video/generate",
    headers=HEADERS,  # json= below also sets Content-Type: application/json
    json={"model": "veo3_fast", "prompt": "A cat playing piano", "duration": 5},
    timeout=30,
)
resp.raise_for_status()
task = resp.json()["data"]
task_id = task["task_id"]
print(f"Task: {task_id}, Status: {task['status']}")

# 2. Poll the status until the task completes or fails
while True:
    resp = requests.get(f"{BASE}/video/status/{task_id}", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    data = resp.json()["data"]
    print(f"Status: {data['status']}")
    if data["status"] == "completed":
        print(f"Video URL: {data['video_url']}")
        break
    if data["status"] == "failed":
        print(f"Error: {data.get('error')}")
        break
    time.sleep(10)  # wait 10 seconds between checks
```