Streaming

Real-time response streaming

The streaming feature lets you receive responses from the AI model in real time. This way, users can see partial results before the full response is generated.

Better UX: Streaming gives your users faster feedback. The first tokens appear as soon as they are generated, which shortens the perceived wait, especially for long responses.

Enabling Streaming

Add the stream: true parameter to your request:

cURL
curl https://api.onysoft.com/v1/chat/completions \
  -H "Authorization: Bearer sk-ony-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Merhaba!"}],
    "stream": true
  }'

Stream Response Format

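Streaming responses arrive in Server-Sent Events (SSE) format. Each data: line carries one JSON chunk with a piece of the text in choices[0].delta.content. In the example below, the last JSON chunk has an empty choices array and reports usage and cost, and data: [DONE] marks the end of the stream: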

SSE Response
data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":"Merhaba"}}]}

data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}

data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":" Nasıl"}}]}

data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20},"cost":{"amount":0.000008,"currency":"USD"}}

data: [DONE]
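
If you read the stream without an SDK, the format above can be decoded line by line. The following is a minimal sketch rather than an official client: the helper names iter_sse_events and read_stream are hypothetical, and it assumes every event fits on a single data: line, as in the sample response.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def read_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Return the concatenated text and the usage object, if one was sent."""
    parts = []
    usage = None
    for event in iter_sse_events(lines):
        # The usage chunk has an empty choices list, so this loop is a no-op for it.
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), usage
```

Pass read_stream any iterable of decoded text lines, for example the lines of a streamed HTTP response body. It returns the full text together with the usage dictionary from the final chunk, or None if no usage chunk arrived.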

Streaming with Python

Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-ony-your-api-key",
    base_url="https://api.onysoft.com/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Merhaba!"}],
    stream=True
)

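for chunk in stream:
    # The final usage chunk has an empty choices list, so check before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)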
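
Because the stream shown in the SSE example ends with a chunk whose choices list is empty and which carries usage, it can be convenient to gather the text and the usage in one pass. The sketch below assumes chunk objects shaped like the ones the OpenAI SDK yields (choices, delta.content, usage). The helper name collect_stream is hypothetical, and the sketch does not read the cost field, since how the SDK exposes that extra field is not confirmed here.

```python
from typing import Any, Iterable, Optional, Tuple


def collect_stream(stream: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Consume a chunk iterator and return (full_text, usage_or_None)."""
    parts = []
    usage = None
    for chunk in stream:
        # Guard against chunks with an empty choices list (the usage chunk).
        if chunk.choices:
            content = chunk.choices[0].delta.content
            if content:
                parts.append(content)
        # usage is only present on the final chunk; it is absent or None elsewhere.
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage
```

With the client from the example above, this would be called as text, usage = collect_stream(stream). Collecting everything gives up the incremental display, so use it only when you need the complete text or the token counts at the end.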

Streaming with Node.js

Node.js
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: 'sk-ony-your-api-key',
    baseURL: 'https://api.onysoft.com/v1'
});

const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Merhaba!' }],
    stream: true
});

for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
}