Streaming
Real-time response streaming
The streaming feature lets you receive responses from the AI model in real time. This way, users can see partial results before the full response is generated.
Better UX: Streaming gives your users faster feedback. The first tokens appear while the rest of the response is still being generated, which reduces perceived wait time, especially for long responses.
Enabling Streaming
Add the stream: true parameter to your request:
cURL
curl https://api.onysoft.com/v1/chat/completions \
  -H "Authorization: Bearer sk-ony-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
Stream Response Format
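Streaming responses arrive in Server-Sent Events (SSE) format. Each event is a data: line carrying a JSON chunk, with the incremental text in choices[0].delta.content. The final JSON chunk has an empty choices array and reports usage and cost, and the stream is terminated by data: [DONE]: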
SSE Response
data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[{"delta":{"content":" How"}}]}
data: {"id":"req_a1b2c3d4e5f6a1b2c3d4e5f6","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20},"cost":{"amount":0.000008,"currency":"USD"}}
data: [DONE]
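If you read the stream without an SDK, the event format above can be handled with a small parser. The following is a minimal sketch, not an official client: parse_sse_stream is a hypothetical helper name, and it assumes each event arrives as a single data: line, as in the sample response.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_sse_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict], Optional[dict]]:
    """Collect text, usage, and cost from raw SSE lines (hypothetical helper)."""
    parts = []
    usage = None
    cost = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, nothing after it is read
        chunk = json.loads(payload)
        # The final chunk has an empty choices list, so this loop is a no-op for it
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
        # usage and cost appear only on the final JSON chunk in the sample above
        usage = chunk.get("usage") or usage
        cost = chunk.get("cost") or cost
    return "".join(parts), usage, cost
```

The function accepts any iterable of decoded text lines, so it can be fed from an HTTP client's line iterator or from a saved transcript when testing.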
Streaming with Python
Python
from openai import OpenAI

# Point the OpenAI SDK at the Onysoft endpoint
client = OpenAI(
    api_key="sk-ony-your-api-key",
    base_url="https://api.onysoft.com/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    # The final chunk can have an empty choices list, so check before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
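When you need the complete text as well as the live output, the chunks can be accumulated as they arrive. The sketch below is illustrative only: collect_stream and fake_chunk are hypothetical helpers, and it assumes usage is exposed as an attribute on the final chunk, mirroring the SSE sample.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Return (full_text, usage) from an iterable of chunk objects (hypothetical helper)."""
    parts = []
    usage = None
    for chunk in stream:
        # The final chunk may carry an empty choices list, so guard the index
        if chunk.choices:
            text = chunk.choices[0].delta.content
            if text:
                parts.append(text)
        # Assumption: usage is set only on the last chunk and is None elsewhere
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Build a stand-in chunk for offline testing; not part of any SDK."""
    choices = []
    if text is not None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)
```

With a live request, passing the stream object from the Python example to collect_stream(stream) would return the joined text once the stream ends. The fake_chunk helper exists only so the logic can be exercised without a network call.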
Streaming with Node.js
Node.js
import OpenAI from 'openai';

// Point the OpenAI SDK at the Onysoft endpoint
const client = new OpenAI({
  apiKey: 'sk-ony-your-api-key',
  baseURL: 'https://api.onysoft.com/v1'
});

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true
});

// Optional chaining handles the final chunk, whose choices array may be empty
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}