Chat Completions
The Corenet API uses an OpenAI-compatible request and response format. Existing integrations built against the OpenAI SDK can be redirected to the Corenet endpoint with minimal configuration changes. All requests require a valid enterprise bearer token.
Authentication
All API requests must include an Authorization header with a bearer token.
Tokens are scoped to your organization and issued during onboarding.
There are no per-user tokens; credentials are managed at the org level.
Authorization: Bearer cnai-org_<your_token>
Tokens beginning with cnai-org_ have full API access within your
contractual quota. Tokens are rotated on a 90-day cycle. Rotation notices are
sent to the registered org contact 14 days in advance.
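As a minimal sketch of the header format above, the helper below builds the request headers from an org token. The names `TOKEN_PREFIX` and `build_auth_headers` are illustrative and not part of the Corenet API; the prefix check simply mirrors the `cnai-org_` convention described in this section.

```python
TOKEN_PREFIX = "cnai-org_"  # prefix shown in the Authorization example above


def build_auth_headers(token: str) -> dict:
    """Return headers for an authenticated JSON request (hypothetical helper)."""
    # Reject values that do not look like an org token before sending anything.
    if not token.startswith(TOKEN_PREFIX) or len(token) == len(TOKEN_PREFIX):
        raise ValueError(f"expected an org token starting with {TOKEN_PREFIX!r}")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Because tokens rotate every 90 days, it is probably safer to load the token from configuration at call time than to hard-code it.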
Base URL
All endpoints are served from a single base URL assigned to your organization at provisioning time:
https://<org-handle>.api.corenet.ai/v1
The <org-handle> prefix is unique to your organization and listed
in your onboarding document. Do not use the root domain directly: requests
without an org handle return 403 Forbidden.
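The URL pattern above could be assembled as in the following sketch. The helper names are hypothetical, and the handle check assumes handles follow DNS label rules (lowercase letters, digits, hyphens), which this document does not state.

```python
import re

# Assumption: org handles are valid DNS labels; the onboarding document is authoritative.
_HANDLE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def base_url(org_handle: str, version: str = "v1") -> str:
    """Build the per-organization base URL, e.g. https://acme.api.corenet.ai/v1."""
    if not _HANDLE_RE.fullmatch(org_handle):
        raise ValueError(f"invalid org handle: {org_handle!r}")
    return f"https://{org_handle}.api.corenet.ai/{version}"


def endpoint(org_handle: str, path: str) -> str:
    """Join an endpoint path such as 'chat/completions' onto the base URL."""
    return f"{base_url(org_handle)}/{path.lstrip('/')}"
```

Failing early on an empty handle avoids sending a request to the root domain, which would be rejected with 403 anyway.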
Versioning
The current API version is v1. Version is specified in the URL path.
Breaking changes will be introduced under a new version path. The previous version
remains available for a deprecation window of no less than 90 days.
Chat Completions
Generates a model response for the given conversation history.
Minimal request example
curl https://<org-handle>.api.corenet.ai/v1/chat/completions \
  -H "Authorization: Bearer cnai-org_<token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "corenet-1",
    "messages": [
      { "role": "user", "content": "Summarize the attached report." }
    ]
  }'
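For callers who prefer Python over curl, the same request might look like the sketch below, which uses only the standard library. The function names are illustrative; `build_chat_request` only constructs the request, and `send` performs the network call.

```python
import json
import urllib.request


def build_chat_request(org_handle: str, token: str, payload: dict) -> urllib.request.Request:
    """Construct (but do not send) a POST to the chat completions endpoint."""
    url = f"https://{org_handle}.api.corenet.ai/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would then be `send(build_chat_request("<org-handle>", token, {"model": "corenet-1", "messages": [...]}))`, with the handle and token taken from your onboarding document.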
Request Body
| Parameter | Type | Description |
|---|---|---|
| model required | string | Model identifier. See Models for available values. |
| messages required | array | Array of message objects forming the conversation. Each object must have role and content fields. |
| temperature optional | number | Sampling temperature between 0 and 2. Higher values produce more varied output. Default: 1. |
| top_p optional | number | Nucleus sampling cutoff. Recommended to adjust either temperature or top_p, not both. Default: 1. |
| max_tokens optional | integer | Upper bound on the number of tokens generated; the response may be shorter. Default is model-dependent. Must not exceed the model's context window. |
| stream optional | boolean | If true, partial message deltas are sent as server-sent events. See Streaming. Default: false. |
| stop optional | string or array | Up to 4 sequences where generation stops. The stop sequence itself is not included in the output. |
| n optional | integer | Number of completion choices to generate. Values above 1 are counted against your quota proportionally. Default: 1. |
| presence_penalty optional | number | Penalty between -2.0 and 2.0 applied to tokens based on whether they have already appeared. Positive values reduce repetition. Default: 0. |
| frequency_penalty optional | number | Penalty between -2.0 and 2.0 applied proportional to token frequency in the output so far. Default: 0. |
| user optional | string | Caller-supplied identifier for the end user. Used in audit logs. Has no effect on inference. Max 64 characters. |
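The ranges in the table above can be checked on the client before a request is sent. The following is a sketch only: `validate_params` is a hypothetical helper, it covers just the constraints stated in the table, and the server remains the authority on what is accepted.

```python
def validate_params(params: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for key in ("model", "messages"):
        if key not in params:
            problems.append(f"{key} is required")
    # Defaults below are the ones documented in the table.
    if not 0 <= params.get("temperature", 1) <= 2:
        problems.append("temperature must be between 0 and 2")
    for name in ("presence_penalty", "frequency_penalty"):
        if not -2.0 <= params.get(name, 0) <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    if len(params.get("user", "")) > 64:
        problems.append("user must be at most 64 characters")
    return problems
```

Returning all problems at once, rather than raising on the first, tends to make 400 responses easier to avoid during integration.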
Message object
Each entry in the messages array must conform to the following structure:
| Field | Type | Description |
|---|---|---|
| role required | string | One of system, user, or assistant. The system role may only appear as the first message. |
| content required | string | Text content of the message. May be an empty string for assistant messages if tool_calls are present (not yet supported in v1). |
| name optional | string | An optional label for the participant. Included in the context as-is. No semantic effect on inference. |
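The message rules in the table (allowed roles, system only in first position, string content) can be expressed as a small check. This is an illustrative sketch; `check_messages` is not an API function.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_messages(messages: list) -> None:
    """Raise ValueError if a message breaks the rules in the table above."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unknown role {role!r}")
        # The system role may only appear as the first message.
        if role == "system" and index != 0:
            raise ValueError(f"message {index}: system role must come first")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index}: content must be a string")
```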
Full request example
{
  "model": "corenet-1",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise technical assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.4,
  "max_tokens": 256,
  "stream": false,
  "user": "session-a3f91"
}
Response Object
A successful non-streaming response returns an object with the following structure:
{
  "id": "chatcmpl-8fZ2kLmNpQr1tXwV",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "corenet-1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 2,
    "total_tokens": 30
  }
}
finish_reason values
- stop: model reached a natural stopping point or a stop sequence
- length: output was truncated at max_tokens
- content_filter: output was blocked by the content policy layer
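A client will usually want both the reply text and a signal for whether the reply is complete. The sketch below reads one choice from a decoded response object and maps the three finish_reason values onto flags; `read_choice` and the flag names are hypothetical.

```python
def read_choice(response: dict, index: int = 0) -> dict:
    """Extract the reply text and classify its finish_reason."""
    choice = response["choices"][index]
    reason = choice["finish_reason"]
    return {
        "text": choice["message"]["content"],
        "complete": reason == "stop",           # natural stop or stop sequence
        "truncated": reason == "length",        # cut off at max_tokens
        "blocked": reason == "content_filter",  # blocked by the content policy layer
    }
```

A caller might raise max_tokens and retry when `truncated` is set, and surface a policy message to the user when `blocked` is set.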
Streaming
When stream: true, the API returns a stream of
text/event-stream events. Each event contains a partial
response delta. The stream terminates with a final data: [DONE] message.
data: {
  "id": "chatcmpl-8fZ2kLmNpQr1tXwV",
  "object": "chat.completion.chunk",
  "created": 1710000000,
  "model": "corenet-1",
  "choices": [{
    "index": 0,
    "delta": { "content": "Paris" },
    "finish_reason": null
  }]
}
data: {
  "choices": [{
    "delta": {},
    "finish_reason": "stop"
  }]
}
data: [DONE]
The first chunk includes role: "assistant" in the delta.
Subsequent chunks carry only content. The final chunk has an empty
delta and a non-null finish_reason.
Token usage is not returned in streaming mode.
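The chunk sequence described above can be consumed as in the following sketch. It assumes each event arrives on a single `data:` line carrying compact JSON, which is the usual server-sent events convention but is not confirmed by this document; the function names are illustrative.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk objects until the [DONE] sentinel is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate content deltas and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in iter_chunks(lines):
        choice = chunk["choices"][0]
        # The role-only first delta and the empty final delta carry no content.
        parts.append(choice.get("delta", {}).get("content") or "")
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Since usage is not returned in streaming mode, callers that need token counts would have to estimate them separately.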
Models
The following model identifiers are available to enterprise clients:
| Model ID | Context | Notes |
|---|---|---|
| corenet-1 | 128k tokens | Primary production model. Recommended for most workloads. |
| corenet-1-fast | 32k tokens | Reduced latency variant. Lower throughput cost. Suitable for latency-sensitive pipelines. |
| corenet-1-preview | 128k tokens | Pre-release checkpoint. Behavior may differ from stable. Opt-in required per org. |
Model identifiers are pinned per contract period. Access to new model versions requires explicit acknowledgment of any behavioral change notes provided by the account team.
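The context sizes in the table can be used to budget max_tokens on the client. The sketch below rests on two assumptions that this document does not confirm: that "128k" and "32k" mean 128,000 and 32,000 tokens, and that prompt and completion tokens share the same window. `CONTEXT_TOKENS` and `max_output_budget` are hypothetical names.

```python
# Assumption: "k" means 1,000 tokens; confirm exact limits with your account team.
CONTEXT_TOKENS = {
    "corenet-1": 128_000,
    "corenet-1-fast": 32_000,
    "corenet-1-preview": 128_000,
}


def max_output_budget(model: str, prompt_tokens: int) -> int:
    """Tokens left for the completion, assuming prompt and output share the window."""
    window = CONTEXT_TOKENS[model]  # KeyError for identifiers not in the table
    return max(0, window - prompt_tokens)
```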
Error Codes
Errors follow the standard HTTP status code convention. The response body is a JSON object with an error field:
{
  "error": {
    "message": "Invalid bearer token.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body or invalid parameter values. |
| 401 | authentication_error | Missing or invalid bearer token. |
| 403 | permission_error | Token valid, but lacks access to the requested model or feature. |
| 429 | rate_limit_error | Request or token quota exceeded. See Retry-After header. |
| 500 | api_error | Internal error. Retry with exponential backoff. Persistent errors should be reported to your account manager. |
| 503 | overloaded_error | Service temporarily at capacity. Queue or retry with backoff. |
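One way to handle these errors in client code is to turn the status and body into a single exception type. This is a sketch: `CorenetAPIError` and `parse_error` are hypothetical names, and treating 429, 500, and 503 as retryable follows the guidance in the table rather than any documented client contract.

```python
import json

RETRYABLE_STATUSES = {429, 500, 503}  # rows in the table that advise retrying


class CorenetAPIError(Exception):
    """Hypothetical exception carrying the fields of the error object."""

    def __init__(self, status, error_type, code, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.type = error_type
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: str) -> CorenetAPIError:
    """Build an exception from an HTTP status and raw response body."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}  # body was not a JSON object, e.g. a proxy error page
    if not isinstance(error, dict):
        error = {}
    return CorenetAPIError(
        status,
        error.get("type", "unknown"),
        error.get("code"),
        error.get("message", body),
    )
```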
Rate Limits
Rate limits are defined per organization in the enterprise agreement and enforced at the API gateway. Limits are expressed as:
- Requests per minute (RPM)
- Tokens per minute (TPM)
- Tokens per day (TPD)
All limit headers are returned on every response:
x-ratelimit-limit-requests: 100
x-ratelimit-remaining-requests: 87
x-ratelimit-limit-tokens: 100000
x-ratelimit-remaining-tokens: 94210
x-ratelimit-reset-requests: 8s
x-ratelimit-reset-tokens: 3s
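The headers above can be parsed into numbers for client-side throttling, as in this sketch. It assumes reset values always use the plain seconds form shown here (such as `8s`); other duration formats are rejected rather than guessed. The function names are illustrative.

```python
import re

# Assumption: reset values are a number followed by "s", as in the sample headers.
_RESET_RE = re.compile(r"(\d+(?:\.\d+)?)s")


def parse_reset_seconds(value: str) -> float:
    """Convert a reset value such as '8s' into seconds."""
    match = _RESET_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized reset value: {value!r}")
    return float(match.group(1))


def summarize_rate_limits(headers: dict) -> dict:
    """Collect limit, remaining, and reset values for requests and tokens."""
    lowered = {name.lower(): value for name, value in headers.items()}
    summary = {}
    for kind in ("requests", "tokens"):
        summary[kind] = {
            "limit": int(lowered[f"x-ratelimit-limit-{kind}"]),
            "remaining": int(lowered[f"x-ratelimit-remaining-{kind}"]),
            "reset_seconds": parse_reset_seconds(lowered[f"x-ratelimit-reset-{kind}"]),
        }
    return summary
```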
When a limit is exceeded, the response is 429 Too Many Requests.
The Retry-After header indicates the number of seconds until the
limit resets. Sustained over-limit usage may trigger a temporary org suspension
pending review by the account team.
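A retry loop that honors Retry-After, and falls back to exponential backoff when the header is absent, might look like the sketch below. The `send` callable, its `(status, headers, body)` return shape, and the default delays are assumptions made for illustration; the header lookup also assumes the key is spelled exactly `Retry-After`.

```python
import time

RETRYABLE = {429, 500, 503}


def backoff_delay(attempt: int, retry_after=None, base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After (seconds); otherwise back off exponentially."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))


def call_with_retries(send, max_attempts: int = 5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    status, headers, body = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
        status, headers, body = send()
    return status, body
```

Capping the number of attempts matters here, since sustained over-limit traffic may lead to a temporary org suspension.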