
Chat Completions

The Corenet API uses an OpenAI-compatible request and response format. Existing integrations built against the OpenAI SDK can be redirected to the Corenet endpoint with minimal configuration changes, typically the base URL and the API key. All requests require a valid enterprise bearer token.

Authentication

All API requests must include an Authorization header with a bearer token. Tokens are scoped to your organization and issued during onboarding. There are no per-user tokens; credentials are managed at the org level.

Authorization: Bearer cnai-org_<your_token>

Tokens beginning with cnai-org_ have full API access within your contractual quota. Tokens are rotated on a 90-day cycle. Rotation notices are sent to the registered org contact 14 days in advance.
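A minimal sketch of attaching the org token to outgoing requests follows. It only sanity-checks the documented cnai-org_ prefix; the helper name auth_headers is hypothetical, and real token validity can only be confirmed by the API (a 401 on failure).

```python
# Hypothetical helper: build headers for an org-level Corenet bearer token.
TOKEN_PREFIX = "cnai-org_"


def auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying the org-scoped bearer token."""
    token = token.strip()
    # Only the documented prefix is checked; the server decides validity.
    if not token.startswith(TOKEN_PREFIX) or len(token) == len(TOKEN_PREFIX):
        raise ValueError("expected an org token of the form cnai-org_<token>")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Because tokens rotate every 90 days, loading the token from configuration or a secret store at startup, rather than embedding it in code, keeps rotation a configuration change.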

Base URL

All endpoints are served from a single base URL assigned to your organization at provisioning time:

https://<org-handle>.api.corenet.ai/v1

The <org-handle> prefix is unique to your organization and listed in your onboarding document. Do not use the root domain directly — requests without an org handle will return 403 Forbidden.
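The sketch below assembles endpoint URLs from an org handle. The validation rule (a single DNS label: lowercase letters, digits and hyphens, 1 to 63 characters) is an assumption, not a documented constraint; check the actual handle format in your onboarding document. The names base_url and endpoint and the handle acme are illustrative.

```python
import re

# Assumption: an org handle is a single DNS label. Confirm against onboarding docs.
_HANDLE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def base_url(org_handle: str, version: str = "v1") -> str:
    """Build the org-specific base URL, e.g. https://acme.api.corenet.ai/v1."""
    handle = org_handle.strip().lower()
    # The handle is mandatory: requests to the root domain return 403.
    if not _HANDLE_RE.fullmatch(handle):
        raise ValueError(f"invalid org handle: {org_handle!r}")
    return f"https://{handle}.api.corenet.ai/{version}"


def endpoint(org_handle: str, path: str) -> str:
    """Join a resource path such as 'chat/completions' onto the base URL."""
    return f"{base_url(org_handle)}/{path.lstrip('/')}"
```

Keeping the version as a parameter means a future version path can be adopted without touching call sites.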

Versioning

The current API version is v1. The version is specified in the URL path. Breaking changes will be introduced under a new version path, and the previous version remains available for a deprecation window of at least 90 days.

Chat Completions

Generates a model response for the given conversation history.

POST /v1/chat/completions

Minimal request example

curl https://<org-handle>.api.corenet.ai/v1/chat/completions \
  -H "Authorization: Bearer cnai-org_<token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "corenet-1",
    "messages": [
      { "role": "user", "content": "Summarize the attached report." }
    ]
  }'
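The same minimal request can be built with Python's standard library. This is a sketch: build_chat_request and send are hypothetical helper names, the handle acme is a placeholder, and the code adds no retry handling.

```python
import json
import urllib.request


def build_chat_request(org_handle: str, token: str, prompt: str,
                       model: str = "corenet-1") -> urllib.request.Request:
    """Build (but do not send) the minimal chat completion request."""
    url = f"https://{org_handle}.api.corenet.ai/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(build_chat_request("acme", token, "Summarize the attached report.")) performs the network call. urlopen raises urllib.error.HTTPError for non-2xx statuses, and the body of that exception carries the error object described under Error Codes.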

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model identifier. See Models for available values. |
| messages | array | required | Array of message objects forming the conversation. Each object must have role and content fields. |
| temperature | number | optional | Sampling temperature between 0 and 2. Higher values produce more varied output. Default: 1. |
| top_p | number | optional | Nucleus sampling cutoff. Adjust either temperature or top_p, but not both. Default: 1. |
| max_tokens | integer | optional | Maximum number of tokens to generate; the response may be shorter. Must not exceed the model's context window. Default: model-dependent. |
| stream | boolean | optional | If true, partial message deltas are sent as server-sent events. See Streaming. Default: false. |
| stop | string or array | optional | Up to 4 sequences at which generation stops. The matched stop sequence is not included in the output. |
| n | integer | optional | Number of completion choices to generate. Values above 1 count against your quota proportionally. Default: 1. |
| presence_penalty | number | optional | Penalty between -2.0 and 2.0 applied to tokens that have already appeared in the output so far. Positive values reduce repetition. Default: 0. |
| frequency_penalty | number | optional | Penalty between -2.0 and 2.0 applied in proportion to how often a token has appeared in the output so far. Default: 0. |
| user | string | optional | Caller-supplied identifier for the end user. Recorded in audit logs; has no effect on inference. Max 64 characters. |
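A client-side pre-flight check can catch out-of-range values before they cost a round trip. The sketch below mirrors only the ranges stated in the table; the server remains the authority and answers 400 invalid_request_error. The function name validate_params is hypothetical, and requiring n to be at least 1 is an assumption.

```python
from typing import Any


def validate_params(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a chat completion request body."""
    problems: list[str] = []
    for field in ("model", "messages"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    def check_range(name: str, lo: float, hi: float) -> None:
        value = body.get(name)
        if value is not None and not (
            isinstance(value, (int, float)) and lo <= value <= hi
        ):
            problems.append(f"{name} must be between {lo} and {hi}")

    check_range("temperature", 0, 2)
    check_range("presence_penalty", -2.0, 2.0)
    check_range("frequency_penalty", -2.0, 2.0)

    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    # Assumption: n must be a positive integer (not stated in the table).
    n = body.get("n", 1)
    if not isinstance(n, int) or n < 1:
        problems.append("n must be a positive integer")

    user = body.get("user")
    if user is not None and len(user) > 64:
        problems.append("user must be at most 64 characters")
    return problems
```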

Message object

Each entry in the messages array must conform to the following structure:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | required | One of system, user, or assistant. A system message may only appear as the first message. |
| content | string | required | Text content of the message. An empty string is permitted for assistant messages that carry tool_calls; tool_calls are not yet supported in v1. |
| name | string | optional | Label for the participant, included in the context as-is. Has no semantic effect on inference. |
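The message rules above can be checked with a short sketch like the one below. validate_messages is a hypothetical helper; it checks only roles, the first-position rule for system, and field types, and leaves the empty-content case to the server.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: list[dict]) -> list[str]:
    """Check each message against the v1 message object rules."""
    problems: list[str] = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}]: unknown role {role!r}")
        elif role == "system" and i != 0:
            # A system message is only accepted as the first entry.
            problems.append(f"messages[{i}]: system role is only allowed first")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}]: content must be a string")
        name = msg.get("name")
        if name is not None and not isinstance(name, str):
            problems.append(f"messages[{i}]: name must be a string")
    return problems
```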

Full request example

{
  "model": "corenet-1",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise technical assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.4,
  "max_tokens": 256,
  "stream": false,
  "user": "session-a3f91"
}

Response Object

A successful non-streaming response returns an object with the following structure:

{
  "id": "chatcmpl-8fZ2kLmNpQr1tXwV",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "corenet-1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 2,
    "total_tokens": 30
  }
}

finish_reason values

  • stop — model reached a natural stopping point or a stop sequence
  • length — output was truncated at max_tokens
  • content_filter — output was blocked by the content policy layer
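A caller usually needs to distinguish a complete answer from a truncated or filtered one. The sketch below reads one choice from a non-streaming response. The Reply type and extract_reply are hypothetical names, and raising on content_filter is a design choice of this sketch, not documented behavior.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    text: str
    finish_reason: str
    truncated: bool
    total_tokens: int


def extract_reply(response: dict, index: int = 0) -> Reply:
    """Pull one choice out of a non-streaming chat.completion response."""
    choice = next(c for c in response["choices"] if c["index"] == index)
    reason = choice["finish_reason"]
    if reason == "content_filter":
        # Assumption: a filtered choice carries no usable text, so fail loudly.
        raise RuntimeError("completion blocked by the content policy layer")
    return Reply(
        text=choice["message"]["content"],
        finish_reason=reason,
        truncated=(reason == "length"),  # output stopped at max_tokens
        total_tokens=response["usage"]["total_tokens"],
    )
```

Selecting by index rather than position keeps the helper correct when n is greater than 1.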

Streaming

When stream is true, the API responds with a text/event-stream of server-sent events. Each event carries a partial response delta, and the stream terminates with a final data: [DONE] message.

data: {
  "id": "chatcmpl-8fZ2kLmNpQr1tXwV",
  "object": "chat.completion.chunk",
  "created": 1710000000,
  "model": "corenet-1",
  "choices": [{
    "index": 0,
    "delta": { "role": "assistant", "content": "Paris" },
    "finish_reason": null
  }]
}

data: {
  "choices": [{
    "delta": {},
    "finish_reason": "stop"
  }]
}

data: [DONE]

The first chunk includes role: "assistant" in the delta. Subsequent chunks carry only content. The final chunk has an empty delta and a non-null finish_reason. Token usage is not returned in streaming mode.
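The sketch below reassembles a stream into the final text. It assumes each event arrives as a single data: line containing compact JSON, which is how OpenAI-compatible servers typically frame chunks; the function name collect_stream is hypothetical, and only the first choice is followed.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Reassemble streamed deltas into (full_text, finish_reason)."""
    parts: list[str] = []
    finish_reason: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            # The final chunk has an empty delta, so content may be absent.
            parts.append(choice.get("delta", {}).get("content") or "")
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Since usage is not returned in streaming mode, callers that track token spend need to rely on the rate-limit headers or a separate estimate.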

Models

The following model identifiers are available to enterprise clients:

| Model ID | Context window | Notes |
|---|---|---|
| corenet-1 | 128k tokens | Primary production model. Recommended for most workloads. |
| corenet-1-fast | 32k tokens | Reduced latency variant. Lower throughput cost. Suitable for latency-sensitive pipelines. |
| corenet-1-preview | 128k tokens | Pre-release checkpoint. Behavior may differ from stable. Opt-in required per org. |

Model identifiers are pinned per contract period. Access to new model versions requires explicit acknowledgment of any behavioral change notes provided by the account team.
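A hypothetical helper for choosing between the two stable models by context size is sketched below. It makes two assumptions that the table does not confirm: "128k" and "32k" are read as 128,000 and 32,000 tokens, and the prompt plus max_tokens must fit within the window. corenet-1-preview is excluded because it requires opt-in.

```python
# Assumption: "128k"/"32k" read as 128,000/32,000 tokens; confirm exact limits.
CONTEXT_WINDOW = {
    "corenet-1": 128_000,
    "corenet-1-fast": 32_000,
    "corenet-1-preview": 128_000,
}


def pick_model(prompt_tokens: int, max_tokens: int, prefer_fast: bool = True) -> str:
    """Hypothetical helper: pick a stable model whose window fits the request."""
    # Assumption: prompt tokens plus max_tokens must fit the context window.
    needed = prompt_tokens + max_tokens
    candidates = ["corenet-1-fast", "corenet-1"] if prefer_fast else ["corenet-1"]
    for model in candidates:
        if needed <= CONTEXT_WINDOW[model]:
            return model
    raise ValueError(f"{needed} tokens exceeds every stable model's context window")
```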

Error Codes

Errors follow the standard HTTP status code convention. The response body is a JSON object with an error field:

{
  "error": {
    "message": "Invalid bearer token.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body or invalid parameter values. |
| 401 | authentication_error | Missing or invalid bearer token. |
| 403 | permission_error | Token valid, but lacks access to the requested model or feature. |
| 429 | rate_limit_error | Request or token quota exceeded. See the Retry-After header. |
| 500 | api_error | Internal error. Retry with exponential backoff. Report persistent errors to your account manager. |
| 503 | overloaded_error | Service temporarily at capacity. Queue or retry with backoff. |
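The sketch below decodes an error body into a structured value and marks which statuses are worth retrying according to the table (429, 500, 503). ApiError and parse_error are hypothetical names; the fallback for non-JSON bodies (for example, a page from an intermediary proxy) is an assumption.

```python
import json
from typing import NamedTuple, Optional

# Statuses the table marks as transient; the others need a request or credential fix.
RETRYABLE_STATUSES = {429, 500, 503}


class ApiError(NamedTuple):
    status: int
    error_type: str
    code: Optional[str]
    message: str
    retryable: bool


def parse_error(status: int, body: bytes) -> ApiError:
    """Decode an error response body into a structured ApiError."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # non-JSON body, e.g. an HTML page from a proxy
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ApiError(
        status=status,
        error_type=err.get("type", "unknown"),
        code=err.get("code"),
        message=err.get("message", ""),
        retryable=status in RETRYABLE_STATUSES,
    )
```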

Rate Limits

Rate limits are defined per organization in the enterprise agreement and enforced at the API gateway. Limits are expressed as:

  • Requests per minute (RPM)
  • Tokens per minute (TPM)
  • Tokens per day (TPD)

The following rate-limit headers are returned on every response (values shown are examples):

x-ratelimit-limit-requests: 100
x-ratelimit-remaining-requests: 87
x-ratelimit-limit-tokens: 100000
x-ratelimit-remaining-tokens: 94210
x-ratelimit-reset-requests: 8s
x-ratelimit-reset-tokens: 3s
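These headers can be read on every response to throttle proactively. In the sketch below, rate_limit_status and parse_duration are hypothetical names. Reset values like 8s are documented. Support for compound forms such as 1m30s or 250ms is an assumption, included in case the gateway emits them.

```python
import re
from typing import Mapping

# Assumption: reset durations are number+unit runs such as "8s", "1m30s", "250ms".
_FULL_RE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(value: str) -> float:
    """Convert a reset value such as '8s' into seconds."""
    value = value.strip()
    if not _FULL_RE.fullmatch(value):
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in _PART_RE.findall(value))


def rate_limit_status(headers: Mapping[str, str]) -> dict:
    """Summarize the x-ratelimit-* headers from any response."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    return {
        "requests_limit": int(h["x-ratelimit-limit-requests"]),
        "requests_remaining": int(h["x-ratelimit-remaining-requests"]),
        "tokens_limit": int(h["x-ratelimit-limit-tokens"]),
        "tokens_remaining": int(h["x-ratelimit-remaining-tokens"]),
        "requests_reset_s": parse_duration(h["x-ratelimit-reset-requests"]),
        "tokens_reset_s": parse_duration(h["x-ratelimit-reset-tokens"]),
    }
```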

When a limit is exceeded, the response is 429 Too Many Requests. The Retry-After header indicates the number of seconds until the limit resets. Sustained over-limit usage may trigger a temporary org suspension pending review by the account team.
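Because sustained over-limit traffic can lead to suspension, a client should wait as instructed rather than retry immediately. The sketch below honors Retry-After (as integer seconds) on 429. For 500 and 503 it falls back to exponential backoff with full jitter. retry_delay is a hypothetical name, and base, cap and max_attempts are illustrative defaults, not Corenet-recommended values.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0,
                max_attempts: int = 5) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to stop."""
    if status not in (429, 500, 503) or attempt >= max_attempts:
        return None  # non-retryable status, or retry budget exhausted
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff below
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))
```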