
Quick reference

| Mode     | Minimum | Recommended | Maximum |
|----------|---------|-------------|---------|
| Standard | —       | 4096        | 32768   |
| Thinking | 16000   | 16384       | 32768   |

Standard mode

```python
response = client.chat.completions.create(
    model="orchid01",
    messages=[...],
    max_tokens=4096,  # default
)
```
For longer documents or detailed analysis responses, increase to 8192.

Thinking mode

max_tokens below 16,000 in thinking mode risks truncated responses. Reasoning tokens count toward the limit — the model reasons before it answers, consuming tokens before the final response begins.
```python
response = client.chat.completions.create(
    model="orchid01",
    messages=[...],
    max_tokens=16384,  # recommended for thinking mode (hard minimum is 16,000)
    extra_body={"orchid": {"thinking": True}},
)
```
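Because reasoning draws from the same budget, the visible answer gets whatever the reasoning phase did not consume. A toy illustration (the reasoning-token figure here is invented; actual counts vary per request):

```python
# Hypothetical numbers: with max_tokens=16384, suppose the model spends
# 6,000 tokens reasoning before it begins the final answer.
max_tokens = 16_384
reasoning_tokens = 6_000           # made-up example value
answer_budget = max_tokens - reasoning_tokens
print(answer_budget)               # → 10384 tokens left for the visible response
```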
If you pass max_tokens below 16,000 with thinking enabled, Orchid automatically raises it to 16,000 and sets max_tokens_adjusted: true in the orchid field of the response.
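If you want to know the effective budget before sending a request, the documented clamping rule can be mirrored client-side. A minimal sketch (the helper function is ours, not part of any SDK):

```python
def effective_max_tokens(requested: int, thinking: bool) -> tuple[int, bool]:
    """Mirror Orchid's documented rule: with thinking enabled,
    max_tokens below 16,000 is raised to 16,000 and flagged as adjusted."""
    MINIMUM_THINKING = 16_000
    if thinking and requested < MINIMUM_THINKING:
        return MINIMUM_THINKING, True
    return requested, False

print(effective_max_tokens(4096, thinking=True))   # → (16000, True)
print(effective_max_tokens(4096, thinking=False))  # → (4096, False)
```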

Recommendations by task

| Task                                      | Mode     | max_tokens |
|-------------------------------------------|----------|------------|
| Extract a specific figure                 | Standard | 1024       |
| Summarise a filing section                | Standard | 2048       |
| Full document analysis                    | Standard | 4096–8192  |
| Covenant extraction from long agreement   | Standard | 8192       |
| Complex multi-document analysis           | Thinking | 16384      |
| Multi-step scenario modelling             | Thinking | 32768      |
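When requests are built programmatically, these recommendations can be encoded as a simple lookup. A sketch under our own naming (the task labels and helper are illustrative, not part of any SDK; the 4096–8192 range uses its upper bound):

```python
# Recommended budgets from the table above, keyed by illustrative task labels.
TASK_BUDGETS = {
    "extract_figure":      ("standard", 1024),
    "summarise_section":   ("standard", 2048),
    "full_document":       ("standard", 8192),   # upper end of 4096-8192
    "covenant_extraction": ("standard", 8192),
    "multi_document":      ("thinking", 16384),
    "scenario_modelling":  ("thinking", 32768),
}

def request_params(task: str) -> dict:
    """Build keyword arguments for chat.completions.create from a task label."""
    mode, budget = TASK_BUDGETS[task]
    params = {"model": "orchid01", "max_tokens": budget}
    if mode == "thinking":
        params["extra_body"] = {"orchid": {"thinking": True}}
    return params

print(request_params("multi_document"))
```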