Last Updated: June 04, 2026
Introduction
Crusoe Managed Inference serves open models through an OpenAI-compatible proxy endpoint. Some of these models — including deepseek-ai/DeepSeek-V4-Pro — support a thinking (reasoning) mode, where the model returns its intermediate reasoning in a separate reasoning_content field alongside the final answer.
Because Managed Inference runs on Crusoe's own inference engine rather than DeepSeek's hosted API, thinking mode is enabled differently than DeepSeek's public API documentation describes. DeepSeek's hosted API uses a top-level thinking parameter; the Managed Inference proxy does not accept that parameter and will reject the request. Instead, you enable thinking mode through chat_template_kwargs.
This article shows the supported way to turn thinking mode on, how to confirm it is working, and the common parameter mistakes to avoid.
Prerequisites
- A Crusoe Cloud account with access to the Intelligence Foundry / Managed Inference
- A Managed Inference API key (generated from the Intelligence Foundry via the Get API Key button)
- A local terminal with curl installed (the OpenAI SDK works equally well; see Additional Resources)
- The Managed Inference base URL: https://api.inference.crusoecloud.com/v1
Instructions
Step 1: Export Your API Key
Export your Managed Inference API key as an environment variable so it isn't hard-coded into your requests.
export CRUSOE_API_KEY="<your-api-key>"
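If you call the endpoint from Python rather than curl, the same environment variable can be read at startup. The following is a minimal sketch; the helper name load_api_key is hypothetical and not part of any Crusoe SDK, and treating the literal placeholder as "not set" is an assumption made for convenience.

```python
import os


def load_api_key(env=None):
    """Return the API key stored in CRUSOE_API_KEY, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("CRUSOE_API_KEY", "").strip()
    # Assumption: an unreplaced "<your-api-key>" placeholder counts as unset.
    if not key or key == "<your-api-key>":
        raise RuntimeError("CRUSOE_API_KEY is not set; export it before sending requests.")
    return key
```

Failing early here avoids sending a request with an empty Authorization header.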
Step 2: Send a Request With Thinking Mode Enabled
Send a chat completion request to deepseek-ai/DeepSeek-V4-Pro. Thinking mode is toggled by setting "thinking": true inside the chat_template_kwargs object.
curl -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }'
Step 3: Confirm Thinking Mode Is Active
In the response, the assistant message includes a populated reasoning_content field in addition to the usual content field. A non-empty reasoning_content means thinking mode is on.
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<final answer>",
        "reasoning_content": "<intermediate reasoning>"
      }
    }
  ]
}
Step 4: Turn Thinking Mode Off (Optional)
To turn thinking mode off, set "thinking": false in chat_template_kwargs, or omit chat_template_kwargs entirely. With thinking disabled, reasoning_content is returned as null.
ℹ️ Note: The model may still show step-by-step working in content; that is just its normal answer style. The reliable signal that thinking mode is off is reasoning_content being null, not the appearance of the visible answer.
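The check described in Steps 3 and 4 can be automated once the response has been parsed into a dictionary. This is a minimal sketch, assuming the response shape shown in Step 3; the function name thinking_is_active is hypothetical.

```python
def thinking_is_active(response):
    """Return True if the first choice carries a non-empty reasoning_content.

    `response` is the parsed JSON body of a chat completion response.
    """
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    reasoning = message.get("reasoning_content")
    # null (None) or an empty string both mean no reasoning was returned.
    return isinstance(reasoning, str) and reasoning.strip() != ""
```

The helper deliberately ignores content, since visible step-by-step working there does not indicate whether thinking mode is on.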
Example
A complete request with thinking mode enabled, piped through jq to show just the assistant message:
curl -s -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "What is 17 x 23? Think it through."}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }' | jq '.choices[0].message'
Response:
{
  "role": "assistant",
  "content": "17 × 23 = 17 × (20 + 3) = (17 × 20) + (17 × 3) = 340 + 51 = 391. So the answer is 391.",
  "reasoning_content": "We are asked: \"What is 17 x 23? Think it through.\" This is a simple multiplication. 17 × 23 = 17 × (20 + 3) = 340 + 51 = 391. So the answer is 391.",
  "tool_calls": null
}
In this response, the model's working appears in reasoning_content, while content holds the final answer.
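For readers who prefer Python to curl, the same request can be sketched with the standard library alone. The function names build_request and fetch_message are hypothetical; the URL, model name, and body fields are copied from the curl example above, and the 60-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.crusoecloud.com/v1"


def build_request(api_key, prompt, thinking=True, max_tokens=500):
    """Build (but do not send) a chat completion request with the thinking toggle set."""
    payload = {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "messages": [{"role": "user", "content": prompt}],
        # The toggle lives inside chat_template_kwargs, not at the top level.
        "chat_template_kwargs": {"thinking": thinking},
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def fetch_message(request, timeout=60):
    """Send the request and return the assistant message (the jq '.choices[0].message' equivalent)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]
```

Calling fetch_message(build_request(os.environ["CRUSOE_API_KEY"], "What is 17 x 23? Think it through.")) after importing os should return a dictionary with the same keys as the response above, though the exact wording of the answer will vary.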
Common Pitfalls
These are the two parameter mistakes most likely to trip you up, especially when porting code written against DeepSeek's hosted API:
- Using "thinking": {"type": "enabled"} returns HTTP 403. This is DeepSeek's hosted-API parameter, and the Managed Inference proxy's allowlist only accepts OpenAI-compatible parameters. The request is rejected with {"errors":["Request blocked: parameter 'thinking' is not allowed"]}. Use chat_template_kwargs (Step 2) instead.
- Using "reasoning_effort": "high" on its own does not enable thinking. The request returns HTTP 200, but reasoning_content stays null and no reasoning is produced. On Managed Inference, reasoning_effort is not the thinking toggle; thinking must be enabled via chat_template_kwargs. Per DeepSeek's own documentation, reasoning_effort only controls effort once thinking is already turned on.
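When porting code written against DeepSeek's hosted API, both pitfalls can be caught before the request is sent. The sketch below rewrites a request body dictionary and reports what it changed. The function name port_to_managed_inference is hypothetical, and treating any value other than {"type": "enabled"} as "off" is an assumption, since this article only documents the enabled form.

```python
import copy


def port_to_managed_inference(payload):
    """Return (ported_payload, notes) without modifying the input payload."""
    ported = copy.deepcopy(payload)
    notes = []

    if "thinking" in ported:
        hosted = ported.pop("thinking")
        # Assumption: only {"type": "enabled"} means "on"; anything else maps to off.
        enabled = isinstance(hosted, dict) and hosted.get("type") == "enabled"
        kwargs = dict(ported.get("chat_template_kwargs") or {})
        # Keep an explicit chat_template_kwargs.thinking if the caller already set one.
        kwargs.setdefault("thinking", enabled)
        ported["chat_template_kwargs"] = kwargs
        notes.append("moved top-level 'thinking' into chat_template_kwargs")

    kwargs = ported.get("chat_template_kwargs") or {}
    if "reasoning_effort" in ported and kwargs.get("thinking") is not True:
        notes.append("'reasoning_effort' is set but thinking is not enabled; "
                     "reasoning_content will stay null")

    return ported, notes
```

Returning notes instead of raising keeps the helper usable as a lint step in existing code paths; a stricter variant could raise on any note.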
Additional Resources
- Getting Started with Managed Inference — Crusoe Cloud
- Managed Inference Overview — Crusoe Cloud
- DeepSeek Thinking Mode Documentation. Note: the thinking and reasoning_effort parameters described in DeepSeek's documentation apply to their hosted API, not to Crusoe Managed Inference.