How-To Enable Thinking (Reasoning) Mode for DeepSeek V4 Pro on Crusoe Managed Inference

Last Updated: June 04, 2026

Introduction

Crusoe Managed Inference serves open models through an OpenAI-compatible proxy endpoint. Some of these models — including deepseek-ai/DeepSeek-V4-Pro — support a thinking (reasoning) mode, where the model returns its intermediate reasoning in a separate reasoning_content field alongside the final answer.

Because Managed Inference runs on Crusoe's own inference engine rather than DeepSeek's hosted API, thinking mode is enabled differently than DeepSeek's public API documentation describes. DeepSeek's hosted API uses a top-level thinking parameter; the Managed Inference proxy does not accept that parameter and will reject the request. Instead, you enable thinking mode through chat_template_kwargs.

This article shows the supported way to turn thinking mode on, how to confirm it is working, and the common parameter mistakes to avoid.

Prerequisites

A Crusoe Cloud account with access to the Intelligence Foundry / Managed Inference
A Managed Inference API key (generated from the Intelligence Foundry via the Get API Key button)
A local terminal with curl installed (the OpenAI SDK works equally well; see Additional Resources)
The Managed Inference base URL: https://api.inference.crusoecloud.com/v1

Instructions

Step 1: Export Your API Key

Export your Managed Inference API key as an environment variable so it isn't hard-coded into your requests.

export CRUSOE_API_KEY="<your-api-key>"
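
A quick guard can catch an unset key before any request is sent. The sketch below is one way to do that in a POSIX shell; the function name require_api_key is hypothetical and not part of any Crusoe tooling.

```shell
# Hypothetical helper: fail early if CRUSOE_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${CRUSOE_API_KEY:-}" ]; then
    echo "CRUSOE_API_KEY is not set; export it first" >&2
    return 1
  fi
}
```

It can be chained in front of a request, for example require_api_key && curl ..., so the request is skipped when the key is missing.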

Step 2: Send a Request With Thinking Mode Enabled

Send a chat completion request to deepseek-ai/DeepSeek-V4-Pro. Thinking mode is toggled by setting "thinking": true inside the chat_template_kwargs object.

curl -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }'
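
When scripting, it can help to make the toggle an explicit argument. The following is a minimal sketch, not an official client: build_payload and send_chat are hypothetical names, and the prompt is interpolated as-is, so it is assumed to contain no double quotes, backslashes, or newlines (use jq or a JSON library for arbitrary text).

```shell
# Hypothetical helper: print a chat-completions body for DeepSeek V4 Pro.
# $1 = prompt (assumed free of double quotes, backslashes, and newlines)
# $2 = "true" or "false" for chat_template_kwargs.thinking (default: true)
build_payload() {
  prompt=$1
  thinking=${2:-true}
  case $thinking in
    true|false) ;;
    *) echo "thinking must be true or false" >&2; return 1 ;;
  esac
  printf '{"model":"deepseek-ai/DeepSeek-V4-Pro","messages":[{"role":"user","content":"%s"}],"chat_template_kwargs":{"thinking":%s},"max_tokens":500}\n' "$prompt" "$thinking"
}

# Hypothetical wrapper: send the body to the endpoint used in this article.
# curl reads the request body from stdin via -d @-.
send_chat() {
  build_payload "$@" | curl -s -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $CRUSOE_API_KEY" \
    -d @-
}
```

For example, send_chat "Your prompt here" sends the request with thinking enabled, and send_chat "Your prompt here" false sends it with thinking disabled.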

Step 3: Confirm Thinking Mode Is Active

In the response, the assistant message includes a populated reasoning_content field in addition to the usual content field. A non-empty reasoning_content means thinking mode is on.

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<final answer>",
        "reasoning_content": "<intermediate reasoning>"
      }
    }
  ]
}
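
The same check can be automated with jq, which the Example section of this article also uses. This is a minimal sketch, assuming the response has been saved to a file; thinking_state is a hypothetical helper name.

```shell
# Hypothetical helper: print "on" if reasoning_content in the first choice
# is a non-empty string, otherwise print "off".
# $1 = path to a file containing the JSON response
thinking_state() {
  if jq -e '.choices[0].message.reasoning_content | type == "string" and length > 0' "$1" >/dev/null; then
    echo on
  else
    echo off
  fi
}
```

A typical use would be to save the response with curl -s ... -o response.json and then run thinking_state response.json.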

Step 4: Turn Thinking Mode Off (Optional)

To turn thinking mode off, set "thinking": false in chat_template_kwargs, or omit chat_template_kwargs entirely. With thinking disabled, reasoning_content is returned as null.

ℹ️ Note: The model may still show step-by-step working in content — that's just its normal answer style. The reliable signal that thinking mode is off is reasoning_content being null, not the appearance of the visible answer.

Example

A complete request with thinking mode enabled, piped through jq to show just the assistant message:

curl -s -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "What is 17 x 23? Think it through."}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }' | jq '.choices[0].message'

Response:

{
  "role": "assistant",
  "content": "17 × 23 = 17 × (20 + 3) = (17 × 20) + (17 × 3) = 340 + 51 = 391. So the answer is 391.",
  "reasoning_content": "We are asked: \"What is 17 x 23? Think it through.\" This is a simple multiplication. 17 × 23 = 17 × (20 + 3) = 340 + 51 = 391. So the answer is 391.",
  "tool_calls": null
}

In this response, the model's working appears in reasoning_content, while content holds the final answer.

Common Pitfalls

These are the two parameter mistakes most likely to trip you up, especially when porting code written against DeepSeek's hosted API:

Using "thinking": {"type": "enabled"} returns HTTP 403. This is DeepSeek's hosted-API parameter, and the Managed Inference proxy's allowlist only accepts OpenAI-compatible parameters. The request is rejected with {"errors":["Request blocked: parameter 'thinking' is not allowed"]}. Use chat_template_kwargs (Step 2) instead.
Using "reasoning_effort": "high" on its own does not enable thinking. The request returns HTTP 200, but reasoning_content stays null and no reasoning is produced. On Managed Inference, reasoning_effort is not the thinking toggle — thinking must be enabled via chat_template_kwargs. Per DeepSeek's own documentation, reasoning_effort only controls effort once thinking is already turned on.
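
In a script, the first pitfall can be detected from the HTTP status. The sketch below captures the status with curl's -w '%{http_code}' option and classifies the outcome. The function names and the labels they print are hypothetical, and the 403 match relies only on the error text quoted above. A 200 still needs the reasoning_content check from Step 3, because the second pitfall also returns 200.

```shell
# Hypothetical helper: classify a response by status code and body text.
# $1 = HTTP status code, $2 = response body
classify_response() {
  case $1 in
    200) echo "ok" ;;
    403)
      case $2 in
        *"is not allowed"*) echo "blocked-parameter" ;;
        *) echo "forbidden" ;;
      esac ;;
    *) echo "error-$1" ;;
  esac
}

# Hypothetical wrapper: POST a JSON body ($1) and report the outcome.
post_and_classify() {
  body_file=$(mktemp)
  # -o writes the body to a file; -w prints only the status code to stdout.
  status=$(curl -s -o "$body_file" -w '%{http_code}' -X POST \
    "https://api.inference.crusoecloud.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $CRUSOE_API_KEY" \
    -d "$1")
  classify_response "$status" "$(cat "$body_file")"
  rm -f "$body_file"
}
```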

Additional Resources

Getting Started with Managed Inference — Crusoe Cloud
Managed Inference Overview — Crusoe Cloud
DeepSeek Thinking Mode Documentation — Note: the thinking and reasoning_effort parameters described in DeepSeek's documentation apply to their hosted API, not to Crusoe Managed Inference.
