How-To Enable Thinking (Reasoning) Mode for DeepSeek V4 Pro on Crusoe Managed Inference

Sandesh Muralidhar

Last Updated: June 04, 2026

Introduction

Crusoe Managed Inference serves open models through an OpenAI-compatible proxy endpoint. Some of these models — including deepseek-ai/DeepSeek-V4-Pro — support a thinking (reasoning) mode, where the model returns its intermediate reasoning in a separate reasoning_content field alongside the final answer.

Because Managed Inference runs on Crusoe's own inference engine rather than DeepSeek's hosted API, thinking mode is enabled differently than DeepSeek's public API documentation describes. DeepSeek's hosted API uses a top-level thinking parameter; the Managed Inference proxy does not accept that parameter and will reject the request. Instead, you enable thinking mode through chat_template_kwargs.

This article shows the supported way to turn thinking mode on, how to confirm it is working, and the common parameter mistakes to avoid.

Prerequisites

  • A Crusoe Cloud account with access to the Intelligence Foundry / Managed Inference
  • A Managed Inference API key (generated from the Intelligence Foundry via the Get API Key button)
  • A local terminal with curl installed (the OpenAI SDK works equally well; see Additional Resources)
  • The Managed Inference base URL: https://api.inference.crusoecloud.com/v1

Instructions

Step 1: Export Your API Key

Export your Managed Inference API key as an environment variable so it isn't hard-coded into your requests.

export CRUSOE_API_KEY="<your-api-key>"

Step 2: Send a Request With Thinking Mode Enabled

Send a chat completion request to deepseek-ai/DeepSeek-V4-Pro. Thinking mode is toggled by setting "thinking": true inside the chat_template_kwargs object.

curl -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }'
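
If you are scripting the request rather than running curl by hand, the same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_payload and send_chat are hypothetical, while the endpoint, model name, and chat_template_kwargs toggle are taken from the curl command above. Without CRUSOE_API_KEY set, the script only prints the request body as a dry run.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.inference.crusoecloud.com/v1"
MODEL = "deepseek-ai/DeepSeek-V4-Pro"


def build_payload(prompt, thinking=True, max_tokens=500):
    """Build the request body; thinking is toggled inside chat_template_kwargs."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # The toggle lives here, not in a top-level "thinking" parameter.
        "chat_template_kwargs": {"thinking": thinking},
        "max_tokens": max_tokens,
    }


def send_chat(payload, api_key):
    """POST the payload to /chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_payload("Your prompt here")
    api_key = os.environ.get("CRUSOE_API_KEY")
    if api_key:
        reply = send_chat(payload, api_key)
        print(json.dumps(reply["choices"][0]["message"], indent=2))
    else:
        # Dry run: show the body that would be sent.
        print(json.dumps(payload, indent=2))
```

Keeping the body in one helper means the thinking toggle is set in a single place, which makes it harder to slip back into the top-level parameter style when porting existing code.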

Step 3: Confirm Thinking Mode Is Active

In the response, the assistant message includes a populated reasoning_content field in addition to the usual content field. A non-empty reasoning_content means thinking mode is on.

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<final answer>",
        "reasoning_content": "<intermediate reasoning>"
      }
    }
  ]
}

Step 4: Turn Thinking Mode Off (Optional)

To turn thinking mode off, set "thinking": false in chat_template_kwargs, or omit chat_template_kwargs entirely. With thinking disabled, reasoning_content is returned as null.

ℹ️ Note: The model may still show step-by-step working in content — that's just its normal answer style. The reliable signal that thinking mode is off is reasoning_content being null, not the appearance of the visible answer.
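
The check described in Steps 3 and 4 can be automated. The sketch below assumes the response shape shown in Step 3; the function name thinking_active is hypothetical, and the two sample responses are hand-written for illustration rather than captured from the service.

```python
import json


def thinking_active(response):
    """Return True when the first choice has a non-empty reasoning_content."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")
    # null (None) or an empty string both mean no reasoning was returned.
    return isinstance(reasoning, str) and reasoning.strip() != ""


# Hand-written sample responses, trimmed to the fields that matter here.
with_thinking = json.loads(
    '{"choices": [{"message": {"role": "assistant", "content": "391",'
    ' "reasoning_content": "17 x 23 = 340 + 51 = 391"}}]}'
)
without_thinking = json.loads(
    '{"choices": [{"message": {"role": "assistant", "content": "391",'
    ' "reasoning_content": null}}]}'
)

print(thinking_active(with_thinking))     # True
print(thinking_active(without_thinking))  # False
```

The helper deliberately ignores content, since visible step-by-step working there is not a reliable signal either way.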

Example

A complete request with thinking mode enabled, piped through jq to show just the assistant message:

curl -s -X POST "https://api.inference.crusoecloud.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CRUSOE_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "What is 17 x 23? Think it through."}],
    "chat_template_kwargs": {"thinking": true},
    "max_tokens": 500
  }' | jq '.choices[0].message'

Response:

{
  "role": "assistant",
  "content": "17 × 23 = 17 × (20 + 3) = (17 × 20) + (17 × 3) = 340 + 51 = 391. So the answer is 391.",
  "reasoning_content": "We are asked: \"What is 17 x 23? Think it through.\" This is a simple multiplication. 17 × 23 = 17 × (20 + 3) = 340 + 51 = 391. So the answer is 391.",
  "tool_calls": null
}

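In this response, the model's intermediate reasoning appears in reasoning_content, while content holds the final answer. Here content also shows some working, which is normal answer style; the populated reasoning_content is what confirms thinking mode is on.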

Common Pitfalls

These are the two parameter mistakes most likely to trip you up, especially when porting code written against DeepSeek's hosted API:

  • Using "thinking": {"type": "enabled"} returns HTTP 403. This is DeepSeek's hosted-API parameter, and the Managed Inference proxy's allowlist only accepts OpenAI-compatible parameters. The request is rejected with {"errors":["Request blocked: parameter 'thinking' is not allowed"]}. Use chat_template_kwargs (Step 2) instead.
  • Using "reasoning_effort": "high" on its own does not enable thinking. The request returns HTTP 200, but reasoning_content stays null and no reasoning is produced. On Managed Inference, reasoning_effort is not the thinking toggle — thinking must be enabled via chat_template_kwargs. Per DeepSeek's own documentation, reasoning_effort only controls effort once thinking is already turned on.
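
When porting code from DeepSeek's hosted API, a small pre-flight check on the request body can catch both mistakes before the request is sent. This is a sketch only: find_pitfalls is a hypothetical helper, and it inspects the payload locally rather than querying the proxy, so it covers only the two cases listed above.

```python
def find_pitfalls(payload):
    """Return warnings for the two parameter mistakes described above."""
    problems = []
    # Pitfall 1: DeepSeek's hosted-API toggle sent as a top-level parameter.
    if "thinking" in payload:
        problems.append(
            "top-level 'thinking' is not allowed; use chat_template_kwargs"
        )
    # Pitfall 2: reasoning_effort present while thinking is not enabled.
    template_kwargs = payload.get("chat_template_kwargs") or {}
    if "reasoning_effort" in payload and template_kwargs.get("thinking") is not True:
        problems.append("reasoning_effort does not enable thinking by itself")
    return problems


hosted_style = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "thinking": {"type": "enabled"},
}
effort_only = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "reasoning_effort": "high",
}
supported = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "chat_template_kwargs": {"thinking": True},
}

for name, body in [
    ("hosted_style", hosted_style),
    ("effort_only", effort_only),
    ("supported", supported),
]:
    print(name, find_pitfalls(body))
```

The first two bodies each produce one warning and the third produces none.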

Additional Resources
