How-To: Fix Hanging Terraform Operations

Introduction

When executing Infrastructure-as-Code deployments on Crusoe Cloud, you may occasionally experience situations where terraform plan, terraform apply, or terraform destroy commands hang indefinitely.

This behavior typically points to one of four causes: localized environment issues, orphaned state locks, API network timeouts, or resource dependency bottlenecks during concurrent modifications.

This guide outlines how to isolate the root cause using native Terraform diagnostic logging, manually resolve state locking contentions, and safely break hanging operations from your client side without requiring an escalation to Crusoe engineering.

Prerequisites

Local or remote terminal access where Terraform configurations are initialized
Administrative control over the Terraform backend state storage
Standard terraform CLI binary installed
Crusoe CLI installed and authenticated

Instructions

If your execution window stalls, walk through these targeted isolation steps to clear the blockage and restore linear command execution.

Step 1: Enable Diagnostic Logging

To pinpoint exactly where Terraform is stalling (such as an unreturned Crusoe Cloud API callback or an active local wait loop), turn on verbose debugging logs by configuring the TF_LOG environment variable before executing your commands.

Set the log level to DEBUG or TRACE to uncover the explicit HTTP handshake or resource graph node causing the lockup:
```
~ export TF_LOG=DEBUG
~ export TF_LOG_PATH=./terraform-debug.log
~ terraform plan
```
Expected Outcome: The log streams active operations, letting you see whether the process is stuck on an authentication request, a specific resource instantiation, or a lock-acquisition loop. The last lines in terraform-debug.log identify the stall point.

Step 2: Isolate a Refresh or API Hang

By default, plan and destroy refresh state by polling every resource, so a slow or unreachable API stalls here before any change is shown.

Skip the refresh to test whether the hang is in the provider/API path:
```
~ terraform plan -refresh=false
```
If that returns quickly, run serially to find the single resource that blocks:
```
~ terraform plan -parallelism=1
```
Expected Outcome: A fast result with -refresh=false confirms the hang is API/connectivity-related rather than a problem in your configuration. Running with -parallelism=1 surfaces the exact resource the operation stops on.

Step 3: Verify and Clear a Stale State Lock

If a previous session closed abruptly, your remote backend file remains pinned. If your diagnostic logs point to a locking conflict, you can manually break the queue.

First, make new runs fail fast instead of hanging forever by setting a lock timeout:
```
~ terraform plan -lock-timeout=60s
```
If the run reports a lock conflict, locate the unique Lock Info ID in the console error output, then clear the blockage by passing that string to the unlock utility:
```
~ terraform force-unlock <Lock_Info_ID>
```
Expected Outcome: The stale remote concurrency token is cleared, freeing your workspace for subsequent changes. Only run this once you are certain no other Terraform run is active.

Step 4: Target a Blocked Resource During Destroy

During a terraform destroy lockup, a single sub-resource dependency (such as a persistent disk detachment or network-interface binding) may block the wider operation graph.

Target that one resource by its address (the form is resource_type.resource_name, not a hostname) so the rest of the graph isn't held up:
```
~ terraform destroy -target=crusoe_compute_instance.<resource_name>
```
If the hang is because a resource was already deleted outside of Terraform (for example, through the console or CLI), Terraform may poll indefinitely for something that no longer exists. Remove it from state so the operation can complete:
```
~ terraform state rm <Resource Address>
```
Expected Outcome: The blocking resource is handled in isolation, after which a normal terraform destroy can complete. Use -target sparingly — it is a troubleshooting tool, not a routine workflow, and overuse can leave your configuration and state out of sync.

Step 5: Reinitialize the Provider Plugin

An outdated or corrupted provider plugin can cause hangs that survive every step above.

Reinitialize the working directory and pull the current provider version:
```
~ terraform init -upgrade
```
Expected Outcome: A clean provider install rules out a corrupted plugin. To prevent unexpected upgrades from reintroducing the hang, constrain the provider version in your required_providers block (e.g., version = "~> 5.0").

Example

You run terraform apply to provision a new CMK cluster and the operation hangs at module.vpc.crusoe_network_security_group.allow_ssh for ten minutes with no output. You enable TF_LOG=DEBUG and see the last log entry is an HTTP POST to the Crusoe API /v1/networks/{id}/security-groups endpoint with no response. Running terraform plan -refresh=false returns immediately, confirming the hang is in the provider/API path. You then run terraform plan -parallelism=1 and the operation stalls at the same security group resource. Checking the Crusoe console shows the API is healthy, so you run terraform plan -lock-timeout=60s which fails fast with a state lock error. You run terraform force-unlock <Lock_Info_ID> to clear the orphaned lock from a previous crashed session, then re-run terraform apply which completes successfully.

Frequently Asked Questions (FAQ)

Q: Why does my terraform plan or terraform destroy command hang indefinitely without printing any errors?

A: This usually happens for two reasons: either Terraform is waiting to acquire a state lock that was left orphaned by a previous crashed or disconnected session, or it is waiting on a cloud resource dependency graph node that has not responded to an upstream API call.

Q: How do I know if the hang is caused by a state lock or a network/API issue?

A: You can expose the real-time operations by enabling verbose diagnostic logging. Turning on TF_LOG=DEBUG forces Terraform to output exactly what event it is pausing on, whether it is an unreturned HTTP handshake or a concurrent lock queue.

Q: Can I stop a hanging Terraform command safely using Ctrl+C?

A: Pressing Ctrl+C once sends a graceful interrupt signal (SIGINT) to Terraform, which attempts to cleanly exit and release any remote locks. However, interrupting a process mid-execution can occasionally leave state locks orphaned. Never force-quit with multiple rapid interrupts unless absolutely necessary, as this risks corrupting your remote state file.

Q: Can I avoid the indefinite wait on a locked state entirely?

A: Yes. Passing -lock-timeout=60s (or your preferred duration) tells Terraform to wait only that long for a lock before failing with a clear error, instead of hanging.

Additional Resources

Related to

terraform how-to

Was this article helpful?

0 out of 0 found this helpful

Comments

0 comments

Article is closed for comments.

Introduction

Prerequisites

Instructions

Step 1: Enable Diagnostic Logging

Step 2: Isolate a Refresh or API Hang

Step 3: Verify and Clear a Stale State Lock

Step 4: Target a Blocked Resource During Destroy

Step 5: Reinitialize the Provider Plugin

Example

Frequently Asked Questions (FAQ)

Additional Resources

Related to

Was this article helpful?

Still need help?

Related Articles

Recently Viewed

Comments

How-To: Fix Hanging Terraform Operations

Introduction

Prerequisites

Instructions

Step 1: Enable Diagnostic Logging

Step 2: Isolate a Refresh or API Hang

Step 3: Verify and Clear a Stale State Lock

Step 4: Target a Blocked Resource During Destroy

Step 5: Reinitialize the Provider Plugin

Example

Frequently Asked Questions (FAQ)

Additional Resources

Related to

Was this article helpful?

Still need help?

Related Articles

Related articles

Recently Viewed

Comments