Slurm Nodes in PLND State: Why Jobs Aren’t Starting and How to Fix It

Last Updated: Mar 31, 2026

Overview:

This article covers a scenario where Slurm compute nodes remain stuck in the PLND (planned) state, preventing jobs from starting even though the nodes appear idle and healthy.

In this state, attempts to manually reset nodes (for example, using scontrol update State=IDLE) do not take effect.

Important:
The PLND state is not an error condition. It indicates that Slurm's backfill scheduler has planned the node(s) for a future, higher-priority job. While the node may appear idle, it is intentionally withheld from lower-priority jobs that would delay that planned start.

This situation is commonly misunderstood as:

Node failure or unresponsiveness
Slurm daemon (slurmd) issues
Infrastructure or VM-level problems

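However, in many cases the root cause is scheduler behavior: Slurm's backfill scheduler has planned a pending job onto these nodes at a future start time. These plans are often described as reservations, but they are separate from administrator-created reservations.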

Prerequisites:

Access to a Slurm cluster
Permissions to run Slurm commands (sinfo, squeue, scontrol)
Access to a login or head node
Basic familiarity with Slurm scheduling

Step-by-Step Instructions:

1. Confirm Node State

Check the current state of nodes:

$ sinfo

Example output:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*       up   infinite      2   plnd slurm-compute-node-[9,15]

If nodes show plnd, proceed to the next steps.
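If many nodes are involved, it can help to print one node per line and filter on the compact state. The following is a minimal sketch, assuming POSIX sh and awk; list_plnd_nodes is a hypothetical helper name, and it assumes sinfo's %t field prints planned nodes as plnd (possibly followed by a suffix such as * for not responding).

```shell
# list_plnd_nodes (hypothetical helper): read "NODE STATE" lines, as printed by
#   sinfo -N -h -o "%N %t"
# and print each node whose compact state starts with "plnd".
# sort -u removes duplicates when a node belongs to more than one partition.
list_plnd_nodes() {
    awk '$2 ~ /^plnd/ { print $1 }' | sort -u
}
```

Example usage:

$ sinfo -N -h -o "%N %t" | list_plnd_nodes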

2. Attempt Manual Reset (Expected to Fail)

Try setting the node to IDLE:

$ sudo scontrol update NodeName="slurm-compute-node-[9,15]" State=IDLE

If the state does not change, PLND is being applied by the scheduler rather than by a manually set node state.
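To check whether the update was ignored without reading the full sinfo table, you can re-read the node's compact state after running the command. A minimal sketch, assuming POSIX sh; still_planned is a hypothetical helper name, not part of Slurm.

```shell
# still_planned (hypothetical helper): succeed (exit status 0) if the given
# node still reports a compact state beginning with "plnd", fail otherwise.
still_planned() {
    state=$(sinfo -h -N -n "$1" -o "%t")
    case "$state" in
        plnd*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example usage:

$ sudo scontrol update NodeName=slurm-compute-node-9 State=IDLE
$ still_planned slurm-compute-node-9 && echo "slurm-compute-node-9 is still PLND"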

3. Check for Reservations

$ scontrol show reservations

If reservations exist, check whether the affected nodes are listed; if so, they may be reserved for maintenance or administrative purposes.
If no reservations are present, continue; this is expected in backfill scenarios.
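Reservation output can be long. As a sketch, assuming the one-record-per-line format produced by scontrol's -o (--oneliner) flag, the hypothetical helper below prints only each reservation's name and node list, so you can see at a glance whether the PLND nodes appear in any of them.

```shell
# list_reservations (hypothetical helper): read the output of
#   scontrol -o show reservation
# and print "NAME NODES" for each reservation record.
# A "No reservations in the system" message has no ReservationName= field,
# so it produces no output.
list_reservations() {
    awk '{
        name = ""; nodes = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ReservationName=/) name = substr($i, 17)
            else if ($i ~ /^Nodes=/) nodes = substr($i, 7)
        }
        if (name != "") print name, nodes
    }'
}
```

Example usage:

$ scontrol -o show reservation | list_reservations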

4. Inspect Node Details

$ scontrol show node <node-name>

Example (for the above scenario):
$ scontrol show node slurm-compute-node-9
$ scontrol show node slurm-compute-node-15

Look for:

State=IDLE+PLANNED
Any Reason= field (e.g., Not responding)

If nodes appear healthy but are still marked PLANNED, continue.
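To pull out just the fields that matter in this step, a hypothetical filter like the one below prints the State= value and any Reason= text from scontrol show node output. It is a sketch that assumes the usual multi-line layout, in which Reason= starts its own line.

```shell
# node_summary (hypothetical helper): read "scontrol show node" output on
# stdin and print the node state plus any recorded reason.
node_summary() {
    awk '
        # Print the value of any State= field on the line.
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^State=/) print "state:", substr($i, 7)
        }
        # Reason= text can contain spaces, so keep the rest of the line.
        /^[ \t]*Reason=/ {
            sub(/^[ \t]*Reason=/, "")
            print "reason:", $0
        }
    '
}
```

Example usage:

$ scontrol show node slurm-compute-node-9 | node_summary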

5. Check for Pending Jobs

$ squeue -t PD --start

Look for jobs with:

Large node requirements
Future start times
High priority
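The default squeue -t PD output does not include expected start times. As a sketch, assuming squeue's %S (expected start time), %D (node count), and %Y (scheduled nodes) format fields, the hypothetical helper below keeps only pending jobs that already have planned nodes and sorts them by start time, earliest first.

```shell
# planned_pending_jobs (hypothetical helper): read "JOBID|START|NODES|SCHEDNODES"
# lines, as printed by
#   squeue -h -t PD -o "%i|%S|%D|%Y"
# and print only jobs with planned nodes and a known start time.
# ISO-8601 start times sort correctly as plain strings.
planned_pending_jobs() {
    awk -F'|' '$4 != "" && $4 != "(null)" && $2 != "N/A"' | sort -t'|' -k2,2
}
```

Example usage:

$ squeue -h -t PD -o "%i|%S|%D|%Y" | planned_pending_jobs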

6. Identify Future Reservation (Root Cause)

Inspect specific pending jobs:

$ scontrol show job <job_id>

Look for fields like:

StartTime=... (future timestamp)
SchedNodeList=... (includes affected nodes)

Key Insight:
If a pending job has a future StartTime and its SchedNodeList includes the affected nodes, the backfill scheduler has planned that job onto those nodes and marks them PLND until the job starts or the plan changes.
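When checking several jobs, a small filter can reduce scontrol show job output to the fields used in this step. This is a sketch; job_plan is a hypothetical helper, and it assumes fields are printed as space-separated Key=Value pairs.

```shell
# job_plan (hypothetical helper): read "scontrol show job" output on stdin
# and print only the JobId, StartTime, and SchedNodeList fields.
job_plan() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(JobId|StartTime|SchedNodeList)=/) print $i
    }'
}
```

Example usage:

$ scontrol show job <job_id> | job_plan

SchedNodeList may be a compressed hostlist such as slurm-compute-node-[9,15]; scontrol show hostnames expands it to one node per line, which makes it easy to confirm that a PLND node is included:

$ scontrol show hostnames "slurm-compute-node-[9,15]"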

7. Validate Scheduler Behavior

At this point:

Nodes are healthy ✅
No explicit reservations exist ✅
Nodes are still PLND ✅
A pending job's SchedNodeList includes the PLND nodes ✅

This confirms the cause:

👉 Slurm backfill scheduler is reserving nodes for a future job

8. Release Nodes (If Needed)

To make nodes available immediately, modify the blocking job:

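Option A: Hold the job

$ scontrol hold <job_id>

Option B: Cancel the job

$ scancel <job_id>

Option C: Adjust job requirements

Reduce node count
Reduce time limit

For example:

$ scontrol update JobId=<job_id> NumNodes=<count> TimeLimit=<time>

Once the job is held, cancelled, or no longer planned onto these nodes, the PLND state should clear on the next backfill cycle (by default every 30 seconds, controlled by bf_interval in SchedulerParameters).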

9. Verify Node State

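$ sinfo -N -n "slurm-compute-node-[9,15]" -o "%N %T"

Expected result:

NODELIST STATE
slurm-compute-node-9 idle
slurm-compute-node-15 idle

The nodes should no longer report planned. If other jobs have already started on them, they may show allocated or mixed instead of idle.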
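Because the PLND flag is updated by the backfill scheduler on its own cycle, the change may not appear instantly. A minimal polling sketch, assuming POSIX sh; wait_until_unplanned is a hypothetical helper, and its default of 30 seconds between checks is an assumption chosen to roughly match the default bf_interval.

```shell
# wait_until_unplanned (hypothetical helper): poll sinfo until NODE no longer
# reports a "plnd" compact state, or give up after MAX_TRIES checks.
# Usage: wait_until_unplanned NODE [MAX_TRIES] [INTERVAL_SECONDS]
wait_until_unplanned() {
    node=$1
    max_tries=${2:-10}
    interval=${3:-30}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        state=$(sinfo -h -N -n "$node" -o "%t")
        case "$state" in
            plnd*) ;;                                  # still planned, keep waiting
            *) echo "$node: $state"; return 0 ;;       # plan cleared
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    echo "$node: still planned after $max_tries checks" >&2
    return 1
}
```

Example usage:

$ wait_until_unplanned slurm-compute-node-9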

Resolution:

Nodes stuck in PLND state are typically reserved by Slurm for a future scheduled job, not due to node failure or infrastructure issues.

In this case:

A pending job with a future StartTime requires multiple nodes
Slurm reserved those nodes in advance
This prevented other jobs from running, even though nodes appeared idle

Fix:
Modify or remove the blocking job (hold, cancel, or adjust its requirements) to release the nodes back to the scheduler.

Additional Notes:

PLND is a flag set on top of IDLE (shown as IDLE+PLANNED), so manual state changes will not clear it while a job remains planned onto the node
This behavior is expected in clusters using backfill scheduling
Misinterpreting PLND as a failure can lead to unnecessary debugging (e.g., restarting slurmd)
Always check pending jobs before investigating node health

Additional Resources

Slurm documentation: Node states and scheduling behavior
man sinfo (NODE STATE CODES section, which defines PLANNED) and man scontrol
squeue, sinfo, scontrol command references
