Introduction
This guide explains how to restore SSH access to virtual machines (VMs) when SSH fails or hangs even though an nc check shows that port 22 is open.
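The symptom can be confirmed from a machine that normally reaches the VM. The following is a minimal sketch, not an official tool: check_ssh_symptom is a hypothetical helper name, the 5, 10, and 15 second limits are arbitrary assumptions, and it assumes key-based SSH login is already configured for the host.

```shell
#!/usr/bin/env bash
# Sketch: classify the state of one host as seen from the current machine.
# check_ssh_symptom is a hypothetical helper, not part of any cluster tooling.
check_ssh_symptom() {
    local host="$1"
    # -z probes the port without sending data; -w 5 gives up after 5 seconds.
    if ! nc -z -w 5 "$host" 22; then
        echo "port-closed"
        return 1
    fi
    # BatchMode avoids interactive prompts; timeout bounds a session that
    # connects but then hangs during login.
    if timeout 15 ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "ssh-ok"
        return 0
    fi
    echo "port-open-ssh-failing"
    return 2
}
```

A result of port-open-ssh-failing corresponds to the situation this guide covers. Usage would look like check_ssh_symptom worker-1, where worker-1 is a placeholder hostname.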
Step-by-Step Instructions
1. Verify SSH on the NFS node
- Verify you can SSH into the NFS node.
- NOTE: The nfs-node is centrally connected to all worker nodes to provide shared scratch storage. As a result, if the nfs-node goes down or the NFS service is unavailable, SSH access to the worker nodes may fail or hang due to dependencies on the mounted shared filesystem. To ensure stability, make sure the NFS server is running and accessible before interacting with the cluster.
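The dependency described in the note can be observed on any node you can still reach, for example through a console. This is a minimal sketch under stated assumptions: mount_responsive is a hypothetical helper name, and the shared mount path is not given in this guide, so /mnt/scratch in the usage note is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: report whether a path answers within a few seconds.
# mount_responsive is a hypothetical helper; pass your real shared mount path.
mount_responsive() {
    local path="$1"
    local rc=0
    # A stat on an unresponsive NFS mount blocks; timeout ends it after
    # 5 seconds and exits with code 124.
    timeout 5 stat "$path" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo "responsive" ;;
        124) echo "hung" ;;
        *)   echo "missing-or-error" ;;
    esac
}
```

Running mount_responsive /mnt/scratch and getting hung suggests the node is waiting on the NFS server, which is consistent with SSH logins stalling.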
2. Ensure the nfs-server service is running on the NFS node.
- Run the command below. In this example output, the service is inactive.
sudo systemctl status nfs-server
● nfs-server.service - NFS server and services
Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
Active: inactive
- If nfs-server is inactive, start and enable it with the commands below:
sudo systemctl start nfs-server
sudo systemctl enable nfs-server --now
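The check and the fix in step 2 can be combined so the service is only touched when it is not already running. A minimal sketch, assuming a systemd host and sudo rights; ensure_nfs_server is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: start and enable nfs-server only when it is not already active.
# ensure_nfs_server is a hypothetical helper, to be run on the NFS node.
ensure_nfs_server() {
    # is-active exits 0 only when the unit is currently active.
    if systemctl is-active --quiet nfs-server; then
        echo "already-active"
        return 0
    fi
    # enable --now enables the unit at boot and starts it immediately.
    sudo systemctl enable --now nfs-server && echo "started"
}
```

Calling ensure_nfs_server a second time should print already-active if the service came up correctly.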
3. Verify you can SSH into your worker nodes once the nfs-server service is running.
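Rather than logging in to each worker by hand, SSH reachability can be checked in one pass with Ansible's ping module. A minimal sketch: verify_workers is a hypothetical helper name, the default inventory path is the one used by the playbook command in step 4, the 15 second timeout is an arbitrary assumption, and the all pattern targets every host in the inventory, since this guide does not name a worker group.

```shell
#!/usr/bin/env bash
# Sketch: confirm Ansible can log in to every inventory host over SSH.
# verify_workers is a hypothetical helper, run from the repository root.
verify_workers() {
    local inventory="${1:-ansible/inventory/inventory.yml}"
    # The ping module connects over SSH and runs a trivial task on each host;
    # --timeout limits each connection attempt to 15 seconds.
    ansible all -i "$inventory" -m ping --timeout 15
}
```

Any host that Ansible reports as unreachable should be fixed before moving on to the playbook run.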
4. Run the ansible-playbook command to configure your SLURM cluster. The playbook installs the packages and configures services and shared mounts.
ansible-playbook -i ansible/inventory/inventory.yml ansible/slurm.yml -f 128