Introduction
Kubernetes does not provide persistent storage by default. To add persistent storage to the cluster, you can use an external storage provider such as Ceph or another storage backend. This article outlines the steps required to set up an external Ceph cluster using storage-optimized s1a nodes.
The local NVMe storage within each s1a instance is ephemeral: it does not persist once the instance is terminated. To ensure high availability and redundancy, we recommend setting up a Ceph cluster.
Prerequisites
- Access to s1a nodes (a minimum of 4 s1a.20x nodes)
- In production environments, Ceph’s default replication factor of 3 provides a balance between redundancy and performance. However, in a cluster with only 3 nodes, the failure of a single node could lead to data loss. With at least 4 nodes, the cluster can tolerate failures more effectively and maintain data integrity, ensuring smoother operation and higher fault tolerance.
- To calculate usable storage capacity, you need to account for the replication factor. For example, if you need 100TB of usable storage, you need to provision 100TB x 3 (replication factor) = 300TB of raw capacity. In Crusoe Cloud, this can be achieved by provisioning 4 s1a.120x nodes or 8 s1a.60x nodes (see the quick calculation after this list).
- Access to a Kubernetes cluster
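As a quick sanity check, the raw-capacity calculation can be scripted in any shell. The figures below are purely illustrative; substitute your own usable-capacity target and replication factor.
# usable_tb=100; replication=3
# echo "Raw capacity required: $((usable_tb * replication)) TB"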
Step-by-Step Instructions
Note: All the commands performed on the bootstrap node need to be run as the root user.
1. Bootstrap the Ceph cluster
a. Install cephadm and the ceph CLI on the bootstrap node.
# apt install -y cephadm
# cephadm add-repo --release squid
# cephadm install ceph-common
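Optionally, confirm that cephadm and the ceph CLI were installed correctly before continuing (the version numbers will vary with the release you installed):
# cephadm version
# ceph --version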
b. Install docker or podman on the remaining Ceph nodes.
# sudo apt update && sudo apt install podman -y
c. Initialize the bootstrap node. For <mon-ip>, use the public IP of the bootstrap node if the Kubernetes and Ceph clusters are in different subnets. If the two clusters are in the same subnet, use the private IP to bootstrap the cluster.
# cephadm bootstrap --mon-ip <mon-ip> --allow-fqdn-hostname
d. The bootstrap command will generate credentials for the Ceph dashboard. Save the credentials in a safe place; we will need them in later steps. If you did not save the password, it can be found in the cephadm.log file generated on the bootstrap node. Use the following command to find it.
# cat /var/log/ceph/cephadm.log | grep -A5 'Ceph Dashboard is now available'
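If you only need the dashboard URL (not the password), it can also be retrieved from the manager at any time:
# ceph mgr services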
e. Set up passwordless SSH between the bootstrap node and the remaining Ceph nodes.
# cp /etc/ceph/ceph.pub /root/.ssh/id_rsa.pub
# ceph config-key get mgr/cephadm/ssh_identity_key > /root/.ssh/id_rsa
# chmod 600 /root/.ssh/id_rsa
f. Copy Ceph's public SSH key to the remaining Ceph nodes. This step needs to be performed from your local machine, from which you can SSH to all the Ceph nodes.
# scp <bootstrap_node_public_ip>:/etc/ceph/ceph.pub .
# ssh-copy-id -f -i ceph.pub root@<public_ip_of_other_ceph_nodes>
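Optionally, verify from the bootstrap node that the key was accepted by opening a test SSH session to one of the other Ceph nodes (the hostname lookup here is just an example check):
# ssh -i /root/.ssh/id_rsa root@<other_ceph_node_ip> hostname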
g. Add the remaining hosts to the Ceph cluster. SSH back into the bootstrap node to perform this step.
# ceph orch host add <host_fqdn> <host_ip> --labels _admin
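Repeat the command for each remaining node, then confirm that all hosts have joined the cluster:
# ceph orch host ls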
h. Run the following command to display an inventory of unformatted storage devices on all the cluster nodes.
# ceph orch device ls
i. Once verified, add all the unformatted storage disks to the Ceph cluster.
# ceph orch apply osd --all-available-devices
j. Validate the Ceph cluster status and OSDs before proceeding to the next steps. Please allow a few minutes for all the disks to initialize within the cluster.
# ceph osd tree
# ceph status
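If the cluster does not report HEALTH_OK, the following command gives a more detailed explanation of any warnings:
# ceph health detail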
2. Create an RBD pool
a. Now that we have a Ceph cluster, we need to create a pool to use it with Kubernetes. Create an RBD pool using the following command.
# ceph osd pool create <pool-name>
b. Verify that the pool was created using the following command.
# ceph osd pool ls
c. Associate the RBD application with the newly created pool.
# ceph osd pool application enable <pool-name> rbd
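Optionally, you can also initialize the pool for RBD use, as recommended by the upstream Ceph documentation, and confirm the application association:
# rbd pool init <pool-name>
# ceph osd pool application get <pool-name>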
3. Connect the external Ceph cluster to your Kubernetes cluster
a. Create users and keys on the external Ceph cluster. To do this, rook-ceph provides a Python script. Pull the script from GitHub and run it on the bootstrap node using the following steps. The script generates a list of export variables; save these, as we will need them in step 3c.
# curl https://raw.githubusercontent.com/rook/rook/refs/heads/release-1.15/deploy/examples/create-external-cluster-resources.py -o /root/create-external-cluster-resources.py
# chmod +x /root/create-external-cluster-resources.py
# python3 /root/create-external-cluster-resources.py --rbd-data-pool-name <pool-name> --format bash
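Since the exported variables are needed again in step 3c, you may find it convenient to also redirect the script output to a file; the file name below is just an example:
# python3 /root/create-external-cluster-resources.py --rbd-data-pool-name <pool-name> --format bash > /root/ceph-external-exports.sh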
b. The next step is to install Rook. This step needs to be performed from your local machine, since we will need access to the Kubernetes cluster and Helm.
# git clone https://github.com/rook/rook.git
# cd rook/
# git checkout v1.15.1
# cd deploy/charts/
# clusterNamespace=rook-ceph
# operatorNamespace=rook-ceph
# helm repo add rook-release https://charts.rook.io/release
# helm install --create-namespace --namespace $clusterNamespace rook-ceph rook-release/rook-ceph -f ./rook-ceph/values.yaml
# helm install --create-namespace --namespace $clusterNamespace rook-ceph-cluster \
--set operatorNamespace=$operatorNamespace rook-release/rook-ceph-cluster -f ./rook-ceph-cluster/values-external.yaml
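Before moving on, you can confirm that the Rook operator pods are up in the rook-ceph namespace:
# kubectl -n rook-ceph get pods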
c. Once Rook has been deployed successfully, export the variables generated in step 3a to import the Ceph cluster into Kubernetes. These steps are also performed from your local machine.
# Copy the variables generated in step 3a and paste them into your terminal
# curl https://raw.githubusercontent.com/rook/rook/refs/heads/release-1.15/deploy/examples/import-external-cluster.sh -o import-external-cluster.sh
# chmod +x import-external-cluster.sh
# ./import-external-cluster.sh
4. Validate the Ceph cluster and set the default storage class
a. Verify that the external Ceph cluster is now connected to your Kubernetes cluster.
# kubectl get cephclusters -n rook-ceph
b. You can also access the Ceph dashboard (using the credentials saved in step 1d) to validate the status of the cluster.
c. A storage class will also be created for dynamic volume provisioning. You can check the storage class using the following command.
# kubectl -n rook-ceph get sc
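To have this storage class used by PVCs that do not request one explicitly, you can mark it as the cluster default. The annotation below is standard Kubernetes behavior; replace <storage-class-name> with the name returned by the previous command (for an external RBD pool this is typically ceph-rbd, but confirm against your own output).
# kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
After patching, re-running kubectl get sc should show (default) next to the class name.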
Additional Resources
- Ceph
- Rook