Introduction
Flyte is an open-source Kubernetes-native workflow orchestrator maintained by Crusoe partner Union.ai. With its MLOps-specific features, immutable and reproducible workflows, and strong typing, it is often considered as an alternative to Kubeflow Pipelines, Airflow and Dagster. You're most likely reading this article because you've already decided on Flyte as your pipeline orchestrator and want to deploy it for production use on your Crusoe Managed Kubernetes (CMK) cluster. This article shows you how to:
- Install the flyte-core Helm chart as the Flyte execution backend on CMK.
- Leverage Crusoe Cloud features such as Crusoe Container Registry (CCR) for use with Flyte.
- Configure end-user access to Flyte's console and admin services using Kubernetes Ingress.
- Install and configure the flytekit SDK on your local machine in order to develop workflows and submit them for remote execution on your Flyte backend.
Flyte-core is the production implementation of Flyte; it installs the core Flyte components as individually scalable deployments. Flyte-binary, the alternative chart, is intended for testing and small-scale production use and deploys all Flyte services inside a single Kubernetes pod. The steps in this article were tested using flyte-core 1.16.1.
Prerequisites
- Kubectl, Helm and Docker installed on your local machine
- Access to a working CMK cluster (via kubectl) from your local machine
- An SQL database service accessible to your CMK cluster - typically Postgres.
- If you don't already have a database service, you can install Postgres on your CMK cluster using this standard Helm chart.
- In your database service, create two new empty databases along with a password-authenticated user that has admin access to both. In this article, the databases are called flyteadmin and datacatalog.
- An S3-compatible object store with a pre-existing bucket and credentials that have access to that bucket. The store should be accessible from both the CMK cluster and your local machine, and its endpoint should have TLS enabled. You can create an object store on your CMK cluster itself using the MinIO Helm chart (ensure that the Crusoe SSD CSI driver storage class is installed on your cluster to provide the persistent volumes). For example:
helm repo add minio https://charts.min.io/ && helm repo update
kubectl create ns minio
# provide a TLS certificate and key
# (if self signed cert, add the cert to CA bundles as needed)
kubectl -n minio create secret tls minio-tls \
--cert=/path/to/tls.crt --key=/path/to/tls.key
helm install minio-release minio/minio -n minio \
--set mode=standalone --set tls.enabled=true --set tls.certSecret=minio-tls \
--set tls.publicCrt=tls.crt --set tls.privateKey=tls.key \
--set rootUser="minioadmin" --set rootPassword="minioadmin"
Detailed configuration of the object store is beyond the scope of this article. You should create object store users with authentication and bucket permissions according to your organizational requirements.
- A container registry with credentials that allow pushing and pulling of images. We will use Crusoe Container Registry (CCR), but you could use a third-party registry such as Docker Hub or run a Harbor instance in your CMK cluster. Create a CCR registry and obtain its URL and an access token. Refer to the CCR documentation for full details about all available options.
# set location to match that of your CMK cluster; output shows registry URL
crusoe registry repositories create --location us-southcentral1-a \
--name myflyteregistry --mode standard
# Create access token
crusoe registry tokens create
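Before moving on, it can help to confirm that the command-line tools from the prerequisites list are actually on your PATH. The following is a minimal sketch using only the Python standard library; the function name `missing_tools` is a hypothetical helper chosen for illustration, and the check only confirms that the executables exist, not that they can reach your cluster or registry.

```python
import shutil

# Tools named in the prerequisites list; adjust if your setup differs.
REQUIRED_TOOLS = ["kubectl", "helm", "docker"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```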
Limitations
CCR is a recently-introduced feature and might not be enabled on your Crusoe Cloud account. If you don't see 'Registry' as a top-level menu item in your Crusoe Cloud Console, please reach out to the Crusoe support team.
Step-by-step instructions
1 - From a Linux host running a standard Docker daemon (not Docker Desktop), using your CCR registry URL and access token, perform a Docker login:
docker login <complete CCR registry URL> -u <your Crusoe Cloud login email>
# (enter the access token when prompted for a password)
cat ~/.docker/config.json | base64 -w0
Copy the base64-encoded Docker config returned by the previous command, taking care to exclude any characters which follow that output in your terminal (because the output has no terminating newline or whitespace).
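To make the encoding in step 1 less opaque, here is a hedged sketch of what it produces. It assumes the Docker config for a single registry has the commonly seen form `{"auths": {"<registry>": {"auth": "<base64 of user:token>"}}}`, and it mimics `base64 -w0` by emitting one unbroken line. The function names and the example registry, login and token are hypothetical placeholders, not values from CCR.

```python
import base64
import json


def build_docker_config(registry, username, token):
    """Return Docker config JSON text holding one registry login."""
    # Docker stores "username:password" base64-encoded under the "auth" key.
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}})


def encode_for_secret(docker_config_json):
    """Base64-encode the config text on a single line, like `base64 -w0`."""
    return base64.b64encode(docker_config_json.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own registry, login and token.
    config = build_docker_config("registry.example.com", "user@example.com", "my-access-token")
    print(encode_for_secret(config))
```

The printed string is the kind of value that goes into the `.dockerconfigjson` field of the secret. Using `docker login` as described above remains the route this article was tested with.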
2 - Back on your local machine, create a YAML file registry-secret.yaml containing your Docker config, and apply it to your Flyte namespace as an imagePullSecret:
Contents of registry-secret.yaml
apiVersion: v1
data:
  .dockerconfigjson: <base64-encoded Docker config from step 1>
kind: Secret
metadata:
  name: registry-secret
type: kubernetes.io/dockerconfigjson
Commands to create the namespace and the imagePullSecret within it:
kubectl create ns flyte
kubectl -n flyte apply -f registry-secret.yaml
3 - Create a secret containing the password of the database user that Flyte will use to access the two databases set up in the prerequisites section:
kubectl -n flyte create secret generic flyte-db-secret \
--from-literal=pass.txt=<clear-text password of the db user>
4 - If needed, create a secret containing the CA bundle that will allow Flyte components to verify their TLS connections to the object store:
kubectl -n flyte create secret generic minio-ca \
--from-file=ca.crt=/path/to/ca.pem
5 - Obtain the TLS certificate and key that you will use to secure your Flyte admin and console endpoints, then create a secret with them in the namespace of your ingress controller (istio-system in this example):
kubectl -n istio-system create secret tls flyte-tls \
--cert=/path/to/flyte-endpoints-public.crt --key=/path/to/flyte-endpoints-private.key
6 - Add the Helm repo that hosts the flyte-core chart, update your repos, and generate the default values.yaml file (a useful reference for config options not covered in this article):
helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo update
# Output the default Helm chart config values for reference
helm show values flyteorg/flyte-core > flyte-core-default-values.yaml
7 - Copy the following YAML, save it as flyte-core-cmk-values.yaml, and make any required edits to the database, object storage and ingress values.
# flyte-core-cmk-values.yaml
flyteadmin:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true
flytescheduler:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true
datacatalog:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true
flytepropeller:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true
flyteconsole:
  imagePullSecrets: [{"name": "registry-secret"}]
webhook:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true
common:
  databaseSecret:
    name: "flyte-db-secret"
  ingress:
    ingressClassName: istio # or nginx depending on your ingressClass
    enabled: true
    separateGrpcIngress: false # false for Istio, true for Nginx
    separateGrpcIngressAnnotations:
      nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    host: <DNS hostname that you will point at your CMK loadbalancer>
    tls:
      enabled: true
      secretName: flyte-tls
storage:
  type: s3
  bucketName: flyte
  s3:
    endpoint: "https://<url of object storage endpoint>"
    region: us-east-1
    authType: accesskey
    accessKey: "<object storage access key>"
    secretKey: "<object storage secret key>"
db:
  datacatalog:
    database:
      port: 5432
      username: postgres
      host: postgres-postgresql.postgres.svc.cluster.local
      dbname: "datacatalog"
      passwordPath: /etc/db/pass.txt
  admin:
    database:
      port: 5432
      username: postgres
      host: postgres-postgresql.postgres.svc.cluster.local
      dbname: "flyteadmin"
      passwordPath: /etc/db/pass.txt
configmap:
  adminServer:
    server:
      security:
        secure: true
  k8s:
    plugins:
      k8s:
        default-env-vars:
          - FLYTE_IMAGE_REGISTRY: "registry.us-southcentral1-a.ccr.crusoecloudcompute.com/myflyteregistry-xyz"
          - FLYTE_AWS_ENDPOINT: "https://<url of object storage endpoint>"
          - FLYTE_AWS_ACCESS_KEY_ID: "<object storage access key>"
          - FLYTE_AWS_SECRET_ACCESS_KEY: "<object storage secret key>"
The values in the example above are only a small subset of all the configurable values in the flyte-core Helm chart. Review the 'flyte-core-default-values.yaml' file generated in step 6 in order to understand the example values in their full context.
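Because the example values contain angle-bracket placeholders such as `<object storage access key>`, a quick scan before installing can catch any that were left unfilled. The sketch below is a hypothetical helper (the function name and the idea of running it as a script are assumptions for illustration, not part of Flyte or Helm). It treats any `<...>` on a non-comment line as a leftover placeholder, which may produce false positives if your real values legitimately contain angle brackets.

```python
import re
import sys

# Matches angle-bracket placeholders such as <object storage access key>.
PLACEHOLDER = re.compile(r"<[^<>\n]+>")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs for placeholders left in `text`."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip lines that are entirely comments.
        if line.lstrip().startswith("#"):
            continue
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group(0)))
    return found


# Only runs when a values file path is passed as the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as handle:
        leftovers = find_placeholders(handle.read())
    for number, placeholder in leftovers:
        print(f"line {number}: {placeholder}")
    sys.exit(1 if leftovers else 0)
```

Saved under a name of your choice, it could be run with the path to flyte-core-cmk-values.yaml as its only argument; a non-zero exit status means at least one placeholder remains.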
8 - Install the Helm chart with your values file:
helm install -n flyte -f flyte-core-cmk-values.yaml flyte flyteorg/flyte-core
Check that all the pods in the flyte namespace reach the Running state. A common cause of pods crashing on creation is that the database cannot be accessed, often because the password secret was not correctly configured or the 'passwordPath' property was not added to the Helm values file.
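The pod check can also be scripted. The sketch below assumes you have saved the output of `kubectl -n flyte get pods -o json` to a file, and it reports pods that are not in the Running phase or that have containers which are not ready (a crash-looping pod can still report the Running phase). The function name `pod_problems` is a hypothetical helper, and only the standard `metadata.name`, `status.phase` and `status.containerStatuses` fields of the pod list are read.

```python
import json
import sys


def pod_problems(pod_list):
    """Return (pod_name, phase, not_ready_containers) for pods needing attention."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed pods are not a problem
        containers = status.get("containerStatuses", [])
        not_ready = [c["name"] for c in containers if not c.get("ready", False)]
        if phase != "Running" or not_ready:
            problems.append((name, phase, not_ready))
    return problems


# Only runs when the path to a saved pod list is passed as the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as handle:
        for name, phase, not_ready in pod_problems(json.load(handle)):
            print(f"{name}: phase={phase} not-ready containers={not_ready}")
```

For any pod it reports, `kubectl -n flyte describe pod` and `kubectl -n flyte logs` are the usual next steps for finding the cause.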
When the pods are running, attempt to access the Flyte console using the DNS hostname provided in your ingress configuration (append the path '/console'). You should see 3 default projects listed. If you see the UI but no projects, that suggests a problem with CORS settings or with the underlying flyteadmin API itself.
Creating and submitting a workflow
Flyte workflows are developed and submitted for remote execution using the Flytekit SDK. Flyte recommends the use of the uv package manager for Python, so install it on your local machine, create a virtual environment in a working directory and then install Flytekit:
#install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
#create a working directory
mkdir flyte-test && cd flyte-test
#set up a virtual environment using Python 3.12 (or any version supported by Flyte)
uv venv --python 3.12
source .venv/bin/activate
#install Flytekit
uv pip install flytekit
uv sync
#initialize a test project
pyflyte init --template flyte-simple my-project
Create a config file and export your CCR registry URL:
export FLYTE_IMAGE_REGISTRY=registry.us-southcentral1-a.ccr.crusoecloudcompute.com/myflyteregistry-xyz
cat <<EOF > flyteconfig.yaml
admin:
  endpoint: dns:///<your Flyte backend hostname as shown in Ingress>
  # the CA of the Flyte backend TLS certificate
  caCertFilePath: /path/to/flyte/ca.pem
image:
  builder: local
storage:
  type: s3
  container: <name of bucket you created for Flyte in object storage>
  connection:
    endpoint: https://<url of object storage endpoint>
    access-key: <object storage access key>
    secret-key: <object storage secret key>
EOF
If you need to supply a CA bundle for TLS verification of your object store endpoint (or any other endpoints required by your code), you can provide it as part of the ImageSpec. Place the CA bundle in the Flyte project's working directory and edit the Python code like so:
# imports needed by this snippet (the template may already include flytekit)
import os
import flytekit as fl

image_spec = fl.ImageSpec(
    name="say-hello-image",
    requirements="uv.lock",
    # set the registry from the environment, or explicitly
    registry=os.environ["FLYTE_IMAGE_REGISTRY"],
    # the CA bundle is copied into the container image and referenced via env
    copy=["./minio.pem"],
    env={"REQUESTS_CA_BUNDLE": "/root/minio.pem"},
)
Run the workflow with 'pyflyte' (installed as part of Flytekit):
# -vvv for verbose mode - useful for seeing all the local stages of the process
pyflyte -vvv -c ./flyteconfig.yaml run --remote --project flytesnacks \
--domain development hello_world.py hello_world_w
Pyflyte builds a container image locally and pushes it to the registry. It pushes the Python code to the S3 bucket, then instructs the Flyte admin backend to execute the workflow. In turn, the Flyte backend creates a pod using the new image in the project's Kubernetes namespace (there may be several pods, each using a different image, depending on the complexity of the workflow). The pod collects the Python code from the object store and runs it. You can follow the progress and results of the execution in the Flyte console at the URL printed by the pyflyte command, as shown in this screenshot.
Troubleshooting
As you can tell from all the preceding steps, numerous independent services need to be correctly configured in order for Flyte workflow execution to succeed. Common reasons for the execution to fail include:
- The local Docker daemon is not running or the current user does not have permission to use it
- Docker registry details have not been correctly provided or the current user is not authorized to push to it
- The Flyte admin service is not reachable from the local machine or its TLS certificate is not trusted
- The local machine and/or one or more back-end Flyte services cannot reach the object store or do not trust its TLS certificate
- imagePullSecrets have not been set up correctly in the Flyte namespace or provided to Flyte services, and as a result the execution pods fail with ImagePullBackOff.
- Execution pods are unable to contact or trust the object store endpoint and therefore cannot pull the tarred Python code for their tasks.
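Several of the failures above come down to whether an endpoint is reachable and whether its TLS certificate is trusted. The following is a minimal sketch, using only the Python standard library, that attempts a verified TLS handshake against an endpoint with an optional CA bundle. The function names are hypothetical helpers; the endpoint is expected as a plain hostname, `host:port`, or an `https://` URL (not the `dns:///` form used in flyteconfig.yaml), and port 443 is assumed when none is given.

```python
import socket
import ssl
import sys
from urllib.parse import urlsplit


def host_and_port(endpoint):
    """Split 'https://host:9000', 'host:9000' or 'host' into (host, port)."""
    parts = urlsplit(endpoint if "//" in endpoint else "//" + endpoint)
    return parts.hostname, parts.port or 443


def check_tls(endpoint, cafile=None, timeout=5.0):
    """Try a verified TLS handshake; return None on success or an error string."""
    host, port = host_and_port(endpoint)
    # With cafile set, only that bundle is trusted; otherwise system defaults apply.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except OSError as error:  # includes ssl.SSLError and connection failures
        return f"{type(error).__name__}: {error}"


# Usage: pass <endpoint> and optionally <path to CA bundle> as arguments.
if __name__ == "__main__" and len(sys.argv) >= 2:
    ca_bundle = sys.argv[2] if len(sys.argv) > 2 else None
    result = check_tls(sys.argv[1], ca_bundle)
    print("TLS handshake OK" if result is None else result)
```

Running it from your local machine against both the Flyte hostname and the object store endpoint helps narrow down whether a failure is a network problem or a certificate trust problem; it does not check credentials or bucket permissions.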