
How-To get started with Flyte v1 on Crusoe Managed Kubernetes

Abe Sharp

Introduction

Flyte is an open-source Kubernetes-native workflow orchestrator maintained by Crusoe partner Union.ai. With its MLOps-specific features, immutable and reproducible workflows, and strong typing, it is often considered as an alternative to Kubeflow Pipelines, Airflow and Dagster. You're most likely reading this article because you've already decided on Flyte as your pipeline orchestrator and want to deploy it for production use on your Crusoe Managed Kubernetes (CMK) cluster. This article shows you how to:

  • Install the flyte-core helm chart as the Flyte execution backend on CMK. 
  • Leverage Crusoe Cloud features such as Crusoe Container Registry (CCR) for use with Flyte.
  • Configure end-user access to Flyte's console and admin services using Kubernetes Ingress.
  • Install and configure the flytekit SDK on your local machine in order to develop workflows and submit them for remote execution on your Flyte backend.

Flyte-core is the production implementation of Flyte, which installs the core Flyte components as individually scalable deployments. Flyte-binary, the alternative chart, is intended for testing and small-scale production use and deploys all Flyte services inside a single Kubernetes pod. The steps in this article were tested using flyte-core 1.16.1.

 

Prerequisites

  • Kubectl, Helm and Docker installed on your local machine
  • Access to a working CMK cluster (via kubectl) from your local machine
  • An SQL database service accessible to your CMK cluster - typically Postgres. 
    • If you don't already have a database service, you can install Postgres on your CMK cluster using this standard Helm chart.
    • In your database service, create two new empty databases along with a password-authenticated user that has admin access to both. In this article, the databases are called flyteadmin and datacatalog.
  • An S3-compatible object store with a pre-existing bucket and credentials that have access to that bucket. The store should be accessible from both the CMK cluster and your local machine, and its endpoint should have TLS enabled. You can create an object store on your CMK cluster itself using the MinIO Helm chart (ensure that the Crusoe SSD CSI driver storage class is installed on your cluster to provide the persistent volumes). For example:
helm repo add minio https://charts.min.io/ && helm repo update
kubectl create ns minio

# provide a TLS certificate and key
# (if self signed cert, add the cert to CA bundles as needed)
kubectl -n minio create secret tls minio-tls \
  --cert=/path/to/tls.crt --key=/path/to/tls.key

helm install minio-release minio/minio -n minio \
  --set mode=standalone --set tls.enabled=true --set tls.certSecret=minio-tls \
  --set tls.publicCrt=tls.crt --set tls.privateKey=tls.key \
  --set rootUser="minioadmin" --set rootPassword="minioadmin"

Detailed configuration of the object store is beyond the scope of this article. You should create object store users with authentication and bucket permissions according to your organizational requirements.

  • A container registry with credentials that allow pushing and pulling of images. This article uses Crusoe Container Registry (CCR), but you could use a third-party registry such as Docker Hub or run a Harbor instance in your CMK cluster. Create a CCR registry and obtain its URL and an access token. Refer to the CCR documentation for full details about all available options.

    # set location to match that of your CMK cluster; output shows registry URL
    crusoe registry repositories create --location us-southcentral1-a \
      --name myflyteregistry --mode standard
    
    # Create access token
    crusoe registry tokens create

 

Limitations

CCR is a recently-introduced feature and might not be enabled on your Crusoe Cloud account. If you don't see 'Registry' as a top-level menu item in your Crusoe Cloud Console, please reach out to the Crusoe support team.

 

Step-by-step instructions

1 - From a Linux host running the standard Docker daemon (not Docker Desktop), perform a Docker login using your CCR registry URL and access token, then base64-encode the resulting Docker config:

docker login <complete CCR registry URL> -u <your Crusoe Cloud login email>
# (enter the access token when prompted for a password)
  
cat ~/.docker/config.json |base64 -w0

Copy the base64-encoded Docker config returned by the previous command, taking care to exclude any characters which follow that output in your terminal (because the output has no terminating newline or whitespace).
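
If a Linux host with a standard Docker daemon is not readily available, the same encoded value can probably be produced directly. The following Python sketch assumes the usual Docker config layout: an "auths" map keyed by the registry host, with an "auth" entry holding base64("username:token"). The function name and the example values are hypothetical, and the registry key should match the host used in your image references.

```python
import base64
import json


def docker_config_b64(registry: str, username: str, token: str) -> str:
    """Return a base64-encoded Docker config holding one registry login."""
    # Docker stores basic-auth credentials as base64("username:password").
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    config = {"auths": {registry: {"auth": auth}}}
    # Encode the whole JSON document; the result has no trailing newline.
    return base64.b64encode(json.dumps(config).encode()).decode()


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own registry, email and token.
    print(docker_config_b64("registry.example.com", "user@example.com", "my-access-token"))
```

The printed string can be used as the value of '.dockerconfigjson' in step 2.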

2 - Back on your local machine, create a YAML file registry-secret.yaml containing your Docker config, and apply it to your Flyte namespace as an imagePullSecret:

Contents of registry-secret.yaml

apiVersion: v1
data:
  .dockerconfigjson: <base-64 encoded Docker config from step 1>
kind: Secret
metadata:
  name: registry-secret
type: kubernetes.io/dockerconfigjson

Commands to create namespace and the imagePullSecret within it:

kubectl create ns flyte
kubectl -n flyte apply -f registry-secret.yaml

3 - Create a secret containing the password of the database user that Flyte will use to access the two databases set up in the Prerequisites section:

kubectl -n flyte create secret generic flyte-db-secret \
  --from-literal=pass.txt=<clear-text password of the db user>

4 - If needed, create a secret containing the CA bundle that will allow Flyte components to verify their TLS connections to the object store:

kubectl -n flyte create secret generic minio-ca \
  --from-file=ca.crt=/path/to/ca.pem

5 - Obtain the TLS certificate and key that you will use to secure your Flyte admin and console endpoints, then create a secret containing them in the namespace of your ingress controller (istio-system in this example):

kubectl -n istio-system create secret tls flyte-tls \
  --cert=/path/to/flyte-endpoints-public.crt --key=/path/to/flyte-endpoints-private.key 

6 - Obtain the flyte-core Helm chart, update your repo, and generate the default values.yaml file (useful for reference about other config options not covered in this article):

helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo update

# Output the default Helm chart config values for reference
# (the chart is referenced through the repo name added above, flyteorg)
helm show values flyteorg/flyte-core > flyte-core-default-values.yaml

7 - Copy the following YAML and save as flyte-core-cmk-values.yaml, and make any required edits to database, object storage and ingress values. 

# flyte-core-cmk-values.yaml

flyteadmin:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true

flytescheduler:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true

datacatalog:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true


flytepropeller:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true

flyteconsole:
  imagePullSecrets: [{"name": "registry-secret"}]
  
webhook:
  serviceAccount:
    imagePullSecrets: [{"name": "registry-secret"}]
  additionalVolumes:
    - name: ca-cert
      secret:
        secretName: minio-ca
  additionalVolumeMounts:
    - name: ca-cert
      mountPath: /etc/ssl/certs/object-store-ca.pem
      subPath: ca.crt
      readOnly: true

common:
  databaseSecret:
    name: "flyte-db-secret"
  ingress:
    ingressClassName: istio # or nginx depending on your ingressClass
    enabled: true
    separateGrpcIngress: false # false for Istio, true for Nginx
    separateGrpcIngressAnnotations:
      nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    host: <DNS hostname that you will point at your CMK loadbalancer>
    tls:
      enabled: true
      secretName: flyte-tls

storage:
  type: s3
  bucketName: flyte
  s3:
    endpoint: "https://<url of object storage endpoint>"
    region: us-east-1
    authType: accesskey
    accessKey: "<object storage access key>"
    secretKey: "<object storage secret key>"

db:
  datacatalog:
    database:
      port: 5432
      username: postgres
      host: postgres-postgresql.postgres.svc.cluster.local
      dbname: "datacatalog"
      passwordPath: /etc/db/pass.txt
  admin:
    database:
      port: 5432
      username: postgres
      host: postgres-postgresql.postgres.svc.cluster.local
      dbname: "flyteadmin"
      passwordPath: /etc/db/pass.txt

configmap:
  adminServer:
    server:
      security:
        secure: true

  k8s:
    plugins:
      k8s:
        default-env-vars:
        - FLYTE_IMAGE_REGISTRY: "registry.us-southcentral1-a.ccr.crusoecloudcompute.com/myflyteregistry-xyz"
        - FLYTE_AWS_ENDPOINT: "https://<url of object storage endpoint>"
        - FLYTE_AWS_ACCESS_KEY_ID: "<object storage access key>"
        - FLYTE_AWS_SECRET_ACCESS_KEY: "<object storage secret key>"

The values in the example above are only a small subset of all the possible configurable values in the flyte-core helm chart. Review the 'flyte-core-default-values.yaml' file generated in step 6 in order to understand the example values in their full context.
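
Before installing, it can help to confirm that none of the angle-bracket placeholders from the example remain in your values file. The following Python sketch is one way to do that. It assumes placeholders always look like '<some text>' on a single line, and the function name is hypothetical.

```python
import re
import sys

# Matches angle-bracket placeholders such as "<object storage access key>".
PLACEHOLDER = re.compile(r"<[^<>\n]+>")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, placeholder) pairs for placeholders left in text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip lines that are entirely YAML comments
        content = line.split(" #", 1)[0]  # drop any inline comment
        for match in PLACEHOLDER.finditer(content):
            found.append((number, match.group(0)))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_values.py flyte-core-cmk-values.yaml
    with open(sys.argv[1], encoding="utf-8") as handle:
        leftovers = find_placeholders(handle.read())
    for number, placeholder in leftovers:
        print(f"line {number}: {placeholder}")
    sys.exit(1 if leftovers else 0)
```

The script exits with a non-zero status if any placeholder is found, so it can be chained in front of the install command.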

Install the helm chart with your values file:

helm install -n flyte -f flyte-core-cmk-values.yaml flyte flyteorg/flyte-core

Check that all the pods in the flyte namespace reach the Running state. A common cause of pods crashing on creation is that the database cannot be accessed, often because the password secret was not configured correctly or the 'passwordPath' property was not added to the Helm values file.
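
The pod check can also be scripted. The sketch below reads the JSON produced by 'kubectl -n flyte get pods -o json' (saved to a file) and reports pods that are not in the Running phase or that have a container in a waiting state such as CrashLoopBackOff. The function name and file name are hypothetical, and the check is deliberately simple: it does not evaluate readiness probes.

```python
import json
import sys


def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return "name: reason" strings for pods that are waiting or not Running."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed pods are not treated as a problem
        for container in status.get("containerStatuses", []):
            waiting = container.get("state", {}).get("waiting")
            if waiting:
                # Report the first waiting container, e.g. CrashLoopBackOff.
                problems.append(f"{name}: {waiting.get('reason', 'Waiting')}")
                break
        else:
            # No container is waiting; fall back to the pod phase.
            if phase != "Running":
                problems.append(f"{name}: {phase}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl -n flyte get pods -o json > pods.json
    #        python check_pods.py pods.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in unhealthy_pods(json.load(handle)):
            print(problem)
```

For any pod that is reported, 'kubectl -n flyte describe pod' and 'kubectl -n flyte logs' usually show the underlying cause.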

When the pods are running, access the Flyte console using the DNS hostname provided in your ingress configuration, with the path '/console' appended. You should see three default projects listed. If you see the UI but no projects, that suggests a problem with CORS settings or with the underlying Flyte admin API itself.


 

 

Creating and submitting a workflow

Flyte workflows are developed and submitted for remote execution using the Flytekit SDK. Flyte recommends the use of the uv package manager for Python, so install it on your local machine, create a virtual environment in a working directory and then install Flytekit:

#install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

#create a working directory
mkdir flyte-test && cd flyte-test

#set up a virtual environment using Python 3.12 (or any version supported by Flyte)
uv venv --python 3.12
source .venv/bin/activate

#install Flytekit
uv pip install flytekit
uv sync

#initialize a test project
pyflyte init --template flyte-simple my-project

Create a config file and export your CCR registry URL:

export FLYTE_IMAGE_REGISTRY=registry.us-southcentral1-a.ccr.crusoecloudcompute.com/myflyteregistry-xyz

cat <<EOF > flyteconfig.yaml
admin:
 endpoint: dns:///<your Flyte backend hostname as shown in Ingress>
 #the CA of the Flyte backend TLS certificate
 caCertFilePath: /path/to/flyte/ca.pem
image:
 builder: local
storage:
 type: s3
 container: <name of bucket you created for Flyte in object storage>
 connection:
   endpoint: https://<url of object storage endpoint>
   access-key: <object storage access key>
   secret-key: <object storage secret key>
EOF

If you need to supply a CA bundle for TLS verification of your object store endpoint (or any other endpoint required by your code), you can provide it as part of the ImageSpec. Place the CA bundle in the Flyte project's working directory and edit the Python code like so:

import os

import flytekit as fl

image_spec = fl.ImageSpec(
    name="say-hello-image",
    requirements="uv.lock",

    # set the registry from the environment, or provide it explicitly
    registry=os.environ["FLYTE_IMAGE_REGISTRY"],

    # the CA bundle is copied into the container image and referenced using env
    copy=["./minio.pem"],
    env={"REQUESTS_CA_BUNDLE": "/root/minio.pem"},
)

Run the workflow with 'pyflyte' (installed as part of Flytekit):

# -vvv for verbose mode - useful for seeing all the local stages of the process
pyflyte -vvv -c ./flyteconfig.yaml run --remote --project flytesnacks \
  --domain development hello_world.py hello_world_w

Pyflyte builds a container image locally and pushes it to the registry. It pushes the Python code to the S3 bucket, then instructs the Flyte admin backend to execute the workflow. In turn, the Flyte backend creates a pod using the new image in the project's Kubernetes namespace (there could be several pods, each using a different image, depending on the complexity of the workflow). The pod collects the Python code from the object store and runs it. The progress and results of the execution are shown in the Flyte console at the URL printed by the pyflyte command.

Troubleshooting

As you can tell from all the preceding steps, numerous independent services need to be correctly configured in order for Flyte workflow execution to succeed. Common reasons for the execution to fail include:

  • The local Docker daemon is not running, or the current user does not have permission to use it.
  • The container registry details have not been provided correctly, or the current user is not authorized to push to the registry.
  • The Flyte admin service is not reachable from the local machine, or its TLS certificate is not trusted.
  • The local machine and/or one or more back-end Flyte services cannot reach the object store or do not trust its TLS certificate.
  • imagePullSecrets have not been set up correctly in the Flyte namespace or provided to the Flyte services, and as a result the execution pods fail with ImagePullBackOff.
  • Execution pods are unable to contact or trust the object store endpoint and therefore cannot pull the tarred Python code for their tasks.
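
Several of these failures come down to an endpoint being unreachable or presenting a certificate that the client does not trust. The following Python sketch, using only the standard library, is one way to test that from your local machine. The function names are hypothetical, and the sketch assumes the endpoint listens on port 443 unless a port is given.

```python
import socket
import ssl
import sys
from typing import Optional


def parse_endpoint(endpoint: str, default_port: int = 443) -> tuple[str, int]:
    """Split an endpoint such as "dns:///flyte.example.com" into (host, port)."""
    # Drop a URL scheme such as "https://" or the gRPC "dns:///" prefix.
    if "://" in endpoint:
        endpoint = endpoint.split("://", 1)[1]
    # Drop leading slashes and any path that follows the host.
    endpoint = endpoint.lstrip("/").split("/", 1)[0]
    host, _, port = endpoint.partition(":")
    return host, int(port) if port else default_port


def check_tls(endpoint: str, ca_file: Optional[str] = None) -> Optional[str]:
    """Open a verified TLS connection and return the negotiated TLS version."""
    host, port = parse_endpoint(endpoint)
    # With ca_file=None the system trust store is used instead.
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tls.py <endpoint> [path to CA bundle]
    ca_path = sys.argv[2] if len(sys.argv) > 2 else None
    try:
        print("TLS OK:", check_tls(sys.argv[1], ca_path))
    except ssl.SSLCertVerificationError as error:
        print("Certificate not trusted:", error)
    except OSError as error:
        print("Endpoint not reachable:", error)
```

Running it against both the Flyte admin hostname and the object store endpoint, with the same CA bundle you gave to Flyte, helps narrow down which of the two connections is failing.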
