Get started with NVIDIA Riva Server on Crusoe Managed Kubernetes

Last Updated: March 23rd, 2026

Introduction

NVIDIA Riva is a GPU-accelerated SDK for building and deploying real-time, customizable AI speech applications, including automatic speech recognition (ASR) and text-to-speech (TTS). Developers use it to create high-performance voice assistants, transcription services, and translation tools. This article explains how to deploy the Riva API server Helm chart on Crusoe Managed Kubernetes (CMK) and run its example notebooks. The Helm chart creates a working Riva API service backed by Triton Inference Server, which serves a selection of pre-trained speech models. Because the Riva examples are provided as Jupyter notebooks, this article also explains how to deploy JupyterHub on your CMK cluster. Alternatively, you can expose the Riva API endpoint externally through a LoadBalancer and run your client code outside the CMK cluster.

Prerequisites

An NVIDIA NGC API key (obtained from your NVIDIA NGC account).
Admin-level kubectl access to a Crusoe Managed Kubernetes cluster with at least one available GPU of any type.
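
Before you start, it can help to confirm that the client tools used in this article are on your PATH. The following is a minimal sketch, not an official check: the helper name missing_tools is made up for illustration, and the tool list is simply the commands that appear in the steps below.

```shell
# Hypothetical helper (not part of kubectl or helm): prints the name of each
# listed command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The steps in this article call kubectl, helm, tar, and base64.
missing=$(missing_tools kubectl helm tar base64)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Install these tools before continuing:" $missing
else
  echo "All required client tools were found."
fi
```

This only checks the workstation. It does not verify cluster access or GPU availability, which you still need to confirm with kubectl against your CMK cluster.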

Step-by-Step Instructions

Deploy JupyterHub

helm repo add jupyterhub https://hub.jupyter.org/helm-chart/ && helm repo update
helm upgrade --cleanup-on-fail \
  --install jupyter jupyterhub/jupyterhub --namespace riva --create-namespace
  
# Remove the JupyterHub network policies that can block intra-cluster services
kubectl -n riva delete networkpolicies --all

# Start a port-forward to the JupyterHub UI
kubectl --namespace=riva port-forward service/proxy-public 8080:http

Open http://localhost:8080 in your browser and log in to JupyterHub with any username and password combination. For production use, configure JupyterHub authentication to meet your organization's requirements; that is beyond the scope of this article.

Note: If you have Crusoe Load Balancer available, you can edit the proxy-public service to change its type from ClusterIP to LoadBalancer and access JupyterHub without using port-forward.
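
One way to make that change without opening an editor is a merge patch. This is a sketch under the assumption that Crusoe Load Balancer is enabled for your cluster and that the service is still named proxy-public in the riva namespace; the command is only printed so you can review it first.

```shell
# Merge patch that switches the Service type. proxy-public is the Service
# name used in the port-forward step.
PATCH='{"spec":{"type":"LoadBalancer"}}'

# Printed for review only. Remove the leading "echo" to apply the change.
echo kubectl -n riva patch service proxy-public --type merge -p "$PATCH"
```

After applying it, kubectl -n riva get service proxy-public should eventually show an external address for the service.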

Deploy NVIDIA Riva Server

# Provide your NGC API key
export NGC_API_KEY="<your NGC API key>"

# Download the Helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-2.19.1.tgz \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# Extract the Helm chart so that values.yaml can be edited
tar -xvzf riva-api-2.19.1.tgz

Edit riva-api/values.yaml and uncomment the required offline models in the modelRepoGenerator.ngcModelConfigs.models section:
- nvidia/riva/rmir_asr_conformer_en_us_ofl:2.19.0
- nvidia/riva/rmir_diarizer_offline:2.19.0

Then set cacheConfig.gpuProduct to match your GPU type, for example NVIDIA-H100-80GB-HBM3 (run kubectl describe nodes | grep NVIDIA to check).
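
If you prefer to script the model edit, the sketch below uncomments the two entries with sed. It runs against a scratch file, because the exact layout of the commented lines in the chart's values.yaml is an assumption here (a leading "#" directly before the list dash). The third entry, example_unused_model, is a made-up placeholder that shows other models staying commented out.

```shell
# Scratch copy that imitates the relevant part of riva-api/values.yaml.
# The layout and the example_unused_model entry are assumptions for the demo.
workdir=$(mktemp -d)
cat > "$workdir/values.yaml" <<'EOF'
modelRepoGenerator:
  ngcModelConfigs:
    models:
      #- nvidia/riva/rmir_asr_conformer_en_us_ofl:2.19.0
      #- nvidia/riva/rmir_diarizer_offline:2.19.0
      #- nvidia/riva/example_unused_model:2.19.0
EOF

# Strip the leading "#" (and any spaces after it) from the two models this
# article needs, preserving the original indentation.
for model in rmir_asr_conformer_en_us_ofl rmir_diarizer_offline; do
  sed "s|^\([[:space:]]*\)#[[:space:]]*\(- nvidia/riva/$model:\)|\1\2|" \
    "$workdir/values.yaml" > "$workdir/values.yaml.new"
  mv "$workdir/values.yaml.new" "$workdir/values.yaml"
done

cat "$workdir/values.yaml"
```

To use this on the real file, point the sed command at riva-api/values.yaml and review the result with a diff before installing the chart.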

Install the chart:

helm -n riva install riva-api ./riva-api-2.19.1.tgz \
  --values riva-api/values.yaml \
  --set ngcCredentials.password="$(echo -n "$NGC_API_KEY" | base64 -w0)" \
  --set ngcCredentials.email="<your NGC account email>" \
  --set modelRepoGenerator.modelDeployKey="$(echo -n tlt_encode | base64 -w0)"
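
The install command passes base64-encoded values, and the encoding must not include a trailing newline or line wrapping. The sketch below shows the encoding of the tlt_encode key on its own, using printf and tr as a portable stand-in for echo -n and base64 -w0. The variable names are illustrative only.

```shell
# printf '%s' writes the string without a trailing newline, like echo -n.
# tr -d '\n' removes any line wrapping, like the GNU-only base64 -w0.
MODEL_DEPLOY_KEY_B64=$(printf '%s' 'tlt_encode' | base64 | tr -d '\n')
echo "$MODEL_DEPLOY_KEY_B64"   # prints dGx0X2VuY29kZQ==

# Round-trip check: decoding should give back the original string.
DECODED=$(printf '%s' "$MODEL_DEPLOY_KEY_B64" | base64 -d)
echo "$DECODED"                # prints tlt_encode

# For comparison: including a newline in the input changes the encoded value.
WITH_NEWLINE_B64=$(printf '%s\n' 'tlt_encode' | base64 | tr -d '\n')
[ "$WITH_NEWLINE_B64" != "$MODEL_DEPLOY_KEY_B64" ] && echo "newline changes the encoding"
```

The same check works for the NGC API key, although you may prefer not to echo a real credential to the terminal.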

Wait for the triton0 and riva-api pods to reach the ready state. Most of this time is spent by the Triton server pulling and deploying the models, which can take about 30 minutes on the first install, depending on how many models you uncommented in values.yaml. To monitor progress, tail the logs of the triton0 pod's init container:

kubectl -n riva logs -l app=triton0 -c riva-model-init -f

Run the quickstart notebooks

Using JupyterHub in your browser, create a new terminal to clone the Riva examples repo and install the Python requirements:

git clone https://github.com/nvidia-riva/tutorials.git
pip install grpcio grpcio-tools nvidia-riva-client

Open the resulting 'tutorials' directory in the Jupyter file browser pane, and then open the 'asr-basics.ipynb' notebook.

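In the second runnable cell, change auth = riva.client.Auth(uri='localhost:50051') to auth = riva.client.Auth(uri='riva-api:50051'). The notebook then connects to the riva-api service inside the cluster instead of a local server.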
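
The same edit can be made from the JupyterHub terminal with sed. The sketch below runs on a scratch file that stands in for tutorials/asr-basics.ipynb; the single line in it imitates how the cell text might appear inside the notebook JSON, which is an assumption about the file's contents.

```shell
# Scratch file standing in for tutorials/asr-basics.ipynb. The real notebook
# has many more lines; only the endpoint line matters for this demo.
nbdir=$(mktemp -d)
cat > "$nbdir/asr-basics.ipynb" <<'EOF'
    "auth = riva.client.Auth(uri='localhost:50051')\n",
EOF

# Rewrite the endpoint. A temporary file is used instead of sed -i, whose
# syntax differs between GNU and BSD sed.
sed "s/uri='localhost:50051'/uri='riva-api:50051'/" \
  "$nbdir/asr-basics.ipynb" > "$nbdir/asr-basics.ipynb.new"
mv "$nbdir/asr-basics.ipynb.new" "$nbdir/asr-basics.ipynb"

cat "$nbdir/asr-basics.ipynb"
```

If you run this against the real notebook, reload it in the Jupyter file browser afterwards so the editor picks up the change.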

Run each cell in the example and verify that the audio sample is transcribed as "ASR Transcript: What is natural language processing?". Try 'asr-speaker-diarization.ipynb' as well for an example of an interview in which the model identifies the different speakers in the conversation.

You now have the Riva API and Triton inferencing up and running on your CMK cluster. Use the example notebooks to experiment with Riva and develop your own applications. Happy speech inferencing!
