Real-Time Audio Translation with OpenAI APIs on DigitalOcean GPU Droplets Using Open WebUI


Introduction

With the growing demand for multilingual communication, real-time audio translation is quickly gaining attention. In this tutorial, you will learn to deploy a real-time audio translation application using OpenAI APIs with Open WebUI, all hosted on a powerful GPU Droplet from DigitalOcean.

DigitalOcean’s GPU Droplets, powered by NVIDIA H100 GPUs, offer significant performance for AI workloads, making them ideal for fast and efficient real-time audio translation. Let’s get started.

Prerequisites

  • A DigitalOcean Cloud account.
  • A GPU Droplet deployed and running.
  • An OpenAI API key set up for accessing the OpenAI models.
  • Familiarity with SSH and basic Docker commands.
  • An SSH key for logging into your GPU Droplet.

Step 1 - Setting Up the DigitalOcean GPU Droplet

1. Create a New Project - You will need to create a new project from the cloud control panel and tie it to a GPU Droplet.

2. Create a GPU Droplet - Log into your DigitalOcean account, create a new GPU Droplet, and choose AI/ML Ready as the OS. This OS image installs all the necessary NVIDIA GPU drivers. You can refer to our official documentation on how to create a GPU Droplet.

Create a GPU Droplet that is AI/ML Ready

3. Add an SSH Key for authentication - An SSH key is required to authenticate with the GPU Droplet, and by adding the SSH key, you can log in to the GPU Droplet from your terminal.

Add an SSH key for authentication

4. Finalize and Create the GPU Droplet - Once all of the above steps are completed, finalize and create the new GPU Droplet.

Create a GPU Droplet
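
Alternatively, you can create the GPU Droplet from the command line with doctl. The sketch below is illustrative only: the region, size, and image slugs are assumptions, so list the currently available values with doctl compute size list and doctl compute image list --public before running it.

# One-time authentication with a DigitalOcean API token
doctl auth init

# Create the GPU Droplet (region, size, and image slugs here are assumptions -- verify them first)
doctl compute droplet create audio-translator \
  --region tor1 \
  --size gpu-h100x1-80gb \
  --image gpu-h100x1-base \
  --ssh-keys <your_ssh_key_id>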

Step 2 - Installing and Configuring Open WebUI

Open WebUI is a web interface that allows users to interact with large language models (LLMs). It’s designed to be user-friendly, extensible, and self-hosted, and can run offline. Open WebUI is similar to ChatGPT in its interface, and it can be used with a variety of LLM runners, including Ollama and OpenAI-compatible APIs.

There are three ways you can deploy Open WebUI:

  • Docker: Officially supported and recommended for most users.
  • Python: Suitable for low-resource environments or those wanting a manual setup.
  • Kubernetes: Ideal for enterprise deployments that require scaling and orchestration.

In this tutorial, you will deploy Open WebUI as a Docker container on the GPU Droplet with NVIDIA GPU support. You can check out and learn how to deploy Open WebUI using the other techniques in this Open WebUI quick start guide.
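
For comparison, the Python route from the quick start guide boils down to two commands. This is a minimal sketch, assuming a recent Python with pip is installed; the rest of this tutorial sticks with Docker.

pip install open-webui
open-webui serve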

Docker Setup

Once the GPU Droplet is ready and deployed, SSH to it from your terminal.

ssh root@<your-droplet-ip>

This Ubuntu AI/ML Ready H100x1 GPU Droplet comes pre-installed with Docker.

You can verify the Docker version using the below command:

docker --version

Output

Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1

Next, run the below command to verify and ensure Docker has access to your GPU:

docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi

This command pulls the nvidia/cuda:12.2.0-runtime-ubuntu22.04 image (if it has not already been downloaded, or updates an existing image) and starts a container.

Inside the container, it runs nvidia-smi to confirm that the container has GPU access and can interact with the underlying GPU hardware. Once nvidia-smi has executed, the --rm flag ensures the container is automatically removed, as it’s no longer needed.

You should observe the following output:

Output

Unable to find image 'nvidia/cuda:12.2.0-runtime-ubuntu22.04' locally
12.2.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
aece8493d397: Pull complete
9fe5ccccae45: Pull complete
8054e9d6e8d6: Pull complete
bdddd5cb92f6: Pull complete
5324914b4472: Pull complete
9a9dd462fc4c: Pull complete
95eef45e00fa: Pull complete
e2554c2d377e: Pull complete
4640d022dbb8: Pull complete
Digest: sha256:739e0bde7bafdb2ed9057865f53085539f51cbf8bd6bf719f2e114bab321e70e
Status: Downloaded newer image for nvidia/cuda:12.2.0-runtime-ubuntu22.04

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Thu Nov  7 19:32:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:00:09.0 Off |                    0 |
| N/A   28C    P0              70W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Deploy Open WebUI using Docker with GPU Support

Use the below docker command to run the Open WebUI Docker container:

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main

The above command runs a Docker container using the open-webui image and sets up specific configurations for network ports, volumes, and GPU access.

  1. docker run -d:

    • docker run starts a new Docker container.
    • -d runs the container in detached mode, meaning it runs in the background.
  2. -p 3000:8080:

    • This maps port 8080 inside the container to port 3000 on the host machine.
    • It allows you to access the application in the container by navigating to http://localhost:3000 on the host.
  3. -v open-webui:/app/backend/data:

    • This mounts a Docker volume named open-webui to the /app/backend/data directory inside the container.
    • Volumes are used to persist data generated or used by the container, ensuring it remains available even if the container is stopped or deleted.
  4. --name open-webui:

    • Assigns the container a specific name, open-webui, which makes it easier to reference (e.g., docker stop open-webui to stop the container).
  5. ghcr.io/open-webui/open-webui:main:

    • Specifies the Docker image to use for the container.
    • ghcr.io/open-webui/open-webui is the name of the image, hosted on GitHub’s container registry (ghcr.io).
    • main is the image tag, often representing the latest stable version or main branch.
  6. --gpus all:

    • This option enables GPU support for the container, allowing it to use all available GPUs on the host machine.
    • It’s essential for applications that leverage GPU acceleration, such as machine learning models.

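For longer-lived deployments, the same configuration can also be described declaratively. The Compose file below is a minimal sketch, not part of the original setup: it assumes Docker Compose v2 is available (check with docker compose version) and relies on the NVIDIA Container Toolkit that the earlier nvidia-smi test already confirmed. Save it as docker-compose.yml and start it with docker compose up -d.

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                   # host port 3000 -> container port 8080
    volumes:
      - open-webui:/app/backend/data  # persist chats and settings
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all              # equivalent of --gpus all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  open-webui:
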
Verify that the Open WebUI Docker container is up and running:

docker ps

Output

CONTAINER ID   IMAGE                                COMMAND           CREATED         STATUS                            PORTS                                       NAMES
4fbe72466797   ghcr.io/open-webui/open-webui:main   "bash start.sh"   5 seconds ago   Up 4 seconds (health: starting)   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui

Once the Open WebUI container is up and running, access it at http://<your_gpu_droplet_ip>:3000 in your browser.
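
Optionally, before switching to the browser, you can confirm from the Droplet itself that the web server responds (-I sends a HEAD request and prints only the response headers; an HTTP 200 indicates the UI is being served):

curl -I http://localhost:3000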

Open WebUI dashboard

Step 3 - Add OpenAI API Key to use GPT-4o with Open WebUI

In this step, you will add your OpenAI API key to Open WebUI.

Once logged in to the Open WebUI dashboard, you should notice that no models are running, as seen in the below image:

Open WebUI Dashboard

To connect Open WebUI with OpenAI and use all the available OpenAI models, follow the below steps:

  1. Open Settings:

    • In Open WebUI, click your user icon at the bottom left, then click Settings.
  2. Go to Admin:

    • Navigate to the Admin tab, then select Connections.
  3. Add the OpenAI API Key:

    • Add your OpenAI API key in the right textbox under the OpenAI API tab.
  4. Verify Connection:

    • Click Verify Connection. A green light confirms a successful connection.

Adding OpenAI API Key
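
If the verification fails, you can test the key outside Open WebUI with a direct call to the OpenAI models endpoint (assuming the key is exported as OPENAI_API_KEY in your shell); a JSON list of models confirms the key is valid:

export OPENAI_API_KEY=<your_openai_api_key>
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"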

Now, Open WebUI will auto-detect all available OpenAI models. Select GPT-4o from the list.

GPT-4o models

Next, set the text-to-speech and speech-to-text models and audio settings to use the OpenAI Whisper model:

Set up audio settings

Again, navigate to Settings -> Audio to configure and save the audio STT and TTS settings, as seen in the above screenshot.

You can read more about OpenAI text-to-speech and speech-to-text here.
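
For context, the Whisper STT and TTS settings map to OpenAI's audio endpoints. The calls below are a rough sketch of what Open WebUI issues on your behalf; sample_audio.mp3 and the model/voice choices are placeholder assumptions:

# Speech-to-text: transcribe an audio file with Whisper
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F model=whisper-1 \
  -F file=@sample_audio.mp3

# Text-to-speech: synthesize spoken audio from text
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "Hello from the GPU Droplet!"}' \
  --output speech.mp3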

Step 4 - Set up Audio Tunneling

If you’re streaming audio from your local machine to the Droplet, route the audio input through an SSH tunnel.

Since the GPU Droplet has the Open WebUI container running on http://localhost:3000, you can access it on your local machine by navigating to http://localhost:3000 after setting up this SSH tunnel.

This is required to let Open WebUI access the microphone on your local machine for real-time audio translation and real-time language processing; browsers only expose the microphone on secure origins such as https:// or localhost, which the tunnel provides. Without it, Open WebUI will throw the below error when you click the headphone or microphone icon to use GPT-4o for natural language processing tasks.

Error when recording audio

Use the below command to set up a local SSH tunnel from your local machine to the GPU Droplet, by opening a new terminal on your local machine:

ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=5 root@<gpu_droplet_ip> -L 3000:localhost:3000

This command establishes an SSH connection to your GPU Droplet as the root user and sets up a local port forwarding tunnel. It also includes options to keep the SSH session alive. Here’s a detailed breakdown:

  1. -o ServerAliveInterval=60:

    • This option sets the ServerAliveInterval to 60 seconds, meaning that every 60 seconds an SSH keep-alive message is sent to the remote server.
    • This helps prevent the SSH connection from timing out due to inactivity.
  2. -o ServerAliveCountMax=5:

    • This option sets the ServerAliveCountMax to 5, which allows up to 5 missed keep-alive messages before the SSH connection is terminated.
    • Together with ServerAliveInterval=60, this setting means the SSH session will stay open through 5 minutes (5 × 60 seconds) of no response from the server before closing.
  3. -L 3000:localhost:3000:

    • This part sets up local port forwarding.
    • 3000 (before the colon) is the local port on your machine, where you will access the forwarded connection.
    • localhost:3000 (after the colon) refers to the destination on the GPU Droplet.
    • In this case, it forwards traffic from port 3000 on your local machine to port 3000 on the GPU Droplet.

Now, this command will allow you to access Open WebUI by visiting http://localhost:3000 on your local machine and also use the microphone for real-time audio translation.
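
If you open this tunnel often, the same options can live in ~/.ssh/config on your local machine; the gpu-droplet alias below is an arbitrary name chosen for this example:

Host gpu-droplet
    HostName <gpu_droplet_ip>
    User root
    ServerAliveInterval 60
    ServerAliveCountMax 5
    LocalForward 3000 localhost:3000

With this in place, running ssh gpu-droplet establishes the same keep-alive tunnel.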

Step 5 - Implementing Real-time Translation with GPT-4o

Click the headphone or microphone icon to use the Whisper and GPT-4o models for natural language processing tasks.

Use the microphone to chat

Clicking on the Headphone/Call button will open a voice assistant that uses the OpenAI GPT-4o and Whisper models for real-time audio processing and translation.

You can use it to translate and transcribe audio in real time by talking with the GPT-4o voice assistant.

Voice chat and transcription in real time

Conclusion

Deploying real-time audio translation using OpenAI APIs on Open WebUI with DigitalOcean’s GPU Droplets allows developers to create high-performance translation systems. With easy setup and monitoring, DigitalOcean’s platform provides the resources for scalable, efficient AI applications.
