Month 4 Milestone 3: GPU Optimization and Scaling Challenges

October 19, 2025

GPU Cluster Optimization

This week I worked on the GPU Kubernetes cluster. Getting the right balance of GPU resources between Whisper and the LLM was challenging. The machine kept powering off suddenly while I was fine-tuning GPU usage, and I only got everything running reliably after I swapped the power supply.

After the power supply swap, the cluster sustained 30 concurrent calls. I tested this with a script that simulates a user and holds each call for up to 50 seconds. The GPUs handled the load well, with headroom left on the cluster for more simultaneous users.
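
For readers who want to try a similar test, here is a minimal sketch of the concurrency pattern, not the actual script used above. It assumes each call can be modeled as an async task; `simulate_call` and `run_load_test` are hypothetical names, the sleep is a placeholder for placing a real call and streaming audio, and the random hold time is just one reading of "up to 50 seconds".

```python
import asyncio
import random
import time


async def simulate_call(call_id: int, max_hold_s: float) -> float:
    """Hypothetical stand-in for one simulated user call.

    A real script would place the call and stream audio here. This sketch
    only sleeps for the hold time so the concurrency pattern is visible.
    """
    # Assumption: each call is held for a random time up to max_hold_s.
    hold_s = random.uniform(0.5 * max_hold_s, max_hold_s)
    started = time.monotonic()
    await asyncio.sleep(hold_s)  # placeholder for the actual call
    return time.monotonic() - started


async def run_load_test(concurrent_calls: int = 30,
                        max_hold_s: float = 50.0) -> list[float]:
    """Start all calls at once and wait for every one to finish."""
    tasks = [simulate_call(i, max_hold_s) for i in range(concurrent_calls)]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    durations = asyncio.run(run_load_test())
    print(f"{len(durations)} calls completed, "
          f"longest held {max(durations):.1f}s")
```

Because all tasks start together, the whole run takes roughly as long as the longest single call, which makes it a quick way to watch GPU utilization under peak concurrency.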

CPU Cluster Scaling

The CPU cluster runs more services, so it had its own challenges. I will need to add more nodes to scale it further.

Overall Progress

Overall, I am happy with the progress despite the fine-tuning challenges. The remaining issue is getting audio forks to work well with WebSockets. Although my Python load-testing scripts show everything working, I have yet to run an end-to-end (E2E) scenario. That will be my focus this week.