Topics
Inference at Scale: Transcribing Millions of Insurance Calls with Whisper and Azure ML
By: Jimmy Scray
Experience Level: Intermediate
Length: 25 Minutes
Description: Transcribing a few audio files with Whisper is easy. Transcribing millions of recordings efficiently, reliably, and cost-effectively is a very different problem.
In this talk, I'll dive into the Python code and infrastructure behind a large-scale speech transcription platform built for the insurance industry. Starting from a notebook prototype, we'll explore how the system evolved into a distributed inference pipeline running across thousands of GPU workers.
Rather than machine learning theory, we'll focus on inference engineering: benchmarking CPU and GPU workloads, maximizing throughput, orchestrating jobs with Azure Machine Learning, handling spot-instance interruptions, and writing resilient Python code that can recover from failures and resume processing automatically.
Along the way, I'll share benchmark results, architecture decisions, code examples, and the lessons learned while processing millions of real-world recordings.
If you're interested in Python, distributed systems, performance optimization, or production machine learning infrastructure, this talk will show what happens after the model is trained.