Transcribe audio and video with Whisper Large V3 on VoltageGPU. OpenAI-compatible API. Up to 10x cheaper than alternatives.
VoltageGPU runs OpenAI Whisper Large V3 and other speech-to-text models on GPU-accelerated infrastructure for fast, accurate transcription. Process hours of audio in minutes, support 99+ languages, and get word-level timestamps. Our OpenAI-compatible API makes migration effortless, and our pricing is up to 10x lower than hosted alternatives.
The most accurate open-source speech recognition model. 99%+ accuracy on clean audio in English.
Transcribe audio in over 99 languages with automatic language detection. No model switching needed.
Use the same OpenAI SDK and API format. Migrate from the OpenAI Whisper API by changing a single base URL.
Get precise word-level timestamps for subtitle generation, content navigation, and searchable audio.
Whisper API on VoltageGPU costs ~$0.003/min vs $0.006/min on OpenAI. Even cheaper for bulk processing.
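For a sense of scale, here is a back-of-the-envelope comparison at those per-minute rates (the 100-hour workload is an illustrative example, not a quota or benchmark):

```python
# Per-minute rates quoted above (USD).
VOLTAGE_RATE = 0.003
OPENAI_RATE = 0.006

hours = 100  # example workload: 100 hours of audio
minutes = hours * 60

voltage_cost = minutes * VOLTAGE_RATE
openai_cost = minutes * OPENAI_RATE

print(f"VoltageGPU: ${voltage_cost:.2f}")                # VoltageGPU: $18.00
print(f"OpenAI:     ${openai_cost:.2f}")                 # OpenAI:     $36.00
print(f"Savings:    ${openai_cost - voltage_cost:.2f}")  # Savings:    $18.00
```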
Transcribe hundreds of hours of audio in parallel. Ideal for podcast archives, call centers, and media companies.
from openai import OpenAI

# Initialize VoltageGPU client
client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY",
)

# Transcribe an audio file
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )

print(f"Transcription: {transcript.text}")
print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")

# Access word-level timestamps
for word in transcript.words:
    print(f"  [{word.start:.2f}s - {word.end:.2f}s] {word.word}")

# Translation (any language to English)
with open("french_podcast.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(f"English translation: {translation.text}")

$5 free credit. No credit card required.
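Word-level timestamps map directly onto subtitle formats. A minimal sketch that groups words into SRT cues — the `words_to_srt` helper and the seven-word group size are illustrative, and `words` is the `transcript.words` list from the example above:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, group_size=7):
    """Group word timestamps into numbered SRT cues of up to group_size words."""
    cues = []
    for i in range(0, len(words), group_size):
        chunk = words[i:i + group_size]
        start = srt_timestamp(chunk[0].start)
        end = srt_timestamp(chunk[-1].end)
        text = " ".join(w.word for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Usage with the transcript from above:
# print(words_to_srt(transcript.words))
```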