faster-whisper performs worse on a 5070 TI than a 4070 TI Super #1287

Open
vicimikec opened this issue Apr 14, 2025 · 6 comments

Comments

@vicimikec

So I know that the 5000 series GPUs from NVIDIA were a bit of a letdown for gamers, but I have heard the Blackwell architecture is supposed to be an AI powerhouse. I just added a new system to my dev cluster with a 5070 TI in it, and it is performing 10% worse than my other systems with 4070 TI Supers. To make sure it was the card and not my setup, I switched the cards in two of the systems and saw equally bad performance on the one with the 5070 TI. These systems are running the same code and models. Everything I have read says it should be either on par with the 4070 TI Super or as much as 20% faster. I am not sure if this is a bug, a defective card, or if NVIDIA actually removed something from the cards that faster-whisper or one of its dependencies needs. Any help in figuring this out would be appreciated.

@Purfview
Contributor

Purfview commented Apr 14, 2025

Don't know about it, but I've seen posts that the 50xx series is a waste of money.

Could you test if compute_type=int8_float16 works with your 5070?
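For reference, compute_type is passed when constructing the model. A minimal sketch of how this test could be run (the fallback order, model size, and helper name are my own, not part of faster-whisper):

```python
# Try the quantized compute type first, then fall back to plain
# float16 if this GPU / CTranslate2 build rejects it.
COMPUTE_TYPES = ["int8_float16", "float16"]

def load_model(model_size="large-v3", device="cuda"):
    # Imported inside the function so the fallback logic above can be
    # inspected without faster-whisper or a GPU present.
    from faster_whisper import WhisperModel

    last_error = None
    for compute_type in COMPUTE_TYPES:
        try:
            return WhisperModel(model_size, device=device,
                                compute_type=compute_type)
        except (RuntimeError, ValueError) as exc:
            last_error = exc
    raise last_error
```

If int8_float16 is rejected on the 5070 TI, this falls through to float16 instead of failing outright.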

@Purfview
Contributor

Any help in figuring this out would be appreciated.

I just remembered one report that 2080 is ~30% faster than 5070Ti in sequential mode, but 5070Ti is faster in batched mode.

BTW, check whether options like Hardware Accelerated GPU Scheduling are set the same for both cards.

@vicimikec
Author

I put the 5070 TI in a Windows system to check if it was one of those bad cards with missing ROPs. It is not. When I get it back into the server I'll test the "int_float16". That value throws errors on my 4070 TI Supers. I tried batch mode on the 5070 TI, but it threw a CUBLAS_STATUS_NOT_SUPPORTED error. From what I can tell it's a bug in the cuBLAS library that should hopefully be fixed soon.

@Purfview
Contributor

Purfview commented Apr 15, 2025

I'll test the "int_float16". That value throws errors on my 4070 TI Supers.

Oh I made a typo. It should be int8_float16.

I tried batch mode on the 5070 TI but it threw a CUBLAS_STATUS_NOT_SUPPORTED error.

float16 should work.

Relevant link: OpenNMT/CTranslate2#1865

From what I can tell it's a bug in the cuBLAS library that should hopefully be fixed soon.

Is it reported to nvidia, do you have any link?

@vicimikec
Author

Oh that issue is what I was referring to. Just realized you are the reporter.

@vicimikec
Author

Just put the 5070 TI back in a dev system and tried running in batch mode with "int8_float16". Got the same error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/local/bin/whisperserv-batched", line 332, in run
    for segment in segments:
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 553, in _batched_segments_generator
    results = self.forward(
              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 121, in forward
    encoder_output, outputs = self.generate_segment_batched(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 223, in generate_segment_batched
    results = self.model.model.generate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

With "float16" in batch mode it ran through and was very slightly faster (1 second faster on a 2 hour long recording) than the 4070 TI Super with the same settings. I guess I am happy with that.
