faster-whisper performs worse on a 5070 TI than a 4070 TI Super #1287

Open
vicimikec opened this issue Apr 14, 2025 · 6 comments

Comments

@vicimikec

So I know that the 5000 series GPUs from NVIDIA were a bit of a letdown for gamers, but I have heard the Blackwell architecture is supposed to be an AI powerhouse. I just added a new system to my dev cluster with a 5070 TI in it, and it is performing 10% worse than my other systems with 4070 TI Supers. To make sure it was the card and not my setup, I switched the cards in two of the systems and saw equally bad performance on the one with the 5070 TI. These systems are running the same code and models. Everything I have read says it should be either on par with the 4070 TI Super or as much as 20% faster. I am not sure if this is a bug, a defective card, or if NVIDIA actually removed something from the cards that faster-whisper or one of its dependencies needs. Any help in figuring this out would be appreciated.

@Purfview
Contributor

Purfview commented Apr 14, 2025

Don't know about it, but I've seen posts that the 50xx series is a waste of money.

Could you test if compute_type=int8_float16 works with your 5070?
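For reference, compute_type is passed when constructing the model. A minimal sketch of how this test could be run (the fallback order, model size, and helper name are my own, not part of faster-whisper):

```python
# Try the quantized compute type first, then fall back to plain
# float16 if this GPU / CTranslate2 build rejects it.
COMPUTE_TYPES = ["int8_float16", "float16"]

def load_model(model_size="large-v3", device="cuda"):
    # Imported inside the function so the fallback logic above can be
    # inspected without faster-whisper or a GPU present.
    from faster_whisper import WhisperModel

    last_error = None
    for compute_type in COMPUTE_TYPES:
        try:
            return WhisperModel(model_size, device=device,
                                compute_type=compute_type)
        except (RuntimeError, ValueError) as exc:
            last_error = exc
    raise last_error
```

If int8_float16 is rejected on the 5070 TI, this falls through to float16 instead of failing outright.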

@Purfview
Contributor

Any help in figuring this out would be appreciated.

I just remembered one report that 2080 is ~30% faster than 5070Ti in sequential mode, but 5070Ti is faster in batched mode.

BTW, check whether options like Hardware Accelerated GPU Scheduling are set the same for both cards.

@vicimikec
Author

I put the 5070 TI in a Windows system to check if it was one of those bad cards with missing ROPs. It is not. When I get it back into the server I'll test the "int_float16". That value throws errors on my 4070 TI Supers. I tried batch mode on the 5070 TI, but it threw a CUBLAS_STATUS_NOT_SUPPORTED error. From what I can tell it's a bug in the cuBLAS library that should hopefully be fixed soon.

@Purfview
Contributor

Purfview commented Apr 15, 2025

I'll test the "int_float16". That value throws errors on my 4070 TI Supers.

Oh I made a typo. It should be int8_float16.

I tried batch mode on the 5070 TI but it threw a CUBLAS_STATUS_NOT_SUPPORTED error.

float16 should work.

Relevant link: OpenNMT/CTranslate2#1865

From what I can tell it's a bug in the cuBLAS library that should hopefully be fixed soon.

Is it reported to nvidia, do you have any link?

@vicimikec
Author

Oh that issue is what I was referring to. Just realized you are the reporter.

@vicimikec
Author

Just put the 5070 TI back in a dev system and tried running in batch mode with "int8_float16". Got the same error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/local/bin/whisperserv-batched", line 332, in run
    for segment in segments:
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 553, in _batched_segments_generator
    results = self.forward(
              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 121, in forward
    encoder_output, outputs = self.generate_segment_batched(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 223, in generate_segment_batched
    results = self.model.model.generate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

With "float16" in batch mode it ran through and was very slightly faster (1 second faster on a 2 hour long recording) than the 4070 TI Super with the same settings. I guess I am happy with that.
