[Bug] Hive Catalog enable File Cache Orc row reader nextBatch failed #51092

Open
2 of 3 tasks
pumbaaaaa opened this issue May 20, 2025 · 0 comments

Search before asking

  • I have searched the issues and found no similar issues.

Version

2.1.7

What's Wrong?

mysql> set enable_file_cache=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  date_format(cast(left(logtime, 19) as datetime(3)), '%Y-%m-%d') as dateValue from hive_36.streaming.mpaas_data_20230717 where pt_bd = '2025-04-28' group by dateValue limit 100;
ERROR 1105 (HY000): errCode = 2, detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-28/pt_bd=2025-04-28/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-11348. Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress

mysql> set enable_file_cache=false;
Query OK, 0 rows affected (0.01 sec)

mysql> select  date_format(cast(left(logtime, 19) as datetime(3)), '%Y-%m-%d') as dateValue from hive_36.streaming.mpaas_data_20230717 where pt_bd = '2025-04-28' group by dateValue limit 100;
+------------+
| dateValue  |
+------------+
| 2025-04-26 |
| 2025-04-18 |
| 2023-06-20 |
| 2024-09-01 |
+------------+

When querying a Hive external table through the Hive catalog, the query fails if enable_file_cache is true and succeeds if it is false. Aside from setting clear_file_cache = true to clear the file cache, how can this issue be resolved?
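The on/off comparison above can be scripted so that it is easy to repeat across partitions. This is only a minimal sketch: `compare_file_cache` and the `connect` factory are hypothetical names, and `connect` is assumed to return a DB-API 2.0 connection to the FE over the MySQL protocol (the driver is left to the caller). The only Doris-specific statement it issues is the `set enable_file_cache=...` shown in the session above.

```python
from typing import Any, Callable, Dict


def compare_file_cache(connect: Callable[[], Any], query: str) -> Dict[str, Any]:
    """Run `query` once with enable_file_cache=true and once with false.

    `connect` is a hypothetical factory returning a DB-API 2.0 connection;
    a fresh connection is used per run so session state does not leak.
    """
    outcome: Dict[str, Any] = {}
    for setting in ("true", "false"):
        conn = connect()
        try:
            cur = conn.cursor()
            # Same session variable as in the mysql session above.
            cur.execute(f"set enable_file_cache={setting}")
            try:
                cur.execute(query)
                # GROUP BY output order is not guaranteed, so sort for comparison.
                outcome[setting] = ("ok", sorted(cur.fetchall(), key=repr))
            except Exception as exc:  # error types differ between drivers
                outcome[setting] = ("error", str(exc))
        finally:
            conn.close()
    return outcome
```

If the `"true"` entry is an error while the `"false"` entry is `"ok"`, the run matches the behaviour described in this report.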

The following are all of the error messages observed so far:

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-28/pt_bd=2025-04-28/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-11348. Orc row reader nextBatch failed. reason = Read past EOF in DecompressionStream::readBuffer

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-16/pt_bd=2025-04-16/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-7712. Orc row reader nextBatch failed. reason = Data error in ZlibDecompressionStream::NextDecompress

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-28/pt_bd=2025-04-28/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-11348. Orc row reader nextBatch failed. reason = Illegal run length for delta encoding: 1

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-16/pt_bd=2025-04-16/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-7773. failed to init reader, err: [INTERNAL_ERROR]Init OrcReader failed. reason = Invalid ORC postscript length

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-28/pt_bd=2025-04-28/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-11348. Orc row reader nextBatch failed. reason = Corrupt PATCHED_BASE encoded data (patchBitSize + pgw > 64)!

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-04-28/pt_bd=2025-04-28/compacted-part-c1607f8b-0b89-4aac-8555-9a0cb25dfd12-0-11348. Orc row reader nextBatch failed. reason = Corrupt PATCHED_BASE encoded data (pl==0)!

detailMessage = (xx)[CANCELLED]cur path: hdfs://xx/user/hive/warehouse/streaming.db/mpaas_data_20230717/pt_td=2025-03-14/pt_bd=2025-03-14/part-41910-415c2648-d2db-4443-baa6-f346415998eb.c000.snappy.orc. Orc row reader nextBatch failed. reason = SnappyDecompressionStream choked on corrupt input
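Several of the messages above refer to the same file but report different reasons, so grouping them by path may help when triaging. The sketch below does that with a regular expression inferred from the messages in this report; it is not an official Doris log format, and the names `ERROR_RE`, `parse_error` and `group_by_path` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Pattern inferred from the messages in this report (an assumption):
# "cur path: <path>. <text> reason = <reason>"
ERROR_RE = re.compile(r"cur path: (?P<path>\S+?)\. .*?reason = (?P<reason>.+)$")


def parse_error(message: str) -> Optional[Tuple[str, str]]:
    """Return (file path, failure reason), or None if the line does not match."""
    match = ERROR_RE.search(message.strip())
    if match is None:
        return None
    return match.group("path"), match.group("reason").strip()


def group_by_path(messages: Iterable[str]) -> Dict[str, List[str]]:
    """Collect every distinct failure reason reported for each file path."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for message in messages:
        parsed = parse_error(message)
        if parsed is None:
            continue
        path, reason = parsed
        if reason not in grouped[path]:
            grouped[path].append(reason)
    return dict(grouped)
```

Feeding the `detailMessage` lines into `group_by_path` yields one entry per file, with the list of reasons seen for it.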

What You Expected?

The query should succeed with enable_file_cache=true and return the same result as with the file cache disabled.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
