Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[high cardinality queries] track OOM on trace item table and time series endpoints #6814

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

xurui-c
Copy link
Member

@xurui-c xurui-c commented Jan 24, 2025

Copy link

sentry-io bot commented Jan 24, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: snuba/clickhouse/native.py

Function Unhandled Issue
execute ClickhouseError: Connection refused (127.0.0.1:9058) clickhouse_n...
Event Count: 5

Did you find this useful? React with a 👍 or 👎

@xurui-c xurui-c force-pushed the rachel/trackOOM branch 2 times, most recently from 25f5fe8 to eb45416 Compare January 31, 2025 08:28
@xurui-c xurui-c changed the title track OOM [high cardinality queries] track OOM on trace item table and time series endpoints Jan 31, 2025
Copy link

codecov bot commented Jan 31, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2795 1 2794 11
View the top 1 failed tests by shortest run time
tests.web.rpc.v1.test_endpoint_time_series.test_endpoint_time_series.TestTimeSeriesApi::test_OOM
Stack Traces | 0.27s run time
Traceback (most recent call last):
  File ".../v1/test_endpoint_time_series/test_endpoint_time_series.py", line 907, in test_OOM
    sentry_sdk_mock.assert_called_once()
  File ".../local/lib/python3.11/unittest/mock.py", line 918, in assert_called_once
    raise AssertionError(msg)
AssertionError: Expected 'capture_exception' to have been called once. Called 0 times.

To view more test analytics, go to the Test Analytics Dashboard
📢 Thoughts on this report? Let us know!

@xurui-c xurui-c force-pushed the rachel/trackOOM branch 4 times, most recently from 7266cc6 to 9f7e8a8 Compare January 31, 2025 21:59
@xurui-c xurui-c marked this pull request as ready for review January 31, 2025 22:27
@xurui-c xurui-c requested review from a team as code owners January 31, 2025 22:27
except QueryException as e:
if e.extra[
"code"
] == 241 or "DB::Exception: Memory limit (for query) exceeded" in str(e):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to match on the code and the string? Or do we need to look for the full message as opposed to just Memory limit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do either. I'm kind of making the assumption here that ClickHouse's OOM error will always be 241. As an extra measure I'm also doing string matching with "DB::EXception: Memory..."

I'm not string matching for the full message since the full message can be extremely long and differs for every error, but it seems like "DB::Exception: Memory limit (for query) exceeded" is the same in every OOM error

snuba/web/rpc/v1/endpoint_trace_item_table.py Outdated Show resolved Hide resolved
@xurui-c xurui-c force-pushed the rachel/trackOOM branch 2 times, most recently from 291b242 to d4b777c Compare February 3, 2025 17:48
snuba/web/db_query.py Outdated Show resolved Hide resolved
@xurui-c xurui-c requested a review from phacops February 3, 2025 19:57
snuba/web/rpc/__init__.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants