Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(hardware): Add some logging around what I think is causing some can errors #17117

Merged
merged 3 commits into from
Jan 10, 2025

Conversation

ryanthecoder
Copy link
Contributor

Overview

We occasionally get some ABR issues where the can bus says it's not receiving ACKs or that a move group says it didn't get all expected nodes, but reports [] as the missing nodes.

I think the second bug is because we're checking with if not self._moves[group_id] instead of checking if len(self._moves[group_id]) == 0: Which should be equivalent but I think there may be some python cleanup bug that reports the list as true due to a remnant that hasn't been garbage collected yet.

Test Plan and Hands on Testing

Changelog

Review requests

Risk assessment

@ryanthecoder ryanthecoder requested a review from a team as a code owner December 16, 2024 18:36
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the logic in the new check true, and thus logging, only if those values are consistent?

@ryanthecoder ryanthecoder changed the base branch from edge to chore_release-8.3.0 January 10, 2025 16:29
@@ -284,6 +285,7 @@ async def _ensure_send(
expected_nodes = list(self._known_nodes)
else:
expected_nodes = [node_id]
log.warning(f"Setting expected nodes to {expected_nodes}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to flood the logs no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything is being done right this should never be in the logs, if we start seeing it, that mean's we are not setting up the can messages correctly and we need to hook up the "expected nodes" argument to this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we delete the one on line 282 then

Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Messages and logic LGTM, modulo @vegano1's question above.

I don't see any way that bool(self._moves) could disagree with len(self._moves), though.

@ryanthecoder
Copy link
Contributor Author

Messages and logic LGTM, modulo @vegano1's question above.

I don't see any way that bool(self._moves) could disagree with len(self._moves), though.

I agree but we have seen a few cases in ABR where this actually seems to be the case. I'm hoping this log either reveals an issue with python garbage collecting or shows us that we need to look elsewhere for the error.

Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good but can we condense some of the messages that are now duplicated

@@ -284,6 +285,7 @@ async def _ensure_send(
expected_nodes = list(self._known_nodes)
else:
expected_nodes = [node_id]
log.warning(f"Setting expected nodes to {expected_nodes}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we delete the one on line 282 then

@@ -154,6 +154,7 @@ async def send_and_verify_recieved(self) -> ErrorCode:
log.error(
f"Message did not receive ack for message index {self._message.payload.message_index}"
)
log.error(f"Missing node {self._expected_nodes}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we condense this and the previous message

@ryanthecoder ryanthecoder merged commit 02b6a42 into chore_release-8.3.0 Jan 10, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants