[AMCL] 9x initialisation (and reset) speed increase #4941

Draft · wants to merge 2 commits into base: main
Conversation

@doisyg (Contributor) commented Feb 21, 2025


Basic Info

Ticket(s) this addresses:
Primary OS tested on: Ubuntu
Robotic platform tested on: Dexory robot
Does this PR contain AI generated software? No

Description of contribution in a few bullet points

On very big maps, AMCL's map_update_cspace method takes a very long time, which makes AMCL slow to start and slow to reset (for instance, when changing a dynamic parameter that requires reinit_laser = true).
Three changes, in order of impact (sketched below):

  • Using std::queue instead of std::priority_queue. This is also why the PR is a draft: I cannot understand why std::priority_queue was needed in the first place (or which criterion it used for ordering), so I am not sure the change is safe.
  • Avoiding one CellData copy by using emplace instead of push.
  • Computing MAP_INDEX(map, i, j) once instead of twice.

On a big map, AMCL activation time drops from 22 s to 2.4 s on my machine.
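A minimal sketch of what the three changes look like together, using simplified, illustrative types (Map, MapCell, CellData, MAP_INDEX) rather than the actual nav2_amcl declarations:

```cpp
#include <queue>
#include <vector>

// Illustrative stand-ins for the nav2_amcl map types.
struct MapCell {double occ_dist;};

struct Map
{
  int size_x, size_y;
  double scale;
  std::vector<MapCell> cells;
};

// Row-major index into the cell array (illustrative).
#define MAP_INDEX(map, i, j) ((i) + (j) * (map)->size_x)

struct CellData
{
  CellData(Map * m, int ci, int cj, int si, int sj)
  : map(m), i(ci), j(cj), src_i(si), src_j(sj) {}
  Map * map;
  int i, j;          // cell being updated
  int src_i, src_j;  // nearest obstacle cell
};

void enqueue_sketch(
  Map * map, int i, int j, int src_i, int src_j,
  double distance, std::queue<CellData> & q)
{
  // Change 3: compute MAP_INDEX(map, i, j) once and reuse the result.
  const int map_index = MAP_INDEX(map, i, j);
  map->cells[map_index].occ_dist = distance * map->scale;

  // Change 1: plain FIFO std::queue instead of std::priority_queue.
  // Change 2: emplace constructs the CellData in place rather than copying
  // a temporary with push.
  q.emplace(map, i, j, src_i, src_j);
}
```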

Description of documentation updates required from your changes

Description of how this change was tested


Future work that may be required in bullet points

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins are added to the plugins page
  • If this is a BT node, additionally: add it to BT's XML index of nodes for Groot, the BT package's README table, and the BT library lists

Guillaume Doisy added 2 commits February 21, 2025 19:17
Signed-off-by: Guillaume Doisy <[email protected]>
Signed-off-by: Guillaume Doisy <[email protected]>

codecov bot commented Feb 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
nav2_amcl/src/map/map_cspace.cpp 85.24% <100.00%> (-5.94%) ⬇️

... and 2 files with indirect coverage changes

@SteveMacenski (Member) left a comment

I agree. Reading this, I'm also not entirely clear how the priority queue ever worked, since no comparator for the class/struct appears to be provided. I don't see any reason for it, except perhaps that someone thought processing nearby patches consecutively would improve caching efficiency.

I assume the queue change is what resulted in the major runtime benefits, but can you confirm that?

@mikeferguson do you see any reason to not merge this?

```diff
@@ -130,18 +129,14 @@ void enqueue(
     return;
   }

-  map->cells[MAP_INDEX(map, i, j)].occ_dist = distance * map->scale;
+  const int map_index = MAP_INDEX(map, i, j);
```
Inline review comment (Member):
This could be done before L120 to remove another MAP_INDEX lookup on L120.
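A hedged illustration of that suggestion, again with illustrative stand-in types; the early-exit "marked" check is an assumption about what the lookup at L120 is:

```cpp
#include <vector>

// Illustrative stand-ins; the actual enqueue() signature and the check at
// L120 (assumed here to be a "marked" test) may differ in nav2_amcl.
struct MapCell {double occ_dist;};
struct Map {int size_x; double scale; std::vector<MapCell> cells;};
#define MAP_INDEX(map, i, j) ((i) + (j) * (map)->size_x)

void enqueue_hoisted(
  Map * map, int i, int j, double distance,
  std::vector<unsigned char> & marked)
{
  // Compute the index once, before the early-exit check...
  const int map_index = MAP_INDEX(map, i, j);
  if (marked[map_index]) {
    return;  // ...so the check no longer needs its own MAP_INDEX lookup.
  }
  // ...and reuse it for the occ_dist update as well.
  map->cells[map_index].occ_dist = distance * map->scale;
  marked[map_index] = 1;
}
```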

@mikeferguson (Contributor) commented:
This code was added 15 years ago - by Gerkey (who generally knows what he is doing)

On lines 82-87, there is a less-than comparator defined for CellData: it compares the occ_dist of each of the cells. The order in which cells are expanded is important, since we're basically doing wavefront expansion from occupied cells, and on L120 we only ever set the occ_dist once. In theory you will get nearly the same map, but possibly not an identical one if you don't enforce expanding the closest cells first.
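A rough sketch of what such a comparator looks like, with illustrative types rather than the verbatim nav2_amcl code; note that each comparison costs two MAP_INDEX lookups:

```cpp
#include <vector>

// Illustrative stand-ins for the real nav2_amcl types.
struct MapCell {double occ_dist;};
struct Map {int size_x; std::vector<MapCell> cells;};
#define MAP_INDEX(map, i, j) ((i) + (j) * (map)->size_x)
struct CellData {Map * map; int i, j, src_i, src_j;};

// std::priority_queue pops the largest element, so "less than" is inverted:
// the cell with the smallest occ_dist sits on top and is expanded first.
inline bool operator<(const CellData & a, const CellData & b)
{
  return a.map->cells[MAP_INDEX(a.map, a.i, a.j)].occ_dist >
         b.map->cells[MAP_INDEX(b.map, b.i, b.j)].occ_dist;
}
```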

I see at least three possible options here:

  1. Verify that the standard queue ends up the same (we know for sure that we enqueue all the occ_dist=0 cells first, outside of the queue processing loop. Then the occ_dist=0 cells would be expanded, enqueuing everything that is occ_dist=~1 from an obstacle. As those are expanded, we get the occ_dist=~2 cells. I think this tracks as being close to the same.... but I think it's not 100% the same since there is also the complication of the CachedDistanceMap - and so we're actually setting occ_dist using a bunch of floating point numbers).

  2. Use the standard queue, but if we can't prove things never get unordered, you could change L120 so that we don't early exit, and instead verify that our new candidate source cell isn't closer. This is probably still faster than the priority queue, but I can't say for sure without testing.

  3. Stick with the priority queue, but we could also probably speed it up a bit (at the cost of slightly higher memory usage) by storing the occ_dist in the CellData structure and avoiding the 2x MAP_INDEX lookup in every call to the less-than operator (see the sketch after this list).
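A minimal sketch of option 3, with illustrative names rather than any actual PR code: cache occ_dist inside CellData at enqueue time so the less-than operator never touches the map:

```cpp
#include <queue>

// Illustrative only: occ_dist is copied into the cell data when it is
// enqueued, trading a little memory per queued cell for cheaper comparisons.
struct CellData
{
  double occ_dist;           // cached at enqueue time
  int i, j, src_i, src_j;
};

inline bool operator<(const CellData & a, const CellData & b)
{
  // Still inverted so the priority_queue pops the smallest occ_dist first,
  // but with no MAP_INDEX lookups per comparison.
  return a.occ_dist > b.occ_dist;
}

// Usage sketch:
//   std::priority_queue<CellData> Q;
//   Q.push({distance * map->scale, i, j, src_i, src_j});
```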

@SteveMacenski (Member) commented Mar 10, 2025

Ah, the comparator is neither in the priority queue constructor nor right after the class, so I missed it.

These are good points, thanks Fergs! I think (3) is the most sensible since it is both the least impactful and the easiest to validate, though (2) should be similarly simple to test.

@doisyg what do you think?
