Replace `(usize, Option<usize>)` with custom type `SizeHint` and eliminate `try_size_hint()`. #218

kpreid · 2025-04-10T15:17:41Z

The SizeHint type provides these advantages:

Easier to construct by composition, using + and | or methods.
More self-explanatory.
Distinguishes between “unknown size” and “known to be unbounded” to aid debugging.

Since this is a breaking change to the trait, this PR also makes the closely related change of combining try_size_hint() and size_hint() into a single fallible function. This will make it easier to correctly implement the trait.

Also, while I was touching the macro code, I made all the crate paths absolute (::arbitrary), as they should be for the maximum compatibility.

Fixes #201.
Part of #217.

fitzgen

Thanks for whipping up this PR!

I think we still have some design questions to answer before we commit to a particular approach. See inline comments below.

fitzgen · 2025-04-15T17:03:25Z

src/size_hint.rs

+    /// When the depth is not too deep, calls `f` with `depth + 1` to calculate the
+    /// size hint.
+    ///
+    /// Otherwise, returns an error.
+    ///
+    /// This should be used when implementing [`Arbitrary::size_hint()`]
+    /// for generic types.
+    pub fn recursion_guard(
+        depth: usize,
+        f: impl FnOnce(usize) -> Result<Self, crate::MaxRecursionReached>,
+    ) -> Result<Self, crate::MaxRecursionReached> {
+        if depth > MAX_DEPTH {
+            Err(crate::MaxRecursionReached {})
+        } else {
+            f(depth + 1)
+        }


Let's switch from depth to fuel, from incrementing to decrementing, and from depth > MAX_DEPTH to fuel == 0.

I think we will may also want to preserve the fuel count across size_hint and recursion_guard invocations such that if we are given fuel = 100 that doesn't mean "process type trees of up to depth 100" but instead "call up to 100 size_hint/recursion_guard methods". The latter is a better overall limit on compute spent on size hints. The former isn't very useful because a depth of 10 already gets to binary trees of size 1024 but also isn't very deep at all for non-pathological/non-exponential cases.

This probably involves defining and threading a new context type through size_hint calls and removing public constructors for SizeHint and replacing them with methods on the context, something like

pub struct SizeHintContext { fuel: usize, } impl SizeHintContext { pub fn new(fuel: usize) -> Self { Self { fuel } } fn consume_fuel(&mut self) -> Result<()> { if *self.fuel == 0 { return Err(()); } *self.fuel -= 1; Ok(()) } pub fn exactly(&mut self, size: usize) -> Result<SizeHint> { self.consume_fuel()?; Ok(SizeHint { lower: size, upper: Upper::Bounded(size), }) } pub fn at_least(&mut self, size: usize) -> Result<SizeHint> { self.consume_fuel()?; Ok(SizeHint { lower: size, upper: Upper::Unbounded, }) } // Other `SizeHint` constructors and combinators... } // ... pub trait Arbitrary { fn size_hint(context: &mut SizeHintContext) -> Result<SizeHint> { SizeHint { lower: 0, upper: Upper::Unbounded, } } }

It might even make sense to actually put the SizeHint inside the SizeHintContext and just have methods that mutate the size hint being built, rather than returning new size hints that we then need to combine. This would further disincentivize constructing a new SizeHintBuilder in the middle of a size_hint method implementation, which would defeat the purpose of having fuel in the first place. Doing it this way would also allow extracting the half-built SizeHint and getting at least an idea of the lower bound, even when we run out of fuel. (Maybe the consume_fuel method should set upper to unbounded if it returns an error?)

This approach effectively turns the context into a builder:

pub struct SizeHintBuilder { fuel: usize, lower: usize, upper: Upper, } impl SizeHintBuilder { pub fn new(fuel: usize) -> Self { Self { fuel, lower: 0, upper: Upper::Bounded(0), } } fn consume_fuel(&mut self) -> Result<()> { if *self.fuel == 0 { return Err(crate::Error::SizeHintOutOfFuel); } *self.fuel -= 1; Ok(()) } pub fn finish(self) -> SizeHint { SizeHint { lower: self.lower, upper: self.upper, } } pub fn exactly(&mut self, size: usize) -> Result<()> { self.consume_fuel()?; self.lower = self.lower.checked_add(size).ok_or(())? self.upper += Upper::Bounded(size); Ok(()) } pub fn at_least(&mut self, size: usize) -> Result<()> { self.consume_fuel()?; self.lower = self.lower.checked_add(size).ok_or(())? self.upper += Upper::Unbounded; Ok(()) } pub fn unknown(&mut self) -> Result<()> { self.consume_fuel()?; self.upper += Upper::Unknown; Ok(()) } // Other `SizeHint` constructors and combinators... } pub trait Arbitrary { fn size_hint(builder: &mut SizeHintBuilder) -> Result<()> { builder.unknown() } }

What do other arbitrary maintainers think of the above approaches? I'm kind of partial to the builder-style API myself.

cc @Manishearth @nagisa

I guess the and_all and or_all combinators get a little trickier with the builder style API rather than the context-style API. We would either need to create sub-builders, which makes it easy to defeat the purpose of a single shared fuel value, or we would have to do some nasty generics magic which makes the API impenetrable. Or some other funky API to thread the fuel through.

This is making me lean back towards the context-style approach.

The builder seems fine, I'm worried about too much additional computation in what is essentially pre-fuzz infrastructure

I guess the and_all and or_all combinators get a little trickier with the builder style API

Even or() is difficult to do that way because you need to take the minimum of multiple independent sums.

That said, I think it would make sense to track fuel in, not a “builder” (makes one output value) but a “factory” (makes many); methods of the factory are constructors of SizeHint values (or error). I will see about trying that out.

@Manishearth

I'm worried about too much additional computation in what is essentially pre-fuzz infrastructure

Nothing here should be any more computationally complex than what we have today, it is all just hiding implementation details in non-public struct fields for the most part.

you couldn't construct a SizeHint without checking/consuming fuel,

Something I’d like to be able to offer someday — though there is no provision in the current trait/macro for doing so in practice — is the possibility of defining size hints for finite structures as constants. This would mean that infinite recursions would automatically be caught at compile time by the const evaluation limit, and run time doesn't need to compute an essentially constant value every time. That world can be mixed with the get() world, but I don't think it can be mixed with the "all construction consumes fuel" world (unless there is also some kind of ContextWithInfiniteFuel for static use).

That said, I do see your point and I'll switch it over if you still object. However, I don't think it is necessary to go back to using Result to do so.

(unless there is also some kind of ContextWithInfiniteFuel for static use).

Actually, no reason that would need to be separate. And a macro could help support this hypothetical const use case. All right. I still think that interposing on recursions only is cleaner, but ease of getting it right is the most important thing.

Something I’d like to be able to offer someday — though there is no provision in the current trait/macro for doing so in practice — is the possibility of defining size hints for finite structures as constants. This would mean that infinite recursions would automatically be caught at compile time by the const evaluation limit, and run time doesn't need to compute an essentially constant value every time. That world can be mixed with the get() world, but I don't think it can be mixed with the "all construction consumes fuel" world (unless there is also some kind of ContextWithInfiniteFuel for static use).

This would be nice, but I wouldn't bend over backwards for it. In practice, LLVM should be able to inline and const-fold away 99% of size_hint implementations, so the benefit is really just for people writing size_hints by hand (since we can presumably always do the right thing in the derive macro). All that is to say, there is also value in having a simple, easy-to-understand-at-a-glance API and sometimes that can outweigh fitting things into the static type system. Basically the same reason why we don't always use branded-lifetimes tricks (similar to the ghost-cell crate) every time we index into slices, and accept that bounds checks will happen at runtime (modulo llvm optimizing them away), even though we could use a convoluted API to prove safety at compile time and which would allow us to omit the runtime bounds checks.

However, I don't think it is necessary to go back to using Result to do so.

Ah, this is a good point: the helper function I wrote is basically context.get::<T>() in the new PR iteration. I guess the one thing that Results allow for is easily determining whether the result of a size_hint call is partial or not, but given that you've put an out_of_fuel: bool field inside SizeHint that can still be easily exposed to users (most of whom likely will not care, so making it less in-your-face API-wise as a method on SizeHint that userc can choose to query, rather than a variant they are forced to match on before doing what they really want to do, is probably a good trade off).

I still think that interposing on recursions only is cleaner, but ease of getting it right is the most important thing.

Yeah, to be clear avoiding fuel on leaf types would be nice, but I think preventing footguns is definitely the more important property here.

fitzgen · 2025-04-15T17:47:11Z

src/size_hint.rs

+    /// The size is known to be unbounded (or overflow [`usize`]).
+    Unbounded,
+    /// The size is unknown because a more specific size hint was not given.
+    /// This is functionally identical to `Unbounded` but prints differently.
+    Unknown,


I'm not convinced that there is a meaningful difference between these variants. Just having a different Debug print doesn't really seem worth it to me.

I included this feature precisely because, if it had existed, it would have helped me troubleshoot a missing size hint method.

…size>)>`. Introducing the `SizeHint` type provides these advantages: * Easier to construct by composition, using `+` and `|` or methods. * More self-explanatory. * Distinguishes between “unknown size” and “known to be unbounded” to aid debugging. The `Context` instead of `depth` provides these advantages: * Fully bounded computation with no dependence on the recursion breadth. * Elimination of `Result`, making `size_hint()` implementors simpler and allowing reporting of partially-known lower bounds rather than discarding all information on error. Since there are no longer any `Result`s, `try_size_hint()` is of course gone, making it easier to correctly implement the trait. But that could also have been done by adding the `Result` return to `size_hint()` if we were keeping it. Also, while I was touching the macro code, I made all the crate paths absolute (`::arbitrary`), as they should be for the maximum compatibility.

fitzgen

Thanks for your patience and sticking with this while we work out the design kinks!

I think we are now at a place where we agree what the design should look like (no Results; SizeHint construction consumes fuel, even for leaf types) so if you update this PR accordingly, I can take a closer look and provide the hopefully final review.

Thanks again!

fitzgen · 2025-05-01T17:39:51Z

src/size_hint.rs

+    /// Consumes 1 unit of fuel if available, and returns whether that fuel was available.
+    #[inline(never)]
+    fn consume_fuel(&self) -> bool {


Is there a reason you made this inline(never)? I would expect that we would want to make it inline instead, so that LLVM can fully inline and const-fold away size_hint implementations for small, non-recursive types.

fitzgen · 2025-05-01T17:47:15Z

src/size_hint.rs

+    /// Returns [`T::size_hint()`][Arbitrary::size_hint], and consumes 1 unit of fuel too.
+    #[inline]
+    pub fn get<'a, T: Arbitrary<'a>>(&self) -> SizeHint {
+        if self.consume_fuel() {
+            T::size_hint(self)
+        } else {
+            SizeHint::OUT_OF_FUEL
+        }
+    }


Two nitpicks:

I don't think we should make guarantees about how much fuel any particular method consumes. When describing what fuel is (say on the Context::new constructor), we can give rough estimates, but I think we want to keep it somewhat fuzzy so that we can adjust if necessary. The important part is that it provides a finite bound to size-hint computation, not exactly what that bound is.

As discussed in thread, we shouldn't actually need this method anymore, once SizeHint construction (even for leaf types) consumes fuel. Users can just call the trait methods directly instead.

fitzgen · 2025-05-01T17:50:03Z

src/size_hint.rs

+    /// Take the sum of the `self` and `rhs` size hints.
+    ///
+    /// Use this when an [`Arbitrary::arbitrary()`] implementation is going to consume both
+    /// `self` bytes and `rhs` bytes, such as for the fields of a `struct`.
+    ///
+    /// This operation can also be written as the `+` operator.
+    #[inline]
+    pub const fn and(self, rhs: Self) -> Self {


What do you think about moving all the and/and_all/or/or_all/repeat combinators to methods on Context? That way we retain the flexibility to make them consume fuel in the future, if that ever becomes necessary.

fitzgen · 2025-05-01T17:52:24Z

src/size_hint.rs

+    /// Also contains some meta-information about the size hint as a whole.
+    upper: Upper,
+
+    out_of_fuel: bool,


I think we will want a method on SizeHint, and probably Context too, for checking whether we are out of fuel. I didn't see these below; apologies if these exist and I just missed it in my skim.

kpreid force-pushed the sizehint branch from 6d4d9a0 to 92d146f Compare April 10, 2025 15:30

kpreid mentioned this pull request Apr 10, 2025

List of breaking changes for next breaking release #217

Open

8 tasks

fitzgen reviewed Apr 15, 2025

View reviewed changes

kpreid force-pushed the sizehint branch from 92d146f to a3855f2 Compare April 26, 2025 20:52

fitzgen reviewed May 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace `(usize, Option<usize>)` with custom type `SizeHint` and eliminate `try_size_hint()`. #218

Replace `(usize, Option<usize>)` with custom type `SizeHint` and eliminate `try_size_hint()`. #218

kpreid commented Apr 10, 2025 •

edited

Loading

fitzgen left a comment

fitzgen Apr 15, 2025

fitzgen Apr 15, 2025

Manishearth Apr 15, 2025

kpreid Apr 15, 2025 •

edited

Loading

fitzgen Apr 15, 2025

kpreid Apr 30, 2025

kpreid Apr 30, 2025

fitzgen May 1, 2025

fitzgen May 1, 2025

fitzgen May 1, 2025

fitzgen Apr 15, 2025

kpreid Apr 15, 2025 •

edited

Loading

fitzgen left a comment

fitzgen May 1, 2025

fitzgen May 1, 2025

fitzgen May 1, 2025

fitzgen May 1, 2025

Replace (usize, Option<usize>) with custom type SizeHint and eliminate try_size_hint(). #218

Are you sure you want to change the base?

Replace (usize, Option<usize>) with custom type SizeHint and eliminate try_size_hint(). #218

Conversation

kpreid commented Apr 10, 2025 • edited Loading

fitzgen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kpreid Apr 15, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kpreid Apr 15, 2025 • edited Loading

Choose a reason for hiding this comment

fitzgen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Replace `(usize, Option<usize>)` with custom type `SizeHint` and eliminate `try_size_hint()`. #218

Replace `(usize, Option<usize>)` with custom type `SizeHint` and eliminate `try_size_hint()`. #218

kpreid commented Apr 10, 2025 •

edited

Loading

kpreid Apr 15, 2025 •

edited

Loading

kpreid Apr 15, 2025 •

edited

Loading