Update timestamps.pyx #60624
Thanks for the PR! Can you give this PR a more descriptive title? Please follow the guide here:
https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#making-a-pull-request
Also, when changing behavior, always add tests.
    _Timestamp ts

    # Check for potential overflow before normalization
    if local_val < INT64_MIN or local_val > INT64_MAX:
local_val is an int64, so don't these conditions always evaluate to false?
Yes, that's right.
Since pd.Timestamp.min is set to the smallest possible value in two's complement, then when we try to subtract even a small positive number, it can't get even "more negative". So would it be fair to say that what was described in the initial issue isn't necessarily a bug of pandas but rather just a constraint of the two's complement arithmetic that pandas uses?
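To make the two points above concrete, here is a minimal pure-Python sketch of 64-bit two's complement storage. It is an illustration only, not pandas code: `wrap_int64` and `out_of_range` are hypothetical helpers, Python's unbounded integers stand in for the C arithmetic, and the subtraction used (flooring to midnight with a modulo) is an assumed stand-in for what normalization does.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
DAY_NS = 86_400 * 10**9  # nanoseconds per day


def wrap_int64(value: int) -> int:
    """Return the value a 64-bit two's complement variable would hold
    if `value` were stored into it (hypothetical helper)."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def out_of_range(stored: int) -> bool:
    """The range check from the diff, applied to an already-stored int64."""
    return stored < INT64_MIN or stored > INT64_MAX


# A timestamp very close to the lower bound of int64 (assumed example value).
local_val = INT64_MIN + 1

# Flooring to midnight subtracts the time-of-day part; the exact result
# does not fit in 64 bits, so the stored result wraps around.
exact = local_val - local_val % DAY_NS
stored = wrap_int64(exact)

print(exact < INT64_MIN)     # the mathematically exact result is too small
print(stored > 0)            # the stored result has wrapped to a positive value
print(out_of_range(stored))  # the after-the-fact range check cannot see it
```

Because every stored value lies inside the int64 range by construction, a check of that form can only be useful if it runs on the operands before the subtraction, not on the result.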
What is the purpose of adding a check that always evaluates to False regardless of the value of local_val?
I think if you declare local_val as int64_t here you can just add the Cython @cython.overflowcheck(True) decorator to this function. That should greatly simplify what you are trying to do here while being much more performant.
To clarify, @rhshadrach, I meant to say that you were correct in pointing that out, and I agree that this line
if local_val < INT64_MIN or local_val > INT64_MAX:
is not a good way to check for the overflow.
@WillAyd, are you proposing a change to the normalize function? Or to int64_t normalize_i8_stamp? Because in the return for int64_t normalize_i8_stamp, I think the subtraction of any positive value from local_val, when local_val is the Timestamp.min, is what is causing the wrap around.
Also, what should be the expected behavior if overflow occurs during normalization? For example, should the code raise an exception, return the original timestamp, return NaT, or do something else entirely?
> the subtraction of any positive value from local_val, when local_val is the Timestamp.min, is what is causing the wrap around.

Wherever this is happening you can use the decorator.

> Also, what should be the expected behavior if overflow occurs during normalization? For example, should the code raise an exception, return the original timestamp, return NaT, or do something else entirely?

It should raise an error. Signed overflow is undefined behavior - we can't do anything about it but raise in advance of that happening.
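As a rough illustration of "raise in advance", the sketch below checks the operands before subtracting, so the out-of-range value is never computed. It is pure Python and only models the idea; `normalize_checked` is a hypothetical name, the modulo-based flooring is an assumption about what normalization does, and in the actual Cython code the suggestion in this thread is to rely on the overflowcheck directive rather than a hand-written guard.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
DAY_NS = 86_400 * 10**9  # nanoseconds per day


def normalize_checked(local_val: int) -> int:
    """Floor an int64 nanosecond timestamp to midnight, raising
    OverflowError if the result would not fit in int64 (hypothetical)."""
    # Time-of-day part; Python's % always yields a value in [0, DAY_NS).
    day_part = local_val % DAY_NS

    # Guard written so that it never needs a value below INT64_MIN itself:
    # INT64_MIN + day_part is always representable because day_part >= 0.
    if local_val < INT64_MIN + day_part:
        raise OverflowError(
            "normalizing this timestamp would overflow int64"
        )
    return local_val - day_part


print(normalize_checked(DAY_NS + 5) == DAY_NS)  # an ordinary value floors to midnight

try:
    normalize_checked(INT64_MIN + 1)
except OverflowError as exc:
    print("raised:", exc)
```

The design point is that the comparison is arranged around values that are guaranteed to fit, which is what makes the guard safe to express in fixed-width arithmetic as well.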
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Proposing a change to fix this issue: #60583. My understanding is that the bug is due to the code logic described in timestamps.pyx.
The first option that I considered was modifying normalize in timestamps.pyx. The second option that I considered was modifying normalize_i8_stamp() in timestamps.pyx and updating its declaration in timestamps.pxd to match the new signature.
datetimes.py would not be modified because my understanding is that this overflow issue is not present in datetimes.