Skip to content

Revert "Improve coalesce and concat performance for views (#7614)" #7623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 7, 2025

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Jun 7, 2025

This reverts commit 7739a83.

Which issue does this PR close?

Rationale for this change

I found this errors in DataFusion (see apache/datafusion#16249 (comment)), so let's revert it and find the error.

What changes are included in this PR?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dandandan

if keep_views {
self.views_buffer.extend_from_slice(array.views());
} else {
let starting_buffer = self.completed.len() as u32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one bug is that this starting buffer value should be calculated before extending self.completed a few lines above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

if keep_views {
self.views_buffer.extend_from_slice(array.views());
} else {
let starting_buffer = self.completed.len() as u32;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm - this seems off as we already pushed to self.completed @alamb (I only have time later today to address)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make some new tests / etc to try and cover this case

@alamb alamb merged commit da461c8 into apache:main Jun 7, 2025
29 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 7, 2025

Let's revert it to be safe, and we can re-create a new PR with some more testing

Dandandan added a commit to Dandandan/arrow-rs that referenced this pull request Jun 7, 2025
alamb pushed a commit that referenced this pull request Jun 8, 2025
#7625)

… (#7614)" (#7623)"

This reverts commit da461c8.

This adds a test and fix for the wrong index issue.
I also verified the change for DataFusion (and benchmarks show notable
improvements).

# Which issue does this PR close?


Closes #NNN.

# Rationale for this change


# What changes are included in this PR?



# Are there any user-facing changes?
alamb added a commit that referenced this pull request Jun 9, 2025
# Which issue does this PR close?

- Follow on to #7625 from
@Dandandan

# Rationale for this change

I want to eventually remove `gc_string_view` but currently the unit
tests are in terms of that function

# What changes are included in this PR?

Rewrite tests to be in terms of `coalesce` instead

Also, 
1. Add additional coverage for the issue we saw in
#7623
2. Add add coverage for the case where there are data buffers in the
view, but they are not referenced by any view
#7625 (comment)

Codecov of this module is now 100%

# Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants