-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x #3280
base: main
Are you sure you want to change the base?
scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x #3280
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3280 +/- ##
==========================================
- Coverage 76.67% 76.59% -0.08%
==========================================
Files 112 112
Lines 12946 12949 +3
==========================================
- Hits 9926 9918 -8
- Misses 3020 3031 +11
|
src/scanpy/preprocessing/_utils.py
Outdated
if isinstance(X, np.ndarray): | ||
n_threads = numba.get_num_threads() | ||
mean, var = _compute_mean_var(X, axis=axis, n_threads=n_threads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to sparse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still try to integrate all the stuff it's a mess though.
src/scanpy/preprocessing/_utils.py
Outdated
else: | ||
mean = axis_mean(X, axis=axis, dtype=np.float64) | ||
mean_sq = axis_mean(elem_mul(X, X), axis=axis, dtype=np.float64) | ||
var = mean_sq - mean**2 | ||
# enforce R convention (unbiased estimator) for variance | ||
if X.shape[axis] != 1: | ||
var *= X.shape[axis] / (X.shape[axis] - 1) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove erroneous diffs
Co-authored-by: Ilan Gold <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! This seems a bit unfinished. If you need our help with a local test environment or so, we’re happy to help.
@@ -33,7 +33,10 @@ def _(X: np.ndarray, *, axis: Literal[0, 1], dtype: DTypeLike) -> np.ndarray: | |||
def _get_mean_var( | |||
X: _SupportedArray, *, axis: Literal[0, 1] = 0 | |||
) -> tuple[NDArray[np.float64], NDArray[np.float64]]: | |||
if isinstance(X, sparse.spmatrix): | |||
if isinstance(X, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding a second code path that handles np.ndarray, you should replace the existing one above:
@axis_mean.register(np.ndarray)
def _(X: np.ndarray, ...): ...
axis_i = 0 | ||
mean = np.zeros(X.shape[axis_i], dtype=np.float64) | ||
var = np.zeros(X.shape[axis_i], dtype=np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull this out of the if branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean these assignments. When you have two branches, and both start with the same 3 lines, just do those before the if statement instead.
var[r] += value * value | ||
for c in numba.prange(X.shape[0]): | ||
mean[c] = mean[c] / X.shape[1] | ||
var[c] = (var[c] - mean[c] ** 2) / (X.shape[1] - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no return statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have updated the code please check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the tests aren’t passing, so it still doesn’t seem to be working
…cale-mean-variance
…a/scanpy into scale-mean-variance
This also doesn't work for F-Arrays |
…cale-mean-variance
We had created this PR before #3099. This one is the same PR with editing enabled for maintainers.
Hi,
We are submitting PR for speed up of the _get_mean_var function.
experiment setup : AWS r7i.24xlarge
add timer around _get_mean_var call
scanpy/scanpy/preprocessing/_scale.py
Line 167 in 706d4ef
we can also create _get_mean_var_std function that return std as well so we don't require to compute it in scale function(L168-L169).