-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closure and closure context field loads are not folded in AOT #60068
Comments
/cc @alexmarkov |
This comment has been minimized.
This comment has been minimized.
@Manishtechbee please no useless LLM generated garbage on this issue tracker or you will be banned for spam. |
We tried this making these changes with @mkustermann but it didn't quite give us the same performance as manually inlining diff --git a/runtime/vm/compiler/backend/il.cc b/runtime/vm/compiler/backend/il.cc
index 42d89ce95f1..35258314ff3 100644
--- a/runtime/vm/compiler/backend/il.cc
+++ b/runtime/vm/compiler/backend/il.cc
@@ -2969,6 +2969,11 @@ Definition* LoadFieldInstr::Canonicalize(FlowGraph* flow_graph) {
}
}
break;
+ case Slot::Kind::kClosure_context:
+ if (auto closure_allocation = orig_instance->AsAllocateClosure()) {
+ return closure_allocation->context()->definition();
+ }
+ break;
default:
break;
}
@@ -3008,6 +3013,22 @@ Definition* LoadFieldInstr::Canonicalize(FlowGraph* flow_graph) {
}
}
+ if (auto instr = orig_instance->AsAllocateContext()) {
+ if (IsImmutableLoad()) {
+ for (auto use : instr->input_uses()) {
+ if (auto store = use->instruction()->AsStoreField()) {
+ if (IsDominatedBy(store)) {
+ if ((use->use_index() == StoreFieldInstr::kInstancePos) &&
+ store->slot().IsIdentical(slot())) {
+ // RELEASE_ASSERT(IsDominatedBy(store));
+ return store->value()->definition();
+ }
+ }
+ }
+ }
+ }
+ }
+
return this;
}
diff --git a/runtime/vm/compiler/backend/inliner.cc b/runtime/vm/compiler/backend/inliner.cc
index 2909c6915fc..e24e17e583f 100644
--- a/runtime/vm/compiler/backend/inliner.cc
+++ b/runtime/vm/compiler/backend/inliner.cc
@@ -464,7 +464,14 @@ class CallSites : public ValueObject {
worklist.Add(call_info.call);
}
}
- RELEASE_ASSERT(worklist.length() == calls_->length());
+ for (auto& call_info : closure_calls_) {
+ if (call_info.call->HasSSATemp()) {
+ add_to_worklist(call_info.call);
+ } else {
+ worklist.Add(call_info.call);
+ }
+ }
+ // RELEASE_ASSERT(worklist.length() == calls_->length());
add_transitive_dependencies_to_worklist(0);
// Step 2: canonicalize each definition from the worklist. We process
@@ -588,13 +602,13 @@ class CallSites : public ValueObject {
HandleStaticCall(static_call, inline_only_profitable_methods, graph,
depth, nesting_depth, inlined_info);
} else if (auto closure_call = current->AsClosureCall()) {
- if (!inline_only_profitable_methods) {
- // Consider closure for further inlining. Note that it will
- // still be subject to all the inlining heuristics.
- closure_calls_.Add({graph, closure_call, depth, nesting_depth});
- } else {
- // No longer consider the closure because inlining is too deep.
- }
+ // if (!inline_only_profitable_methods) {
+ // Consider closure for further inlining. Note that it will
+ // still be subject to all the inlining heuristics.
+ closure_calls_.Add({graph, closure_call, depth, nesting_depth});
+ // } else {
+ // // No longer consider the closure because inlining is too deep.
+ // }
}
}
} We improved the benchmark from 400us to 395us, but manually inlining There's also something strange in this change which is the It's also strange that eliminating a bunch of closure allocations in a hot loop gave us only a tiny improvement. |
This is likely a pass ordering problem: full-blown store-load forwarding happens after inlining is already finished. Simple store-loading forwarding during canonicalization (which happens during inlining) is one way to solve it. However, such store-load forwarding does not respect control flow and usually applicable only to final fields. With captured final local variables you need to be a bit more careful as there could be multiple assignments (stores) to a final local variable. For example: final int foo;
if (...) {
foo = 2;
} else {
foo = 3;
}
() { print(foo); } (); That's why you probably need a |
In https://github.com/osa1/protobuf.dart/tree/inlining_issue I add a bunch of inline pragmas to two higher-order functions and a few small functions.
Inlining these higher-order functions should eliminate indirect calls when calling the function arguments as in each call site the closure being passed is a function expression.
However currently AOT does not seem to fold field loads from closures and contexts.
If I manually inline
_readPacked
(by copy-pasting it into each of the call sites), that improves runtime offrom_binary
benchmark 7%. The diff for this change is at the end.To repro:
inlining_issue
branch../tool/compile_protos.sh
(requires protoc with Dart plugin)dart pub get
dart compile exe bin/from_binary.dart
I'm not fluent in VM's flow graph IR, but with
--extra-gen-snapshot-options=--print-flow-graph-filter=_mergeFromCodedBufferReader,--print-flow-graph-optimized,--compiler-passes=\*Inlining
and inline pragmas, I see code like(this is in an "after inlining" block)
Here
LoadField(v524, ...)
could be folded, which would then allow foldingLoadField(v4632, ...)
, which I suspect in the end would give us the optimizations.Inlining
_readPacked
manually is not ideal as that would potentially increase binary sizes on the browsers without improving runtime perf. Ideally I want to do this with backend-specific pragmas.Diff for manually inlining `_readPacked`
cc @mraleph
The text was updated successfully, but these errors were encountered: