Initialization and Linearization #3352

Open · AayushSabharwal opened this issue Jan 24, 2025 · 2 comments
@AayushSabharwal (Member)

We have a bit of a fundamental issue with running initialization in linearization_function and the like. Note that all of this discussion is in the context of #3347, on the assumption that it is merged. With that PR, any initial conditions of the form u0 = [x => 1, y => 2] will be turned into the equations x ~ x0_tmp, y ~ y0_tmp in the initialization system. The new parameters x0_tmp and y0_tmp are only present in the initialization system and take whatever value they have in the ODEProblem. This allows mutating prob[x] = 2.0 and automatically getting the equivalent of x ~ 2.0 in initialization.
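For concreteness, a rough sketch of that workflow (the model and solver are placeholders; x0_tmp/y0_tmp are internal to the generated initialization system and never appear in user code):

```julia
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) y(t)
@parameters k
@mtkbuild sys = ODESystem([D(x) ~ -k * x, D(y) ~ x - y], t)

# With #3347, these initial conditions become x ~ x0_tmp and y ~ y0_tmp in the
# initialization system; x0_tmp/y0_tmp read their values from the problem.
prob = ODEProblem(sys, [x => 1.0, y => 2.0], (0.0, 1.0), [k => 1.0])

# Mutating the problem is then equivalent to changing the initialization
# equation to x ~ 2.0, without rebuilding the initialization system.
prob[x] = 2.0
sol = solve(prob, Tsit5())
```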

As with all good things, there's a catch. This transformation only happens for variables that are unknowns in the system. If z is an observed variable (present as an equation of the form z ~ ... in observed(sys)), then u0 = [z => 1.0] will be hardcoded in the initialization system as z ~ 1.0. This is because even if z0_tmp existed, where would it get its value from in the ODEProblem? z isn't in the state vector; it is calculated from the state vector. We could add z0_tmp as a parameter to the ODESystem, but the user doesn't know about this parameter. It's going to have a name with some unicode to prevent collisions, which automatically makes it difficult to type. Since systems are immutable, this would also mean that the new parameter is only present in the system stored in prob.f.sys, not the system the user has, the one they passed to ODEProblem. We'd also need to be very careful about how this parameter is managed: it would be very easy to end up with the parameters in prob.f.sys having different indices than sys, so that getter functions built from sys won't work on the actual value providers, which is unacceptable.
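A sketch of the observed-variable case, under the same assumptions (the model is a placeholder; after simplification z is observed via z ~ 2x):

```julia
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) z(t)
@mtkbuild sys = ODESystem([D(x) ~ -x, z ~ 2x], t)  # z becomes observed, not an unknown

# z is not in the state vector, so there is no z0_tmp to read from the problem;
# z ~ 1.0 is hardcoded into the initialization system instead.
prob = ODEProblem(sys, [z => 1.0], (0.0, 1.0); guesses = [x => 1.0])

# Changing the initial condition of z afterwards therefore requires rebuilding
# the initialization system via remake:
prob2 = remake(prob; u0 = [z => 2.0])
```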

Where does linearization come into this? Well, for ODEProblem, if the user wants to change the initial condition for z, they can simply call remake and pass z => 2.0. This will rebuild the initialization system accordingly. This approach isn't an option for linearization, because linearization_function is meant to return a function that can be called with multiple operating points and run very quickly, on account of being called very often. Effectively, linearization_function returns a LinearizationProblem for which the intended use case is the equivalent of calling symbolic remake many times in a loop. "Many" can be upwards of 6000 times on relatively small problems. A UX challenge here is that the system the user passes to linearization_function is not actually the system that is used for code generation. It goes through io_preprocessing to mark some variables as inputs/outputs and hence structurally change the system. This is even worse in e.g. sensitivity_function, which adds equations and variables before calling linearization_function. As a result, the user has no control and almost no intuition or feedback regarding which variables are observed and which are unknowns in the system used to build initialization. This is of course bad: the user should be able to tune what they want without being concerned about whether it's an unknown or not, and currently they can't.
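The repeated-call workflow in question looks roughly like this (a sketch; the model, the inputs/outputs, and the operating-point sweep are placeholders):

```julia
using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) u(t) [input = true] y(t) [output = true]
@parameters k
@named sys = ODESystem([D(x) ~ -k * x + u, y ~ x], t)

# io_preprocessing runs inside linearization_function; simplified_sys is the
# structurally modified system actually used for code generation, not `sys`.
lin_fun, simplified_sys = linearization_function(sys, [u], [y])

# The returned function is meant to be evaluated at many operating points:
results = map(0.0:0.1:600.0) do x0
    linearize(simplified_sys, lin_fun; op = Dict(x => x0, k => 1.0))
end
```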

One possible way to handle this is the same z0_tmp approach, but since MTK has control over the object returned from linearization_function (instead of lower-level SciML infrastructure, as is the case for ODEProblem), it can update the "LinearizationProblem" appropriately. The downsides here are that:

  1. The user has to pass an operating point containing all the variables they want to tune to linearization_function, so that the appropriate var ~ var0_tmp equations are created.
  2. The object returned from linearization_function is stateful in a much more indirect way. If the user passes z => 2.0 as part of op to linearize(sys, lin_fun; op), it will mutate lin_fun. The next time linearize is called, z0_tmp will have the value of 2 if the user doesn't pass a new value for z. If they called linearize in the reverse order, this wouldn't be the case and they'd get a different result out. Moreover, the user doesn't know whether a variable passed to op will do this mutation so they can't schedule their linearize calls appropriately either.

@baggepinnen had phrased this problem nicely, which I'll paraphrase here: SciML follows a prob[var] = value; solve(prob) user workflow. Linearization wants a solve(prob, values) workflow which is antithetical to how everything else is designed.
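Schematically (reusing prob, lin_fun, and simplified_sys from the sketches above):

```julia
# The usual SciML workflow: mutate the value provider, then solve it.
prob[x] = 2.0
sol = solve(prob, Tsit5())

# What linearization wants: pass the values at call time, every call.
matrices = linearize(simplified_sys, lin_fun; op = Dict(x => 2.0))
```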

One way to look at this is if we were to solve this problem for ODEProblem (allow changing initial conditions of observed variables without remake) how would we do it? What would the user API look like?

@AayushSabharwal (Member, Author)

One (breaking) solution is to lean into the LinearizationProblem workflow, actually setting up a LinearizationProblem and changing user workflows to be explicit about their mutation.
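Schematically, that might look like the following (an entirely hypothetical API, only to make the explicit-mutation workflow concrete; LinearizationProblem as written here does not exist):

```julia
# Hypothetical: build the problem once, then mutate it explicitly before each solve.
linprob = LinearizationProblem(sys, [u], [y]; op = Dict(x => 1.0, z => 2.0))

for x0 in 0.0:0.1:600.0
    linprob[x] = x0          # visible, explicit mutation of the problem
    matrices = solve(linprob)
end
```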

@hersle (Contributor) commented Jan 24, 2025

One way to look at this is if we were to solve this problem for ODEProblem (allow changing initial conditions of observed variables without remake) how would we do it? What would the user API look like?

TL;DR: I think defaults = [y => 1.0] and u0 = [y => 1.0] should be equivalent, and that both should produce an equation y ~ y_out in the initialization problem with default y => 1.0, instead of the hardcoded y ~ 1.0.
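In code, the two forms that I think should be equivalent (a sketch; whether they currently behave identically is exactly what is under discussion, and y_out is the internal dummy variable proposed here, not something the user writes):

```julia
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D
@variables x(t) y(t)
eqs = [D(x) ~ 0, y ~ 2x]

# Form 1: a default on the system.
@named sys1 = ODESystem(eqs, t; defaults = [y => 1.0])
prob1 = ODEProblem(structural_simplify(sys1), [], (0.0, 1.0); guesses = [x => 1.0])

# Form 2: an initial condition on the problem.
@named sys2 = ODESystem(eqs, t)
prob2 = ODEProblem(structural_simplify(sys2), [y => 1.0], (0.0, 1.0); guesses = [x => 1.0])

# Proposal: both should generate the initialization equation y ~ y_out (with the
# internal parameter y_out defaulting to 1.0) rather than hardcoding y ~ 1.0.
```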


In my opinion, an ODEProblem should not be "just an ODE", but fundamentally regarded as a two-stage initialization+ODE problem with a fixed input/output structure between the stages:

  1. stage 1 solves the initialization problem for a fixed set of input variables from the user, and outputs its solution to stage 2,
  2. stage 2 solves the ODE problem with the fixed input variables from stage 1, and outputs its solution to the user.

The code for every sub-problem in the chain and the input/output piping between them should be completely locked in after calling prob = ODEProblem(sys, u0, tspan, p). For example, suppose an ODE has equations [D(x) ~ 0, y ~ 2*x], and the user wants observed initialization with u0 = [y => 2.0]. Then the composite problem prob should:

  • have an independent input parameter y that the user can mutate/remake/...,
  • stage 1 should have an initialization equation y ~ 2*x or x ~ y/2 (no hardcoded y ~ 2.0!),
  • stage 2 should initialize x in the ODE from the solution of stage 1.

It should be illegal for the user to modify/remake x in prob, because it is not an independent input parameter of stage 1, but a derived "output parameter" of the whole composite problem. Remake should only handle updating the values (and types) of the independent parameters of the composite problem. And Base.show(prob) should show the user only the independent parameters they are allowed to modify (not u0 in the ODE, which may depend on initialization).
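To make the proposed two-stage structure concrete, here is a hand-built sketch for this example, with a standalone NonlinearProblem standing in for stage 1; the names y0/x0 and the explicit piping are placeholders for what the composite ODEProblem would generate and hide internally:

```julia
using ModelingToolkit, NonlinearSolve, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D

@variables x(t) y(t)

# Stage 1 (proposed): an initialization problem whose independent *input* is the
# parameter y0 (what the user meant by u0 = [y => 2.0]) and whose *output* is
# the derived state value x0 = y0 / 2. No y ~ 2.0 is hardcoded anywhere.
@variables x0
@parameters y0
@named initsys = NonlinearSystem([2x0 ~ y0], [x0], [y0])
initprob = NonlinearProblem(complete(initsys), [x0 => 1.0], [y0 => 2.0])  # x0 => 1.0 is a guess
initsol = solve(initprob)

# Stage 2: the ODE, whose initial state is piped in from the output of stage 1.
@mtkbuild odesys = ODESystem([D(x) ~ 0, y ~ 2x], t)
odeprob = ODEProblem(odesys, [x => initsol[x0]], (0.0, 1.0))
sol = solve(odeprob, Tsit5())

# Retuning the composite problem means changing y0 (the stage-1 input) and
# re-running the chain; x is a derived output and should not be set directly.
```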

If the user wants x as an independent input parameter instead, they should create a different ODEProblem(u0 = [x => 1.0]), which should:

  • have an independent parameter x (which is contained in the user-defined system),
  • stage 1 should have an equation x_out ~ x (where x_out is an internal dummy variable that the user should not access, in order to make it possible not to hardcode x ~ 1.0!),
  • stage 2 should initialize x in the ODE to x_out from stage 1.

All *Problems that come to my mind have such a "stage-separated structure". For example, a strongly connected SCCODESystem would have the same structure, only with multiple ODE stages. Any composite problem is simply a chain of "primary" sub-problems, and the input of the whole composite problem is the input of the first sub-problem.

This works nicely with a solve(prob, values) workflow, because prob contains generated code for solving the entire chain of sub-problems with easily tunable values of the independent parameters.

Does such a "stage-separation" or a generalization thereof work for linearization? I am not familiar enough with it to understand all the challenges you present, but I thought this sounded like a very clear and composable structure, so I hope it could be of some help.
