Capture vars not env #66
Conversation
…expression is seen in patsy term; add tests.
…es_needed and stored in state variable instead. Requires adding an 'eval_env' parameter to design_matrix_builders function.
…matrix_builders().
```python
        # more obscure backtrace later on.
        if not isinstance(eval_env, six.integer_types + (EvalEnvironment,)):
            raise TypeError("Parameter 'eval_env' must be either an integer or an instance" +
                            "of patsy.EvalEnvironment.")
```
Nitpicks:

- The `+` is unnecessary here -- the Python lexer (like that in many languages, e.g. C) will concatenate adjacent string tokens at compile time, e.g. `"foo" "bar"` is the same as `"foobar"` (and so is `"foo" 'bar'` for that matter). It's even slightly more efficient, b/c `+` is evaluated at runtime, while leaving it out gives a single string directly in the .pyc. Not that this efficiency matters at all :-). But as a style/consistency thing I leave the `+` out.
- You're missing a space, this string says "instanceof" :-)
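(For reference, a minimal standalone illustration of the implicit string-literal concatenation described above; this snippet is not from the patsy source.)

```python
# Adjacent string literals are joined by the Python lexer at compile time,
# so no runtime "+" is needed -- and mixed quote styles are fine too.
assert "foo" "bar" == "foobar"
assert "foo" 'bar' == "foobar"

# The same style applied to the error message from this review (with the
# missing space restored):
msg = ("Parameter 'eval_env' must be either an integer or an instance "
       "of patsy.EvalEnvironment.")
assert "instance of" in msg
```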
Yeah. I'm aware of this feature, but I vaguely remember being bitten by it (although maybe it was in C instead of Python), so I personally tend to prefer being explicit unless necessary. But given that this is the style of the rest of the code base, I'll go with it. Done.
Hi Christian! As you can see I've mostly recovered from PyCon, and did a quick pass for general style on what you have so far :-). Looking forward to seeing the rest!
```diff
@@ -614,7 +614,7 @@ def _make_term_column_builders(terms,
         term_to_column_builders[term] = column_builders
     return new_term_order, term_to_column_builders


-def design_matrix_builders(termlists, data_iter_maker, NA_action="drop"):
+def design_matrix_builders(termlists, data_iter_maker, eval_env=0, NA_action="drop"):
```
After thinking about this a little more... I think maybe we should actually make `eval_env` a required argument here. This is the lower-level machinery for those who are doing clever programmatic things, so it's safer I think to insist the user think about how they want to handle the evaluation environment.
Sounds good to me too. Done.
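(To make the change concrete, here is a rough sketch of how a caller of the low-level API might capture and pass the environment explicitly once `eval_env` is required. The wrapper name and the capture depth are illustrative assumptions, not code from this PR, and the positional order follows the diff above.)

```python
from patsy import EvalEnvironment, design_matrix_builders

def my_build_wrapper(termlists, data_iter_maker):
    # Capture the environment of whoever called my_build_wrapper (one frame
    # up from here), so formula terms can see that caller's variables.
    eval_env = EvalEnvironment.capture(1)
    return design_matrix_builders(termlists, data_iter_maker, eval_env)
```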
Hi Nathaniel! Thanks for the review. I will reply to the comments / apply them once I'm done with fixing the tests for using a subset of the environment. I am currently in the process of fixing said tests. Instead of comparing …
Hi Christian, sounds like a good plan. Before I answer your actual question, can you back …
I need it for the tests. Here is one of the tests that I am in the process of fixing up (in eval.py):

```python
def test_EvalFactor_memorize_passes_needed():
    from patsy.state import stateful_transform
    foo = stateful_transform(lambda: "FOO-OBJ")
    bar = stateful_transform(lambda: "BAR-OBJ")
    quux = stateful_transform(lambda: "QUUX-OBJ")
    e = EvalFactor("foo(x) + bar(foo(y)) + quux(z, w)")
    state = {}
    eval_env = EvalEnvironment.capture(0)
    passes = e.memorize_passes_needed(state, eval_env)
    expected_eval_env = EvalEnvironment([{'foo': foo,
                                          'bar': bar,
                                          'quux': quux}],
                                        eval_env.flags)
    assert passes == 2
    assert state["eval_env"] == expected_eval_env
    # More stuff cut.
```

The last line is why I need equality on EvalEnvironment. (Before, the code just did …
Ah, I see. My preference would still be to not define equality on `EvalEnvironment` …
(This isn't as thorough as a full equality check, but we already have thorough tests for …
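(For illustration, one way a lighter-weight check could look in the test above; whether `EvalEnvironment` exposes its contents as a flat `namespace` mapping like this is an assumption on the editor's part, not something stated in the thread.)

```python
# Instead of comparing whole EvalEnvironment objects, just check that the
# captured subset contains exactly the stateful transforms we expect.
captured = state["eval_env"]
assert captured.namespace["foo"] is foo
assert captured.namespace["bar"] is bar
assert captured.namespace["quux"] is quux
assert captured.flags == eval_env.flags
```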
Sounds reasonable to me. I'll move in that direction. I'll update the pull request when all the tests pass again.
…environment in its state.
Here's the updated pull request, using the subset of the environment in the state. Let me know what you think.
Also, what do you think is the right thing to do regarding shadowing of variables between the EvalEnvironment and the DataFrame or other data passed to the formula?
…style of the rest of the code base. Also add missing space.
Looking pretty good! Things I can think of besides the shadowing issue: …
Okay, so having had a chance to look over the code that would be affected by shadowing, I think my thought is: some sort of systematic handling of shadowing would probably be a good idea, but it feels like something we can just as well handle as an independent change. In particular, shadowing checks should probably also notice when builtins are shadowed (this has been a persistent issue -- see #13). So I don't think it's a blocker for this PR.
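(For context, a small hypothetical example of the shadowing question being discussed: `x` defined both in the calling environment and in the data. The data-wins precedence shown in the comments is an assumption about current behavior, not something this PR settles.)

```python
import numpy as np
from patsy import dmatrix

x = np.array([10.0, 20.0, 30.0])          # 'x' in the calling environment
data = {"x": np.array([1.0, 2.0, 3.0])}   # 'x' also in the data

# If names in `data` shadow names in the environment, the design matrix is
# built from [1, 2, 3], not [10, 20, 30].
dm = dmatrix("x", data)
print(np.asarray(dm))
```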
Working on the doc now. I assume you will register this release on zenodo before release time? (I'm also assuming the next release will be v0.4.0.)
Actually, an annoying thing about Zenodo is that you cannot get the DOI until after the release is finalized, basically b/c the DOI-generation operation takes the release hash as an input. So I've just been putting the DOIs into the dev docs after the release, and the dev docs are what show up by default on readthedocs anyway, and I guess it's good enough. And yeah, we'll call it v0.4.0.
Just updated the pull request with the results of my pass over the docs. Could you go over the diff for the doc update and see if things still make sense? I've deleted a lot more eval_env, etc. stuff than I added to the docs. I'm a bit worried that this means we missed an additional place where an eval_env argument should have been added. I've also mentioned the incompatible change to … I'll add the high-level tests another day, but other than that, this is starting to look pretty complete.
The doc changes look good to me -- I think the docs just spend more time talking about the factor interface than the actual builder interface, so it makes sense that you deleted more than you added. Mentioning the …
Added a high-level test. Let me know if you have other ideas for these kinds of tests. Otherwise, please let me know if you have any final comments, if there are still missing bits, etc. (Really looking forward to getting this merged.)
Looks good to me! One last thing: it looks like you have some merge conflicts which are greying out the merge button and preventing the tests from running. Could you merge or rebase from current master to resolve these?
Merge done.
Huzzah! We are good to go.
Thanks for the great work, and for sticking with it despite my nitpickiness :-).
Woot! Great!
Haha. No problem! Happy to be starting to make contributions on the scientific Python software stack. Most of your comments were very relevant, I thought. And until I come close to you in terms of code contributed to patsy, it's much more your code base than mine, so some nitpickiness is fully allowed. (I tried already to stick to your coding style as best as I could.) I also much prefer code bases with a unified style anyway, even when I don't agree 100% with it. So no problem again.
This is the current state of my work to capture only the required variables for patsy formulas instead of the whole environment.
Current status:
- Figure out what the right thing to do is when variables are shadowed between the EvalFactor's environment and the data. (Can be done together with "Trouble making categorical variable --- TypeError: 'ClassRegistry' object is not callable" #13 instead. Not a blocker for this.)

When this is complete, it will fix bug #25.