-
Notifications
You must be signed in to change notification settings - Fork 838
packaging #2438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
packaging #2438
Conversation
Now allows: @Environment(vars=config_expr("...")) - This was broken since vars was a ConfigValue and not an updatable dictionary @project(name=config_expr("get_my_name(cfg)")) - This was broken because when evaluating the config_expr, we did not consider the globals at the point of call but instead inside the file defining config_expr. Tests are deployed internally.
- fixes a case where configs and Runner/Deployer were not compatible due to configs not being processed in the same cwd - fixes an issue with constructing the config values (introduced in previous commit) - fixes the error message when both default and specified config files are not found
Previously something like: RETRIES = config_expr("settings").retry class MyFlow(FlowSpec): @retries(times=RETRIES) @step def start(self): ... @retries(times=RETRIES) @step def end(self): ... would fail. Also, something like: FOO = config_expr("settings").foo.bar BAR = FOO.BAZ BLAH = FOO.BAZ2 would also fail. This fixes these issues by copying DelayEvaluator everytime and also making the `__call__` function reentrant.
# The JSON module in Python3 deals with Unicode. Tar gives bytes. | ||
info_str = ( | ||
self._tar.extractfile(os.path.basename(INFO_FILE)).read().decode("utf-8") | ||
info_str = MFEnv.get_archive_content(self._tar, MetaFile.INFO_FILE).decode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be cleaned up better - MFEnv.from_archive(self._tar).get_info()
@@ -14,18 +14,14 @@ | |||
from .parameters import current_flow | |||
from .user_configs.config_decorators import CustomStepDecorator | |||
from .user_configs.config_parameters import ( | |||
ConfigValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used
unicode | ||
except NameError: | ||
unicode = str | ||
basestring = str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pull out Python2 changes in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THere are definitely other changes. We can revert this but I think we can leave this here for now and yes, do a separate PR to pull out all the other ones.
tarball. This hook can return one of two things: | ||
- a generator yielding a tuple of `(file_path, arcname)` to add files to | ||
the code package. `file_path` is the path to the file on the local filesystem | ||
and `arcname` is the path relative to the packaged code. | ||
- a generator yielding a tuple of `(content, arcname, type)` where: | ||
- type is a AddToPackageType | ||
- for CODE_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the code directory in the package | ||
- for CODE_MODULE: | ||
- content: name of the module | ||
- arcame: None (ignored) | ||
- for CONFIG_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the config directory in the package | ||
- for CONFIG_CONTENT: | ||
- content: bytes to include | ||
- arcname: path relative to the config directory in the package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite funky - returning one of two things with different types. need to fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you suggest? As I mentioned:
- we can make it a breaking change (change the signature of add_package)
- we can add a different method (feels wonky too)
- ??
# moving here. The reason is that this is needed to read extension information which needs | ||
# to happen before mfenv gets packaged. | ||
|
||
MFENV_DIR = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change MFENV to MFCODE. we already have --environment and @Environment - MFENV will only lead to more confusion here and everywhere.
if version not in get_filname_map: | ||
raise ValueError( | ||
"Unsupported packaging version '%s'. Please update Metaflow" % version | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we actually ever hit this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes -- if a newer version of Metaflow has a newer packaging method and an old client is used to access stuff.
except: | ||
# python 2 | ||
f.name = f.__func__.func_name | ||
f.wrappers = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not related to this PR. It slipped through.
# here that is then used in MFEnv (it logically belongs in MFEnv but that file can't | ||
# be loaded just yet). | ||
global _info_file_content | ||
global _info_file_present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this extra global
_UNINITIALISED = object() # sentinel
_info_file_content = _UNINITIALISED
def read_info_file():
"""Return the parsed info.json once, or None if the file is absent."""
global _info_file_content
if _info_file_content is _UNINITIALISED:
info_filename = get_filename(MetaFile.INFO_FILE)
if info_filename is not None:
with open(info_filename, "r", encoding="utf-8") as f:
_info_file_content = json.load(f)
else:
_info_file_content = None
return _info_file_content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make that change. Technically there is an extra global too (the _UNINITIALIZED
) but it can be shared.
class MetaFile(Enum): | ||
INFO_FILE = "INFO" | ||
CONFIG_FILE = "CONFIG_PARAMETERS" | ||
INCLUDED_DIST_INFO = "INCLUDED_DIST_INFO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is INCLUDED_DIST_INFO
- hard to understand what this implies - is there an EXCLUDED_DIST_INFO
too somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can clarify: it means the information about any distribution that is included. So no, there is no excluded one. I can just hame it DIST_INFO.
) | ||
|
||
|
||
class MetaFile(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for future generations looking at the code, it will be immensely useful to call out in detail what is included in these files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can add this. We didn't add any file (except dist info) but can make it more clear as part of making the code better.
) | ||
|
||
|
||
class MetaFile(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, do we need to have three separate files? any value in merging all/some?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can definitely have them all as one. I vaguely remember you being in favor of separate files in the past but I have no strong preference.
|
||
|
||
def generic_get_filename( | ||
name: Union[MetaFile, str], is_meta: Optional[bool] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_meta is not actually used. consider dropping
def v1_get_filename( | ||
name: Union[MetaFile, str], | ||
meta_info: Dict[str, Any], | ||
is_meta: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this work if i am interested in a file that unfortunately clashes with an internal reserved name? also, is_meta actually useful? what if we simply had another method that provided the contents of INFO
file and it included everything you need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps by setting is_meta to false actively somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check on that. The idea of putting stuff in mfconf is that they won't clash with code.
|
||
def read_included_dist_info(): | ||
global _included_dist_info | ||
global _included_dist_present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
Called to add custom files needed for this environment. This hook will be | ||
called in the `MetaflowPackage` class where metaflow compiles the code package | ||
tarball. This hook can return one of two things: | ||
- a generator yielding a tuple of `(file_path, arcname)` to add files to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit funky, we shouldn't return two types of tuples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already addressed above with a question.
return None | ||
|
||
|
||
class MFEnv: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can MetaflowPackage instead be the way that rest of the code accesses functionality here? will make it easier for all the decorator implementations which already have access to the package object to get access to info files etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reading through the code, MFEnv seems like an internal implementation detail for MetaflowPackage. If we can keep it as such throughout the code, that will be great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say it's a implementation detail but more a part of it but yes, I can move some of those methods back out to MetaflowPackage. That is not a problem.
@@ -159,11 +159,11 @@ def runtime_init(self, flow, graph, package, run_id): | |||
os.path.join(self.metaflow_dir.name, "metaflow"), | |||
) | |||
|
|||
info = os.path.join(get_metaflow_root(), os.path.basename(INFO_FILE)) | |||
info = MFEnv.get_filename(MetaFile.INFO_FILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to do package.get_info_file_path()
so that we don't have to bother decorator authors to worry about what is MFEnv or MetaFile or InfoFile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to specialize too much I guess but I don't mind having separate utility functions get_info_file_path
, get_dist_file_path
etc if you prfer that. I was trying to make it easier to add files later.
@@ -159,11 +159,11 @@ def runtime_init(self, flow, graph, package, run_id): | |||
os.path.join(self.metaflow_dir.name, "metaflow"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a future TODO: the path mangling here is something that the package module can handle by default. the same gymnastics are needed every single time for similar decorators.
@@ -93,6 +93,7 @@ def skip_metaflow_dependencies(): | |||
return skip_pkgs | |||
|
|||
def sync_uv_project(datastore_type): | |||
# TODO: NEED TO POINT UV TO FILES IN THE CONFIG DIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe this needs to be addressed before we merge this PR? cc @saikonen
@@ -9,7 +9,6 @@ | |||
from itertools import takewhile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a different PR, cc @madhur-ob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is a change I did. What was this supposed to do?
MFENV_DIR = ( | ||
".mfenv" # Directory containing "system" code (metaflow and user dependencies) | ||
) | ||
MFCONF_DIR = ".mfconf" # Directory containing Metaflow's configuration files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider changing the name to something else since it is neither metaflow config not the new config feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can have MFCODE_DIR
and MFINFO_DIR
?
".mfenv" # Directory containing "system" code (metaflow and user dependencies) | ||
) | ||
MFCONF_DIR = ".mfconf" # Directory containing Metaflow's configuration files | ||
MFENV_MARKER = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this be just in the info file? will simplify things quite a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would make it less flexible. The marker tells you how to access the info file. The info file is not in the same place already.
CODE_MODULE = 0x502 | ||
CODE_METAFLOW = 0x504 | ||
CONFIG_FILE = 0x601 | ||
CONFIG_CONTENT = 0x202 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to distinguish between all of them? wouldn't just user code
, system code
, info
be sufficient minimally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to check. We need to distinguish between files and content I think in some of the things. I can check if I can get rid of that. I think we also nee to know something is a module to extract the dist info.
|
||
@classmethod | ||
def _extract_archive_mfenv_info( | ||
cls, archive: Any, packaging_backend=TarPackagingBackend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any methods that need access to archive can be safely moved under MetaflowPackage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can check
) | ||
|
||
|
||
class PackagedDistribution(metadata.Distribution): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider moving this to a different file so that we can avoid an import from .mfnv in cli.py
_mappings = {} | ||
|
||
@classmethod | ||
def get_filename( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair to assume that some methods work against local filesystem and some against the code package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I've tried to name them differently.
return None | ||
|
||
|
||
def read_included_dist_info(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider renaming to read_dist_info()
No description provided.