refactor to use the e2e test script #580

Merged
michelle-yooh merged 3 commits into master from yooh/gpu-moe-dag-refactor on Feb 10, 2025

@michelle-yooh (Collaborator) commented Jan 25, 2025

Description

This PR refactors the maxtext_moe_gpu_e2e tests to call the MaxText e2e test script instead of running the command directly. It depends on AI-Hypercomputer/maxtext#1191, which must be merged first.

Tests

Please describe the tests that you ran on Cloud VM to verify changes.

Instruction and/or command lines to reproduce your tests: ...

List links for your tests (use go/shortn-gen for any internal link): ...

Screenshot of the test result:
http://shortn/_dNMK6pqBn9

Bugs for the failed tests:
Pinned test on 2 A3 nodes: http://shortn/_WN4xgkG3Hj
Pinned test on 2 A3+ nodes: http://shortn/_zKqVme3CFO

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@michelle-yooh (Collaborator, Author) commented

@RissyRan @parambole I've moved the functions to xlml/utils/gpu.py and verified that the resizing works as expected: http://shortn/_ttKViB2Hv7 (Please disregard the test failures in that run. It used the latest MaxText image, which doesn't have the test script yet.)

@shralex requested a review from yangyuwei on February 3, 2025 21:23
xlml/utils/gpu.py: review thread outdated and resolved
@michelle-yooh force-pushed the yooh/gpu-moe-dag-refactor branch from a751d02 to 0d63f0d on February 10, 2025 07:21
@michelle-yooh force-pushed the yooh/gpu-moe-dag-refactor branch from 0d63f0d to 84f2bd2 on February 10, 2025 07:58
@michelle-yooh merged commit b637454 into master on Feb 10, 2025
6 checks passed
@michelle-yooh deleted the yooh/gpu-moe-dag-refactor branch on February 10, 2025 19:15
4 participants