Make the quantized data shape compatible with original tensor shape #5483

Status: Open. sfc-gh-reyazda wants to merge 47 commits into microsoft:master from Snowflake-Labs:fix-quantized-shape.
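To illustrate what "quantized data shape compatible with the original tensor shape" means, here is a minimal sketch using a plain symmetric int8 scheme. This is an illustration only, not DeepSpeed's FP quantizer or this PR's actual implementation; the function names are hypothetical. The property it demonstrates is that the quantized tensor keeps the original tensor's dimensions, so downstream code that inspects `.shape` keeps working.

```python
import torch

def quantize_preserving_shape(t: torch.Tensor, n_bits: int = 8):
    # Hypothetical stand-in, NOT DeepSpeed's quantizer: per-tensor symmetric
    # int8 quantization. The quantized output keeps t's shape; only the dtype
    # changes, with the scale stored separately.
    qmax = 2 ** (n_bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(t / scale), -qmax - 1, qmax).to(torch.int8)
    # The compatibility property the PR title describes:
    assert q.shape == t.shape
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximation of the original values.
    return q.to(torch.float32) * scale
```

Under this sketch, `quantize_preserving_shape(torch.randn(4, 8))[0].shape` is `torch.Size([4, 8])`, the same as the input.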
Commits (47):
085fcd8  Make the quantized data shape compatible with original tensor shape (sfc-gh-reyazda)
a83b384  change the scale and quantized data format (sfc-gh-reyazda)
048648d  minor fixes (sfc-gh-reyazda)
bf12893  fix (sfc-gh-reyazda)
b18f71f  minor fix (sfc-gh-reyazda)
4d6e04b  Merge branch 'master' into fix-quantized-shape (sfc-gh-reyazda)
f924455  more fixed (sfc-gh-reyazda)
e03c0f4  Merge branch 'fix-quantized-shape' of https://github.com/Snowflake-La… (sfc-gh-reyazda)
d9cfba6  Improve _configure_optimizer() final optimizer log (#5528) (nelyahu)
2bbc680  Enhance testing: Skip fused_optimizer tests if not supported. (#5159) (vshekhawat-hlab)
b3ab626  Skip the UT cases that use unimplemented op builders. (#5372) (foin6)
4494c86  rocblas -> hipblas changes for ROCm (#5401) (rraminen)
2c0dcac  Rocm warp size fix (#5402) (rraminen)
f53895f  Optimize zero3 fetch params using all_reduce (#5420) (deepcharm)
bb146c3  CPUAdam fp16 and bf16 support (#5409) (BacharL)
31f11c0  Fix the TypeError for XPU Accelerator (#5531) (shiyang-weng)
35b4813  Fix RuntimeError for moe on XPU: tensors found at least two devices (… (shiyang-weng)
cf0ccb5  Remove synchronize calls from allgather params (#5516) (BacharL)
e388056  Avoid overwrite of compiled module wrapper attributes (#5549) (deepcharm)
5ff0d44  Small typos in functions set_none_gradients_to_zero (#5557) (TravelLeraLone)
29ab009  Adapt doc for #4405 (#5552) (oraluben)
633da3d  Update to HF_HOME from TRANSFORMERS_CACHE (#4816) (loadams)
9db010e  [INF] DSAttention allow input_mask to have false as value (#5546) (oelayan7)
bd2b2ef  Add throughput timer configuration (#5363) (deepcharm)
3c5aa00  Add Ulysses DistributedAttention compatibility (#5525) (Kwen-Chen)
d7f9be6  Add hybrid_engine.py as path to trigger the DS-Chat GH workflow (#5562) (lekurile)
c160d76  Update HPU docker version (#5566) (loadams)
c203830  [MiCS] Remove the handle print on DeepSpeed side (#5574) (ys950902)
5e5c8a7  Rename files in fp_quantize op from quantize.* to fp_quantize.* (#5577) (loadams)
ff01ade  Update to fix sidebar over text (#5567) (loadams)
83920f6  DeepSpeedCheckpoint: support custom final ln idx (#5506) (nelyahu)
a6076cf  Update minor CUDA version compatibility (#5591) (adk9)
9db9970  Add slide deck for meetup in Japan (#5598) (tohtana)
c6f151c  Fixed the Windows build. (#5596) (costin-eseanu)
0bf3511  estimate_zero2_model_states_mem_needs: fixing memory estiamtion (#5099) (nelyahu)
cca53b0  Fix cuda hardcode for inference woq (#5565) (Liangliang-Ma)
31815d9  fix sequence parallel(Ulysses) grad scale for zero0 (#5555) (inkcherry)
6ad125e  Add Compressedbackend for Onebit optimizers (#5473) (Liangliang-Ma)
9c15b8f  Updated hpu-gaudi2 tests content. (#5622) (vshekhawat-hlab)
2e4bc1d  Pin transformers version for MII tests (#5629) (loadams)
e5b4d41  WA for Torch-compile-Z3-act-apt accuracy issue from the Pytorch repo … (NirSonnenschein)
8a4d03c  stage_1_and_2: optimize clip calculation to use clamp (#5632) (nelyahu)
5e5b1f7  Fix overlap communication of ZeRO stage 1 and 2 (#5606) (penn513)
c47ad5f  Merge branch 'master' of https://github.com/Snowflake-Labs/deepspeed … (sfc-gh-reyazda)
277902a  remove float8 dtype (sfc-gh-reyazda)
74311af  Merge branch 'master' into fix-quantized-shape (sfc-gh-reyazda)
9eb12fb  Merge branch 'master' into fix-quantized-shape (sfc-gh-reyazda)
Diff:

@@ -62,7 +62,8 @@ def _ensure_quantized(self, tensor: torch.Tensor):
         tensor.data = self.quantizer.quantize(tensor.data,
                                               q_bits=self.quantization_config.q_bits,
                                               q_mantisa_bits=self.quantization_config.mantissa_bits)
-        assert tensor.dtype == torch.uint8
+        assert (tensor.dtype == torch.int8), \
+            f"Quantize conversion dtype ({tensor.dtype}) error!"

     def dequantized(self) -> torch.Tensor:
         """

Review comment on the new assertion: "shouldn't it be …"
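The hunk above replaces a bare dtype assert (`torch.uint8`) with a `torch.int8` check that reports the offending dtype. A minimal standalone sketch of the same guard pattern, with `fake_quantize` as a hypothetical stand-in for `self.quantizer.quantize` (not DeepSpeed's actual API), looks like this:

```python
import torch

def fake_quantize(data: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the real quantizer: symmetric int8.
    scale = data.abs().max().clamp(min=1e-8) / 127
    return torch.clamp(torch.round(data / scale), -128, 127).to(torch.int8)

def ensure_quantized(tensor: torch.Tensor) -> torch.Tensor:
    # Quantize only if the tensor is not already in the quantized dtype.
    if tensor.dtype != torch.int8:
        tensor = fake_quantize(tensor)
    # Mirrors the PR's change: check for int8 (not uint8), and fail with a
    # message naming the actual dtype instead of a bare assert.
    assert tensor.dtype == torch.int8, \
        f"Quantize conversion dtype ({tensor.dtype}) error!"
    return tensor
```

The descriptive assertion message is the readability win here: on failure it shows which dtype the quantizer actually produced, rather than an anonymous `AssertionError`.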
Review comment: "This function is redefined at line 118."