Conversation
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##              main     #673      +/-   ##
==========================================
+ Coverage    74.50%   74.71%    +0.21%
==========================================
  Files          183      192        +9
  Lines        18400    18941      +541
==========================================
+ Hits         13709    14152      +443
- Misses        4691     4789       +98

☔ View full report in Codecov by Sentry.
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
TODO: fix the unit test to work while commenting on the quantization of the test model: https://github.com/NVIDIA/Model-Optimizer/blob/main/tests/gpu/torch/quantization/plugins/test_megatron.py#L870
TODO currently
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
weight_quantizer_enabled = self.weight_quantizer.is_enabled if hasattr(self, "weight_quantizer") else False
# TODO is checking just k enough?
k_bmm_quantizer_enabled = self.k_bmm_quantizer.is_enabled if hasattr(self, "k_bmm_quantizer") else False
v_bmm_quantizer_enabled = self.v_bmm_quantizer.is_enabled if hasattr(self, "v_bmm_quantizer") else False
is_enabled = weight_quantizer_enabled or k_bmm_quantizer_enabled or v_bmm_quantizer_enabled
why not do:
is_enabled = any(isinstance(child, TensorQuantizer) and child.is_enabled for child in self.children())
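Purely for illustration, a minimal, self-contained toy version of the children-scan pattern suggested above; the TensorQuantizer below is a hypothetical stand-in, not the real modelopt class:

```python
import torch.nn as nn


class TensorQuantizer(nn.Module):
    """Stand-in quantizer exposing only the is_enabled flag used above (hypothetical)."""

    def __init__(self, enabled: bool = True):
        super().__init__()
        self._enabled = enabled

    @property
    def is_enabled(self) -> bool:
        return self._enabled


class QuantizedAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight_quantizer = TensorQuantizer(enabled=False)
        self.k_bmm_quantizer = TensorQuantizer(enabled=False)
        self.v_bmm_quantizer = TensorQuantizer(enabled=True)

    @property
    def is_enabled(self) -> bool:
        # Equivalent to the hasattr chain above, but covers any quantizer child.
        return any(
            isinstance(child, TensorQuantizer) and child.is_enabled
            for child in self.children()
        )


print(QuantizedAttention().is_enabled)  # True, because v_bmm_quantizer is enabled
```

The upside of the children-scan form is that it keeps working when new quantizer attributes are added to the module.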
query = materialize_if_needed(query)
key = materialize_if_needed(key)
value = materialize_if_needed(value)
Do we need this if we are already calling inputs = inputs.contiguous() in TensorQuantizer's forward?
TODO these lines may not be necessary
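For context on what is being questioned here, a hypothetical sketch of what a materialize_if_needed helper might do (the actual helper in this PR may differ); beyond what inputs.contiguous() already covers, it would only add handling for meta tensors:

```python
import torch


def materialize_if_needed(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: return the tensor unchanged when it is already an
    # ordinary contiguous tensor, otherwise produce a concrete contiguous copy.
    if t.is_meta:
        # Meta tensors carry no data; allocate real storage with the same shape/dtype.
        return torch.empty_like(t, device="cpu")
    if not t.is_contiguous():
        # .contiguous() copies into a dense layout, which is also what
        # TensorQuantizer's forward would do via inputs.contiguous().
        return t.contiguous()
    return t


x = torch.randn(2, 4, 8).transpose(1, 2)  # non-contiguous view
print(materialize_if_needed(x).is_contiguous())  # True
```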
model_ref = mtq.quantize(model_ref, config, forward_fn)

# CRITICAL: model_test must also be quantized with the same config
# Otherwise it won't have the KV cache quantizer keys when loading state dict
model_test = mtq.quantize(model_test, config, forward_fn)
@kaix-nv this is an incorrect unit test. It completely breaks the ModelOpt resume workflow (resume requires a ModelOpt-unmodified model).
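As a reference for the intended workflow, a rough sketch of save-then-resume with ModelOpt, assuming the mto.modelopt_state / mto.restore_from_modelopt_state entry points and a toy model and config; the point is that model_test is restored from the saved state rather than re-quantized:

```python
import torch
import torch.nn as nn
import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq


def get_model():
    return nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))


def forward_loop(model):
    # Tiny calibration loop for quantizer ranges.
    for _ in range(4):
        model(torch.randn(2, 16))


config = mtq.INT8_DEFAULT_CFG

# Reference model: quantize, then checkpoint both the ModelOpt state and the weights.
model_ref = mtq.quantize(get_model(), config, forward_loop)
modelopt_state = mto.modelopt_state(model_ref)
weights = model_ref.state_dict()

# Resume: start from a fresh, un-modified model; ModelOpt re-applies quantization
# from the saved state, so calling mtq.quantize on model_test is not needed here
# (and, per the comment above, doing so defeats what the test should cover).
model_test = mto.restore_from_modelopt_state(get_model(), modelopt_state)
model_test.load_state_dict(weights)
```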
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
Closing due to #727.
What does this PR do?
Type of change: Bug fix
Fixes a bug when resuming training from a KV-cache-quantized checkpoint, by writing extra state for core_attention to the checkpoint.

Overview:
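As background only (not this PR's actual code), PyTorch modules can attach auxiliary data to their checkpoint entry via the get_extra_state / set_extra_state hooks, which is the kind of mechanism a core_attention module can use to carry KV-cache quantizer state across a resume; the class and fields below are hypothetical:

```python
import torch
import torch.nn as nn


class CoreAttentionLike(nn.Module):
    """Hypothetical stand-in for a core_attention module carrying quantizer amax values."""

    def __init__(self):
        super().__init__()
        self.k_amax = None
        self.v_amax = None

    def get_extra_state(self):
        # Returned object is stored under this module's "_extra_state" key in state_dict.
        return {"k_amax": self.k_amax, "v_amax": self.v_amax}

    def set_extra_state(self, state):
        # Called by load_state_dict when the "_extra_state" key is present.
        self.k_amax = state["k_amax"]
        self.v_amax = state["v_amax"]


m = CoreAttentionLike()
m.k_amax, m.v_amax = torch.tensor(0.5), torch.tensor(0.7)
sd = m.state_dict()        # contains "_extra_state"
m2 = CoreAttentionLike()
m2.load_state_dict(sd)     # extra state restored on resume
print(m2.k_amax, m2.v_amax)
```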
Usage
# Add a code snippet demonstrating how to use this

Testing
Before your PR is "Ready for review"
Additional Information