Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money and Measure #241
nemo_text_processing/text_normalization/hi/data/whitelist/alternatives.tsv (outdated, resolved)
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
ngachchi left a comment:
The remaining files in the whitelist data class will be removed so that only a single file remains.
Is there any reason why we have English vocabulary as part of the Hindi TN grammar? I believe the best approach for now would be a monolingual one.
…nd Measure (NVIDIA#241)

* Hindi TN changes
  Signed-off-by: Namrata Gachchi <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Updated date for Hindi TN cache
  Signed-off-by: Namrata Gachchi <[email protected]>
* additional whitelist class .tsv files and unused imports removed
  Signed-off-by: Namrata Gachchi <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* incorporated suggestions for unused statements and another for closing the file opened
  Signed-off-by: Namrata Gachchi <[email protected]>
* Combined Hindi TN and ITN separate blocks into single
  Signed-off-by: Namrata Gachchi <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Added init.py files and removed unused commented lines
  Signed-off-by: Namrata Gachchi <[email protected]>
* commented irrelevant references and unused snippets from whitelist and word file
  Signed-off-by: Namrata Gachchi <[email protected]>
* Whitelist and Word class changes
  Signed-off-by: Namrata Gachchi <[email protected]>
* post processor changes with minor fixes
  Signed-off-by: Namrata Gachchi <[email protected]>
* remove space before punctuation for sparrowhawk file
  Signed-off-by: Namrata Gachchi <[email protected]>
* minor fixes for measure class
  Signed-off-by: Namrata Gachchi <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Updated Jenkinsfile
  Signed-off-by: Namrata Gachchi <[email protected]>
* removed unused imports and statements
  Signed-off-by: Namrata Gachchi <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* updated date stamp for HI cache and commented ITN grammars
  Signed-off-by: Namrata Gachchi <[email protected]>
* Updates the cache
  Signed-off-by: Simon Zuberek <[email protected]>
* Disables Hindi ITN L0 checks
  Signed-off-by: Simon Zuberek <[email protected]>
* Reapplies ITN CI Checks
  Signed-off-by: Simon Zuberek <[email protected]>
* Adds missing inits
  Signed-off-by: Simon Zuberek <[email protected]>
* resolved the failing sparrowhawk test cases
  Signed-off-by: Namrata Gachchi <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Simon Zuberek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Simon Zuberek <[email protected]>
Signed-off-by: Namrata Gachchi <[email protected]>
What does this PR do?
This PR introduces Hindi text normalization (TN) support for a range of numerical and temporal formats: cardinals, decimals, fractions, dates, times, money, and measures.
Before your PR is "Ready for review"
Pre checks:
- Use `git commit -s` to sign.
- Run `pytest` or (if your machine does not have a GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly with `@pytest.mark.run_only_on('CPU')`).
- … `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`
- … `pytest` and Sparrowhawk here.
- … `__init__.py` for every folder and subfolder, including the `data` folder which has .TSV files?
- … `Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.` to all newly added Python files?
- … `Copyright 2015 and onwards Google, Inc.`. See an example here.
- … (`try import: ... except: ...`) if not already done.

PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
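The `__init__.py` and license-header items in the pre-check list above are easy to miss when adding a new language directory such as `text_normalization/hi`. The following is a minimal local sketch under stated assumptions, not part of the repository's tooling: the function names `find_missing_inits` and `find_missing_headers` and the five-line header window are hypothetical choices for illustration. It walks a directory tree and reports folders without an `__init__.py` and Python files whose first lines do not contain the NVIDIA copyright line.

```python
import os
from pathlib import Path

# License line requested by the pre-check list for newly added Python files.
NVIDIA_HEADER = "Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved."


def _normalize(text):
    """Collapse runs of whitespace so single vs. double spacing does not matter."""
    return " ".join(text.split())


def find_missing_inits(root):
    """Return every directory under `root` (root included) without an __init__.py.

    Assumption (hypothetical helper): each folder, including `data` folders that
    only hold .tsv files, is expected to carry an __init__.py, as the list asks.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden folders (e.g. .git) and bytecode caches in place, and
        # sort so the report order is deterministic.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith(".") and d != "__pycache__")
        if "__init__.py" not in filenames:
            missing.append(Path(dirpath))
    return missing


def find_missing_headers(root, header=NVIDIA_HEADER, max_lines=5):
    """Return .py files under `root` whose first `max_lines` lines lack `header`.

    Assumption (hypothetical helper): the header appears near the top of the file.
    """
    wanted = _normalize(header)
    missing = []
    for py_file in sorted(Path(root).rglob("*.py")):
        with py_file.open(encoding="utf-8") as f:
            # next(f, "") pads with empty strings when the file is shorter.
            head = [next(f, "") for _ in range(max_lines)]
        if not any(wanted in _normalize(line) for line in head):
            missing.append(py_file)
    return missing
```

For example, calling `find_missing_inits("nemo_text_processing/text_normalization/hi")` from the repository root would list any folders under the Hindi directory that still need an `__init__.py`. The header check ignores differences in spacing, and it does not look for the `Copyright 2015 and onwards Google, Inc.` line mentioned in the list.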