Conversation

@zoobereq
Contributor

@zoobereq zoobereq commented Nov 13, 2024

What does this PR do?

This fix corrects an incorrectly applied closure over a single-character token in the WORD grammar, which caused all strings to be classified as WORDS and subsequently skipped for denormalization.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. pytest or (if your machine does not have a GPU) pytest --cpu from the root folder (given that you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

anand-nv and others added 2 commits November 14, 2024 02:38
Signed-off-by: Anand Joseph <[email protected]>
Update FST paths

Signed-off-by: anand-nv <[email protected]>
@zoobereq zoobereq marked this pull request as ready for review November 14, 2024 14:12
@zoobereq zoobereq requested a review from tbartley94 November 14, 2024 14:13
@github-actions
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 29, 2024
tbartley94 previously approved these changes Dec 5, 2024
Member

@tbartley94 tbartley94 left a comment

Clarifying questions, but LGTM as long as tests pass

GraphFst,
delete_extra_space,
delete_space,
delete_zero_or_one_space,
Member

Do we have a single-space delete that we could just apply a Kleene question (zero or one) to?


def __init__(self):
    super().__init__(name="word", kind="classify")
    word = pynutil.insert('name: "') + pynini.closure(NEMO_NOT_SPACE, 1) + pynutil.insert('"')
Member

Remind me of this syntax: is closure(..., 1) equivalent to zero or one, or does it match exactly one?
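
For reference, a minimal sketch of how the closure bounds read, assuming pynini's documented convention: a single bound is a lower bound with no upper limit, so closure(x, 1) means one or more, while closure(x, 0, 1) is zero or one and closure(x, 1, 1) is exactly one. The sketch uses Python's re module as a stand-in rather than pynini itself; closure_as_regex, NOT_SPACE, and accepts are hypothetical names introduced only for illustration.

```python
import re
from typing import Optional


def closure_as_regex(token: str, lower: int = 0, upper: Optional[int] = None) -> str:
    """Hypothetical helper: build the regex quantifier that mirrors
    closure(token, lower, upper). upper=None stands for "no upper bound"."""
    if upper is None:
        if lower == 0:
            quant = "*"                   # closure(x)       -> x*
        elif lower == 1:
            quant = "+"                   # closure(x, 1)    -> x+
        else:
            quant = "{%d,}" % lower       # closure(x, N)    -> x{N,}
    elif (lower, upper) == (0, 1):
        quant = "?"                       # closure(x, 0, 1) -> x?
    else:
        quant = "{%d,%d}" % (lower, upper)  # closure(x, M, N) -> x{M,N}
    return "(?:%s)%s" % (token, quant)


# Assumption: \S is only a rough stand-in for NEMO_NOT_SPACE.
NOT_SPACE = r"\S"

one_or_more = re.compile(closure_as_regex(NOT_SPACE, 1))
zero_or_one = re.compile(closure_as_regex(NOT_SPACE, 0, 1))
exactly_one = re.compile(closure_as_regex(NOT_SPACE, 1, 1))


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """True if the whole string is accepted, as an acceptor would require."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions, the one-or-more form accepts any non-empty run of non-space characters, whereas the exactly-one form accepts only a single character.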

@github-actions github-actions bot removed the Stale label Dec 6, 2024
Member

@tbartley94 tbartley94 left a comment

LGTM

@tbartley94 tbartley94 merged commit e3862be into main Dec 10, 2024
5 checks passed
ngachchi pushed a commit to ngachchi/NeMo-text-processing that referenced this pull request Jun 23, 2025
* Fix space issue with ZH ITN

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Update FST paths

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Simon Zuberek <[email protected]>
Co-authored-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Namrata Gachchi <[email protected]>
FredHaa pushed a commit to FredHaa/NeMo-text-processing that referenced this pull request Aug 15, 2025
* Fix space issue with ZH ITN

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Update FST paths

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Simon Zuberek <[email protected]>
Co-authored-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
4 participants