Collaborator

@mgrafu commented Oct 13, 2025

What does this PR do?

Merges Vietnamese text normalization (TN) v1 into main.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have a GPU) pytest --cpu from the root folder (provided you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?
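
The `__init__.py` item in the checklist above can be checked mechanically before opening the PR. The following is only a minimal sketch, not part of the repository's tooling: the function name `find_missing_init` is hypothetical, and it assumes that every directory under the language folder (including data folders) should carry an `__init__.py`, skipping only `__pycache__`.

```python
from pathlib import Path


def find_missing_init(root: str) -> list[str]:
    """Return sorted paths (relative to root) of directories lacking __init__.py.

    Hypothetical helper for illustration; the root directory itself is
    reported as "." when it has no __init__.py.
    """
    root_path = Path(root)
    missing = []
    # Check the root folder itself plus every nested entry beneath it.
    for directory in [root_path, *root_path.rglob("*")]:
        if not directory.is_dir() or directory.name == "__pycache__":
            continue
        if not (directory / "__init__.py").is_file():
            missing.append(directory.relative_to(root_path).as_posix())
    return sorted(missing)
```

Calling it on a language folder (for example a path such as `nemo_text_processing/text_normalization/vi`, assumed here from the layout of the English path quoted in the checklist) returns an empty list when every folder is covered.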

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

folivoramanh and others added 19 commits October 29, 2025 11:44
* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <[email protected]>

* Add missing init file

Signed-off-by: folivoramanh <[email protected]>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* update sparrowhawk

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fraction class for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: folivoramanh <[email protected]>

* Remove irrelevant test case

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Date for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Time - semiotic class for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* remove irrelevant import and comment

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <[email protected]>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: folivoramanh <[email protected]>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <[email protected]>

* add test case for range measure

Signed-off-by: folivoramanh <[email protected]>

* additional support for cardinal and remove duplicate test case

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: folivoramanh <[email protected]>

* refactor minor code

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix and add cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fix Jenkinsfile for CI

* Fix requirements for test

* Update paths and docker

* Fix docker name

* Fix click version

* Change path of grammars for sparrowhawk tests

* Update paths in sh_test.sh

* Update paths

* Revert paths

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix range and quote

Signed-off-by: folivoramanh <[email protected]>

* fix quote in post process

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* improve numeric semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <[email protected]>

* revert old codes

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <[email protected]>

* Add missing init file

Signed-off-by: folivoramanh <[email protected]>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* update sparrowhawk

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: folivoramanh <[email protected]>

* Remove irrelevant test case

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: folivoramanh <[email protected]>

* remove irrelevant import and comment

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <[email protected]>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: folivoramanh <[email protected]>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <[email protected]>

* add test case for range measure

Signed-off-by: folivoramanh <[email protected]>

* additional support for cardinal and remove duplicate test case

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: folivoramanh <[email protected]>

* refactor minor code

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Fix word range (#334)

* fix range and quote

Signed-off-by: folivoramanh <[email protected]>

* fix quote in post process

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: folivoramanh <[email protected]>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <[email protected]>

* revert old codes

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Only mount TestData from path

Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <[email protected]>

* Add missing init file

Signed-off-by: folivoramanh <[email protected]>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* update sparrowhawk

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: Mai Anh <[email protected]>

* Remove irrelevant test case

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* remove irrelevant import and comment

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <[email protected]>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: Mai Anh <[email protected]>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <[email protected]>

* add test case for range measure

Signed-off-by: Mai Anh <[email protected]>

* additional support for cardinal and remove duplicate test case

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: Mai Anh <[email protected]>

* refactor minor code

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Fix Jenkinsfile for CI (#325) (#327)

* Fix Jenkinsfile for CI

* Fix requirements for test

* Update paths and docker

* Fix docker name

* Fix click version

* Change path of grammars for sparrowhawk tests

* Update paths in sh_test.sh

* Update paths

* Revert paths

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Fix word range (#334)

* fix range and quote

Signed-off-by: Mai Anh <[email protected]>

* fix quote in post process

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <[email protected]>

* revert old codes

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Staging vi tn signed off (#339)

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: Mai Anh <[email protected]>

* Add missing init file

Signed-off-by: Mai Anh <[email protected]>

* Fix Cardinal and optimize logic

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* update sparrowhawk

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: Mai Anh <[email protected]>

* Remove irrelevant test case

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: Mai Anh <[email protected]>

* remove irrelevant import and comment

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <[email protected]>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: Mai Anh <[email protected]>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <[email protected]>

* add test case for range measure

Signed-off-by: Mai Anh <[email protected]>

* additional support for cardinal and remove duplicate test case

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: Mai Anh <[email protected]>

* refactor minor code

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Fix word range (#334)

* fix range and quote

Signed-off-by: Mai Anh <[email protected]>

* fix quote in post process

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: Mai Anh <[email protected]>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <[email protected]>

* Fix requirements for test

Signed-off-by: Anand Joseph <[email protected]>

* Update paths and docker

Signed-off-by: Anand Joseph <[email protected]>

* Fix docker name

Signed-off-by: Anand Joseph <[email protected]>

* Fix click version

Signed-off-by: Anand Joseph <[email protected]>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update paths

Signed-off-by: Anand Joseph <[email protected]>

* Revert paths

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <[email protected]>

* revert old codes

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Comma bugfix for En electronics (#332)

* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* remove unused import (#340)

Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Update Jenkinsfile (#341)

Only mount TestData from path

Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] pre-commit suggestions (#335)

updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* fill missing lang in arg run (#347)

Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

---------

Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
mgrafu and others added 2 commits October 29, 2025 14:46
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

@tbartley94 tbartley94 left a comment

Needs loggers removed. Please refactor LOCS so there's less nesting. Please reuse redundant code to keep file size down.

graph = (
# Thousands pattern (e.g., "hai nghìn không ba" -> "2003")
graph_hundred_component = pynini.union(
pynini.union(graph_digit, graph_zero) + delete_space + pynutil.delete("trăm"), pynutil.insert("0")

Without weighting, you're going to get non-deterministic behavior where a 0 is just inserted here.
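
To illustrate the concern, here is a toy model in plain Python rather than pynini. It assumes each path through the union carries a total weight and that the lowest-weight path wins, as in a tropical-semiring shortest path. The candidate outputs and the `0.1` penalty are hypothetical, not values taken from this PR.

```python
# Toy model (plain Python, not pynini): a path is an (output, weight) pair,
# and the "best" path is whichever has the lowest total weight.

def best_outputs(paths):
    """Return every output tied for the lowest weight, sorted for stability."""
    lowest = min(weight for _, weight in paths)
    return sorted(out for out, weight in paths if weight == lowest)


# Hypothetical ambiguity: one branch reads the spoken hundreds digit,
# the other inserts "0" at no cost.
unweighted = [("2", 0.0), ("0", 0.0)]

# Same branches, with a small (assumed) penalty on the insert branch.
weighted = [("2", 0.0), ("0", 0.1)]

print(best_outputs(unweighted))  # ['0', '2'] -> tie, the winner is arbitrary
print(best_outputs(weighted))    # ['2']      -> single deterministic winner
```

In pynini itself, the analogous step would presumably be wrapping the insert branch in `pynutil.add_weight`, which the date tagger already uses for `year_graph`.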

month_graph = _get_month_graph()

month_graph = pynutil.insert('month: "') + month_graph + pynutil.insert('"')
# Complete year graph with all supported patterns

Can you modularize this? It's hard to track all the different graphs with the nesting.

),
).optimize()

year_graph = pynutil.add_weight(year_graph_raw, YEAR_WEIGHT)

Do you need a dedicated weight, or can you just reuse a general one ("min weight", for instance)?

def __init__(self):
    super().__init__(name="electronic", kind="classify")

    delete_extra_space = pynutil.delete(" ")

This should live in the graph_utils module.

protocol = pynutil.insert('protocol: "') + protocol + pynutil.insert('"')
graph |= protocol
graph = pynini.union(graph, protocol)
########

stray comment

start_time = time.time()
cardinal = CardinalFst(deterministic=deterministic)
cardinal_graph = cardinal.fst
logger.debug(f"cardinal: {time.time() - start_time: .2f}s -- {cardinal_graph.num_states()} nodes")

remove the loggers

key_cardinal = pynutil.delete("key_cardinal: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
integer = pynutil.delete("integer: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")

graph_with_key = key_cardinal + delete_space + pynutil.insert(" ") + integer

Use NEMO_SPACE instead of the literal space.

+ pynini.closure(NEMO_NOT_QUOTE, 1)
+ pynutil.delete("\"")
)
graph = graph @ pynini.cdrewrite(pynini.cross(u"\u00a0", " "), "", "", NEMO_SIGMA)

Use NEMO_SPACE instead of the literal space.
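
As a plain-Python analogue of the rewrite quoted above, the sketch below maps a non-breaking space to an ordinary space through a named constant rather than a literal. `NEMO_SPACE = " "` is assumed here to mirror the constant in `graph_utils`, and `normalize_spaces` is a hypothetical helper, not part of the PR.

```python
# Assumed to mirror the shared constant in graph_utils (not imported here).
NEMO_SPACE = " "
NBSP = "\u00a0"  # non-breaking space, as in the cdrewrite rule


def normalize_spaces(text: str) -> str:
    """Replace every non-breaking space with the shared space constant."""
    return text.replace(NBSP, NEMO_SPACE)


print(normalize_spaces("5\u00a0km") == "5 km")  # True
```

Keeping the space in one named constant means a later change to how spaces are represented only has to be made in one place.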

* Refactor Vietnamese ITN taggers: modularize date, add data files, improve naming

- Modularize date.py year components for better readability
- Add weights to prevent non-deterministic behavior in insert operations
- Remove redundant YEAR_WEIGHT constant (use inline weights)
- Create zero_prefix.tsv and digit_special.tsv data files
- Rename delete_extra_space to delete_single_space in electronic.py for clarity
- Add delete_single_space to graph_utils for reuse

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor Vietnamese: PSA follow

Signed-off-by: Mai Anh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

@tbartley94 tbartley94 left a comment

LGTM

@tbartley94 tbartley94 merged commit edd2288 into main Nov 5, 2025
6 checks passed

5 participants