-
Notifications
You must be signed in to change notification settings - Fork 140
Staging vi tn to main #338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: folivoramanh <[email protected]> * Add missing init file Signed-off-by: folivoramanh <[email protected]> * Fix Cardinal and optimize logic Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: folivoramanh <[email protected]> * update sparrowhawk Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fraction class for Vietnamese TN Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: folivoramanh <[email protected]> * Remove irrelavant test case Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Date for vietnamese TN Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Time - semiotic class for Vietnamese TN Signed-off-by: folivoramanh <[email protected]> * remove irrelevant import and comment Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <[email protected]> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: folivoramanh <[email protected]> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <[email protected]> * add test case for range measure Signed-off-by: folivoramanh <[email protected]> * additional support for cardinal and remove duplicate test case Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: folivoramanh <[email protected]> * refractor minor code Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix and add cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fix Jenkinsfile for CI * Fix requirements for test * Update paths and docker * Fix docker name * Fix click version * Change path of grammars for sparrowhawk tests * Update paths in sh_test.sh * Update paths * Revert paths --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: folivoramanh <[email protected]> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix range and quote Signed-off-by: folivoramanh <[email protected]> * fix quote in post process Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* improve numeric semiotic classes Signed-off-by: folivoramanh <[email protected]> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <[email protected]> * revert old codes Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: folivoramanh <[email protected]> * PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: folivoramanh <[email protected]> * Add missing init file Signed-off-by: folivoramanh <[email protected]> * Fix Cardinal and optimize logic Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: folivoramanh <[email protected]> * update sparrowhawk Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: folivoramanh <[email protected]> * Remove irrelavant test case Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: folivoramanh <[email protected]> * remove irrelevant import and comment Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <[email protected]> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: folivoramanh <[email protected]> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <[email protected]> * add test case for range measure Signed-off-by: folivoramanh <[email protected]> * additional support for cardinal and remove duplicate test case Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: folivoramanh <[email protected]> * refractor minor code Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Fix word range (#334) * fix range and quote Signed-off-by: folivoramanh <[email protected]> * fix quote in post process Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: folivoramanh <[email protected]> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <[email protected]> * revert old codes Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: folivoramanh <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* fix bug with commas and electronics Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update jenkins Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Only mount TestData from path Signed-off-by: anand-nv <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0) - [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0) - [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0) - https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror - [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: folivoramanh <[email protected]> * Add missing init file Signed-off-by: folivoramanh <[email protected]> * Fix Cardinal and optimize logic Signed-off-by: folivoramanh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: Mai Anh <[email protected]> * update sparrowhawk Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * remove irrelevant import and comment Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: Mai Anh <[email protected]> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * add test case for range measure Signed-off-by: Mai Anh <[email protected]> * additional support for cardinal and remove duplicate test case Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: Mai Anh <[email protected]> * refractor minor code Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Fix Jenkinsfile for CI (#325) (#327) * Fix Jenkinsfile for CI * Fix requirements for test * Update paths and docker * Fix docker name * Fix click version * Change path of grammars for sparrowhawk tests * Update paths in sh_test.sh * Update paths * Revert paths --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Fix word range (#334) * fix range and quote Signed-off-by: Mai Anh <[email protected]> * fix quote in post process Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: Mai Anh <[email protected]> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * revert old codes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Staging vi tn signed off (#339) * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> * PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: Mai Anh <[email protected]> * Add missing init file Signed-off-by: Mai Anh <[email protected]> * Fix Cardinal and optimize logic Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: Mai Anh <[email protected]> * update sparrowhawk Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * remove irrelevant import and comment Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: Mai Anh <[email protected]> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * add test case for range measure Signed-off-by: Mai Anh <[email protected]> * additional support for cardinal and remove duplicate test case Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: Mai Anh <[email protected]> * refractor minor code Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Fix word range (#334) * fix range and quote Signed-off-by: Mai Anh <[email protected]> * fix quote in post process Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: Mai Anh <[email protected]> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * revert old codes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Comma bugfix for En electronics (#332) * fix bug with commas and electronics Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update jenkins Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * remove unuse import (#340) Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Update Jenkinsfile (#341) Only mount TestData from path Signed-off-by: anand-nv <[email protected]> Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] pre-commit suggestions (#335) updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0) - [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0) - [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0) - https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror - [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * update jenkins cache Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * fill missing lang in arg run (#347) Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
tbartley94
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs loggers removed. Please refactor LOCS so there's less nesting. Please reuse redundant code to keep file size down.
| graph = ( | ||
| # Thousands pattern (e.g., "hai nghìn không ba" -> "2003") | ||
| graph_hundred_component = pynini.union( | ||
| pynini.union(graph_digit, graph_zero) + delete_space + pynutil.delete("trăm"), pynutil.insert("0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without weighting you're going to get non-determinate behavior where a 0 is just inserted here.
| month_graph = _get_month_graph() | ||
|
|
||
| month_graph = pynutil.insert('month: "') + month_graph + pynutil.insert('"') | ||
| # Complete year graph with all supported patterns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you modularize this. it's hard to track all the different graphs with the nesting
| ), | ||
| ).optimize() | ||
|
|
||
| year_graph = pynutil.add_weight(year_graph_raw, YEAR_WEIGHT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need a dedicated weight or can you just reuse a general one ("min weight) for instance)
| def __init__(self): | ||
| super().__init__(name="electronic", kind="classify") | ||
|
|
||
| delete_extra_space = pynutil.delete(" ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should exist in the graph_utils class.
| protocol = pynutil.insert('protocol: "') + protocol + pynutil.insert('"') | ||
| graph |= protocol | ||
| graph = pynini.union(graph, protocol) | ||
| ######## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray comment
| start_time = time.time() | ||
| cardinal = CardinalFst(deterministic=deterministic) | ||
| cardinal_graph = cardinal.fst | ||
| logger.debug(f"cardinal: {time.time() - start_time: .2f}s -- {cardinal_graph.num_states()} nodes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the loggers
| key_cardinal = pynutil.delete("key_cardinal: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"") | ||
| integer = pynutil.delete("integer: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"") | ||
|
|
||
| graph_with_key = key_cardinal + delete_space + pynutil.insert(" ") + integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEMO_SPACE
| + pynini.closure(NEMO_NOT_QUOTE, 1) | ||
| + pynutil.delete("\"") | ||
| ) | ||
| graph = graph @ pynini.cdrewrite(pynini.cross(u"\u00a0", " "), "", "", NEMO_SIGMA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEMO_SPACE
tests/nemo_text_processing/vi/test_sparrowhawk_inverse_text_normalization.sh
Outdated
Show resolved
Hide resolved
* Refactor Vietnamese ITN taggers: modularize date, add data files, improve naming - Modularize date.py year components for better readability - Add weights to prevent non-deterministic behavior in insert operations - Remove redundant YEAR_WEIGHT constant (use inline weights) - Create zero_prefix.tsv and digit_special.tsv data files - Rename delete_extra_space to delete_single_space in electronic.py for clarity - Add delete_single_space to graph_utils for reuse Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Vietnamese: PSA follow Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Fixed
Show fixed
Hide fixed
Signed-off-by: Mai Anh <[email protected]>
tbartley94
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What does this PR do ?
Vietnamese TN v1 merged to main
Before your PR is "Ready for review"
Pre checks:
git commit -sto sign.pytestor (if your machine does not have GPU)pytest --cpufrom the root folder (given you marked your test cases accordingly@pytest.mark.run_only_on('CPU')).bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...pytestand Sparrowhawk here.__init__.pyfor every folder and subfolder, includingdatafolder which has .TSV files?Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.to all newly added Python files?Copyright 2015 and onwards Google, Inc.. See an example here.try import: ... except: ...) if not already done.PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.