[ fixed #119 ] latin1 encoding: each byte counts as 1 char #156
Merged
simonmar merged 1 commit into haskell:master on Jan 27, 2020
The computation of the length component of `AlexToken` was tailored to the utf8 encoding, and didn't work correctly for latin1. This is fixed by having a new flag `ALEX_LATIN1` in `templates/GenericTemplate.hs` that turns on code that increases the length by 1 for each byte, while for utf8 something more sophisticated is done.

The fix requires more template instances to be generated. To streamline the instance generation, all 2^4 = 16 template instances are now generated for the 4 flags:

- ghc
- latin1
- nopred
- debug

To ensure consistent reference to the template instance, the function `templateFileName`, which resides in both `src/Main` and `gen-alex-sdist/Main`, needs to be kept consistent should more dimensions be added to the template. (Putting this function into a separate file that is included by both modules could be an option, but seemed not enough in the spirit of cabal-organized projects.)
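For readers who want a concrete picture, here is a minimal sketch of the two ideas in the description. It is not the code from `templates/GenericTemplate.hs` or `src/Main`. The names `Encoding`, `tokenLength`, `Flag`, `allVariants` and `templateFileNameSketch` are hypothetical, the file-name scheme is invented for illustration, and treating "something more sophisticated" as "skip UTF-8 continuation bytes" is an assumption about one plausible approach.

```haskell
import Data.Bits ((.&.))
import Data.List (intercalate, subsequences)
import Data.Word (Word8)

-- | Input encoding assumed by the lexer (illustrative type, not from Alex).
data Encoding = Latin1 | UTF8
  deriving (Eq, Show)

-- | A byte of the form 10xxxxxx continues a multi-byte UTF-8 sequence.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- | Number of characters represented by the bytes of a token.
-- Latin1: every byte is one character.
-- UTF8 (assumed approach): count only bytes that start a character.
tokenLength :: Encoding -> [Word8] -> Int
tokenLength Latin1 = length
tokenLength UTF8   = length . filter (not . isContinuation)

-- | The four template flags named in the description.
data Flag = Ghc | Latin1Flag | Nopred | Debug
  deriving (Eq, Show, Enum, Bounded)

-- | Suffix used for a flag in the hypothetical file-name scheme below.
flagSuffix :: Flag -> String
flagSuffix Ghc        = "ghc"
flagSuffix Latin1Flag = "latin1"
flagSuffix Nopred     = "nopred"
flagSuffix Debug      = "debug"

-- | Every subset of the four flags: 2^4 = 16 template variants.
allVariants :: [[Flag]]
allVariants = subsequences [minBound .. maxBound]

-- | Hypothetical naming scheme; the real templateFileName may differ.
templateFileNameSketch :: [Flag] -> String
templateFileNameSketch flags =
  intercalate "-" ("AlexTemplate" : map flagSuffix flags)
```

Under these assumptions, the two bytes `[0xC3, 0xA9]` give `tokenLength Latin1` = 2 and `tokenLength UTF8` = 1. Deriving the file name from the flag set in a single function is what keeps the generator and the consumer in agreement, which is why the description stresses keeping both copies of `templateFileName` consistent.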
Member
Nice. Thanks!
Hi, it looks like this (and some other merges) were not included in the recent Alex 3.2.6 release. Understandable, since it was a stopgap for a GHC release. This fix to the Latin-1 mode would be helpful in order to fix a [...]. Any info on when a new release can happen with some of these PRs that have been merged since 3.2.5?
Collaborator
Yes, I suppose I should release another now that GHC is finally using 3.2.5. I did want to finish #174 first; I guess I should get on that.