-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Restore backwards-compatible mappings of 0x5C and 0x7E in SJIS #8719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Before merging this, I would like to confirm whether there are any other SJIS mappings which are different between PHP <8.1 and >8.1. |
Ah, I messed up when creating this PR. It should be merged into |
Hmm, I don't understand why the test is failing now on GH action CI, but not on AppVeyor. |
I confirmed that 0x5C and 0x7E behavior is correct on my local environment. Also testing( |
It seems to be testing a completely different commit.
It's like it automatically merged my commit into The commit that I actually pushed is |
According to the relevant Japan Industrial Standards Committee standards, SJIS 0x5C is a Yen sign, and 0x7E is an overline. However, this conflicts with the implementation of SJIS in various legacy software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken as equivalent to the same ASCII bytes. Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes compatibly with Microsoft products. This was changed in PHP 8.1.0, in an attempt to comply with the JISC specifications. However, after discussion with various concerned Japanese developers, it seems that the historical behavior was more useful in the majority of applications which process SJIS-encoded text. Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE) to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore the mappings for those codepoints from before PHP 8.1.0.
Ah, yeah. That's actually a known issue: if you switch to another target branch via GH UI, AppVeyor doesn't get that (for whatever reason). You also need to rebase onto the new target branch, and (force-)push again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@alexdowad, please don't forget to merge up, and to add a NEWS entry, next time. |
I hope you didn't merge up already. I was just building and running the tests on |
Build will be broken on |
Ready to fix |
Oh, sorry! Don't worry too much; just fix "master" when you have time. I suggest to do this (if necessary) before merging into a lower branch, and then to commit & merge upwards right away. Otherwise, anyone committing to a lower branch, will either merge upwards your commit as well, or be totally confused about what is going on. :) |
Thanks, that is a much better idea. Should I add a notice in NEWS for 8.2.0? |
Oh, right, we already had 8.2.0alpha1, so there should also be NEWS entries in "master". |
I'll add one right now. |
According to the relevant Japan Industrial Standards Committee standards,
SJIS 0x5C is a Yen sign, and 0x7E is an overline.
However, this conflicts with the implementation of SJIS in various legacy
software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken
as equivalent to the same ASCII bytes.
Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes
compatibly with Microsoft products. This was changed in PHP 8.1.0, in an
attempt to comply with the JISC specifications. However, after discussion
with various concerned Japanese developers, it seems that the historical
behavior was more useful in the majority of applications which process
SJIS-encoded text.
Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as
equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE)
to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore
the mappings for those codepoints from before PHP 8.1.0.
As per discussion in #8281.
FYA @cmb69 @nikic @youkidearitai