Skip to content

Restore backwards-compatible mappings of 0x5C and 0x7E in SJIS #8719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 11, 2022

Conversation

alexdowad
Copy link
Contributor

According to the relevant Japan Industrial Standards Committee standards,
SJIS 0x5C is a Yen sign, and 0x7E is an overline.

However, this conflicts with the implementation of SJIS in various legacy
software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken
as equivalent to the same ASCII bytes.

Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes
compatibly with Microsoft products. This was changed in PHP 8.1.0, in an
attempt to comply with the JISC specifications. However, after discussion
with various concerned Japanese developers, it seems that the historical
behavior was more useful in the majority of applications which process
SJIS-encoded text.

Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as
equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE)
to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore
the mappings for those codepoints from before PHP 8.1.0.

As per discussion in #8281.

FYA @cmb69 @nikic @youkidearitai

@alexdowad
Copy link
Contributor Author

Before merging this, I would like to confirm whether there are any other SJIS mappings which are different between PHP <8.1 and >8.1.

@alexdowad
Copy link
Contributor Author

Ah, I messed up when creating this PR. It should be merged into PHP-8.1, not master.

@alexdowad alexdowad changed the base branch from master to PHP-8.1 June 6, 2022 21:21
@cmb69
Copy link
Member

cmb69 commented Jun 7, 2022

Hmm, I don't understand why the test is failing now on GH action CI, but not on AppVeyor.

@youkidearitai
Copy link
Contributor

I confirmed that 0x5C and 0x7E behavior is correct on my local environment. Also testing($ make test) is success on my local environment.

@alexdowad
Copy link
Contributor Author

alexdowad commented Jun 10, 2022

Hmm, I don't understand why the test is failing now on GH action CI, but not on AppVeyor.

It seems to be testing a completely different commit.

/usr/bin/git checkout --progress --force refs/remotes/pull/8719/merge
  Note: switching to 'refs/remotes/pull/8719/merge'.
  
  You are in 'detached HEAD' state. You can look around, make experimental
  changes and commit them, and you can discard any commits you make in this
  state without impacting any branches by switching back to a branch.
  
  If you want to create a new branch to retain commits you create, you may
  do so (now or later) by using -c with the switch command. Example:
  
    git switch -c <new-branch-name>
  
  Or undo this operation with:
  
    git switch -
  
  Turn off this advice by setting config variable advice.detachedHead to false
  
  HEAD is now at 6872ccad Merge 0b308eadfa60889f63d9a9d9800db07d040954ec into 8e00e8209b0c65a1079c6b1fdd4e410dd85310ce
/usr/bin/git log -1 --format='%H'
'6872ccadf220486fb64771bd22d87cfb9077e25a'

It's like it automatically merged my commit into master rather than PHP-8.1, and then ran the tests on that.

The commit that I actually pushed is 0b308eadfa, not 6872ccadf2.

According to the relevant Japan Industrial Standards Committee standards,
SJIS 0x5C is a Yen sign, and 0x7E is an overline.

However, this conflicts with the implementation of SJIS in various legacy
software (notably Microsoft products), where SJIS 0x5C and 0x7E are taken
as equivalent to the same ASCII bytes.

Prior to PHP 8.1, mbstring's implementation of SJIS handled these bytes
compatibly with Microsoft products. This was changed in PHP 8.1.0, in an
attempt to comply with the JISC specifications. However, after discussion
with various concerned Japanese developers, it seems that the historical
behavior was more useful in the majority of applications which process
SJIS-encoded text.

Since we are now treating SJIS 0x5C as equivalent to U+005C and 0x7E as
equivalent to U+007E, it does not make sense to convert U+203E (OVERLINE)
to 0x7E, nor does it make sense to convert U+00A5 (YEN SIGN) to 0x5C. Restore
the mappings for those codepoints from before PHP 8.1.0.
@cmb69
Copy link
Member

cmb69 commented Jun 10, 2022

It's like it automatically merged my commit into master rather than PHP-8.1, and then ran the tests on that.

Ah, yeah. That's actually a known issue: if you switch to another target branch via GH UI, AppVeyor doesn't get that (for whatever reason). You also need to rebase onto the new target branch, and (force-)push again.

@alexdowad
Copy link
Contributor Author

@cmd69 @nikic I think this is ready to merge. Any comments?

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@alexdowad alexdowad merged commit 2dc9026 into php:PHP-8.1 Jun 11, 2022
@alexdowad alexdowad deleted the fixsjis branch June 11, 2022 13:47
@cmb69
Copy link
Member

cmb69 commented Jun 11, 2022

@alexdowad, please don't forget to merge up, and to add a NEWS entry, next time.

@alexdowad
Copy link
Contributor Author

@alexdowad, please don't forget to merge up, and to add a NEWS entry, next time.

I hope you didn't merge up already. I was just building and running the tests on master before merging up.

@alexdowad
Copy link
Contributor Author

Build will be broken on master now, I will need to quickly push a fix. (I intended to include the same modification for the fast conversion filters in the merge commit to master.)

@alexdowad
Copy link
Contributor Author

Ready to fix master. Should I include something in the NEWS for 8.2.0?

@cmb69
Copy link
Member

cmb69 commented Jun 11, 2022

Oh, sorry! Don't worry too much; just fix "master" when you have time.

I suggest to do this (if necessary) before merging into a lower branch, and then to commit & merge upwards right away. Otherwise, anyone committing to a lower branch, will either merge upwards your commit as well, or be totally confused about what is going on. :)

@alexdowad
Copy link
Contributor Author

Oh, sorry! Don't worry too much; just fix "master" when you have time.

I suggest to do this (if necessary) before merging into a lower branch, and then to commit & merge upwards right away. Otherwise, anyone committing to a lower branch, will either merge upwards your commit as well, or be totally confused about what is going on. :)

Thanks, that is a much better idea.

Should I add a notice in NEWS for 8.2.0?

@cmb69
Copy link
Member

cmb69 commented Jun 11, 2022

Should I add a notice in NEWS for 8.2.0?

Oh, right, we already had 8.2.0alpha1, so there should also be NEWS entries in "master".

@alexdowad
Copy link
Contributor Author

I'll add one right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants