mb_convert_encoding "\" (backslash) and "~" (tilde) BC breaks to Shift_JIS-2004 #9528

Closed
youkidearitai opened this issue Sep 13, 2022 · 5 comments

Comments

@youkidearitai
Contributor

Description

The following code:

<?php

$backslash_and_tilde = "~\\";

$sjis2004_backslash_and_tilde = mb_convert_encoding($backslash_and_tilde, "SJIS-2004", "UTF-8");
var_dump(
        bin2hex($sjis2004_backslash_and_tilde),
        $re_convert = mb_convert_encoding($sjis2004_backslash_and_tilde, "UTF-8", "SJIS-2004"),
        bin2hex($re_convert)
);

Resulted in this output:

string(8) "8160815f"
string(6) "〜＼"
string(12) "e3809cefbcbc"

But I expected this output instead:

string(4) "7e5c"
string(5) "‾¥"
string(10) "e280bec2a5"

Similar to #8281. The expected behavior held from PHP 5.4 through 8.0 (3v4l: https://3v4l.org/42pAL). Please keep PHP 8.1 backward compatible.

PHP Version

PHP 8.1.x

Operating System

No response

@cmb69
Member

cmb69 commented Sep 13, 2022

Similar to #8281 .

And I think we should fix it like that.

@alexdowad, what do you think?

@alexdowad
Contributor

@cmb69 Agreed.

@alexdowad
Contributor

@youkidearitai Are you aware of any other Shift-JIS-2004 mappings which have changed?

In this case, your concern is just about Unicode -> JIS mappings, not JIS -> Unicode, is that correct?

@youkidearitai
Contributor Author

@alexdowad Thanks for the reply.

Are you aware of any other Shift-JIS-2004 mappings which have changed?

Sorry, I'm not aware of any other Shift_JIS-2004 mappings that have changed in PHP. However, Shift_JIS-2004 is upward compatible with Shift_JIS, so normally nothing should change.

In this case, your concern is just about Unicode -> JIS mappings, not JIS -> Unicode, is that correct?

Yes, in this case my concern is only the Unicode -> JIS mappings.
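
To make the two directions concrete, here is a minimal sketch that prints the hex bytes for each step separately. The helper name `hex_roundtrip` is made up for illustration, and no expected output is shown because the result depends on the PHP version in use.

```php
<?php
// Hypothetical helper (not part of mbstring): show both conversion
// directions for one UTF-8 input so they can be inspected separately.
function hex_roundtrip(string $utf8): array
{
    // Unicode -> JIS direction (the one this issue is about)
    $sjis = mb_convert_encoding($utf8, "SJIS-2004", "UTF-8");
    // JIS -> Unicode direction
    $back = mb_convert_encoding($sjis, "UTF-8", "SJIS-2004");

    return [
        'utf8_in'  => bin2hex($utf8),
        'sjis'     => bin2hex($sjis),
        'utf8_out' => bin2hex($back),
    ];
}

// Inspect the backslash and the tilde one at a time.
foreach (["\\", "~"] as $ch) {
    print_r(hex_roundtrip($ch));
}
```

Comparing the `sjis` field across PHP versions isolates the Unicode -> JIS step; the `utf8_out` field shows what the reverse mapping does with those bytes.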

@alexdowad
Contributor

@youkidearitai Thanks. Hopefully I can prepare a PR later today...

Thanks again.

alexdowad added a commit to alexdowad/php-src that referenced this issue Oct 4, 2022
In 0d0029d and 315d48b, I changed the mappings used for Unicode
to Shift-JIS-2004, in an attempt to follow the JISC specification
more closely. However, feedback from Japanese PHP users indicates
that most users of SJIS-2004 expect 0x5C and 0x7E to be treated as
equivalent to the same ASCII bytes. This is due to a long history of
non-complying implementations which then became a de-facto standard.

Therefore, restore the earlier mappings for U+005C and U+007E.

Thanks to the GitHub user 'youkidearitai' for reporting this issue.

Fixes phpGH-9528.
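
A rough way to check a given PHP build for this behavior is to scan printable ASCII and list every character whose SJIS-2004 encoding is not the identical single byte. This is only a sketch: the function name `sjis2004_changed_ascii` is hypothetical, and it checks the Unicode -> SJIS-2004 direction only. Going by the report above, an affected PHP 8.1 build would be expected to list U+005C and U+007E, and a build with the earlier mappings restored should not list them.

```php
<?php
// Hypothetical regression check (not from the actual patch): collect the
// printable ASCII characters that do not survive UTF-8 -> SJIS-2004 as
// the same single byte.
function sjis2004_changed_ascii(): array
{
    $changed = [];
    for ($cp = 0x20; $cp <= 0x7E; $cp++) {
        $ch = chr($cp); // ASCII characters are single bytes in UTF-8
        $sjis = mb_convert_encoding($ch, "SJIS-2004", "UTF-8");
        if ($sjis !== $ch) {
            // Key by code point label, value is the hex of the SJIS-2004 bytes
            $changed[sprintf('U+%04X', $cp)] = bin2hex($sjis);
        }
    }
    return $changed;
}

var_dump(sjis2004_changed_ascii());
```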