PHP's utf8_decode and C#'s Encoding.UTF8.GetString returning different outputs for the same input
I have this PHP code that converts two byte arrays (one with 32 bytes, the other with 70 bytes) into UTF-8 strings using utf8_decode():
$bytes32 = [144, 204, 205, 119, 77, 176, 172, 140, 110, 162, 222, 255, 14, 38, 252, 82, 118, 138, 130, 124, 145, 199, 55, 162, 224, 80, 102, 141, 140, 57, 194, 36];
$string32 = implode(array_map("chr", $bytes32));
$string32Utf8 = utf8_decode($string32);
$bytes70 = [239, 191, 189, 239, 191, 189, 239, 191, 189, 119, 77, 239, 191, 189, 239, 191, 189, 239, 191, 189, 110, 239, 191, 189, 239, 191, 189, 239, 191, 189, 14, 38, 239, 191, 189, 82, 118, 239, 191, 189, 239, 191, 189, 124, 239, 191, 189, 239, 191, 189, 55, 239, 191, 189, 239, 191, 189, 80, 102, 239, 191, 189, 239, 191, 189, 57, 239, 191, 189, 36];
$string70 = implode(array_map("chr", $bytes70));
$string70Utf8 = utf8_decode($string70);
echo '$string32Utf8: ' . $string32Utf8; // echoes ???wM???n??&?Rv??|??7??Pf??9?$
echo '$string70Utf8: ' . $string70Utf8; // echoes ???wM???n???&?Rv??|??7??Pf??9?$
echo '$string32Utf8 === $string70Utf8: ' . json_encode($string32Utf8 === $string70Utf8); // echoes false
I then have this C# code that does the same thing using Encoding.UTF8.GetString():
using System.Text; // Encoding lives in System.Text
byte[] bytes32 = new byte[] { 144, 204, 205, 119, 77, 176, 172, 140, 110, 162, 222, 255, 14, 38, 252, 82, 118, 138, 130, 124, 145, 199, 55, 162, 224, 80, 102, 141, 140, 57, 194, 36 };
string string32Utf8 = Encoding.UTF8.GetString(bytes32);
byte[] bytes70 = new byte[] { 239, 191, 189, 239, 191, 189, 239, 191, 189, 119, 77, 239, 191, 189, 239, 191, 189, 239, 191, 189, 110, 239, 191, 189, 239, 191, 189, 239, 191, 189, 14, 38, 239, 191, 189, 82, 118, 239, 191, 189, 239, 191, 189, 124, 239, 191, 189, 239, 191, 189, 55, 239, 191, 189, 239, 191, 189, 80, 102, 239, 191, 189, 239, 191, 189, 57, 239, 191, 189, 36 };
string string70Utf8 = Encoding.UTF8.GetString(bytes70);
Console.WriteLine("string32Utf8: " + string32Utf8); // Writes пїЅпїЅпїЅwMпїЅпїЅпїЅnпїЅпїЅпїЅ&пїЅRvпїЅпїЅ|пїЅпїЅ7пїЅпїЅPfпїЅпїЅ9пїЅ$
Console.WriteLine("string70Utf8: " + string70Utf8); // Writes пїЅпїЅпїЅwMпїЅпїЅпїЅnпїЅпїЅпїЅ&пїЅRvпїЅпїЅ|пїЅпїЅ7пїЅпїЅPfпїЅпїЅ9пїЅ$
Console.WriteLine("string32Utf8 == string70Utf8: " + (string32Utf8 == string70Utf8)); // Writes true
First of all, in C#, both byte arrays result in the same string after conversion, unlike in PHP. Second, the C# strings are different from the PHP ones.
Is there a function in PHP that will actually return the same output as C#'s Encoding.UTF8.GetString() given the same input? Or is there something I'm missing that's actually resulting in the different outputs between C# and PHP?
Answer
Solution:
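The 32-byte array in your example is not valid UTF-8. The 70-byte array is valid UTF-8, but only because it is the 32-byte array with every ill-formed sequence already replaced by U+FFFD REPLACEMENT CHARACTER (bytes 239, 191, 189, i.e. EF BF BD). The пїЅ symbols in the C# output are what those three UTF-8 bytes look like when displayed as Windows-1251 text. They mean Encoding.UTF8.GetString() used a replacement character to represent each encoded input byte sequence that could not be converted to an output character, which is why both arrays decode to the same C# string. Check the DecoderReplacementFallback remarks for more details.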
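To see this relationship directly, here is a minimal sketch (assuming the mbstring extension is available; mb_scrub() needs PHP 7.2+) that replaces each ill-formed sequence in the 32-byte array with U+FFFD and compares the result with the 70-byte array. How mbstring splits ill-formed input into replaceable sequences has changed between PHP versions, so the final comparison may not hold on older releases.

```php
<?php
// Byte arrays from the question.
$bytes32 = [144, 204, 205, 119, 77, 176, 172, 140, 110, 162, 222, 255, 14, 38, 252, 82, 118, 138, 130, 124, 145, 199, 55, 162, 224, 80, 102, 141, 140, 57, 194, 36];
$bytes70 = [239, 191, 189, 239, 191, 189, 239, 191, 189, 119, 77, 239, 191, 189, 239, 191, 189, 239, 191, 189, 110, 239, 191, 189, 239, 191, 189, 239, 191, 189, 14, 38, 239, 191, 189, 82, 118, 239, 191, 189, 239, 191, 189, 124, 239, 191, 189, 239, 191, 189, 55, 239, 191, 189, 239, 191, 189, 80, 102, 239, 191, 189, 239, 191, 189, 57, 239, 191, 189, 36];

// Build raw binary strings, one byte per array element.
$string32 = pack('C*', ...$bytes32);
$string70 = pack('C*', ...$bytes70);

// The 70-byte string is well-formed UTF-8 (ASCII plus EF BF BD triples); the 32-byte one is not.
var_dump(mb_check_encoding($string70, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($string32, 'UTF-8')); // bool(false)

// Substitute U+FFFD (as Encoding.UTF8 does) instead of mbstring's default '?'.
mb_substitute_character(0xFFFD);

// mb_scrub() re-encodes UTF-8 as UTF-8, replacing every ill-formed sequence.
$scrubbed = mb_scrub($string32, 'UTF-8');

// Print both as hex so any difference is easy to spot, then compare.
echo bin2hex($scrubbed), PHP_EOL;
echo bin2hex($string70), PHP_EOL;
var_dump($scrubbed === $string70);
```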
However, you can still reproduce the substitution behavior of Encoding.UTF8.GetString() in PHP, with ? standing in for U+FFFD, so that both arrays give the same string:
$bytes32 = [144, 204, 205, 119, 77, 176, 172, 140, 110, 162, 222, 255, 14, 38, 252, 82, 118, 138, 130, 124, 145, 199, 55, 162, 224, 80, 102, 141, 140, 57, 194, 36];
$string32 = \pack('C*', ...$bytes32);
$string32Utf8 = \mb_convert_encoding($string32, 'ASCII', 'UTF-8');
$bytes70 = [239, 191, 189, 239, 191, 189, 239, 191, 189, 119, 77, 239, 191, 189, 239, 191, 189, 239, 191, 189, 110, 239, 191, 189, 239, 191, 189, 239, 191, 189, 14, 38, 239, 191, 189, 82, 118, 239, 191, 189, 239, 191, 189, 124, 239, 191, 189, 239, 191, 189, 55, 239, 191, 189, 239, 191, 189, 80, 102, 239, 191, 189, 239, 191, 189, 57, 239, 191, 189, 36];
$string70 = \pack('C*', ...$bytes70);
$string70Utf8 = \mb_convert_encoding($string70, 'ASCII', 'UTF-8');
\var_dump($string32Utf8, $string70Utf8, $string32Utf8 === $string70Utf8);
You can test it here: https://3v4l.org/je8gf
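The code above builds the binary strings with pack('C*', ...) rather than the question's implode(array_map("chr", ...)). For byte values 0 to 255 the two approaches produce the same string, as this small check illustrates, so the different outputs come from the decoding call rather than from how the bytes are assembled.

```php
<?php
$bytes32 = [144, 204, 205, 119, 77, 176, 172, 140, 110, 162, 222, 255, 14, 38, 252, 82, 118, 138, 130, 124, 145, 199, 55, 162, 224, 80, 102, 141, 140, 57, 194, 36];

// Question's approach: one chr() call per byte value, then join.
$viaChr = implode(array_map('chr', $bytes32));

// Answer's approach: pack each value as an unsigned char ('C').
$viaPack = pack('C*', ...$bytes32);

var_dump(strlen($viaChr), strlen($viaPack)); // both lengths are 32
var_dump($viaChr === $viaPack);              // bool(true)
```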
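Things I did differently:
- I built the binary strings with pack('C*', ...) instead of chr(). The chr() function documentation notes that "this function is not aware of any string encoding, and in particular cannot be passed a Unicode code point value to generate a string in a multibyte encoding like UTF-8 or UTF-16." For raw byte values 0 to 255, though, chr() returns exactly that single byte, so implode(array_map("chr", ...)) produces the same binary string; this change is stylistic and not the cause of the difference.
- I replaced utf8_decode(), which "Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1". It decodes to ISO-8859-1, so it does not mirror Encoding.UTF8.GetString() (it is also deprecated as of PHP 8.2). To replicate the Encoding.UTF8.GetString() example, what we really need to do is convert the UTF-8 encoded binary string to ASCII, and you can do that with the mb_convert_encoding() function, like this:
  mb_convert_encoding($utf8String, 'ASCII', 'UTF-8')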
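One caveat with the ASCII target: mbstring substitutes every character that ASCII cannot represent, not only ill-formed bytes, whereas Encoding.UTF8.GetString() keeps valid non-ASCII characters. If the input may contain real non-ASCII text, a closer sketch (assuming PHP 7.2+ with mbstring) is to scrub the string as UTF-8 with U+FFFD as the substitute character. The input "café" plus a stray 0x90 byte below is a hypothetical example, not data from the question.

```php
<?php
// Hypothetical input: valid UTF-8 "café" followed by a stray invalid byte (0x90).
$input = "caf\u{E9}" . "\x90";

// Converting to ASCII: both the valid "é" and the invalid byte become '?'.
mb_substitute_character(0x3F); // '?', mbstring's default substitute
$ascii = mb_convert_encoding($input, 'ASCII', 'UTF-8');

// Scrubbing as UTF-8 with U+FFFD keeps "é" and replaces only the invalid byte,
// matching what Encoding.UTF8.GetString() returns for the same bytes.
mb_substitute_character(0xFFFD);
$scrubbed = mb_scrub($input, 'UTF-8');

var_dump($ascii);             // string(5) "caf??"
echo bin2hex($scrubbed), PHP_EOL; // "café" bytes followed by efbfbd
```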
Hope these comments will help!