html - PHP DomDocument - Chinese characters inside script tag malformed
I'm trying to parse a simple HTML document containing Chinese characters inside a script tag. However, after processing by PHP DOMDocument, they are converted to some weird characters.
<?php
$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
<script>
const str = "訂閱最新指南";
</script>
</head>
<body>
</body>
</html>
EOD;
$dom = new DOMDocument();
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($html);
// Trying different approaches to get correct output
echo $dom->saveHTML();
echo $dom->saveHTML($dom->documentElement);
echo utf8_decode($dom->saveHTML($dom->documentElement));
echo utf8_decode($dom->saveHTML());
Output:
<!DOCTYPE html>
<html>
<head>
<script>
const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
</script>
</head>
<body>
</body>
</html>
<html>
<head>
<script>
const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
</script>
</head>
<body>
</body>
</html><html>
<head>
<script>
const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
</script>
</head>
<body>
</body>
</html><!DOCTYPE html>
<html>
<head>
<script>
const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
</script>
</head>
<body>
</body>
</html>
Answer
Solution:
It seems to work without the mb_convert_encoding call:
<?php
$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
<script>
const str = "訂閱最新指南";
</script>
</head>
<body>
</body>
</html>
EOD;
$dom = new DOMDocument();
$dom->loadHTML($html);
echo utf8_decode($dom->saveHTML($dom->documentElement));
result:
<html>
<head><script>
const str = "訂閱最新指南";
</script></head>
<body>
</body>
</html>
With mb_convert_encoding:
<?php
$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
<script>
const str = "訂閱最新指南";
</script>
</head>
<body>
</body>
</html>
EOD;
$dom = new DOMDocument();
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($html);
echo html_entity_decode($dom->saveHTML($dom->documentElement));
result:
<html><head><script>
const str = "訂閱最新指南";
</script></head><body>
</body></html>
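If you control the markup, another option may be to declare the encoding in the document itself, so that neither utf8_decode nor the 'HTML-ENTITIES' conversion is needed (both are deprecated as of PHP 8.2). This is only a minimal sketch: the meta charset tag and the libxml_use_internal_errors call are my additions and are not in the original snippet. I would expect the script contents to come back unchanged, but the result can depend on the libxml2 version, so check it on your own setup.

```php
<?php
// Assumption: we are free to add <meta charset="UTF-8"> to the markup,
// which the original HTML does not contain.
$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<script>
const str = "訂閱最新指南";
</script>
</head>
<body>
</body>
</html>
EOD;

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Serialise starting from the root element, as in the snippets above.
$output = $dom->saveHTML($dom->documentElement);
echo $output;
```

Since the parser is told the input is UTF-8, there is no entity conversion before loading and no decoding step after saving.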