PHP - Manual: mb_convert_encoding

2026-07-25

mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)

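mb_convert_encoding - Convert a string from one character encoding to another

Description

Converts string from from_encoding, or the current internal encoding, to to_encoding. If string is an array, all of its string values are converted recursively.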
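As a minimal sketch of the recursive array behaviour described above (the sample data and variable names are made up for illustration, and array input assumes PHP 7.2 or later):

```php
<?php
// Hypothetical sample data: ISO-8859-1 bytes nested inside an array.
$row = [
    'name'  => "Andr\xE9",
    'tags'  => ["caf\xE9", "na\xEFve"],
    'count' => 3, // non-string scalar values are copied unchanged
];

// Every string value, at any depth, is converted from ISO-8859-1 to UTF-8.
$utf8Row = mb_convert_encoding($row, 'UTF-8', 'ISO-8859-1');
```

The array keys here are plain ASCII on purpose, so the sketch only exercises the conversion of values.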

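Parameters

string

The string or array to convert.

to_encoding

The desired encoding of the result.

from_encoding

The current encoding used to interpret string. Multiple encodings may be specified as an array or a comma-separated list, in which case the correct encoding is guessed using the same algorithm as mb_detect_encoding().

If from_encoding is omitted or null, the mbstring.internal_encoding setting is used if it is set; otherwise the default_charset setting is used.

See supported encodings for valid values of to_encoding and from_encoding.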
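A short sketch of passing several candidate encodings as from_encoding. The sample string is an assumption chosen so that only one candidate fits; with ambiguous input the guess can be wrong, so an explicit single encoding is safer whenever the source is known.

```php
<?php
// ISO-8859-1 bytes: the lone 0xE9 followed by a space is not valid UTF-8,
// so UTF-8 is ruled out as the source encoding during detection.
$latin1 = "caf\xE9 au lait";

// Candidates given as an array; the string 'UTF-8, ISO-8859-1' is accepted as well.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', ['UTF-8', 'ISO-8859-1']);
```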

Return Values

The encoded string or array on success, or false on failure.

Errors/Exceptions

As of PHP 8.0.0, a ValueError is thrown if the value of to_encoding or from_encoding is an invalid encoding. Prior to PHP 8.0.0, an E_WARNING was emitted instead.
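One way to handle both behaviours in the same code base is sketched below. The helper name convertOrNull() is hypothetical and not part of mbstring; it simply maps the ValueError (PHP 8) and the false return value (older versions) to null.

```php
<?php
// Hypothetical helper: returns the converted string, or null when the
// conversion fails or an encoding name is not recognised.
function convertOrNull(string $text, string $to, string $from): ?string
{
    try {
        $result = mb_convert_encoding($text, $to, $from);
    } catch (ValueError $e) {
        // PHP 8.0+: invalid to_encoding or from_encoding
        return null;
    }

    // Before PHP 8.0 an invalid encoding yields false (plus an E_WARNING).
    return $result === false ? null : $result;
}

$ok  = convertOrNull("abc", "UTF-8", "ASCII");
$bad = convertOrNull("abc", "UTF-8", "NO-SUCH-ENCODING");
```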

Changelog

Version	Description
8.2.0	mb_convert_encoding() no longer returns the following non-text encodings: `"Base64"`, `"QPrint"`, `"UUencode"`, `"HTML entities"`, `"7 bit"` and `"8 bit"`.
8.0.0	mb_convert_encoding() now throws a ValueError when `to_encoding` is an invalid encoding.
8.0.0	mb_convert_encoding() now throws a ValueError when `from_encoding` is an invalid encoding.
8.0.0	`from_encoding` is now nullable.
7.2.0	This function now also accepts an array as `string`. Formerly, only strings were supported.

Examples

Example #1 mb_convert_encoding() example

<?php
/* Convert the internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

/* Auto-detect the encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* If mbstring.language is "Japanese", "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>

See Also

mb_detect_order() - Set/Get character encoding detection order
UConverter::transcode() - Convert a string from one character encoding to another
iconv() - Convert a string from one character encoding to another

User Contributed Notes 30 notes

josip at cubrad dot com ¶

11 years ago

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8. After several days of searching I found a function that partially solved my problem, but it still did not convert all the characters. So I made this:

function w1250_to_utf8($text) {
    // map based on:
    // http://konfiguracja.c0.pl/iso02vscp1250en.html
    // http://konfiguracja.c0.pl/webpl/index_en.html#examp
    // http://www.htmlentities.com/html/entities/
    $map = array(
        chr(0x8A) => chr(0xA9),
        chr(0x8C) => chr(0xA6),
        chr(0x8D) => chr(0xAB),
        chr(0x8E) => chr(0xAE),
        chr(0x8F) => chr(0xAC),
        chr(0x9C) => chr(0xB6),
        chr(0x9D) => chr(0xBB),
        chr(0xA1) => chr(0xB7),
        chr(0xA5) => chr(0xA1),
        chr(0xBC) => chr(0xA5),
        chr(0x9F) => chr(0xBC),
        chr(0xB9) => chr(0xB1),
        chr(0x9A) => chr(0xB9),
        chr(0xBE) => chr(0xB5),
        chr(0x9E) => chr(0xBE),
        chr(0x80) => '&euro;',
        chr(0x82) => '&sbquo;',
        chr(0x84) => '&bdquo;',
        chr(0x85) => '&hellip;',
        chr(0x86) => '&dagger;',
        chr(0x87) => '&Dagger;',
        chr(0x89) => '&permil;',
        chr(0x8B) => '&lsaquo;',
        chr(0x91) => '&lsquo;',
        chr(0x92) => '&rsquo;',
        chr(0x93) => '&ldquo;',
        chr(0x94) => '&rdquo;',
        chr(0x95) => '&bull;',
        chr(0x96) => '&ndash;',
        chr(0x97) => '&mdash;',
        chr(0x99) => '&trade;',
        chr(0x9B) => '&rsquo;',
        chr(0xA6) => '&brvbar;',
        chr(0xA9) => '&copy;',
        chr(0xAB) => '&laquo;',
        chr(0xAE) => '&reg;',
        chr(0xB1) => '&plusmn;',
        chr(0xB5) => '&micro;',
        chr(0xB6) => '&para;',
        chr(0xB7) => '&middot;',
        chr(0xBB) => '&raquo;',
    );
    return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');
}
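
The map above works around Windows-1250 by rewriting bytes into ISO-8859-2 first. Where the iconv extension is available, and assuming the underlying iconv library recognises the name 'WINDOWS-1250' (this varies by platform), a direct conversion may be enough. The function name below is made up for illustration.

```php
<?php
// Sketch only: relies on the iconv extension and on the platform's iconv
// library knowing 'WINDOWS-1250'. Returns false if the conversion fails.
function w1250_to_utf8_iconv(string $text)
{
    return iconv('WINDOWS-1250', 'UTF-8', $text);
}

// 0x8A is "Š" (U+0160) in Windows-1250.
$converted = w1250_to_utf8_iconv("\x8A");
```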

regrunge at hotmail dot it ¶

14 years ago

I was trying to find the charset of a Norwegian text file (with a lot of ø, æ, å) written on a Mac, and I found it this way:

<?php
$text = "A strange string to pass, maybe with some ø, æ, å characters.";

foreach (mb_list_encodings() as $chr) {
    echo mb_convert_encoding($text, 'UTF-8', $chr) . " : " . $chr . "<br>";
}
?>

The line that looks right tells you which encoding the text was written in.

Hope this helps someone.
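
To shorten the list that has to be checked by eye, the candidates can first be filtered with mb_check_encoding(), which reports whether the bytes are valid in a given encoding. This is only a sketch: plausibleEncodings() is a hypothetical helper, and a byte sequence that is valid in an encoding is not necessarily text that was written in it.

```php
<?php
// Hypothetical helper: keep only the encodings in which $bytes is valid.
function plausibleEncodings(string $bytes, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// "\xC3\xB8" is "ø" in UTF-8; the same two bytes are also valid
// (but different) ISO-8859-1 text, and they are not ASCII.
$matches = plausibleEncodings("bl\xC3\xB8t", ['ASCII', 'UTF-8', 'ISO-8859-1']);
```

Single-byte encodings such as ISO-8859-1 accept almost any input, so they tend to survive the filter and still need a visual check.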

Julian Egelstaff ¶

2 years ago

If you have what looks like ISO-8859-1, but it includes "smart quotes" courtesy of Microsoft software, or people cutting and pasting content from Microsoft software, then what you're actually dealing with is probably Windows-1252. Try this:

<?php
$cleanText = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
?>

The annoying part is that the auto detection (ie: the mb_detect_encoding function) will often think Windows-1252 is ISO-8859-1. Close, but no cigar. This is critical if you're then trying to do unserialize on the resulting text, because the byte count of the string needs to be perfect.
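
A small sketch of the difference, using made-up sample bytes: the same input produces different characters and different byte counts depending on which source encoding is assumed.

```php
<?php
// 0x93 and 0x94 are curly double quotes in Windows-1252,
// but C1 control codes in ISO-8859-1.
$bytes = "\x93quoted\x94";

$fromCp1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
$fromLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// The curly quotes take 3 bytes each in UTF-8, the control codes 2 bytes each.
$lengths = [strlen($fromCp1252), strlen($fromLatin1)];
```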

volker at machon dot biz ¶

17 years ago

Hey guys. For everybody who's looking for a function that converts an ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:

// Written as plain functions; add "public" when using them as class methods.
function encodeToUtf8($string) {
     return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}

function encodeToIso($string) {
     return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}

These functions work fine for me. Give them a try.

Rainer Perske ¶

2 years ago

Text-encoding HTML-ENTITIES will be deprecated as of PHP 8.2.

To convert all non-ASCII characters into entities (to produce pure 7-bit HTML output), I was using:

<?php
echo mb_convert_encoding( htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' ), 'HTML-ENTITIES', 'UTF-8' );
?>

I can get the identical result with:

<?php
echo mb_encode_numericentity( htmlentities( $text, ENT_QUOTES, 'UTF-8' ), [0x80, 0x10FFFF, 0, ~0], 'UTF-8' );
?>

The output contains well-known named entities for some often used characters and numeric entities for the rest.

francois at bonzon point com ¶

16 years ago

aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:

mbstring.substitute_character = "none"

in your php.ini. Be sure to include the quotes around none. Or at run-time with

<?php
ini_set('mbstring.substitute_character', "none");
?>

eion at bigfoot dot com ¶

19 years ago

Many people below talk about using

<?php
    mb_convert_encoding($s, 'HTML-ENTITIES', 'UTF-8');
?>

to convert non-ASCII text into HTML-readable form. Because my web server is out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would automatically convert it to nasty latin1 and not leave it in its beautiful UTF-8 glory.

So [insert korean characters here] turned into ?????.

I found myself needing to pass by reference (which of course is deprecated/nonexistent in recent versions of PHP), so instead of

<?php
    // Call-time pass-by-reference: a fatal error as of PHP 5.4
    mb_convert_encoding(&$s, 'HTML-ENTITIES', 'UTF-8');
?>

which worked perfectly until I upgraded, I had to use

<?php
    call_user_func_array('mb_convert_encoding', array(&$s, 'HTML-ENTITIES', 'UTF-8'));
?>

Hope it helps someone else out.

aaron at aarongough dot com ¶

16 years ago

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)

Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding. 

<?php
function convert_to($source, $target_encoding)
{
    // detect the character encoding of the incoming file
    $encoding = mb_detect_encoding($source, "auto");

    // escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    $target = str_replace("?", "[question_mark]", $source);

    // convert the string to the target encoding
    $target = mb_convert_encoding($target, $target_encoding, $encoding);

    // remove any question marks that have been introduced because of illegal characters
    $target = str_replace("?", "", $target);

    // replace the token string "[question_mark]" with the symbol "?"
    $target = str_replace("[question_mark]", "?", $target);

    return $target;
}
?>

Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)
-A

urko at wegetit dot eu ¶

12 years ago

If you are trying to generate a CSV (with extended characters) to be opened in Excel for Mac, the only thing that worked for me was:

<?php mb_convert_encoding( $CSV, 'Windows-1252', 'UTF-8'); ?>

I also tried this:

<?php
// Fields separated OK, characters wrong
iconv('MACINTOSH', 'UTF8', $CSV);

// Fields separated wrong, characters OK
chr(255).chr(254).mb_convert_encoding( $CSV, 'UCS-2LE', 'UTF-8');
?>

But the first one didn't show extended characters correctly, and the second one didn't separate fields correctly.

Stephan van der Feest ¶

19 years ago

To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:

function htmltoflash($htmlstr)
{
  return str_replace("&lt;br /&gt;","\n",
    str_replace("<","&lt;",
      str_replace(">","&gt;",
        mb_convert_encoding(html_entity_decode($htmlstr),
        "UTF-8","ISO-8859-1"))));
}

Daniel Trebbien ¶

15 years ago

Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.

me at gsnedders dot com ¶

15 years ago

It appears that when dealing with an unknown "from encoding" the function will both emit an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

vasiliauskas dot agnius at gmail dot com ¶

6 years ago

When you need to convert from HTML-ENTITIES but your UTF-8 string is partially broken (not all characters are valid UTF-8), passing the string to mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES') corrupts its characters even more. In this case you need to replace the HTML entities one at a time to preserve the characters that are already correctly encoded. I wrote this closure for the job:
<?php
$decode_entities = function($string) {
    preg_match_all("/&#?\w+;/", $string, $entities, PREG_SET_ORDER);
    $entities = array_unique(array_column($entities, 0));
    foreach ($entities as $entity) {
        $decoded = mb_convert_encoding($entity, 'UTF-8', 'HTML-ENTITIES');
        $string = str_replace($entity, $decoded, $string);
    }
    return $string;
};
?>

bmxmale at qwerty dot re ¶

3 years ago

/**
 * Convert Windows-1250 to UTF-8
 * Based on https://www.php.net/manual/en/function.mb-convert-encoding.php#112547
 */
class TextConverter
{
    private const ENCODING_TO = 'UTF-8';
    private const ENCODING_FROM = 'ISO-8859-2';

    private array $mapChrChr = [
        0x8A => 0xA9,
        0x8C => 0xA6,
        0x8D => 0xAB,
        0x8E => 0xAE,
        0x8F => 0xAC,
        0x9C => 0xB6,
        0x9D => 0xBB,
        0xA1 => 0xB7,
        0xA5 => 0xA1,
        0xBC => 0xA5,
        0x9F => 0xBC,
        0xB9 => 0xB1,
        0x9A => 0xB9,
        0xBE => 0xB5,
        0x9E => 0xBE
    ];

    private array $mapChrString = [
        0x80 => '&euro;',
        0x82 => '&sbquo;',
        0x84 => '&bdquo;',
        0x85 => '&hellip;',
        0x86 => '&dagger;',
        0x87 => '&Dagger;',
        0x89 => '&permil;',
        0x8B => '&lsaquo;',
        0x91 => '&lsquo;',
        0x92 => '&rsquo;',
        0x93 => '&ldquo;',
        0x94 => '&rdquo;',
        0x95 => '&bull;',
        0x96 => '&ndash;',
        0x97 => '&mdash;',
        0x99 => '&trade;',
        0x9B => '&rsquo;',
        0xA6 => '&brvbar;',
        0xA9 => '&copy;',
        0xAB => '&laquo;',
        0xAE => '&reg;',
        0xB1 => '&plusmn;',
        0xB5 => '&micro;',
        0xB6 => '&para;',
        0xB7 => '&middot;',
        0xBB => '&raquo;'
    ];

    /**
     * @param $text
     * @return string
     */
    public function execute($text): string
    {
        $map = $this->prepareMap();

        return html_entity_decode(
            mb_convert_encoding(strtr($text, $map), self::ENCODING_TO, self::ENCODING_FROM),
            ENT_QUOTES,
            self::ENCODING_TO
        );
    }

    /**
     * @return array
     */
    private function prepareMap(): array
    {
        $maps[] = $this->arrayMapAssoc(function ($k, $v) {
            return [chr($k), chr($v)];
        }, $this->mapChrChr);

        $maps[] = $this->arrayMapAssoc(function ($k, $v) {
            return [chr($k), $v];
        }, $this->mapChrString);

        return array_merge([], ...$maps);
    }

    /**
     * @param callable $function
     * @param array $array
     * @return array
     */
    private function arrayMapAssoc(callable $function, array $array): array
    {
        return array_column(
            array_map(
                $function,
                array_keys($array),
                $array
            ),
            1,
            0
        );
    }
}

chzhang at gmail dot com ¶

16 years ago

Instead of ini_set(), you can try this:

mb_substitute_character("none");
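
A brief sketch of that call in context. The sample string is an assumption; the previous setting is saved and restored so the change does not leak into other code.

```php
<?php
$previous = mb_substitute_character();   // remember the current setting

mb_substitute_character('none');         // drop characters the target encoding lacks
// The euro sign (U+20AC) does not exist in ISO-8859-1, so it is removed
// instead of being replaced with "?".
$latin1 = mb_convert_encoding("price: 5\u{20AC}", 'ISO-8859-1', 'UTF-8');

mb_substitute_character($previous);      // restore the earlier behaviour
```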

katzlbtjunk at hotmail dot com ¶

17 years ago

Clean a string for use as a filename by simply replacing all unwanted characters with an underscore (ASCII converts to 7bit). It removes slightly more characters than necessary. Hope it's useful.

$fileName = 'Test:!"$%&/()=ÖÄÜöäü<<';
echo strtr(mb_convert_encoding($fileName,'ASCII'), 
    ' ,;:?*#!§$%&/(){}<>=`´|\\\'"', 
    '____________________________');

lanka at eurocom dot od dot ua ¶

22 years ago

Another sample of recoding without MultiByte enabling.
(Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)

<?php
// 0 - win
// 1 - koi
function detect_encoding($str) {
    $win = 0;
    $koi = 0;

    for($i=0; $i<strlen($str); $i++) {
      if( ord($str[$i]) >224 && ord($str[$i]) < 255) $win++;
      if( ord($str[$i]) >192 && ord($str[$i]) < 223) $koi++;
    }

    if( $win < $koi ) {
      return 1;
    } else return 0;

  }

// recodes koi to win
function koi_to_win($string) {

$kw = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238, 239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250, 222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206, 207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);
$wk = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240, 242,  243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241, 193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);

    $end = strlen($string);
    $pos = 0;
    do {
      $c = ord($string[$pos]);
      if ($c>128) {
        $string[$pos] = chr($kw[$c-128]);
      }

    } while (++$pos < $end);

    return $string;
  }

  function recode($str) {

    $enc = detect_encoding($str);
    if ($enc==1) {
      $str = koi_to_win($str);
    }

    return $str;
  }
?>

nicole ¶

9 years ago

// convert UTF8 to DOS = CP850 
//
// $utf8_text=UTF8-Formatted text;
// $dos=CP850-Formatted text;

// have fun

$dos = mb_convert_encoding($utf8_text, "CP850", mb_detect_encoding($utf8_text, "UTF-8, CP850, ISO-8859-15", true));

Tom Class ¶

19 years ago

Why did you use the PHP HTML encode functions? mbstring has its own encoding, which is (as far as I tested it) much more useful:

HTML-ENTITIES

Example:

$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");

Daniel ¶

9 years ago

If you are attempting to convert "UTF-8" text to "ISO-8859-1" and the result is always returning in "ASCII", place the following line of code before the mb_convert_encoding:

mb_detect_order(array('UTF-8', 'ISO-8859-1'));

It is necessary to force a specific search order for the conversion to work

mac.com@nemo ¶

18 years ago

For those wanting to convert from $set to MacRoman, use iconv():

<?php

$string = iconv('UTF-8', 'macintosh', $string);

?>

('macintosh' is the IANA name for the MacRoman character set.)

David Hull ¶

18 years ago

As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:

<?php
$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);
?>

The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.
-- 
David

aofg ¶

17 years ago

When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead.
Kishu-Izon (platform-dependent) characters are converted correctly with this encoding, just as with eucJP-win or SJIS-win.

jamespilcher1 - hotmail ¶

21 years ago

Be careful when converting from ISO-8859-1 to UTF-8.

Even if you explicitly specify the character encoding of a page as ISO-8859-1 (via headers and strict XML definitions), Windows 2000 will ignore that and interpret it as whatever character set it has natively installed.

For example, I wrote char #128 into a page with character encoding ISO-8859-1, and it displayed in Internet Explorer (and Mozilla) as a euro symbol.

It should have displayed a box, denoting that char #128 is undefined in ISO-8859-1. The problem was that it was being displayed in "Windows: western europe" (my native character set).

This led to confusion when I tried to convert this euro to UTF-8 via mb_convert_encoding().

IE displays UTF-8 correctly, and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.

So all I saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.

nospam at nihonbunka dot com ¶

16 years ago

rodrigo at bb2 dot co dot jp wrote that iconv works better than mb_convert_encoding. I find that when converting from UTF-8 to Shift_JIS,
$conv_str = mb_convert_encoding($str,$toCS,$fromCS); 
works, while
$conv_str = iconv($fromCS,$toCS.'//IGNORE',$str); 
removes tildes from $str.

gullevek at gullevek dot org ¶

14 years ago

If you want to convert Japanese text to ISO-2022-JP, it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. This includes the extended character set and avoids ? in the text. For example, the often used "1 in a circle" ① will then be converted correctly.

StigC ¶

16 years ago

For the php-noobs (like me) - working with flash and php.

Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:

<?php
// Name Escape
$escName = mb_convert_encoding($_POST["Name"], "ISO-8859-1", "UTF-8");

// message escape
$escMessage = mb_convert_encoding($_POST["Message"], "ISO-8859-1", "UTF-8");

// Headers.. and so on...
?>

rodrigo at bb2 dot co dot jp ¶

17 years ago

For those who can't use mb_convert_encoding() to convert from one charset to another because of an older PHP version, try iconv().

I had this problem converting to a Japanese charset:

$txt=mb_convert_encoding($txt,'SJIS',$this->encode);

And I could fix it by using this:

$txt = iconv('UTF-8', 'SJIS', $txt);

Maybe it's helpful for someone else! ;)

phpdoc at jeudi dot de ¶

18 years ago

I'd like to share some code to convert Latin diacritics to their traditional 7bit representation, for example:

- à, ç, é, î, ... to a, c, e, i, ...
- ß to ss
- ä, Ä, ... to ae, Ae, ...
- ë, ... to e, ...

(mb_convert "7bit" would simply delete any offending characters.)

I might have missed your country's typographic conventions; correct me then.

<?php
/**
 * @args string $text line of encoded text
 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
 *
 * @returns 7bit representation
 */
function to7bit($text,$from_enc) {
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
             '/&([aouAOU])uml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e',"$1"),
        $text);
    return $text;
}
?>

Enjoy :-)
Johannes

==
[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]
==

An addendum to my "to7bit" function above.

The function is supposed to solve the problem that some languages require a different 7bit rendering of special (umlauted) characters for sorting or other applications. For example, the German ß ligature is usually written "ss" in a 7bit context. Dutch ÿ is typically rendered "ij" (not "y").

The original function works well with word (alphabet) character entities and I've seen it used in many places. But non-word entities cause funny results:
E.g., "&copy;" is rendered as "c", "&shy;" as "s" and "&rquo;" as "r".
The following version fixes this by converting non-alphanumeric characters (also chains thereof) to '_'.

<?php
/**
 * @args string $text line of encoded text
 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
 *
 * @returns 7bit representation
 */
function to7bit($text,$from_enc) {
    // Collapse runs of non-word characters into '_'. Note: without the u
    // modifier, \W by default also matches the bytes of non-ASCII characters.
    $text = preg_replace('/\W+/','_',$text);
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
             '/&([aouAOU])uml;/','/&yuml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e','ij',"$1"),
        $text);
    return $text;
}
?>

Enjoy again,
Johannes

Stephan van der Feest ¶

19 years ago

Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.

Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:

function utf8html($utf8str)
{
  return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"));
}

Official page: https://www.php.net/manual/en/function.mb-convert-encoding.php
