PHP - Manual: POSIX Regex Functions

2026-08-03


POSIX Regex Functions

See Also

Warning

This feature was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0.

Alternatives to this feature include:

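PCRE (full regular expression support)
fnmatch() (shell-style wildcard matching)

ereg_replace - Replace regular expression
ereg - Regular expression match
eregi_replace - Replace regular expression case insensitive
eregi - Case insensitive regular expression match
split - Split string into array by regular expression
spliti - Split string into array by regular expression case insensitive
sql_regcase - Make regular expression for case insensitive match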


User Contributed Notes 19 notes


Edward Z. Yang ¶

12 years ago


The fact that the 'regex' functions are not binary safe has some very important security implications for people who are using ereg to validate their input data.

Suppose I have an expression:

<?php
$pattern = '^[[:alnum:]]*$';
?>

This should match any number of alphanumeric characters, right? Well, if the string you're matching is not binary, sure. However, say we have a null-byte tossed in the string:

<?php
$string = chr(0) . "<script>alert('xss')</script>";
echo ereg($pattern, $string);
?>

Will return true. Note that it is trivially easy to inject null bytes into PHP parameters:

index.php?content=%00ASCII

Scary. So unless you really know what you're doing, just use the PCRE preg_* functions.
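
For comparison, here is a minimal sketch of the same check with preg_match(), which is binary safe. The variable names $malicious and $clean are made up for this illustration, and \A and \z are used instead of ^ and $ so that a trailing newline cannot slip through either.

```php
<?php
// Sketch: the same validation with PCRE, which inspects the whole string.
// \A and \z anchor to the absolute start and end of the subject.
$pattern = '/\A[[:alnum:]]*\z/';

$malicious = chr(0) . "<script>alert('xss')</script>"; // same input as above
$clean     = 'abc123';

var_dump(preg_match($pattern, $malicious)); // int(0): the null byte is not a terminator here
var_dump(preg_match($pattern, $clean));     // int(1)
```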


david at NOgreenhammerSPAM dot com ¶

17 years ago


Sadly, the POSIX regexp evaluator (PHP 4.1.2) does not seem to support multi-character collating sequences, even though such sequences are included in the man-page documentation.

Specifically, the man page discusses the expression "[[.ch.]]*c", which matches the first five characters of "chchcc". Running this expression in ereg_replace generates the error "Warning: REG_ECOLLATE". (Running an equivalent expression with only one character between the periods does work, however.)

Multi-character collating sequences are not supported!

This is really, really too bad, because it would have provided a simple way to exclude words from the target.

I'm going to go learn PCRE, now.  :-(


trucex[at] gmail ¶

13 years ago


I was having a ton of issues with other people's phone number validation expressions, so I made my own. It works with most US phone numbers, including those with extensions. It matches any of the following formats:

5551234567
555 1234567
555 123 4567
555 123-4567
555-1234567
555-123-4567
555123-4567
(555)1234567
(555)123 4567
(555)123-4567
(555) 1234567
(555) 123-4567
(555) 123 4567

And any of the following extensions can be added, with or without a space between them and the number:
x123
x.123
x. 123
x 123
ext.123
ext. 123
ext 123
ext123

Extensions support between 1 and 5 digits.

Here is the expression:

$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';

Enjoy!
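
Since ereg() is gone in PHP 7, here is a rough sketch of how the expression above could be used with preg_match(). The function name is_us_phone() is invented for this example; the pattern body is unchanged, only wrapped in '/' delimiters.

```php
<?php
// Hypothetical helper: the pattern body is the one posted above,
// wrapped in '/' delimiters so that PCRE accepts it.
function is_us_phone(string $number): bool
{
    $regex = '/^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}'
           . '[0-9]{3}[- ]?'
           . '[0-9]{4}[ ]?'
           . '((x|ext)[.]?[ ]?[0-9]{1,5})?$/';

    return preg_match($regex, $number) === 1;
}

var_dump(is_us_phone('555-123-4567'));            // bool(true)
var_dump(is_us_phone('(555) 123-4567 ext. 123')); // bool(true)
var_dump(is_us_phone('155-123-4567'));            // bool(false): area code must start with 2-9
```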


tgt at tip dot nl ¶

14 years ago


Tip!
Character classes in regular expressions are useful and easy to use.

The following is a set of special values that denote certain common ranges. Their advantage is that they also take the locale into account, i.e. any variant of the local language/coding system.

[:digit:]     Only the digits 0 to 9.
[:alnum:]     Any alphanumeric character: 0 to 9, A to Z, or a to z.
[:alpha:]     Any alpha character: A to Z or a to z.
[:blank:]     Space and TAB characters only.
[:xdigit:]    Any hexadecimal digit: 0 to 9, A to F, or a to f.
[:punct:]     Punctuation symbols such as . , " ' ? ! ; :
[:print:]     Any printable character.
[:space:]     Any whitespace character.
[:graph:]     Any printable character except the space.
[:upper:]     Any alpha character A to Z.
[:lower:]     Any alpha character a to z.
[:cntrl:]     Any control character.
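
A small sketch of these classes in use, written with preg_match() because the ereg functions no longer exist in PHP 7. Note that a class name only works inside a bracket expression, so it is written [[:digit:]] rather than [:digit:]. The sample strings are arbitrary.

```php
<?php
// POSIX classes must sit inside a bracket expression: [[:digit:]], not [:digit:].
var_dump(preg_match('/^[[:alpha:]][[:alnum:]]*$/', 'abc123'));  // int(1)
var_dump(preg_match('/^[[:xdigit:]]+$/', 'DEADbeef'));          // int(1)
var_dump(preg_match('/^[[:upper:]]+$/', 'abc'));                // int(0)

// Several classes can share one bracket expression.
var_dump(preg_match('/^[[:digit:][:blank:]]+$/', '555 1234'));  // int(1)
```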


spiceee at potentialvalleys dot com ¶

17 years ago


sorry to be picky here but saying ^ is beginning of a line or $ is end of line is rather misleading, if you're working on a daily basis with regexes.

it might be that it is most of the time correct BUT in some occasions you'd be better off to think of ^ as "start of string" and $ as "end of string".

there are ways to change how your regex engine treats newlines when anchoring, which is what is commonly referred to as multiline regexes...
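
To make the distinction concrete, here is a short sketch using the PCRE functions (an assumption on my part, since the note does not name an engine). It shows ^ acting as "start of string" by default, the /m modifier turning on multiline behaviour, and the \A and \z anchors that always mean the absolute start and end.

```php
<?php
$text = "first\nsecond";

var_dump(preg_match('/^second/', $text));   // int(0): ^ is the start of the string
var_dump(preg_match('/^second/m', $text));  // int(1): with /m, ^ also matches after a newline

// Without extra modifiers, $ also matches just before a trailing newline.
var_dump(preg_match('/^abc$/', "abc\n"));   // int(1)
var_dump(preg_match('/\Aabc\z/', "abc\n")); // int(0): \z is the absolute end
```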


luciano_at_braziliantranslation.net ¶

17 years ago


mholdgate wrote a very nice quick reference guide in the next page (http://www.php.net/manual/en/function.ereg.php), but I felt it could be improved a little:
________________

^        Start of line
$        End of line
n?        Zero or only one single occurrence of character 'n'
n*        Zero or more occurrences of character 'n'
n+        At least one or more occurrences of character 'n'
n{2}        Exactly two occurrences of 'n'
n{2,}        At least 2 or more occurrences of 'n'
n{2,4}        From 2 to 4 occurrences of 'n'
.        Any single character
()        Parentheses to group expressions
(.*)        Zero or more occurrences of any single character, ie, anything!
(n|a)        Either 'n' or 'a'
[1-6]        Any single digit in the range between 1 and 6
[c-h]        Any single lower case letter in the range between c and h
[D-M]        Any single upper case letter in the range between D and M
[^a-z]        Any single character EXCEPT any lower case letter between a and z.

        Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the 
        very first character inside a range, and it denies the 
        entire range including the ^ symbol itself if it appears again 
        later in the range. Also remember that if it is the first 
        character in the entire expression, it means "start of line". 
        In any other place, it is always treated as a regular ^ symbol.
        In other words, you cannot deny a word with ^undesired_word 
        or a group with ^(undesired_phrase).
        Read more detailed regex documentation to find out what is 
        necessary to achieve this.

[_4^a-zA-Z]    Any single character which can be the underscore or the 
        number 4 or the ^ symbol or any letter, lower or upper case

?, +, * and the {} count parameters can be appended not only to a single character, but also to a group() or a range[].

therefore,
^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$
would mean:

^.{2}         = A line beginning with any two characters, 
[a-z]{1,2}     = followed by either 1 or 2 lower case letters, 
_?         = followed by an optional underscore, 
[0-9]*         = followed by zero or more digits, 
([1-6]|[a-f])     = followed by either a digit between 1 and 6 OR a 
        lower case letter between a and f, 
[^1-9]{2}     = followed by any two characters except digits 
        between 1 and 9 (0 is possible), 
a+$         = followed by at least one or more 
        occurrences of 'a' at the end of a line.
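
The walkthrough above can be checked quickly with preg_match(). This is only a sketch: the two sample strings are invented, and the expression is wrapped in '/' delimiters because PCRE requires them.

```php
<?php
// The example expression from the breakdown above, with PCRE delimiters added.
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// "XY" + "ab" + "_" + "123" + "c" + "00" + "aa"
var_dump(preg_match($pattern, 'XYab_123c00aa')); // int(1)

// Same string, but it no longer ends in 'a'.
var_dump(preg_match($pattern, 'XYab_123c00ab')); // int(0)
```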


nate[-at-]theklaibers[-dot-]com ¶

13 years ago


I am using a regex with the same thought process in mind as the earlier phone number one. However, I have also implemented it to allow a leading '1', so a number like

1 222 222 2222 would still be valid as well (along with all of the other combinations).

In my regex, I pull out the matches, not the exact string. So if someone were to forget a bracket, it wouldn't matter to the actual output, as it is stripped from that match.

So, if you put in 222) 233 3454, the matches would only pull out 1=>222, 2=>233, 3=>3454
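
The note does not include the actual expression, so the following is only a guess at the idea, written for preg_match(). The pattern in $regex is hypothetical: it allows an optional leading 1, tolerates stray brackets and separators, and captures the three digit groups.

```php
<?php
// Hypothetical pattern (not the poster's): optional leading 1, loose separators,
// and three capture groups for area code, exchange, and the last four digits.
$regex = '/^1?[ (]*([2-9][0-9]{2})[) .-]*([0-9]{3})[ .-]*([0-9]{4})$/';

if (preg_match($regex, '222) 233 3454', $m)) {
    // $m[0] is the whole match; the groups start at index 1.
    echo "1=>{$m[1]}, 2=>{$m[2]}, 3=>{$m[3]}\n"; // 1=>222, 2=>233, 3=>3454
}
```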

This has been very helpful in tweaking my regex.

Thanks,
Nate


ajd at cloudiness dot com ¶

13 years ago


A minor tweak to trucex' phone validator, because some people use a dot separator between the area code, exchange and four-digit block.
Posted here for your copy-and-paste convenience.

$regex = '^[(]?[2-9]{1}[0-9]{2}[) .-]{0,2}' . '[0-9]{3}[- .]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';


nothing at nothing dot com ¶

13 years ago


His regular expression is correct, the ^ is to check for the beginning of the string. It is just looking for delimiter characters, try putting slashes around it.
"/<regex>/"


stringer at stringerstudios dot com ¶

13 years ago


Hey trucex. Cool phone number function but your $regex produces the following error. Warning: No ending delimiter '^' found

Instead of:
$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';

I think it should be:
$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$^';
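
The "No ending delimiter" warning comes from the preg_* functions, which expect the pattern to be enclosed in delimiters. Here is a small sketch of what happens in each case, using a short stand-in pattern ($body is made up for this example). One thing to watch: when ^ is used as the delimiter, the pattern compiles, but the leading ^ no longer acts as an anchor.

```php
<?php
$body = '^[0-9]+$'; // a POSIX-style pattern without delimiters

// Passed as-is, PCRE takes '^' as the opening delimiter and fails to compile.
var_dump(@preg_match($body, '123'));               // bool(false), warning suppressed with @

// Wrapped in slashes, both anchors are kept.
var_dump(preg_match('/' . $body . '/', '123'));    // int(1)
var_dump(preg_match('/' . $body . '/', 'abc123')); // int(0)

// With '^' appended as the closing delimiter, the pattern compiles,
// but the effective pattern is now "[0-9]+$" with no start anchor.
var_dump(preg_match($body . '^', 'abc123'));       // int(1)
```

If the pattern body itself contains a slash, it would need escaping or a different delimiter such as '#'.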


mina86 at tlen dot pl ¶

15 years ago


I tested how fast POSIX and Perl regular expressions are, and here are the results:

           | POSIX Extended  | Perl-Compatible |   POSIX - Perl
-----------+-----------------+-----------------+-----------------
     match |    0.1296420097 |    0.1006720066 |  0.0289700031
   match i |    0.1204010248 |    0.1101620197 |  0.0102390051
   replace |    0.1896649599 |    0.1298999786 |  0.0597649813
 replace i |   10.6998120546 |    0.1453789473 | 10.5544331074

So, as you can see, preg_* functions are faster than ereg* functions. You can find the source code of my test script here: http://mina86.home.staszic.waw.pl/temp/regexp-speed-test.txt


Robin ¶

16 years ago


Ever wondered how to exclude "[" and "]"?
Here it goes: "[^][]". Extra characters to exclude can be added right in the middle like this: "[^]fobar[]".
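
As far as I can tell the same trick carries over to PCRE, where a ] placed first in a bracket expression (after the optional ^) is also taken literally. A quick sketch with preg_match(); the test strings are arbitrary.

```php
<?php
// "]" right after "[^" is a literal, and "[" inside a class is a literal too.
$no_brackets = '/^[^][]+$/';

var_dump(preg_match($no_brackets, 'foo')); // int(1)
var_dump(preg_match($no_brackets, 'f[o')); // int(0)
var_dump(preg_match($no_brackets, 'f]o')); // int(0)
```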


regex at dan42 dot cjb dot net ¶

17 years ago


Follow-up to my previous post:
Some simple optimization allowed me to realize that excluding a word at the beginning of a string has a degree of complexity O(n) rather than O(n^2). I only had to follow the logic:

if str[0] != badword[0] then OK
else
  if str[1] != badword[1] then OK
  else
    if str[2] != badword[2] then OK
    else ...

So excluding the word 'abc' at the beginning of a string is much simpler than I had made it out to be:
  ^([^a]|a[^b]|ab[^c])
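
With the PCRE functions there is a shorter route, the negative lookahead (?!...), which POSIX regexes do not have. The sketch below compares it with the expression above; the variable names and sample strings are made up for the illustration.

```php
<?php
$manual    = '/^([^a]|a[^b]|ab[^c])/'; // the expression above, with delimiters
$lookahead = '/^(?!abc)/';             // PCRE only: "not followed by abc"

foreach (['abd', 'xbc', 'abcdef', 'ab'] as $s) {
    printf("%-7s manual=%d lookahead=%d\n",
        $s, preg_match($manual, $s), preg_match($lookahead, $s));
}
```

The two agree on the first three strings. They differ on 'ab': the manual expression needs a character in the position where the bad word would continue, so it rejects strings that are shorter than the excluded word, while the lookahead accepts them.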


regex at dan42 dot cjb dot net ¶

17 years ago


It's easy to exclude characters, but excluding words with a regular expression is a bit more tricky. For parentheses there is no equivalent to the ^ for brackets. The only way I've found to exclude a string is to proceed by inverse logic: accept all the words that do NOT correspond to the string. So if you want to accept all strings except those _beginning_ with "abc", you'd have to accept any string that matches one of the following:
  ^(ab[^c])
  ^(a[^b]c)
  ^(a[^b][^c])
  ^([^a]bc)
  ^([^a]b[^c])
  ^([^a][^b]c)
  ^([^a][^b][^c])

which, put together, gives the regex
  ^(ab[^c]|a[^b]c|a[^b][^c]|[^a]bc|[^a]b[^c]|[^a][^b]c|[^a][^b][^c])

Note that this won't work to detect the word "abc" anywhere in a string. You need to have some way of anchoring the inverse word match
like: ^(a[^b]|[^a]b|[^a][^b])   ;"ab" not at beginning of line
  or: (a[^b]|[^a]b|[^a][^b])$   ;"ab" not at end of line
  or: 123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"

I don't know why "(abc){0,0}" is invalid syntax. It would've made all this much simpler.
 
 
Slightly off-topic, here's a regex date validator (format yyyy-mm-dd, remove all spaces and linefeeds):
  ^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|
  (0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|
  [02468][48]|[13579][26])-02-29)$
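
Here is a sketch of the date expression joined into one string and run through preg_match(), next to a cross-check based on PHP's checkdate(). The helper is_real_date() is invented for this comparison. One difference shows up: the expression does not list "00" among the leap years, so it rejects 2000-02-29, which checkdate() accepts.

```php
<?php
// The expression above with spaces and linefeeds removed, plus '/' delimiters.
$date_regex = '/^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|'
            . '(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|'
            . '[02468][48]|[13579][26])-02-29)$/';

// Hypothetical cross-check: parse yyyy-mm-dd, then let checkdate() decide.
function is_real_date(string $s): bool
{
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $s, $m)) {
        return false;
    }
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]); // month, day, year
}

foreach (['2024-02-29', '2023-02-29', '2023-04-31', '2000-02-29'] as $d) {
    printf("%s regex=%d checkdate=%d\n",
        $d, preg_match($date_regex, $d), (int) is_real_date($d));
}
```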


swordsteel ¶

6 years ago


I wanted to get Ü, Å, Ä, Ö and some more into my checks.

<?php
$chars = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'Þ', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ð', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ý', 'ý', 'þ', 'ÿ');

// Map each character's byte value (in hex) to the character.
// ord() only reads the first byte, so this assumes a single-byte encoding such as ISO-8859-1.
$list = array();
foreach ($chars as $char) {
    $list[dechex(ord($char))] = $char;
}

// Sort by hex code and print the table.
ksort($list);
foreach ($list as $key => $val) {
    echo "$key = $val, ";
}
?>

That gave me this:

c0 = À, c1 = Á, c2 = Â, c3 = Ã, c4 = Ä, c5 = Å, c6 = Æ, c7 = Ç, c8 = È, c9 = É, ca = Ê, cb = Ë, cc = Ì, cd = Í, ce = Î, cf = Ï, d1 = Ñ, d2 = Ò, d3 = Ó, d4 = Ô, d5 = Õ, d6 = Ö, d8 = Ø, d9 = Ù, da = Ú, db = Û, dc = Ü, dd = Ý, de = Þ, df = ß, e0 = à, e1 = á, e2 = â, e3 = ã, e4 = ä, e5 = å, e6 = æ, e7 = ç, e8 = è, e9 = é, ea = ê, eb = ë, ec = ì, ed = í, ee = î, ef = ï, f0 = ð, f1 = ñ, f2 = ò, f3 = ó, f4 = ô, f5 = õ, f6 = ö, f8 = ø, f9 = ù, fa = ú, fb = û, fd = ý, fe = þ, ff = ÿ

Condensed into ranges, that is:

\\xC0-\\xD6 \\xD8-\\xF6 \\xF8-\\xFB \\xFD-\\xFF

Hope this helps someone with Ü, Å, Ä, Ö.
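
For UTF-8 input the same ranges can be written as code points with the PCRE /u modifier. This is only a sketch: it assumes UTF-8 subjects and a PCRE build with Unicode support, and the ranges are copied as posted. The \p{L} property is shown as a broader alternative that matches any Unicode letter.

```php
<?php
// Assumes UTF-8 subjects; the /u modifier makes PCRE work on code points.
$accented = '/^[\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FB}\x{FD}-\x{FF}]+$/u';

// "\u{DC}\u{C5}\u{C4}\u{D6}" is Ü Å Ä Ö written with escapes,
// so the example does not depend on the encoding of this file.
var_dump(preg_match($accented, "\u{DC}\u{C5}\u{C4}\u{D6}")); // int(1)
var_dump(preg_match($accented, 'abc'));                      // int(0)

// Broader alternative: any Unicode letter.
var_dump(preg_match('/^\p{L}+$/u', "\u{DC}ber"));            // int(1)
```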


Anonymous ¶

17 years ago


if you are looking for the abbreviations like tab, carriage return, regex-class definitions  

you should look here: 
http://elvin.dstc.edu.au/doc/regex.html

some excerpts:

control characters
    \a    bell
    \b    backspace
    \f    form feed
    \n    line feed
    \r    carriage return
    \t    horizontal tab
    \v    vertical tab

class example
    \cLu    all uppercase letters


annie ¶

14 years ago


Another nice tutorial about regular expressions: http://www.mkssoftware.com/docs/man5/regexp.5.asp


bps7j at yahoo dot com ¶

16 years ago


Something that really got me: I'm used to using Perl's regexps, and so I used \s to check for a whitespace character in a password on a website. My PHP book (Wrox Press, Professional PHP Programming) agreed with me that this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it did was keep anyone from joining the site if they put an 's' in their password! So beware, check for subtle differences between what you're used to and PHP.

[[:space:]] works fine, by the way.
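
With the pcre functions, \s does mean whitespace, and [[:space:]] is accepted as well. A minimal sketch; the sample passwords are made up.

```php
<?php
// In PCRE, \s is a whitespace class, not a literal 's'.
var_dump(preg_match('/\s/', 'pass word'));           // int(1): contains a space
var_dump(preg_match('/\s/', 'password'));            // int(0): the letter 's' does not count

// The POSIX class works in PCRE too.
var_dump(preg_match('/[[:space:]]/', "pass\tword")); // int(1)
```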

I'm going to use the pcre functions from now on... I like Perl :o)


paper ¶

16 years ago


I have also experienced the same problem as bps7j@yahoo.com had been experiencing, except I did not recognize the problem until after many hours of debugging.

"\s" does not seem to represent spaces, however "[[:space:]]" does.

Another problem I was having was matching dashes/hyphens '-'. You must escape them "\-" and place them at the end of a bracket expression.

Example: To match a blank string or a string containing only uppercase letters, underscores, spaces, and hyphens:

^([A-Z_\-]|[[:space:]])*$

Hope this saves someone some time from debugging like I was. :)
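
A variant of the same check for preg_match(), offered as a sketch rather than a drop-in replacement: the space class and the hyphen are folded into a single bracket expression, and the hyphen is placed last so that it is taken literally without a backslash.

```php
<?php
// Uppercase letters, underscore, whitespace, and a literal hyphen (last in the brackets).
$pattern = '/^[A-Z_[:space:]-]*$/';

var_dump(preg_match($pattern, ''));                // int(1): blank string is allowed
var_dump(preg_match($pattern, 'FOO_BAR-BAZ QUX')); // int(1)
var_dump(preg_match($pattern, 'foo'));             // int(0): lowercase is rejected
```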


Official page: https://www.php.net/manual/en/ref.regex.php
