PHP - Manual: 递归模式

2026-07-25

递归模式

考虑匹配圆括号内字符串的问题，允许无限嵌套括号。如果不使用递归，最好的方式是使用一个模式匹配固定深度的嵌套。它不能处理任意深度的嵌套。 perl 5.6 提供了一个实验性的功能允许正则表达式递归。特殊项 (?R) 提供了递归的这种特殊用法。这个PCRE模式解决了圆括号问题(假设 PCRE_EXTENDED 选项被设置了，因此空白字符被忽略)： \( ( (?>[^()]+) | (?R) )* \)

首先，它匹配一个左括号。然后它匹配任意数量的非括号字符序列或一个模式自身的递归匹配(比如，一个正确的括号子串)，最终，匹配一个右括号。

这个例子模式包含无限重复的嵌套，因此使用了一次性子组匹配非括号字符，这在模式应用到模式不匹配的字符串时非常重要。比如，当它应用到 (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() 时就会很快的产生”不匹配”结果。然而，如果不使用一次性子组，这个匹配将会运行很长时间，因为有很多途径让 + 和 * 重复限定分隔目标字符串，并且在报告失败之前需要测试所有路径。

所有捕获子组最终被设置的捕获值都是从递归最外层子模式捕获的值。如果上面的模式匹配 (ab(cd)ef) ，捕获子组最终被设置的值为 ”ef”，即顶级得到的最后一个值。如果增加了额外的括号 \( ( ( (?>[^()]+) | (?R) )* ) \) 捕获到的字符串就是顶层括号的匹配内容 ”ab(cd)ef”。如果在模式中有超过 15 个捕获括号， PCRE 在递归期间就会使用 pcre_malloc 分配额外的内存来存储数据，随后通过 pcre_free 释放他们。如果没有内存可被分配，它就仅保存前 15 个捕获括号，在递归内部无法给出内存不够用的错误。

(?1)、(?2) 等可以用于递归子组。这同样可以用于命名子组： (?P>name) 或 (?P&name)。

如果递归子组语法在它提到的子组括号外部使用(无论是子组数字序号还是子组名称)，这个操作就相当于程序设计语言中的子程序。前面一些有一个例子指出模式 (sens|respons)e and \1ibility 匹配 ”sense and responsibility” 和 ”response and responsibility”，但是不匹配 ”sense and responsibility”。如果用模式 (sens|respons)e and (?1)ibility 替代，它会像匹配那两个字符串一样匹配 ”sense and responsibility”。这种引用方式意义是紧接着匹配引用的子模式。(译注: 后向引用只匹配引用的子组之前匹配的结果, 这里的递归语法引用是拿引用的子模式重新匹配。)

目标字符串的最大长度是 int 型变量可以存储的最大正整数。然而， PCRE 使用递归处理子组和无限重复。这就是说对于某些模式可用的栈空间可能会受目标字符串限制。

发现了问题？

了解如何改进此页面 • 提交拉取请求 • 报告一个错误

＋添加备注

用户贡献的备注 8 notes

down

horvath at webarticum dot hu ¶

11 years ago

With the (?R) item you can link only to the full pattern, because it quasi equals to (?0). You can not use anchors, asserts etc., and you can only check that string CONTAINS a valid hierarchy or not.

This is wrong: ^\(((?>[^()]+)|(?R))*\)$

However, you can bracketing the full expression, and replace (?R) to the relative link (?-2). This make it reusable. So you can check complex expressions, for example:
<?php

$bracket_system = "(\\(((?>[^()]+)|(?-2))*\\))"; // (reuseable)
$bracket_systems = "((?>[^()]+)?$bracket_system)*(?>[^()]+)?"; // (reuseable)
$equation = "$bracket_systems=$bracket_systems"; // Both side of the equation must be contain valid bracket systems
var_dump(preg_match("/^$equation\$/","a*(a-(2a+2))=4(a+3)-2(a-(a-2))")); // Outputs 'int(1)'
var_dump(preg_match("/^$equation\$/","a*(a-(2a+2)=4(a+3)-2(a-(a-2)))")); // Outputs 'int(0)'

?>

You can also catch multibyte quotes with the 'u' modifier (if you use UTF-8), eg:
<?php

$quoted = "(»((?>[^»«]+)|(?-2))*«)"; // (reuseable)
$prompt = "[\\w ]+: $quoted";
var_dump(preg_match("/^$prompt\$/u","Your name: »write here«")); // Outputs 'int(1)'

?>

down

emanueledelgrande at email dot it ¶

15 years ago

The recursion in regular expressions is the only way to allow the parsing of HTML code with nested tags of indefinite depth.
It seems it's not yet a spreaded practice; not so much contents are available on the web regarding regexp recursion, and until now no user contribute notes have been published on this manual page.
I made several tests with complex patterns to get tags with specific attributes or namespaces, studying the recursion of a subpattern only instead of the full pattern.
Here's an example that may power a fast LL parser with recursive descent (http://en.wikipedia.org/wiki/Recursive_descent_parser):

$pattern = "/<([\w]+)([^>]*?) (([\s]*\/>)| (>((([^<]*?|<\!\-\-.*?\-\->)| (?R))*)<\/\\1[\s]*>))/xsm";

The performances of a preg_match or preg_match_all function call over an avarage (x)HTML document are quite fast and may drive you to chose this way instead of classic DOM object methods, which have a lot of limits and are usually poor in performance with their workarounds, too.
I post a sample application in a brief function (easy to be turned into OOP), which returns an array of objects:

<?php
// test function:
function parse($html) {
// I have split the pattern in two lines not to have long lines alerts by the PHP.net form:
$pattern = "/<([\w]+)([^>]*?)(([\s]*\/>)|".
"(>((([^<]*?|<\!\-\-.*?\-\->)|(?R))*)<\/\\1[\s]*>))/sm";
preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE);
$elements = array();

    foreach ($matches[0] as $key => $match) {
$elements[] = (object)array(
'node' => $match[0],
'offset' => $match[1],
'tagname' => $matches[1][$key][0],
'attributes' => isset($matches[2][$key][0]) ? $matches[2][$key][0] : '',
'omittag' => ($matches[4][$key][1] > -1), // boolean
'inner_html' => isset($matches[6][$key][0]) ? $matches[6][$key][0] : ''
);
    }
    return $elements;
}

// random html nodes as example:
$html = <<<EOD
<div id="airport">
    <div geo:position="1.234324,3.455546" class="index">
        <!-- comment test:
        <div class="index_top" />
        -->
        <div class="element decorator">
                <ul class="lister">
                    <li onclick="javascript:item.showAttribute('desc');">
                        <h3 class="outline">
                            <a href="http://php.net/manual/en/regexp.reference.recursive.php" onclick="openPopup()">Link</a>
                        </h3>
                        <div class="description">Sample description</div>
                    </li>
                </ul>
        </div>
        <div class="clean-line"></div>
    </div>
</div>
<div id="omittag_test" rel="rootChild" />
EOD;

// application:
$elements = parse($html);

if (count($elements) > 0) {
    echo "Elements found: <b>".count($elements)."</b><br />";

    foreach ($elements as $element) {
        echo "<p>Tpl node: <pre>".htmlentities($element->node)."</pre>
        Tagname: <tt>".$element->tagname."</tt><br />
        Attributes: <tt>".$element->attributes."</tt><br />
        Omittag: <tt>".($element->omittag ? 'true' : 'false')."</tt><br />
        Inner HTML: <pre>".htmlentities($element->inner_html)."</pre></p>";
    }
}
?>

down

Onyxagargaryll ¶

13 years ago

Here's an approach to create a multidimensional array according to a string's delimiters, i.e. we want to analyze...

"some text (aaa(b(c1)(c2)d)e)(test) more text"

... as multidimensional layers.

<?php
$string = "some text (aaa(b(c1)(c2)d)e)(test) more text";

/*
 * Analyses the string multidimensionally by its opening and closing delimiters
 */
function recursiveSplit($string, $layer) {
preg_match_all("/\((([^()]*|(?R))*)\)/",$string,$matches);
// iterate thru matches and continue recursive split
if (count($matches) > 1) {
        for ($i = 0; $i < count($matches[1]); $i++) {
            if (is_string($matches[1][$i])) {
                if (strlen($matches[1][$i]) > 0) {
                    echo "<pre>Layer ".$layer.":   ".$matches[1][$i]."</pre><br />";
recursiveSplit($matches[1][$i], $layer + 1);
                }
            }
        }
    }
}

recursiveSplit($string, 0);

/*

Output:

Layer 0:   aaa(b(c1)(c2)d)e

Layer 1:   b(c1)(c2)d

Layer 2:   c1

Layer 2:   c2

Layer 0:   test
*/
?>

down

Anonymous ¶

8 years ago

sass parse sample

<?php

$data = 'a { b { 1 } c { d { 2 } } }';

preg_match('/a (?<R>\{(?:[^{}]+|(?&R))*\})/', $data, $m);
preg_match('/b (?<R>\{(?:[^{}]+|(?&R))*\})/', $data, $m);
preg_match('/c (?<R>\{(?:[^{}]+|(?&R))*\})/', $data, $m);
preg_match('/d (?<R>\{(?:[^{}]+|(?&R))*\})/', $data, $m);

/*
Array (
    [0] => a { b { 1 } c { d { 2 } } }
    [R] => { b { 1 } c { d { 2 } } }
    [1] => { b { 1 } c { d { 2 } } }
)
Array (
    [0] => b { 1 }
    [R] => { 1 }
    [1] => { 1 }
)
Array (
    [0] => c { d { 2 } }
    [R] => { d { 2 } }
    [1] => { d { 2 } }
)
Array (
    [0] => d { 2 }
    [R] => { 2 }
    [1] => { 2 }
)
*/

down

mzvarik at gmail dot com ¶

4 years ago

This regexp can be used for parsing IF conditions.

$str = '
(IF_MYVAR)My var is printed
(OR_MYVARTWO)My var two is printed
(OR_ANOTHER)if you use OR you don't have to END everytime
(ELSE)Whatever bro(END)

(IF_BLUE)Something (IF_SUPERB)super(END) blue - this is simple IF condition(END)
';

function isCondition($k) {
    // put your user conditions here
    $conds = [];
    $conds['BLUE'] = true;
    $conds['MYVARTWO'] = true;
    $conds['ELSE'] = true; // always true
     return $conds[$k];
}

function findConditions($str) {

    $pattern = '~ \(if_([^\)]+)\) ((?: (?!\((end|if_)). | (?R) )*+) \(end\) ~xis';

    $str = preg_replace_callback($pattern, function($m) {

        $k = $m[1];
        $v = $m[2];
        $v = findConditions($v) ?: $v;

        $ors = preg_split('~(?=\((OR_[^\)]+|ELSE))~is', $v);

        $v = array_shift($ors); // hlavní

        if (isCondition($k)) return findConditions($v);
        else {
            foreach ($ors as $or) {
                list($k, $v) = explode(")", $or, 2);
                $k = substr($k, 1);
                if (isCondition($k)) return findConditions($v);
            }
        }
        return ''; // no matching condition
    }, $str);
    return $str;
};

// will output:  Whatever bro  \n\n  Something blue
echo findConditions($str);

down

jonah at nucleussystems dot com ¶

14 years ago

An unexpected behavior came up that introduced a very hard-to-track bug in some code I was working on.  It has to do with the preg_match_all PREG_OFFSET_CAPTURE flag.  When you capture the offset of a sub-match, it's offset is given _relative_ to it's parent.  For example, if you extract the value between < and > recursively in this string:

<this is a <string>>

You will get an array that looks like this:

Array
(
    [0] => Array
    (
        [0] => Array
        (
            [0] => <this is a <string>>
            [1] => 0
        )
        [1] => Array
        (
            [0] => this is a <string>
            [1] => 1
        )
    )
    [1] => Array
    (
        [0] => Array
        (
            [0] => <string>
            [1] => 0
        )
        [1] => Array
        (
            [0] => string
            [1] => 1
        )
    )
)

Notice that the offset in the last index is one, not the twelve we expected.  The best way to solve this problem is to run over the results with a recursive function, adding the parent's offset.

down

-1

Daniel Klein ¶

12 years ago

The order of non-mutually exclusive alternatives within a recursed sub-pattern is important.
<?php
$pattern = '/^(?<octet>[01]?\d?\d|2[0-4]\d|25[0-5])(?:\.(?P>octet)){3}$/';
?>

You might expect that this pattern will match any IP address in dotted-decimal notation (e.g. '123.45.67.89'). The pattern is intended to match four octets in the following ranges: 0-9, 00-99 & 000-255, each separated by a single dot. However, only the first octet can include values from 200-255; the remainder can only have values less than 200. The reason for this is that if the rest of the pattern fails, recursion is not back-tracked into to find an alternative match. The first part of the sub-pattern will match the first two digits of any octet from 200-255. The rest of the pattern will then fail because the third digit in the octet does not match either '\.' or '$'.

<?php
var_export(preg_match($pattern, '255.123.45.67')); // 1 (true)
var_export(preg_match($pattern, '255.200.45.67')); // 0 (false)
var_export(preg_match($pattern, '255.123.45.200')); // 0 (false)
?>

The correct pattern is:
<?php
$pattern = '/^(?<octet>25[0-5]|2[0-4]\d|[01]?\d?\d)(?:\.(?P>octet)){3}$/';
?>

Note that the first two alternatives are mutually exclusive so their order is unimportant. The third alternative, however, is not mutually exclusive but it will now only match when the first two fail.

<?php
var_export(preg_match($pattern, '255.123.45.67')); // 1 (true)
var_export(preg_match($pattern, '255.200.45.67')); // 1 (true)
var_export(preg_match($pattern, '255.123.45.200')); // 1 (true)
?>

down

-1

horvath at webarticum dot hu ¶

11 years ago

Below are some reusable patterns. I used comments with the 'x' modifier for human-readability.

You can write also a function, that generates patterns for specified bracket/quote pairs.

<?php

/* normal paretheses */
$simple_pattern = "(    (?#root pattern)
  (                     (?#text or expression)
    (?>[^\\(\\)]+)      (?#text)
    |
    \\((?-2)\\)         (?#expression and recursion)
  )*
)";

$simple_okay_text = "5( 2a + (b - c) ) - a * ( 2b - (c * 3(b - (c + a) ) ) )";
$simple_bad_text = "5( 2)a + (b - c) ) - )a * ( ((2b - (c * 3(b - (c + a) ) ) )";

echo "Simple pattern results:\n";
var_dump(preg_match("/^$simple_pattern\$/x",$simple_okay_text));
var_dump(preg_match("/^$simple_pattern\$/x",$simple_bad_text));
echo "\n----------\n";

/* some brackets */
$full_pattern = "(                  (?#root pattern)
  (                                 (?#text or expression)
    (?>[^\\(\\)\\{\\}\\[\\]<>]+)    (?#text not contains brackets)
    |
    (
      [\\(\\{\\[<]                  (?#start bracket)
        (?(?<=\\()(?-3)\\)|           (?#if normal)
        (?(?<=\\{)(?-3)\\}|           (?#if coursed)
        (?(?<=\\[)(?-3)\\]|           (?#if squared)
        (?1)\\>                       (?#else so if tag)
      )))                           (?#close nested-but-logically-the-some-level subpatterns)
    )
  )*
)";

$full_okay_text = "5( 2a + [b - c] ) - a * ( 2b - {c * 3<b - (c + a) > } )";
$full_bad_text = "5[ 2a + [b - c} ) - a * ( 2b - [c * 3(b - c + a) ) ) }";

echo "Full pattern results:\n";
var_dump(preg_match("/^$full_pattern\$/x",$simple_okay_text));
var_dump(preg_match("/^$full_pattern\$/x",$full_okay_text));
var_dump(preg_match("/^$full_pattern\$/x",$full_bad_text));
echo "\n----------\n";

/* some brackets and quotes */
$extrafull_pattern = "(             (?#root pattern)
  (                                 (?#text or expression)
    (?>[^\\(\\)\\{\\}\\[\\]<>'\"]+) (?#text not contains brackets and quotes)
    |
    (
      ([\\(\\{\\[<'\"])             (?#start bracket)
        (?(?<=\\()(?-4)\\)|           (?#if normal)
        (?(?<=\\{)(?-4)\\}|           (?#if coursed)
        (?(?<=\\[)(?-4)\\]|           (?#if squared)
        (?(?<=\\<)(?-4)\\>|           (?#if tag)
        ['\"]                         (?#else so if static)
      ))))                          (?#close nested-but-logically-the-some-level subpatterns)
    )
  )*
)";

$extrafull_okay_text = "5( 2a + ['b' - c] ) - a * ( 2b - \"{c * 3<b - (c + a) > }\" )";
$extrafull_bad_text = "5( 2a + ['b' - c] ) - a * ( 2b - \"{c * 3<b - (c + a) > }\" )";

echo "Extra-full pattern results:\n";
var_dump(preg_match("/^$extrafull_pattern\$/x",$simple_okay_text));
var_dump(preg_match("/^$extrafull_pattern\$/x",$full_okay_text));
var_dump(preg_match("/^$extrafull_pattern\$/x",$extrafull_okay_text));
var_dump(preg_match("/^$extrafull_pattern\$/x",$extrafull_bad_text));

/*

Outputs:

Simple pattern results:
int(0)
int(0)

----------
Full pattern results:
int(0)
int(0)
int(0)

----------
Extra-full pattern results:
int(0)
int(0)
int(0)
int(0)

*/

?>

＋添加备注

官方地址：https://www.php.net/manual/en/regexp.reference.recursive.php

有任何技术问题请点击这里网站运营推广招聘

IT PHP 编程语言开发编程 Linux 科技 Elasticsearch 数据库面试 HTML/CSS/XML 网络 JAVA NoSQL 操作系统 C/C++ Golang Git 算法正则表达式 Redis 互联网 MySql 软件运维 JavaScript 国际商业架构设计 Mac OS TCP/IP Excel Windows Oracle Socket VR Vim MongoDB 运营 Python MemCache 硬件电子娱乐设计摄影 nginx 游戏 WordPress HTTP 团建数码电器 Docker 大模型

mysql8切换用户密码验证方式 sha256_password警告 php7.3 使用 PDO_DM 扩展连接 DM8 中文乱码 laravel查看orm生成的sql 使用PHPWord将docx文件转换为html格式 docker-compose启动nginx与php-fpm PhpStorm中PHP注释的规范指南 composer install参数 PHPStorm ESC 会退出命令行 laravel orm中DB::insert方法导致内存泄漏的问题解决方法 adodb手册利用php soap实现web service ADORecordSet对象 PHP json解析（json_decode）页面工具 ADOConnection 公用函数常用的php ADODB使用方法集锦 adodb连接mysql多个数据库的问题 ip整型转换 PHP数组排序 Composer的Packagist资源 PHP获取指定函数定义在哪个文件中及行号

略微加速

PHP官方手册 - 互联网笔记

递归模式

发现了问题？

用户贡献的备注 8 notes