php - Preg_match returning "extra" empty matches for new lines
The following expression returns what I need, BUT it gives an extra empty match for each line, as well as for any blank lines. This results in 5 valid text lines returning 10 matches. I suspect it's in the way that I'm specifying the last capture group, or in not making Capture Group #2 required.
How can I "ignore" the new line character (or whatever is triggering the extra match)?
/(\d+[a-z]?\.)?[ ]?(.*)/g
11a. A numbered agenda item
Unnumbered agenda item
12. Another numbered agenda item
Another UNnumbered agenda item
13. A numbered agenda item
I need to extract the Agenda Item text, AND the preceding number (if present).
Demo at https://regex101.com/r/vB0H5s/1
Answer
Solution:
In your pattern you are using the quantifiers ? and *, which make every part optional, so the pattern can also match an empty string.
The reason you get 10 matches instead of 5 is that the pattern is unanchored. As all parts are optional, the last .* can "match" the empty position at the end of each line, after the real match has consumed the line.
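The doubling can be reproduced in PHP. This is a minimal sketch, assuming the five sample lines from the question are joined with \n; the variable names $total and $empty are illustrative only:

```php
<?php
// Sample agenda text from the question, one item per line (assumed to be "\n"-separated).
$text = "11a. A numbered agenda item\nUnnumbered agenda item\n12. Another numbered agenda item\nAnother UNnumbered agenda item\n13. A numbered agenda item";

// Original pattern: every part is optional, so it can also match an empty string.
preg_match_all('/(\d+[a-z]?\.)?[ ]?(.*)/', $text, $matches);

// $matches[0] holds the full matches; count them and count the empty ones.
$total = count($matches[0]);
$empty = count(array_filter($matches[0], function ($m) {
    return $m === '';
}));

echo "total: $total, empty: $empty\n";
```

With this input the script should report 10 matches, 5 of them empty: one empty match at the end of each line.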
You can use (.+) to capture 1 or more characters in the second capture group.
If the match should be at the start of the string, you can use the anchor ^ (combined with the m flag, it matches at the start of each line):
^(\d+[a-z]?\.)?[ ]?(.+)
See a regex demo
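As a rough PHP illustration of this pattern, here is a sketch that assumes the m flag (so that ^ matches at every line start) and the PREG_SET_ORDER flag; neither is stated in the answer, and the "(none)" placeholder is purely illustrative:

```php
<?php
// Sample agenda text from the question, one item per line (assumed to be "\n"-separated).
$text = "11a. A numbered agenda item\nUnnumbered agenda item\n12. Another numbered agenda item\nAnother UNnumbered agenda item\n13. A numbered agenda item";

// ^ with the m flag anchors each match at a line start;
// (.+) requires at least one character, so empty matches are no longer possible.
preg_match_all('/^(\d+[a-z]?\.)?[ ]?(.+)/m', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is an empty string when the line has no leading number.
    $number = $m[1] !== '' ? $m[1] : '(none)';
    echo $number, ' => ', $m[2], "\n";
}
```

Each of the 5 lines yields exactly one match, with Group 1 left empty for the unnumbered items.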
Answer
Solution:
A regex that consists only of optional patterns can always match an empty string before a non-matching char sequence.
You can use
preg_match_all('/^(\d+[a-z]?\.)\s*(.*(?:\R(?!\d+[a-z]?\.).*)*)/m', $text, $matches)
See the regex demo.
Details:
^ - start of a line
(\d+[a-z]?\.) - Group 1: one or more digits, an optional letter and a . (dot)
\s* - zero or more whitespaces
(.*(?:\R(?!\d+[a-z]?\.).*)*) - Group 2: the rest of the line, and then, zero or more times, a line break sequence not followed with one or more digits, an optional letter and a . (dot), and the rest of that line.
See a PHP demo:
$text = "11a. A numbered agenda item\nUnnumbered agenda item\n12. Another numbered agenda item\nAnother UNnumbered agenda item\n13. A numbered agenda item";
if (preg_match_all('/^(\d+[a-z]?\.)\s*(.*(?:\R(?!\d+[a-z]?\.).*)*)/m', $text, $matches)) {
    // Map each item number (Group 1) to its text (Group 2).
    print_r(array_combine($matches[1], $matches[2]));
}
// => Array
// (
// [11a.] => A numbered agenda item
// Unnumbered agenda item
// [12.] => Another numbered agenda item
// Another UNnumbered agenda item
// [13.] => A numbered agenda item
// )