conditional regex on multiline string in python

Issue

This question is similar to my original post, "Unable to use conditional regex to test my string in python".

I am posting a new question because the requirement here is slightly different from the original one.

If the given string is processed line by line, the original answer is good enough. However, that answer does not cover the case of a multiline string. See below.

Test cases (the expected value is the result of bool(re.match(…))):

1. Naive match. Expected: True

xxxx
xxxx
board add 0/1 aaa
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#

2. Bad model name. Expected: False

xxxx
xxxx
board add 0/1 xxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 aaa
board add 0/5 bbb
#

3. Missing model. Expected: True

xxxx
xxxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#

I tried several regexes, but each of them fails on either test case 2 or test case 3.

Regexes tried, with the test case each one fails:

  • Fails test 2: (board add 0/1)? (?(1) (aaa|bbb))
  • Fails test 2: ^(?:(?!board add 0/1).)*$|board add 0/1 (?:aaa|bbb)
  • Fails test 3: board add 0/1 (aaa|bbb)
  • Fails test 3: (?=board add 0/1 )(?:board add 0/1 (aaa|bbb))

Is it possible to write a regex that passes all of the test cases above?

You can check them at the following URL:

https://regex101.com/r/2l2Qd4/1

NOTE:

  • I only want to check one particular entry, board add 0/1, rather than board add 0/\d+
    • In my actual use case, different interfaces may need different models. That is why I am trying to work out a regex for board add 0/1 specifically. I can then extend it to board add 0/2 through board add 0/21 one by one
  • Requirements of a valid string
    • If board add 0/1 exists in the string, it must be followed by (aaa|bbb). Otherwise, the string is invalid
    • If board add 0/1 does not exist in the string, the string is valid

Solution

In that case, you can use this regex

board add 0/\d+ (?!aaa|bbb)

If the regex matches, the string is invalid.

Python Example

import re


strings = [
    """xxxx
xxxx
 board add 0/1 aaa
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#""",
    """xxxx
xxxx
 board add 0/1 xxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 aaa
 board add 0/5 bbb
#""",
    """xxxx
xxxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#"""
]

for string in strings:
    print(not bool(re.search(r"board add 0/\d+ (?!aaa|bbb)", string)))

Output

True
False
True

Explanation

re.search scans the string and returns a match object for the first place where the pattern matches, or None if there is no match. The solution works by detecting invalid strings rather than valid ones: if a board add 0/\d+ entry (board add 0/1 included) is followed by neither aaa nor bbb, the string is invalid. Everything else passes, as you described in your previous question. So when re.search returns a match object, not bool(...) yields False, and when it returns None, not bool(...) yields True, which is the expected result.

NOTE: I’m using not bool(...) as the string is valid if it does not contain the pattern.
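Since the question is about one particular slot, the same negative-lookahead idea can be narrowed from 0/\d+ to a single slot. The following is a minimal sketch, not part of the original answer: slot_is_valid and the ALLOWED mapping are hypothetical names, and the models listed per slot are made up for illustration.

```python
import re


def slot_is_valid(config, slot, models):
    """Return True unless `board add <slot> ` is followed by a model not in `models`."""
    allowed = "|".join(re.escape(m) for m in models)
    # The negative lookahead rejects positions where an allowed model follows,
    # so any match means the slot carries a model outside the allowed set.
    bad = re.compile(rf"board add {re.escape(slot)} (?!(?:{allowed}))")
    return bad.search(config) is None


# Hypothetical per-slot rules; a slot absent from the config counts as valid.
ALLOWED = {"0/1": ("aaa", "bbb"), "0/2": ("aaa",)}
config = "xxxx\nboard add 0/1 xxx\nboard add 0/2 aaa\n#"

for slot, models in ALLOWED.items():
    print(slot, slot_is_valid(config, slot, models))
```

Because the literal space after the slot number is part of the pattern, a check for 0/1 should not be triggered by 0/10 or 0/11.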

We can just focus on board add 0/1 and ignore the other board add 0/x entries in this question. In fact, despite the negation, your current solution fits my need. I am just wondering why we need the negation, and why my own attempts do not work.

I cannot tell what you expected the first regex, (board add 0/1)? (?(1) (aaa|bbb)), to match. The second regex is similar to my answer to your previous question. The third one is the closest to the answer.
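One likely reason the first attempt, (board add 0/1)? (?(1) (aaa|bbb)), accepts test case 2 is that the optional group is allowed to match nothing. When group 1 is unset, the conditional has no "else" branch and matches the empty string, so re.search succeeds at any space in the input. The sketch below uses a shortened, made-up config string to show this.

```python
import re

conditional = re.compile(r"(board add 0/1)? (?(1) (aaa|bbb))")

# Shortened version of test case 2: slot 0/1 carries an invalid model.
bad_config = "xxxx\nboard add 0/1 xxx\nboard add 0/2 aaa\n#"

m = conditional.search(bad_config)
# The match is a single space, and group 1 did not participate,
# so the (aaa|bbb) requirement was never enforced.
print(repr(m.group(0)), m.group(1))
```

This is why the negated form is easier to reason about: it only matches where board add 0/1 is actually present and followed by something else.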

I have modified the regex I suggested for your previous question:

^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$

Now you can use re.match instead of re.search:

...

for string in strings:
    print(bool(re.match(r"^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$", string, re.S)))

Output

True
False
True

NOTE: This also uses the re.S flag (re.DOTALL, called "single line" on regex101), which lets . match newline characters.
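To illustrate why the flag matters, here is a small sketch comparing the same pattern with and without re.S on a shortened, made-up version of test case 3. Without the flag, . stops at the first newline and $ only matches at the very end of the string, so neither alternative can span the whole multiline input.

```python
import re

pattern = r"^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$"

# Shortened version of test case 3: slot 0/1 is absent, so the string is valid.
missing = "xxxx\nxxxx\nboard add 0/2 aaa\n#"

without_flag = bool(re.match(pattern, missing))
with_flag = bool(re.match(pattern, missing, re.S))

print(without_flag, with_flag)  # False True
```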

Answered By – Artyom Vancyan

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
