Python regular expression: part is optional, but mandatory if a character precedes it

Issue

I am trying to capture something along the lines of

1/2x1 + 3x2 - 4/5x3

I will strip the spaces beforehand, so the regular expression does not need to capture them. The issue is that I want the preceding coefficient to have the option of being a fraction, so if I see a / then it must have \d+ following it. I don’t necessarily care to capture the /.

Ideally I would extract the groups as such:

# first match
match.groups(1)
('1', '2', 'x1')

# second match
('+', '3', 'x2')

# third match
('-', '4', '5', 'x3')

Something that is (sort of) working is ([+-])?(\d)+(\/\d)?([a-zA-Z]+\d+). However, I don’t love that it also captures the preceding '/'.

Example output:

>>> import re
>>> regexp = re.compile(r'([+-])?(\d)+(\/\d)?([a-zA-Z]+\d+)')
>>> expr = '1/2a3+1/8x2-4x3'
>>> match = regexp.search(expr)
>>> match.groups(1)
(1, '1', '/2', 'a3')

>>> expr = expr.replace(match.group(0), '')
>>> match = regexp.search(expr)
>>> match.groups(1)
('+', '1', '/8', 'x2')

>>> expr = expr.replace(match.group(0), '')
>>> match = regexp.search(expr)
>>> match.groups(1)
('-', '4', 1, 'x3')

In the first match, what does the first element 1 mean? I see the same thing in the third element of the third match. In both cases that particular "group" is missing. So is that just a way of saying "I matched, but I didn’t match anything"?

Another issue with the above regex is that it makes the [+-] optional. I want it to be optional on the first term but mandatory on subsequent terms.

Anyway, the above is usable: I’ll need to peel off the /, and I can sanitize the input to ensure the + and - signs are always there, but it’s not as elegant as I’m sure it can be.

Thanks for any help

Solution

You could rework your regex slightly to use capturing groups only for things you want to capture and then use re.findall to extract all matches at once:

import re
expr = '1/2a3+1/8x2-4x3'
regexp = re.compile(r'([+-])?(\d+)(?:/(\d+))?([a-zA-Z]+\d+)')  # (?:...) matches the slash without capturing it
res = regexp.findall(expr)

Output:

[
 ('', '1', '2', 'a3'),
 ('+', '1', '8', 'x2'),
 ('-', '4', '', 'x3')
]
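On the side question about the stray 1 values in the question's output: the argument to match.groups() is the default value reported for any group that did not take part in the match, so match.groups(1) fills those slots with the integer 1. re.findall reports the same situation as an empty string. A minimal sketch, using a term with no sign and no fraction:

```python
import re

regexp = re.compile(r'([+-])?(\d+)(?:/(\d+))?([a-zA-Z]+\d+)')
match = regexp.search('4x3')  # no sign and no fraction in this term

# With no argument, groups that did not participate are reported as None.
print(match.groups())      # (None, '4', None, 'x3')

# The argument is only a placeholder for those missing groups.
print(match.groups(1))     # (1, '4', 1, 'x3')
print(match.groups(''))    # ('', '4', '', 'x3')
```

So the asker's guess is right: the 1 means "the overall pattern matched, but this optional group matched nothing".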

Note that when there is no fraction (or no sign on the first term), the corresponding tuple contains empty strings (''). If required, you could filter those out, e.g.

[tuple(filter(None, tup)) for tup in res]
# [('1', '2', 'a3'), ('+', '1', '8', 'x2'), ('-', '4', 'x3')]

However, you would then have difficulty knowing which value in each tuple corresponds to which part of the expression.
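One way around that ambiguity is to name the groups and read each term as a dict, so a missing part stays in its own slot. The sketch below is not part of the original answer. It also tries to cover the question's other requirement, that the sign is optional on the first term but mandatory afterwards, by anchoring the optional-sign alternative with ^. The names TERM and parse_terms are hypothetical, and the sketch assumes spaces have already been stripped.

```python
import re

# TERM and parse_terms are illustrative names, not from the original answer.
TERM = re.compile(
    r'(?P<sign>^[+-]?|[+-])'   # sign: optional at the very start, required elsewhere
    r'(?P<num>\d+)'            # numerator, or the whole coefficient
    r'(?:/(?P<den>\d+))?'      # optional denominator; the slash itself is not captured
    r'(?P<var>[a-zA-Z]+\d+)'   # variable name such as x1
)

def parse_terms(expr):
    """Return one dict per term; raise ValueError on anything that is not a term."""
    terms = []
    pos = 0
    while pos < len(expr):
        # '^' only matches at the real start of the string, even when pos > 0,
        # so every term after the first must begin with an explicit sign.
        m = TERM.match(expr, pos)
        if m is None:
            raise ValueError(f'unexpected input at position {pos}: {expr[pos:]!r}')
        terms.append(m.groupdict())
        pos = m.end()
    return terms

for term in parse_terms('1/2a3+1/8x2-4x3'):
    print(term['sign'], term['num'], term['den'], term['var'])
```

Here term['den'] is None when a term has no fraction and term['sign'] is '' when the first term has no sign. Because each term must start exactly where the previous one ended, input such as '1/x2' (a slash with no digits after it) raises ValueError rather than being silently skipped, which re.findall would do.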

Answered By – Nick

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
