Linux sed regex replace with capture groups

Issue

I have a file containing directory entries in the following format:

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>

I would like to use sed to find every entry where <ct> is an 11-digit number and <bw> is 1 (<bw>1</bw>), and change the line above to:

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>

(In case it isn't obvious, the only change is that <bw> is now 0.)

I have tried the following in sed but it does not match:

sed -E 's/(.+<ct>\d{11}.+<bw>)1(<\/bw><\/item>)/\10\2/g' test-directory.xml

What am I doing wrong?

Solution

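Your attempt fails because sed regular expressions do not support the Perl-style \d shorthand, so \d{11} does not match 11 digits; use [0-9] (or [[:digit:]]) instead. You may use this sed command with 2 capture groups: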

sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~' file

<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>

More Info:

  • (.*<ct>[0-9]{11}</ct>.*<bw>): Match and capture any text followed by <ct>11-digits</ct> followed by any text followed by <bw> in capture group #1
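  • 1: Match the literal 1 inside <bw>; it sits outside both groups, so the replacement \10\2 (group #1, a literal 0, group #2) swaps it for 0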
  • (</bw>.*): Match </bw> followed by anything in capture group #2
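
As a quick check of the pattern above, here is a minimal sketch that runs the same command over three sample lines. The sample data and the variable names input and result are made up for illustration; only the first line has both an 11-digit <ct> and <bw>1</bw>, so only that line should change.

```shell
# Hypothetical sample input: line A should change, B and C should not.
input='<item><fn>A</fn><ct>07123456789</ct><sd>37</sd><bw>1</bw></item>
<item><fn>B</fn><ct>0712345</ct><sd>38</sd><bw>1</bw></item>
<item><fn>C</fn><ct>07123456789</ct><sd>39</sd><bw>0</bw></item>'

# Same substitution as in the answer; ~ is the delimiter, so / needs no escaping.
# \10 is read as group #1 followed by a literal 0 (sed only has \1 to \9).
result=$(printf '%s\n' "$input" |
  sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~')

printf '%s\n' "$result"
```

Line B is left alone because its <ct> holds only 7 digits, and line C because its <bw> is already 0.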

PS: This assumes the <ct> tag appears before the <bw> tag on the same line. For more refined control over XML, it is better to use an XML parser than shell utilities.


If the position of the <bw> tag relative to <ct> is not fixed, you may use this sed solution, which runs the substitution only on lines that match the <ct> pattern:

sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~' file
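
To illustrate the address form, here is a small sketch in which <bw> comes before <ct>. The sample lines and the variable names input and result are assumptions made for the demonstration and are not taken from the original file.

```shell
# Hypothetical sample input with <bw> placed before <ct>.
input='<item><bw>1</bw><fn>A</fn><ct>07123456789</ct></item>
<item><bw>1</bw><fn>B</fn><ct>0712345</ct></item>'

# \~...~ is an address with a custom delimiter: the s command runs only on
# lines where <ct> holds exactly 11 digits, wherever <bw> sits on the line.
result=$(printf '%s\n' "$input" |
  sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~')

printf '%s\n' "$result"
```

Only line A is rewritten; line B fails the address test because its <ct> is too short, so its <bw>1</bw> stays as it is.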

Answered By – anubhava

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
