Remove non matching values in csv column

Issue

I need to validate and clean a field in CSV. There is column for IP address and I need to remove only invalid data inside that column.

I tried the following command :

awk 'BEGIN{ FS=OFS="," }{ gsub(/^([0-9]{1,3}[\.]){3}[0-9]{1,3}$/,"", $3) }1' input.csv

Input file

anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa

Current output

anna,new york,,usa
james,denver,,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa

Expected output

anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,,usa
tommy,new york,10.2.8.3,usa

This command remove the matching data, but I need the opposite. How do I remove only the non-matching data in the IP column ?

Solution

How do I remove only the non-matching data in the IP column ?

You might combine following string functions: match substr for this task following way

anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa

then

awk 'BEGIN{FS=OFS=","}{$3=match($3,/([0-9]{1,3}[\.]){3}[0-9]{1,3}/)?substr($3,RSTART,RLENGTH):"";print}' file.txt

gives output

anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,,usa
tommy,new york,10.2.8.3,usa

Explanation: I inform GNU AWK that , is both field separator (FS) and output field separator (OFS), then for each line I use so called ternary operator condition?valueiftrue:valueiffalse, condition is if $3 does match regular expression, observe that I altered it slightly, so it does hold if IP is somewhere inside, rather than span whole column. If match found I use substr to get substring which does correspond to match using RSTART, RLENGTH which were set by match, otherwise I use empty string. After that I print whole line.

(tested in gawk 4.2.1)

Answered By – Daweo

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published