How to parse a file using AWK

Issue

I am new to shell scripting and want to use awk to parse a file. I have a log file, log.txt, that looks like this, where each record name (A, B, C) appears twice.

A,10.10.250.2,Compliant
A,10.10.250.2,Compliant
B,10.10.250.3,NonCompliant
B,10.10.250.3,Compliant
C,10.10.250.4,NonCompliant
C,10.10.250.4,NonCompliant

I want to merge the two records that share a record name, like this:

A,10.10.250.2,Compliant,NA,Compliant
B,10.10.250.3,NonCompliant,Yes,Compliant
C,10.10.250.4,NonCompliant,No,NonCompliant

The 4th column is “NA” when both last-column values are Compliant, “Yes” when they are NonCompliant and then Compliant, and “No” when both are NonCompliant. The 5th column is the last value of the second record.

I am trying something like this, which is not correct. I need some help.

awk -F "," '{if ($1 == $1) print NR}' log.txt

Solution

How I have done it (and some awk details):

  • awk reads one line at a time and processes it.
  • To compare something with the next line, you can force your script to read the next input line with getline.
  • if (condition) { something } else { something else } tests a condition and branches on the result.
  • && performs a logical "and" between two conditions.
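
To see getline on its own before the full script, here is a minimal sketch. The function name pair_lines and the sample words are made up for illustration; the only awk features used are plain getline and its return value (1 on success, 0 at end of input, -1 on error).

```shell
# Hypothetical helper: join each odd line with the line that follows it.
pair_lines() {
    awk '{
        first = $0                 # save the line read by the main loop
        if ((getline) > 0)         # load the next line into $0, if there is one
            print first " + " $0
    }' "$@"
}

printf 'one\ntwo\nthree\nfour\n' | pair_lines
# prints:
# one + two
# three + four
```

Checking the return value means a trailing unpaired line is skipped instead of being paired with itself.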

That being said, create a file named "script.awk":

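BEGIN { FS = "," }                  # fields are comma-separated
{
    compornot = $3                  # status from the first line of the pair
    if ((getline) <= 0) {           # load the second line; stop if there is none
        exit
    }
    if (compornot == "Compliant" && $3 == "Compliant") {
        value = "NA"
    }
    else if (compornot == "NonCompliant" && $3 == "NonCompliant") {
        value = "No"
    }
    else {
        value = "Yes"
    }
    # $1, $2 and $3 now refer to the second line of the pair
    print $1 "," $2 "," compornot "," value "," $3
}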

  • The BEGIN block sets the field separator to a comma.
  • awk then reads the first line.
  • The script saves the third field ("Compliant" or "NonCompliant") in the variable "compornot".
  • It reads the next input line using getline. From that point on, the numbered fields refer to that next input line.
  • The if statements compare the previous line's third field (saved in "compornot") and the current line's third field against "Compliant" and "NonCompliant".
  • Depending on those values, the variable "value" is set to "Yes", "No" or "NA".
  • Finally, it prints the output. Remember that $1, $2 and $3 refer to the second line of input here, not the first anymore!
  • The process then starts again with the 3rd line of input.

The sequence of processed lines is:

  • first line, getline, second line
  • third line, getline, fourth line
  • fifth line, getline, sixth line

So the main loop reads only the odd-numbered lines, and the even-numbered lines are read explicitly via getline.
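
If you want to check that sequence yourself, a small trace like the following should show it. This is only a sketch: trace_pairs is a hypothetical name and the six input lines are placeholders. It relies on the fact that a plain getline advances NR.

```shell
# Hypothetical helper: report which line numbers each read touches.
trace_pairs() {
    awk '{
        main = NR                  # line number read by the main loop
        if ((getline) > 0)         # NR advances when getline succeeds
            print "main loop read line " main ", getline read line " NR
    }' "$@"
}

printf 'a\nb\nc\nd\ne\nf\n' | trace_pairs
# prints:
# main loop read line 1, getline read line 2
# main loop read line 3, getline read line 4
# main loop read line 5, getline read line 6
```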

To use it, the command is:

awk -f script.awk log.txt
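
For comparison, the same pairing can probably be done without getline by remembering the odd line and printing on the even one. This is a different technique from the script above, not a replacement for it. The sketch assumes the two records for a name are always adjacent, and the function name merge_pairs and the variable first are made up for the example.

```shell
# Hypothetical helper: merge adjacent record pairs using NR parity instead of getline.
merge_pairs() {
    awk -F',' '
    NR % 2 == 1 { first = $3; next }      # odd line: remember its status
    {                                     # even line: compare and print
        if (first == "Compliant" && $3 == "Compliant")
            value = "NA"
        else if (first == "NonCompliant" && $3 == "NonCompliant")
            value = "No"
        else
            value = "Yes"
        print $1 "," $2 "," first "," value "," $3
    }' "$@"
}

printf '%s\n' \
    'A,10.10.250.2,Compliant' 'A,10.10.250.2,Compliant' \
    'B,10.10.250.3,NonCompliant' 'B,10.10.250.3,Compliant' \
    'C,10.10.250.4,NonCompliant' 'C,10.10.250.4,NonCompliant' |
merge_pairs
# prints:
# A,10.10.250.2,Compliant,NA,Compliant
# B,10.10.250.3,NonCompliant,Yes,Compliant
# C,10.10.250.4,NonCompliant,No,NonCompliant
```

Here awk does all the reading, so there is no getline return value to check, and a trailing unpaired line is simply never printed.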

Answered By – Nic3500

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
