awk to compare files

Issue

I’m trying to use awk to find the common lines between two files and save them as a .txt file. The two files look as follows:

file1.cls:

>CL1 1
lcu_1 lcu_2 lcu_3
>CL2 1
lcu_6 lcu_4 lcu_8

file2.cls:

>CL1 1
ler_1 lcu_2 ler_3
>CL2 1
lcu_1 lcu_2 lcu_3
>CL3 1
lcu_6 lcu_4 lcu_8

Expected output with the two common "CL’s":

>CL1 1
lcu_1 lcu_2 lcu_3
>CL2 1
lcu_6 lcu_4 lcu_8

The code I’m using:

awk 'FNR==NR {a[$1]; next} $1 in a' file1.cls file2.cls > out.txt

Actual output:

CL1 1
CL2 1

Does anyone know how to solve this?

Solution

With awk, you can use > as the record separator, so that each cluster (header line plus member line) is read as a single record. The output is a bit messed up, though:

$ awk 'BEGIN {RS = ORS = ">"} NR == FNR {clu[$1]; next} $1 in clu' file2.cls file1.cls
>CL1 1
lcu_1 lcu_2 lcu_3
>CL2 1
lcu_6 lcu_4 lcu_8
>⏎

My shell outputs ⏎ to indicate that there is no trailing newline.
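
To see where the stray > comes from, it can help to print the records awk produces once > is the record separator. This is a minimal sketch that feeds the contents of file1.cls from the question inline; the variable name records and the printf format are choices made for this illustration, not part of the original answer:

```shell
# Show each record awk sees when ">" is the record separator.
# The input is the content of file1.cls from the question.
records=$(printf '>CL1 1\nlcu_1 lcu_2 lcu_3\n>CL2 1\nlcu_6 lcu_4 lcu_8\n' |
    awk 'BEGIN {RS = ">"} {printf "record %d: name=[%s]\n", NR, $1}')

echo "$records"
```

The text before the first > becomes an empty leading record. With ORS also set to >, printing that empty record is what produces the extra > in the output, and the length($1) test in the cleaned-up version is what skips it.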

Cleaning up the output:

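awk '
    BEGIN {RS = ">"}                # each ">"-delimited cluster is one record
    NR == FNR {clu[$1]; next}       # first file: remember every cluster name
    # second file: print non-empty records whose name was seen, trimming stray newlines
    length($1) && $1 in clu {gsub(/^\n|\n$/, ""); print ">" $0}
' file2.cls file1.cls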
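
For anyone who wants to try this end to end, the following sketch recreates the two sample files from the question in a scratch directory and writes the result to out.txt, as the question asked. The use of mktemp -d and the variable name workdir are assumptions made for this example:

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '>CL1 1\nlcu_1 lcu_2 lcu_3\n>CL2 1\nlcu_6 lcu_4 lcu_8\n' > "$workdir/file1.cls"
printf '>CL1 1\nler_1 lcu_2 ler_3\n>CL2 1\nlcu_1 lcu_2 lcu_3\n>CL3 1\nlcu_6 lcu_4 lcu_8\n' > "$workdir/file2.cls"

# file2.cls is read first to collect the cluster names; records of
# file1.cls whose name was collected are printed with ">" restored.
awk '
    BEGIN {RS = ">"}
    NR == FNR {clu[$1]; next}
    length($1) && $1 in clu {gsub(/^\n|\n$/, ""); print ">" $0}
' "$workdir/file2.cls" "$workdir/file1.cls" > "$workdir/out.txt"

cat "$workdir/out.txt"
```

Note that the match is on the cluster name (CL1, CL2), not on the member line, so the records printed are the ones from file1.cls.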

Answered By – glenn jackman

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
