Issue
I have a tab-delimited file which contains two columns (ref and alt). I want to make a new column by copying the ref column and replacing a letter in it with the letter from the alt column. But I don’t want any replacement for rows where alt is empty or for entries like TTGA (whose length is more than 1).
The following is my input file:
ref alt
T C
C
T A,C
G TTGA
C
Expected output
ref alt
T C C T T
C C C C
T A,C T A C
G TTGA G G G
C C C C
Explanation of the output:
1) The ref column has T in the first column, second row, and the adjacent alt column has C (second column, second row). So I copy the ref column as a new column (see the 3rd column) and replace that T with C from the alt column.
2) There is C in the first column, third row, and the adjacent alt column is empty, so I do not add a copy of the ref column for this row.
3) There is T in the ref column (first column, 4th row) and the adjacent alt column has A,C (second column, 4th row). So I copy the ref column as a new column (4th column) and replace T with A, then I copy the ref column again and replace T with C (5th column, 4th row).
4) In the first column, 5th row, there is G, and the adjacent alt column has TTGA (length is more than 1), so I do not add a copy of the ref column for this row.
5) C is in the first column, 6th row, but the adjacent alt column has nothing to replace it with, so I do not add a copy of the ref column for this row.
Solution
An Awk solution
File ./replaceletters.awk:
#!/usr/bin/awk -f
BEGIN {
    FS = OFS = "\t"
}
# First line: print the header unchanged
NR == 1 {
    print $1,$2
    next
}
# Case: Only one column:
# A -> A; empty; A; A; A
NF == 1 {
    print $1,"",$1,$1,$1
    next
}
# Case: Two columns, one letter on second column:
# A; B -> A; B; B; A; A
NF == 2 && length($2) == 1 {
    print $1,$2,$2,$1,$1
    next
}
# Case: Two columns, two letters on second column:
# A; B,C -> A; B,C; A; B; C
NF == 2 && $2 ~ /^.,.$/ {
    C1 = C2 = $2
    gsub(/,.*$/, "", C1)    # keep the letter before the comma
    gsub(/^.*,/, "", C2)    # keep the letter after the comma
    print $1,$2,$1,C1,C2
    next
}
# Case: Other cases with two columns
# A; X -> A; X; A; A; A
NF == 2 {
    print $1,$2,$1,$1,$1
    next
}
Make the script executable:
chmod 755 ./replaceletters.awk
Run it like this:
./replaceletters.awk input01.txt
Output:
ref alt
T C C T T
C C C C
T A,C T A C
G TTGA G G G
C C C C
Answered By – Arnaud Valmary
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0