Changing structure of a matrix file

Issue

I have a matrix like data in tab separated text file. For example:

     a    b    c    d
a    0.1  0.5  0.3  0.0
b    0.9  0.2  0.4  0.7
c    0.2  0.0  0.6  0.5
d    0.0  0.5  0.3  0.1

I want this matrix data in tsv file such that 1st column is row index, 2nd column is column index and 3rd column is the value. For example the first 4 lines be like:

a    a    0.1
a    b    0.5
a    c    0.3
a    d    0.0

My file is large. It is about 5000 X 5000 matrix; so the starting input file has ~5000 rows and 5000 columns.
I was thinking to do it through series of text processing steps but is there some better way to do it on linux?

Solution

Now this looks like a job for awk:

awk '
  NR==1 { split($0, cols); next }
  { for(col in cols) {
      print $1, cols[col], $(col+1)
    }
  }' \
  matrix

This little awk program …

  • Stores the column headers into variable cols. The line is split like awk would already do for other lines (split on whitespace). This is done by executing the first block only for the first line (NR==1). next skips all other statements and proceeds to the next line.
  • For all other lines, it iterates through the columns extracted from the first line (for (col in cols)) and then prints the first column of the current line ($1, i.e. the row header), then prints the column header (cols[col]), and finally prints the corresponding cell for the current row+column combination.

Answered By – knittl

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published