python pandas normalization

Issue

I have a csv file where one field should be normalized over two records:

    +-----+---------+
    | id  | field   |
    +-----+---------+
    | 1   | A-a,B-b |
    | 2   | C-c     |
    +-----+---------+

so some records are comma separated with two tuples
to become different records

    +-----+---------+
    | id  | field   |
    +-----+---------+
    | 1   | A-a     |
    | 1   | B-b     |
    | 2   | C-c     |
    +-----+---------+

and then split over two fields

    +-----+---------+---------+
    | id  | field_1 | field_2 |
    +-----+---------+---------+
    | 1   | A       | a       |
    | 1   | B       | b       |
    | 2   | C       | c       |
    +-----+---------+---------+

I have this solution for the last step

df[['field_1', 'field_2']] = df['field'].str.split('-', expand = True)

but I’m missing the first step. Can you help?

Solution

df.field = df.field.str.split(',')
df1 = df.explode('field')
df1[['field_1', 'field_2']] = df1.field.str.split('-', expand = True)
df1
 
   id field field_1 field_2
0   1   A-a       A       a
0   1   B-b       B       b
1   2   C-c       C       c

Answered By – onyambu

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published