Python: Expand Pandas Series in Dataframe and remove duplicates


I have something like this (except many more rows):

     col1   col2                    col3
0    xyz1    3.9              ['A', 'B']
1    xyz2    8.0    ['C', 'A', 'C', 'D']

I want to make it look something like this:

     col1   col2   col3
0    xyz1    3.9    'A'
1    xyz1    3.9    'B'
2    xyz2    8.0    'A'
3    xyz2    8.0    'C'
4    xyz2    8.0    'D'

EDIT: There could be duplicates in the series (like 'C' above), which I want to remove. Essentially, it should take the pandas.core.series.Series (not a list) in col3 and flatten it into strings across multiple rows. Is there an easy way to do this?


I wrote a loop that turns each col3 value (assumed here to be in string form) into a comma-separated string and splits it into a list. Then you can use explode.

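tmp_list = []
for i in result_df['col3'].values:
    # Strip brackets and quotes from the string form, e.g. "['A', 'B']" -> "A, B"
    str1 = i.replace(']','').replace('[','').replace("'", '')
    # Split on commas and trim the space left after each comma
    op = [s.strip() for s in str1.replace('"','').split(",")]
    tmp_list.append(op)

# One row per list element, then drop the repeated rows
result_df['col3'] = tmp_list
result_df = result_df.explode('col3').drop_duplicates().reset_index(drop=True)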

This worked for me, and I hope it works for anyone else who needs it. There must be faster ways to do this, though (without needing a loop).
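One way to avoid the explicit loop, as a rough sketch rather than a definitive answer: map each cell to a plain list, then let explode and drop_duplicates do the rest. The sample frame and the helper name to_list are made up for illustration, and the sketch assumes each col3 cell is a list, a Series, or the string form of a list. The sort step is only there to reproduce the row order shown in the question.

```python
import ast

import pandas as pd

# Hypothetical sample data mirroring the question; col3 holds plain lists here.
df = pd.DataFrame({
    "col1": ["xyz1", "xyz2"],
    "col2": [3.9, 8.0],
    "col3": [["A", "B"], ["C", "A", "C", "D"]],
})


def to_list(value):
    """Return a plain list for a list, a Series, or a string such as "['A', 'B']"."""
    if isinstance(value, str):
        # literal_eval safely parses the string form of a Python list
        return list(ast.literal_eval(value))
    return list(value)


flat = (
    df.assign(col3=df["col3"].map(to_list))  # normalise every cell to a list
      .explode("col3")                       # one row per list element
      .drop_duplicates()                     # drops the repeated 'C' row
      .sort_values(["col1", "col3"])         # optional: order as in the question
      .reset_index(drop=True)
)
print(flat)
```

If the cells are already lists, the map step can probably be skipped and explode called directly. Note that map still calls a Python function per row, so this is tidier than the loop but not necessarily much faster.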

Answered By – Yash

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
