Pivoting a Pandas Dataframe containing duplicate column name with different values

Issue

I have a question on pivot_table python pandas.

I have a dataframe like this

Agent   Detail                  Value
report1 General Section         YESS
report1 jobID                   558
report1 Priority                normal
report1 Run As                  Owner's Credentials
report1 Schedule Section    
report1 disabled                TRUE
report1 timeZoneId  None
report1 startImmediately       FALSE
report1 repeatMinuteInterval    None
report1 start date              None
report1 start time              None
report1 Email Recipient         [email protected]
report1 Email Recipient         [email protected]
report2 General Section         YESS
report2 jobID                   559
report2 Priority                normal
report2 Run As                  Owner's Credentials
report2 Schedule Section    
report2 disabled                TRUE
report2 timeZoneId              None
report2 startImmediately        FALSE
report2 repeatMinuteInterval    None
report2 start date              None
report2 start time              None
report2 Email Recipient         [email protected]
report2 Email Recipient         [email protected]

i am trying to pivot the dataframe and convert all the detail values as columns . Index being agent field which is a report name . Each report can have multiple recipients . I need to have each row for recipient for each report . sample output as below:

[enter image description here]

1

my current code is below:

import csv
import pandas as pd
resultsFile = 'C:\\Oracle\\testfile.csv'    #input to transpose file
df=pd.read_csv(resultsFile,skip_blank_lines=True)
df2=df.pivot_table(index='Agent',columns='Detail',values='Value',aggfunc='sum')
df2

which is concatenating the email addresses in single field and that is not what I am looking for ? How can I pivot a df with duplicate columns values and transform them to mulitple lines?

Solution

You can group your df by agentand pivot the groups (with original index as index). You’ll have to fill NaN values and drop duplicates since you’ll get one line per value:

reports = []
for a, sub_df in df.groupby('Agent'):
    rep = sub_df.pivot(None, 'Detail', 'Value').ffill().bfill().drop_duplicates()
    rep.insert(0, 'Agent', a)
    reports.append(rep)

result = pd.concat(reports).reset_index()
print(result)

Output:

Detail    Agent   Email Recipient General Section Priority               Run As  ...  repeatMinuteInterval start date start time startImmediately timeZoneId
0       report1       [email protected]            YESS   normal  Owner's Credentials  ...                  None       None       None            FALSE       None
1       report1       [email protected]            YESS   normal  Owner's Credentials  ...                  None       None       None            FALSE       None
2       report2    [email protected]            YESS   normal  Owner's Credentials  ...                  None       None       None            FALSE       None
3       report2  [email protected]            YESS   normal  Owner's Credentials  ...                  None       None       None            FALSE       None

Answered By – Tranbi

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published